TsExtractor bug #278
It is probably caused by TsExtractor, which reads the HTTP data very quickly and holds the samples in memory.
@mine260309 It reads HTTP data very fast? Then why do other TS streams play normally?
Depending on the TS stream (and codec info), TsExtractor may discard samples or hold on to samples waiting to be rendered. Your case could be different, but you could analyze the heap usage to find which objects consume too much memory.
I would agree with @mine260309 that it is most likely caused by TsExtractor reading too many samples "ahead" in the TS segment. This occurs especially at high bitrates. There is also an issue where too many Samples are being pooled (in the SamplePool) - this is marked as "We need to fix this" in the sources.
@JohanLindquist Agreed. I traced the GC and found it allocating constantly.
Any easy fix for this?
Also, TsExtractor$Sample uses a lot of byte arrays which, if pooled correctly, could reduce GC. This is especially noticeable when the bitrate (and thus the samples) are larger - the samples are all expanded at bitrate switching time. Not sure what the ideal situation here is, as pooling too many bytes will have a big impact on memory footprint. But, considering they are being used at quite a fast rate, maybe that is OK?
Sample objects are already pooled. The ideal solution would probably be to keep data in TsPacket sized buffers right up until the samples are actually read from the extractor. Rather than copying sample data out of TsPacket buffers into intermediate Sample objects, we'd instead only parse out the information we need, such as where the sample data boundaries are within these buffers. We'd then copy directly from TsPacket buffers into SampleHolders on demand. This is a pretty big change, and may be pretty complicated to pull off. But it would solve the issue.
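A minimal sketch of the idea described above, under the assumption of hypothetical class names (this is not ExoPlayer's actual API): record only the byte ranges where each sample's data lives inside pooled packet buffers, then perform the single copy directly into the caller's holder on demand.

```java
import java.util.ArrayDeque;

public class SampleIndexSketch {
  /** A pointer into a pooled TS packet buffer; recycled rather than re-allocated. */
  static final class SampleExtent {
    byte[] packetBuffer; // pooled packet buffer holding the payload
    int offset;          // where the sample's payload starts
    int length;          // how many payload bytes belong to the sample
  }

  private final ArrayDeque<SampleExtent> pending = new ArrayDeque<>();

  /** Parsing step: note where the sample data lies; no copy happens here. */
  public void onPayload(byte[] packetBuffer, int offset, int length) {
    SampleExtent e = new SampleExtent();
    e.packetBuffer = packetBuffer;
    e.offset = offset;
    e.length = length;
    pending.add(e);
  }

  /** Read step: the single copy, directly into the caller's holder. */
  public int readSample(byte[] holder) {
    int written = 0;
    while (!pending.isEmpty()) {
      SampleExtent e = pending.poll();
      System.arraycopy(e.packetBuffer, e.offset, holder, written, e.length);
      written += e.length;
    }
    return written;
  }

  public static void main(String[] args) {
    SampleIndexSketch q = new SampleIndexSketch();
    q.onPayload(new byte[] {0, 0, 1, 2, 3}, 2, 3); // sample bytes 1,2,3
    q.onPayload(new byte[] {9, 4, 5}, 1, 2);       // sample bytes 4,5
    byte[] out = new byte[5];
    System.out.println(q.readSample(out)); // 5 bytes copied, in one pass
  }
}
```

The point of the sketch is that the intermediate Sample copy disappears: only pointer bookkeeping happens during extraction, and the data is touched exactly once.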
@ojw28 Do you have any approximate time when this bug will be fixed?
FWIW it's only "almost unusable" on lower-powered devices running pre-L versions of Android, combined with high-bitrate streams. L does much better due to ART being the default VM. I plan to start looking at it this week. How soon the fix arrives will depend on how difficult it is to fix.
Yep, we have 4.4 devices, and for testing purposes we have now switched from Dalvik to ART; the result is much better. Thanks.
Yep, I tested on a Nexus 5 running Android 5.0 and it works well. Waiting for it to work well on Dalvik too.
I started looking at this today. I've made good progress, and hope to have something ready by the end of the week.
Happy to hear that :) Thanks!
I'm not quite sure how good performance will end up being. It should be noted that HLS, specifically its use of MPEG-TS as a container format, is horrendously inefficient, both for bandwidth and in terms of computational cost for the client. If you have a 1MB keyframe in your stream, it ends up being fragmented into over 5000 pieces in MPEG-TS, which the client then has to merge back together again before feeding the frame to the decoder. You should really consider switching to DASH or SmoothStreaming using the FMP4 container format if you have any control at all over what format you're using, where a 1MB keyframe will be in exactly one piece, and can simply be copied directly into the decoder.
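The "over 5000 pieces" figure follows from the MPEG-TS packet format: packets are 188 bytes, 4 of which are header, leaving at most 184 bytes of payload (less when adaptation fields are present, so this is a lower bound on the fragment count). A quick check:

```java
public class TsFragmentation {
  /** Lower bound on how many 188-byte TS packets a sample of the given size needs. */
  public static int packetsFor(int sampleBytes) {
    int maxPayload = 188 - 4; // 184 payload bytes per packet at best
    return (sampleBytes + maxPayload - 1) / maxPayload; // ceiling division
  }

  public static void main(String[] args) {
    System.out.println(packetsFor(1024 * 1024)); // a 1MB keyframe -> 5699 packets
  }
}
```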
@ojw28 Thanks for the advice; we will check whether it's possible to switch to DASH. The problem is that iOS supports only HLS. We'll wait for your fix and check the performance.
This is the start of a sequence of changes to fix the ref'd GitHub issue.

Currently TsExtractor involves multiple memory copy steps:

DataSource->Ts_BitArray->Pes_BitArray->Sample->SampleHolder

This is inefficient, but more importantly, the copy into Sample is problematic, because Samples are of dynamically varying size. The way we end up expanding Sample objects to be large enough to hold the data being written means that we end up gradually expanding all Sample objects in the pool (which wastes memory), and that we generate a lot of GC churn, particularly when switching to a higher quality, which can trigger all Sample objects to expand.

The fix will be to reduce the copy steps to:

DataSource->TsPacket->SampleHolder

We will track Pes and Sample data with lists of pointers into TsPackets, rather than actually copying the data. We will recycle these pointers.

The following steps are approximately how the refactor will progress:
1. Start reducing use of BitArray. It's going to be way too complicated to track bit-granularity offsets into multiple packets, and allow reading across packet boundaries. In practice reads from Ts packets are all byte aligned except for small sections, so we'll move over to using ParsableByteArray instead, so we only need to track byte offsets.
2. Move TsExtractor to use ParsableByteArray except for small sections where we really need bit-granularity offsets.
3. Do the actual optimization.

Issue: #278
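The byte-offset reader described in step 1 can be sketched as follows. This is an illustrative stand-in for the ParsableByteArray idea, not ExoPlayer's actual class: a single byte position is tracked, and all reads are byte aligned, which is far simpler than carrying bit-granularity offsets across packet boundaries.

```java
public class ByteAlignedReader {
  private final byte[] data;
  private int position;

  public ByteAlignedReader(byte[] data) {
    this.data = data;
  }

  /** Reads one byte, advancing the position. */
  public int readUnsignedByte() {
    return data[position++] & 0xFF;
  }

  /** Reads a big-endian 16-bit field, advancing the position by two bytes. */
  public int readUnsignedShort() {
    return (readUnsignedByte() << 8) | readUnsignedByte();
  }

  public void skipBytes(int count) {
    position += count;
  }

  public int getPosition() {
    return position;
  }

  public static void main(String[] args) {
    // A fake TS packet header: sync byte 0x47 followed by a 16-bit field.
    ByteAlignedReader r = new ByteAlignedReader(new byte[] {0x47, 0x40, 0x11, 0x10});
    System.out.println(r.readUnsignedByte());  // 71 (0x47, the TS sync byte)
    System.out.println(r.readUnsignedShort()); // 16401 (0x4011)
  }
}
```

For the few fields that genuinely need sub-byte granularity (the "small sections" mentioned above), a separate bit-level reader can be used over just those bytes.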
- TsExtractor is now based on ParsableByteArray rather than BitArray. This makes it much clearer that, for the most part, data is byte aligned. It will allow us to optimize TsExtractor without worrying about arbitrary bit offsets.
- BitArray is renamed ParsableBitArray for consistency, and is now exclusively for bit-stream level reading.
- There are some temporary methods in ParsableByteArray that should be cleaned up once the optimizations are in place.

Issue: #278
1. AdtsReader would previously copy all data through an intermediate adtsBuffer. This change eliminates the additional copy step, and instead copies directly into Sample objects.
2. PesReader would previously accumulate a whole packet by copying multiple TS packets into an intermediate buffer. This change eliminates this copy step. After the change, TS packet buffers are propagated directly to PesPayloadReaders, which are required to handle partial payload data correctly.

The copy steps in the extractor are simplified from:

DataSource->Ts_BitArray->Pes_BitArray->Sample->SampleHolder

To:

DataSource->Ts_BitArray->Sample->SampleHolder

Issue: #278
- Remove TsExtractor's knowledge of Sample.
- Push handling of Sample objects into SampleQueue as much as possible. This is a precursor to replacing Sample objects with a different type of backing memory. Ideally, the individual readers shouldn't know how the sample data is stored. This is true after this CL, with the exception of the TODO in H264Reader.
- Avoid double-scanning every H264 sample for NAL units, by moving the scan for SEI units from SeiReader into H264Reader.

Issue: #278
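The NAL-unit scan mentioned in the last bullet works on the H.264 Annex-B framing, where each NAL unit is preceded by a 0x000001 start code and the low 5 bits of the byte after it give the nal_unit_type (6 = SEI). A simplified single-pass sketch, with names of my own choosing rather than ExoPlayer's: one scan yields every NAL boundary, and SEI units can be picked out in the same pass instead of re-scanning the sample.

```java
import java.util.ArrayList;
import java.util.List;

public class NalScanner {
  /** Returns the offset of the byte following each 0x000001 start code. */
  public static List<Integer> findNalUnits(byte[] data) {
    List<Integer> offsets = new ArrayList<>();
    for (int i = 0; i + 2 < data.length; i++) {
      if (data[i] == 0 && data[i + 1] == 0 && data[i + 2] == 1) {
        offsets.add(i + 3);
        i += 2; // start codes cannot overlap, so skip past this one
      }
    }
    return offsets;
  }

  /** True if the NAL unit starting at nalOffset is an SEI unit (type 6). */
  public static boolean isSei(byte[] data, int nalOffset) {
    return (data[nalOffset] & 0x1F) == 6; // low 5 bits are nal_unit_type
  }

  public static void main(String[] args) {
    // One SEI unit followed by one IDR slice (type 5).
    byte[] sample = {0, 0, 1, 0x06, 0x55, 0, 0, 1, 0x65, 0x10};
    List<Integer> nals = findNalUnits(sample);
    System.out.println(nals);                       // [3, 8]
    System.out.println(isSei(sample, nals.get(0))); // true  (SEI)
    System.out.println(isSei(sample, nals.get(1))); // false (slice)
  }
}
```

With SeiReader consuming this scan's results, H264Reader only walks each sample's bytes once, which matters given that the video bitstream dominates the data volume.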
Use of Sample objects was inefficient for several reasons:
- Lots of objects (1 per sample, obviously).
- When switching up bitrates, there was a tendency for all Sample instances to need to expand, which effectively led to our whole media buffer being GC'd as each Sample discarded its byte[] to obtain a larger one.
- When a keyframe was encountered, the Sample would typically need to expand to accommodate it. Over time, this would lead to a gradual increase in the population of Samples that were sized to accommodate keyframes. These Sample instances were then typically underutilized whenever recycled to hold a non-keyframe, leading to inefficient memory usage.

This CL introduces RollingBuffer, which tightly packs pending sample data into byte[]s obtained from an underlying BufferPool, which fixes all of the above. There is still an issue where the total memory allocation may grow when switching up bitrate, but we can easily fix that from this point if we choose to restrict the buffer based on allocation size rather than time.

Issue: #278
Hi. Could you test this again using the latest dev branch? Thanks!
Note: If you were previously using the dev-hls branch, you should now be using dev (dev-hls has been merged to dev).
Tested for a little while, and it seems the issue is fixed!
I think this is the limit of how far we should be pushing complexity vs. efficiency. It's a little complicated to understand, but probably worth it since the H264 bitstream is the majority of the data. Issue: #278
This prevents excessive memory consumption when switching to very high bitrate streams. Issue: #278
Device: Android 4.4. After about 3-4 seconds of testing, logcat prints many lines like the one below; the frequent GC causes the UI to get stuck. (You can add a progress bar on top of the SurfaceView to see it.)
If the link is not valid for you, you can download the stream from Dropbox and test with a local HTTP server.