
TsExtractor bug #278

Closed
crossle opened this issue Feb 4, 2015 · 19 comments

crossle commented Feb 4, 2015

Device: Android 4.4. After about 3~4 seconds of testing, logcat prints many lines like the ones below. GC runs frequently and the UI gets stuck as a result (you can see it if you add a progress bar on top of the SurfaceView).

 D/dalvikvm( 5562): GC_FOR_ALLOC freed 4962K, 19% free 81591K/100428K, paused 74ms, total 74ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 4971K, 19% free 81592K/100428K, paused 73ms, total 73ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 4979K, 19% free 81593K/100428K, paused 72ms, total 72ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 4988K, 19% free 81594K/100428K, paused 58ms, total 58ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 4997K, 19% free 81596K/100428K, paused 56ms, total 56ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5006K, 19% free 81597K/100428K, paused 58ms, total 58ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5015K, 19% free 81598K/100428K, paused 61ms, total 61ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5023K, 19% free 81599K/100428K, paused 67ms, total 67ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5032K, 19% free 81601K/100428K, paused 62ms, total 62ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5041K, 19% free 81602K/100428K, paused 61ms, total 61ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5051K, 19% free 81604K/100428K, paused 62ms, total 62ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5059K, 19% free 81605K/100428K, paused 59ms, total 59ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5068K, 19% free 81606K/100428K, paused 61ms, total 61ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5076K, 19% free 81607K/100428K, paused 63ms, total 63ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5085K, 19% free 81608K/100428K, paused 59ms, total 59ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5094K, 19% free 81609K/100428K, paused 57ms, total 57ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5103K, 19% free 81611K/100428K, paused 58ms, total 58ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5111K, 19% free 81612K/100428K, paused 62ms, total 62ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5120K, 19% free 81613K/100428K, paused 59ms, total 59ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5129K, 19% free 81615K/100428K, paused 59ms, total 59ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5139K, 19% free 81616K/100428K, paused 71ms, total 71ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5147K, 19% free 81617K/100428K, paused 61ms, total 61ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5155K, 19% free 81619K/100428K, paused 58ms, total 58ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5164K, 19% free 81620K/100428K, paused 58ms, total 58ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5173K, 19% free 81621K/100428K, paused 57ms, total 57ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5182K, 19% free 81622K/100428K, paused 57ms, total 57ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5191K, 19% free 81624K/100428K, paused 55ms, total 55ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5200K, 19% free 81625K/100428K, paused 57ms, total 57ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5208K, 19% free 81626K/100428K, paused 59ms, total 59ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5217K, 19% free 81627K/100428K, paused 58ms, total 58ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5226K, 19% free 81629K/100428K, paused 58ms, total 58ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5235K, 19% free 81630K/100428K, paused 56ms, total 56ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5244K, 19% free 81631K/100428K, paused 56ms, total 56ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5252K, 19% free 81632K/100428K, paused 62ms, total 62ms
D/dalvikvm( 5562): GC_FOR_ALLOC freed 5261K, 19% free 81634K/100428K, paused 72ms, total 72ms

If the link isn't valid for you, you can download the stream from Dropbox and test it with a local HTTP server.

@mine260309 (Contributor)

It is probably caused by TsExtractor reading the HTTP data very quickly and holding the samples in memory.
You can control this behavior by modifying the SampleSource that uses TsExtractor so that it keeps the number of available samples under a threshold, which limits the memory usage of TsExtractor.
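A minimal sketch of that idea, assuming a hypothetical wrapper class and threshold (neither is part of the ExoPlayer API): stop pulling TS data once the number of parsed-but-unread samples crosses a cap, and free samples as the renderer consumes them.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical throttling sketch: bound how far the extractor is allowed to
// read ahead of the renderer so parsed samples cannot pile up in memory.
final class ThrottledSampleBuffer {
  private static final int MAX_BUFFERED_SAMPLES = 256; // assumed threshold

  private final Deque<byte[]> bufferedSamples = new ArrayDeque<>();

  /** Returns true if it is safe to read and parse more TS data. */
  synchronized boolean canReadMore() {
    return bufferedSamples.size() < MAX_BUFFERED_SAMPLES;
  }

  synchronized void onSampleParsed(byte[] sampleData) {
    bufferedSamples.addLast(sampleData);
  }

  /** Called by the renderer; removing the sample makes it eligible for GC. */
  synchronized byte[] pollSample() {
    return bufferedSamples.pollFirst();
  }
}
```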


crossle commented Feb 4, 2015

@mine260309 It reads the HTTP data very fast? Then why do other TS streams play normally?

@mine260309 (Contributor)

Depending on the TS stream (and codec info), TsExtractor may discard samples or hold them until they are rendered.
In my experience, if an HLS chunk is large (e.g. 100 MiB), TsExtractor will read samples as fast as HTTP downloads them, and thus consume too much memory. But in my case there were no such GC logs.

Your case could be different, but you can analyze the heap usage to find which objects consume too much memory.
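For example, one way to capture a heap dump programmatically while playback is stuttering, for inspection in MAT or Android Studio (the output path is only an example):

```java
import android.os.Debug;
import java.io.IOException;

// Dump an HPROF snapshot during playback, then open it in MAT / Android Studio
// to see which objects (e.g. Sample byte arrays) dominate the heap.
final class HeapDumper {
  static void dumpHeap() {
    try {
      Debug.dumpHprofData("/sdcard/exoplayer-heap.hprof"); // example path
    } catch (IOException e) {
      // Best-effort diagnostics only; ignore failures.
    }
  }
}
```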


nahojkap commented Feb 4, 2015

I would agree with @mine260309 that it is most likely caused by TsExtractor reading too many samples "ahead" in the TS segment. This occurs especially at high bitrates.

There is also an issue where there are too many Samples being pooled (in the SamplePool) - this is marked as "We need to fix this" in the sources.


crossle commented Feb 5, 2015

@JohanLindquist Agreed. I traced the GC and found that allocations in Sample.expand are what keep triggering GC.


Guro commented Feb 6, 2015

Is there any easy fix for this?
I have the same problem.


nahojkap commented Feb 9, 2015

Also, TsExtractor$Sample uses a lot of byte arrays which, if pooled correctly, could reduce GC.

This is especially noticeable when the bitrate (and thus the samples) are larger - the samples are all expanded at bitrate-switch time.

Not sure what the ideal solution here is, as pooling too many byte arrays would have a big impact on memory footprint. But considering they are being used at quite a fast rate, maybe that is OK?
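A minimal sketch of the kind of byte[] pooling being suggested here (the class name and sizes are illustrative, not from the ExoPlayer sources):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical fixed-size byte[] pool: reusing equally sized buffers avoids the
// allocate/expand/discard churn that shows up as GC_FOR_ALLOC in logcat.
final class ByteArrayPool {
  private static final int BUFFER_SIZE = 64 * 1024; // assumed buffer size
  private static final int MAX_POOLED = 32;         // assumed pool cap

  private final Deque<byte[]> pool = new ArrayDeque<>();

  synchronized byte[] obtain() {
    byte[] buffer = pool.pollFirst();
    return buffer != null ? buffer : new byte[BUFFER_SIZE];
  }

  synchronized void recycle(byte[] buffer) {
    if (buffer.length == BUFFER_SIZE && pool.size() < MAX_POOLED) {
      pool.addLast(buffer); // keep for reuse instead of letting it be GC'd
    }
  }
}
```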


ojw28 commented Feb 9, 2015

Sample objects are already pooled. The ideal solution would probably be to keep data in TsPacket sized buffers right up until the samples are actually read from the extractor. Rather than copying sample data out of TsPacket buffers into intermediate Sample objects, we'd instead only parse out the information we need, such as where the sample data boundaries are within these buffers. We'd then copy directly from TsPacket buffers into SampleHolders on demand.

This is a pretty big change, and may be pretty complicated to pull off. But it would solve the issue.
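A rough sketch of that idea (all names below are illustrative, not the eventual implementation): retain the raw TS packet buffers, record only where each sample's bytes live inside them, and copy once, on demand, into the decoder-facing buffer.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a sample is described by (buffer, offset, length) pointers into
// retained TS packet buffers instead of being copied into a Sample object.
final class SampleRegion {
  final byte[] tsPacketBuffer; // retained packet data, not copied
  final int offset;
  final int length;

  SampleRegion(byte[] tsPacketBuffer, int offset, int length) {
    this.tsPacketBuffer = tsPacketBuffer;
    this.offset = offset;
    this.length = length;
  }
}

final class PendingSample {
  final List<SampleRegion> regions = new ArrayList<>();

  /** Copies the sample into the decoder-facing buffer only when it is read. */
  int copyTo(byte[] target, int targetOffset) {
    int written = 0;
    for (SampleRegion region : regions) {
      System.arraycopy(region.tsPacketBuffer, region.offset,
          target, targetOffset + written, region.length);
      written += region.length;
    }
    return written;
  }
}
```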


Guro commented Feb 9, 2015

@ojw28
Because of this bug, ExoPlayer for HLS is currently almost unusable, especially when the bitrate is high.

Do you have any approximate timeline for when this bug will be fixed, or should we try fixing it ourselves?


ojw28 commented Feb 9, 2015

FWIW it's only "almost unusable" on lower-powered devices running pre-L versions of Android, combined with high-bitrate streams. L does much better because ART is the default VM. I plan to start looking at it this week. How soon the fix arrives will depend on how difficult it is to fix.


Guro commented Feb 9, 2015

Yep, we have 4.4 devices, and for testing purposes we have now switched from Dalvik to ART; the result is much better.
But as you know there are lots of devices still on Dalvik, so please give us an update once you have looked at the issue and know an approximate time for the fix.

Thanks.


crossle commented Feb 10, 2015

Yep, I tested on a Nexus 5 running Android 5.0 and it works well. Waiting for it to work well on Dalvik too.

@ojw28 ojw28 self-assigned this Feb 10, 2015

ojw28 commented Feb 10, 2015

I started looking at this today. I've made good progress, and hope to have something ready by the end of the week.


Guro commented Feb 10, 2015

Happy to hear that :) thanks


ojw28 commented Feb 10, 2015

I'm not quite sure how good performance will end up being. It should be noted that HLS, specifically its use of MPEG-TS as a container format, is horrendously inefficient, both for bandwidth and in terms of computational cost for the client. If you have a 1MB keyframe in your stream, it ends up being fragmented into over 5000 pieces in MPEG-TS, which the client then has to merge back together again before feeding the frame to the decoder.

You should really consider switching to DASH or SmoothStreaming using the FMP4 container format if you have any control at all over what format you're using, where a 1MB keyframe will be in exactly one piece, and can simply be copied directly into the decoder.
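For rough context on the "over 5000 pieces" figure, assuming standard 188-byte TS packets with a 4-byte header (at most 184 payload bytes per packet, fewer whenever an adaptation field is present):

```java
public final class TsFragmentationEstimate {
  public static void main(String[] args) {
    int keyframeBytes = 1024 * 1024;        // 1 MiB keyframe
    int maxPayloadPerPacket = 188 - 4;      // 184 payload bytes per TS packet
    int packets = (keyframeBytes + maxPayloadPerPacket - 1) / maxPayloadPerPacket;
    System.out.println(packets);            // 5699 -> well over 5000 packets
  }
}
```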


Guro commented Feb 11, 2015

@ojw28 Thanks for the advice. We will check whether it's possible to switch to DASH; the problem is that iOS only supports HLS.

We'll wait for your fix and check the performance.

ojw28 added a commit that referenced this issue Feb 13, 2015
This is the start of a sequence of changes to fix the ref'd
github issue. Currently TsExtractor involves multiple memory
copy steps:

DataSource->Ts_BitArray->Pes_BitArray->Sample->SampleHolder

This is inefficient, but more importantly, the copy into
Sample is problematic, because Samples are of dynamically
varying size. The way we end up expanding Sample objects to
be large enough to hold the data being written means that we
end up gradually expanding all Sample objects in the pool
(which wastes memory), and that we generate a lot of GC churn,
particularly when switching to a higher quality which can
trigger all Sample objects to expand.

The fix will be to reduce the copy steps to:

DataSource->TsPacket->SampleHolder

We will track Pes and Sample data with lists of pointers into
TsPackets, rather than actually copying the data. We will
recycle these pointers.

The following steps are approximately how the refactor will
progress:

1. Start reducing use of BitArray. It's going to be way too
complicated to track bit-granularity offsets into multiple packets,
and allow reading across packet boundaries. In practice reads
from Ts packets are all byte aligned except for small sections,
so we'll move over to using ParsableByteArray instead, so we
only need to track byte offsets.

2. Move TsExtractor to use ParsableByteArray except for small
sections where we really need bit-granularity offsets.

3. Do the actual optimization.

Issue: #278
ojw28 added a commit that referenced this issue Feb 13, 2015
- TsExtractor is now based on ParsableByteArray rather than BitArray.
  This makes it much clearer that, for the most part, data is byte
  aligned. It will allow us to optimize TsExtractor without worrying
  about arbitrary bit offsets.
- BitArray is renamed ParsableBitArray for consistency, and is now
  exclusively for bit-stream level reading.
- There are some temporary methods in ParsableByteArray that should be
  cleared up once the optimizations are in place.

Issue: #278
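To illustrate the distinction with a self-contained toy (not the actual ExoPlayer classes): a byte-aligned reader needs only a single integer position, whereas a bit-granularity reader must also carry a bit offset and handle reads that straddle byte boundaries.

```java
// Toy illustration of byte-aligned vs bit-granularity parsing state.
final class ByteReader {
  private final byte[] data;
  private int position; // a single byte offset is all we need to track

  ByteReader(byte[] data) { this.data = data; }

  int readUnsignedByte() {
    return data[position++] & 0xFF;
  }
}

final class BitReader {
  private final byte[] data;
  private int byteOffset;
  private int bitOffset; // extra state; reads may straddle byte boundaries

  BitReader(byte[] data) { this.data = data; }

  int readBits(int numBits) {
    int result = 0;
    for (int i = 0; i < numBits; i++) {
      int bit = (data[byteOffset] >> (7 - bitOffset)) & 1;
      result = (result << 1) | bit;
      if (++bitOffset == 8) {
        bitOffset = 0;
        byteOffset++;
      }
    }
    return result;
  }
}
```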
ojw28 added a commit that referenced this issue Feb 13, 2015
1. AdtsReader would previously copy all data through an intermediate
adtsBuffer. This change eliminates the additional copy step, and
instead copies directly into Sample objects.

2. PesReader would previously accumulate a whole packet by copying
multiple TS packets into an intermediate buffer. This change
eliminates this copy step. After the change, TS packet buffers
are propagated directly to PesPayloadReaders, which are required
to handle partial payload data correctly. The copy steps in the
extractor are simplified from:

DataSource->Ts_BitArray->Pes_BitArray->Sample->SampleHolder

To:

DataSource->Ts_BitArray->Sample->SampleHolder

Issue: #278
ojw28 added a commit that referenced this issue Feb 13, 2015
- Remove TsExtractor's knowledge of Sample.
- Push handling of Sample objects into SampleQueue as much
  as possible. This is a precursor to replacing Sample objects
  with a different type of backing memory. Ideally, the
  individual readers shouldn't know how the sample data is
  stored. This is true after this CL, with the exception of the
  TODO in H264Reader.
- Avoid double-scanning every H264 sample for NAL units, by
  moving the scan for SEI units from SeiReader into H264Reader.

Issue: #278
ojw28 added a commit that referenced this issue Feb 13, 2015
Use of Sample objects was inefficient for several reasons:

- Lots of objects (1 per sample, obviously).
- When switching up bitrates, there was a tendency for all Sample
  instances to need to expand, which effectively led to our whole
  media buffer being GC'd as each Sample discarded its byte[] to
  obtain a larger one.
- When a keyframe was encountered, the Sample would typically need
  to expand to accommodate it. Over time, this would lead to a
  gradual increase in the population of Samples that were sized to
  accommodate keyframes. These Sample instances were then typically
  underutilized whenever recycled to hold a non-keyframe, leading
  to inefficient memory usage.

This CL introduces RollingBuffer, which tightly packs pending sample
data into byte[]s obtained from an underlying BufferPool, which
fixes all of the above. There is still an issue where the total
memory allocation may grow when switching up bitrate, but we can
easily fix that from this point, if we choose to restrict the buffer
based on allocation size rather than time.

Issue: #278
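Conceptually, the rolling-buffer approach appends sample bytes end-to-end into fixed-size blocks obtained from a pool; a much-simplified sketch (block size and names are illustrative, not the actual implementation):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified rolling buffer: sample data is packed tightly into fixed-size
// byte[] blocks from a pool, and blocks are recycled once fully consumed.
final class RollingBufferSketch {
  private static final int BLOCK_SIZE = 64 * 1024; // illustrative block size

  private final Deque<byte[]> blocks = new ArrayDeque<>();
  private final Deque<byte[]> pool = new ArrayDeque<>();
  private int writeOffsetInLastBlock = BLOCK_SIZE;  // forces first allocation

  void append(byte[] data, int offset, int length) {
    while (length > 0) {
      if (writeOffsetInLastBlock == BLOCK_SIZE) {
        byte[] block = pool.pollFirst();
        blocks.addLast(block != null ? block : new byte[BLOCK_SIZE]);
        writeOffsetInLastBlock = 0;
      }
      int toCopy = Math.min(length, BLOCK_SIZE - writeOffsetInLastBlock);
      System.arraycopy(data, offset, blocks.peekLast(), writeOffsetInLastBlock, toCopy);
      writeOffsetInLastBlock += toCopy;
      offset += toCopy;
      length -= toCopy;
    }
  }

  /** Recycles the oldest block once the reader has fully consumed it. */
  void releaseOldestBlock() {
    boolean lastBlockIsFull = writeOffsetInLastBlock == BLOCK_SIZE;
    if (blocks.size() > 1 || (blocks.size() == 1 && lastBlockIsFull)) {
      pool.addLast(blocks.pollFirst()); // reused later, so no GC churn
    }
  }
}
```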

ojw28 commented Feb 13, 2015

Hi. Could you test this again using the latest dev branch? Thanks!


ojw28 commented Feb 13, 2015

Note: If you were previously using the dev-hls branch, you should now be using dev (dev-hls has been merged to dev).


Guro commented Feb 13, 2015

Tested it for a little bit, and it seems the issue is fixed!
Big thanks!

ojw28 added a commit that referenced this issue Feb 13, 2015
I think this is the limit of how far we should be pushing complexity
vs. efficiency. It's a little complicated to understand, but probably
worth it since the H264 bitstream is the majority of the data.

Issue: #278
ojw28 added a commit that referenced this issue Feb 13, 2015
This prevents excessive memory consumption when switching to
very high bitrate streams.

Issue: #278
@ojw28 ojw28 closed this as completed Feb 13, 2015
@google google locked and limited conversation to collaborators Jun 28, 2017