Support seamless audio adaptation even if sample presentation timestamps aren't aligned #3971

Closed
ronak2121 opened this issue Mar 9, 2018 · 27 comments

@ronak2121

commented Mar 9, 2018

Hi,

I’m using ExoPlayer 2.5.3 and I’m trying to test how ExoPlayer’s adaptive streaming support works with HLS.

My adaptive manifest has a 44.1 kHz/32 kbps HE-AAC stream, a 44.1 kHz/64 kbps HE-AAC stream, and a 44.1 kHz/128 kbps LC-AAC stream (yes, it’s audio only). I’ve noticed that the segments line up perfectly for the two HE-AAC streams, but the LC-AAC segments are slightly off, by 0.1 second each time.

When I tested this stream in ExoPlayer, it reported that adaptation would be enabled but not seamless.

I was wondering why that is, why ExoPlayer can’t adapt seamlessly (iOS can) and, most importantly, what kind of audio artifacts we can expect to hear when ExoPlayer switches streams in this situation.

Thanks

Ronak

@ojw28

Contributor

commented Mar 9, 2018

Are the sample timestamps actually misaligned between the segments in the two variants, or is it just that the segment boundaries aren't aligned exactly?

If the first of these, I think that's a content issue. If the second of these then I'd expect adaptation to be pretty seamless. Please attach a sample we can test with if so.

@ojw28 ojw28 added the need more info label Mar 9, 2018

@ronak2121

Author

commented Mar 9, 2018

Here's the URL: https://d206c4y6cx10lo.cloudfront.net/adaptive_he_aac.m3u8

The source audio is exactly the same in all cases, just encoded at different qualities/codecs using ffmpeg.

@ojw28 ojw28 changed the title Not seamless Adaptive Streaming HLS audio - Non-seamless adaptive switching Mar 9, 2018

@ojw28

Contributor

commented Mar 9, 2018

The streams aren't all 44100Hz. It looks to me like the lowest two qualities are 22050Hz and only the highest quality stream is 44100Hz. The non-seamless adaptations are adaptations to/from the highest quality stream, since this involves a change in sample rate that we're not able to handle in a completely seamless way. If all streams had the same sample rate, I think we'd adapt seamlessly.

For DASH/SS where the sample rate of audio streams is properly specified in the manifest, we avoid adaptive switching between tracks whose sample rates are different. In the HLS case it looks like there's no way to specify in the master playlist the sample rate of each variant. Although it may be possible to determine this from the CODECS tag?
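The idea of only adapting between tracks that share a sample rate can be sketched roughly as follows. This is a minimal illustration, not ExoPlayer's implementation: `AudioTrack` and `group_for_seamless_adaptation` are hypothetical names, and the sample rates are the ones reported in the comment above.

```python
from collections import defaultdict
from typing import NamedTuple


class AudioTrack(NamedTuple):
    # Hypothetical description of one audio variant; not an ExoPlayer type.
    name: str
    sample_rate_hz: int
    channel_count: int
    bitrate_bps: int


def group_for_seamless_adaptation(tracks):
    """Group tracks so that switching inside a group never changes the
    sample rate or channel count."""
    groups = defaultdict(list)
    for track in tracks:
        groups[(track.sample_rate_hz, track.channel_count)].append(track)
    # Sort each group by bitrate so a selector can step up or down within it.
    return {key: sorted(group, key=lambda t: t.bitrate_bps)
            for key, group in groups.items()}


tracks = [
    AudioTrack("he-aac-32", 22050, 2, 32_000),
    AudioTrack("he-aac-64", 22050, 2, 64_000),
    AudioTrack("lc-aac-128", 44100, 2, 128_000),
]
groups = group_for_seamless_adaptation(tracks)
for key, group in groups.items():
    print(key, [t.name for t in group])
```

With these inputs the two lower-bitrate variants land in one group and the 44100Hz variant in another, so a selector built this way would never switch across the sample-rate change.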

@ojw28

Contributor

commented Mar 9, 2018

--> @AquilesCanta to see if there's any way of figuring out the sample rate of variants from the master playlist (we could suggest an enhancement to the HLS spec to allow this, if not?).

@ronak2121

Author

commented Mar 9, 2018

I would beg to differ on that analysis. All streams are 44100Hz if I check the individual MP4s using ffprobe:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'long_output_64_heaac.m4s':
  Metadata:
    major_brand     : iso5
    minor_version   : 512
    compatible_brands: iso6mp41
    encoder         : Lavf57.82.101
  Duration: 01:26:13.82, start: 0.000000, bitrate: 64 kb/s
    Stream #0:0(und): Audio: aac (HE-AACv2) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 62 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'long_output_32_heaac.m4s':
  Metadata:
    major_brand     : iso5
    minor_version   : 512
    compatible_brands: iso6mp41
    encoder         : Lavf57.82.101
  Duration: 01:26:13.82, start: 0.000000, bitrate: 32 kb/s
    Stream #0:0(und): Audio: aac (HE-AACv2) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 30 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

Seems to me like Exoplayer fails to detect the HE-AACv2 profile?

@ojw28

Contributor

commented Mar 9, 2018

Hmm, you're right that there's more going on here. For the streams we think are 22050Hz, it seems we're parsing out two different sample rates from the container. In AtomParsers.parseAudioSampleEntry we parse out a value of 44100, but in AtomParsers.parseAacAudioSpecificConfig we parse out a value of 22050.

It's possible that the content is inconsistent with itself, and that what sample rate a tool ends up deciding the content is depends on which part of the container it chooses to believe. It's also possible that we have a bug in one of the methods mentioned above. We'll need to take a closer look to determine which of these is the case.
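One way two different sample rates can come out of the same file, offered here as an assumption that this thread does not confirm: for HE-AAC, the AudioSpecificConfig may carry the core AAC sample rate, with the SBR output rate either signalled in a separate extension field or not signalled at all. The sketch below is a simplified reader for the leading fields of an AudioSpecificConfig as laid out in ISO/IEC 14496-3. The function name is hypothetical and the example byte strings are illustrative, not taken from the files in this issue.

```python
# Sampling-frequency table from ISO/IEC 14496-3 (indices 0-12).
SAMPLE_RATES_HZ = [96000, 88200, 64000, 48000, 44100, 32000, 24000,
                   22050, 16000, 12000, 11025, 8000, 7350]

AOT_SBR = 5   # explicit SBR (HE-AAC) signalling
AOT_PS = 29   # explicit parametric stereo (HE-AACv2) signalling


def parse_audio_specific_config(data: bytes) -> dict:
    """Read the leading fields of an AudioSpecificConfig (hypothetical helper)."""
    bits = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    pos = 0

    def read(n: int) -> int:
        # Read the next n bits, most significant bit first.
        nonlocal pos
        if pos + n > total_bits:
            raise ValueError("config is shorter than expected")
        value = (bits >> (total_bits - pos - n)) & ((1 << n) - 1)
        pos += n
        return value

    def read_sample_rate() -> int:
        index = read(4)
        if index == 15:  # escape value: an explicit 24-bit rate follows
            return read(24)
        if index >= len(SAMPLE_RATES_HZ):
            raise ValueError(f"reserved sampling frequency index {index}")
        return SAMPLE_RATES_HZ[index]

    object_type = read(5)
    if object_type == 31:  # escape for object types >= 32
        object_type = 32 + read(6)
    core_rate = read_sample_rate()
    channel_config = read(4)

    # With explicit signalling, the first rate is the core AAC rate and a
    # second field carries the SBR output rate.
    output_rate = core_rate
    if object_type in (AOT_SBR, AOT_PS):
        output_rate = read_sample_rate()

    return {
        "object_type": object_type,
        "core_sample_rate_hz": core_rate,
        "output_sample_rate_hz": output_rate,
        "channel_config": channel_config,
    }


lc_aac = parse_audio_specific_config(bytes.fromhex("1210"))
implicit_he_aac = parse_audio_specific_config(bytes.fromhex("1390"))
explicit_he_aac = parse_audio_specific_config(bytes.fromhex("2b920800"))
print(lc_aac, implicit_he_aac, explicit_he_aac, sep="\n")
```

In the second example a reader that trusts only this field would report 22050Hz, even if the mp4a sample entry in the same file says 44100Hz. Whether that is what happens with the content in this issue would still need to be checked against the actual bytes.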

@ojw28 ojw28 assigned andrewlewis and unassigned AquilesCanta Mar 9, 2018

@ojw28 ojw28 removed the need more info label Mar 9, 2018

@ronak2121

Author

commented Mar 10, 2018

Sounds good. I’ll also have to bring this up to the ffmpeg group to see what’s going on.

I tried to check this very thing on iOS: AVURLAsset shows 22 kHz, but AVAudioFile shows 44100 Hz.

@andrewlewis andrewlewis assigned ojw28 and unassigned andrewlewis Mar 12, 2018

@ojw28

Contributor

commented Mar 12, 2018

I think all the streams are 44100Hz. There's possibly a bug in whatever you're using to package the content that's causing part of the container to have the sample rate set incorrectly.

I don't think that's the root cause of the issue though. Further down ExoPlayer's pipeline it is seeing all three streams as 44100Hz. Things to look at would be:

  1. Whether the sample presentation timestamps in the files are exactly aligned (e.g. same part of audio given exactly the same sample presentation timestamp).
  2. It looks like samples have different durations in one of the streams than in the other two (~23ms vs ~46ms). This might be a problem, since it means we can't switch seamlessly at all sample boundaries. At some points we'd have to decode a sample and then discard half of it after the decoder to switch completely seamlessly. Is it possible to make it so that all three streams contain the same number of samples, with identical sample presentation timestamps?
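The second point can be illustrated with a small sketch. It assumes 1024-sample and 2048-sample frames at 44100Hz (which matches the ~23ms and ~46ms figures) and that both streams start at presentation timestamp 0; the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100


def frame_duration_ms(samples_per_frame, sample_rate_hz=SAMPLE_RATE_HZ):
    return 1000.0 * samples_per_frame / sample_rate_hz


def frame_boundaries(samples_per_frame, frame_count):
    """Presentation timestamps (in PCM samples) at which each frame starts."""
    return {i * samples_per_frame for i in range(frame_count)}


short_frames = frame_boundaries(1024, 8)  # ~23 ms frames
long_frames = frame_boundaries(2048, 4)   # ~46 ms frames

shared = sorted(short_frames & long_frames)
only_short = sorted(short_frames - long_frames)

print(f"{frame_duration_ms(1024):.1f} ms vs {frame_duration_ms(2048):.1f} ms")
print("shared boundaries:", shared)
print("boundaries only in the short-frame stream:", only_short)
```

Under these assumptions only every second boundary of the short-frame stream coincides with a boundary in the long-frame stream, so a switch at one of the other boundaries would land in the middle of a long frame.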
@ronak2121

Author

commented Mar 13, 2018

Regarding number 2, are you sure your code is looking at the entire HE-AAC profile, with the SBR information? If that's ignored, then you would see the audio as having half the sampling rate (and thus half the frequency response) and twice the sample duration.

Regarding the content packaging, I'm using the latest release of ffmpeg (3.4.2) and the Fraunhofer codec libraries there.

I tried to peek into the MP4 atoms but I could not find any atom that listed the sampling rate as 22050. Do you know exactly which atom you're looking at?

@ojw28

Contributor

commented Mar 13, 2018

> Regarding number 2, are you sure your code is looking at the entire HE-AAC profile, with the SBR information? If that's ignored, then you would see the audio as having half the sampling rate (and thus half the frequency response) and half the sample duration.

I'm just looking at the sample timestamps in the FMP4 container. These are independent of whatever the codec is.

@ronak2121 ronak2121 changed the title HLS audio - Non-seamless adaptive switching HLS audio w/mixed audio codecs LC/HE-AAC - Non-seamless adaptive switching Mar 13, 2018

@ronak2121

Author

commented Mar 21, 2018

Do you mean the timestamps in the sidx? They are going to be different because HE-AAC has a different compression scheme. This is what I meant when I said the segments do not line up exactly between the two codecs.

@ojw28

Contributor

commented Mar 25, 2018

I was looking at the actual sample presentation timestamps, which I think are stored somewhere under the moof rather than in the sidx. That aside, I think we know what the problem is, which is that ExoPlayer assumes it can switch variant at a sample boundary, but this is not true here because the sample boundaries are not aligned between variants. To fix this we'd probably need to trim a sample after decoding to remove the overlap. Marking this as an enhancement, but it'll likely be considered low priority since we've not seen any other reports where content has been prepared in this way.
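The trimming described above could look roughly like this. It is a sketch under simplifying assumptions: both variants use the same timescale and start at presentation timestamp 0, and encoder delay/priming is ignored. `plan_switch` is a hypothetical helper, not an ExoPlayer method.

```python
def plan_switch(switch_pts, new_frame_len):
    """Return (frame_start_pts, samples_to_trim) for the first frame to
    decode from the new variant.

    switch_pts is the presentation timestamp, in PCM samples, at which the
    old variant's output ends.
    """
    frame_index = switch_pts // new_frame_len
    frame_start = frame_index * new_frame_len
    # Decoded samples before switch_pts overlap audio that was already
    # played from the old variant, so they are dropped after decoding.
    return frame_start, switch_pts - frame_start


# The old variant uses 1024-sample frames and stops after 3 frames (PTS 3072).
# The new variant uses 2048-sample frames, so PTS 3072 falls mid-frame.
mid_frame = plan_switch(3072, 2048)
aligned = plan_switch(4096, 2048)
print(mid_frame, aligned)
```

Here the mid-frame switch needs half of the first decoded frame discarded, while a switch at an aligned boundary needs no trimming.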

As a side note, it doesn't seem optimal to have segments with slightly different durations in the different representations, as is the case here. If a player is playing the first segment of the variant whose first segment is 10.0078s and wants to adaptively switch to the variant whose first segment is 10.03102s long, then it needs to download the first segment of the new variant just to play the final 0.02s of it. Alternatively it needs to download the second segment of the old variant just to play the first 0.02s of it. Switching cost is significantly reduced if segment boundaries are properly aligned and if the EXT-X-INDEPENDENT-SEGMENTS tag is used. Doing this should also avoid the glitch that's being tracked here.
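The two segment durations quoted above are consistent with whole numbers of AAC frames at 44100Hz. The frame counts below (431 frames of 1024 samples, 216 frames of 2048 samples) are inferred from the arithmetic rather than read from the files, so treat them as an assumption.

```python
SAMPLE_RATE_HZ = 44_100


def segment_duration_s(frame_count, samples_per_frame,
                       sample_rate_hz=SAMPLE_RATE_HZ):
    return frame_count * samples_per_frame / sample_rate_hz


shorter = segment_duration_s(431, 1024)  # 1024-sample frames
longer = segment_duration_s(216, 2048)   # 2048-sample frames
difference = longer - shorter

print(f"{shorter:.5f} s, {longer:.5f} s, "
      f"difference {difference * 1000:.1f} ms "
      f"({round(difference * SAMPLE_RATE_HZ)} samples)")
```

Under that assumption the ~0.02s mismatch is exactly one 1024-sample frame, which is the portion a player would have to fetch from a neighbouring segment when switching.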

@ojw28 ojw28 added the enhancement label Mar 25, 2018

@ojw28 ojw28 changed the title HLS audio w/mixed audio codecs LC/HE-AAC - Non-seamless adaptive switching Support seamless audio adaptation even if sample presentation timestamps aren't aligned Mar 25, 2018

@ronak2121

Author

commented Mar 26, 2018

I see. Have you seen content prepared that mixes audio codecs? That's essentially what we're doing here. We would like to use HE-AAC where possible at the lower end of the spectrum, and LC-AAC for the higher ones.

@ojw28

Contributor

commented Apr 2, 2018

I'm not sure, to be honest. All I can say is we've never done anything "special" to handle that use case, so either it "just works" for mixes being used for other content, or other content does not mix.

@ronak2121

Author

commented May 30, 2018

I realized the problem was in ffmpeg. Audio should always be fragmented on exact frame boundaries; otherwise ffmpeg produces this weird behavior.

I’ll put up a test stream and verify that we can switch codecs between HE-AAC and LC-AAC to be sure before closing this.

@ojw28

Contributor

commented Jun 6, 2018

Thanks for the update! Closing for now, but if you discover there's still an issue please respond here and we'll re-open it.

@ojw28 ojw28 closed this Jun 6, 2018

@ronak2121

Author

commented Jun 11, 2018

So I tested this again and we’re still seeing non-seamless adaptive streaming. The fragment sizes are all the same, at exactly 42 AAC frames regardless of codec.

The sample rates and channel counts are also the same.

I’ll dig into the moof header in more detail soon.

@ronak2121

Author

commented Jun 21, 2018

So I debugged this some more.

In the HE-AACv2 fMP4 files, I'm seeing a 44100 sampling rate inside the moov atom (moov -> trak -> mdia -> mdhd atom & moov -> trak -> mdia -> minf -> stbl -> stsd -> mp4a atom), and 44100 inside of the timescale in the sidx atoms.

I see the defaultSampleDurations in the two moof atoms are 2048 for HE-AACv2 and 1024 for LC-AAC. I'll find out more about why this is from the ffmpeg mailing list.

In the meantime, can you please help figure out why Exoplayer doesn't report the true sampling rate? Which atom are you looking at that gave you 22050?

@ojw28

Contributor

commented Jun 22, 2018

Your question is already answered in #3971 (comment), isn't it?

@ronak2121

Author

commented Jun 22, 2018

I’d have to pull up the debugger and see which atom. Was hoping you already knew.

@ronak2121

Author

commented Jun 29, 2018

Hi,

So I made a new stream that now ensures the sample rates are always 44100, with the proper HE-AAC codec information and the proper defaultSampleDuration in the moof boxes.

However, Exoplayer still shows YES_NOT_SEAMLESS. Would you be able to help figure out why?

https://d1v9in513d8d86.cloudfront.net/adaptive-tests/master.m3u8

Ronak

@ojw28

Contributor

commented Jul 2, 2018

That doesn't really mean anything. The audio renderer always reports that (we should probably change it).

@ronak2121

Author

commented Jul 2, 2018

Can you please change it? If you are sure adaptation would be seamless now, can we make this ticket the one that fixes the sampling rate and codec reporting as well as the adaptive mode?

@ronak2121

Author

commented Jul 12, 2018

Should I open a new issue for this? Please let me know.

@ojw28

Contributor

commented Jul 13, 2018

Looking at this again, I think returning ADAPTIVE_NOT_SEAMLESS is correct. RendererCapabilities.supportsFormat is documented to say that the adaptive part of the returned value indicates:

> The level of support for adapting from the format to another format of the same mime type

Whether an audio adaptation will be completely seamless or not doesn't depend just on the mime type, but also on whether the sample rate or channel count changes. If the sample rate or channel count changes, adaptation will not be completely seamless. So returning ADAPTIVE_NOT_SEAMLESS is the right thing to do.

It's unclear why this is causing an issue for you. DefaultTrackSelector treats ADAPTIVE_NOT_SEAMLESS in the same way as ADAPTIVE_SEAMLESS by default, and there's no particular reason for you to deviate from that behavior.

@ronak2121

Author

commented Jul 13, 2018

Hey thanks for the reply.

Can you test to see why adaptation is still not seamless for this particular stream? It should be, since I can confirm that the defaultSampleDuration and the codec information are being read properly.

@ojw28

Contributor

commented Jul 13, 2018

We don't really have any more time to spend on this, given it's (so far) specific to you and your content, and also quite difficult to debug. I'm not aware of anyone else who has this issue.

@google google locked and limited conversation to collaborators Nov 23, 2018
