-
Notifications
You must be signed in to change notification settings - Fork 64
Fix audio decoding bug with FFmpeg4 by setting channel_layout #865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
srcNumChannels, | ||
". If you are hitting this, it may be because you are using " | ||
"a buggy FFmpeg version. FFmpeg4 is known to fail here in some " | ||
"valid scenarios. Try to upgrade FFmpeg?"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should still have this check, but perhaps relax it if getNumChannels(srcAVFrame)
returns 0 while getNumChannels(streamInfo.codecContext)
returns something < 1 (which is what is happening in #843
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added back the check, and excluded the case where getNumChannels(srcAVFrame)
returns 0 while getNumChannels(streamInfo.codecContext)
returns a valid fallback.
// to allow successful initialization of SwrContext | ||
if (numChannels == 0 && avFrame->channels > 0) { | ||
avFrame->channel_layout = av_get_default_channel_layout(avFrame->channels); | ||
return avFrame->channels; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just do this instead, so we can have a single return
statement
return avFrame->channels; | |
numChannels = avFrame->channels; |
int srcNumChannelsFromCodec = getNumChannels(streamInfo.codecContext); | ||
// Use number of channels from codec if 0 returned from frame | ||
if (srcNumChannels == 0 && srcNumChannelsFromCodec > 0) { | ||
srcNumChannels = srcNumChannelsFromCodec; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we still need this if
branch, now that we are returning the hopefully-correct avFrame->channels
instead of the incorrect numChannels
in getNumChannels()
.
Can you double check if that's still needed? Perhaps we can leave the entire TORCH_CHECK()
the way it was!
test/test_decoders.py
Outdated
test_frames = decoder.get_samples_played_in_range() | ||
assert test_frames.data.shape[0] == decoder.metadata.num_channels | ||
assert test_frames.sample_rate == decoder.metadata.sample_rate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the term "samples" instead of frame for the stuff that comes out of the decoder. For the asset, the use of
reference_frames = asset.get_frame_data_by_range
is still correct since this is really a frame-based API.
test/test_decoders.py
Outdated
start=0, stop=1, stream_index=0 | ||
) | ||
torch.testing.assert_close( | ||
test_frames.data[0], reference_frames, atol=0, rtol=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised by the use of [0]
here - can you help me understand why that's needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decoded audio has the shape [1, 64000]
, where the first dimension is the number of channels (1, in the case of SINE_MONO_S16
), and the second dimension is the samples.
The checked in tensor is returned by get_frame_data_by_index
in utils.py
, which returns only the samples at index=1
for the one stream_index
.
So to make the comparison, the test must use only the samples from the decoded audio at [0]
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I think eventually we may want to update get_frame_data_by_index
to always return 2D audio samples, but that's not urgent. Using [0]
like yo did is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NICE.
Thank you @Dan-Flores for the fix, and for finding a cleaner fix than my original hacky workaround!
I'm labeling this as "bug" so that we can list it in the bug-fix section of our release notes. It might be worth updating the PR title to something more "user-facing", like "Fix audio decoding bug with FFmpeg4" or something like that.
// Handle FFmpeg 4 bug where channel_layout is 0 or unset | ||
// Set it to the default layout for the number of channels | ||
// to allow successful initialization of SwrContext |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, I think the code has slightly changed since the comment was written, so this edit may be closer to what's happening (feel free to edit!)
// Handle FFmpeg 4 bug where channel_layout is 0 or unset | |
// Set it to the default layout for the number of channels | |
// to allow successful initialization of SwrContext | |
// Handle FFmpeg 4 bug where channel_layout and numChannels are 0 or unset | |
// Set them based on avFrame->channels which appears to be correct | |
// to allow successful initialization of SwrContext |
This PR avoids the error message in #843 when usinf FFmpeg4 by setting the missing
channel_layout
field avFrame. FFmpeg documentation indicates that this field may be 0 if it is unknown or unspecified, so it is set to the default layout for the number of channels present usingav_get_default_channel_layout
.The test was updated to compare the decoded frames to the checked in samples from
sine_mono_s16.wav
. These checked in samples were generated using torchcodec with FFmpeg6.