Skip to content

Conversation

Dan-Flores
Copy link
Contributor

This PR avoids the error message in #843 when usinf FFmpeg4 by setting the missing channel_layout field avFrame. FFmpeg documentation indicates that this field may be 0 if it is unknown or unspecified, so it is set to the default layout for the number of channels present using av_get_default_channel_layout.

The test was updated to compare the decoded frames to the checked in samples from sine_mono_s16.wav. These checked in samples were generated using torchcodec with FFmpeg6.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Sep 2, 2025
@Dan-Flores Dan-Flores marked this pull request as ready for review September 2, 2025 15:00
srcNumChannels,
". If you are hitting this, it may be because you are using "
"a buggy FFmpeg version. FFmpeg4 is known to fail here in some "
"valid scenarios. Try to upgrade FFmpeg?");
Copy link
Contributor

@NicolasHug NicolasHug Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still have this check, but perhaps relax it if getNumChannels(srcAVFrame) returns 0 while getNumChannels(streamInfo.codecContext) returns something < 1 (which is what is happening in #843

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added back the check, and excluded the case where getNumChannels(srcAVFrame) returns 0 while getNumChannels(streamInfo.codecContext) returns a valid fallback.

// to allow successful initialization of SwrContext
if (numChannels == 0 && avFrame->channels > 0) {
avFrame->channel_layout = av_get_default_channel_layout(avFrame->channels);
return avFrame->channels;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just do this instead, so we can have a single return statement

Suggested change
return avFrame->channels;
numChannels = avFrame->channels;

int srcNumChannelsFromCodec = getNumChannels(streamInfo.codecContext);
// Use number of channels from codec if 0 returned from frame
if (srcNumChannels == 0 && srcNumChannelsFromCodec > 0) {
srcNumChannels = srcNumChannelsFromCodec;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we still need this if branch, now that we are returning the hopefully-correct avFrame->channels instead of the incorrect numChannels in getNumChannels().

Can you double check if that's still needed? Perhaps we can leave the entire TORCH_CHECK() the way it was!

Comment on lines 1695 to 1697
test_frames = decoder.get_samples_played_in_range()
assert test_frames.data.shape[0] == decoder.metadata.num_channels
assert test_frames.sample_rate == decoder.metadata.sample_rate
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use the term "samples" instead of frame for the stuff that comes out of the decoder. For the asset, the use of

reference_frames = asset.get_frame_data_by_range

is still correct since this is really a frame-based API.

start=0, stop=1, stream_index=0
)
torch.testing.assert_close(
test_frames.data[0], reference_frames, atol=0, rtol=0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised by the use of [0] here - can you help me understand why that's needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decoded audio has the shape [1, 64000], where the first dimension is the number of channels (1, in the case of SINE_MONO_S16), and the second dimension is the samples.

The checked in tensor is returned by get_frame_data_by_index in utils.py, which returns only the samples at index=1 for the one stream_index.

So to make the comparison, the test must use only the samples from the decoded audio at [0].

Copy link
Contributor

@NicolasHug NicolasHug Sep 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I think eventually we may want to update get_frame_data_by_index to always return 2D audio samples, but that's not urgent. Using [0] like yo did is fine.

Copy link
Contributor

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NICE.

Thank you @Dan-Flores for the fix, and for finding a cleaner fix than my original hacky workaround!

I'm labeling this as "bug" so that we can list it in the bug-fix section of our release notes. It might be worth updating the PR title to something more "user-facing", like "Fix audio decoding bug with FFmpeg4" or something like that.

Comment on lines 65 to 67
// Handle FFmpeg 4 bug where channel_layout is 0 or unset
// Set it to the default layout for the number of channels
// to allow successful initialization of SwrContext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, I think the code has slightly changed since the comment was written, so this edit may be closer to what's happening (feel free to edit!)

Suggested change
// Handle FFmpeg 4 bug where channel_layout is 0 or unset
// Set it to the default layout for the number of channels
// to allow successful initialization of SwrContext
// Handle FFmpeg 4 bug where channel_layout and numChannels are 0 or unset
// Set them based on avFrame->channels which appears to be correct
// to allow successful initialization of SwrContext

@NicolasHug NicolasHug added the bug Something isn't working label Sep 3, 2025
@Dan-Flores Dan-Flores changed the title Set default channel_layout in avFrame Fix audio decoding bug with FFmpeg4 by setting channel_layout Sep 3, 2025
@Dan-Flores Dan-Flores merged commit 1a88391 into meta-pytorch:main Sep 3, 2025
46 of 50 checks passed
@Dan-Flores Dan-Flores deleted the fix_ffmpeg4_bug branch September 3, 2025 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working CLA Signed This label is managed by the Meta Open Source bot.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants