Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

symphonia-check fails with 'Test interrupted by error: malformed stream: mp3: invalid main_data_begin' #68

Closed
Shnatsel opened this issue Nov 25, 2021 · 7 comments

Comments

@Shnatsel
Copy link
Contributor

Running symphonia-check results in an immediate error on roughly 40% of MP3 files from the MySpace Dragon Hoard dataset:

$ symphonia-check '1/std_cd687347995a4b159c43783fe74f46e2.mp3'
Input Path: 1/std_cd687347995a4b159c43783fe74f46e2.mp3

Test interrupted by error: malformed stream: mp3: invalid main_data_begin

However, both ffplay and symphonia-play seem to play the file without issue.

Tested on latest master, commit a723c15

P.S. I have only run check on about a third of the hoard, but so far the highest deviation reported with -q flag was 0.00001782, so the decoder seems to be in a pretty good shape overall.

@Shnatsel
Copy link
Contributor Author

A sample file triggering the issue: std_cd687347995a4b159c43783fe74f46e2.mp3.zip

pdeljanov added a commit that referenced this issue Nov 25, 2021
Since a decode error will create a discontinuity in the output
produced by Symphonia (a packet is effectively being dropped),
it is expected that this will create subsequent false failures if
the reference decoder also does not drop this packet since the
pcm output will no longer be aligned.

For codecs that don't specify how to handle corrupted or malformed
streams, this doesn't necessarily constitute an error.

Addresses #68.
@pdeljanov
Copy link
Owner

I added an option to symphonia-check to keep going after a decode error (--keep-going).

In a player application, continuing after a decode error is the correct action to take, however, be warned that symphonia-check will likely produce a slew of failed samples in these instances. This is because a decode error effectively drops the malformed packet from the output. If ffmpeg chooses to output silence, or do something different, then the PCM output will no longer be "aligned" and most every sample after the decode error will fail. If the PCM output were to be re-aligned, then the two decoders would match again.

This is the case with the file you provided. Unfortunately, for the error encountered in this file, the MP3 standard doesn't specify what the behaviour should be.

Not really sure if it's desirable to copy ffmpeg's behaviour in this case... 🤔

@Shnatsel
Copy link
Contributor Author

Got it. So this is working as intended, then.

I'll make a full check pass on the Hoard without --keep-going to avoid false positives and report results.

@Shnatsel
Copy link
Contributor Author

The highest deviation on the entirety of the Dragon Hoard was 0.00001782, so we're probably good as far as that dataset is concerned.

@pdeljanov
Copy link
Owner

Thanks for testing. That's an insane collection!

I studied how ffmpeg handles these files and I think it makes a lot of sense. I implemented a similar strategy and I think it will fix the vast majority of that 40% you mentioned. The changes are in the mp3-partial-decode branch. I'm not certain I necessarily want to switch to this just yet, but some data would help me make a more informed decision.

Do you mind giving this a go?

@Shnatsel
Copy link
Contributor Author

Sure, I'll leave a run overnight. The full run takes 3 hours on my CPU. Hmm, I'll build with target-cpu=native and perhaps that way it'll be faster.

@Shnatsel
Copy link
Contributor Author

Yep, at a glance it appears to be fixed. Doesn't report errors anymore, and I'm not seeing any dramatic mismatches in the first 1000 files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants