
Bug: Non-matching audios when converting videos from Senselab to HuggingFace #71

Open
wilke0818 opened this issue Jun 17, 2024 · 0 comments
Labels
bug Something isn't working


@wilke0818
Collaborator

Description

Videos with audio tracks in SenselabDatasets do not round-trip to exactly the same audio when converted to a HuggingFace dataset. This is likely caused by several factors. First, extracting audio from a video with torchvision does not produce the same samples as decoding it with ffmpeg directly. Second, converting to a HuggingFace dataset with its `Audio` feature uses soundfile under the hood, which introduces additional small distortions at some points.

Steps to Reproduce

In `dataset_test.py`, we test a video and its extracted audio. The test currently checks that the tensors before and after conversion are close to each other (here, `atol=1e-4`). The issue becomes visible if that check is changed to exact equality.
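The gap between the two checks can be illustrated with a minimal NumPy sketch. It does not load any Senselab or HuggingFace data: the waveforms are synthetic stand-ins, and the 5e-5 shift is chosen only to mirror the magnitude reported in this issue. With PyTorch tensors, as presumably used in `dataset_test.py`, the analogous calls would be `torch.allclose(..., atol=1e-4)` and `torch.equal(...)`.

```python
import numpy as np

# Hypothetical stand-ins for a waveform before and after the
# SenselabDataset -> HuggingFace -> SenselabDataset round trip,
# shaped (channels, samples) like the 2D tensors described in this issue.
rng = np.random.default_rng(0)
original = rng.uniform(-0.5, 0.5, size=(1, 16000)).astype(np.float32)

# Simulate a round trip that shifts roughly 30% of the samples by 5e-5
# and leaves the rest untouched ("not every value diverges").
roundtrip = original.copy()
mask = rng.random(original.shape) < 0.3
roundtrip[mask] += np.float32(5e-5)

# Tolerance-based check, mirroring the current test (atol=1e-4): passes.
close_enough = np.allclose(original, roundtrip, atol=1e-4)

# Exact-equality check that exposes the bug: fails.
exactly_equal = np.array_equal(original, roundtrip)

print(close_enough, exactly_equal)  # prints: True False
```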

Expected Results

We would expect that, regardless of which library decoded the audio from a video, converting to a HuggingFace dataset and then back to a SenselabDataset yields exactly the same audio at every step, since the audio waveform is just a 2D tensor.

Actual Results

For videos, the audio after converting to HuggingFace and back to Senselab diverges from the original, with a maximum absolute difference of around 5e-5, although notably not every value diverges. It is possible that differences in how libraries handle silence or near-silence cause differences in the encodings.
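One way to probe the silence hypothesis is to break down where the differing samples fall. The sketch below is hypothetical: `divergence_report` is not part of Senselab, and the `silence_threshold` cutoff of 1e-3 is an arbitrary illustrative value for "near silence".

```python
import numpy as np


def divergence_report(original, roundtrip, silence_threshold=1e-3):
    """Summarize where two same-shaped waveforms differ (illustrative helper)."""
    original = np.asarray(original, dtype=np.float64)
    roundtrip = np.asarray(roundtrip, dtype=np.float64)
    if original.shape != roundtrip.shape:
        raise ValueError(f"shape mismatch: {original.shape} vs {roundtrip.shape}")

    diff = np.abs(original - roundtrip)
    differs = diff > 0
    # Samples whose original amplitude is below the cutoff count as near-silent.
    quiet = np.abs(original) < silence_threshold

    return {
        "max_abs_diff": float(diff.max()) if diff.size else 0.0,
        "n_samples": int(diff.size),
        "n_differing": int(differs.sum()),
        # Split the differing samples into near-silent vs. louder ones.
        "n_differing_quiet": int((differs & quiet).sum()),
        "n_differing_loud": int((differs & ~quiet).sum()),
    }


# Tiny synthetic example: one quiet sample and one loud sample differ.
a = np.array([[0.0, 0.0005, 0.2, -0.4]], dtype=np.float32)
b = np.array([[0.0, 0.0006, 0.2, -0.40005]], dtype=np.float32)
report = divergence_report(a, b)
print(report)  # n_differing is 2: one quiet, one loud
```

If the differences on real round-tripped videos clustered in the quiet bucket, that would support the silence explanation; an even spread would point elsewhere.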

Additional Notes

Interestingly, this issue has not been seen when converting existing audio files, which leads me to believe that it stems from how the audio extracted from a video is encoded (floating-point precision and bit depth).
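To illustrate how bit depth alone can break bit-exactness, the following minimal sketch pushes a float32 waveform through a 16-bit PCM WAV held in memory, using Python's standard `wave` module and NumPy. It assumes mono audio in [-1, 1] and a scale factor of 32767. `pcm16_roundtrip` is a hypothetical helper. This issue does not establish whether torchvision, ffmpeg or soundfile actually pass through 16-bit PCM here, and scaling conventions (32767 vs. 32768) can differ between tools.

```python
import io
import wave

import numpy as np


def pcm16_roundtrip(waveform, sample_rate=16000):
    """Encode mono float audio as 16-bit PCM WAV in memory, then decode it.

    Illustrative stand-in for a lossy bit-depth conversion, not the code path
    any of the libraries mentioned in this issue actually use.
    """
    samples = np.clip(np.asarray(waveform, dtype=np.float32), -1.0, 1.0)
    # Scale to the int16 range and round to the nearest quantization step.
    ints = np.round(samples * 32767).astype(np.int16)

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(ints.tobytes())

    buffer.seek(0)
    with wave.open(buffer, "rb") as wav:
        frames = wav.readframes(wav.getnframes())

    # Map the integers back to floats with the same scale factor.
    return np.frombuffer(frames, dtype=np.int16).astype(np.float32) / 32767


rng = np.random.default_rng(0)
original = rng.uniform(-0.5, 0.5, size=16000).astype(np.float32)
original[:1000] = 0.0  # a stretch of digital silence
decoded = pcm16_roundtrip(original)

print(np.array_equal(original, decoded))  # not bit-exact overall
print(float(np.abs(original - decoded).max()))  # at most about half a step, ~1.5e-5
print(np.array_equal(original[:1000], decoded[:1000]))  # exact zeros survive
```

In a single 16-bit round trip the worst-case rounding error is half a quantization step (about 1.5e-5), while exact zeros pass through unchanged, which fits the observation that not every value diverges. The ~5e-5 reported above exceeds that bound, so if bit depth plays a role it is probably not the only factor.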

wilke0818 added the bug (Something isn't working) label on Jun 17, 2024