duration is wrong when used m4a audios but works for flac #1121
Comments
Maybe Piotr can answer about the m4a, but to set that flag to False you can do:
from lhotse.audio import set_ffmpeg_torchaudio_info_enabled
set_ffmpeg_torchaudio_info_enabled(False)
Can you show an example of what's wrong with the duration, and provide your versions of lhotse and torch/torchaudio? Also, can you run the following and show its output:
import torchaudio
path = "path/to/problematic_rec.m4a"
for backend in (torchaudio.backend.soundfile_backend, torchaudio.backend.sox_io_backend):
    info = backend.info(path)
    print(info.__dict__)
    audio, sr = backend.load(path)  # load() returns (waveform, sample_rate)
    print(audio.shape)
Regarding which format to use for distributing data: FLAC is lossless, which might be more suitable for speech synthesis, enhancement, etc.; for speech or speaker recognition, lossy compression is probably OK. I haven't seen any speech data distributed as m4a. Some groups used mp3 in the past, but more recent releases seem to favor OPUS for its better compression rate and quality (you can use ffmpeg to convert to/from OPUS).
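The ffmpeg conversion mentioned above could be scripted roughly as follows. This is only a sketch: `build_ffmpeg_cmd` and `convert` are hypothetical helper names, and it assumes `ffmpeg` is on the PATH and was built with an encoder for the target format (FLAC or Opus). ffmpeg chooses the output codec from the destination file extension.

```python
import subprocess


def build_ffmpeg_cmd(src: str, dst: str) -> list:
    """Build an ffmpeg command line that converts src to dst.

    Hypothetical helper: the output codec is inferred by ffmpeg from
    the extension of dst (e.g. ".flac" or ".opus").
    -y overwrites the output file if it already exists.
    """
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def convert(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)


if __name__ == "__main__":
    # Placeholder path, as in the snippet above.
    print(build_ffmpeg_cmd("path/to/problematic_rec.m4a", "problematic_rec.flac"))
```

Converting to FLAC first, as the reporter did, sidesteps the m4a reading path entirely, at the cost of extra disk space.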
My torch and torchaudio versions:
The output for the above code is:
After converting m4a to FLAC, I get this output from the code above (the duration was correct with FLAC files):
python3 -c "import soundfile; print(soundfile.version)"
That last error is a mistake on my side; it should be
I checked locally with an m4a file: it looks like neither sox nor libsndfile supports it, but ffmpeg does. That means it should work if you update PyTorch to version 2.0 (together with torchaudio) and call the script with env variable
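Since ffmpeg can read m4a, one independent cross-check of the duration is to ask `ffprobe` (shipped with ffmpeg) directly. The sketch below is not what lhotse or torchaudio do internally; `ffprobe_duration` and `parse_duration` are hypothetical helper names, and it assumes `ffprobe` is on the PATH. It reads the duration reported by the container, which may itself differ slightly from the decoded length.

```python
import subprocess


def parse_duration(ffprobe_output: str) -> float:
    """Parse the single number (seconds) printed by the ffprobe call below."""
    return float(ffprobe_output.strip())


def ffprobe_duration(path: str) -> float:
    """Return the container-reported duration of `path` in seconds."""
    cmd = [
        "ffprobe",
        "-v", "error",                        # suppress banner and warnings
        "-show_entries", "format=duration",   # print only the duration field
        "-of", "default=noprint_wrappers=1:nokey=1",  # bare value, no key
        path,
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_duration(result.stdout)
```

Calling `ffprobe_duration("path/to/problematic_rec.m4a")` and comparing the result with the duration in the manifest should show whether the discrepancy comes from the file or from the reader.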
Update: But m4a files are still producing incorrect duration.
Here is my code:
It seems to be an issue with torchaudio's m4a support. I made a PR with a workaround (#1124); please try it out and see if it helps. BTW, you might want to post an issue in torchaudio (try
Hi,
I was using this script: https://github.com/k2-fsa/text_search/blob/master/examples/libriheavy/run.sh
The first stage is prepare_manifest.py: https://github.com/k2-fsa/text_search/blob/master/examples/libriheavy/prepare_manifest.py
It works well when the audio files are ".flac", but when I use a dataset in m4a format, the duration is computed incorrectly.
My understanding is that the script uses lhotse functions to do that.
Question 1: Why can't I get an accurate duration for m4a files?
Question 2: I thought I could somehow disable the flag FFMPEG_TORCHAUDIO_INFO_ENABLED: bool = True here https://github.com/lhotse-speech/lhotse/blob/master/lhotse/audio.py but couldn't figure out how. My reasoning was that maybe soxi would get it right.
How can I get the correct duration for m4a files?
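One way to tell whether a reported duration is wrong is to compare it with the duration implied by the decoded audio, i.e. the number of samples divided by the sampling rate. The sketch below is a minimal illustration of that check; the function names and the 25 ms tolerance are assumptions made for the example, not lhotse settings.

```python
def duration_from_samples(num_samples: int, sampling_rate: int) -> float:
    """Duration in seconds implied by the decoded signal length."""
    return num_samples / sampling_rate


def is_consistent(
    reported: float,
    num_samples: int,
    sampling_rate: int,
    tolerance: float = 0.025,  # assumed tolerance in seconds, for illustration
) -> bool:
    """True if the reported duration matches the decoded length within tolerance."""
    decoded = duration_from_samples(num_samples, sampling_rate)
    return abs(reported - decoded) <= tolerance


if __name__ == "__main__":
    # 160000 samples at 16 kHz correspond to 10 seconds of audio.
    print(duration_from_samples(160000, 16000))
    print(is_consistent(10.0, 160000, 16000))
```

Here `num_samples` would be the last dimension of the loaded waveform (for example `audio.shape[-1]` after a torchaudio load), and `reported` the duration stored in the manifest or returned by an info call.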