duration is wrong when used m4a audios but works for flac #1121

Open
npovey opened this issue Aug 17, 2023 · 7 comments


@npovey

npovey commented Aug 17, 2023

Hi,
I am using this script: https://github.com/k2-fsa/text_search/blob/master/examples/libriheavy/run.sh
Its first stage runs prepare_manifest.py: https://github.com/k2-fsa/text_search/blob/master/examples/libriheavy/prepare_manifest.py
It works well when the audio files are ".flac", but with a dataset in m4a format the computed duration is wrong.
My understanding is that the script uses lhotse functions to compute it.
Question 1: Why can't I get an accurate duration for m4a files?

Question 2: I thought I could disable the flag FFMPEG_TORCHAUDIO_INFO_ENABLED: bool = True in https://github.com/lhotse-speech/lhotse/blob/master/lhotse/audio.py, but I couldn't figure out how. My reasoning was that soxi might compute the duration correctly.
How can I get the correct duration for m4a files?

@desh2608
Collaborator

Maybe Piotr can answer about the m4a, but to set that flag to false you can do:

from lhotse.audio import set_ffmpeg_torchaudio_info_enabled

set_ffmpeg_torchaudio_info_enabled(False)

@pzelasko
Collaborator

pzelasko commented Aug 17, 2023

Can you show an example of what's wrong with the duration, and provide your versions of lhotse and torch/torchaudio? Also can you run and show the output of:

import torchaudio

path = "path/to/problematic_rec.m4a"

for backend in (torchaudio.backend.soundfile_backend, torchaudio.backend.sox_io_backend):
  info = backend.info(path)
  print(info.__dict__)
  sr, audio = backend.load(path)
  print(audio.shape)

Regarding which format to use for distributing data: FLAC is lossless which might be more suitable for speech synthesis / enhancement etc.; for speech or speaker recognition, lossy compression is probably OK. I haven't seen any speech data distributed as m4a, some groups used mp3 in the past, but more recent releases seem to favor OPUS for better compression rate and quality (you can use ffmpeg to convert to/from OPUS).
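If re-encoding is an option, the ffmpeg conversion mentioned above can be scripted. This is a minimal sketch, assuming `ffmpeg` is on the PATH and was built with libopus; the helper names and the 32k bitrate are illustrative choices, not something stated in this thread. Note that going from m4a to OPUS is a lossy-to-lossy transcode.

```python
import subprocess
from pathlib import Path
from typing import List


def opus_convert_cmd(src: str, bitrate: str = "32k") -> List[str]:
    """Build an ffmpeg command that re-encodes `src` to an .opus file next to it.

    The function name and default bitrate are hypothetical, for illustration only.
    """
    dst = str(Path(src).with_suffix(".opus"))
    # -y overwrites an existing output; -c:a libopus selects the OPUS encoder.
    return ["ffmpeg", "-y", "-i", src, "-c:a", "libopus", "-b:a", bitrate, dst]


def convert_to_opus(src: str, bitrate: str = "32k") -> str:
    """Run the conversion and return the output path; raises if ffmpeg fails."""
    cmd = opus_convert_cmd(src, bitrate)
    subprocess.run(cmd, check=True)
    return cmd[-1]
```

Calling `convert_to_opus("audio_001.m4a")` would write `audio_001.opus` in the same directory, provided ffmpeg can decode the input.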

@npovey
Author

npovey commented Aug 17, 2023

my torch and torchaudio versions:
torch: 1.12.1+cu113
torchaudio: 0.12.1+cu113
lhotse: 1.16.0

The output for the above code is:

(venv) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$ python test_np.py 
Traceback (most recent call last):
  File "test_np.py", line 6, in <module>
    info = backend.info(path)
  File "/home/np/anna/venv/lib/python3.8/site-packages/torchaudio/backend/soundfile_backend.py", line 103, in info
    sinfo = soundfile.info(filepath)
  File "/home/np/anna/venv/lib/python3.8/site-packages/soundfile.py", line 467, in info
    return _SoundFileInfo(file, verbose)
  File "/home/np/anna/venv/lib/python3.8/site-packages/soundfile.py", line 412, in __init__
    with SoundFile(file) as f:
  File "/home/np/anna/venv/lib/python3.8/site-packages/soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/np/anna/venv/lib/python3.8/site-packages/soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening '/mnt/speech2/max_25_v2/step15/dummy_m4a/audio/0/id_001/audio_001.m4a': File contains data in an unknown format.
(venv) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$ 

After converting the m4a to flac, I get this output from the code above (the duration was correct with flac files):

(venv) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$ python test_np.py 
{'sample_rate': 44100, 'num_frames': 2102272, 'num_channels': 2, 'bits_per_sample': 24, 'encoding': 'FLAC'}
Traceback (most recent call last):
  File "test_np.py", line 13, in <module>
    print(audio.shape)
AttributeError: 'int' object has no attribute 'shape'
(venv) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$ 
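For reference, the duration implied by the FLAC metadata above can be checked by hand. This is a small sketch, assuming the duration is derived as `num_frames / sample_rate`; the helper name is hypothetical and only the two numbers come from the output above.

```python
def duration_seconds(num_frames: int, sample_rate: int) -> float:
    """Duration in seconds implied by a frame count and a sample rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_frames / sample_rate


# Values taken from the FLAC info dict printed above.
flac_duration = duration_seconds(num_frames=2102272, sample_rate=44100)
print(f"{flac_duration:.3f} s")  # prints "47.671 s"
```

Comparing this figure against the duration reported for the m4a version of the same recording shows how far off the m4a metadata is.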

@npovey
Author

npovey commented Aug 18, 2023

python3 -c "import soundfile; print(soundfile.__version__)"
0.12.1
python3 -c "import soundfile; print(soundfile.__libsndfile_version__)"
1.2.0

@pzelasko
Collaborator

pzelasko commented Aug 18, 2023

That last error is a mistake on my side; it should be audio, sr = torchaudio.load(...).

I checked locally with an m4a file: it looks like neither sox nor libsndfile supports it, but ffmpeg does. That means it should work if you update pytorch to version 2.0 (together with torchaudio) and run the script with the environment variable set: TORCHAUDIO_USE_BACKEND_DISPATCHER=1 python my_script.py. In torch 2.1 ffmpeg is going to be the default and the env var will no longer be needed; see pytorch/audio#2950.

@npovey
Author

npovey commented Aug 22, 2023

Update:
I am getting the correct duration for mp3 audio files after switching to a newer torch; before the update it was producing zeroes.
New versions:
torch 2.0.1
torchaudio 2.0.2

But m4a files still produce an incorrect duration.
I also tried changing the flag as below, but it didn't help.

from lhotse.audio import set_ffmpeg_torchaudio_info_enabled
set_ffmpeg_torchaudio_info_enabled(False)

Here is my code:

if [ $stage -le 1 ] && [ $stop_stage -ge 1 ]; then
  # We will get librilight_raw_cuts_{subset}.jsonl.gz
  # saved in $output_dir/manifests
  log "Stage 1: Prepare LibriLight manifest"
  TORCHAUDIO_USE_BACKEND_DISPATCHER=1 python prepare_manifest_np_m4a.py \
    --corpus-dir $corpus_dir \
    --books-dir $text_dir \
    --output-dir $output_dir/manifests \
    --num-jobs 5
fi

@pzelasko
Collaborator

It seems to be an issue with m4a support in torchaudio. I made a PR with a workaround (#1124); please try it out and see if it helps. BTW, you might want to open an issue in torchaudio: try torchaudio.info(m4a_path), which seems to return wrong results regardless of version.
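To cross-check what torchaudio.info reports, the container duration can also be read with ffprobe. This is a hedged sketch, assuming `ffprobe` (shipped with ffmpeg) is on the PATH; the helper names are hypothetical, and this is not necessarily the approach taken in #1124.

```python
import subprocess
from typing import List


def ffprobe_duration_cmd(path: str) -> List[str]:
    """Build an ffprobe command that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_ffprobe_duration(stdout: str) -> float:
    """Parse the single number ffprobe prints for the command above."""
    return float(stdout.strip())


def probe_duration(path: str) -> float:
    """Run ffprobe on `path` and return its duration; raises if ffprobe fails."""
    result = subprocess.run(
        ffprobe_duration_cmd(path), check=True, capture_output=True, text=True
    )
    return parse_ffprobe_duration(result.stdout)
```

If `probe_duration(m4a_path)` agrees with the FLAC-converted file but not with torchaudio.info, that would help narrow the problem down to torchaudio's m4a metadata handling.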
