duration is wrong when used m4a audios but works for flac #1121

npovey · 2023-08-17T05:39:18Z

Hi,
I was using this script: https://github.com/k2-fsa/text_search/blob/master/examples/libriheavy/run.sh
Its first stage is prepare_manifest.py: https://github.com/k2-fsa/text_search/blob/master/examples/libriheavy/prepare_manifest.py
It works well when the audio files are ".flac", but when I use a dataset in m4a audio format, the duration is computed incorrectly.
My understanding is that the script uses lhotse functions to do that.
Question 1: Why can't I get an accurate duration for m4a files?

Question 2: I thought I could disable the flag FFMPEG_TORCHAUDIO_INFO_ENABLED: bool = True in https://github.com/lhotse-speech/lhotse/blob/master/lhotse/audio.py, but I couldn't figure out how. My reasoning was that maybe soxi would get it right.
How can I get the correct duration for m4a files?

desh2608 · 2023-08-17T12:57:56Z

Maybe Piotr can answer about the m4a, but to set that flag to false you can do:

from lhotse.audio import set_ffmpeg_torchaudio_info_enabled

set_ffmpeg_torchaudio_info_enabled(False)

pzelasko · 2023-08-17T13:13:44Z

Can you show an example of what's wrong with the duration, and provide your versions of lhotse and torch/torchaudio? Also can you run and show the output of:

import torchaudio

path = "path/to/problematic_rec.m4a"

for backend in (torchaudio.backend.soundfile_backend, torchaudio.backend.sox_io_backend):
  info = backend.info(path)
  print(info.__dict__)
  sr, audio = backend.load(path)
  print(audio.shape)

Regarding which format to use for distributing data: FLAC is lossless which might be more suitable for speech synthesis / enhancement etc.; for speech or speaker recognition, lossy compression is probably OK. I haven't seen any speech data distributed as m4a, some groups used mp3 in the past, but more recent releases seem to favor OPUS for better compression rate and quality (you can use ffmpeg to convert to/from OPUS).
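As a rough sketch of the ffmpeg conversion mentioned above, the snippet below builds the command line for re-encoding a single file. The helper names (build_ffmpeg_cmd, convert), the CODECS table, and the file names are hypothetical; it only uses the ffmpeg options -y (overwrite), -i (input) and -c:a (audio codec), and it assumes an ffmpeg build with the flac and libopus encoders is available on PATH.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical mapping from output extension to ffmpeg audio encoder name.
CODECS = {".flac": "flac", ".opus": "libopus"}


def build_ffmpeg_cmd(src: str, dst: str) -> list:
    """Return the ffmpeg argument list that re-encodes src into dst."""
    ext = Path(dst).suffix.lower()
    if ext not in CODECS:
        raise ValueError(f"unsupported output extension: {ext}")
    return ["ffmpeg", "-y", "-i", src, "-c:a", CODECS[ext], dst]


def convert(src: str, dst: str) -> None:
    """Run the conversion; requires ffmpeg on PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH")
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)


if __name__ == "__main__":
    # Only prints the command; call convert(...) to actually run it.
    print(build_ffmpeg_cmd("audio_001.m4a", "audio_001.flac"))
```

Keeping the command construction separate from the subprocess call makes it easy to inspect or log the exact command before running it over a whole corpus.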

npovey · 2023-08-17T20:08:34Z

my torch and torchaudio versions:
torch: 1.12.1+cu113
torchaudio: 0.12.1+cu113
lhotse: 1.16.0

The output for the above code is:

(venv) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$ python test_np.py 
Traceback (most recent call last):
  File "test_np.py", line 6, in <module>
    info = backend.info(path)
  File "/home/np/anna/venv/lib/python3.8/site-packages/torchaudio/backend/soundfile_backend.py", line 103, in info
    sinfo = soundfile.info(filepath)
  File "/home/np/anna/venv/lib/python3.8/site-packages/soundfile.py", line 467, in info
    return _SoundFileInfo(file, verbose)
  File "/home/np/anna/venv/lib/python3.8/site-packages/soundfile.py", line 412, in __init__
    with SoundFile(file) as f:
  File "/home/np/anna/venv/lib/python3.8/site-packages/soundfile.py", line 658, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/home/np/anna/venv/lib/python3.8/site-packages/soundfile.py", line 1216, in _open
    raise LibsndfileError(err, prefix="Error opening {0!r}: ".format(self.name))
soundfile.LibsndfileError: Error opening '/mnt/speech2/max_25_v2/step15/dummy_m4a/audio/0/id_001/audio_001.m4a': File contains data in an unknown format.
(venv) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$

After converting the m4a to flac, I get this output from the code above (the duration was correct with flac files):

(venv) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$ python test_np.py 
{'sample_rate': 44100, 'num_frames': 2102272, 'num_channels': 2, 'bits_per_sample': 24, 'encoding': 'FLAC'}
Traceback (most recent call last):
  File "test_np.py", line 13, in <module>
    print(audio.shape)
AttributeError: 'int' object has no attribute 'shape'
(venv) np@np-INTEL:/mnt/speech1/anna/text_search/examples/libriheavy$
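For reference, the duration implied by the FLAC metadata above is simply num_frames divided by sample_rate, which comes to roughly 47.67 seconds. A minimal sketch, using a hypothetical helper name:

```python
def duration_seconds(num_frames: int, sample_rate: int) -> float:
    """Duration in seconds implied by a frame count and a sampling rate."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return num_frames / sample_rate


# Values copied from the FLAC info output above.
info = {"sample_rate": 44100, "num_frames": 2102272}
print(round(duration_seconds(info["num_frames"], info["sample_rate"]), 2))
```

The same arithmetic applied to the num_frames reported for an m4a file is a quick way to see whether the metadata itself is off.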

npovey · 2023-08-18T01:38:33Z

python3 -c "import soundfile; print(soundfile.__version__)"
0.12.1
python3 -c "import soundfile; print(soundfile.__libsndfile_version__)"
1.2.0

pzelasko · 2023-08-18T01:40:22Z

That last error is a mistake on my side; it should be audio, sr = torchaudio.load(...).

I checked locally with an m4a file: it looks like neither sox nor libsndfile supports it, but ffmpeg does. That means it should work if you update PyTorch to version 2.0 (together with torchaudio) and run the script with the environment variable set, i.e. TORCHAUDIO_USE_BACKEND_DISPATCHER=1 python my_script.py (in torch 2.1 ffmpeg is going to be the default and the env var will no longer be needed, see pytorch/audio#2950).
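Since ffmpeg can read the file, one way to get a reference duration that does not depend on torchaudio is to ask ffprobe for the container duration. This is only a sketch: the helper names are made up, it assumes ffprobe is installed and on PATH, and the container-level duration may differ slightly from the decoded sample count.

```python
import subprocess


def parse_duration(ffprobe_stdout: str) -> float:
    """Parse the single number that ffprobe prints for format=duration."""
    text = ffprobe_stdout.strip()
    if not text or text == "N/A":
        raise ValueError("ffprobe did not report a duration")
    return float(text)


def ffprobe_duration(path: str) -> float:
    """Return the container duration in seconds as reported by ffprobe."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_duration(out.stdout)
```

Calling ffprobe_duration("path/to/problematic_rec.m4a") and comparing the result with the duration stored in the manifest would show whether the error comes from the metadata reader.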

npovey · 2023-08-22T07:59:19Z

Update:
I am getting the correct duration for mp3 audio files after upgrading torch; it was producing zeroes before the update.
New versions:
torch 2.0.1
torchaudio 2.0.2

But m4a files still produce an incorrect duration.
I also tried changing the flag as below, but it didn't work.

from lhotse.audio import set_ffmpeg_torchaudio_info_enabled
set_ffmpeg_torchaudio_info_enabled(False)

Here is my code:

if [ $stage -le 1 ] && [ $stop_stage -ge 1 ]; then
  # We will get librilight_raw_cuts_{subset}.jsonl.gz
  # saved in $output_dir/manifests
  log "Stage 1: Prepare LibriLight manifest"
  TORCHAUDIO_USE_BACKEND_DISPATCHER=1 python prepare_manifest_np_m4a.py \
    --corpus-dir $corpus_dir \
    --books-dir $text_dir \
    --output-dir $output_dir/manifests \
    --num-jobs 5
fi

pzelasko · 2023-08-22T12:12:34Z

It seems to be an issue with torchaudio's m4a support. I made a PR with a workaround (#1124); please try it out and see if it helps. BTW, you might want to open an issue in torchaudio (try torchaudio.info(m4a_path); regardless of version, it seems to return wrong results).
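A small reproduction for such a report could compare the frame count from torchaudio.info with the number of samples actually decoded by torchaudio.load. The sketch below assumes torchaudio is installed and able to decode the file; the helper names and the tolerance parameter are hypothetical.

```python
def frames_disagree(info_frames: int, decoded_frames: int, tolerance: int = 0) -> bool:
    """True if the metadata frame count differs from the decoded frame count."""
    return abs(info_frames - decoded_frames) > tolerance


def check_file(path: str) -> bool:
    """Compare torchaudio.info against torchaudio.load for one file."""
    import torchaudio  # imported lazily so frames_disagree works without it

    info = torchaudio.info(path)
    waveform, sample_rate = torchaudio.load(path)  # waveform: (channels, frames)
    decoded = waveform.shape[1]
    print(f"info: {info.num_frames} frames, decoded: {decoded} frames @ {sample_rate} Hz")
    return frames_disagree(info.num_frames, decoded)
```

If check_file returns True for an m4a file but False for the same audio converted to flac, that pair makes a compact test case for an upstream issue.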

npovey mentioned this issue Aug 18, 2023: [Not for merge] Add Bengaliai speech k2-fsa/icefall#1202 (Closed)

npovey mentioned this issue Aug 24, 2023: duration is wrong for m4a audios but works for flac and mp3 files pytorch/audio#3573 (Open)
