lhotse had problems preparing musan data and cannot produce supervisions data #37

wangkaisine · 2021-09-06T20:13:14Z

lhotse prepare musan download/musan data/manifests
WARNING:root:There are 15 recordings that do not have any corresponding supervisions in the SupervisionSet.

In the data/manifests folder, there is only supervisions_music.json, but no supervisions_noise.json or supervisions_speech.json.

pkufool · 2021-09-06T23:06:26Z

lhotse prepare musan download/musan data/manifests
WARNING:root:There are 15 recordings that do not have any corresponding supervisions in the SupervisionSet.

It's ok, just ignore this warning.
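For context, the warning reports a consistency check between the recording and supervision manifests: some recordings have no supervision pointing at them. The sketch below is a hypothetical, simplified version of such a check in plain Python. It is not lhotse's implementation, and the helper name `count_unsupervised` is made up for illustration.

```python
import logging
from typing import Iterable


def count_unsupervised(recording_ids: Iterable[str], supervised_ids: Iterable[str]) -> int:
    """Count recordings that no supervision segment refers to.

    Hypothetical helper for illustration only; lhotse does its own
    validation internally.
    """
    missing = set(recording_ids) - set(supervised_ids)
    if missing:
        # Same wording as the warning seen in this issue.
        logging.warning(
            "There are %d recordings that do not have any corresponding "
            "supervisions in the SupervisionSet.",
            len(missing),
        )
    return len(missing)


# Toy data: three recordings, only two of which are referenced by supervisions.
print(count_unsupervised(["rec-a", "rec-b", "rec-c"], ["rec-a", "rec-b", "rec-b"]))
```

Such a mismatch does not stop the recordings from being written to the recordings manifest; the check only reports it.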

In the data/manifests folder, there is only supervisions_music.json, but no supervisions_noise.json or supervisions_speech.json.

There are no supervisions for noise and speech in the MUSAN dataset. Don't worry, it doesn't matter.

csukuangfj · 2021-09-06T23:07:26Z

In the data/manifests folder, there is only supervisions_music.json, but no supervisions_noise.json or supervisions_speech.json.

I think that is the expected behavior, as the code in lhotse does not produce supervisions for noise and speech.
See the code
https://github.com/lhotse-speech/lhotse/blob/master/lhotse/recipes/musan.py#L70-L78

    if 'music' in parts:
        manifests['music'] = prepare_music(corpus_dir, use_vocals=use_vocals)
        validate_recordings_and_supervisions(**manifests['music'])
    if 'speech' in parts:
        manifests['speech'] = {'recordings': scan_recordings(corpus_dir / 'speech')}
        validate(manifests['speech']['recordings'])
    if 'noise' in parts:
        manifests['noise'] = {'recordings': scan_recordings(corpus_dir / 'noise')}
        validate(manifests['noise']['recordings'])
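To make the consequence of this branching explicit, here is a minimal plain-Python sketch that mirrors it: only the `music` part yields both recordings and supervisions, while `speech` and `noise` yield recordings only. The names `MANIFEST_KINDS` and `expected_supervision_files` are hypothetical, and the `supervisions_<part>.json` pattern is an assumption based on the file name reported in this issue.

```python
from typing import Dict, List, Sequence, Set

# Mirrors the quoted recipe: which manifest kinds each MUSAN part produces.
MANIFEST_KINDS: Dict[str, Set[str]] = {
    "music": {"recordings", "supervisions"},
    "speech": {"recordings"},
    "noise": {"recordings"},
}


def expected_supervision_files(
    parts: Sequence[str] = ("music", "speech", "noise"),
) -> List[str]:
    """Return the supervision manifest names expected for the requested parts.

    Hypothetical helper; the naming follows supervisions_music.json as
    reported in this issue.
    """
    return [
        f"supervisions_{part}.json"
        for part in parts
        if "supervisions" in MANIFEST_KINDS.get(part, set())
    ]


print(expected_supervision_files())
# ['supervisions_music.json']
```

So a single supervisions_music.json in data/manifests is what this code path is expected to leave behind.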

Some directories in noise and speech don't have an ANNOTATIONS file. Maybe @pzelasko can say more about
why supervisions_*.json files are not generated for speech and noise.

    (py38) fangjun:/ceph-fj/open-source/icefall4/egs/librispeech/ASR/download/musan$ find . -name ANNOTATIONS
    ./noise/free-sound/ANNOTATIONS
    ./music/jamendo/ANNOTATIONS
    ./music/fma-western-art/ANNOTATIONS
    ./music/fma/ANNOTATIONS
    ./music/rfm/ANNOTATIONS
    ./music/hd-classical/ANNOTATIONS
    ./speech/librivox/ANNOTATIONS
    (py38) fangjun:/ceph-fj/open-source/icefall4/egs/librispeech/ASR/download/musan$ ls noise/
    README  free-sound  sound-bible
    (py38) fangjun:/ceph-fj/open-source/icefall4/egs/librispeech/ASR/download/musan$ ls speech/
    README  librivox  us-gov
    (py38) fangjun:/ceph-fj/open-source/icefall4/egs/librispeech/ASR/download/musan$ ls music/
    README  fma  fma-western-art  hd-classical  jamendo  rfm
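The same check can be scripted. The sketch below is a hypothetical Python equivalent of the `find` and `ls` commands above: it lists the source directories under each MUSAN part that have no `ANNOTATIONS` file. The function name `dirs_without_annotations` is an illustrative assumption, not part of lhotse.

```python
from pathlib import Path
from typing import List


def dirs_without_annotations(musan_root: Path) -> List[str]:
    """Return '<part>/<source>' for every source directory lacking ANNOTATIONS."""
    missing = []
    for part in ("music", "speech", "noise"):
        part_dir = musan_root / part
        if not part_dir.is_dir():
            continue  # Part not downloaded or wrong root; nothing to report.
        # Only look at subdirectories; files such as README are skipped.
        for source in sorted(p for p in part_dir.iterdir() if p.is_dir()):
            if not (source / "ANNOTATIONS").is_file():
                missing.append(f"{part}/{source.name}")
    return missing


if __name__ == "__main__":
    # Relative path taken from the shell session above; adjust to your layout.
    print(dirs_without_annotations(Path("download/musan")))
```

For the layout listed above, this would report `speech/us-gov` and `noise/sound-bible`, matching the directories missing from the `find` output.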

pzelasko · 2021-09-07T07:48:16Z

That’s right, IIRC there was no supervision data for noise and speech.

danpovey mentioned this issue on Nov 27, 2021:

Decoding error 'Fsa' object doesn't support assignment. #133 (open)