[Not for merge] Add Bengaliai speech #1202

Closed
yfyeung wants to merge 4 commits

yfyeung (Collaborator) commented Aug 7, 2023

Kaggle Competition: https://www.kaggle.com/competitions/bengaliai-speech/overview
The goal of this competition is to recognize Bengali speech from out-of-distribution audio recordings. You will build a model trained on the first Massively Crowdsourced (MaCro) Bengali speech dataset with 1,200 hours of data from ~24,000 people from India and Bangladesh. The test set contains samples from 17 different domains that are not present in training.

yfyeung changed the title from "Add Bengaliai speech" to "[Not for merge] Add Bengaliai speech" on Aug 7, 2023
npovey commented Aug 15, 2023

Hi,
I tried this code, but the duration in the manifests is zero for every audio file.

...."duration": 0.0,...

I am not sure whether the log output below is the cause:

2023-08-14 23:47:41,483 INFO [audio.py:137] The user overrided the global setting for whether to use ffmpeg-torchaudio to compute the duration of audio files. Old setting: True. New setting: False.

PS: I added the line below to stage 0:

  unzip bengaliai-speech.zip -d download/bengaliai_speech
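
A quick way to confirm the symptom described above is to scan a manifest for entries with a zero duration. The following is a minimal sketch, not part of the PR: it assumes the manifest is a JSON Lines file (optionally gzip-compressed) whose entries carry `id` and `duration` fields, and the function name `find_zero_durations` is hypothetical.

```python
import gzip
import json
from pathlib import Path


def find_zero_durations(manifest_path):
    """Return the ids of manifest entries whose "duration" is missing or <= 0.

    Assumption: one JSON object per line, with "id" and "duration" keys.
    """
    path = Path(manifest_path)
    # Use gzip for *.gz manifests, plain text otherwise.
    opener = gzip.open if path.suffix == ".gz" else open
    bad_ids = []
    with opener(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            if float(entry.get("duration", 0.0)) <= 0.0:
                bad_ids.append(entry.get("id", "<no id>"))
    return bad_ids
```

If the returned list contains every id in the manifest, the duration computation itself is failing, rather than a handful of corrupt files.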

yfyeung (Collaborator, Author) commented Aug 15, 2023

Using ffmpeg-torchaudio to compute the duration of .mp3 files pushes %CPU above 100% (see issue lhotse-speech/lhotse#1026), so we disable ffmpeg-torchaudio when the audio files are mp3.
I don't think this is why the durations in the manifests are all zero.

danpovey (Collaborator) commented

@yfyeung so do you have a theory why the durations might be zero?

npovey commented Aug 18, 2023

The problem is fixed.
I was actually having a related problem in my text_search project: the m4a duration was calculated incorrectly, and @pzelasko advised using PyTorch 2.0 or above (see lhotse-speech/lhotse#1121).
Running

pip3 install torch torchvision torchaudio

fixed this problem as well.

I had:
torch: 1.12.1+cu113
torchaudio: 0.12.1+cu113
lhotse: 1.16.0

Now I have:
python -c "import torch; print(torch.__version__)"
2.0.1+cu117
python -c "import torchaudio; print(torchaudio.__version__)"
2.0.2+cu117
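
The version comparison above can also be checked programmatically before preparing manifests. This is a rough sketch under stated assumptions: the helper names `parse_version` and `meets_minimum` are hypothetical, and the 2.0 threshold is taken from the advice quoted in this thread, not from any documented lhotse requirement.

```python
import re


def parse_version(version):
    """Turn a string like "2.0.1+cu117" into a tuple of ints such as (2, 0, 1)."""
    release = version.split("+", 1)[0]  # drop the local build tag, e.g. "+cu117"
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version, minimum=(2, 0)):
    """Return True if the version string is at least `minimum`."""
    return parse_version(version) >= minimum
```

In practice one would pass `torch.__version__` or `torchaudio.__version__` to `meets_minimum`; with the versions listed in this comment, the old installation fails the check and the new one passes.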

yfyeung closed this on Jan 23, 2024
yfyeung deleted the bengaliai_speech branch on January 23, 2024 05:14