
How much performance can data augmentation improve? #4

Closed
BLack-yzf opened this issue May 27, 2020 · 8 comments

Comments

@BLack-yzf

I used the kaldi/egs/dihard_2018/v2 recipe for front-end processing and extracted MFCC features for training the x-vector model. I then reimplemented the x-vector training code in PyTorch; the x-vector model is exactly the same as yours, and the PLDA and AHC procedures are also unchanged. I used the PyTorch-trained x-vector model to test diarization performance and got the following results:
Without data augmentation, DER = 27.38% on dihard2_dev.
With data augmentation (same as the v2 recipe), DER = 27.60%.
Theoretically, data augmentation should improve performance, but these results are disappointing. I don't know which part caused the problem; I need help.
Thanks sincerely!
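For context on what the x-vector model computes: its fixed-dimensional embedding comes from a statistics-pooling layer that collapses a variable-length sequence of frame-level activations into per-dimension mean and standard deviation. A minimal numpy sketch of that pooling step (the actual models in this discussion are PyTorch TDNNs; `stats_pool` is an illustrative name):

```python
import numpy as np

def stats_pool(frames):
    """Statistics pooling: collapse a (T, D) sequence of frame-level
    activations into a fixed 2*D vector of per-dimension mean and std,
    independent of the number of frames T."""
    mu = frames.mean(axis=0)
    sigma = frames.std(axis=0)
    return np.concatenate([mu, sigma])

# A 200-frame utterance with 24-dim features pools to a 48-dim vector;
# a 500-frame utterance would pool to the same 48 dims.
feats = np.random.randn(200, 24)
emb = stats_pool(feats)
```

This length-invariance is why utterances of any duration map to comparable embeddings at the segment level.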

@BLack-yzf
Author

@manojpamk

@manojpamk
Owner

If I understood your procedure correctly, you have prepared the training data using Kaldi's dihard recipe, and trained xvectors using this repo, right?

The Kaldi repo reports 26.30% DER using supervised calibration (https://github.com/kaldi-asr/kaldi/blob/master/egs/dihard_2018/v2/run.sh), while the PyTorch x-vectors return similar numbers using spectral clustering (https://github.com/manojpamk/pytorch_xvectors/blob/master/README.md). Note that the dihard recipe uses the voxceleb corpora for x-vector training.
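The spectral clustering step mentioned above can be sketched roughly as follows; this is a generic illustration using scikit-learn on a cosine-similarity affinity matrix over segment-level x-vectors, not this repo's exact implementation (which tunes its own clustering), and `cluster_xvectors` is a hypothetical name:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_xvectors(X, n_speakers):
    """Cluster segment-level x-vectors (rows of X) with spectral
    clustering on a cosine-similarity affinity matrix."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    affinity = Xn @ Xn.T            # cosine similarity in [-1, 1]
    affinity = (affinity + 1) / 2   # shift to [0, 1] for a valid affinity
    sc = SpectralClustering(n_clusters=n_speakers,
                            affinity='precomputed', random_state=0)
    return sc.fit_predict(affinity)

# Two well-separated groups of 192-dim embeddings form two clean clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (5, 192)) + 1,
               rng.normal(0, 0.1, (5, 192)) - 1])
labels = cluster_xvectors(X, n_speakers=2)
```

In practice the number of speakers is either known per session or estimated from the eigenvalue gap of the affinity's Laplacian.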

Now as to why the non-augmented model returned a similar DER, I am not sure. It may be that the clean training data is already large enough that augmentation does not show a significant improvement on this task.

Manoj

@BLack-yzf
Author

Thanks for your reply.
Sorry to bother you. I have seen that you achieved better results, so I want to ask you a few questions.
I didn't use your repo to train the x-vector model; I had reproduced the x-vector model before. The model structure is the same as yours, but the training steps differ slightly: I didn't run the 'prepare egs' step. Instead, I trained the x-vector model directly on the MFCC features extracted from the voxceleb corpora, whereas you train on the egs. I think this may be what causes the difference in performance.

Another question: I have seen some results in 'diarize.sh' (https://github.com/manojpamk/pytorch_xvectors/blob/master/egs/diarize.sh). The results on DIHARD2-dev using PLDA are worse than the Kaldi baseline. Is there an issue with computing the PLDA scores?

Yuan

@manojpamk
Owner

Preparing features in the egs format mainly assists training, by ensuring samples have the same duration (i.e., the same number of frames) within a batch. Further, samples in egs files are sub-segments of the utterances themselves, so you can think of them as generating multiple equal-duration examples from the same utterance. Note that both Kaldi and this repo perform CMVN and remove non-speech frames before egs file preparation.
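The chunking idea behind egs preparation can be sketched as follows. This is a simplified illustration (Kaldi's actual egs generation also randomizes chunk offsets and balances examples across speakers); `make_egs` and its parameters are illustrative names:

```python
import numpy as np

def make_egs(feats, chunk_len=200, min_len=200):
    """Cut a (T, D) feature matrix into fixed-length chunks ("egs") so
    that every training example in a batch has the same number of frames.
    Utterances shorter than min_len are skipped; leftover frames at the
    end of an utterance are dropped."""
    T = feats.shape[0]
    if T < min_len:
        return []
    return [feats[i:i + chunk_len]
            for i in range(0, T - chunk_len + 1, chunk_len)]

# A 512-frame utterance yields two 200-frame examples (112 frames dropped).
egs = make_egs(np.random.randn(512, 30))
```

Training directly on whole utterances instead, as described in the previous comment, forces either padding or batch-size-1 training, which changes the effective training distribution.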

All that said, I don't think 27% DER is too bad.

I believe the higher PLDA numbers are due to the AHC threshold not being optimized - I currently set it to 0.
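The threshold-at-0 stopping rule for AHC can be sketched with SciPy's hierarchical clustering. This is a generic illustration, not the repo's exact code: PLDA scores (higher = more similar) are converted to non-negative distances, and the dendrogram is cut where further merges would require a pairwise score below the threshold; `ahc_with_threshold` is a hypothetical name:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def ahc_with_threshold(scores, threshold=0.0):
    """Agglomerative clustering on a symmetric PLDA score matrix.
    Scores are mapped to distances d = max(score) - score, and the
    dendrogram is cut so that merging stops once the (average-linkage)
    score between clusters drops below `threshold`."""
    n = scores.shape[0]
    smax = scores.max()
    iu = np.triu_indices(n, k=1)
    dists = smax - scores[iu]              # condensed, non-negative distances
    Z = linkage(dists, method='average')
    # threshold=0 keeps merging while average PLDA scores stay positive
    return fcluster(Z, t=smax - threshold, criterion='distance')
```

Tuning `threshold` on a development set (rather than fixing it at 0) is the usual way to trade off over- and under-clustering, which is likely why the untuned PLDA numbers look worse than the Kaldi baseline.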

@BLack-yzf
Author

Thanks a lot. I will add the 'egs' preparation to my experiments.
I hope I can ask you more questions later.
Thanks again!

@BLack-yzf
Author

Hi Manoj,
Sorry to bother you. I don't know how to run an evaluation on the AMI dataset. Is there a recipe for it?
Thanks.

@BLack-yzf
Author

@manojpamk

@manojpamk
Owner

Hi Yuan,

Do you already have the AMI corpus downloaded?

  1. For audio, check out the kaldi recipe (https://github.com/kaldi-asr/kaldi/blob/master/egs/ami/s5/run_ihm.sh)
  2. I don't know if the RTTMs are available, but they can be created using the segments and utt2spk files prepared using the kaldi recipe.
  3. To evaluate diarization, use this script (https://github.com/manojpamk/pytorch_xvectors/blob/master/egs/diarize.sh) after setting the wavDir and rttmDir variables appropriately.
  4. To determine the train-dev-eval session splits, check out this paper: https://arxiv.org/pdf/1902.03190.pdf
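The RTTM generation in step 2 can be sketched with a short script, assuming the standard Kaldi file formats (`segments`: `utt-id rec-id start end`; `utt2spk`: `utt-id spk-id`); the function name is illustrative:

```python
def segments_to_rttm(segments_path, utt2spk_path, rttm_path):
    """Convert a Kaldi `segments` file plus `utt2spk` into a
    SPEAKER-typed RTTM file (one line per segment)."""
    utt2spk = {}
    with open(utt2spk_path) as f:
        for line in f:
            utt, spk = line.split()
            utt2spk[utt] = spk
    with open(segments_path) as f, open(rttm_path, 'w') as out:
        for line in f:
            utt, rec, start, end = line.split()
            dur = float(end) - float(start)
            out.write(f"SPEAKER {rec} 1 {float(start):.3f} {dur:.3f} "
                      f"<NA> <NA> {utt2spk[utt]} <NA> <NA>\n")
```

The resulting file can be passed as the reference to the scoring inside `diarize.sh`.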

Manoj
