Question about performance gap between valid set & test set of VB-DMD dataset #13
Hey, thanks for your interest!
Hope that helps!
Thanks for the reply! I retrained the model following your instructions but still get a similar result on the test set (PESQ ~ 2.7). The pre-trained checkpoint you provide does achieve a PESQ of ~ 2.9, so I suspect the default training settings are somehow not optimal on my side. I trained on A40 GPUs, but that shouldn't make such a large difference. Do you have any suggestions for what else I could check? And, if possible, could you also retrain the model with the default settings to confirm that it produces the expected result?
I compared the released code with the code we used for the pre-trained model checkpoint, and there was indeed a mismatch in one hyperparameter. The pre-trained model checkpoint uses [...]. We retrained the model with the updated code on VoiceBank-DEMAND, and the model achieved PESQ: 2.93, ESTOI: 0.86, SI-SDR: 17.4, which is very close to the values reported in the paper. The small deviation could be due to the stochastic nature of the method and the training procedure. We encourage you to pull the updated code and start another training run. Please let us know if it works properly now.
Hello Julius, thank you for the code update! I've re-run the experiment, and this time the evaluation results look good : ).
First of all, thank you very much for providing such high-quality code!
I am currently trying to reproduce the model's results on the VB-DMD dataset, which I downloaded from the link here. For training I used clean_trainset_28spk_wav and noisy_trainset_28spk_wav, from which I held out all 468 files of speaker p286 as my validation set. The command I used for training is as follows:
python train.py --base_dir VB-DMD_dataset/ --accelerator gpu --gpus 2 --batch_size 12 --no_wandb --max_epochs 160
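For anyone reproducing this setup, the speaker hold-out described above can be sketched roughly as follows. This is only a minimal illustration, not the repository's own data preparation: the helper name `split_by_speaker` is hypothetical, and it assumes the VoiceBank naming scheme in which each file name starts with the speaker ID followed by an underscore (e.g. `p286_001.wav`).

```python
def split_by_speaker(filenames, valid_speakers):
    """Partition file names into (train, valid) lists.

    Assumes each name begins with the speaker ID followed by an
    underscore, e.g. "p286_001.wav" (hypothetical helper, not part
    of the repository).
    """
    train, valid = [], []
    for name in sorted(filenames):
        speaker = name.split("_", 1)[0]  # text before the first underscore
        if speaker in valid_speakers:
            valid.append(name)
        else:
            train.append(name)
    return train, valid


# Toy example with made-up file names; a real run would list the wav
# files in the clean training directory instead.
files = ["p226_001.wav", "p286_001.wav", "p286_002.wav", "p287_001.wav"]
train, valid = split_by_speaker(files, {"p286"})
print("train:", train)
print("valid:", valid)
```

The same lists can then be used to move or copy both the clean and the noisy version of each file, so that the two sets stay paired.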
To my surprise, the result on my validation set is very poor according to the TensorBoard log: the PESQ score is about 2.2, and the ESTOI value converges to 0.82. However, when I evaluate the model on the test set, the result is much closer to the paper's: the PESQ score is 2.73 (± 0.55), and the STOI score is 0.86 (± 0.10). Now here are my questions:
Thank you in advance for your time and help!