
Question about performance gap between validation set & test set of VB-DMD dataset #13

Closed
Kuray107 opened this issue Oct 17, 2022 · 4 comments


@Kuray107

First of all, thank you very much for providing such high-quality code!

I am currently trying to reproduce the model's results on the VB-DMD dataset, which I downloaded from the link here. The training set I used is clean & noisy_trainset_28spk_wav, from which I split off all 468 files from speaker p286 as my validation set. The command I used for training is as follows:

python train.py --base_dir VB-DMD_dataset/ --accelerator gpu --gpus 2 --batch_size 12 --no_wandb --max_epochs 160
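
For completeness, this is roughly how I created the split (a minimal sketch; the destination layout of base_dir/{train,valid}/{clean,noisy} is just my assumption of what --base_dir expects, so adjust it if your checkout wants something else):

```python
# Minimal sketch of the speaker-based split described above.
# Assumption: the training script expects base_dir/{train,valid}/{clean,noisy}/*.wav.
import glob
import os
import shutil

base = "VB-DMD_dataset"
valid_speakers = {"p286"}  # held-out speaker(s) for validation

for subset in ("clean", "noisy"):
    src_dir = os.path.join(base, f"{subset}_trainset_28spk_wav")
    for path in glob.glob(os.path.join(src_dir, "*.wav")):
        speaker = os.path.basename(path).split("_")[0]  # e.g. "p286_004.wav" -> "p286"
        split = "valid" if speaker in valid_speakers else "train"
        dst_dir = os.path.join(base, split, subset)
        os.makedirs(dst_dir, exist_ok=True)
        shutil.copy(path, dst_dir)
```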

To my surprise, the result I got on my validation set is quite poor according to the TensorBoard log: the PESQ score is about 2.2, and the ESTOI value converges to 0.82. However, after testing the model on the test set, the result is much closer to the paper's: a PESQ score of 2.73 (±0.55) and a STOI score of 0.86 (±0.10). Here are my questions:

  1. Do you have any clues as to why the model's performance on my validation set is so poor?
  2. The PESQ score I get on the test set is still not ideal compared with the paper's result (2.73 vs. 2.93). I know that the effective batch size in my current setting is 24 instead of 32. Do I need to change other hyperparameters during training to reproduce your result? If so, could you give me a simple command showing how to set them?

Thank you in advance for your time and help!
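
For reference, this is essentially how I compute the metrics above (a minimal sketch using the pesq and pystoi PyPI packages; the file names are placeholders):

```python
# Minimal sketch of the PESQ / (E)STOI evaluation; file names are placeholders.
import soundfile as sf
from pesq import pesq
from pystoi import stoi

clean, fs = sf.read("test/clean/p232_001.wav")    # reference signal
enhanced, _ = sf.read("enhanced/p232_001.wav")    # model output

print("PESQ (wb):", pesq(fs, clean, enhanced, "wb"))       # wide-band PESQ assumes 16 kHz audio
print("ESTOI:", stoi(clean, enhanced, fs, extended=True))  # extended STOI
```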

@julius-richter
Contributor

Hey, thanks for your interest!

  1. This probably depends on the choice of speaker(s) in the validation set. We follow our baseline DiffuSE and chose speakers p226 and p287 (see here).
  2. No, you do not need to change other hyperparameters for training. The command is python train.py --base_dir /data/VoiceBank/ --batch_size 8 --gpus 4. Hyperparameters such as spec_factor, spec_abs_exponent, sigma_max, etc. do not need to be explicitly specified, since the values used in the paper are set as the defaults (see the sketch after this list).
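
In other words, omitting these flags gives you the paper's configuration. Schematically, the pattern looks like this (a sketch only; the flag names match the ones mentioned above, but the numeric values here are placeholders, not the repository's actual defaults):

```python
# Sketch of the pattern: paper values wired in as argparse defaults, so that
# omitting the flags reproduces the paper's setting. Values are placeholders.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--spec_factor", type=float, default=0.15)       # placeholder value
parser.add_argument("--spec_abs_exponent", type=float, default=0.5)  # placeholder value
parser.add_argument("--sigma_max", type=float, default=0.5)          # placeholder value

args = parser.parse_args([])  # no flags passed -> all defaults, i.e. the paper's setting
print(vars(args))
```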

Hope that helps!

@Kuray107
Author

Thanks for the reply! I retrained the model following your instructions but still get a similar result on the test set (PESQ ≈ 2.7). The pre-trained checkpoint you provide does achieve a PESQ of ≈ 2.9, so I suspect the default training setting on my side is not optimal. The GPUs I used for training are A40s, but that shouldn't make such a large difference. Do you have any suggestions for what else to check? And, if possible, would you be willing to retrain the model with the default settings to confirm that it reproduces the correct result?

@julius-richter
Contributor

I compared the released code with the code we used for the pre-trained model checkpoint, and there was indeed a mismatch in one hyperparameter. The pre-trained checkpoint uses centered=True, which should also be the default setting when training SGMSE+. We have updated the code accordingly. Thank you for bringing this issue to our attention and helping us find the bug in the code.
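
To illustrate what the flag changes, here is a minimal sketch of STFT centering via torch.stft's center argument (the FFT size and hop length below are illustrative, not necessarily the values used in the repository):

```python
# Minimal sketch of STFT centering with torch.stft. With center=True each frame
# is centered on its hop index (the signal is reflect-padded by n_fft // 2 on
# both sides); with center=False frames simply start at the hop index.
import torch

x = torch.randn(16000)           # 1 s of dummy audio at 16 kHz
window = torch.hann_window(512)  # illustrative window / FFT size

X_centered = torch.stft(x, n_fft=512, hop_length=128, window=window,
                        center=True, return_complex=True)
X_uncentered = torch.stft(x, n_fft=512, hop_length=128, window=window,
                          center=False, return_complex=True)

print(X_centered.shape, X_uncentered.shape)  # the time dimension differs
```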

We retrained the model with the updated code on VoiceBank-DEMAND, and it achieved PESQ 2.93, ESTOI 0.86, and SI-SDR 17.4 dB, which is very similar to the values reported in the paper. The small deviation could be due to the stochastic nature of the method and the training procedure.

We encourage you to pull the updated code and start another training. Please let us know if it works properly now.

@Kuray107
Author

Kuray107 commented Nov 4, 2022

Hello Julius, thank you for the code update! I've re-run the experiment, and this time the evaluation results are good :).

Kuray107 closed this as completed Nov 4, 2022