
Question about performance gap between validation set & test set of VB-DMD dataset #13

Closed
Kuray107 opened this issue Oct 17, 2022 · 4 comments


@Kuray107

First of all, thank you very much for providing such high-quality code!

I am currently trying to reproduce the model's results on the VB-DMD dataset, which I downloaded from the link here. The training set I used is clean & noisy_trainset_28spk_wav, from which I split off all 468 files from speaker p286 as my validation set. The command I used for training is as follows:

python train.py --base_dir VB-DMD_dataset/ --accelerator gpu --gpus 2 --batch_size 12 --no_wandb --max_epochs 160
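
For completeness, this is roughly how I created the split (a minimal sketch; the destination layout of base_dir/{train,valid}/{clean,noisy} is just my assumption of what --base_dir expects, so adjust it if your checkout wants something else):

```python
# Minimal sketch of the speaker-based split described above.
# Assumption: the training script expects base_dir/{train,valid}/{clean,noisy}/*.wav.
import glob
import os
import shutil

base = "VB-DMD_dataset"
valid_speakers = {"p286"}  # held-out speaker(s) for validation

for subset in ("clean", "noisy"):
    src_dir = os.path.join(base, f"{subset}_trainset_28spk_wav")
    for path in glob.glob(os.path.join(src_dir, "*.wav")):
        speaker = os.path.basename(path).split("_")[0]  # e.g. "p286_004.wav" -> "p286"
        split = "valid" if speaker in valid_speakers else "train"
        dst_dir = os.path.join(base, split, subset)
        os.makedirs(dst_dir, exist_ok=True)
        shutil.copy(path, dst_dir)
```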

To my surprise, the result I got on my validation set is quite poor according to the TensorBoard log: the PESQ score is about 2.2, and the ESTOI value converges to 0.82. However, after testing the model on the test set, the result is much closer to the paper's: a PESQ score of 2.73 (±0.55) and a STOI score of 0.86 (±0.10). Here are my questions:

  1. Do you have any clues as to why the model's performance on my validation set is so poor?
  2. The PESQ score I get on the test set is still not ideal compared with the paper's result (2.73 vs. 2.93). I know that the effective batch size in my current setting is 24 instead of 32. Do I need to change other hyperparameters during training to reproduce your result? If so, could you give me a simple command showing how to set them?

Thank you in advance for your time and help!
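
For reference, this is essentially how I compute the metrics above (a minimal sketch using the pesq and pystoi PyPI packages; the file names are placeholders):

```python
# Minimal sketch of the PESQ / (E)STOI evaluation; file names are placeholders.
import soundfile as sf
from pesq import pesq
from pystoi import stoi

clean, fs = sf.read("test/clean/p232_001.wav")    # reference signal
enhanced, _ = sf.read("enhanced/p232_001.wav")    # model output

print("PESQ (wb):", pesq(fs, clean, enhanced, "wb"))       # wide-band PESQ assumes 16 kHz audio
print("ESTOI:", stoi(clean, enhanced, fs, extended=True))  # extended STOI
```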

@julius-richter
Contributor

Hey, thanks for your interest!

  1. This probably depends on the choice of speaker(s) in the validation set. We follow our baseline DiffuSE and chose speakers p226 and p287 (see here).
  2. No, you do not need to change other hyperparameters for training. The command is python train.py --base_dir /data/VoiceBank/ --batch_size 8 --gpus 4. Hyperparameters such as spec_factor, spec_abs_exponent, sigma_max, etc. do not need to be explicitly specified, since the values used in the paper are set as the defaults (see the sketch after this list).
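
In other words, omitting these flags gives you the paper's configuration. Schematically, the pattern looks like this (a sketch only; the flag names match the ones mentioned above, but the numeric values here are placeholders, not the repository's actual defaults):

```python
# Sketch of the pattern: paper values wired in as argparse defaults, so that
# omitting the flags reproduces the paper's setting. Values are placeholders.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--spec_factor", type=float, default=0.15)       # placeholder value
parser.add_argument("--spec_abs_exponent", type=float, default=0.5)  # placeholder value
parser.add_argument("--sigma_max", type=float, default=0.5)          # placeholder value

args = parser.parse_args([])  # no flags passed -> all defaults, i.e. the paper's setting
print(vars(args))
```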

Hope that helps!

@Kuray107
Author

Thanks for the reply! I retrained the model following your instructions but still get a similar result on the test set (PESQ ≈ 2.7). The pre-trained checkpoint you provide does achieve a PESQ of ≈ 2.9, so I suspect the default training setting on my side is not optimal. The GPUs I used for training are A40s, but that shouldn't make such a large difference. Do you have any suggestions for what else to check? And, if possible, would you be willing to retrain the model with the default settings to confirm that it reproduces the correct result?

@julius-richter
Contributor

I compared the released code with the code we used for the pre-trained model checkpoint, and there was indeed a mismatch in one hyperparameter. The pre-trained checkpoint uses centered=True, which should also be the default setting when training SGMSE+. We have updated the code accordingly. Thank you for bringing this issue to our attention and helping us find the bug in the code.
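
To illustrate what the flag changes, here is a minimal sketch of STFT centering via torch.stft's center argument (the FFT size and hop length below are illustrative, not necessarily the values used in the repository):

```python
# Minimal sketch of STFT centering with torch.stft. With center=True each frame
# is centered on its hop index (the signal is reflect-padded by n_fft // 2 on
# both sides); with center=False frames simply start at the hop index.
import torch

x = torch.randn(16000)           # 1 s of dummy audio at 16 kHz
window = torch.hann_window(512)  # illustrative window / FFT size

X_centered = torch.stft(x, n_fft=512, hop_length=128, window=window,
                        center=True, return_complex=True)
X_uncentered = torch.stft(x, n_fft=512, hop_length=128, window=window,
                          center=False, return_complex=True)

print(X_centered.shape, X_uncentered.shape)  # the time dimension differs
```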

We retrained the model with the updated code on VoiceBank-DEMAND, and it achieved PESQ 2.93, ESTOI 0.86, and SI-SDR 17.4 dB, which is very similar to the values reported in the paper. The small deviation could be due to the stochastic nature of the method and the training procedure.

We encourage you to pull the updated code and start another training. Please let us know if it works properly now.

@Kuray107
Author

Kuray107 commented Nov 4, 2022

Hello Julius, thank you for the code update! I've re-run the experiment, and this time the evaluation results are good :).

Kuray107 closed this as completed Nov 4, 2022