Bugs in generating wsj0-2mix dataset #164

hangtingchen · 2020-07-09T08:40:41Z

🐛 Bug

The wav files should be generated from the wv1 files, not from the wv2 files.

To Reproduce

https://github.com/mpariente/asteroid/blob/0bdec2644f2d770d037ce804b7f70cb98bd5c9fa/egs/wsj0-mix/DeepClustering/local/convert_sphere2wav.sh#L31

The linked line converts both wv1 and wv2 sphere files to wav, and both end up at the same output path. The wv1 output is therefore overwritten by the wv2 output, so the generated wav files come from wv2. The wv1 recordings are clean, while the wv2 recordings are noisy, and the wsj0-mix dataset is expected to be built from wv1.
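A minimal sketch of the collision. It assumes the original line mapped both extensions to `.wav` with a pattern like `s:wv[12]:wav:g`; that pattern is not quoted in this issue, and the function name `map_buggy` and the paths are hypothetical:

```shell
#!/usr/bin/env bash
# ASSUMPTION: the buggy substitution below is illustrative, not the recipe's exact code.
# Paths are hypothetical.
wav_dir=/tmp/wsj0_wav

# Assumed buggy mapping: both .wv1 and .wv2 are renamed to .wav.
map_buggy() {
  echo "$1" | sed "s:wv[12]:wav:g" \
    | awk -v dir="$wav_dir" -F'/' '{printf("%s/%s/%s/%s", dir, $(NF-2), $(NF-1), $NF)}'
}

# The same utterance exists once per microphone channel.
for line in /data/wsj0/si_tr_s/01a/01aa010b.wv1 /data/wsj0/si_tr_s/01a/01aa010b.wv2; do
  echo "$line -> $(map_buggy "$line")"
done
# Both lines print the same target path, so whichever file is converted last
# (in this loop, the .wv2 file) overwrites the other.
```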

The corrected line is:
wav=`echo "$line" | sed "s:wv1:wav:g" | awk -v dir=$wav_dir -F'/' '{printf("%s/%s/%s/%s", dir, $(NF-2), $(NF-1), $NF)}'`
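A sketch of how the corrected line could be used in a conversion loop. It assumes the file list is built with `find` restricted to `*.wv1` and that conversion uses `sph2pipe -f wav`; neither is taken from the recipe, and `map_fixed`, `wsj0_dir`, and the paths are hypothetical. It only prints the commands (a dry run), so it can be tried without WSJ0 or sph2pipe installed:

```shell
#!/usr/bin/env bash
# Hypothetical locations; adjust to your setup.
wsj0_dir=${wsj0_dir:-/data/wsj0}
wav_dir=${wav_dir:-/tmp/wsj0_wav}

# Corrected mapping from the issue: only the .wv1 extension is replaced.
map_fixed() {
  echo "$1" | sed "s:wv1:wav:g" \
    | awk -v dir="$wav_dir" -F'/' '{printf("%s/%s/%s/%s", dir, $(NF-2), $(NF-1), $NF)}'
}

# List only primary-microphone (.wv1) files, so .wv2 files never reach the loop.
# If $wsj0_dir does not exist, find's error is discarded and the loop does nothing.
find "$wsj0_dir" -name '*.wv1' 2>/dev/null | while read -r line; do
  wav=$(map_fixed "$line")
  # Dry run: print the commands instead of executing them.
  echo "mkdir -p $(dirname "$wav") && sph2pipe -f wav $line $wav"
done
```

Filtering the list up front is a design choice: the corrected `sed` alone leaves `.wv2` paths unchanged, so restricting the input to `.wv1` keeps wv2 audio out of the output directory entirely.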

Expected behavior

We tested datasets generated from wv1 and from wv2. The wv1 dataset reproduces the reported results, while the wv2 dataset is about 1-2 dB worse in SI-SNR.

With wv1, our final validation loss was about 2950.

Unfortunately, our wv2 results were deleted; the final validation loss there was about 3500.

Environment

Asteroid-master
PyTorch 1.4.0
PyTorchLightning 7.6.1


mpariente · 2020-07-09T11:53:03Z

I think this is right; it has been reported elsewhere as well.
I think I only had wv1 files when I first generated wsj0, so I didn't notice the problem.
I've checked now, and you're right.
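Whether a local copy is affected depends on whether it contains `.wv2` files at all. A quick check, as a sketch; the WSJ0 root path and the helper name `count_ext` are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical WSJ0 root; adjust to your setup.
wsj0_dir=${wsj0_dir:-/data/wsj0}

# Count files with a given extension (find's errors are discarded if the path is missing).
count_ext() {
  find "$wsj0_dir" -type f -name "*.$1" 2>/dev/null | wc -l | tr -d ' '
}

n_wv1=$(count_ext wv1)
n_wv2=$(count_ext wv2)
echo "wv1 files: $n_wv1, wv2 files: $n_wv2"
if [ "$n_wv2" -gt 0 ]; then
  echo "wv2 files present: make sure only .wv1 files are converted."
fi
```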

Would you like to submit a PR for that, please?
Thanks again!

By the way, I'm very happy you can finally reproduce the results.

mpariente · 2020-07-10T07:23:30Z

Closed by #166

hangtingchen added the labels bug (Something isn't working) and help wanted (Extra attention is needed) on Jul 9, 2020

mpariente mentioned this issue on Jul 9, 2020:

fix bugs in generating wsj0-mix dataset with wv1 && loading model acc… #166 (Merged)

mpariente closed this as completed Jul 10, 2020

mpariente mentioned this issue on Aug 7, 2020:

WHAM ConvTasnet recipe 4.5dB worse than reported numbers #188 (Closed)
