
Can you get the initial mean SDR on LibriSpeech using Google's test list? #20

Open
weedwind opened this issue Sep 29, 2019 · 8 comments


@weedwind

Hi, seungwonpark,

I was trying to use Google's posted test list for LibriSpeech to reproduce their results, but I cannot even get their initial mean SDR (10.1 dB in their paper); I got only 1.5 dB. Have you tried their list and gotten around 10.1 dB mean SDR before applying VoiceFilter?

Thank you so much.

@seungwonpark
Contributor

Hi, @weedwind
Thank you for letting me know! Yes, I was aware of that test list, but I haven't tried to measure the actual performance with it.

Considering the following, I think the experimental result (1.5 dB, which turned out to be far worse than Google's) is not actually wrong:

  • my d-vector system shows larger EER than Google’s (due to lack of training time + data)
  • I didn’t use the correct loss function: see Need to try power-law compression loss #14 (this might be the main cause of the blurry spectrogram mask at high frequencies, according to my personal conversation with Quan Wang at Interspeech 2019)

Shall we leave this issue open, since it is a somewhat critical one? Thanks a lot!

@seungwonpark
Contributor

TL;DR (answering the title of this issue):
No, I haven’t tried yet but I don’t think I can.

@weedwind
Author

Hi, @seungwonpark

Thank you for your reply. I mean the SDR before applying VoiceFilter, not after. In Table 4 of their paper, this is the mean SDR in the first row, which is 10.1 dB, but I only got 1.5 dB. I used the same bss_eval Python function as you did; I just fed it the clean target utterance and the mixed utterance to compute the SDR before applying VoiceFilter. Do you have a clue why this SDR is so low?
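For context, here is a minimal sketch of the measurement being described, assuming the projection-based SDR definition that BSS Eval is built on. The signals below are synthetic white-noise placeholders, not LibriSpeech audio, and `sdr_db` is an illustrative helper, not the repo's bss_eval function:

```python
import math
import random

def sdr_db(reference, estimate):
    """Projection-based SDR: project the estimate onto the reference,
    treat the residual as distortion, and report the energy ratio in dB."""
    dot_re = sum(r * e for r, e in zip(reference, estimate))
    dot_rr = sum(r * r for r in reference)
    alpha = dot_re / dot_rr  # scale of the reference contained in the estimate
    distortion = sum((e - alpha * r) ** 2 for r, e in zip(reference, estimate))
    return 10 * math.log10(alpha * alpha * dot_rr / distortion)

rng = random.Random(0)
clean = [rng.gauss(0, 1) for _ in range(48000)]            # stand-in target speech
interference = [rng.gauss(0, 0.3) for _ in range(48000)]   # quieter interferer
mixture = [c + i for c, i in zip(clean, interference)]

# SDR of the raw mixture against the clean target ("before VoiceFilter").
# With interference at 0.3x the target's amplitude, this lands near 10 dB.
print(round(sdr_db(clean, mixture), 1))
```

With a quiet interferer the pre-separation SDR is already high, which is one way a mean of 10.1 dB can arise; an equal-volume interferer would drive it toward 0 dB instead.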

@seungwonpark
Contributor

Oh, it looks like I misunderstood your question. Sorry about that.
10.1 dB is a relatively high SDR for the mixed audio to have. The VoiceFilter authors mentioned that the SDR before VoiceFilter came out high because silent parts of utterances get sampled and mixed. (Note that fixed-length audio segments are sampled here.)
But I'm not sure why you're not getting 10.1 dB. Perhaps we should review the preprocessing code and the SDR calculation in bss_eval.

@weedwind
Author

I noticed that your code uses the first 3 seconds and throws away the rest. I did not use a fixed length: I used the entire length of the target clean signal, and truncated or zero-padded the interference signal to the same length. Then I computed the SDR. Did you ever compute the mean SDR for your test set?
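The length-matching step described above can be sketched like this (`match_length` is an illustrative helper name, not code from the repo):

```python
def match_length(interference, target_len):
    """Truncate the interference, or zero-pad it at the end,
    so it matches the target signal's length."""
    if len(interference) >= target_len:
        return interference[:target_len]
    return interference + [0.0] * (target_len - len(interference))

target = [0.1] * 48000       # full-length clean target (placeholder samples)
shorter = [0.2] * 30000      # interferer shorter than the target
longer = [0.3] * 60000       # interferer longer than the target

mixture_a = [t + i for t, i in zip(target, match_length(shorter, len(target)))]
mixture_b = [t + i for t, i in zip(target, match_length(longer, len(target)))]
print(len(mixture_a), len(mixture_b))   # 48000 48000
```

Note the contrast with the repo's generator: here the target is never cut, so quiet or zero-padded stretches of interference can raise the measured SDR.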

@weedwind
Author

Hi, I read your generator again. In your code, both w1 and w2 need to be at least 3 seconds long. Then you take the first 3 seconds of each and add them, so the resulting target utterance is fully overlapped by the interfering utterance. Since they have the same volume, the SDR should be nearly 0 dB in this case. Why did you get a median SDR of 1.9 dB?
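That expectation can be sanity-checked with a toy mixture (synthetic white noise standing in for speech). When the error of the mixture relative to the clean target is exactly the interference, the SDR reduces to the target-to-interference energy ratio, which is 1:1 for equal volumes:

```python
import math
import random

rng = random.Random(1)
n = 48000                                       # 3 s at 16 kHz
target = [rng.gauss(0, 1) for _ in range(n)]
interf = [rng.gauss(0, 1) for _ in range(n)]    # same volume as the target

# In the mixture target + interf, the distortion relative to the clean
# target is interf itself, so the SDR is just an energy ratio in dB.
energy = lambda x: sum(v * v for v in x)
sdr = 10 * math.log10(energy(target) / energy(interf))
print(round(sdr, 2))   # very close to 0 dB
```

This supports the point above: with fully overlapped, equal-volume 3-second segments, a pre-separation SDR well above 0 dB would be surprising.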

@seungwonpark
Contributor

Did you ever compute the mean SDR for your test set?

Not yet.

Why did you get a median SDR of 1.9 dB?

Actually, the value 1.9 dB was not calculated over all datasets; it came from a single dataset. I should fix the table in the README accordingly.

@matnshrn

@weedwind I'm getting the same result as you (1.5 dB SDR over the Google LibriSpeech test list). Have you managed to solve this problem?
