Abnormal separated wavs #250
Comments
What model are you training, and what are the hyperparameters? Can you upload a few sound samples?
My guess is that the output amplitude is unconstrained and goes outside the -1/+1 range. WAV files clip samples beyond those values (and soundfile doesn't correct that, rightfully IMO). So you should rescale your audio outputs as done in
That information would also help, indeed.
The amplitude is unconstrained; this is a flaw of the SI-SNR loss. The values are still float32 though. Also, @jonashaag probably meant to upload "sound samples" that we can listen to 😉
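To illustrate why the loss leaves the amplitude unconstrained, here is a minimal NumPy sketch of a common SI-SNR formulation (not necessarily the exact implementation used in this project; the function name `si_snr` and the `eps` value are assumptions for illustration). Scaling the estimate by any non-zero factor leaves the score essentially unchanged, so nothing in training pins down the output level.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB for two 1-D signals (illustrative sketch)."""
    # Remove the mean so the measure ignores DC offsets.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the "signal" part.
    s_target = (np.dot(estimate, target) / (np.dot(target, target) + eps)) * target
    # Everything orthogonal to the target counts as noise.
    e_noise = estimate - s_target
    return 10 * np.log10((np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps))

rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
estimate = target + 0.1 * rng.standard_normal(16000)

base = si_snr(estimate, target)
scaled = si_snr(50.0 * estimate, target)  # 50x louder, far outside [-1, 1]
print(base, scaled)  # the two values agree up to numerical precision
```

Because both the projected signal and the residual scale by the same factor, the ratio is unaffected, which is consistent with the loss curves looking fine while the written audio clips.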
By the way, are you integrating the audio samples into tensorboard?
No, I open it in CoolEdit.
Thank you so much!!! @mpariente @jonashaag
I don't think it learns this scale. The scale will be different for each training run.
Yes, I rephrased it to avoid being misleading. So it's better to normalize before writing the waveforms.
Have a look at the eval.py file to see how we do it.
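For readers without the repository at hand, the following is a minimal sketch of peak normalization before writing, under the assumption that the estimate is a 1-D float array. It is not a copy of what eval.py does; the helper name `rescale_to_peak` and the `peak=0.9` headroom are hypothetical choices for illustration.

```python
import numpy as np

def rescale_to_peak(estimate, peak=0.9, eps=1e-8):
    """Scale a waveform so its largest absolute sample equals `peak` (hypothetical helper)."""
    estimate = np.asarray(estimate, dtype=np.float32)
    # Divide by the current peak so the result stays inside [-1, 1].
    return peak * estimate / (np.max(np.abs(estimate)) + eps)

# Example with an out-of-range estimate.
raw = np.array([0.5, -3.0, 2.0])
safe = rescale_to_peak(raw)

# Writing would then look like this (requires the soundfile package):
# import soundfile as sf
# sf.write("1.wav", safe, 16000)
```

Since the separation loss discussed in this thread ignores scale, rescaling this way should not change the measured separation quality; it only keeps the written file from clipping.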
@mpariente Oh thanks, I was using the older scripts; I found it in the current version. Looks like I should pay more attention to the updates. Thanks again!!
Hi everyone, thanks first for the remarkable program, it's great! Thanks for your efforts.
(1) When I listen to the generated audio, it's messy. I think soundfile.write writes the data directly as float32; after I changed the call to
sf.write('1.wav', estimate.astype(np.int16), 16000)
it gets back to normal.
(2) Another question: I found that the longer I trained, the worse the listening quality became. It's weird, because the training loss and development loss curves both look good.
I listened to the audio and found that the separated signals during the first 20 epochs are good and heading in the right direction. After that, the amplitude of the speech changes dramatically, which often generates a swath. I also see this on the training set; again, it's abnormal that the loss is being optimized well at the same time.
Can anybody help with it? I will check it deeper though. Thanks!!!