Abnormal separated wavs #250

Closed
staplesinLA opened this issue Aug 31, 2020 · 11 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@staplesinLA

Hi everyone, thanks first of all for the remarkable program, it's great! Thanks for your efforts.
(1) When I listen to the generated audio, it's messy. I think soundfile.write writes the data directly as float32; after I change it to sf.write('1.wav', estimate.astype(np.int16), 16000), it gets back to normal.

(2) Another question: I found that the longer I trained, the worse the listening quality got. It's weird, because the training-loss and development-loss curves both look good.
Listening to the audio, I find that the separated outputs during the first 20 epochs are good and heading in the right direction. After that, the amplitude of the speech changes dramatically, which often produces swaths of distortion. I also see this on the training set, and again it's odd that the loss keeps improving at the same time.

Can anybody help with this? I will dig into it more deeply myself, though. Thanks!!!

Environment

  • Asteroid Version: 0.3.0
  • PyTorch Version: 1.6
  • Recipe: LibriMix (2mix), Task: sep_noisy
@staplesinLA added the bug and help wanted labels Aug 31, 2020
@jonashaag
Collaborator

What model are you training and what are the hyper params? Can you upload a few sound samples?

@mpariente
Collaborator

My guess is that the output amplitude is unconstrained and goes out of the -1/+1 range. WAV files are clipped above those values (and soundfile doesn't correct that, rightfully IMO). So you should rescale your audio outputs as done in eval.py, for example: non-intrusive rescaling to match the amplitude of the mixture.
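A minimal sketch of that kind of non-intrusive rescaling (the function name, the RMS-based statistic, and the file name are illustrative choices, not necessarily what eval.py does):

```python
import numpy as np
import soundfile as sf

def rescale_to_mixture(estimate, mixture, eps=1e-8):
    """Give the estimate the same RMS energy as the mixture it was separated from."""
    est_rms = np.sqrt(np.mean(estimate ** 2)) + eps
    mix_rms = np.sqrt(np.mean(mixture ** 2)) + eps
    return estimate * (mix_rms / est_rms)

# Stand-in signals: a mixture in the usual -1..1 range and an estimate with a wild scale.
mixture = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
estimate = 3000.0 * mixture
sf.write("est_rescaled.wav", rescale_to_mixture(estimate, mixture), 16000)
```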

What model are you training and what are the hyper params? Can you upload a few sound samples?

This info would also help, indeed.

@staplesinLA
Author

What model are you training and what are the hyper params? Can you upload a few sound samples?

Thanks for helping!! I'm training Conv-TasNet on 16 kHz data.
I tested an audio sample from the training set to look at the quantization; it's shown below:
[Screenshot: waveform values of the sources (in -1~1) vs. the estimate (at int16 scale)]

The source data are all scaled to -1~1, but the estimates seem to come out at int16-scale values. I don't know why; maybe it's because of the SI-SNR loss?

@mpariente
Collaborator

and the estimation seems to prefer int16 values

The amplitude is unconstrained; this is a flaw of the SI-SNR loss. The values are still float32, though.
See the comment above for how to solve this issue.
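A self-contained sketch (not Asteroid's implementation) of why SI-SNR leaves the output scale free: multiplying the estimate by any constant leaves the loss value essentially unchanged.

```python
import torch

def si_snr(est, target, eps=1e-8):
    """Scale-invariant SNR for 1-D signals."""
    est, target = est - est.mean(), target - target.mean()
    # Project the estimate onto the target; the projection absorbs any global scale.
    s_target = (torch.dot(est, target) / (torch.dot(target, target) + eps)) * target
    e_noise = est - s_target
    return 10 * torch.log10(torch.dot(s_target, s_target) / (torch.dot(e_noise, e_noise) + eps))

target = torch.randn(16000)
est = target + 0.1 * torch.randn(16000)
print(si_snr(est, target), si_snr(3000.0 * est, target))  # essentially identical values
```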

Also, @jonashaag probably meant to upload "sound samples" that we can listen to 😉

@mpariente
Collaborator

By the way, are you integrating the audio samples into tensorboard?

@staplesinLA
Author

staplesinLA commented Aug 31, 2020

By the way, are you integrating the audio samples into tensorboard?

No, I open them in CoolEdit.
I followed your suggestion and manually divided the outputs by 32768 to bring them into -1~1, then wrote them with soundfile, and they sound normal again.
So it was my mistake to convert them to int16 in the first place, right? Even though they sounded fine during the first 20 epochs.
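A rough sketch of the write paths described above (the array contents and file names are placeholders); the point is that soundfile expects float data in the -1..1 range, and the default WAV subtype is 16-bit PCM:

```python
import numpy as np
import soundfile as sf

# Stand-in for a float32 network output whose values sit at int16 scale.
estimate = np.random.uniform(-30000, 30000, 16000).astype(np.float32)

scaled = estimate / 32768.0                       # bring the values back into roughly -1..1
sf.write("est_pcm16.wav", scaled, 16000)          # default WAV subtype: 16-bit PCM
sf.write("est_float32.wav", scaled, 16000, subtype="FLOAT")  # or keep full float precision
```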

@staplesinLA
Author

Thank you so much!!! @mpariente @jonashaag
I'll close it for now and try to do a complete review.

@mpariente
Collaborator

I don't think it learns this scale. The scale will be different for each training.

@staplesinLA
Author

staplesinLA commented Aug 31, 2020

I don't think it learns this scale. The scale will be different for each training.

Yes, I've rephrased it to avoid being misleading. So it's better to normalize before writing the waveforms.

@mpariente
Collaborator

Have a look at the eval.py file to see how we do it.
We normalize the estimates to have the same amplitude as the mixture. It's not the best non-intrusive guess we could make, but it's better than -1/+1 normalization.

@staplesinLA
Author

@mpariente Oh thanks, I was using the older scripts; I found it in the current version. Looks like I should pay more attention to the updates. Thanks again!!

mpariente added a commit that referenced this issue Sep 25, 2020
mpariente added a commit that referenced this issue Sep 25, 2020