Abnormal separated wavs #250

Closed
staplesinLA opened this issue Aug 31, 2020 · 11 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@staplesinLA

Hi everyone, thanks first of all for the remarkable program, it's great! Thanks for your efforts.
(1) When I listen to the generated audio, it's messy. I think soundfile.write writes the data directly as float32; after I change it to sf.write('1.wav', estimate.astype(np.int16), 16000), it gets back to normal.

(2) Another question: I found that the longer I trained, the worse the listening quality got. It's weird, because the training-loss and development-loss curves both look good.
Listening to the audio, I find that the separated outputs during the first 20 epochs are good and heading in the right direction. After that, the amplitude of the speech changes dramatically, which often produces swaths of distortion. I also see this on the training set, and again it's odd that the loss keeps improving at the same time.

Can anybody help with this? I will dig into it more deeply myself, though. Thanks!!!

Environment

  • Asteroid Version: 0.3.0
  • PyTorch Version: 1.6
  • Recipe: LibriMix (2mix), Task: sep_noisy
@staplesinLA added the bug and help wanted labels Aug 31, 2020
@jonashaag
Collaborator

What model are you training and what are the hyper params? Can you upload a few sound samples?

@mpariente
Collaborator

My guess is that the output amplitude is unconstrained and goes out of the -1/+1 range. WAV files are clipped above those values (and soundfile doesn't correct that, rightfully IMO). So you should rescale your audio outputs as done in eval.py, for example: non-intrusive rescaling to match the amplitude of the mixture.
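A minimal sketch of that kind of non-intrusive rescaling (the function name, the RMS-based statistic, and the file name are illustrative choices, not necessarily what eval.py does):

```python
import numpy as np
import soundfile as sf

def rescale_to_mixture(estimate, mixture, eps=1e-8):
    """Give the estimate the same RMS energy as the mixture it was separated from."""
    est_rms = np.sqrt(np.mean(estimate ** 2)) + eps
    mix_rms = np.sqrt(np.mean(mixture ** 2)) + eps
    return estimate * (mix_rms / est_rms)

# Stand-in signals: a mixture in the usual -1..1 range and an estimate with a wild scale.
mixture = 0.5 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
estimate = 3000.0 * mixture
sf.write("est_rescaled.wav", rescale_to_mixture(estimate, mixture), 16000)
```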

What model are you training and what are the hyper params? Can you upload a few sound samples?

This info would also help, indeed.

@staplesinLA
Author

What model are you training and what are the hyper params? Can you upload a few sound samples?

Thanks for helping!! I'm training Conv-TasNet on 16 kHz data.
I tested an audio sample from the training set to look at the quantization; it's shown below:
[Screenshot: waveform values of the sources (in -1~1) vs. the estimate (at int16 scale)]

The source data are all scaled to -1~1, but the estimates seem to come out at int16-scale values. I don't know why; maybe it's because of the SI-SNR loss?

@mpariente
Collaborator

and the estimation seems to prefer int16 values

The amplitude is unconstrained; this is a flaw of the SI-SNR loss. The values are still float32, though.
See the comment above for how to solve this issue.
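A self-contained sketch (not Asteroid's implementation) of why SI-SNR leaves the output scale free: multiplying the estimate by any constant leaves the loss value essentially unchanged.

```python
import torch

def si_snr(est, target, eps=1e-8):
    """Scale-invariant SNR for 1-D signals."""
    est, target = est - est.mean(), target - target.mean()
    # Project the estimate onto the target; the projection absorbs any global scale.
    s_target = (torch.dot(est, target) / (torch.dot(target, target) + eps)) * target
    e_noise = est - s_target
    return 10 * torch.log10(torch.dot(s_target, s_target) / (torch.dot(e_noise, e_noise) + eps))

target = torch.randn(16000)
est = target + 0.1 * torch.randn(16000)
print(si_snr(est, target), si_snr(3000.0 * est, target))  # essentially identical values
```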

Also, @jonashaag probably meant to upload "sound samples" that we can listen to 😉

@mpariente
Collaborator

By the way, are you integrating the audio samples into tensorboard?

@staplesinLA
Author

staplesinLA commented Aug 31, 2020

By the way, are you integrating the audio samples into tensorboard?

No, I open them in CoolEdit.
I followed your suggestion and manually divided the outputs by 32768 to bring them into -1~1, then wrote them with soundfile, and they sound normal again.
So it was my mistake to convert them to int16 in the first place, right? Even though they sounded fine during the first 20 epochs.
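A rough sketch of the write paths described above (the array contents and file names are placeholders); the point is that soundfile expects float data in the -1..1 range, and the default WAV subtype is 16-bit PCM:

```python
import numpy as np
import soundfile as sf

# Stand-in for a float32 network output whose values sit at int16 scale.
estimate = np.random.uniform(-30000, 30000, 16000).astype(np.float32)

scaled = estimate / 32768.0                       # bring the values back into roughly -1..1
sf.write("est_pcm16.wav", scaled, 16000)          # default WAV subtype: 16-bit PCM
sf.write("est_float32.wav", scaled, 16000, subtype="FLOAT")  # or keep full float precision
```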

@staplesinLA
Author

Thank you so much!!! @mpariente @jonashaag
I'll close it for now and try to do a complete review.

@mpariente
Collaborator

I don't think it learns this scale. The scale will be different for each training.

@staplesinLA
Author

staplesinLA commented Aug 31, 2020

I don't think it learns this scale. The scale will be different for each training.

Yes, I've rephrased it to avoid being misleading. So it's better to normalize before writing the waveforms.

@mpariente
Collaborator

Have a look at the eval.py file to see how we do it.
We normalize the estimates to have the same amplitude as the mixture. It's not the best non-intrusive guess we could make, but it's better than -1/+1 normalization.

@staplesinLA
Author

@mpariente Oh thanks, I was using the older scripts; I found it in the current version. Looks like I should pay more attention to the updates. Thanks again!!

mpariente added a commit that referenced this issue Sep 25, 2020
mpariente added a commit that referenced this issue Sep 25, 2020