
Source Separation Integration: sum(sources + background_noise) != mixture with mels. #38

Closed
popcornell opened this issue Jun 28, 2020 · 5 comments


@popcornell
Contributor

I am experimenting a bit with Lhotse integration in Asteroid here:
https://github.com/mpariente/asteroid/blob/lhotse_integration_test/egs/MiniLibriMix/lhotse/

One thing I noticed (unless I did something completely wrong) is that the sum of the source features plus the background-noise features is different from the mixture features:
https://github.com/mpariente/asteroid/blob/lhotse_integration_test/egs/MiniLibriMix/lhotse/test_additive.py
This could be a problem when training a separation model, since the underlying assumption is basically that the process is additive.
I guess this is due to the fact that the feature computation via torchaudio.compliance.kaldi.fbank must involve some non-linear operations (aside from the log operation, of course!).
I guess so because dithering is disabled by default (see pytorch/audio#371).
Does anyone have a clue why this happens? The difference seems too substantial (first decimal digit) to be ascribed to truncation etc.
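For what it's worth, even before the log there is a cross term: the power spectrum of a sum of signals is not the sum of their power spectra, so mel energies computed from the mixture waveform need not match the summed per-source mel energies. A minimal NumPy sketch with synthetic white-noise "sources" (not the actual MiniLibriMix data):

```python
import numpy as np

# Synthetic "sources": white noise, just to illustrate the algebra.
rng = np.random.default_rng(0)
s1 = rng.standard_normal(512)
s2 = rng.standard_normal(512)

X1 = np.fft.rfft(s1)
X2 = np.fft.rfft(s2)

# |X1 + X2|^2 = |X1|^2 + |X2|^2 + 2 * Re(X1 * conj(X2))
P_mix = np.abs(X1 + X2) ** 2
P_sum = np.abs(X1) ** 2 + np.abs(X2) ** 2
cross = 2 * np.real(X1 * np.conj(X2))

# The cross term is exactly the mismatch, and it is not small.
print(np.allclose(P_mix, P_sum + cross))  # True
print(np.max(np.abs(cross)) > 1.0)        # True
```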

BTW, the problem is easily side-stepped by summing the source and noise mels at training time to get the mixture. It is inexpensive, and you also save disk space by not dumping the mixture feats.
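For log-mel features, the on-the-fly sum can be done stably with the log-sum-exp trick. A sketch, assuming features stacked as `(num_sources, frames, bins)`; the function name and shapes here are made up:

```python
import numpy as np

def mix_log_mels(log_mels):
    """Combine per-source log-mel features into a mixture by summing
    in the linear domain, using log-sum-exp for numerical stability.

    log_mels: array of shape (num_sources, frames, mel_bins).
    """
    m = np.max(log_mels, axis=0)
    return m + np.log(np.sum(np.exp(log_mels - m), axis=0))

# Toy example: two sources plus noise, random log-mel feats.
rng = np.random.default_rng(0)
feats = rng.standard_normal((3, 100, 80))
mix = mix_log_mels(feats)

# Equivalent (but less stable) direct form:
direct = np.log(np.sum(np.exp(feats), axis=0))
print(np.allclose(mix, direct))  # True
```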

@danpovey
Collaborator

Can you please show an example? Wonder how often it's that different; which mel bin is most different; etc.

@popcornell
Contributor Author

I don't know how useful it can be, but here are some plots for now (I can also compute some stats on the distribution of the difference). There is a difference of over 3.9 for one bin, which is very strange.

Loaded mixture feats:
[plot: c_mix]
On-the-fly np.log(np.sum(np.exp(c_sources), 0) + np.exp(c_noise)):
[plot: onthefly]
Absolute difference between the two:
[plot: difference]

@danpovey
Collaborator

danpovey commented Jun 28, 2020 via email

@popcornell
Contributor Author

Thank you very much.
I'll try to train two separation systems in the feature domain (when I have some spare GPUs): one with on-the-fly mixing as above and one without, and see what happens. In the past I have always mixed the features on the fly and had decent results.

My main concern is that it is sort of like using "noisy labels" for separation.
And because the separation is done on mels (and not on log-mels), those differences can be even more substantial, and it could be difficult for the DNN to learn a mask for each speaker with that amount of "noise" in the oracle targets.
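To make the "noisy oracle targets" concern concrete, here is a toy NumPy sketch of ratio masks in the linear mel domain (all numbers synthetic; `mel_s1`, `mel_s2`, and the perturbed stored mixture are hypothetical stand-ins for real features):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two sources' linear-domain mel energies (frames x bins), synthetic.
mel_s1 = rng.random((100, 80)) + 1e-3
mel_s2 = rng.random((100, 80)) + 1e-3

# Oracle ratio masks against a *consistent* mixture
# (mixture == sum of sources): the masks sum to exactly 1.
mixture = mel_s1 + mel_s2
masks = np.stack([mel_s1, mel_s2]) / mixture
print(np.allclose(masks.sum(axis=0), 1.0))  # True

# Against an inconsistent, separately-computed mixture (simulated here
# by a 10% multiplicative perturbation), the same "oracle" masks no
# longer sum to 1 -- the targets are effectively noisy.
mixture_stored = mixture * (1 + 0.1 * rng.standard_normal(mixture.shape))
masks_noisy = np.stack([mel_s1, mel_s2]) / mixture_stored
print(np.allclose(masks_noisy.sum(axis=0), 1.0))  # False
```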

@pzelasko
Collaborator

pzelasko commented Nov 4, 2020

I'm closing this as it seems stale; if there are any new developments, be sure to let us know!

@pzelasko pzelasko closed this as completed Nov 4, 2020