Provide trained model in higher resolution #4

FSharpCSharp · 2020-02-26T09:49:12Z

I have now carried out extensive tests with the model. Unfortunately, I found that the output signal is always cut off at 22050 Hz, although the output signal would theoretically have a resolution of 32000 Hz. This means the signal does not cover the full range it could have.

Is this due to the trained model or to some additional settings? I followed the Python notebook as described and double-checked everything. Unfortunately, because the output is limited to 22050 Hz, the quality is not as brilliant as it could be. Here is a short explanation.

davda54 · 2020-03-11T16:57:48Z

Hi, you're right, there's a clear cut-off after 10 kHz, but we're unsure about its cause. It seems to be an internal property of the neural network. Please let me know if you catch the bug :)
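One way to pin down where the cut-off actually sits is to measure the highest frequency that still carries noticeable energy in an output file. The sketch below is only an illustration, not code from this repository: `magnitude_spectrum`, `estimate_cutoff_hz`, the -60 dB threshold, and the synthetic 44100 Hz test signal are all assumptions chosen for the example. It uses a naive DFT so that it runs with the standard library alone; for real tracks an FFT library would be the practical choice.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (only practical for short frames)."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc) / n)
    return mags


def estimate_cutoff_hz(samples, sample_rate, threshold_db=-60.0):
    """Return the highest frequency whose magnitude is within threshold_db of the peak."""
    mags = magnitude_spectrum(samples)
    peak = max(mags)
    if peak == 0.0:
        return 0.0  # silent input: no energy at any frequency
    floor = peak * 10 ** (threshold_db / 20)
    highest_bin = max(k for k, m in enumerate(mags) if m >= floor)
    return highest_bin * sample_rate / len(samples)


# Synthetic stand-in for a model output: two tones at 1 kHz and 10 kHz.
# 441 samples at 44100 Hz gives 100 Hz bins, so both tones fall exactly on a bin.
SAMPLE_RATE = 44100
signal = [
    math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 10000 * t / SAMPLE_RATE)
    for t in range(441)
]
print(estimate_cutoff_hz(signal, SAMPLE_RATE))
```

Running the same measurement on the input mix and on the separated output would show whether the cut-off is already present in the data or is introduced by the network.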

RadioAngurem · 2020-03-18T11:24:05Z

I downloaded the MusDB tracks and checked: the sum of the stems is not equal to the stereo mix. Moreover, the difference between the stereo mix and the sum of the stems is too big to ignore. You can identify the song just by listening to that difference, so I believe that computing the weights of the TCN mask using MusDB instead of MusDB HQ adds an error to the model.
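The check described above can be sketched roughly as follows. This is a hypothetical helper, not code from the repository or from the MusDB tooling: `residual_db` and the toy sample values are assumptions, and loading the actual audio files is left out. It reports how much energy remains after subtracting the summed stems from the mix, relative to the mix itself.

```python
import math


def residual_db(mix, stems):
    """Energy of (mix - sum of stems) relative to the mix, in dB.

    mix   -- list of samples for one channel of the stereo mix
    stems -- list of sample lists, one per stem, same length as mix
    """
    summed = [sum(values) for values in zip(*stems)]
    residual = [m - s for m, s in zip(mix, summed)]
    mix_energy = sum(m * m for m in mix)
    residual_energy = sum(r * r for r in residual)
    if mix_energy == 0.0:
        raise ValueError("mix is silent, relative residual is undefined")
    if residual_energy == 0.0:
        return -math.inf  # stems add up to the mix exactly
    return 10 * math.log10(residual_energy / mix_energy)


# Toy data: two stems that add up exactly, and a mix that is 10% louder.
vocals = [0.5, -0.25, 0.125]
drums = [0.25, 0.25, -0.5]
exact_mix = [v + d for v, d in zip(vocals, drums)]
louder_mix = [1.1 * m for m in exact_mix]

print(residual_db(exact_mix, [vocals, drums]))
print(residual_db(louder_mix, [vocals, drums]))
```

A residual far below the mix level would suggest the stems and the mix are consistent; a residual in which the song is still recognisable, as reported above, would show up as a value much closer to 0 dB.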

Also, why not try adding another step to the network? MusDB cuts the frequencies above 16 kHz, so the model is not trained to work with audio that contains information above that frequency. Could a network with these parameters work?

S = 10; 1, T/2, sr = 6000 Hz
S = 20; 1, T, sr = 12000 Hz
S = 40; 1, 2T, sr = 24000 Hz
S = 80; 1, 4T, sr = 48000 Hz
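To see which of these stages could represent content above 16 kHz at all, the arithmetic can be sketched as below. The (S, sr) pairs are copied from the list above; the helper name `describe_stages` is hypothetical, and reading S as a length in samples (so that S / sr is a duration) is an assumption, since the thread does not define it.

```python
# (S, sample rate in Hz) pairs copied from the proposal above.
PROPOSED_STAGES = [(10, 6000), (20, 12000), (40, 24000), (80, 48000)]


def describe_stages(stages):
    """Return, per stage, the Nyquist limit and S expressed in milliseconds.

    Treating S as a number of samples is an assumption made for illustration.
    """
    rows = []
    for s, sample_rate in stages:
        rows.append(
            {
                "S": s,
                "sr_hz": sample_rate,
                # A signal sampled at sr can only represent frequencies up to sr / 2.
                "nyquist_hz": sample_rate / 2,
                "s_duration_ms": 1000 * s / sample_rate,
            }
        )
    return rows


for row in describe_stages(PROPOSED_STAGES):
    print(row)
```

Under these assumptions only the last stage (sr = 48000 Hz) has a Nyquist limit above 16 kHz, and S / sr is the same for every stage, so each stage would cover the same time span at a finer resolution.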

coincoin73 · 2020-07-05T10:07:18Z

Has anyone tested this?

JeffreyCA · 2020-10-12T20:59:55Z

There was a similar issue to this with Spleeter, where high frequencies are not present in output files. Here's their explanation: https://github.com/deezer/spleeter/wiki/5.-FAQ#why-are-there-no-high-frequencies-in-the-generated-output-files-

@davda54 Could this issue be similar to that?

JeffreyCA mentioned this issue Oct 12, 2020

Support additional source separation models JeffreyCA/spleeter-web#19

Open
