Provide trained model in higher resolution #4

Open
FSharpCSharp opened this issue Feb 26, 2020 · 4 comments

Comments

@FSharpCSharp

I have now carried out extensive tests with the model. Unfortunately, I found that the output signal is always cut off at 22050 Hz, although the output could theoretically have a resolution of 32000 Hz. This means the signal does not cover the full range it could have.

Is this due to the trained model or to some additional setting? I followed the steps described in the Python notebook and double-checked everything. Unfortunately, because the output is limited to 22050 Hz, the quality is not as brilliant as it could be. Here is a short explanation.

@davda54
Contributor

davda54 commented Mar 11, 2020

Hi, you're right, there's a clear cut-off after 10 kHz, but we're unsure about its cause. It seems to be an internal property of the neural network. Please let me know if you catch the bug :)

[Image: spectrogram_cutoff]
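One way to put a number on a cut-off like this is to find the highest frequency bin whose level is still within some threshold of the spectral peak. The snippet below is only a sketch under assumptions, not code from this project: `estimate_cutoff_hz`, the -60 dB threshold, and the synthetic two-tone signal are all made up for illustration, and the naive DFT is only practical for short excerpts (for real files an FFT such as `numpy.fft.rfft` would be the usual choice). Keep in mind that a signal sampled at rate `sr` cannot contain anything above `sr / 2`.

```python
import cmath
import math


def hann_window(n):
    """Periodic Hann window of length n, used to limit spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(samples):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), short excerpts only)."""
    n = len(samples)
    windowed = [x * w for x, w in zip(samples, hann_window(n))]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        mags.append(abs(acc))
    return mags


def estimate_cutoff_hz(samples, sample_rate, threshold_db=-60.0):
    """Highest bin frequency whose level is within threshold_db of the peak.

    Hypothetical helper; the threshold is an arbitrary assumption.
    """
    mags = magnitude_spectrum(samples)
    peak = max(mags)
    if peak == 0.0:
        return 0.0
    floor = peak * 10 ** (threshold_db / 20)
    highest_bin = max(k for k, m in enumerate(mags) if m >= floor)
    return highest_bin * sample_rate / len(samples)


# Synthetic stand-in for a model output: tones at 1 kHz and 9 kHz.
sample_rate = 32000
n = 256  # bin spacing is sample_rate / n = 125 Hz
signal = [math.sin(2 * math.pi * 1000 * i / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 9000 * i / sample_rate)
          for i in range(n)]

cutoff = estimate_cutoff_hz(signal, sample_rate)
nyquist = sample_rate / 2
print(f"estimated cut-off: {cutoff:.0f} Hz, Nyquist limit: {nyquist:.0f} Hz")
```

The estimate lands within one bin of the highest tone because the window spreads each tone over its neighbouring bins. Running the same measurement on the input mixture and on the separated output would show whether the cut-off is already present in the data or introduced by the network.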

@RadioAngurem

I downloaded the MusDB tracks and checked that the sum of the stems is not equal to the stereo mix. Moreover, the difference between the stereo mix and the sum of the stems is too big to ignore: you can identify the song just by listening to that difference. So I believe that computing the weights of the TCN mask using MusDB instead of MusDB HQ adds an error to the model.
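For anyone who wants to repeat that check, here is a rough sketch of how the mismatch could be measured as a level relative to the mix. Everything in it is an assumption for illustration: `residual_db` is a hypothetical helper, loading and decoding the MusDB tracks is not shown, and short lists of floats stand in for the decoded mono audio.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def residual_db(mix, stems):
    """Level of (mix - sum of stems) relative to the mix, in dB.

    Returns -inf when the stems add up to the mix exactly.
    """
    summed = [sum(frame) for frame in zip(*stems)]
    residual = [m - s for m, s in zip(mix, summed)]
    mix_rms = rms(mix)
    res_rms = rms(residual)
    if res_rms == 0.0 or mix_rms == 0.0:
        return float("-inf")
    return 20 * math.log10(res_rms / mix_rms)


# Toy stand-ins for the four stems (vocals, drums, bass, other).
stems = [
    [0.10, -0.20, 0.30, 0.05],
    [0.00, 0.25, -0.10, 0.10],
    [0.05, 0.05, 0.05, -0.05],
    [-0.02, 0.01, 0.00, 0.03],
]

# A mix that is exactly the sum of the stems, and one that is not.
exact_mix = [sum(frame) for frame in zip(*stems)]
altered_mix = [0.9 * m for m in exact_mix]

print(residual_db(exact_mix, stems))    # no residual at all
print(residual_db(altered_mix, stems))  # finite level in dB
```

A residual that sits only a few tens of dB below the mix would be consistent with being able to recognise the song in the difference signal.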

Also, why not try adding another stage to the network? MusDB cuts the frequencies above 16 kHz, so the model is not trained to work with audio that has information above that frequency. Could a network with these parameters work?

S = 10; 1, T/2, sr = 6000 Hz
S = 20; 1, T, sr = 12000 Hz
S = 40; 1, 2T, sr = 24000 Hz
S = 80; 1, 4T, sr = 48000 Hz
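As a quick sanity check on these numbers, the sketch below computes the highest frequency each proposed stage can represent (half its sample rate) and which stages reach the 16 kHz band edge mentioned above. It only does arithmetic on the values listed in this comment; the names `stages` and `nyquist_hz` are made up for illustration, and `S` is carried along purely as a label without assuming what it means in the network.

```python
# (S, sample rate in Hz) pairs copied from the proposal above.
stages = [(10, 6000), (20, 12000), (40, 24000), (80, 48000)]


def nyquist_hz(sample_rate):
    """Highest frequency representable at the given sample rate."""
    return sample_rate / 2


band_edge_hz = 16000  # the 16 kHz limit mentioned for MusDB

rows = []
for s, sr in stages:
    limit = nyquist_hz(sr)
    rows.append((s, sr, limit, limit >= band_edge_hz))

for s, sr, limit, reaches in rows:
    print(f"S = {s:>2}, sr = {sr:>5} Hz -> up to {limit:>7.0f} Hz, "
          f"reaches 16 kHz: {reaches}")
```

With these values only the 48000 Hz stage extends past 16 kHz, so that stage would be the one responsible for any content above the MusDB limit.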

@coincoin73

> I downloaded the MusDB tracks and checked that the sum of the stems is not equal to the stereo mix. Moreover, the difference between the stereo mix and the sum of the stems is too big to ignore: you can identify the song just by listening to that difference. So I believe that computing the weights of the TCN mask using MusDB instead of MusDB HQ adds an error to the model.
>
> Also, why not try adding another stage to the network? MusDB cuts the frequencies above 16 kHz, so the model is not trained to work with audio that has information above that frequency. Could a network with these parameters work?
>
> S = 10; 1, T/2, sr = 6000 Hz
> S = 20; 1, T, sr = 12000 Hz
> S = 40; 1, 2T, sr = 24000 Hz
> S = 80; 1, 4T, sr = 48000 Hz

Has anyone run this test?

@JeffreyCA

There was a similar issue with Spleeter, where high frequencies are missing from the output files. Here's their explanation: https://github.com/deezer/spleeter/wiki/5.-FAQ#why-are-there-no-high-frequencies-in-the-generated-output-files-

@davda54 Could this issue be similar to that?
