Different sampling rates #5
Comments
Hi! Thank you for the great question!
Thank you for the quick reply :). I will try this experiment as soon as possible and report here.
Thank you for trying the additional experiment! Please let me know if you need any help!
I did some experiments with 16k samples. I used 4 h of 16k data, and the default model has been training for 9 days. So far, everything is okay. I am sharing the TensorBoard log. Qualitatively, the model seems able to improve the sound quality well. I also want to compare normal upsampling against neural upsampling. I have a 16k test set, which I down-sampled for testing, and an acoustic model trained on 16k audio. The results are here:
So, neural upsampling is worse than sox :(.
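The baseline comparison described above can be sketched with scipy in place of sox; the `resample_poly` call, the 440 Hz test tone, and the log-spectral-distance (LSD) metric here are illustrative assumptions, not the original pipeline:

```python
# Minimal sketch: down-sample a 16 kHz signal to 8 kHz, up-sample it back
# (the conventional-upsampling baseline), and score it with LSD against the
# original. scipy.signal.resample_poly stands in for sox here (assumption).
import numpy as np
from scipy.signal import resample_poly, stft

def lsd(ref, est, n_fft=2048, hop=512, eps=1e-10):
    """Log-spectral distance between two equal-length waveforms."""
    _, _, R = stft(ref, nperseg=n_fft, noverlap=n_fft - hop)
    _, _, E = stft(est, nperseg=n_fft, noverlap=n_fft - hop)
    log_diff = np.log10(np.abs(R) ** 2 + eps) - np.log10(np.abs(E) ** 2 + eps)
    # RMS over frequency, mean over frames.
    return float(np.mean(np.sqrt(np.mean(log_diff ** 2, axis=0))))

sr = 16000
t = np.arange(sr) / sr                            # 1 s of audio
x = np.sin(2 * np.pi * 440 * t)                   # toy 440 Hz test tone
x_lowres = resample_poly(x, up=1, down=2)         # 16k -> 8k
x_interp = resample_poly(x_lowres, up=2, down=1)  # 8k -> 16k baseline
print(lsd(x, x_interp))
```

A neural upsampler's output could be scored with the same `lsd` call in place of `x_interp`, which makes the two systems directly comparable on the same test set.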
Interesting work!
In addition, if you have an STT model you could apply the conditional score generation suggested by Yang Song (https://arxiv.org/pdf/2011.13456.pdf, Section 5 and Appendix I).
Thank you for the advice. I think the problem is limited training data and computational power. Training is continuing :). I am also digging deeper and adapting the model to my case. I will report back if the problem is solved. Yang Song's work is also interesting; I will check whether I can apply it. Thank you :).
Hi :). I did some experiments on the same dataset with different noise levels. A different noise level is mentioned in the paper. Do you think that noise level is too much for 8k->16k, or is it okay for that setup?
Hmm, I think you need to modify the inference schedule rather than the training schedule. Since the 8-iteration schedule was fit to our setup, it may not be optimal for 8k->16k.
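One way to sketch the kind of inference-schedule change suggested above. The 1000-step linear training schedule and the log-spaced subsampling heuristic are assumptions for illustration; they are not NU-Wave's published 8-step values, which were hand-picked for the 48 kHz setup:

```python
# Hedged sketch: derive an N-step inference noise schedule from a long
# linear training schedule. The endpoints (1e-6, 6e-3) and the log-spacing
# heuristic are illustrative assumptions, not the authors' chosen values.
import numpy as np

def training_betas(T=1000, beta_min=1e-6, beta_max=6e-3):
    # Linear schedule over T steps, as commonly used in DDPM-style training.
    return np.linspace(beta_min, beta_max, T)

def inference_betas(n_steps=8, beta_min=1e-6, beta_max=6e-3):
    # Log-spaced subsample: concentrates the few inference iterations
    # at low noise levels, one common heuristic for short schedules.
    return np.logspace(np.log10(beta_min), np.log10(beta_max), n_steps)

print(inference_betas(8))
```

For an 8k->16k setup, the endpoints (and even `n_steps`) would likely need tuning; the point is that only the inference schedule changes, while the trained model and its training schedule stay fixed.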
I think you are right; I didn't change the inference part. I will check and report here. Thank you so much :).
(Now I can see it.)
In addition, I am curious about your hyperparameters. Please let me know your batch size, audio length, or any differences from our hparameter.yaml file.
My hparameter.yaml:
My GPU is a Tesla P100-PCIE-16GB.
I am increasing the data now. I will start a training run and report here as soon as possible.
Hello Viet Anh! Thank you for considering our model as a reference and I will be waiting for your upcoming paper! |
Thank you for your response. |
Oh, sorry for the misinformation. Yes, I meant 240k iterations instead of epochs.
Interesting! Is it still noisy after 8 iterations? We only trained and tested on a 48k target, so I'm not very sure about 8k->16k setups! Reporting your results would be helpful for our work too!
Please run for_test.py or lightning_test.py for numerical results.
Here are the results:
Not really good, right? I only changed audio.sr to 16000 and audio.ratio to 2.
I think it is a similar problem: our 48k model is also not good at generating harmonics. Our 48k model generates frequency content above 12k, which does not contain many harmonics, whereas 8k->16k is almost entirely harmonic generation.
Thank you for your explanation, I have noted that. Looking forward to your adjustments. |
I found that a recent work from ByteDance (https://arxiv.org/pdf/2109.13731.pdf) cites our work, and their results are not good either. I recommend you read it!
Hi!
Did you try training with different sampling rates, such as 8K->16K, 8K->22K, or 16K->22K (different from the demo page)?
And what changes should we make to train with such data (maybe hop length, n_fft, noise_schedule, pos_emb_scale, etc.)?
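For the STFT-related hyperparameters in the question above, one common convention is to keep the hop and window duration fixed in milliseconds and rescale the sample counts with the target rate. This is a minimal sketch under that assumption; the 12.5 ms / 50 ms defaults are illustrative, not the repository's hparameter.yaml values:

```python
# Hedged sketch: rescale hop/window/n_fft with the sample rate so the
# analysis window covers the same duration in milliseconds at any rate.
# hop_ms and win_ms are illustrative defaults (assumptions).
def scaled_stft_params(sr, hop_ms=12.5, win_ms=50.0):
    hop = int(sr * hop_ms / 1000)
    win = int(sr * win_ms / 1000)
    n_fft = 1 << (win - 1).bit_length()  # next power of two >= window
    return hop, win, n_fft

print(scaled_stft_params(16000))  # -> (200, 800, 1024)
print(scaled_stft_params(48000))  # -> (600, 2400, 4096)
```

Parameters like noise_schedule and pos_emb_scale do not scale mechanically like this; as the discussion above suggests, they likely need to be retuned per setup.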