[Question] How to apply on 16k data? #12
Comments
1. You can change lines 86-89 in models.py from:

```python
self.ups.append(weight_norm(ConvTranspose1d(h.upsample_initial_channel//(2**i),
                                            h.upsample_initial_channel//(2**(i+1)),
                                            k, u, padding=(k-u)//2)))
```

to:

```python
self.ups.append(weight_norm(ConvTranspose1d(h.upsample_initial_channel//(2**i),
                                            h.upsample_initial_channel//(2**(i+1)),
                                            k, u, padding=(u//2 + u%2), output_padding=u%2)))
```

2. Or just cut the output down to length `conv_in_size * u` after each transposed convolution:

```python
def __init__(self, ...):
    ...
    self.upsample_rates = h.upsample_rates

def forward(self, x):
    ...
    x1 = F.leaky_relu(x, LRELU_SLOPE)
    x = self.ups[i](x1)[:, :, :x1.size(-1) * self.upsample_rates[i]]
    ...
```
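As a quick sanity check (a sketch, not from the thread): `ConvTranspose1d` output length is `(L - 1) * stride - 2 * padding + kernel_size + output_padding`. With the default `padding=(k-u)//2` and an odd upsample rate `u`, each layer emits one extra sample; the modified `padding`/`output_padding` removes it (this assumes `k == 2 * u`, as in the stock configs):

```python
def convT_out(L, k, u, pad, out_pad=0):
    # PyTorch ConvTranspose1d output-length formula
    return (L - 1) * u - 2 * pad + k + out_pad

L = 100       # arbitrary number of input frames
k, u = 10, 5  # odd upsample rate, kernel = 2 * rate

# original padding: one extra sample when u is odd
orig = convT_out(L, k, u, (k - u) // 2)            # 501
# modified padding + output_padding: exact u-fold upsampling
fixed = convT_out(L, k, u, u // 2 + u % 2, u % 2)  # 500
print(orig, fixed)
```

For even rates (e.g. `u=8, k=16`) both formulas already give exactly `L * u`, which is why the stock 22.05 kHz config never hits this issue.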
Many thanks, I'll try.
I used method 1, but it does not work for me, and I don't know why. Is there anything wrong with my config? (I did not change segment_size, but other params changed.)
Error:
Make sure the product of [8,8,2,2] equals hop_size (8 * 8 * 2 * 2 == 256). In your config, hop_size should be 256, not 300.
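The constraint here (a minimal sketch; the function name is mine, not the repo's) is that the generator turns one mel frame into exactly `hop_size` samples, so the upsample rates must multiply to `hop_size`:

```python
import math

def rates_match_hop(upsample_rates, hop_size):
    # one mel frame must be upsampled to exactly hop_size samples
    return math.prod(upsample_rates) == hop_size

print(rates_match_hop([8, 8, 2, 2], 300))  # False: 8*8*2*2 == 256
print(rates_match_hop([4, 5, 3, 5], 300))  # True:  4*5*3*5 == 300
```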
Thank you for your answer! I tried changing [8,8,2,2] to [4, 5, 3, 5] and the error occurred again. If I have to use this config:
Is there anything else, such as upsample_rates, that I must change? (I'm new to vocoders and don't know much about them, sorry :) )
Maybe you need to read the code of the generator carefully.
Thank you so much! I will.
Hi @Miralan, I'm sorry to bother you again, but I tried both of these methods and they did not work for me.
My error when I try these two methods:
By the way, did these two methods work at 16k? @OnceJune
@Mingrg I tried method 1 and it works for 16k.
You should make sure segment_size % hop_size == 0; try 9000 instead of 8192.
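To see why 9000 works where 8192 fails (a sketch; the hop_size of 300 is assumed from the config discussed above): training crops whole mel frames, so the audio segment must be an integer number of hops.

```python
hop_size = 300  # assumed from the config in this thread
for segment_size in (8192, 9000):
    # segment must contain a whole number of mel frames
    ok = segment_size % hop_size == 0
    print(segment_size, ok)  # 8192 False, 9000 True
```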
Thank you so much for your patience!
Hi @Miralan,
If you want to use the pretrained model, you can use it to generate 22.05 kHz audio and resample it to 16 kHz. Directly changing the sampling rate is useless.
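For the resampling step, one common approach (a sketch; the thread does not prescribe a tool, and the helper name is mine) is polyphase resampling at the rational ratio between the two rates:

```python
import math

def resample_ratio(orig_sr, target_sr):
    # reduce target_sr/orig_sr to lowest terms for polyphase resampling
    g = math.gcd(orig_sr, target_sr)
    return target_sr // g, orig_sr // g

up, down = resample_ratio(22050, 16000)
print(up, down)  # 320, 441
# In practice, e.g.:
#   scipy.signal.resample_poly(y, up, down)
#   librosa.resample(y, orig_sr=22050, target_sr=16000)
```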
(according to jik876#12 (comment))
Hi @OnceJune, did you change n_fft to 800, the same as win_size?
@wizardk No, I used hop_size 200, win_size 800, n_fft 1024. The FFT requires the input length to be 2^n (2^10 in this case); you might want to check https://en.wikipedia.org/wiki/Fast_Fourier_transform for more details.
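If you want to pick n_fft as the smallest power of two at or above win_size (an illustrative helper, not from the repo; note that radix-2 FFTs are fastest at power-of-two sizes, though many FFT libraries also accept other lengths):

```python
def next_pow2(n):
    # smallest power of two >= n
    return 1 << (n - 1).bit_length()

print(next_pow2(800))  # 1024: win_size 800 -> n_fft 1024
```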
Hi, thanks for sharing your impressive code.
I tried to apply HiFi-GAN on 16k data, with config:

```
"upsample_rates": [8,5,5],
"upsample_kernel_sizes": [16,10,10],
"segment_size": 6400,
"hop_size": 200,
"win_size": 800,
"sampling_rate": 16000,
```

And it reports an error like:
```
Traceback (most recent call last):
  File "train.py", line 271, in <module>
    main()
  File "train.py", line 267, in main
    train(0, a, h)
  File "train.py", line 149, in train
    loss_fm_f = feature_loss(fmap_f_r, fmap_f_g)
  File "models.py", line 255, in feature_loss
    loss += torch.mean(torch.abs(rl - gl))
RuntimeError: The size of tensor a (1067) must match the size of tensor b (1068) at non-singleton dimension 2
```
Is there anything wrong with the modified config? Is it padding related?
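The off-by-one can be reproduced on paper (a sketch using the `ConvTranspose1d` output-length formula; the discriminator strides that shrink 6406 vs 6400 down to 1068 vs 1067 are omitted): with the default `padding=(k-u)//2`, each odd rate of 5 adds a stray sample, so the generator emits 6406 samples for a 6400-sample segment.

```python
def convT_out(L, k, u, pad, out_pad=0):
    # PyTorch ConvTranspose1d output-length formula
    return (L - 1) * u - 2 * pad + k + out_pad

L = 6400 // 200  # 32 mel frames (segment_size / hop_size)
for k, u in [(16, 8), (10, 5), (10, 5)]:  # kernels/rates from the config
    L = convT_out(L, k, u, (k - u) // 2)  # default padding
print(L)  # 6406, but the ground-truth segment has 6400 samples
```

This mismatch then propagates through the discriminators until `feature_loss` compares feature maps of different lengths, which is the RuntimeError above.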