[Question] How to apply on 16k data? #12

Closed

OnceJune opened this issue Nov 16, 2020 · 15 comments
@OnceJune

OnceJune commented Nov 16, 2020

Hi, thanks for sharing your impressive code.
I tried to apply HiFi-GAN to 16 kHz data with this config:
"upsample_rates": [8,5,5],
"upsample_kernel_sizes": [16,10,10],
"segment_size": 6400,
"hop_size": 200,
"win_size": 800,
"sampling_rate": 16000,

And it reports an error like:
Traceback (most recent call last):
File "train.py", line 271, in <module>
main()
File "train.py", line 267, in main
train(0, a, h)
File "train.py", line 149, in train
loss_fm_f = feature_loss(fmap_f_r, fmap_f_g)
File "models.py", line 255, in feature_loss
loss += torch.mean(torch.abs(rl - gl))
RuntimeError: The size of tensor a (1067) must match the size of tensor b (1068) at non-singleton dimension 2

Is there anything wrong with the modified config? Is it padding-related?

@Miralan

Miralan commented Nov 16, 2020

1. You can change lines 86-89 in models.py.
origin:

self.ups.append(weight_norm(ConvTranspose1d(h.upsample_initial_channel//(2**i),
                                            h.upsample_initial_channel//(2**(i+1)),
                                            k, u, padding=(k-u)//2)))

modified:

self.ups.append(weight_norm(ConvTranspose1d(h.upsample_initial_channel//(2**i),
                                            h.upsample_initial_channel//(2**(i+1)),
                                            k, u, padding=(u//2 + u%2),
                                            output_padding=u%2)))

2. Or just cut the output off at (conv_in_size * u) samples after each transposed convolution:

def __init__(self, h):
    ...
    self.upsample_rates = h.upsample_rates

def forward(self, x):
    ...
    x1 = F.leaky_relu(x, LRELU_SLOPE)
    # keep exactly (input length * rate) samples, trimming any tail
    x = self.ups[i](x1)[:, :, :x1.size(-1) * self.upsample_rates[i]]
    ...
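
To see why the original padding breaks for odd rates like 5: the ConvTranspose1d output length is (L_in - 1)*stride - 2*padding + kernel_size + output_padding. A quick check (my own sketch, not code from the repo):

def out_len(l_in, k, u, padding, output_padding=0):
    return (l_in - 1) * u - 2 * padding + k + output_padding

for k, u in [(16, 8), (10, 5), (10, 5)]:   # the 16 kHz config above
    orig = out_len(32, k, u, padding=(k - u) // 2)
    fixed = out_len(32, k, u, padding=u // 2 + u % 2, output_padding=u % 2)
    print(k, u, orig, fixed, 32 * u)

For u=5 the original padding gives 161 samples instead of the expected 160; that per-layer off-by-one is the kind of mismatch that surfaces as "1067 vs 1068" in the traceback above.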

@OnceJune
Author

Many thanks, I'll try.

@Mingrg

Mingrg commented Dec 24, 2020

I used method 1, but it doesn't work for me, and I don't know why. Is there anything wrong with my config? (I did not change segment_size, but the other params changed.)
My config:

    "upsample_rates": [8,8,2,2],
    "upsample_kernel_sizes": [16,16,4,4],
    "upsample_initial_channel": 512,
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],

    "segment_size": 8192,
    "num_mels": 80,
    "num_freq": 1025,
    "n_fft": 2048,
    "hop_size": 300,
    "win_size": 1200,

    "sampling_rate": 24000,

Error:

  File "train.py", line 278, in <module>
    main()
  File "train.py", line 274, in main
    train(0, a, h)
  File "train.py", line 151, in train
    loss_mel = F.l1_loss(y_mel, y_g_hat_mel) * 45
  File "/data/rd/anaconda3/envs/mrg_hifigan/lib/python3.6/site-packages/torch/nn/functional.py", line 2186, in l1_loss
    ret = torch.abs(input - target)
RuntimeError: The size of tensor a (27) must match the size of tensor b (23) at non-singleton dimension 2

@Miralan

Miralan commented Dec 24, 2020

Make sure the product of upsample_rates equals hop_size: 8 * 8 * 2 * 2 == 256. With your config, hop_size should be 256, not 300.
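
A quick way to check this constraint (a sketch, not code from the repo): one mel frame must expand to exactly hop_size samples, so the product of the upsampling factors has to equal hop_size.

import math

upsample_rates = [8, 8, 2, 2]
hop_size = 300
print(math.prod(upsample_rates) == hop_size)   # False: 256 != 300 -> loss_mel size mismatch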

@Mingrg

Mingrg commented Dec 24, 2020

Thank you for your answer! I tried changing [8,8,2,2] to [4,5,3,5], but the error occurred again. I have to use this config (because my FastSpeech config is like this):

     "segment_size": 8192,
     "num_mels": 80,
     "num_freq": 1025,
     "n_fft": 2048,
     "hop_size": 300,
     "win_size": 1200,

Is there anything else, such as upsample_rates, that I must change here? (I'm new to vocoders and don't know much about this, sorry!)

    "upsample_rates": [4,5,3,5],
    "upsample_kernel_sizes": [16,16,4,4],
    "upsample_initial_channel": 512,
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],

@Miralan

Miralan commented Dec 24, 2020

Maybe you need to read the generator code carefully. If you use upsample_rates [4, 5, 3, 5], your upsample_kernel_sizes should be [8, 10, 6, 10].
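
The pattern in the repo's configs appears to be kernel = 2 * rate, so adjacent transposed-conv output windows overlap by half (my reading of the configs, not an official rule):

upsample_rates = [4, 5, 3, 5]                            # 4*5*3*5 == 300 == hop_size
upsample_kernel_sizes = [2 * u for u in upsample_rates]  # [8, 10, 6, 10]
print(upsample_kernel_sizes)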

@Mingrg

Mingrg commented Dec 24, 2020

Thank you so much! I will.

@Mingrg

Mingrg commented Dec 24, 2020

Hi @Miralan, I'm sorry to bother you again, but I tried both of these methods and neither works for me.
My config now:

    "upsample_rates": [4,5,3,5],
    "upsample_kernel_sizes": [8,10,6,10],
    "upsample_initial_channel": 512,
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],

    "segment_size": 8192,
    "num_mels": 80,
    "num_freq": 1025,
    "n_fft": 2048,
    "hop_size": 300,
    "win_size": 1200,

    "sampling_rate": 24000,

My error when I try these two methods:

Traceback (most recent call last):
  File "train.py", line 278, in <module>
    main()
  File "train.py", line 274, in main
    train(0, a, h)
  File "train.py", line 155, in train
    loss_fm_f = feature_loss(fmap_f_r, fmap_f_g)
  File "/data1/rd/mingruigang/TTS/hifi-gan/models.py", line 263, in feature_loss
    loss += torch.mean(torch.abs(rl - gl))
RuntimeError: The size of tensor a (1366) must match the size of tensor b (1350) at non-singleton dimension 2

By the way, did these two methods work at 16k? @OnceJune

@OnceJune
Author

@Mingrg I tried method 1 and it works for 16k.

@Miralan

Miralan commented Dec 24, 2020

You should make sure segment_size % hop_size == 0; try 9000 instead of 8192.
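
Putting the three constraints from this thread together, a config sanity check might look like this (my own summary sketch, not code from the repo):

import math

cfg = {
    "upsample_rates": [4, 5, 3, 5],
    "upsample_kernel_sizes": [8, 10, 6, 10],
    "hop_size": 300,
    "segment_size": 9000,
}

# 1. the total upsampling factor equals hop_size
assert math.prod(cfg["upsample_rates"]) == cfg["hop_size"]
# 2. each kernel is twice its stride
assert all(k == 2 * u for k, u in
           zip(cfg["upsample_kernel_sizes"], cfg["upsample_rates"]))
# 3. a training segment holds a whole number of frames
assert cfg["segment_size"] % cfg["hop_size"] == 0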

@Mingrg

Mingrg commented Dec 24, 2020

Thank you so much for your patience!

@Approximetal

Hi @Miralan,
Can I use the pretrained model if I change the sampling rate to 16000 while keeping the other parameters unchanged?

@Miralan

Miralan commented Dec 25, 2020

If you want to use the pretrained model, you can generate 22.05 kHz audio with it and resample that to 16 kHz. Directly changing the sampling rate is useless.
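
For example, with torchaudio (a sketch; the file names are placeholders):

import torchaudio
import torchaudio.functional as AF

# load audio synthesized by the pretrained 22.05 kHz model ...
wav, sr = torchaudio.load("generated_22k.wav")   # sr == 22050
# ... and resample it to 16 kHz
wav_16k = AF.resample(wav, orig_freq=sr, new_freq=16000)
torchaudio.save("generated_16k.wav", wav_16k, 16000)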

Numanor added a commit to thuhcsi/hifi-gan that referenced this issue Mar 5, 2021
@wizardk

wizardk commented Apr 27, 2021

Hi @OnceJune, did you change n_fft to 800, the same as win_size?

@OnceJune
Author

@wizardk No, I used hop_size 200, win_size 800, n_fft 1024. The FFT is most efficient when its input length is a power of two (2^10 = 1024 here); see https://en.wikipedia.org/wiki/Fast_Fourier_transform for more details.
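
To illustrate how an 800-sample window coexists with n_fft=1024 (an illustration of mine, not the repo's meldataset.py): torch.stft zero-pads the window up to n_fft, so win_size can stay below the power-of-two FFT size.

import torch

wav = torch.randn(6400)              # one 16 kHz training segment
window = torch.hann_window(800)      # win_size samples, padded to n_fft
spec = torch.stft(wav, n_fft=1024, hop_length=200, win_length=800,
                  window=window, center=True, return_complex=True)
print(spec.shape)                    # torch.Size([513, 33]): 1024//2+1 bins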
