[Question] How to apply on 16k data? #12

Closed

OnceJune opened this issue Nov 16, 2020 · 15 comments
@OnceJune

OnceJune commented Nov 16, 2020

Hi, thanks for sharing your impressive code.
I tried to apply HiFi-GAN to 16 kHz data with this config:
"upsample_rates": [8,5,5],
"upsample_kernel_sizes": [16,10,10],
"segment_size": 6400,
"hop_size": 200,
"win_size": 800,
"sampling_rate": 16000,

And it reports an error like:
Traceback (most recent call last):
File "train.py", line 271, in <module>
main()
File "train.py", line 267, in main
train(0, a, h)
File "train.py", line 149, in train
loss_fm_f = feature_loss(fmap_f_r, fmap_f_g)
File "models.py", line 255, in feature_loss
loss += torch.mean(torch.abs(rl - gl))
RuntimeError: The size of tensor a (1067) must match the size of tensor b (1068) at non-singleton dimension 2

Is there anything wrong with the modified config? Is it padding-related?

@Miralan

Miralan commented Nov 16, 2020

1. You can change lines 86-89 in models.py.
origin:

self.ups.append(weight_norm(ConvTranspose1d(h.upsample_initial_channel//(2**i),
                                            h.upsample_initial_channel//(2**(i+1)),
                                            k, u, padding=(k-u)//2)))

modified:

self.ups.append(weight_norm(ConvTranspose1d(h.upsample_initial_channel//(2**i),
                                            h.upsample_initial_channel//(2**(i+1)),
                                            k, u, padding=(u//2 + u%2),
                                            output_padding=u%2)))

2. Or just cut the output off at (conv_in_size * u) samples after each transposed convolution:

def __init__(self, h):
    ...
    self.upsample_rates = h.upsample_rates

def forward(self, x):
    ...
    x1 = F.leaky_relu(x, LRELU_SLOPE)
    # keep exactly (input length * rate) samples, trimming any tail
    x = self.ups[i](x1)[:, :, :x1.size(-1) * self.upsample_rates[i]]
    ...
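
To see why the original padding breaks for odd rates like 5: the ConvTranspose1d output length is (L_in - 1)*stride - 2*padding + kernel_size + output_padding. A quick check (my own sketch, not code from the repo):

def out_len(l_in, k, u, padding, output_padding=0):
    return (l_in - 1) * u - 2 * padding + k + output_padding

for k, u in [(16, 8), (10, 5), (10, 5)]:   # the 16 kHz config above
    orig = out_len(32, k, u, padding=(k - u) // 2)
    fixed = out_len(32, k, u, padding=u // 2 + u % 2, output_padding=u % 2)
    print(k, u, orig, fixed, 32 * u)

For u=5 the original padding gives 161 samples instead of the expected 160; that per-layer off-by-one is the kind of mismatch that surfaces as "1067 vs 1068" in the traceback above.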

@OnceJune
Author

Many thanks, I'll try.

@Mingrg

Mingrg commented Dec 24, 2020

I used method 1, but it doesn't work for me, and I don't know why. Is there anything wrong with my config? (I did not change segment_size, but the other params changed.)
My config:

    "upsample_rates": [8,8,2,2],
    "upsample_kernel_sizes": [16,16,4,4],
    "upsample_initial_channel": 512,
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],

    "segment_size": 8192,
    "num_mels": 80,
    "num_freq": 1025,
    "n_fft": 2048,
    "hop_size": 300,
    "win_size": 1200,

    "sampling_rate": 24000,

Error:

  File "train.py", line 278, in <module>
    main()
  File "train.py", line 274, in main
    train(0, a, h)
  File "train.py", line 151, in train
    loss_mel = F.l1_loss(y_mel, y_g_hat_mel) * 45
  File "/data/rd/anaconda3/envs/mrg_hifigan/lib/python3.6/site-packages/torch/nn/functional.py", line 2186, in l1_loss
    ret = torch.abs(input - target)
RuntimeError: The size of tensor a (27) must match the size of tensor b (23) at non-singleton dimension 2

@Miralan

Miralan commented Dec 24, 2020

Make sure the product of upsample_rates equals hop_size: 8 * 8 * 2 * 2 == 256. With your config, hop_size should be 256, not 300.
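
A quick way to check this constraint (a sketch, not code from the repo): one mel frame must expand to exactly hop_size samples, so the product of the upsampling factors has to equal hop_size.

import math

upsample_rates = [8, 8, 2, 2]
hop_size = 300
print(math.prod(upsample_rates) == hop_size)   # False: 256 != 300 -> loss_mel size mismatch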

@Mingrg

Mingrg commented Dec 24, 2020

Thank you for your answer! I tried changing [8,8,2,2] to [4,5,3,5], but the error occurred again. I have to use this config (because my FastSpeech config is like this):

     "segment_size": 8192,
     "num_mels": 80,
     "num_freq": 1025,
     "n_fft": 2048,
     "hop_size": 300,
     "win_size": 1200,

Is there anything else, such as upsample_rates, that I must change here? (I'm new to vocoders and don't know much about this, sorry!)

    "upsample_rates": [4,5,3,5],
    "upsample_kernel_sizes": [16,16,4,4],
    "upsample_initial_channel": 512,
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],

@Miralan

Miralan commented Dec 24, 2020

Maybe you need to read the generator code carefully. If you use upsample_rates [4, 5, 3, 5], your upsample_kernel_sizes should be [8, 10, 6, 10].
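
The pattern in the repo's configs appears to be kernel = 2 * rate, so adjacent transposed-conv output windows overlap by half (my reading of the configs, not an official rule):

upsample_rates = [4, 5, 3, 5]                            # 4*5*3*5 == 300 == hop_size
upsample_kernel_sizes = [2 * u for u in upsample_rates]  # [8, 10, 6, 10]
print(upsample_kernel_sizes)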

@Mingrg

Mingrg commented Dec 24, 2020

Thank you so much! I will.

@Mingrg

Mingrg commented Dec 24, 2020

Hi @Miralan, I'm sorry to bother you again, but I tried both of these methods and neither works for me.
My config now:

    "upsample_rates": [4,5,3,5],
    "upsample_kernel_sizes": [8,10,6,10],
    "upsample_initial_channel": 512,
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],

    "segment_size": 8192,
    "num_mels": 80,
    "num_freq": 1025,
    "n_fft": 2048,
    "hop_size": 300,
    "win_size": 1200,

    "sampling_rate": 24000,

My error when I try these two methods:

Traceback (most recent call last):
  File "train.py", line 278, in <module>
    main()
  File "train.py", line 274, in main
    train(0, a, h)
  File "train.py", line 155, in train
    loss_fm_f = feature_loss(fmap_f_r, fmap_f_g)
  File "/data1/rd/mingruigang/TTS/hifi-gan/models.py", line 263, in feature_loss
    loss += torch.mean(torch.abs(rl - gl))
RuntimeError: The size of tensor a (1366) must match the size of tensor b (1350) at non-singleton dimension 2

By the way, did these two methods work at 16k? @OnceJune

@OnceJune
Author

@Mingrg I tried method 1 and it works for 16k.

@Miralan

Miralan commented Dec 24, 2020

You should make sure segment_size % hop_size == 0; try 9000 instead of 8192.
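
Putting the three constraints from this thread together, a config sanity check might look like this (my own summary sketch, not code from the repo):

import math

cfg = {
    "upsample_rates": [4, 5, 3, 5],
    "upsample_kernel_sizes": [8, 10, 6, 10],
    "hop_size": 300,
    "segment_size": 9000,
}

# 1. the total upsampling factor equals hop_size
assert math.prod(cfg["upsample_rates"]) == cfg["hop_size"]
# 2. each kernel is twice its stride
assert all(k == 2 * u for k, u in
           zip(cfg["upsample_kernel_sizes"], cfg["upsample_rates"]))
# 3. a training segment holds a whole number of frames
assert cfg["segment_size"] % cfg["hop_size"] == 0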

@Mingrg

Mingrg commented Dec 24, 2020

Thank you so much for your patience!

@Approximetal

Hi @Miralan,
Can I use the pretrained model if I change the sampling rate to 16000 while keeping the other parameters unchanged?

@Miralan

Miralan commented Dec 25, 2020

If you want to use the pretrained model, you can generate 22.05 kHz audio with it and resample that to 16 kHz. Directly changing the sampling rate is useless.
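
For example, with torchaudio (a sketch; the file names are placeholders):

import torchaudio
import torchaudio.functional as AF

# load audio synthesized by the pretrained 22.05 kHz model ...
wav, sr = torchaudio.load("generated_22k.wav")   # sr == 22050
# ... and resample it to 16 kHz
wav_16k = AF.resample(wav, orig_freq=sr, new_freq=16000)
torchaudio.save("generated_16k.wav", wav_16k, 16000)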

Numanor added a commit to thuhcsi/hifi-gan that referenced this issue Mar 5, 2021
@wizardk

wizardk commented Apr 27, 2021

Hi @OnceJune, did you change n_fft to 800, the same as win_size?

@OnceJune
Author

@wizardk No, I used hop_size 200, win_size 800, n_fft 1024. The FFT is most efficient when its input length is a power of two (2^10 = 1024 here); see https://en.wikipedia.org/wiki/Fast_Fourier_transform for more details.
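
To illustrate how an 800-sample window coexists with n_fft=1024 (an illustration of mine, not the repo's meldataset.py): torch.stft zero-pads the window up to n_fft, so win_size can stay below the power-of-two FFT size.

import torch

wav = torch.randn(6400)              # one 16 kHz training segment
window = torch.hann_window(800)      # win_size samples, padded to n_fft
spec = torch.stft(wav, n_fft=1024, hop_length=200, win_length=800,
                  window=window, center=True, return_complex=True)
print(spec.shape)                    # torch.Size([513, 33]): 1024//2+1 bins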
