
Some notable differences with official implementation #17

Open
chapter544 opened this issue Oct 28, 2019 · 9 comments

@chapter544

Hi,
Just FYI, in the official MelGAN repo the authors used hinge losses, whereas the paper describes an L2 loss. This repo is consistent with the paper! I am setting up some experiments with the hinge loss to see the differences. Another note: the default segment_length is 8912 in the official repo (vs 16k in this repo).
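For reference, a minimal sketch of the two objectives being contrasted here (hinge vs. L2/least-squares GAN loss), assuming a single raw discriminator output; the actual MelGAN discriminators are multi-scale and also return intermediate features for the feature-matching loss, which is omitted.

```python
import torch
import torch.nn.functional as F

# d_real / d_fake are raw discriminator outputs on real and generated audio.

def lsgan_d_loss(d_real, d_fake):
    # L2 / least-squares objective: push real outputs toward 1, fake toward 0.
    return (F.mse_loss(d_real, torch.ones_like(d_real))
            + F.mse_loss(d_fake, torch.zeros_like(d_fake)))

def hinge_d_loss(d_real, d_fake):
    # Hinge objective: margin-based, only penalizes outputs inside the margin.
    return F.relu(1.0 - d_real).mean() + F.relu(1.0 + d_fake).mean()

def lsgan_g_loss(d_fake):
    return F.mse_loss(d_fake, torch.ones_like(d_fake))

def hinge_g_loss(d_fake):
    return -d_fake.mean()
```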

seungwonpark (Owner) commented Oct 28, 2019

Thanks for the information!

I could also observe the following differences:

  • slope of LeakyReLU: 0.2 (official) / 0.01 (here)
  • mel_fmax: 11025.0 (official) / 8000.0 (here) (discussed at mel_fmax does not cover all frequency #7)
  • padding: reflection pad (official) / zero pad (here)
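
For illustration, a rough sketch of how the slope and padding differences would show up in a single generator conv block; the channel count and kernel size are placeholders rather than the repo's actual configuration, and mel_fmax is a mel-spectrogram setting rather than a layer, so it isn't shown.

```python
import torch.nn as nn

# Official-style block: reflection padding and LeakyReLU with slope 0.2.
official_style = nn.Sequential(
    nn.LeakyReLU(0.2),
    nn.ReflectionPad1d(1),
    nn.Conv1d(256, 256, kernel_size=3),
)

# This repo before fix/17: implicit zero padding inside Conv1d and the
# PyTorch default LeakyReLU slope of 0.01.
previous_style = nn.Sequential(
    nn.LeakyReLU(),  # negative_slope defaults to 0.01
    nn.Conv1d(256, 256, kernel_size=3, padding=1),
)
```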

seungwonpark changed the title from "L2 vs Hinge Loss" to "Some notable differences with official implementation" on Oct 28, 2019
@seungwonpark (Owner)

Working on this at fix/17.

bob80333 commented Nov 16, 2019

A few things I noticed when looking:

  • Reflection padding is used only in the generator and the first layer of the discriminator, not in any of the other discriminator layers
  • The discriminator downsample was
    nn.AvgPool1d(4, stride=2, padding=1, count_include_pad=False)
    not
    nn.AvgPool1d(kernel_size=4, stride=2, padding=2)
  • The shortcut connection in the resblocks has a convolution: Conv1d(256, 256, kernel_size=1)
  • The official generator has 4266050 parameters; the generator here on the master branch has 4262658 parameters, 3392 fewer (probably the shortcut convolutions)
  • The last convolution in the resblock has no padding and kernel_size=1, not 3
  • Reflection padding was not used for the ConvTranspose1d layers
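
A rough PyTorch sketch of the resblock and downsampling points above; the channel count, dilation, and layer ordering are illustrative and may not match the official code exactly.

```python
import torch.nn as nn

class ResBlockSketch(nn.Module):
    """Illustrative residual block with the Conv1d(256, 256, 1) shortcut and a
    final kernel_size=1 convolution with no padding, as listed above."""
    def __init__(self, channels=256, dilation=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.LeakyReLU(0.2),
            nn.ReflectionPad1d(dilation),
            nn.Conv1d(channels, channels, kernel_size=3, dilation=dilation),
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, channels, kernel_size=1),  # no padding, kernel_size=1
        )
        self.shortcut = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):
        return self.block(x) + self.shortcut(x)

# Downsampling between discriminator scales, official-style:
downsample = nn.AvgPool1d(4, stride=2, padding=1, count_include_pad=False)
```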

@bob80333

Working on this in my fork's fix/17 branch; I'll make a PR afterwards.

@bob80333

I changed everything to match, but for some reason the official generator still has 1440 more parameters than this one.

@seungwonpark (Owner)

The weight norm on the shortcut was missing. After adding it, the number of parameters became exactly the same as in the official implementation: 4266050.

I just added a commit to fix that on the fix/17 branch.
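
For anyone counting parameters: weight_norm splits a conv's weight into weight_v and weight_g, so wrapping a conv adds one extra parameter per output channel. A tiny check on a hypothetical 256-channel shortcut:

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

def count_params(module):
    return sum(p.numel() for p in module.parameters())

plain  = nn.Conv1d(256, 256, kernel_size=1)
normed = weight_norm(nn.Conv1d(256, 256, kernel_size=1))

print(count_params(plain))   # weight + bias              = 65792
print(count_params(normed))  # weight_v + weight_g + bias = 66048
```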

@seungwonpark (Owner)

The difference in padding method looks significant.
I'll be testing our fix/17 branch with both LJSpeech and private data.

bob80333 commented Nov 23, 2019

I was doing a training run on part of VoxCeleb2 before you added the weight_norm to the shortcut, so I left it running. The difference between 800K steps on that branch and 1M steps at 1725a7508f4f3a2de0bbc0ec83d33deaa40c3255 (the "reached 3200 epochs" commit on the master branch) is quite large.

This is from the validation data, which uses separate speakers from VoxCeleb2.

Original audio:
https://voca.ro/imS3kiSQ4ik

1M steps on older commit:
https://voca.ro/eOJv99KjtZO

800K steps on the fix/17 branch (one commit behind the latest):
https://voca.ro/fMTYWO4vUvX

I am now doing another run from the latest commit on fix/17 to see if the shortcut weight norm will help even more.

nukes commented Feb 23, 2021

What are the differences between reflection padding and zero padding in MelGAN? Does it give better audio quality?
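
Not an answer on audio quality, but the mechanical difference is easy to see: zero padding inserts zeros at the waveform edges, while reflection padding mirrors the existing samples, so the convolution never sees values outside the signal's own range. A quick illustration:

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[[1., 2., 3., 4., 5.]]])  # (batch, channels, time)

# Zero padding: edges are filled with zeros (an artificial "silent" boundary).
print(F.pad(x, (2, 2), mode="constant", value=0.0))
# tensor([[[0., 0., 1., 2., 3., 4., 5., 0., 0.]]])

# Reflection padding: edges mirror the signal itself.
print(F.pad(x, (2, 2), mode="reflect"))
# tensor([[[3., 2., 1., 2., 3., 4., 5., 4., 3.]]])
```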
