
Frozen parameters in GaussianFourierProjection #166

Closed
vvvm23 opened this issue Aug 11, 2022 · 5 comments

Comments

@vvvm23
Contributor

vvvm23 commented Aug 11, 2022

Hi, I'm just a beginner with diffusion models and have been using your implementations as a reference. I have a question about the GaussianFourierProjection class.

Why is requires_grad set to False on the weight parameter? Doesn't this mean the noise-level embeddings won't be updated during training?

Thanks!
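For context, a minimal sketch of the kind of module in question (an approximation, assuming PyTorch, not the exact library code); the frozen weight is the part this issue asks about:

```python
import math
import torch
import torch.nn as nn

class GaussianFourierProjection(nn.Module):
    """Sketch of a Gaussian Fourier feature embedding for noise levels."""

    def __init__(self, embedding_size: int = 256, scale: float = 1.0):
        super().__init__()
        # Randomly initialised once, then never updated by the optimizer.
        self.weight = nn.Parameter(torch.randn(embedding_size) * scale, requires_grad=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch,) noise levels / timesteps -> (batch, 2 * embedding_size)
        x_proj = x[:, None] * self.weight[None, :] * 2 * math.pi
        return torch.cat([torch.sin(x_proj), torch.cos(x_proj)], dim=-1)
```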

@patil-suraj
Contributor

cc @patrickvonplaten

@patrickvonplaten
Contributor

Hey @vvvm23,

It's set to False because we don't want to train those parameters. I followed the implementation of the original model here: https://github.com/yang-song/score_sde_pytorch/blob/1618ddea340f3e4a2ed7852a0694a809775cf8d0/models/layerspp.py#L37

Does this make sense?

@vvvm23
Contributor Author

vvvm23 commented Aug 26, 2022

Hi @patrickvonplaten

I somewhat misphrased my original question. I'm aware that setting requires_grad to False prevents that particular parameter from accumulating gradients, essentially stopping it from being trained.

But why would we not want to train the noise-level embeddings? Or is this just a simple, fixed (albeit randomly initialised) projection from a per-batch noise value into a different space, to which some learned transformation is applied later?

Thanks!
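For reference, a hypothetical sketch of the pattern described above: a fixed random projection followed by a learned MLP. The names TimeEmbedding and hidden_dim are illustrative, not the library's API.

```python
import math
import torch
import torch.nn as nn

class TimeEmbedding(nn.Module):
    """Hypothetical time-embedding head: frozen Fourier features + learned MLP."""

    def __init__(self, embedding_size: int = 256, hidden_dim: int = 1024, scale: float = 1.0):
        super().__init__()
        # Fixed (randomly initialised, never trained) projection weights.
        self.weight = nn.Parameter(torch.randn(embedding_size) * scale, requires_grad=False)
        # Learned transformation applied on top of the fixed features.
        self.mlp = nn.Sequential(
            nn.Linear(2 * embedding_size, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, noise_level: torch.Tensor) -> torch.Tensor:
        # noise_level: (batch,) -> (batch, hidden_dim)
        x_proj = noise_level[:, None] * self.weight[None, :] * 2 * math.pi
        features = torch.cat([torch.sin(x_proj), torch.cos(x_proj)], dim=-1)
        return self.mlp(features)
```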

@patrickvonplaten
Contributor

Hey @vvvm23,

Sinusoidal position features like GaussianFourierProjection don't need training because every position already gets a distinctly different vector that the model can use as a "cue" to know which time position has been passed to it.

If one wants to train position embedding vectors (or time embedding vectors here), one can just randomly initialize such a vector and let the model learn it. If, however, we use sinusoidal embeddings, there is no need to learn them.
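A minimal sketch contrasting the two options (a learnable embedding table versus frozen random Fourier features); the variable names are made up for the example:

```python
import torch
import torch.nn as nn

embedding_size, num_timesteps = 256, 1000

# Option 1: a trainable table of time-embedding vectors, updated by the optimizer.
learned_emb = nn.Embedding(num_timesteps, embedding_size)

# Option 2: frozen random Fourier weights. Each noise level still maps to a
# distinct, stable vector without any training.
fixed_weight = nn.Parameter(torch.randn(embedding_size), requires_grad=False)

t = torch.tensor([10.0, 500.0])  # example noise levels / timesteps
fixed_features = torch.cat(
    [torch.sin(t[:, None] * fixed_weight), torch.cos(t[:, None] * fixed_weight)],
    dim=-1,
)
print(learned_emb(t.long()).shape)  # torch.Size([2, 256]), learned during training
print(fixed_features.shape)         # torch.Size([2, 512]), no training needed
```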

@vvvm23
Contributor Author

vvvm23 commented Aug 31, 2022

Okay, thank you @patrickvonplaten! That explanation makes a lot of sense~

vvvm23 closed this as completed on Aug 31, 2022