Vqgan training #52

Open
wants to merge 42 commits into
base: main

isamu-isozaki
Collaborator

This is a draft PR that adds VQGAN training. It's still quite rough around the edges, but it might do OK after some bug fixes.

@isamu-isozaki
Collaborator Author

Tested it on random noise and it runs. I'll try adapting it to webdataset on some clusters and see how it does!

@isamu-isozaki
Collaborator Author

Thanks to LAION (Ryu), I found https://arxiv.org/abs/2212.03185, which improves on MoVQ.
The main ideas are:

  1. Add a perceptual loss from lower layers (which we are already doing).
  2. Use entropy maximization so that codebook usage is 100%.
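
A minimal sketch of how an entropy term can push codebook usage toward 100%, assuming soft code assignments obtained from a softmax over negative distances. This is one common formulation and not necessarily the exact loss used in the linked paper. The function names and the `temperature` parameter are hypothetical, and plain Python lists stand in for tensors.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p, eps=1e-12):
    # Shannon entropy (in nats) of a discrete distribution.
    return -sum(x * math.log(x + eps) for x in p)

def codebook_entropy_terms(distances, temperature=1.0):
    """distances[i][k] is the distance from encoder vector i to code k.

    Returns (mean per-sample entropy, entropy of the batch-averaged assignment).
    """
    probs = [softmax([-d / temperature for d in row]) for row in distances]
    per_sample = sum(entropy(p) for p in probs) / len(probs)
    num_codes = len(probs[0])
    avg = [sum(p[k] for p in probs) / len(probs) for k in range(num_codes)]
    return per_sample, entropy(avg)

def entropy_loss(distances, temperature=1.0):
    # Minimizing this keeps each assignment confident (low per-sample entropy)
    # while spreading usage across all codes (high batch-averaged entropy).
    per_sample, codebook = codebook_entropy_terms(distances, temperature)
    return per_sample - codebook

# Two vectors using two different codes vs. both collapsing onto code 0.
balanced = [[0.0, 10.0], [10.0, 0.0]]
collapsed = [[0.0, 10.0], [0.0, 10.0]]
print(entropy_loss(balanced), entropy_loss(collapsed))
```

The batch-averaged entropy reaches its maximum, `log(num_codes)`, only when every code is used equally often, which is why the collapsed batch gets the higher loss.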

@isamu-isozaki
Collaborator Author

I'm starting to add the Projected GAN technique from here. It still seems to be state of the art on quite a few datasets, even though it is from 2021. The main idea is that, instead of feeding images directly into the generator/discriminator, you feed in hierarchical features computed with timm, which makes training converge faster.

@isamu-isozaki
Collaborator Author

isamu-isozaki commented May 8, 2023

In other news, I was finally able to add the ImageNet training dataset to the cluster, so I will soon be testing the f16 pre-trained model with MoVQ and spectral norm added.

@isamu-isozaki
Collaborator Author

I'll add "Finite Scalar Quantization: VQ-VAE Made Simple", since that seems very interesting. It seems to lead to "Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation", which apparently achieves a better FID than diffusion models.
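
For reference, a rough sketch of the core FSQ step as I understand it from the paper: each latent channel is squashed with `tanh` into a fixed range and rounded to an integer level, so the codebook is implicit (the product of the per-channel level counts). The function names here are hypothetical, the code works on a single vector in plain Python, and the straight-through gradient used during training is left out.

```python
import math

def fsq_quantize(z, levels, eps=1e-3):
    """Quantize one latent vector z; channel i is rounded to one of levels[i] integers."""
    codes = []
    for value, num_levels in zip(z, levels):
        half_l = (num_levels - 1) * (1 - eps) / 2
        # Even level counts need a half-step offset so the grid has exactly num_levels points.
        offset = 0.5 if num_levels % 2 == 0 else 0.0
        shift = math.atanh(offset / half_l)
        bounded = math.tanh(value + shift) * half_l - offset
        codes.append(round(bounded))
    return codes

def fsq_index(codes, levels):
    """Map per-channel integer codes to a single token index (mixed-radix encoding)."""
    index, basis = 0, 1
    for code, num_levels in zip(codes, levels):
        index += (code + num_levels // 2) * basis  # shift code into 0..num_levels-1
        basis *= num_levels
    return index

levels = [3, 4]  # implicit codebook size is 3 * 4 = 12
codes = fsq_quantize([0.3, -1.2], levels)
print(codes, fsq_index(codes, levels))
```

Since there is no learned codebook to collapse, the codebook losses and entropy tricks discussed earlier in this thread should not be needed with this quantizer, which is part of what makes it attractive here.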
