
Training plans? #17

Closed
nbardy opened this issue Jun 21, 2023 · 122 comments

@nbardy
Contributor

nbardy commented Jun 21, 2023

I've got a bunch of compute for the next couple of weeks and I'm thinking of training this on LAION.

Wondering if there is any other training going on right now. Would hate to duplicate efforts too much.

@lucidrains
Owner

@nbardy where do you have the compute from? you should join the LAION discord and check to see first

i will be finishing the unconditional training code this week for starters, before the entire training code by end of month

@nbardy
Contributor Author

nbardy commented Jun 21, 2023

512 TPUv4 chips from a Google startup grant.

Didn't get any response in LAION when I asked. Looks like nothing going on yet.

@lucidrains
Owner

ohh sweet, though you probably should do it in jax? or has the state of pytorch xla improved?

@lucidrains
Owner

are you doing a startup? or working for a new one?

@francqz31

@nbardy I think you should just train it for the super-resolution upsampling task (128px to 4K), which is the highlight of the paper. GigaGAN's text-to-image is kinda meh and not that impressive.

What's impressive and holds the current SOTA in text-to-image is this project: https://raphael-painter.github.io/. It even beats Midjourney v5.1, is competitive with v5.2, and has efficient finetuning.
lucid might implement RAPHAEL and you might train it; that would be a far better idea than wasting all that compute on nothing.

@lucidrains
Owner

lucidrains commented Jun 23, 2023

@francqz31 oh nice, wasn't aware of raphael. there is no implementation yet?

@lucidrains
Owner

lucidrains commented Jun 23, 2023

@francqz31 i see, they just added a ton of mixture-of-experts. i have been meaning to open source ST-MoE on the language modeling front, so maybe this is good timing. also have a few ideas for improving PKM

@francqz31

@lucidrains Nope, there isn't. I asked one of the authors; he said something about releasing an API, but they will not open source it, that's 100% for sure. The downside of an API is that I don't think it will have fine-tuning. But yeah, overall they trained it on 1000 A100s for 2 months straight. If you implement it and nbardy trains it, it will be a huge leap for the open-source community.

@lucidrains
Owner

lucidrains commented Jun 23, 2023

@francqz31 i haven't dug into the paper yet, but i think there's basically nothing to it besides adding MoE and some hand-wavy stuff about each expert being a 'painter'. i just need to do to mixture-of-experts what i did to attention, and both language and generative image / video will naturally improve if one replaces the feedforwards with them
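Roughly, swapping a transformer block's feedforward for a mixture-of-experts looks like the sketch below: a top-1 routed MoE standing in for the usual feedforward. The sizes and the gating scheme are illustrative guesses, not code from ST-MoE or this repo.

```python
import torch
import torch.nn as nn

class MoEFeedForward(nn.Module):
    """Drop-in replacement for a transformer feedforward: each token is routed to one expert."""
    def __init__(self, dim, num_experts=4, mult=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * mult), nn.GELU(), nn.Linear(dim * mult, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # x: (batch, seq, dim) -> route each token to its top-1 expert
        b, n, d = x.shape
        flat = x.reshape(b * n, d)
        scores = self.gate(flat).softmax(dim=-1)      # (b*n, num_experts)
        top_score, top_idx = scores.max(dim=-1)
        out = torch.zeros_like(flat)
        for i, expert in enumerate(self.experts):
            mask = top_idx == i
            if mask.any():
                # scale by the gate score so the router also receives gradient
                out[mask] = expert(flat[mask]) * top_score[mask].unsqueeze(-1)
        return out.reshape(b, n, d)
```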

@lucidrains
Owner

@francqz31 it was on my plate anyways, since we now know GPT4 uses mixture of experts

@lucidrains
Owner

@francqz31 do correct me if i'm wrong about that paper. i will get around to reading it (too much in the queue)

@francqz31

@lucidrains that's my pleasure, I indeed will. I even took some prompts from RAPHAEL and compared it with Midjourney v5.2; it is almost the same if not even better. But in the paper they compare with v5.1,
like this for example with v5.1:

[image: get (57)]

Prompts, in order:

  1. A cute little matte low poly isometric cherry blossom forest island, waterfalls, lighting, soft shadows, trending on
    Artstation, 3d render, monument valley, fez video game
  2. A shanty version of Tokyo, new rustic style, bold colors with all colors palette, video game, genshin, tribe, fantasy,
    overwatch.
  3. Cartoon characters, mini characters, figures, illustrations, flower fairy, green dress, brown hair, curly long hair, elf-like
    wings, many flowers and leaves, natural scenery, golden eyes, detailed light and shadow , a high degree of detail.
  4. Cartoon characters, mini characters, hand-made, illustrations, robot kids, color expressions, boy, short brown hair, curly
    hair, blue eyes, technological age, cyberpunk, big eyes, cute, mini, detailed light and shadow, high detail.

@lucidrains
Owner

@francqz31 cool! yea, i guess this is yet another testament to using mixture-of-experts or conditional computation modules

@nbardy
Contributor Author

nbardy commented Jun 23, 2023

Definitely most interested in training the upscaler.

@lucidrains do you have an idea how much work is left for the upscaler code? Looking at the paper it seems pretty similar to the base unconditioned model with some tweaks.

although the paper is light on details about the upscaler

I’m still at the same startup, Facet.

Talking to the Google team and they said the performance is very similar between PyTorch and Jax now.

@nbardy
Contributor Author

nbardy commented Jun 23, 2023

@francqz31 thanks for sharing; it's too much work to implement and train a new model architecture on a short timeline. RAPHAEL does look quite interesting, although expensive to run inference with MoE.

I'm particularly interested in the OpenMUSE training going on.

@francqz31

@nbardy no problem, don't feel any pressure. Dr. Phil might just implement it and leave it for the open-source community, if anyone else is interested. Hopefully someone will be.

@francqz31

It is more than enough that you are willing to train the upsampler. It is not easy work, plus it is the most important thing in the paper.

@lucidrains
Owner

@nbardy i'll get to it soon, but like anything in open source, no promises on timeline

@francqz31 oh please, don't address me that way. got enough of that in med school

@nbardy
Contributor Author

nbardy commented Jun 23, 2023

Happy to jump in and help.

How up to date is the TODO list? You mentioned there is some work left on the unconditioned model code still.

@lucidrains
Owner

@nbardy yea, the plan of attack was going to be to wire up hf accelerate for unconditional, following their example here, then move on to conditional, before finally tackling the upsampler modifications
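For reference, wiring up Accelerate around an unconditional GAN step could look like the minimal sketch below. `Generator`, `Discriminator`, `dataloader`, and `latent_dim` are hypothetical placeholders, not the actual gigagan-pytorch classes or the example being followed.

```python
import torch
import torch.nn.functional as F
from accelerate import Accelerator

accelerator = Accelerator(mixed_precision="fp16")

G, D = Generator(), Discriminator()               # hypothetical generator / discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.9))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.9))

# accelerate handles device placement, DDP wrapping, and mixed precision
G, D, opt_g, opt_d, dataloader = accelerator.prepare(G, D, opt_g, opt_d, dataloader)

latent_dim = 512  # assumed latent size

for real in dataloader:
    noise = torch.randn(real.shape[0], latent_dim, device=accelerator.device)
    fake = G(noise)

    # discriminator step with hinge loss
    d_loss = F.relu(1 - D(real)).mean() + F.relu(1 + D(fake.detach())).mean()
    accelerator.backward(d_loss)
    opt_d.step(); opt_d.zero_grad()

    # generator step
    g_loss = -D(fake).mean()
    accelerator.backward(g_loss)
    opt_g.step(); opt_g.zero_grad()
```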

@lucidrains
Owner

@nbardy are you planning on open sourcing the final model, or is this for commercial purposes for Facet?

@francqz31

> @francqz31 do correct me if i'm wrong about that paper. i will get around to reading it (too much in the queue)

Ok, here is a quick rundown that I hacked together, because I read the paper before.

To implement the RAPHAEL model described in this paper, here are the main steps they used:

1. Data collection and preprocessing
   - They collect a large-scale dataset of text-prompt/image pairs. The paper uses LAION-5B of course, plus some internal datasets.
   - They preprocess the images and text by removing noise, resizing images, etc.
2. Model architecture
   - The model is based on a U-Net architecture with 16 transformer blocks.
   - Each block contains: a self-attention layer, a cross-attention layer over the text prompt, a space-Mixture-of-Experts (space-MoE) layer, a time-Mixture-of-Experts (time-MoE) layer, and an edge-supervised learning module.
3. Space-MoE
   - The space-MoE layer uses experts to model the relationship between text tokens and image regions.
   - A text gate network is used to assign text tokens to experts.
   - A thresholding mechanism is used to determine the correspondence between text tokens and image regions.
   - There are 6 space experts in each of the 16 transformer blocks.
4. Time-MoE (a rough sketch of both gating schemes follows below this list)
   - The time-MoE layer uses experts to handle different diffusion timesteps.
   - A time gate network is used to assign timesteps to experts.
   - There are 4 time experts.
5. Edge-supervised learning
   - Add an edge detection module to extract edges from the input image.
   - Supervise the model using these edges and a focal loss. Pause edge learning after a certain timestep threshold.
6. Training
   - They use the AdamW optimizer with learning rate 1e-4.
   - They train for 2 months on 1000 GPUs with a batch size of 2000 and 20000 warmup steps.
   - They combine a denoising loss and an edge-supervised loss.
   - Optional: use LoRA, ControlNet, or SR-GAN for additional controls or higher resolution.
   - I think they use a private tailor-made SR-GAN model too, not the public one, but that could be replaced by the GigaGAN upsampler ;).
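For concreteness, here is a rough PyTorch sketch of the time-MoE and space-MoE routing from steps 3 and 4, under my own assumptions (hard top-1 gating, a crude mean threshold on the cross-attention map, made-up module names and sizes); this is not the RAPHAEL authors' code.

```python
import torch
import torch.nn as nn

def feedforward(dim, mult=4):
    return nn.Sequential(nn.Linear(dim, dim * mult), nn.GELU(), nn.Linear(dim * mult, dim))

class TimeMoE(nn.Module):
    """Assign each sample to one of 4 experts based on its diffusion timestep embedding."""
    def __init__(self, dim, num_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(feedforward(dim) for _ in range(num_experts))

    def forward(self, x, time_emb):                       # x: (b, n, d), time_emb: (b, d)
        idx = self.gate(time_emb).argmax(dim=-1)          # (b,) hard assignment per sample
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out

class SpaceMoE(nn.Module):
    """Route image regions to one of 6 experts chosen per text token by a text gate."""
    def __init__(self, dim, num_experts=6):
        super().__init__()
        self.text_gate = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(feedforward(dim) for _ in range(num_experts))

    def forward(self, x, text_tokens, cross_attn):
        # x: (b, img, d), text_tokens: (b, txt, d), cross_attn: (b, img, txt)
        # threshold the cross-attention map to get each text token's image region
        region = cross_attn > cross_attn.mean(dim=1, keepdim=True)            # (b, img, txt)
        assign = self.text_gate(text_tokens).argmax(dim=-1)                   # (b, txt)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            token_is_mine = (assign == i).unsqueeze(1)                        # (b, 1, txt)
            pixel_mask = (token_is_mine & region).any(dim=-1, keepdim=True)   # (b, img, 1)
            out = out + expert(x) * pixel_mask
        return out
```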

@lucidrains
Owner

lucidrains commented Jun 23, 2023

@francqz31 thanks for the rundown!

yea, there is nothing surprising then. mostly more attention (transformer blocks), and the experts per diffusion timestep go back to eDiff from Balaji et al

the application of space and time MoE seems to be the main novelty, but that in itself is just porting over lessons from LLMs

@nbardy
Contributor Author

nbardy commented Jun 23, 2023

> @nbardy are you planning on open sourcing the final model, or is this for commercial purposes for Facet?

Got the all clear to open source the weights.

Might finetune on some proprietary data. But the base model trained on LAION we'd release.

@lucidrains
Owner

@nbardy awesome! i will prioritize this! expect me to power through it this weekend

@nbardy
Contributor Author

nbardy commented Jun 23, 2023

🥳

@lucidrains
Owner

didn't get to it this weekend 😢 caught up with some TTS work and Pride celebrations

going to work on it this morning!

@lucidrains
Owner

@nbardy the upsampler is nothing more than a unet with some high resolution downsampling layers removed, should be straightforward!
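Something like the minimal sketch below: an asymmetric U-Net whose decoder has more upsampling stages than the encoder has downsampling stages, so a 64px input comes out at 256px. Channel counts and names are made up for illustration, skip connections are omitted for brevity, and this is not the gigagan-pytorch implementation.

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.SiLU())

class UnetUpsampler(nn.Module):
    def __init__(self):
        super().__init__()
        # encoder: only 2 downsamples (the high-resolution downsampling layers are gone)
        self.enc1 = block(3, 64)
        self.down1 = nn.Conv2d(64, 128, 4, stride=2, padding=1)   # 64px -> 32px
        self.enc2 = block(128, 128)
        self.down2 = nn.Conv2d(128, 256, 4, stride=2, padding=1)  # 32px -> 16px
        self.mid = block(256, 256)
        # decoder: 4 upsamples, overshooting the input resolution by 4x
        self.ups = nn.ModuleList([
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1),  # 16px  -> 32px
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),   # 32px  -> 64px
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),    # 64px  -> 128px
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),    # 128px -> 256px
        ])
        self.to_rgb = nn.Conv2d(16, 3, 1)

    def forward(self, lowres):
        x = self.down2(self.enc2(self.down1(self.enc1(lowres))))
        x = self.mid(x)
        for up in self.ups:
            x = torch.relu(up(x))
        return self.to_rgb(x)

# UnetUpsampler()(torch.randn(1, 3, 64, 64)).shape -> (1, 3, 256, 256)
```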

@lucidrains
Owner

ok, got the unet upsampler to a decent place, will move onwards to unconditional training tomorrow, and by week's end, conditional + unet upsampler training

@nbardy
Contributor Author

nbardy commented Jun 28, 2023

Exciting progress.

Trying to start some jobs this week and there are no actually available TPUv4s. We have the quota, but the LLM teams must be taking them all. Yet to see if we actually have compute :( or if it's a mirage.

Probably willing to pay to scale up a smaller version of this. It looks like the compute budget isn't too high for the upscaler.

@lucidrains
Owner

@francqz31 nice find!

@nbardy
Contributor Author

nbardy commented Jul 18, 2023

I was not able to find the t_local and t_global sizes in the paper.

@nbardy
Contributor Author

nbardy commented Jul 18, 2023

Reading through training details. Some notes on datasets and models size from the paper.

> with the exception of the 128-to-1024 upsampler model trained on Adobe’s internal Stock images.

That is the 8x upsampler that gives the stunning results in the paper.

Unfortunately its hyperparameters are not in the paper, but I imagine it would be about the same size, maybe a little deeper to get some higher-resolution features. It should take less compute than the text-conditioned upscalers.

Also interesting:

> Additionally, we train a separate 256px class-conditional upsampler model and combine them with end-to-end finetuning stage.

Does this mean training the text->image and upsampler models in series for fine-tuning? I hadn't noticed that before.

@lucidrains
Owner

ok, finished the text-conditioning logic for both base and upsampler

going to start wiring up accelerate probably this afternoon (as well as some hparams for more efficient recon and multi-scale losses)

@lucidrains
Owner

will also aim to get the eval for both base and upsampler done, using what @CerebralSeed pull requested as a starting point. then we can see the GAN working for some toy datasets for unconditional training

@lucidrains
Owner

@nbardy or were you planning on doing the distributed stuff with accelerate + ray today? just making sure no overlapping work

@nbardy
Contributor Author

nbardy commented Jul 19, 2023

Thanks for all the great work.

I'm happy to take the distributed stuff from here. I was hoping to have a distributed run going today on the cluster, but only got a single chip running.
I have a couple of different training scripts on my fork; one of them uses Ray and Accelerate.

Just got a webdataset script working with the upsampler on the TPU chip. It was surprisingly a pain debugging webdataset pipe errors and setting up credentials.
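For reference, a minimal WebDataset pipeline for image/caption shards looks roughly like the sketch below; the shard URL and the `jpg`/`txt` key names are illustrative assumptions, not the actual script.

```python
import webdataset as wds
from torchvision import transforms

# hypothetical bucket path; brace expansion enumerates the shard tar files
shards = "pipe:gsutil cat gs://my-bucket/laion-shards/{00000..00999}.tar"

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
])

dataset = (
    wds.WebDataset(shards, handler=wds.warn_and_continue)  # skip corrupt samples instead of crashing
    .decode("pil")
    .to_tuple("jpg", "txt")
    .map_tuple(preprocess, lambda caption: caption)
    .batched(32)
)

loader = wds.WebLoader(dataset, batch_size=None, num_workers=4)

for images, captions in loader:
    pass  # feed the batch into the GigaGAN training step here
```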

@lucidrains
Owner

lucidrains commented Jul 19, 2023

@nbardy yea no problem, i know how it is. things are never straightforward in software

@CerebralSeed pull requested the sampler script and validated that the upsampler works! that should unblock you for your work

i'm going to give accelerate integration (sans ray, since i'm not familiar with it) a try today

@nbardy
Contributor Author

nbardy commented Jul 19, 2023

[image: early training samples]

Learning on the accelerated chips finally! Remarkably good results for being only 40 steps in. The last time I trained a GAN was a very long time ago.

Losses look stable.

[image: loss curves]

Looking at the XLA docs, trying to figure out the best way to network this with TPUs. Might just drop Ray 🤔 since I'm already checkpointing and tracking runs with W&B.

https://wandb.ai/nbardy-facet/gigagan/runs/zv9004dr?workspace=user-nbardy-facet

@nbardy
Contributor Author

nbardy commented Jul 20, 2023

Got started on XMP today. It's getting stuck on step 1. Most likely more device errors.

@nbardy
Contributor Author

nbardy commented Jul 20, 2023

Accelerate was giving bad crashes. Probably incompatible.

@nbardy
Contributor Author

nbardy commented Jul 20, 2023

I will talk more with Google tomorrow. They will most likely be able to help me sort this out by end of day tomorrow.

@lucidrains
Owner

@nbardy good to see some progress on your end!

for me, i was stuck on a bug in the base generator architecture, but finally got it working before bedtime

[image: sample-32]

i'm going to wire up accelerate this morning (this time for real lol) and try out that vision aided discriminator loss

@nbardy
Contributor Author

nbardy commented Jul 20, 2023

Training across 16 chips with XLA/XMP.

Logs (currently very slow because XLA is compiling the first steps and debug mode is on)
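For context, launching the training loop across the chips with torch_xla's multiprocessing looks roughly like the sketch below; `build_model`, `build_loader`, and `train_step` are hypothetical stand-ins for the real training code.

```python
import torch
import torch_xla.core.xla_model as xm
import torch_xla.distributed.parallel_loader as pl
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index):
    device = xm.xla_device()
    model = build_model().to(device)                      # hypothetical model builder
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
    loader = pl.MpDeviceLoader(build_loader(), device)    # hypothetical data loader

    for step, (images, captions) in enumerate(loader):
        loss = train_step(model, images, captions)        # hypothetical training step
        loss.backward()
        xm.optimizer_step(optimizer)                      # all-reduces grads across chips, then steps
        optimizer.zero_grad()
        if step % 100 == 0:
            xm.master_print(f"step {step} loss {loss.item():.4f}")

if __name__ == "__main__":
    xmp.spawn(_mp_fn, args=())                            # one process per local TPU chip
```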

@nbardy
Contributor Author

nbardy commented Jul 20, 2023

And they all crash at 30 minutes :(

@lucidrains
Owner

> And they all crash at 30 minutes :(

haha yea, expected this to be not that mature

they are basically exchanging free compute for free QA

today was much smoother sailing for me; accelerate and mixed precision is working for multi-gpu on my one machine!

@randintgenr

Hi Phil,

I have been using your implementation and noticed that subpixel upsampling is giving me lower generative performance.

It is introducing checkerboard artifacts that negatively affect the quality of the generated images. To address this, I have experimented with replacing subpixel convolution with Bilinear Upsampling, and it has yielded better results.

Also, the StyleGAN generator relies on maintaining unit variance for its feature activations for effective style mixing. It is unclear if the subpixel upsampling still leads to activations that are unit variance.

@lucidrains
Owner

> Hi Phil,
>
> I have been using your implementation and noticed that subpixel upsampling is giving me lower generative performance.
>
> It is introducing checkerboard artifacts that negatively affect the quality of the generated images. To address this, I have experimented with replacing subpixel convolution with Bilinear Upsampling, and it has yielded better results.
>
> Also, the StyleGAN generator relies on maintaining unit variance for its feature activations for effective style mixing. It is unclear if the subpixel upsampling still leads to activations that are unit variance.

hey yup! i was actually going to offer this as an option as i noticed the same

defaulted it to bilinear upsample for now, controllable with this option
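For illustration, the two choices discussed here look roughly like this; the `pixel_shuffle_upsample` flag name is made up, so check the repo for the actual option it exposes.

```python
import torch.nn as nn

def upsample_block(dim, dim_out, pixel_shuffle_upsample=False):
    if pixel_shuffle_upsample:
        # subpixel convolution: conv to 4x channels, then rearrange into 2x spatial resolution
        return nn.Sequential(
            nn.Conv2d(dim, dim_out * 4, 3, padding=1),
            nn.SiLU(),
            nn.PixelShuffle(2),
        )
    # bilinear upsample followed by a conv tends to avoid checkerboard artifacts
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        nn.Conv2d(dim, dim_out, 3, padding=1),
        nn.SiLU(),
    )
```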

@lucidrains
Owner

@randintgenr are you a computer vision researcher?

@lucidrains
Owner

almost done with the entire training code

@lucidrains
Owner

lucidrains commented Jul 23, 2023

ok, i think it is done, save for a few edge cases and cleanup

going to wind down work on this repo next week and move back to video gen

@lucidrains
Owner

closing, as code is there, and I know of a group moving forward with training already

@anandbhattad

Hey @lucidrains, have you heard anything about a timeline for the group that's currently training GigaGAN? I'd appreciate any information you have. Thank you!

@lucidrains
Owner

@anandbhattad yea they have proceeded, but this group will not be open sourcing it

@anandbhattad

@lucidrains, I appreciate your response. I was wondering if you knew the necessary computing power for training on the LAION-5B dataset. The paper lacks clear information on compute and time requirements for training the model (Table A2 is ambiguous). As I only have academic compute access, I am interested in exploring whether GigaGAN utilizes familiar rendering elements such as normals and depth, like we demonstrated for StyleGAN-2. Here's the link for more information: https://arxiv.org/abs/2306.00987

@CerebralSeed
Contributor

@nbardy would greatly appreciate it if you're able to share what image size and other settings you use, if you get anything that works at a size larger than 128px. TIA

@davizca

davizca commented Dec 19, 2023

@lucidrains I'm pretty sure that group is this one:
https://magnific.ai/

Or at least it seems so. If I had money and anything more than 24 GB of VRAM I would train this, but it's impossible for me, haha.

@topological-modular-forms

@nbardy Hi Nicholas! Do you still plan to train this model on LAION, or have any updates regarding it?
