Hi @jucic, thanks for sharing and for the discussion. I'm curious: which training script are you using?
I trained PixArt-alpha on my own dataset (about 30M images). Training at 256×256 resolution works normally, but at 512×512 and 1024×1024 the pipeline sometimes generates fully black images, like the one below. I generate 4 images per prompt; sometimes all four are black, sometimes only one or two of them. Does anybody know the reason? I deploy my own model by converting it to the diffusers format and replacing the transformer folder in the official Hugging Face repo. The training loss looks normal at every resolution.
![image](https://private-user-images.githubusercontent.com/17615552/329108450-232bd70f-e2f3-48ef-be78-1195096b9dfa.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwNDU0NzEsIm5iZiI6MTcyMDA0NTE3MSwicGF0aCI6Ii8xNzYxNTU1Mi8zMjkxMDg0NTAtMjMyYmQ3MGYtZTJmMy00OGVmLWJlNzgtMTE5NTA5NmI5ZGZhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAzVDIyMTkzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWFmMTg4OTNkMzJjYTRmMmU5OGMxN2M5MWE5YTQ0ZmRlNzVjZDA2NzI5YmUxNWNiOTgxZDI0MzUxY2Q1YzE2OWImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.b3E2U9nlyIKlrN-tJD_ko5FZk9CvIDAnQ5zjgU1EMlk)
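To quantify how often this happens, a simple check on the pipeline's output images helps. This is a hypothetical sketch: the helper name `is_all_black` and the toy images are mine, not from the original post; the pipeline returns PIL images, so the check only needs PIL.

```python
from PIL import Image

def is_all_black(img: Image.Image) -> bool:
    """True if every pixel is 0 (the symptom described above)."""
    lo, hi = img.convert("L").getextrema()  # (min, max) grayscale values
    return (lo, hi) == (0, 0)

# toy usage with in-memory images standing in for pipeline output
black = Image.new("RGB", (8, 8), (0, 0, 0))
normal = Image.new("RGB", (8, 8), (120, 30, 200))
print(is_all_black(black), is_all_black(normal))  # → True False
```

Running the same prompts with the pipeline loaded in `torch_dtype=torch.float32` and comparing the all-black rate is a quick way to tell whether fp16 overflow is involved.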
Update: please see the screenshot below. In PixArtAlphaPipeline, the initial latents and the noise_pred are normal for the first 2 or 3 denoising steps; after that, the noise_pred returned by self.transformer starts to contain NaN, which then propagates into the latents. It looks like numerical instability inside the transformer model (the diffusion transformer). How can I avoid this? So far it only happens after training for many epochs, e.g. 200.
![TcRbLQiYCu](https://private-user-images.githubusercontent.com/17615552/329746070-fa077b56-d31c-48d6-b29c-a4005711d045.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwNDU0NzEsIm5iZiI6MTcyMDA0NTE3MSwicGF0aCI6Ii8xNzYxNTU1Mi8zMjk3NDYwNzAtZmEwNzdiNTYtZDMxYy00OGQ2LWIyOWMtYTQwMDU3MTFkMDQ1LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAzVDIyMTkzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTgxMmRlNjc1MTI1OTc5MTBlZDZmZWY0MDgxOTE0N2MxMGQ0MjgxNWI2YTk3MjdhMDk1ZTU4ZDk2NGM0NWEyZTkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.90JgceF8eiHYjQv63IzPrdio7Lsrp0mqKygIb0iwDSU)
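To pin down exactly which denoising step introduces the NaNs, the latents can be collected per step (e.g. from a pipeline callback) and scanned afterwards. A minimal sketch, assuming you already have the per-step latents as a list of tensors; `first_nan_step` is a hypothetical helper, not part of diffusers:

```python
import torch

def first_nan_step(latents_per_step):
    """Return the index of the first denoising step whose latents
    contain NaN/Inf, or None if every step is finite."""
    for step, lat in enumerate(latents_per_step):
        if not torch.isfinite(lat).all():
            return step
    return None

# toy usage: step 2 introduces a NaN, mimicking the behavior described above
steps = [
    torch.zeros(2, 2),
    torch.ones(2, 2),
    torch.tensor([[float("nan"), 0.0], [0.0, 0.0]]),
]
print(first_nan_step(steps))  # → 2
```

If the NaNs always appear at the same early step, that points at an activation overflowing the fp16 range inside the transformer rather than at the scheduler or VAE.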