Hi @jucic, thanks for sharing and for the discussion. I'm curious: which training script are you using?
I trained PixArt-alpha on my own dataset (about 30M images). Training at 256×256 resolution works normally, but at 512×512 and 1024×1024 the pipeline sometimes generates fully black images, like the one below. I generate 4 images per prompt; sometimes all four are black, sometimes only one or two of them. Does anybody know the reason? I deploy my own model by converting it to the diffusers format and replacing the transformer folder in the official Hugging Face repo. The training loss looks normal at every resolution.
![image](https://private-user-images.githubusercontent.com/17615552/329108450-232bd70f-e2f3-48ef-be78-1195096b9dfa.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwNDU0NzEsIm5iZiI6MTcyMDA0NTE3MSwicGF0aCI6Ii8xNzYxNTU1Mi8zMjkxMDg0NTAtMjMyYmQ3MGYtZTJmMy00OGVmLWJlNzgtMTE5NTA5NmI5ZGZhLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAzVDIyMTkzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWFmMTg4OTNkMzJjYTRmMmU5OGMxN2M5MWE5YTQ0ZmRlNzVjZDA2NzI5YmUxNWNiOTgxZDI0MzUxY2Q1YzE2OWImWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.b3E2U9nlyIKlrN-tJD_ko5FZk9CvIDAnQ5zjgU1EMlk)
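To quantify how often this happens, a simple check on the pipeline's output images helps. This is a hypothetical sketch: the helper name `is_all_black` and the toy images are mine, not from the original post; the pipeline returns PIL images, so the check only needs PIL.

```python
from PIL import Image

def is_all_black(img: Image.Image) -> bool:
    """True if every pixel is 0 (the symptom described above)."""
    lo, hi = img.convert("L").getextrema()  # (min, max) grayscale values
    return (lo, hi) == (0, 0)

# toy usage with in-memory images standing in for pipeline output
black = Image.new("RGB", (8, 8), (0, 0, 0))
normal = Image.new("RGB", (8, 8), (120, 30, 200))
print(is_all_black(black), is_all_black(normal))  # → True False
```

Running the same prompts with the pipeline loaded in `torch_dtype=torch.float32` and comparing the all-black rate is a quick way to tell whether fp16 overflow is involved.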
Update: please see the screenshot below. In PixArtAlphaPipeline, the initial latents and the noise_pred are normal for the first 2 or 3 denoising steps; after that, the noise_pred returned by self.transformer starts to contain NaN, which then propagates into the latents. It looks like numerical instability inside the transformer model (the diffusion transformer). How can I avoid this? So far it only happens after training for many epochs, e.g. 200.
![TcRbLQiYCu](https://private-user-images.githubusercontent.com/17615552/329746070-fa077b56-d31c-48d6-b29c-a4005711d045.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjAwNDU0NzEsIm5iZiI6MTcyMDA0NTE3MSwicGF0aCI6Ii8xNzYxNTU1Mi8zMjk3NDYwNzAtZmEwNzdiNTYtZDMxYy00OGQ2LWIyOWMtYTQwMDU3MTFkMDQ1LmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MDMlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzAzVDIyMTkzMVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTgxMmRlNjc1MTI1OTc5MTBlZDZmZWY0MDgxOTE0N2MxMGQ0MjgxNWI2YTk3MjdhMDk1ZTU4ZDk2NGM0NWEyZTkmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.90JgceF8eiHYjQv63IzPrdio7Lsrp0mqKygIb0iwDSU)
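To pin down exactly which denoising step introduces the NaNs, the latents can be collected per step (e.g. from a pipeline callback) and scanned afterwards. A minimal sketch, assuming you already have the per-step latents as a list of tensors; `first_nan_step` is a hypothetical helper, not part of diffusers:

```python
import torch

def first_nan_step(latents_per_step):
    """Return the index of the first denoising step whose latents
    contain NaN/Inf, or None if every step is finite."""
    for step, lat in enumerate(latents_per_step):
        if not torch.isfinite(lat).all():
            return step
    return None

# toy usage: step 2 introduces a NaN, mimicking the behavior described above
steps = [
    torch.zeros(2, 2),
    torch.ones(2, 2),
    torch.tensor([[float("nan"), 0.0], [0.0, 0.0]]),
]
print(first_nan_step(steps))  # → 2
```

If the NaNs always appear at the same early step, that points at an activation overflowing the fp16 range inside the transformer rather than at the scheduler or VAE.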