How many parameters? #6
I don't know how many parameters it has, but there is no way it can run on a 3090 Ti; it only has 24 GB of VRAM. Maybe, just maybe, an A100 with 80 GB can.
@orenong I was wondering how much DeepSpeed could lower the VRAM usage.
According to the paper, the decoder is 3.5 billion parameters (Appendix C, Table 3): 1.2B for the text model and 2.3B for the vision model, which is not too bad actually. Then they also have two upsamplers (64^2 -> 256^2 and 256^2 -> 1024^2), both with fewer parameters: the 64->256 one is 700M and the 256->1024 one is 300M. They also have two different models for the CLIP embedding prior, each about a billion parameters. I think if they do release it, you might actually be able to run it on a 3090 if you run each model one at a time and do a lot of other tricks to reduce VRAM use.
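As a back-of-the-envelope check, here is a minimal sketch. Assumptions: the per-model sizes quoted above from Appendix C, weights-only memory, and standard fp32/fp16 byte widths; activations and intermediate buffers are ignored, so real usage would be higher.

```python
# Rough weights-only VRAM estimate for the DALL-E 2 components
# listed above. Ignores activations, so this is a lower bound.

params_b = {
    "prior models (2 x ~1B)": 2.0,
    "decoder (1.2B text + 2.3B vision)": 3.5,
    "upsampler 64 -> 256": 0.7,
    "upsampler 256 -> 1024": 0.3,
}

for dtype, nbytes in {"fp32": 4, "fp16": 2}.items():
    total_gb = sum(params_b.values()) * nbytes  # all models resident at once
    peak_gb = max(params_b.values()) * nbytes   # one model loaded at a time
    print(f"{dtype}: all resident ~{total_gb:.1f} GB, "
          f"largest single model ~{peak_gb:.1f} GB")
```

On these numbers, everything resident at once is roughly 13 GB of fp16 weights (26 GB in fp32), and the largest single component (the 3.5B decoder) is about 7 GB in fp16, so swapping models in and out of a 24 GB card looks plausible for weights alone; activation memory at 1024x1024 is the real unknown.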
At least this model fits on an A100 with very little effort. If it were a massive model like GPT-3 or PaLM, doing research with it would be next to impossible.
Sorry to ask, but DALL-E v1 has 12 billion parameters; however, it is unclear how many parameters DALL-E v2 has.
I'm also wondering whether inference can be run on a single 3090 Ti GPU, or in other words, whether consumers will be able to use it on realistic hardware. If not, then you should consider leveraging https://github.com/microsoft/DeepSpeed
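For reference, DeepSpeed's inference wrapper is roughly a one-call change around an existing PyTorch module. This is a minimal sketch, not DALL-E 2 specific: `load_model()` and `inputs` are hypothetical placeholders since the model is unreleased; only `deepspeed.init_inference` and the arguments shown are actual DeepSpeed API.

```python
import torch
import deepspeed

# Hypothetical loader -- DALL-E 2 is not released, so substitute
# any torch.nn.Module you actually have.
model = load_model()

# Wrap the model for inference: fp16 weights halve memory vs fp32,
# and DeepSpeed injects fused kernels for layers it recognizes.
engine = deepspeed.init_inference(
    model,
    mp_size=1,                        # single GPU, no tensor parallelism
    dtype=torch.half,
    replace_with_kernel_inject=True,
)

with torch.no_grad():
    out = engine(inputs)  # `inputs` prepared however the model expects
```

Note that kernel injection only helps for architectures DeepSpeed recognizes; for an arbitrary model, the fp16 cast is the main memory win, and offloading (e.g. ZeRO-style CPU offload) would be needed to go below the weight footprint.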