
How many parameters? #6

Open
LifeIsStrange opened this issue Apr 13, 2022 · 4 comments

Comments

@LifeIsStrange

Sorry to ask, but DALL-E v1 has 12 billion parameters; however, it is unclear how many parameters DALL-E v2 has.
I'm also wondering whether inference can be run on a single 3090 Ti GPU, or, in other words, whether consumers will be able to use it on realistic hardware. If not, you should consider leveraging https://github.com/microsoft/DeepSpeed


orenong commented Apr 14, 2022

I don't know how many parameters it has, but there is no way it can run on a 3090 Ti; it only has 24 GB of VRAM. Maybe an A100 with 80 GB can.
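For a rough sense of the numbers being argued here: a model's weights alone need about (parameter count) × (bytes per parameter) of VRAM, before counting activations. A minimal sketch of that arithmetic (the `weight_vram_gib` helper is mine, not from any library):

```python
def weight_vram_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """VRAM needed for the weights alone, ignoring activations and caches.

    bytes_per_param: 2 for fp16/bf16 weights, 4 for fp32.
    """
    return n_params * bytes_per_param / 1024**3

# DALL-E v1's 12B parameters as a reference point:
print(weight_vram_gib(12e9))     # fp16: ~22.4 GiB, already tight on a 24 GiB 3090
print(weight_vram_gib(12e9, 4))  # fp32: ~44.7 GiB, beyond any consumer card
```

This is why the comparison to 24 GB matters: even in half precision, a 12B-parameter model leaves almost no headroom for activations on a 3090-class GPU.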

@LifeIsStrange

@orenong I was wondering how much DeepSpeed could lower the VRAM usage.
Also, RAM can be compressed (https://en.m.wikipedia.org/wiki/Zswap); can the same be achieved for VRAM?


tcl9876 commented Apr 16, 2022

According to the paper, the decoder is 3.5 billion parameters (Appendix C, Table 3): 1.2B for the text model and 2.3B for the vision model, which is not too bad actually. They also have two upsamplers (64^2 -> 256^2 and 256^2 -> 1024^2), both with fewer parameters: the 64->256 one is 700M and the 256->1024 one is 300M. And they had two different models for the CLIP embedding prior, each about a billion parameters.

I think if they do release it, you might actually be able to run it on a 3090 if you run each model once at a time and do a lot of other tricks to reduce RAM use.


INF800 commented Apr 19, 2022

At least this model fits in an A100 with relatively little effort. If it were a massive model like GPT-3 or PaLM, doing research with it would have been next to impossible.

