Confusion about DeepSpeed Inference #879

Open
ZekaiGalaxy opened this issue Mar 25, 2024 · 1 comment

@ZekaiGalaxy

Hi, I read the DeepSpeed docs and have the following points of confusion:

(1) What's the difference between these methods when running inference with LLMs? (Rough sketches of each are below.)

a. deepspeed.initialize, then write code to generate text

b. deepspeed.init_inference, then write code to generate text

c. use MII for inference
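
For concreteness, here is a minimal sketch of what the three entry points typically look like. It assumes a Hugging Face causal LM; the model id, prompts, and config values are placeholders, not recommendations:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder; any HF causal LM


def generate(model, tokenizer, prompt="DeepSpeed is"):
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    return tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0])


def path_a_initialize():
    # (a) deepspeed.initialize is the *training* entry point. For inference
    # it is mainly useful for ZeRO-3 parameter partitioning ("ZeRO-Inference");
    # generation then runs through engine.module as usual.
    model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16)
    ds_config = {
        "train_batch_size": 1,
        "fp16": {"enabled": True},
        "zero_optimization": {"stage": 3},
    }
    engine, *_ = deepspeed.initialize(model=model, config=ds_config)
    engine.module.eval()
    return engine.module


def path_b_init_inference():
    # (b) deepspeed.init_inference is the dedicated inference entry point:
    # optimized kernel injection, plus tensor (model) parallelism when the
    # parallel degree is set above 1.
    model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.float16)
    return deepspeed.init_inference(model, dtype=torch.float16,
                                    replace_with_kernel_inject=True)


def path_c_mii():
    # (c) DeepSpeed-MII is a higher-level layer that handles model loading,
    # parallelism, and batching behind a pipeline/serving API.
    import mii
    pipe = mii.pipeline(MODEL)
    return pipe(["DeepSpeed is"], max_new_tokens=32)
```

Roughly: (a) reuses the training engine and is interesting mainly for ZeRO-style partitioning and offload, (b) is the purpose-built inference engine, and (c) is built on top of DeepSpeed inference and adds serving conveniences.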

(2) Which of them are memory-friendly? For example, if I want to run inference on a 70B model, which of them support model parallelism that partitions the model parameters across GPUs?
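
As a concrete illustration of the parameter-partitioning question, this is roughly what tensor-parallel sharding looks like with deepspeed.init_inference. A sketch only, with a placeholder model id and an assumed 2-GPU launch:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Run under the DeepSpeed launcher, e.g.: deepspeed --num_gpus 2 shard.py
# mp_size (spelled tensor_parallel={"tp_size": ...} in newer releases) splits
# each layer's weight matrices across the GPUs, so no single GPU has to hold
# the full parameter set.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder model id
    torch_dtype=torch.float16,
)
model = deepspeed.init_inference(
    model,
    mp_size=2,                    # tensor-parallel degree = number of GPUs
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
```

One caveat: from_pretrained here still materializes the full fp16 checkpoint in host memory on each rank, so a 70B model needs substantial CPU RAM; init_inference can also load from a checkpoint description to avoid that, which is omitted here for brevity.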

(3) What's the current best practice for running inference on a 70B LLaMA? For example:

a. ZeRO-3 + CPU offload (1×A100)

b. ZeRO-3 (2×A100)

...
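
For reference, option (a) corresponds to a config along these lines; a minimal sketch assuming a Hugging Face checkpoint, with illustrative (untuned) values:

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM

# Illustrative ZeRO-3 config with parameter offload to CPU: parameters live
# in (pinned) host memory and are streamed to the GPU as layers execute.
ds_config = {
    "train_batch_size": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder model id
    torch_dtype=torch.float16,
)
engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.module.eval()  # then call generate() on engine.module as usual
```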

Thank you!


teis-e commented May 17, 2024

Hello, did you find an answer?
