Detailed Training setting #26

xiuzbl · 2023-07-11T11:27:35Z

Hi, could you provide the detailed hyperparameters you used when training llama-13b? For example, how many GPUs did you use and of what kind, and what were the gradient accumulation steps and the batch size per GPU? Moreover, when I use your DeepSpeed config directly to initialize llama-7b with DeepSpeed on an 80G A100, the server reports a CUDA OOM error.

Looking forward to your reply.

Thank you so much!

fahadh4ilyas · 2023-11-08T06:22:47Z

Are you still getting OOM errors when fine-tuning? I kept getting them because of the size of the optimizer state.
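
For a rough sense of why the optimizer can dominate memory, here is a back-of-the-envelope sketch. It assumes mixed-precision Adam with the commonly cited 16 bytes per parameter (2 for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights, momentum, and variance), no ZeRO partitioning or offload, and it ignores activations entirely. The function name and the nominal parameter counts (7e9, 13e9) are illustrative assumptions, not values taken from this repository's config.

```python
def training_state_gib(num_params, bytes_weights=2, bytes_grads=2, bytes_optimizer=12):
    """Estimate GiB needed for weights, gradients, and Adam state on one GPU.

    Assumes mixed-precision Adam without ZeRO partitioning or offload;
    activation memory is not included.
    """
    total_bytes = num_params * (bytes_weights + bytes_grads + bytes_optimizer)
    return total_bytes / 2**30


if __name__ == "__main__":
    # Nominal parameter counts (assumed, rounded).
    for name, n in [("llama-7b", 7e9), ("llama-13b", 13e9)]:
        print(f"{name}: ~{training_state_gib(n):.0f} GiB before activations")
```

Under these assumptions, the weights, gradients, and optimizer state of a 7B model already exceed 80 GB on a single GPU, which would be consistent with the OOM reported above. Partitioning the state across GPUs with ZeRO, or offloading the optimizer to CPU, are the usual DeepSpeed options for reducing it.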

xiuzbl closed this as completed Jul 13, 2023
