
Description
I ran bloom-ds-inference.py to run inference on BLOOM 176B, and it worked. I then passed save_mp_checkpoint_path to deepspeed.init_inference to save presharded TP checkpoints. When I try to run inference again with tp_presharded_mode=True, loading from the presharded TP checkpoints, loading is faster but I hit an OOM inside deepspeed.init_inference.
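For reference, this is roughly how I set things up (a simplified sketch based on bloom-ds-inference.py; the paths, the checkpoints.json manifest name, and the world size of 16 are placeholders from my setup):

```python
import os

import deepspeed
import torch
from transformers import AutoConfig, AutoModelForCausalLM

world_size = int(os.getenv("WORLD_SIZE", "16"))
checkpoint_dir = "/path/to/presharded"  # placeholder path

config = AutoConfig.from_pretrained("bigscience/bloom")
# Instantiate the model on the meta device so no real weights are
# allocated before DeepSpeed loads and shards them.
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)
model = model.eval()

# First run: load the original HF weights and write presharded
# TP checkpoints to checkpoint_dir.
model = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    checkpoint="checkpoints.json",  # manifest listing the original weight files
    save_mp_checkpoint_path=checkpoint_dir,
)

# Later runs: point `checkpoint` at the config written next to the
# presharded shards instead (this is the call that OOMs for me):
# model = deepspeed.init_inference(
#     model,
#     mp_size=world_size,
#     dtype=torch.float16,
#     replace_with_kernel_inject=True,
#     checkpoint=os.path.join(checkpoint_dir, "ds_inference_config.json"),
# )
```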
How can I reduce memory usage with fp16?
I am using 16 × V100 (32GB) GPUs with fp16.
Thanks!