
Description
I ran bloom-ds-inference.py to run inference on BLOOM 176B, and it worked. I then passed save_mp_checkpoint_path to deepspeed.init_inference to save presharded TP checkpoints. When I try to run inference again with tp_presharded_mode=True, loading from the presharded TP checkpoints, loading is faster but I hit an OOM inside deepspeed.init_inference.
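For reference, this is roughly how I set things up (a simplified sketch based on bloom-ds-inference.py; the paths, the checkpoints.json manifest name, and the world size of 16 are placeholders from my setup):

```python
import os

import deepspeed
import torch
from transformers import AutoConfig, AutoModelForCausalLM

world_size = int(os.getenv("WORLD_SIZE", "16"))
checkpoint_dir = "/path/to/presharded"  # placeholder path

config = AutoConfig.from_pretrained("bigscience/bloom")
# Instantiate the model on the meta device so no real weights are
# allocated before DeepSpeed loads and shards them.
with deepspeed.OnDevice(dtype=torch.float16, device="meta"):
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16)
model = model.eval()

# First run: load the original HF weights and write presharded
# TP checkpoints to checkpoint_dir.
model = deepspeed.init_inference(
    model,
    mp_size=world_size,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
    checkpoint="checkpoints.json",  # manifest listing the original weight files
    save_mp_checkpoint_path=checkpoint_dir,
)

# Later runs: point `checkpoint` at the config written next to the
# presharded shards instead (this is the call that OOMs for me):
# model = deepspeed.init_inference(
#     model,
#     mp_size=world_size,
#     dtype=torch.float16,
#     replace_with_kernel_inject=True,
#     checkpoint=os.path.join(checkpoint_dir, "ds_inference_config.json"),
# )
```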
How can I reduce memory usage with fp16?
I am using 16 × V100 (32GB) GPUs with fp16.
Thanks!