[BUG] apply_tensor_parallelism() is not executed in Zero3 without self.mpu #4080
Comments
I'm hitting the same issue. When I call apply_tensor_parallelism() manually, I also get the wrong attention tensor size. In auto_tp.py I found that the tensor-parallel copy allocates the original size, not the TP-sharded size.
I'm hitting the same issue too. I tried disabling the `if` check; the log output points to /opt/conda/lib/python3.8/site-packages/deepspeed/ops/transformer/inference/ds_attention.py:117
@hxdtest The size-[0] tensor seems to be caused by the TP weights not being set properly. The original code does not keep the attention weights after …
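For context, here is a minimal sketch of the sharding behavior the commenters expect. This is a hypothetical helper, not DeepSpeed's actual auto_tp.py code: a tensor-parallel copy should allocate the per-rank shard size, not the original full size.

```python
# Hypothetical sharding helper (illustrative, not DeepSpeed's auto_tp.py).
import torch

def tp_shard(weight: torch.Tensor, rank: int, world_size: int, dim: int = 0) -> torch.Tensor:
    """Return this rank's shard of `weight`, split evenly along `dim`."""
    assert weight.size(dim) % world_size == 0, "dim must divide evenly across ranks"
    shard = torch.chunk(weight, world_size, dim=dim)[rank]
    # The shard's size along `dim` is original_size / world_size, not the original size.
    return shard.contiguous()

# e.g. a 4096x4096 attention projection sharded across 2 ranks along dim 0
w = torch.empty(4096, 4096)
print(tp_shard(w, rank=0, world_size=2).shape)  # torch.Size([2048, 4096])
```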
Describe the bug
In Hybrid Engine, apply_tensor_parallelism() is not called when the model's inference container requires tp > 1 but self.mpu is None. For example, for a large model under ZeRO-3, apply_tensor_parallelism() is never called.

Log output
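A hedged sketch of the control flow around the guard in question (paraphrased, not verbatim DeepSpeed source; the call arguments and config attribute names are assumptions): the TP sharding call sits behind an `if self.mpu is not None:` check, so a ZeRO-3 run that passes no mpu to deepspeed.initialize() never shards the inference containers.

```python
# Paraphrased control flow (assumption: simplified from hybrid_engine.py,
# not the verbatim source).
if self.mpu is not None:  # ZeRO-3 runs usually pass no mpu, so this is None
    for container in self._inference_containers:
        container.apply_tensor_parallelism(mp_replace)
# With mpu=None the branch above is skipped even when the hybrid-engine
# config requests inference_tp_size > 1, so weights keep their full,
# unsharded shapes. A possible fix (assumption, untested) would also enter
# the branch when TP is requested without an mpu, e.g.:
#   if self.mpu is not None or self._config.hybrid_engine.inference_tp_size > 1:
```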
To Reproduce
DeepSpeed/deepspeed/runtime/hybrid_engine.py, line 206 at commit a7fe3bc
To reproduce, run any large model, e.g. LLaMA 30B, with ZeRO-3 + Hybrid Engine.
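A minimal repro sketch, assuming the model name and config values are illustrative (the hybrid_engine config block follows the DeepSpeed-Chat style):

```python
# Repro sketch (assumption: model name, batch size, and TP size are illustrative).
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-30b")

ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {"stage": 3},
    "hybrid_engine": {"enabled": True, "inference_tp_size": 2},
}

# No mpu argument is passed to initialize(), so the hybrid engine sees
# self.mpu is None and skips apply_tensor_parallelism() despite
# inference_tp_size > 1.
engine, *_ = deepspeed.initialize(model=model, config=ds_config)
engine.eval()  # switching to inference mode exercises the hybrid-engine path
```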