🐛 Describe the bug
Hi, I am trying to implement a custom shard policy with a different layer distribution, but it seems all built-in policies share the following inconsistent implementation:
In get_held_layers(), a policy uses self.distribute_layers() and self.get_stage_index(), which are customizable:
ColossalAI/colossalai/shardformer/policies/gpt2.py, lines 170 to 175 at commit 79718fa
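For reference, the code at those lines looks roughly like the following paraphrase (not verbatim; `module.h` here is the list of GPT-2 transformer blocks):

```python
# Inside GPT2Policy.get_held_layers(): dispatches through `self`,
# so a subclass override of either method takes effect here.
layers_per_stage = self.distribute_layers(len(module.h), stage_manager.num_stages)
start_idx, end_idx = self.get_stage_index(layers_per_stage, stage_manager.stage)
held_layers.extend(module.h[start_idx:end_idx])
```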
But in set_pipeline_forward(), the policy uses Policy.distribute_layers() and Policy.get_stage_index():
ColossalAI/colossalai/shardformer/policies/gpt2.py, lines 192 to 193 at commit 79718fa
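whereas the code at these lines calls the base class directly (again a paraphrase, not verbatim):

```python
# Inside GPT2Policy.set_pipeline_forward(): calls the Policy base class
# explicitly, so an override in a custom subclass is bypassed here.
layers_per_stage = Policy.distribute_layers(len(module.h), stage_manager.num_stages)
stage_index = Policy.get_stage_index(layers_per_stage, stage_manager.stage)
```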
which will raise an error during the pipeline forward pass, due to the inconsistent layer assignment, if those functions are overridden.
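The divergence is easy to demonstrate in isolation. Below is a minimal, self-contained sketch (hypothetical names; it assumes both helpers are staticmethods, as the `Policy.*` call style suggests):

```python
class Policy:
    @staticmethod
    def distribute_layers(num_layers: int, num_stages: int) -> list:
        # Default: near-even split, remainder going to the earliest stages.
        q, r = divmod(num_layers, num_stages)
        return [q + (1 if i < r else 0) for i in range(num_stages)]

class CustomPolicy(Policy):
    @staticmethod
    def distribute_layers(num_layers: int, num_stages: int) -> list:
        # Override: the first stage takes 4 extra layers.
        split = Policy.distribute_layers(num_layers - 4, num_stages)
        split[0] += 4
        return split

policy = CustomPolicy()
print(policy.distribute_layers(12, 2))  # [8, 4] -- the get_held_layers() path
print(Policy.distribute_layers(12, 2))  # [6, 6] -- the set_pipeline_forward() path
```

Each stage therefore holds the parameters for one split but runs its forward over the other.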
How to reproduce
I tested with examples/language/gpt/hybridparallelism/finetune.py. For the hybrid_parallel plugin, add a custom policy (sketched below) which distributes layers in a slightly different way: the first stage holds 4 more layers.
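The original snippet is not reproduced above, so here is a hypothetical reconstruction under stated assumptions: the base class (`GPT2ForSequenceClassificationPolicy`, matching the finetuning example's model), the `custom_policy` plugin argument, and the `distribute_layers` signature are inferred rather than confirmed.

```python
from colossalai.shardformer.policies.gpt2 import GPT2ForSequenceClassificationPolicy

class UnevenGPT2Policy(GPT2ForSequenceClassificationPolicy):  # hypothetical name
    @staticmethod
    def distribute_layers(num_layers, num_stages):
        # Skew the default near-even split: the first stage gets 4 extra layers.
        q, r = divmod(num_layers - 4, num_stages)
        split = [q + (1 if i < r else 0) for i in range(num_stages)]
        split[0] += 4
        return split

# Assumed wiring (not verbatim from the original report):
# plugin = HybridParallelPlugin(..., custom_policy=UnevenGPT2Policy())
```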
This leads to the following error:
...
File "/usr/local/lib/python3.10/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 312, in forward
query, key, value = self.c_attn(hidden_states).split(self.split_size, dim=2)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/transformers/pytorch_utils.py", line 107, in forward
x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
TypeError: addmm(): argument 'input' (position 1) must be Tensor, not NoneType
Presumably the stage runs its forward over blocks that get_held_layers() never marked as held, so their parameters were released and self.bias is None by the time c_attn is called.
Environment
torch 2.1.0 + cu118