Find a better solution than the manually calculated vLLM sharding logic in sharding.py #174

@pradeepfn

Description

vLLM emits a state dict that maps each FQN (fully qualified name) to a tensor.
This state dict does not contain the sharding details of the tensors to be loaded; in particular, it does not include the starting index of each shard.
Right now, we calculate the sharding details manually in `sharding::_calculate_tensor_shard()`.
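To make the problem concrete, here is a minimal sketch of the kind of calculation a manual routine has to perform. It assumes an even split of one dimension across tensor-parallel ranks; the function name and parameters are hypothetical and this is not the actual code in `sharding::_calculate_tensor_shard()`.

```python
from typing import Tuple


def calculate_tensor_shard(
    global_shape: Tuple[int, ...],
    shard_dim: int,
    rank: int,
    world_size: int,
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return (local_shape, global_offset) for one rank's shard.

    Hypothetical illustration: assumes the tensor is split evenly along a
    single dimension, which the caller must know in advance.
    """
    dim_size = global_shape[shard_dim]
    if dim_size % world_size != 0:
        raise ValueError(
            f"dim {shard_dim} of size {dim_size} is not divisible by {world_size}"
        )
    shard_size = dim_size // world_size

    # The local shard has the same shape except along the sharded dimension.
    local_shape = list(global_shape)
    local_shape[shard_dim] = shard_size

    # The shard starts at rank * shard_size along the sharded dimension.
    offset = [0] * len(global_shape)
    offset[shard_dim] = rank * shard_size
    return tuple(local_shape), tuple(offset)


# Rank 1 of 4, splitting dim 0 of a (4096, 1024) weight.
local_shape, offset = calculate_tensor_shard((4096, 1024), shard_dim=0, rank=1, world_size=4)
print(local_shape, offset)  # (1024, 1024) (1024, 0)
```

Every assumption baked in here (which dimension is sharded for each parameter, and that the shard is a single contiguous, even chunk) duplicates a decision vLLM's own parallel layers already make. Fused weights such as a merged QKV projection, for example, hold a slice of each of q, k and v per rank rather than one contiguous chunk of the fused tensor, so a single even split like this cannot describe them.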

This code is fragile: it reimplements the sharding outside of vLLM's own sharder with a separate manual routine.
Ideally, we should get the sharding info directly when we call `model.state_dict()` on the vLLM-loaded model.
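One possible shape for that information, sketched under the assumption that each state-dict entry could carry its own shard metadata alongside the local tensor. `ShardInfo` and `ShardedEntry` are hypothetical names; vLLM's `state_dict()` does not return anything like this today.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class ShardInfo:
    """Hypothetical per-tensor sharding metadata."""

    global_shape: Tuple[int, ...]
    global_offset: Tuple[int, ...]
    local_shape: Tuple[int, ...]

    def __post_init__(self) -> None:
        # The shard must fit inside the full (unsharded) tensor.
        for g, o, n in zip(self.global_shape, self.global_offset, self.local_shape):
            if o < 0 or o + n > g:
                raise ValueError(f"shard [{o}, {o + n}) exceeds dim of size {g}")

    def slices(self) -> Tuple[slice, ...]:
        """Index ranges this shard occupies in the full tensor."""
        return tuple(slice(o, o + n) for o, n in zip(self.global_offset, self.local_shape))


@dataclass
class ShardedEntry:
    tensor: Any  # the local tensor (e.g. a torch.Tensor); treated as opaque here
    shard: ShardInfo


# Example: one rank's slice of a weight sharded along dim 1.
info = ShardInfo(global_shape=(1024, 4096), global_offset=(0, 1024), local_shape=(1024, 1024))
entry = ShardedEntry(tensor=None, shard=info)
print(entry.shard.slices())  # (slice(0, 1024, None), slice(1024, 2048, None))
```

With metadata like this supplied by vLLM itself, a consumer could place a local shard with `full[entry.shard.slices()] = entry.tensor` (tensor indexing with a tuple of slices) without knowing anything about which layers are column- or row-parallel.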
