Provide a better solution than the manually calculated vLLM sharding logic in sharding.py #174

vLLM emits a state dict mapping FQNs (fully qualified names) to tensors.
This state dict does not contain the sharding details of the tensors to be loaded (for example, it does not include the start index of each shard).
What we do right now is manually calculate the sharding details in sharding::_calculate_tensor_shard().
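
For illustration, a manual calculation of this kind might look roughly like the sketch below. It is not the actual body of sharding::_calculate_tensor_shard(); the names `ShardInfo` and `calculate_shard_info` are hypothetical, and the sketch assumes the tensor is split evenly along a single dimension across ranks.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ShardInfo:
    """Hypothetical record of where a local shard sits in the full tensor."""
    offsets: Tuple[int, ...]       # start index of the shard along each dimension
    local_shape: Tuple[int, ...]   # shape of the shard held by this rank
    global_shape: Tuple[int, ...]  # shape of the full, unsharded tensor


def calculate_shard_info(
    global_shape: Tuple[int, ...], shard_dim: int, rank: int, world_size: int
) -> ShardInfo:
    """Derive shard placement by hand.

    Assumption: an even split of `global_shape` along `shard_dim`.
    """
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is out of range for world size {world_size}")
    dim_size = global_shape[shard_dim]
    if dim_size % world_size != 0:
        raise ValueError(f"dimension of size {dim_size} does not split evenly")

    chunk = dim_size // world_size
    offsets = [0] * len(global_shape)
    offsets[shard_dim] = rank * chunk  # the shard start index missing from the state dict
    local_shape = list(global_shape)
    local_shape[shard_dim] = chunk
    return ShardInfo(tuple(offsets), tuple(local_shape), tuple(global_shape))


info = calculate_shard_info((4096, 1024), shard_dim=0, rank=1, world_size=4)
```

Every assumption baked into such a routine (which dimension is sharded, that the split is even) has to match what vLLM actually did when it loaded the model.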

This code is fragile (it runs outside vLLM's sharder and relies on a manual routine).
Ideally, we should get the sharding info when we call model.state_dict() on the vLLM-loaded model.
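
As a rough sketch of that goal, the state dict could carry each shard's placement alongside its data, so consumers never recompute it. Everything here is hypothetical and is not an existing vLLM API: `ShardedEntry`, `assemble`, and the FQN are made up, and nested lists stand in for tensors sharded along rows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ShardedEntry:
    """Hypothetical state-dict value: local data plus its placement."""
    local_rows: List[List[float]]  # stand-in for the local tensor shard
    row_offset: int                # start row of this shard in the full tensor
    global_rows: int               # number of rows in the full tensor


def assemble(entries: List[ShardedEntry]) -> List[List[float]]:
    """Rebuild the full tensor using only the metadata each shard carries."""
    full: List[Optional[List[float]]] = [None] * entries[0].global_rows
    for entry in entries:
        for i, row in enumerate(entry.local_rows):
            full[entry.row_offset + i] = row
    if any(row is None for row in full):
        raise ValueError("shards do not cover the full tensor")
    return [row for row in full if row is not None]


# One hypothetical state dict per rank, keyed by FQN.
rank0: Dict[str, ShardedEntry] = {
    "layers.0.weight": ShardedEntry([[1.0, 2.0], [3.0, 4.0]], row_offset=0, global_rows=4)
}
rank1: Dict[str, ShardedEntry] = {
    "layers.0.weight": ShardedEntry([[5.0, 6.0], [7.0, 8.0]], row_offset=2, global_rows=4)
}

# Order of arrival does not matter, because offsets travel with the data.
full_weight = assemble([rank1["layers.0.weight"], rank0["layers.0.weight"]])
```

With placement attached to each entry, the loader needs no knowledge of how the sharding was chosen.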
