Multi-LoRA SFT support FSDP2 #155
Conversation
Code Review
This pull request implements FSDP2 support for MultiLoraTransformersModel by integrating it into the shared strategy and lazy-wrap lifecycle and introducing sharding-aware parameter access helpers. Review feedback identifies critical bugs in the distributed tensor handling: _write_param_tensor may incorrectly double-shard local data, set_state_dict risks shape mismatches when applying global state to local shards, and get_state_dict returns sharded tensors that could lead to corrupt checkpoints. Furthermore, the model's initialization should be refactored to properly use the parent class, and internal imports should be moved to the module level.
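The checkpoint concern above (get_state_dict returning sharded tensors) is usually fixed by gathering each sharded parameter to its full global tensor before the state dict leaves the model. A minimal sketch of that pattern is below; `FakeDTensor` and `gather_state_dict` are illustrative stand-ins, not the PR's actual API — on a real FSDP2 model you would call `torch.distributed.tensor.DTensor.full_tensor()`, which all-gathers the shards across ranks.

```python
class FakeDTensor:
    """Stand-in for a DTensor: holds this rank's local shard plus the global data."""
    def __init__(self, local, full):
        self.local = local
        self._full = full

    def full_tensor(self):
        # The real DTensor.full_tensor() all-gathers shards across ranks;
        # here we just return the pre-stored global data.
        return self._full


def gather_state_dict(state_dict):
    """Return a state dict whose values are full (unsharded) tensors.

    Saving the raw sharded values instead would produce a checkpoint
    containing only one rank's slice of each parameter.
    """
    gathered = {}
    for name, tensor in state_dict.items():
        if hasattr(tensor, "full_tensor"):   # sharded parameter
            gathered[name] = tensor.full_tensor()
        else:                                # already a plain local tensor
            gathered[name] = tensor
    return gathered
```

A checkpoint produced from `gather_state_dict(...)` then loads with standard, non-distributed loaders, which is the compatibility requirement the review raises.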
I'd love to have this feature! Just curious — why was this PR changed to draft? Any other plans in the works?
Because it's still a work in progress; once finished, it will be merged into the main branch.
/gemini review |
Code Review
This pull request implements distributed training support for MultiLora, introducing helper methods for tensor sharding and updating model lifecycle methods to handle distributed contexts. The review identified critical issues where local shards are incorrectly processed as global tensors, potentially leading to corrupted weights during sharding and incomplete state dicts. Feedback emphasizes the need to gather tensors before saving or returning them to ensure compatibility with standard loaders and correct distributed behavior.
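The double-sharding and shape-mismatch bugs called out here (in _write_param_tensor and set_state_dict) come from treating a local shard as if it were the global tensor, or vice versa. A hedged sketch of the intended behavior follows: shard an incoming tensor only when it is actually global, and copy already-local data as-is. The helper names, the dim-0 chunking, and the length-based shard detection are assumptions for illustration; FSDP2's real per-parameter sharding uses DTensor placements, not list lengths.

```python
def shard_slice(global_len, rank, world_size):
    """Half-open [start, stop) range of dim-0 rows owned by `rank`.

    Uses ceil-division chunking so earlier ranks absorb any remainder,
    mirroring how FSDP2 chunks parameters along dim 0.
    """
    per_rank = (global_len + world_size - 1) // world_size
    start = min(rank * per_rank, global_len)
    stop = min(start + per_rank, global_len)
    return start, stop


def write_param_tensor(local_shard, src, rank, world_size):
    """Copy `src` into `local_shard`, sharding only if `src` is global."""
    if len(src) == len(local_shard):
        # src is already this rank's local shard: copy as-is. Slicing it
        # again here would be the double-sharding bug the review flags.
        local_shard[:] = src
    else:
        # src is the global tensor: take only this rank's slice, and fail
        # loudly on a mismatch instead of silently corrupting weights.
        start, stop = shard_slice(len(src), rank, world_size)
        if stop - start != len(local_shard):
            raise ValueError("global tensor does not match local shard shape")
        local_shard[:] = src[start:stop]
```

The same distinction applies to set_state_dict: a global state dict must be sliced per rank before being copied into local shards, while a state dict that was already saved per-rank must be copied through unchanged.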
PR type
PR information
Multi-LoRA support FSDP2