🚨🚨🚨 [Trainer] Default to FSDP2, simplify API around fsdp + fsdp_config#45640
Open
Conversation
- `TrainingArguments.fsdp` is now a boolean on-switch. String/list values are still accepted for backward compatibility but are translated into `fsdp_config` entries (and emit a deprecation warning).
- `fsdp_config` now defaults to FSDP2 (version 2). FSDP1 usage still works but logs a deprecation warning (to be removed in v5.20).
- New `fsdp_config` keys: `auto_wrap_policy` (default `TRANSFORMER_BASED_WRAP`), `cpu_offload`, and `state_dict_type` (default `FULL_STATE_DICT`, so `trainer.save_model()` produces an HF-compatible checkpoint out of the box).
- FSDP1-only handling (`forward_prefetch`, `backward_prefetch`, `use_orig_params`, `sync_module_states`, and the string form of `reshard_after_forward`) is now isolated in a single branch for easy removal once v1 support is dropped. Legacy `fsdp` string/list parsing lives in a single `_apply_legacy_fsdp_to_config` helper for the same reason.
- Trainer reads `args.fsdp` / `args.fsdp_config` behind guards, so running without FSDP (or via `accelerate launch` with no transformers-side config) no longer crashes on `None`.
- The docstring is rewritten around `fsdp_config`; FSDP2-only keys are surfaced, and FSDP1-only keys (`sync_module_states`, `use_orig_params`, `limit_all_gathers`) are dropped from the public docstring.

Verified on the repro from PR #42521: `TrainingArguments(fsdp=True, fsdp_config={"fsdp_version": 2, "reshard_after_forward": True})` now produces `fsdp_version=2` and `reshard_after_forward=True`.
Member
Author
@bot /style
- `args.fsdp` is already a bool after `_process_fsdp_args`, so wrapping it with `bool(...)` in trainer.py was redundant.
- Removed the explanatory comment above the `state_dict_type` default; the docstring already covers it.
Drop the post-accelerator `setattr` loop that re-pushed these two keys onto the already-constructed FSDP plugin; they are now forwarded during plugin construction via `fsdp_plugin_args`:
- `activation_checkpointing` is shared between FSDP1 and FSDP2.
- `limit_all_gathers` is FSDP1-only (obsolete in FSDP2), so it lives in the v1 branch.

The `activation_checkpointing` + `gradient_checkpointing` conflict check stays in trainer.py (it still needs the post-plugin state to compare).
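The conflict check that stays in trainer.py might be sketched as follows. The function name, the dict-shaped argument, and the error message are assumptions for illustration; only the rule itself (the two checkpointing mechanisms are mutually exclusive) comes from the comment above.

```python
def check_activation_checkpointing_conflict(fsdp_plugin_args: dict, gradient_checkpointing: bool) -> None:
    """Refuse to run with both checkpointing mechanisms enabled at once.

    Hypothetical sketch: enabling FSDP's `activation_checkpointing` together
    with Trainer-level `gradient_checkpointing` would wrap the model's
    checkpointed regions twice.
    """
    if fsdp_plugin_args.get("activation_checkpointing") and gradient_checkpointing:
        raise ValueError(
            "`activation_checkpointing` in fsdp_config and Trainer's "
            "`gradient_checkpointing` cannot both be enabled; choose one."
        )
```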
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
stevhliu
approved these changes
Apr 24, 2026
Member
stevhliu
left a comment
thanks! just a couple minor comments to use active voice :)
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
…dp-v2-default-cleanup
Member
Author
@bot /style
Contributor
Style fix bot fixed some files and pushed the changes.
What does this PR do?
This PR defaults the FSDP version to 2. We clean up the docstring to show only FSDPv2-related args, and I separated the FSDPv1 and FSDPv2 argument logic in the codebase. I've also added deprecation messages for FSDPv1, and we will deprecate passing strings in the `fsdp` arg.
This shouldn't impact users who rely on an accelerate config to specify their FSDP-related args.
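The defaulting and deprecation behavior described above might be sketched roughly like this. This is a hypothetical illustration: the v1-only key names and the v5.20 removal target come from the PR summary, but the function itself is an assumption, not the actual transformers code.

```python
import warnings

# FSDP1-only keys that FSDP2 has no use for (per the PR summary).
FSDP1_ONLY_KEYS = ("forward_prefetch", "backward_prefetch", "use_orig_params", "sync_module_states")

def normalize_fsdp_config(fsdp_config=None):
    """Default to FSDP2 and warn if the user explicitly asks for FSDP1."""
    config = dict(fsdp_config or {})
    # FSDP2 is the default when no version is given.
    if config.setdefault("fsdp_version", 2) == 1:
        warnings.warn(
            "FSDP1 is deprecated and support will be removed in v5.20; "
            "set fsdp_version=2 in fsdp_config.",
            FutureWarning,
        )
    else:
        # Under FSDP2, silently drop the v1-only keys.
        for key in FSDP1_ONLY_KEYS:
            config.pop(key, None)
    return config
```

Users going through an accelerate config are untouched because this normalization only runs on the transformers-side `fsdp_config` dict.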