🚨🚨🚨 [Trainer] Default to FSDP2, simplify API around fsdp + fsdp_config#45640

Open
SunMarc wants to merge 9 commits into main from fsdp-v2-default-cleanup

Conversation

@SunMarc SunMarc commented Apr 24, 2026

What does this PR do?

This PR defaults the FSDP version to 2. We clean up the docstring to only show FSDP2-related args, and I separated the FSDP1 and FSDP2 arg logic in the codebase. I've also added a number of deprecation messages for FSDP1, and we will deprecate passing strings in fsdp args.

This shouldn't impact users who rely on an accelerate config to specify their FSDP-related args.

- TrainingArguments.fsdp is now a boolean on-switch. String / list values
  are still accepted for backward compatibility but are translated into
  fsdp_config entries (and emit a deprecation warning).
- fsdp_config now defaults to FSDP2 (version=2). FSDP1 usage still works
  but logs a deprecation warning (to be removed in v5.20).
- New fsdp_config keys: auto_wrap_policy (default TRANSFORMER_BASED_WRAP),
  cpu_offload, state_dict_type (default FULL_STATE_DICT so
  trainer.save_model() produces an HF-compatible checkpoint out of the
  box).
- FSDP1-only handling (forward_prefetch, backward_prefetch, use_orig_params,
  sync_module_states, and the string form of reshard_after_forward) is
  now isolated in a single branch for easy removal once v1 support is
  dropped. Legacy `fsdp` string/list parsing lives in a single
  _apply_legacy_fsdp_to_config helper for the same reason.
- Trainer now guards its reads of args.fsdp / args.fsdp_config, so running
  without FSDP (or via `accelerate launch` with no transformers-side
  config) no longer crashes on None.
- Docstring rewritten around fsdp_config; FSDP2-only keys surfaced,
  FSDP1-only keys (sync_module_states, use_orig_params, limit_all_gathers)
  dropped from the public docstring.
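As a rough illustration of the legacy translation described above, a standalone sketch might look like the following. The function name mirrors the helper mentioned in the bullets, but the option-to-key mapping is an assumption for illustration, not the PR's exact code:

```python
import warnings


def apply_legacy_fsdp_to_config(fsdp, fsdp_config=None):
    """Translate legacy string/list `fsdp` values into `fsdp_config` entries.

    Returns (fsdp_enabled, fsdp_config). Illustrative sketch only; the
    option-to-key mapping below is an assumption, not the PR's exact logic.
    """
    fsdp_config = dict(fsdp_config or {})
    if fsdp is None or isinstance(fsdp, bool):
        # New-style boolean on-switch: nothing to translate.
        return bool(fsdp), fsdp_config
    # Legacy form: "full_shard auto_wrap" or ["full_shard", "auto_wrap"].
    options = fsdp.split() if isinstance(fsdp, str) else list(fsdp)
    warnings.warn(
        "Passing strings/lists to `fsdp` is deprecated; use `fsdp=True` "
        "together with `fsdp_config` instead.",
        FutureWarning,
    )
    if "auto_wrap" in options:
        fsdp_config.setdefault("auto_wrap_policy", "TRANSFORMER_BASED_WRAP")
    if "offload" in options:
        fsdp_config.setdefault("cpu_offload", True)
    return True, fsdp_config
```

Under this sketch's mapping, `apply_legacy_fsdp_to_config("full_shard offload")` yields `(True, {"cpu_offload": True})` plus a FutureWarning.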

Verified on the repro from PR #42521:

    TrainingArguments(fsdp=True, fsdp_config={"fsdp_version": 2,
                                              "reshard_after_forward": True})

now produces fsdp_version=2 and reshard_after_forward=True.
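Put together, the new defaults from the bullet list above amount to a normalization step roughly like the following (an illustrative sketch with assumed key names, not the actual `TrainingArguments` code):

```python
import logging

logger = logging.getLogger(__name__)


def normalize_fsdp_config(user_config=None):
    """Fill in the new FSDP defaults described in the PR (illustrative only)."""
    config = {
        "version": 2,  # FSDP2 is now the default
        "auto_wrap_policy": "TRANSFORMER_BASED_WRAP",
        "state_dict_type": "FULL_STATE_DICT",  # HF-compatible save_model()
    }
    config.update(user_config or {})
    if config["version"] == 1:
        # FSDP1 still works but is on a deprecation path.
        logger.warning("FSDP1 is deprecated and will be removed; migrate to FSDP2.")
    return config
```

User-supplied keys override the defaults, so an explicit `{"version": 1}` keeps working while triggering the deprecation warning.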
SunMarc commented Apr 24, 2026

@bot /style

github-actions Bot and others added 2 commits April 24, 2026 17:08
- `args.fsdp` is already a bool after `_process_fsdp_args`, so wrapping
  it with `bool(...)` in trainer.py was redundant.
- Remove the explanatory comment above `state_dict_type` default; the
  docstring already covers it.
Drop the post-accelerator `setattr` loop that re-pushed these two keys
onto the already-constructed FSDP plugin. They are now forwarded during
plugin construction via `fsdp_plugin_args`:

- `activation_checkpointing` is shared between FSDP1 and FSDP2.
- `limit_all_gathers` is FSDP1-only (obsolete in FSDP2), so it lives in
  the v1 branch.

The `activation_checkpointing` + `gradient_checkpointing` conflict check
stays in trainer.py (still needs the post-plugin state to compare).
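The conflict check that stays in trainer.py could be reduced to something like this (a hypothetical sketch of the comparison, not the actual trainer code):

```python
def check_checkpointing_conflict(fsdp_activation_checkpointing, gradient_checkpointing):
    """Refuse configurations where both FSDP activation checkpointing and the
    Trainer's gradient_checkpointing are enabled, since checkpointing would be
    applied to the model twice. Illustrative sketch only.
    """
    if fsdp_activation_checkpointing and gradient_checkpointing:
        raise ValueError(
            "`activation_checkpointing` in fsdp_config and `gradient_checkpointing` "
            "in TrainingArguments cannot both be True; enable only one of them."
        )
```

Because the `activation_checkpointing` flag is now read off the constructed FSDP plugin, this check has to run after plugin construction, which is why it stays in trainer.py.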
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@stevhliu stevhliu left a comment


thanks! just a couple minor comments to use active voice :)

8 comment threads on src/transformers/training_args.py (Outdated)
SunMarc and others added 3 commits April 27, 2026 14:18
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

SunMarc commented Apr 27, 2026

@bot /style


github-actions Bot commented Apr 27, 2026

Style fix bot fixed some files and pushed the changes.

@SunMarc SunMarc changed the title [Trainer] default to FSDP2, simplify API around fsdp + fsdp_config 🚨🚨🚨 [Trainer] Default to FSDP2, simplify API around fsdp + fsdp_config Apr 27, 2026