
[graph_trainer] Add remat pass and torch.no_grad() execution to minimal_fx_tracer #2767

Closed
tugsbayasgalan wants to merge 1 commit into gh/tugsbayasgalan/12/base from gh/tugsbayasgalan/12/head

Conversation

tugsbayasgalan (Contributor) commented Mar 31, 2026

Stack from ghstack (oldest at bottom):

  • Annotate backward FX nodes with {"remat_pass_tag": "is_backward"} during
    _patch_engine_run_backward so that remat_using_tags_for_fwd_loss_bwd_graph
    can identify the forward/backward boundary (see the tagging sketch after
    this list).
  • Apply remat_using_tags_for_fwd_loss_bwd_graph as a default post-trace pass.
    Nodes tagged PREFER_RECOMPUTE (from selective activation checkpointing) are
    duplicated just before the backward region, and the original forward copies
    are then removed by dead-code elimination, reducing peak memory (see the
    remat sketch below).
  • Execute the traced graph under torch.no_grad(), since the graph already
    contains explicit backward ops. Without this, PyTorch records a redundant
    autograd graph whose grad_fn references keep all forward intermediates
    alive (see the no_grad sketch below).
  • Add test_llama_1b_peak_memory: verifies that traced+AC peak memory is
    within 20% of eager+AC peak memory on Llama 1B (batch size 2, sequence
    length 2048, bf16).
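As a rough illustration of the tagging, here is a minimal sketch. `tag_backward_nodes` and `first_backward_node` are hypothetical names; in the PR the annotation happens inside `_patch_engine_run_backward` as the backward call is traced, whereas this sketch assumes the boundary node is already known and stamps everything from there on:

```python
import torch.fx as fx

def tag_backward_nodes(graph: fx.Graph, first_backward_node: fx.Node) -> None:
    """Stamp the boundary node and every node after it as backward."""
    in_backward = False
    for node in graph.nodes:
        if node is first_backward_node:
            in_backward = True
        if in_backward and node.op != "output":
            # node.meta is a free-form dict carried by every FX node; the
            # remat pass later keys off this entry to find the boundary.
            node.meta["remat_pass_tag"] = "is_backward"
```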

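And a minimal sketch of the remat idea, not the actual remat_using_tags_for_fwd_loss_bwd_graph implementation: it assumes recompute decisions are readable from node.meta (the "recompute" key and the bare "PREFER_RECOMPUTE" string below are stand-ins for whatever selective AC actually records), and it clones each tagged node only once, so the node's inputs rather than its output stay live across the boundary; a real pass would recursively clone tagged producer chains.

```python
import torch.fx as fx

def remat_prefer_recompute(gm: fx.GraphModule) -> fx.GraphModule:
    """Hypothetical single-level remat pass keyed off the backward tag."""
    graph = gm.graph
    # The first tagged node marks the forward/backward boundary.
    boundary = next(
        (n for n in graph.nodes if n.meta.get("remat_pass_tag") == "is_backward"),
        None,
    )
    if boundary is None:
        return gm  # forward-only graph; nothing to rematerialize
    for node in list(graph.nodes):
        if node.meta.get("remat_pass_tag") == "is_backward":
            continue  # already in the backward region
        if node.meta.get("recompute") != "PREFER_RECOMPUTE":  # assumed meta key
            continue
        # Duplicate the forward node at the start of the backward region.
        with graph.inserting_before(boundary):
            clone = graph.node_copy(node)
        clone.meta["remat_pass_tag"] = "is_backward"  # clone lives in backward
        # Repoint only backward users at the clone; forward users keep the
        # original, which eliminate_dead_code removes once it has no users.
        node.replace_all_uses_with(
            clone,
            delete_user_cb=lambda user: user.meta.get("remat_pass_tag") == "is_backward",
        )
    graph.eliminate_dead_code()
    gm.recompile()
    return gm
```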
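Finally, a sketch of the no_grad execution; `run_traced_step` is a hypothetical wrapper name, not a helper from the PR:

```python
import torch

def run_traced_step(gm: torch.fx.GraphModule, *args):
    # The traced graph already contains explicit backward ops as nodes, so
    # it is executed under no_grad: otherwise eager autograd would record a
    # second, redundant graph whose grad_fn references keep every forward
    # intermediate alive until the step ends.
    with torch.no_grad():
        return gm(*args)
```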
