

@yushangdi
Contributor

@yushangdi yushangdi commented Nov 11, 2025

The seq_nr doesn't always increment for gradient accumulation nodes, so they can end up copying annotations (custom meta) from forward nodes.
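Here is a minimal sketch of that failure mode. It uses a simplified model that is an assumption, not the actual AOTAutograd code: each backward node looks up the forward node with the same seq_nr and copies that node's "custom" meta. SketchNode and copy_custom_meta_by_seq_nr are hypothetical names used only for illustration. Because the gradient accumulation node's seq_nr was not incremented, it lands on a forward node's seq_nr and inherits that node's annotation.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical, simplified stand-in for an FX node (not torch.fx.Node).
@dataclass
class SketchNode:
    name: str
    seq_nr: int
    is_forward: bool
    meta: Dict[str, Any] = field(default_factory=dict)


def copy_custom_meta_by_seq_nr(nodes: List[SketchNode]) -> None:
    """Copy a forward node's "custom" meta onto every backward node sharing its seq_nr."""
    fwd_by_seq = {n.seq_nr: n for n in nodes if n.is_forward}
    for n in nodes:
        if n.is_forward:
            continue
        fwd = fwd_by_seq.get(n.seq_nr)
        if fwd is not None and "custom" in fwd.meta:
            n.meta["custom"] = dict(fwd.meta["custom"])


nodes = [
    SketchNode("mm_fwd", seq_nr=0, is_forward=True, meta={"custom": {"stage": "a"}}),
    SketchNode("add_fwd", seq_nr=1, is_forward=True, meta={"custom": {"stage": "b"}}),
    SketchNode("mm_bwd", seq_nr=0, is_forward=False),
    # Gradient accumulation node whose seq_nr was not incremented: it reuses
    # seq_nr=1 and therefore collides with the forward node "add_fwd".
    SketchNode("grad_acc", seq_nr=1, is_forward=False),
]
copy_custom_meta_by_seq_nr(nodes)
print(nodes[2].meta)  # {'custom': {'stage': 'a'}}
print(nodes[3].meta)  # {'custom': {'stage': 'b'}}
```

The "mm_bwd" copy is the intended behavior. The "grad_acc" copy is the spurious annotation this PR is about.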

I'm skipping the custom meta copy for all gradient accumulation nodes and instead tagging them with node.meta["is_gradient_acc"] = True.
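A rough sketch of the intended behavior follows. It uses the same simplified seq_nr-matching model, which is an assumption rather than the real tracing code. The hypothetical is_grad_acc_candidate field stands in for however the real code detects gradient accumulation nodes. That detection is not shown here. Only the meta key "is_gradient_acc" comes from this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


# Hypothetical, simplified stand-in for an FX node (not torch.fx.Node).
@dataclass
class SketchNode:
    name: str
    seq_nr: int
    is_forward: bool
    # Assumption: gradient accumulation nodes are already identified upstream.
    is_grad_acc_candidate: bool = False
    meta: Dict[str, Any] = field(default_factory=dict)


def copy_custom_meta_skipping_grad_acc(nodes: List[SketchNode]) -> None:
    """Copy "custom" meta by seq_nr, but tag gradient accumulation nodes instead."""
    fwd_by_seq = {n.seq_nr: n for n in nodes if n.is_forward}
    for n in nodes:
        if n.is_forward:
            continue
        if n.is_grad_acc_candidate:
            # Skip the copy so the node cannot inherit a forward annotation.
            n.meta["is_gradient_acc"] = True
            continue
        fwd = fwd_by_seq.get(n.seq_nr)
        if fwd is not None and "custom" in fwd.meta:
            n.meta["custom"] = dict(fwd.meta["custom"])


nodes = [
    SketchNode("add_fwd", seq_nr=1, is_forward=True, meta={"custom": {"stage": "b"}}),
    SketchNode("add_bwd", seq_nr=1, is_forward=False),
    SketchNode("grad_acc", seq_nr=1, is_forward=False, is_grad_acc_candidate=True),
]
copy_custom_meta_skipping_grad_acc(nodes)
print(nodes[1].meta)  # {'custom': {'stage': 'b'}}
print(nodes[2].meta)  # {'is_gradient_acc': True}
```

Tagging the node instead of leaving its meta empty lets downstream passes tell "deliberately unannotated" apart from "never annotated".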

Example repro for DeepSeek on torchtitan (without DTensor): https://gist.github.com/yushangdi/aae13ea382732f31d0fdfb3ffeda12c8

(Side note, in case you want more hints for identifying these gradient accumulation nodes: 1) they use the torch.ops.aten.add.Tensor op, not add.default; 2) they have the highest seq_nr(s).)
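The two hints in the side note can be combined into a rough detection heuristic. The sketch below is illustrative only and is not the detection logic this PR uses. To stay torch-free, it represents ops as plain strings. In a real aten-level FX graph you would instead compare node.target against torch.ops.aten.add.Tensor. It also uses only the single maximum seq_nr as a proxy for "the highest seq_nr(s)", so it could miss gradient accumulation nodes at other high seq_nr values. SketchNode and likely_gradient_acc_nodes are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified node: the op is a plain string, not an OpOverload.
@dataclass
class SketchNode:
    name: str
    target: str
    seq_nr: int


def likely_gradient_acc_nodes(nodes: List[SketchNode]) -> List[SketchNode]:
    """Rough heuristic: aten.add.Tensor nodes that sit at the maximum seq_nr."""
    if not nodes:
        return []
    max_seq = max(n.seq_nr for n in nodes)
    return [
        n for n in nodes
        if n.target == "aten.add.Tensor" and n.seq_nr == max_seq
    ]


nodes = [
    SketchNode("mm", "aten.mm.default", 0),
    SketchNode("add_regular", "aten.add.Tensor", 2),
    SketchNode("mul", "aten.mul.Tensor", 5),
    SketchNode("grad_acc", "aten.add.Tensor", 5),
]
print([n.name for n in likely_gradient_acc_nodes(nodes)])  # ['grad_acc']
```

Neither hint is sufficient alone. An ordinary add also uses add.Tensor, and other ops can share the top seq_nr. Requiring both signals filters out both kinds of false positives in this toy graph.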

cc @ezyang @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela

@pytorch-bot

pytorch-bot bot commented Nov 11, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167572

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit cb930ff with merge base f6a79b2:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: fx label Nov 11, 2025
@yushangdi yushangdi marked this pull request as ready for review November 11, 2025 21:26
@yushangdi yushangdi changed the title from "Tag gradient acc in node" to "[annotation] Skip copying custom meta for gradient accumulation nodes; tag with is_gradient_acc=True" Nov 11, 2025
@yushangdi yushangdi force-pushed the sy_exp branch 2 times, most recently from 8f33d1e to a30df58 Nov 11, 2025
@yushangdi
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk label Nov 12, 2025
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.


Labels

ciflow/inductor, ciflow/trunk, fx, Merged, module: dynamo, release notes: fx


5 participants