Conversation

@wconstab
Contributor

Run your PyTorch program with `TORCH_TRACE=<path_to_logdir>` set, and then run `tlparse <path_to_specific_logfile>` to generate a Manifold URL to an HTML artifact.

For now, this adds three artifact files to the tlparse output; others can be added later as needed (a sketch of the logging call follows the example output below):

  • autoparallel_joint_graph
  • autoparallel_sharding_optimizer_log
  • autoparallel_parallel_graph

Example output (llama3 debugmodel, 8gpu, tp=4, without torch.compile): https://manifold.edge.x2p.facebook.net/v0/read/tree/logs/.tmpblWM9X/index.html?bucketName=tlparse_reports&apiKey=tlparse_reports-key&withPayload=1&timeoutMsec=10000
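
For context, here is a minimal sketch of how one of these artifacts can be emitted, assuming the standard `torch._logging.trace_structured` "artifact" pattern used elsewhere in PyTorch; the `gm` graph module and the helper name are placeholders, not code from this PR:

```python
# Minimal sketch (assumptions noted above): emit a named artifact into the
# structured trace log so that tlparse picks it up and renders it.
from torch._logging import trace_structured

def log_joint_graph(gm):  # gm: a torch.fx.GraphModule (placeholder name)
    trace_structured(
        "artifact",
        metadata_fn=lambda: {
            "name": "autoparallel_joint_graph",
            "encoding": "string",
        },
        payload_fn=lambda: str(gm.graph),
    )
```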

@facebook-github-bot added the CLA Signed label on Jun 26, 2025
@wconstab requested review from bdhirsh, ezyang and fmassa on June 26, 2025
Contributor

@bdhirsh left a comment

definitely looks to me like a good starting point!

@wconstab merged commit 1306427 into main on Jun 26, 2025
2 checks passed
@fmassa deleted the whc/tlparse branch on June 27, 2025
```python
# clean it up by removing the added aliases from previous pass
# as well as redundant views
parallel_gm = joint_graph_passes(parallel_gm)
trace_structured(
```
Contributor

nit: it might be more readable to store `parallel_gm` after `apply_node_renaming`, since the nodes would then have the same names as in the unsharded graph.
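
A hedged sketch of that ordering, assuming `apply_node_renaming` is the renaming pass referenced above (its exact signature is an assumption) and that logging uses the same `trace_structured` artifact pattern as the rest of the diff:

```python
# Sketch of the suggested ordering (assumptions noted above): run the renaming
# pass before logging, so node names line up with the unsharded joint graph.
parallel_gm = joint_graph_passes(parallel_gm)
parallel_gm = apply_node_renaming(parallel_gm)  # assumed helper and signature
trace_structured(
    "artifact",
    metadata_fn=lambda: {
        "name": "autoparallel_parallel_graph",
        "encoding": "string",
    },
    payload_fn=lambda: str(parallel_gm.graph),
)
```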

"name": "autoparallel_joint_graph",
"encoding": "string",
},
payload_fn=lambda: str(gm.graph),
Contributor

It might be preferable to store `str(gm)` instead, as that would be a runnable representation of the graph.
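
To illustrate the difference, a small standalone example (not from this PR) comparing the two payloads on a toy FX graph:

```python
import torch
from torch.fx import symbolic_trace

def f(x):
    return torch.relu(x) + 1

gm = symbolic_trace(f)

# str(gm.graph) dumps the FX IR (placeholder / call_function / output nodes),
# which is readable but not directly executable.
print(str(gm.graph))

# str(gm) includes the generated Python source of gm.forward, so storing it
# gives a runnable representation of the graph, as suggested above.
print(str(gm))
```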

bdhirsh pushed a commit that referenced this pull request Jul 1, 2025
fmassa added a commit that referenced this pull request Jul 2, 2025
fmassa added a commit that referenced this pull request Jul 2, 2025