Update on "Reduce overhead in CUDAGraph Trees"
Significantly reduces the overhead of constructing Tensors and Storages and of checking Storage liveness. This removes the regression for the HF models I tested and eliminates 75% of the overhead in the extremely overhead-bound resnet50 training benchmark in torchbench (0.91x base commit, 1.02x torchinductor default, 1.16x this PR, 1.25x previous cudagraphs implementation).

This PR takes care of the lower-hanging fruit.

- Computes storage aliasing at record time instead of at runtime. We no longer need a runtime storage cache; we can instead index directly into the existing alias if there is one, or construct a new Storage.

- Moves the heavyweight C++ calls (getting storage weakrefs and constructing tensors) into a single batch.
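The record-time aliasing idea in the first bullet can be sketched roughly as follows. This is a hypothetical illustration, not the actual `torch/_inductor/cudagraph_trees.py` code: the function names and the use of plain ids in place of real Storage objects are assumptions. The point is that the alias lookup happens once at record time, so the runtime path is a simple index into a precomputed plan rather than a cache lookup per output.

```python
# Hypothetical sketch of record-time alias precomputation; not the
# real cudagraph_trees implementation.

def record_alias_plan(output_storage_ids, prior_storage_ids):
    """At record time, map each output to the index of an existing
    storage it aliases, or None if it needs a fresh Storage."""
    index_of = {sid: i for i, sid in enumerate(prior_storage_ids)}
    return [index_of.get(sid) for sid in output_storage_ids]

def reconstruct(plan, prior_storages, make_storage):
    """At runtime there are no cache lookups: either reuse the aliased
    storage by its precomputed index or construct a new one."""
    return [prior_storages[i] if i is not None else make_storage()
            for i in plan]
```

For example, an output whose storage id already appeared during recording resolves to that storage's index, while an unseen id gets `None` and triggers construction of a new Storage at runtime.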

cc soumith voznesenskym penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx peterbell10 desertfire

[ghstack-poisoned]
eellison committed Apr 6, 2023
2 parents 832492b + ba657ad commit 7a9b0a6
Showing 1 changed file with 1 addition and 1 deletion.
torch/_inductor/cudagraph_trees.py:

```diff
@@ -551,7 +551,7 @@ class AliasesNewOutput(OutputAliasInfo):

     __slots__ = ["index"]

-    index: PathOutputIndex
+    index: int

     def __init__(self, index):
         assert isinstance(index, int)
```
