[JIT] Cache compilation of free functions #30503
Conversation
Good change, but we need one additional thing to make it safe, I think. Compiling a redefined function will lead to weird behavior:

```python
def foo():
    return 1  # something

fn1 = torch.jit.script(foo)

def foo():
    return 2  # something else

fn2 = torch.jit.script(foo)
```

In the current implementation, fn2 will be the same as fn1. I think the fix is simple: you can use the fn PyObject pointer as the key into a compilation cache maintained from the Python side.
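A rough sketch of the suggested Python-side cache, assuming we key on the function object itself (all names here are illustrative, not the actual patch):

```python
import weakref

# Map each live Python function object to its compiled artifact.
# Keying on the object stands in for "the fn PyObject ptr": a redefined
# `foo` is a new object, so it gets a fresh cache entry, and the weak
# reference lets the entry die with the function.
_script_cache = weakref.WeakKeyDictionary()

def cached_script(fn, compile_fn):
    try:
        return _script_cache[fn]
    except KeyError:
        compiled = _script_cache[fn] = compile_fn(fn)
        return compiled
```

Using the object itself (rather than `id(fn)`) as the key also sidesteps the hazard of a raw pointer value being reused after the original function is garbage collected.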
Can you also add a test to verify that we get the corner case right?
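A minimal sketch of such a test, assuming the cache keys on the function object (illustrative, not the test that actually landed):

```python
import torch

def test_redefined_function_recompiles():
    def foo(x):
        return x + 1

    fn1 = torch.jit.script(foo)

    def foo(x):  # redefinition: a new function object with new behavior
        return x + 2

    fn2 = torch.jit.script(foo)

    x = torch.tensor(1.0)
    # fn2 must reflect the new definition rather than the cached fn1
    assert fn1(x).item() == 2.0
    assert fn2(x).item() == 3.0
```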
Good point, and yeah, the change sounds good. We could consider doing something like this for all objects which are script-convertible; this would also fix #30421. The change for modules is a little more complicated because you have to check that attributes haven't changed.
Actually, hold off on review; I need to fix something. Edit: fixed.
lgtm besides some code movement requests!
Run Constant Propagation upon compilation only on ops with non-aliasing inputs and outputs. This speeds up the first run of `torchvision.models.resnet18` by over 50% and speeds up compilation by about 25% (although the effects didn't seem additive with #30503, so I'm going to land this PR first and then see if caching still has a sizable impact). Running constant prop only with non-aliasing types does a lot of graph cleanup by removing constant ifs and a bunch of other smaller ops. It also avoids all the jitter problems we had when we tried running full constant prop previously. Because it is idempotent it doesn't jitter, and it doesn't jitter graphs constructed from tracing because tracing doesn't emit any ops that involve only non-aliasing inputs. Full constant prop isn't idempotent because which ops are run depends on the state of mutation in the alias db, which will often change upon successive iterations of constant propagation, and because it affects graphs constructed from tracing. [ghstack-poisoned]
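A minimal illustration of the kind of cleanup described, assuming the pass folds expressions built only from non-aliasing values like ints and bools (the exact printed graph depends on the pass pipeline):

```python
import torch

@torch.jit.script
def branchy(x):
    # `1 + 1 == 2` involves only non-aliasing int/bool values, so
    # constant propagation can fold the condition and drop the dead branch.
    if 1 + 1 == 2:
        return x + 1
    else:
        return x - 1

# With the pass in place, the prim::If should no longer appear here.
print(branchy.graph)
```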
@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@eellison is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Run Constant Propagation upon compilation only on ops with non-aliasing inputs and outputs. This speeds up the first run of `torchvision.models.resnet18` by over 50% and speeds up compilation by about 25% (although the effects didn't seem additive with #30503, so I'm going to land this PR first and then see if caching still has a sizable impact). Running constant prop only with non-aliasing types does a lot of graph cleanup by removing constant ifs and a bunch of other smaller ops. It also avoids all the jitter problems we had when we tried running full constant prop previously. Because it is idempotent it doesn't jitter, and it doesn't jitter graphs constructed from tracing because tracing doesn't emit any ops that involve only non-aliasing inputs. Full constant prop isn't idempotent because which ops are run depends on the state of mutation in the alias db, which will often change upon successive iterations of constant propagation, and because it affects graphs constructed from tracing. Edit: if we were okay with running constant propagation on graphs constructed from tracing (potentially making them hard to debug), an alternative would be to run constant propagation until the graph reaches a fixed point (see the sketch below). [ghstack-poisoned]
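A sketch of that fixed-point alternative driven from Python, assuming the existing constant-propagation pass binding is available (the convergence loop is illustrative; the real change would live in the C++ pass pipeline):

```python
import torch

def constant_prop_to_fixed_point(graph):
    # Re-run full constant propagation until the graph stops changing.
    # Comparing the printed form is a crude but simple convergence check.
    prev = None
    while prev != str(graph):
        prev = str(graph)
        torch._C._jit_pass_constant_propagation(graph)
    return graph
```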
Summary: Pull Request resolved: #30544 Run Constant Propagation upon compilation only on ops with non-aliasing inputs and outputs. This speeds up the first run of `torchvision.models.resnet18` by over 50% and speeds up compilation by about 25% (although the effects didn't seem additive with #30503, so I'm going to land this PR first and then see if caching still has a sizable impact). Running constant prop only with non-aliasing types does a lot of graph cleanup by removing constant ifs and a bunch of other smaller ops. It also avoids all the jitter problems we had when we tried running full constant prop previously. Because it is idempotent it doesn't jitter, and it doesn't jitter graphs constructed from tracing because tracing doesn't emit any ops that involve only non-aliasing inputs. Full constant prop isn't idempotent because which ops are run depends on the state of mutation in the alias db, which will often change upon successive iterations of constant propagation, and because it affects graphs constructed from tracing. Edit: if we were okay with running constant propagation on graphs constructed from tracing (potentially making them hard to debug), an alternative would be to run constant propagation until the graph reaches a fixed point. Test Plan: Imported from OSS Differential Revision: D18833607 Pulled By: eellison fbshipit-source-id: 92a0adb4882d67ed5a0db5c279f5e122aeeba54a
Summary: We don't have to recompile free functions if we've already compiled them. This improved compilation time of `resnet18` by 27%. Pull Request resolved: pytorch#30503 Differential Revision: D18796501 Pulled By: eellison fbshipit-source-id: 2dee0fc5fcf9adc5b92213f8cb813730d71b376f
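A quick sketch of the user-visible effect, assuming the cache keys on the Python function object as discussed above (the identity check reflects my reading of the change, not a documented guarantee):

```python
import torch

def foo(x):
    return x + 1

fn1 = torch.jit.script(foo)
fn2 = torch.jit.script(foo)

# Scripting the same, unmodified function object should now hit the
# cache instead of recompiling, so both names see one compiled artifact.
assert fn1 is fn2
```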