Skip to content

Conversation

zhxchen17
Copy link
Contributor

@zhxchen17 zhxchen17 commented Nov 2, 2023

Summary:
IIUC, from the beginning of torch.export to the current moment, we never really find a way to meaningfully use generated guards from torch dynamo, since export by design must output a non Python dependent artifact to users. Therefore I think in the forseeable future we still don't need dynamo guards to be present.

Recently we ran into some slowness in dynamo for internal models and observed a substantial time consumed in guard propagation logic, especially when we're dealing with list VTs. This PR effectively detects whether we're exporting and if so skip the entire guard propagation logic.

This doesn't resolve the fundemental issue of O(n^2) algorithm used in VT but greatly reduce the constant factor of it, thus still gives us great speedup for exporting time. We've discussed the resolution in pt2 core meeting but as a quick unblocker in the short term I think we could also try to skip guards to unblock several internal models.

Some experiment ran on the internal FM model:
local_fm (training): 15min -> 4min
ads launcher ir generation (training, D50847630): >1hr -> 30min

Test Plan: CI

Differential Revision: D50914819

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @aakhundov @kadeng

Copy link

pytorch-bot bot commented Nov 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/112685

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit cba8099 with merge base 2337d8d (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D50914819

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D50914819

@ezyang ezyang requested review from voznesenskym and mlazos November 3, 2023 14:16
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D50914819

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D50914819

Summary:

IIUC, from the beginning of torch.export to the current moment, we never really find a way to meaningfully use generated guards from torch dynamo, since export by design must output a non Python dependent artifact to users. Therefore I think in the forseeable future we still don't need dynamo guards to be present.

Recently we ran into some slowness in dynamo for internal models and observed a substantial time consumed in guard propagation logic, especially when we're dealing with list VTs. This PR effectively detects whether we're exporting and if so skip the entire guard propagation logic.

This doesn't resolve the fundemental issue of O(n^2) algorithm used in VT but greatly reduce the constant factor of it, thus still gives us great speedup for exporting time. We've discussed the resolution in pt2 core meeting but as a quick unblocker in the short term I think we could also try to skip guards to unblock several internal models.

Some experiment ran on the internal FM model:
local_fm (training): 15min -> 4min
ads launcher ir generation (training, D50847630): >1hr -> 30min

Test Plan: CI

Reviewed By: tugsbayasgalan

Differential Revision: D50914819
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D50914819

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D50914819

@zhxchen17
Copy link
Contributor Author

Added some comments inline about why we skip guards propagation for export.

Copy link
Contributor

@jansel jansel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be made obsolete by:
#111726

Does that PR fix the performance issues you are seeing? If so maybe we don't need to add an extra mode.

@zhxchen17
Copy link
Contributor Author

I think this will be made obsolete by: #111726

Does that PR fix the performance issues you are seeing? If so maybe we don't need to add an extra mode.

I will do a test with that PR.

@zhxchen17 zhxchen17 closed this Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants