Add support for name kwarg in mark_dynamic #163246
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163246. Note: links to docs will display an error until the docs builds have completed. ✅ No failures as of commit f7c3ddf with merge base 39450e7. This comment was automatically generated by Dr. CI and updates every 15 minutes.
Stack from ghstack (oldest at bottom):

Ergonomic improvement to allow sharing symbols without having to resort to the more involved torch._check paradigm, as described by @anijain2305 in his recent UED:

> Different symbols for KV cached tensors - One property in my case was that the KV cache for different attention blocks had the same seq length, but there is no API to enforce that. The only way is to add torch._check, but torch.compile must trace those functions to instruct the dynamic shape infra. This required me to change the model code. Changing model code is not the best experience. Let's see how the transformers maintainers react to my PR. Maybe an API with fullgraph=True is a better bet here.

cc @ezyang @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela
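For context, a rough sketch of the ergonomic difference. The `name=` kwarg is the API proposed in this PR (not something available in released PyTorch), and the tensor shapes and the attention-block function are made up purely for illustration:

```python
import torch
import torch._dynamo

k_cache = torch.randn(8, 128, 64)
v_cache = torch.randn(8, 128, 64)

# Status quo: enforcing that the two caches share a seq length requires a
# torch._check inside code that torch.compile traces, i.e. touching the
# model code itself.
def attn_block(k_cache, v_cache):
    torch._check(k_cache.size(1) == v_cache.size(1))
    return k_cache + v_cache

# With the kwarg proposed here: mark both dims dynamic under the same name
# so they resolve to one symbol, without changing the model code.
torch._dynamo.mark_dynamic(k_cache, 1, name="seq_len")
torch._dynamo.mark_dynamic(v_cache, 1, name="seq_len")
```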
I think doing it with strings is unwise: in a large enough codebase it can be difficult to avoid collisions, which will cause extremely strange errors. The only use case for strings is a single global configuration spot for dynamism that applies everywhere, but mark_dynamic works in both the fullgraph and graph-break cases, and it can also propagate unpredictably along data flow. Easy fix: explicitly allocate Dim symbols (similar to how export does it) and then use those to dedupe by object identity. Speaking of which, why don't we use export's Dim directly? cc @avikchaudhuri
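For reference, export already expresses this kind of shared constraint by reusing one Dim object, so deduping happens by object identity rather than by string. A minimal sketch (the module, shapes, and bounds below are made-up illustration):

```python
import torch
from torch.export import Dim, export

class KVBlocks(torch.nn.Module):
    def forward(self, k_cache, v_cache):
        return k_cache.sum() + v_cache.sum()

k = torch.randn(8, 128, 64)
v = torch.randn(8, 128, 64)

# Reusing the same Dim object for both inputs ties dim 1 of each tensor
# to the same symbol.
seq = Dim("seq_len", min=1, max=4096)
ep = export(
    KVBlocks(),
    (k, v),
    dynamic_shapes={"k_cache": {1: seq}, "v_cache": {1: seq}},
)
```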
Fair point. I assumed this would be such a rare power-user case that it wouldn’t matter in practice, but you’re right that using object identity is cleaner. I did think about the Dims API, though it seemed like a heavy lift for existing models with mark_dynamic infrastructure (for example, ads PT2 wrappers). Since this is a power-user feature anyway, that cost may not be a big deal. I’ll close this PR for now and see if I can get the Dims API working for compile in a more ergonomic way. Worst case, there may be a middle-ground approach where we create dim-like objects and thread them through the mark_dynamic calls.
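Purely to illustrate the middle ground mentioned above, a hypothetical sketch: neither a dim-like handle in torch._dynamo nor a `dim=` kwarg on mark_dynamic exists today.

```python
import torch
import torch._dynamo

k_cache = torch.randn(8, 128, 64)
v_cache = torch.randn(8, 128, 64)

# Hypothetical: allocate one dim-like handle and thread it through the
# mark_dynamic calls, so sharing is expressed by object identity.
seq = torch._dynamo.DynamicDim("seq_len")          # hypothetical handle
torch._dynamo.mark_dynamic(k_cache, 1, dim=seq)    # hypothetical kwarg
torch._dynamo.mark_dynamic(v_cache, 1, dim=seq)
```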
…opener.py (#163469) Pull Request resolved: #163469 Approved by: https://github.com/aorenste, https://github.com/Skylion007 ghstack dependencies: #163246
…_dq_pass.py (#163470) Pull Request resolved: #163470 Approved by: https://github.com/aorenste ghstack dependencies: #163246, #163469
…orter/_globals.py (pytorch#163472) Pull Request resolved: pytorch#163472 Approved by: https://github.com/Skylion007 ghstack dependencies: pytorch#163246, pytorch#163469, pytorch#163470