
Conversation

@bobrenjc93 bobrenjc93 commented Sep 18, 2025

Stack from ghstack (oldest at bottom):

Ergonomic improvement that allows sharing symbols without resorting to the complex torch._check paradigm described by @anijain2305 in his recent UED:

Different symbols for KV cached tensors - One property in my case was that the KV cache for different attention blocks had the same seq length, but there is no API to enforce that. The only way is to add torch._check, but torch.compile must trace those functions to instruct the dynamic-shape infra. This required me to change the model code.
Changing model code is not the best experience. Let's see how transformers maintainers react to my PR. Maybe an API with fullgraph=True is a better bet here.
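
For context, here is a minimal sketch of the torch._check paradigm described above (not taken from this PR; the tensor shapes and the attention stub are illustrative). The equality of the two seq-length symbols has to be asserted inside traced model code:

```python
import torch
import torch._dynamo

def attn_block(q, kv_cache_a, kv_cache_b):
    # The only way to tell the dynamic-shape infra that both caches share a
    # seq length is an equality check traced through the model code itself.
    torch._check(kv_cache_a.size(2) == kv_cache_b.size(2))
    return q @ kv_cache_a.transpose(-1, -2), q @ kv_cache_b.transpose(-1, -2)

q = torch.randn(2, 8, 16, 64)
kv_a = torch.randn(2, 8, 128, 64)   # [batch, heads, seq, head_dim]
kv_b = torch.randn(2, 8, 128, 64)
torch._dynamo.mark_dynamic(kv_a, 2)  # today each seq dim gets its own symbol
torch._dynamo.mark_dynamic(kv_b, 2)
out_a, out_b = torch.compile(attn_block)(q, kv_a, kv_b)
```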

cc @ezyang @EikanWang @jgong5 @wenzhe-nrv @voznesenskym @penguinwu @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @jiayisunx @chenyang78 @kadeng @chauhang @amjames @Lucaskabela

pytorch-bot bot commented Sep 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/163246

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit f7c3ddf with merge base 39450e7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

bobrenjc93 added a commit that referenced this pull request Sep 18, 2025
ghstack-source-id: b0b8fa6
Pull Request resolved: #163246
@bobrenjc93 bobrenjc93 added the ciflow/trunk and topic: not user facing labels Sep 18, 2025
bobrenjc93 added a commit that referenced this pull request Sep 18, 2025
ghstack-source-id: ee461a2
Pull Request resolved: #163246
bobrenjc93 added a commit that referenced this pull request Sep 18, 2025
ghstack-source-id: f67376d
Pull Request resolved: #163246
bobrenjc93 added a commit that referenced this pull request Sep 18, 2025
ghstack-source-id: 11d6ca8
Pull Request resolved: #163246
@bobrenjc93 bobrenjc93 marked this pull request as ready for review September 18, 2025 21:44
ezyang commented Sep 21, 2025

I think doing it with strings is unwise, as in a large enough codebase it can be difficult to avoid collisions, which will cause extremely strange errors. The only use case for strings is if there is a single global configuration spot for dynamic shapes that applies everywhere, but mark_dynamic works for both the fullgraph and graph-break cases, and it can also propagate unpredictably since it follows data flow. Easy fix: explicitly allocate Dim symbols (similar to how export does it) and then use those to dedupe by object identity.

Speaking of which, why don't we use export's Dim directly? cc @avikchaudhuri
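
For readers unfamiliar with it, export's Dim expresses sharing by object identity roughly like this (standard torch.export usage, independent of this PR; the module and shapes are illustrative):

```python
import torch
from torch.export import Dim, export

class Blocks(torch.nn.Module):
    def forward(self, kv_a, kv_b):
        return kv_a + kv_b

seq = Dim("seq")  # one Dim object reused below => one shared symbol
kv_a = torch.randn(2, 8, 128, 64)
kv_b = torch.randn(2, 8, 128, 64)
ep = export(
    Blocks(),
    (kv_a, kv_b),
    dynamic_shapes={"kv_a": {2: seq}, "kv_b": {2: seq}},
)
print(ep)
```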

@bobrenjc93
Contributor Author

> I think doing it with strings is unwise, as in a large enough codebase it can be difficult to avoid collisions, which will cause extremely strange errors.

Fair point. I assumed this would be such a rare power-user case that it wouldn’t matter in practice, but you’re right that using object identity is cleaner. I did think about the Dims API, though it seemed like a heavy lift for existing models with mark_dynamic infrastructure (for example, ads PT2 wrappers). Since this is a power-user feature anyway, that cost may not be a big deal. I’ll close this PR for now and see if I can get the Dims API working for compile in a more ergonomic way. Worst case, there may be a middle-ground approach where we create dim-like objects and thread them through the mark_dynamic calls.
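
To make that middle ground concrete, a purely speculative sketch follows; the SharedDim handle and the shared= argument are invented for illustration and do not exist in today's mark_dynamic API:

```python
import torch
import torch._dynamo

kv_a = torch.randn(2, 8, 128, 64)
kv_b = torch.randn(2, 8, 128, 64)

# What exists today: each call allocates an independent symbol for dim 2.
torch._dynamo.mark_dynamic(kv_a, 2)
torch._dynamo.mark_dynamic(kv_b, 2)

# Speculative middle ground (does NOT exist in torch): a dim-like handle whose
# object identity ties both dims to a single symbol, threaded through mark_dynamic.
# seq = SharedDim()
# torch._dynamo.mark_dynamic(kv_a, 2, shared=seq)
# torch._dynamo.mark_dynamic(kv_b, 2, shared=seq)
```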

@bobrenjc93 bobrenjc93 closed this Sep 21, 2025
pytorchmergebot pushed a commit that referenced this pull request Sep 22, 2025
pytorchmergebot pushed a commit that referenced this pull request Sep 22, 2025
pytorchmergebot pushed a commit that referenced this pull request Sep 23, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
dsashidh pushed a commit to dsashidh/pytorch that referenced this pull request Sep 26, 2025
@github-actions github-actions bot deleted the gh/bobrenjc93/564/head branch October 22, 2025 02:15

Labels: ciflow/inductor, ciflow/trunk, fx, module: dynamo, release notes: fx, topic: not user facing
