Conversation
_valid_auto_compile_criteria() gates auto-compilation on device type but excluded Neuron, so torch.compile never triggers automatically even when StaticCache is used. Add "neuron" to the valid hardware list alongside "cuda" and "xpu". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR adds "neuron" to the list of valid hardware device types in _valid_auto_compile_criteria(), enabling auto-compilation (torch.compile) on AWS Neuron (Trainium/Inferentia) devices when a compilable cache (e.g., StaticCache) is used. This is one item from the broader issue #44742 tracking static-shape generation support for Neuron.
Changes:
- Added "neuron" to the valid_hardware device type check in _valid_auto_compile_criteria(), alongside "cuda" and "xpu".
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
```diff
  # Base logic
- valid_hardware = self.device.type in ["cuda", "xpu"] or bool(
+ valid_hardware = self.device.type in ["cuda", "xpu", "neuron"] or bool(
```
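For readers skimming the thread, the gate being changed can be sketched as below. The names (valid_auto_compile_criteria, is_compileable) mirror the PR's context in transformers' generation utilities, but this is a simplified standalone reimplementation for illustration, not the library's actual code:

```python
# Simplified sketch of the auto-compile gate discussed in this PR.
# FakeDevice/FakeCache are stand-ins for torch.device and a transformers
# cache object; only the attributes the gate reads are modeled.
from dataclasses import dataclass


@dataclass
class FakeDevice:
    type: str  # e.g. "cuda", "xpu", "neuron", "cpu"


@dataclass
class FakeCache:
    is_compileable: bool  # StaticCache sets this to True


def valid_auto_compile_criteria(device: FakeDevice, cache: FakeCache) -> bool:
    # After this PR, "neuron" joins "cuda" and "xpu" as compile-capable devices.
    valid_hardware = device.type in ["cuda", "xpu", "neuron"]
    # Auto-compilation additionally requires a compileable (static-shaped) cache,
    # which is why a non-static cache "leaves you out of luck either way".
    return valid_hardware and cache.is_compileable


print(valid_auto_compile_criteria(FakeDevice("neuron"), FakeCache(True)))   # True
print(valid_auto_compile_criteria(FakeDevice("neuron"), FakeCache(False)))  # False
print(valid_auto_compile_criteria(FakeDevice("cpu"), FakeCache(True)))      # False
```

Note that both conditions must hold: device type alone never triggers compilation without a compileable cache.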
isn't it dependent on adding a full static-shape generation loop first?
Hmm not sure what you mean? I think this is a general list of devices that support compile OOB - you can also hack CPU etc with some private flag iirc
The static shapes etc come later in the input preparation
will it not auto-compile, and then error out down the line due to dynamic inputs? From what I understood this device cannot support full compile without complete static shapes
See the line below: the condition combines valid hardware + cache --> if you don't set a static cache (and hence all the static prep), you are out of luck either way
There is no real dynamic thing going on
Ok, discussed internally, now I understand it: with this we enable compile for Neuron when we set static caches, but there are still dynamic traces within the whole generate loop, so it potentially doesn't make sense to add yet - we should rather wait for feature completeness before adding this. That's at least what I understood now.
For testing purposes, we can enable via the private flags within the compile config
Summary
_valid_auto_compile_criteria() gates auto-compilation on device.type in ["cuda", "xpu"], excluding Neuron devices. This means torch.compile never triggers automatically on Neuron even when StaticCache is used (which sets is_compileable = True). Add "neuron" to the valid hardware list so that Neuron devices benefit from auto-compilation like CUDA and XPU.

Addresses the "Auto-compilation gate missing Neuron" item in #44742.
🤖 Generated with Claude Code
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>