Add Neuron to auto-compile hardware list #44757

Open
dacorvo wants to merge 2 commits into main from neuron_auto_compile

Conversation

@dacorvo
Contributor

@dacorvo dacorvo commented Mar 16, 2026

Summary

  • _valid_auto_compile_criteria() gates auto-compilation on device.type in ["cuda", "xpu"], excluding Neuron devices. This means torch.compile never triggers automatically on Neuron even when StaticCache is used (which sets is_compileable = True).
  • Adds "neuron" to the valid hardware list so that Neuron devices benefit from auto-compilation like CUDA and XPU.
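The gate described in the bullets above can be sketched as follows. This is a minimal, self-contained illustration and not the transformers implementation: `SketchCache` and `can_auto_compile` are hypothetical stand-ins, and the real check may contain further conditions that are not shown here.

```python
from dataclasses import dataclass


@dataclass
class SketchCache:
    """Hypothetical stand-in for a cache object exposing `is_compileable`."""
    is_compileable: bool


def can_auto_compile(device_type, cache, valid_hardware=("cuda", "xpu")):
    """Return True only when both the hardware and the cache allow compilation."""
    on_valid_hardware = device_type in valid_hardware
    using_compileable_cache = cache is not None and cache.is_compileable
    return on_valid_hardware and using_compileable_cache


static_cache = SketchCache(is_compileable=True)    # plays the role of StaticCache
dynamic_cache = SketchCache(is_compileable=False)  # plays the role of a dynamic cache

# Hardware list without "neuron": the gate stays closed even with a static cache.
before = can_auto_compile("neuron", static_cache)

# Hardware list extended with "neuron": the gate opens for a static cache only.
extended = ("cuda", "xpu", "neuron")
after = can_auto_compile("neuron", static_cache, valid_hardware=extended)
without_static_cache = can_auto_compile("neuron", dynamic_cache, valid_hardware=extended)
```

In this sketch, extending the list changes the outcome only for the combination of a Neuron device and a compileable cache; a non-compileable cache keeps the gate closed regardless of the device.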

Addresses the "Auto-compilation gate missing Neuron" item in #44742.

🤖 Generated with Claude Code

Co-authored-by: Claude Opus 4.6 noreply@anthropic.com

_valid_auto_compile_criteria() gates auto-compilation on device type
but excluded Neuron, so torch.compile never triggers automatically
even when StaticCache is used. Add "neuron" to the valid hardware
list alongside "cuda" and "xpu".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR adds "neuron" to the list of valid hardware device types in _valid_auto_compile_criteria(), enabling auto-compilation (torch.compile) on AWS Neuron (Trainium/Inferentia) devices when a compilable cache (e.g., StaticCache) is used. This is one item from the broader issue #44742 tracking static-shape generation support for Neuron.

Changes:

  • Added "neuron" to the valid_hardware device type check in _valid_auto_compile_criteria(), alongside "cuda" and "xpu".

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@dacorvo dacorvo requested review from vasqu and zucchini-nlp April 14, 2026 12:08

# Base logic
- valid_hardware = self.device.type in ["cuda", "xpu"] or bool(
+ valid_hardware = self.device.type in ["cuda", "xpu", "neuron"] or bool(
Member

Isn't this dependent on adding a full static-shape generation loop first?

Contributor

Hmm, not sure what you mean? I think this is a general list of devices that support compile out of the box; you can also hack CPU etc. with some private flag, IIRC.

The static shapes etc. come later, in the input preparation.

Member

Won't it auto-compile and then error out down the line due to dynamic inputs? From what I understood, this device cannot support full compile without completely static shapes.

Contributor

@vasqu vasqu Apr 14, 2026

See the line below: the condition requires valid hardware plus a compileable cache, so if you don't set a static cache (and hence all the static prep), you are out of luck either way.

There is no real dynamic thing going on.

Contributor

OK, discussed internally and I understand it now: with this change we enable compile for Neuron whenever a static cache is set, but there are still dynamic traces within the whole generate loop, so it potentially doesn't make sense to add it yet. We should rather wait for feature completeness before adding this. That's at least what I understood.

Contributor

For testing purposes, we can enable it via the private flags within the compile config.
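A rough sketch of how a compile-config override could bypass the hardware list for testing. Everything here is an assumption made for illustration: `SketchCompileConfig`, its `_allow_any_device` field, and `hardware_is_valid` are hypothetical names, not the transformers API, and the actual private flag is not named in this thread.

```python
from dataclasses import dataclass


@dataclass
class SketchCompileConfig:
    """Hypothetical compile config with a private-style override flag."""
    _allow_any_device: bool = False  # assumed name, for illustration only


def hardware_is_valid(device_type, compile_config=None, hardware_list=("cuda", "xpu")):
    """Accept listed devices, or any device when the override flag is set."""
    forced = bool(compile_config is not None and compile_config._allow_any_device)
    return device_type in hardware_list or forced


# Without an override, a device outside the list is rejected.
default_result = hardware_is_valid("neuron")

# With the override set, the same device passes the hardware check.
forced_result = hardware_is_valid("neuron", SketchCompileConfig(_allow_any_device=True))
```

An opt-in of this kind keeps the default behavior unchanged for everyone else, which fits the suggestion to wait for feature completeness before adding Neuron to the list itself.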

5 participants