Add xpu-kernels skill - Intel XPU Triton kernel development by danielfleischer · Pull Request #547 · huggingface/kernels

danielfleischer · 2026-05-13T12:34:38Z

Adds a new skill under kernel-builder/skills/xpu-kernels/, alongside the existing cuda-kernels and rocm-kernels skills, bringing Intel XPU support to kernel-builder. Target hardware is Intel Battlemage / Arc Pro B70 (Xe2) via the Intel XPU Backend for Triton (https://github.com/intel/intel-xpu-backend-for-triton).

The skill packages the Xe-Forge (https://github.com/IntelLabs/Xe-Forge) workflow, an LLM-driven loop that transforms PyTorch code into optimized Triton kernels for Intel XPU, into the hf-kernels skill format. Xe-Forge has been used to produce measured speedups on KernelBench Level 2 fused kernels (bf16) and Flash Attention forward (fp16); full results live in that repo.
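
The bf16 workloads mentioned here rely on the pattern of keeping bf16 inputs while accumulating in fp32. The reason can be shown without a GPU. The following pure-Python sketch emulates bf16 rounding with `struct`. It illustrates only the numerics and is not Triton or Xe-Forge code. bf16 carries 8 significant bits, so a bf16 accumulator that keeps adding 1.0 stalls at 256, while an fp32 accumulator reaches the exact sum.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest IEEE-754 float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (round-to-nearest-even; NaN/inf not handled).

    bfloat16 is the upper 16 bits of a float32: 1 sign, 8 exponent and
    7 explicit mantissa bits, i.e. 8 significant bits of precision.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # ties round to the even upper half
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def dot_bf16_acc(a: list[float], b: list[float]) -> float:
    """Dot product with bf16 inputs and a bf16 accumulator rounded at every step."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_bf16(acc + to_bf16(to_bf16(x) * to_bf16(y)))
    return acc


def dot_fp32_acc(a: list[float], b: list[float]) -> float:
    """Dot product with bf16 inputs, an fp32 accumulator, and one final bf16 rounding."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_f32(acc + to_f32(to_bf16(x) * to_bf16(y)))
    return to_bf16(acc)
```

With 1000 products of 1.0, the bf16-accumulated result is 256.0 because 257 is not representable in bf16 and rounds back down. The fp32-accumulated result is 1000.0. In a Triton kernel, the same idea usually takes the form of an fp32 accumulator tile that is cast to bf16 only when the result is stored.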

What's included

SKILL.md - skill definition and the analyze → validate → benchmark → profile → finalize trial-loop workflow, XPU-specific patterns (tensor descriptors, GRF 256, tile swizzling, bf16 + fp32 accumulation), and XPU correctness constraints.
scripts/ - CLI tools: analyze_kernel.py, validate_triton.py, benchmark.py (uses AI-Bench (https://github.com/libxsmm/AI-bench) as the harness), trial_manager.py (tree-structured trial tracking), xpu_profiler.py (VTune integration), plus HF kernels / transformers integration examples.
references/ - knowledge base: correctness, XPU optimizations, fusion patterns, memory patterns, dtype choices, persistent kernel patterns, optimization levels/strategies, KernelBench classification.
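
trial_manager.py is described only as "tree-structured trial tracking". The following is a hypothetical minimal sketch of such a tree. It assumes each trial in the analyze → validate → benchmark → profile → finalize loop records a validation result and a measured speedup, and that later trials refine an earlier one. The `Trial` class, `best_trial`, and all field names are invented for illustration and are not the script's actual interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass(eq=False)  # identity equality; field-wise == would recurse through parent/child links
class Trial:
    """One kernel variant in a hypothetical trial tree."""

    trial_id: str
    parent: Optional[Trial] = field(default=None, repr=False)
    children: list[Trial] = field(default_factory=list, repr=False)
    passed_validation: bool = False  # outcome of the correctness check
    speedup: Optional[float] = None  # baseline_time / trial_time, once benchmarked

    def spawn(self, trial_id: str) -> Trial:
        """Create a child trial that refines this one."""
        child = Trial(trial_id, parent=self)
        self.children.append(child)
        return child

    def walk(self) -> Iterator[Trial]:
        """Depth-first traversal of this trial and all of its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()

    def lineage(self) -> list[str]:
        """Trial ids from the root down to this trial."""
        path, node = [], self
        while node is not None:
            path.append(node.trial_id)
            node = node.parent
        return path[::-1]


def best_trial(root: Trial) -> Optional[Trial]:
    """Fastest benchmarked trial that passed validation; an incorrect kernel never wins."""
    valid = [t for t in root.walk() if t.passed_validation and t.speedup is not None]
    return max(valid, key=lambda t: t.speedup, default=None)


# Example tree: ids and speedups are made-up placeholders, not measurements.
root = Trial("baseline", passed_validation=True, speedup=1.0)
fused = root.spawn("fuse-epilogue")
fused.passed_validation, fused.speedup = True, 1.4
wrong = fused.spawn("swizzle-v1")
wrong.passed_validation, wrong.speedup = False, 2.0  # fast but fails validation
grf = fused.spawn("grf-256")
grf.passed_validation, grf.speedup = True, 1.7
```

Keeping a tree instead of a flat list preserves which variant each attempt was derived from. `lineage()` can then report the chain of transformations that produced the selected kernel.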

Next steps / guidance welcome

The skill content has been developed and validated against Xe-Forge directly. Integration into this repo's Nix-based build is the remaining piece, and I'd appreciate pointers from maintainers on:

The expected Nix flow for adding a new skill under kernel-builder/skills/ (build target, validation command)
Whether anything beyond the skill's own manifest.txt needs updating (indexes, CI config, registries)
Conventions for skills shipping Python CLI tools + external deps — currently using requirements.txt; open to a Nix-native packaging if preferred
Review/testing expectations for a new hardware backend skill

Happy to iterate on any of the above.

github-actions · 2026-05-13T12:34:58Z

Hi @danielfleischer, thanks for your interest in contributing!

This project requires that pull request authors are vouched, and you are not in the list of vouched users.

This PR will be closed automatically. See https://github.com/huggingface/kernels/blob/main/CONTRIBUTING.md for more details.

* fix: remove existing test repo before upload (huggingface#519)
* fix: remove existing test repo before upload
* fix: add missing content type
* fix: prefer removing repos via hub library
* fix: use lib from nix shell on runner
* fix: disallow more than one instance of E2E running at once to avoid race conditions
* fix: prefer using ci token
* fix: update e2e to use trust_remote_code for the dummy user
* fix: prefer using latest kernels-data in test
* fix: update nix warns to throws (huggingface#540)
* feat: bump cute dsl/cutlass (huggingface#545)
* feat: add to vouched (huggingface#551)
* hook up skill in the cli and add docs.

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Copilot <copilot@github.com>

…#550)

* Update version bumping scripts with the `--major` option

With this change the script supports both major and minor version bumping. For example:

Codebase at `0.10.1.dev0`:
```
(none) -> 0.10.1
--major -> 0.11.0
--dev -> 0.10.1.dev1
--dev --major -> 0.11.0.dev0
```

Codebase at `0.10.1`:
```
(none) -> 0.10.2
--major -> 0.11.0
--dev -> 0.10.2.dev0
```

These are the typical version bumping workflows within the project.

* Sync .PHONY targets
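
The bump rules in the #550 message are compact enough to check mechanically. The following is a minimal Python sketch reconstructed only from the example table above. The `bump` function and its regex are hypothetical and are not the repository's actual bump script. As in the examples, `--major` here increments the second component (0.10 → 0.11).

```python
import re

# MAJOR.MINOR.PATCH with an optional ".devN" suffix (PEP 440 dev release).
_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:\.dev(\d+))?")


def bump(version: str, major: bool = False, dev: bool = False) -> str:
    """Hypothetical re-implementation of the bump rules shown in the #550 examples."""
    m = _VERSION.fullmatch(version)
    if m is None:
        raise ValueError(f"unsupported version string: {version!r}")
    x, y, z = (int(g) for g in m.group(1, 2, 3))
    dev_n = m.group(4)  # None for a release version

    if major:
        # --major: go to x.(y+1).0, optionally as its first dev build.
        base = f"{x}.{y + 1}.0"
        return f"{base}.dev0" if dev else base

    if dev_n is not None:
        # Already a dev build: --dev bumps the dev counter, a plain bump releases it.
        return f"{x}.{y}.{z}.dev{int(dev_n) + 1}" if dev else f"{x}.{y}.{z}"

    # Release version: move to the next patch, as a dev build if --dev is given.
    base = f"{x}.{y}.{z + 1}"
    return f"{base}.dev0" if dev else base
```

Under this sketch, `--major` applied to a dev build such as `0.11.0.dev0` would jump to `0.12.0`. The examples do not cover that case, so the real script may behave differently.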

sayakpaul · 2026-05-18T10:05:32Z

 #!/usr/bin/env python3
 """Bump all version strings in the repo.

-Without ``--dev``: strip the development suffix ahead of a release.


This seems like an unrelated change?

This is from #550.

sayakpaul · 2026-05-18T10:05:46Z

    from kernels import get_kernel, get_local_kernel

    if is_local:
-        kernel = get_local_kernel(Path(repo_id), "activation")


Unrelated change?

This is from #555.

sayakpaul · 2026-05-18T10:06:13Z

                .split('/')
                .next_back()
-                .is_some_and(|n| n.starts_with("benchmark"))
+                .is_some_and(|n| n.starts_with("benchmark") && n.ends_with(".py"))


Unrelated change?

This is from #543.
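
The Rust iterator chain in the #543 diff can be restated in Python. The following is a hypothetical sketch: `is_benchmark_file` and `selected_for_deletion` are invented names. It assumes, as the commit title "upload: fix benchmark deletion filter to match upload filter" suggests, that deletion and upload should select exactly the same files, so both reuse one predicate.

```python
def is_benchmark_file(path: str) -> bool:
    """Mirror of the updated predicate: last '/'-separated component must
    start with "benchmark" and end with ".py"."""
    name = path.split("/")[-1]  # equivalent of .split('/').next_back()
    return name.startswith("benchmark") and name.endswith(".py")


def selected_for_deletion(remote_files: list[str]) -> list[str]:
    """Files the deletion step would remove; sharing the predicate with the
    upload step keeps the two filters from drifting apart."""
    return [f for f in remote_files if is_benchmark_file(f)]
```

Before the change, any name starting with "benchmark" matched, including non-Python files such as `benchmark_results.json`.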

sayakpaul · 2026-05-18T10:06:24Z

@@ -1,4 +1,4 @@
-.PHONY: style kernel-builder-cli-docs quality bump-dev bump-dev-dry-run bump-release bump-release-dry-run pin-actions
+.PHONY: style kernel-builder-cli-docs quality bump-dev bump-dev-dry-run bump-dev-major bump-dev-major-dry-run bump-release bump-release-dry-run bump-major bump-major-dry-run pin-actions


Unrelated change?

This is from #550.

Fixing some paths due to the skill living in the agent-specific location, outside of `kernel-builder/skills/xpu-kernels/`.

danielfleischer · 2026-05-18T11:42:59Z

Should I not have cherry-picked main?

sayakpaul · 2026-05-18T12:45:54Z

If we merge the upstream main then those changes should disappear.

sayakpaul

Sweet!

HuggingFaceDocBuilderDev · 2026-05-19T12:54:36Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul · 2026-05-19T14:50:06Z

@danieldk could you review the changes in skill.rs?

danieldk · 2026-05-20T08:09:41Z

@danieldk could you review the changes in skill.rs?

Looks good! 👍

github-actions Bot closed this May 13, 2026

danieldk reopened this May 13, 2026

github-actions Bot closed this May 13, 2026

sayakpaul reopened this May 14, 2026

github-actions Bot closed this May 14, 2026

sayakpaul reopened this May 15, 2026

sayakpaul and others added 5 commits May 15, 2026 10:46

Merge branch 'main' into xpu-skill

9a81e30

upload: fix benchmark deletion filter to match upload filter (huggingface#543)

bf179d5

get_local_kernel api changed, leaving backend (second arg) empty for auto discovery (huggingface#555)

722a55e

sayakpaul reviewed May 18, 2026

Paths fix

0ed1515

Merge upstream

bf0a397

sayakpaul previously approved these changes May 19, 2026

Merge branch 'main' into xpu-skill

ab2bd13

update enum

14f34a2

danielfleischer dismissed sayakpaul’s stale review via 14f34a2 May 19, 2026 13:41

Merge branch 'main' into xpu-skill

399cc59

sayakpaul requested a review from danieldk May 19, 2026 14:50

Merge branch 'main' into xpu-skill

975538f

sayakpaul approved these changes May 20, 2026

sayakpaul merged commit d9d3a5d into huggingface:main May 20, 2026
58 of 59 checks passed

6 participants
