v0.13.0

@github-actions github-actions released this 10 Apr 14:31
· 158 commits to main since this release

New features

kernels 0.13.0 is a feature-packed release. Highlights include an improved CLI for building kernels (kernel-builder), Torch 2.11 support, and a tech preview of TVM FFI support.

kernel-builder CLI overhaul

The build2cmake command has been renamed to kernel-builder. This new tool can be used to develop, build, and upload kernels without directly using Nix.

These are the main subcommands for the new kernel-builder CLI:

  • kernel-builder init: scaffold a new kernel, including tests and benchmarks.
  • kernel-builder build: build a kernel.
  • kernel-builder build-and-copy: build a kernel and copy artifacts to the build directory.
  • kernel-builder build-and-upload: build a kernel and upload it to the Hub.
  • kernel-builder create-pyproject: create Python project files such as pyproject.toml to develop kernels in IDEs and editors.
  • kernel-builder devshell / kernel-builder testshell: drop into a development or test shell for a kernel.
  • kernel-builder upload: upload a built kernel to the Hugging Face Hub.
  • kernel-builder list-variants: list all supported build variants for a kernel.

The build, devshell, and testshell subcommands accept a --variant flag to select a specific build variant. All subcommands accept a directory argument instead of requiring a specific working directory.
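As a rough illustration of how these subcommands and options combine, the following sketch assembles kernel-builder command lines in Python without running them. The helper name kernel_builder_argv, the placement of the directory as a trailing positional argument, and the placeholder values "my-variant" and "./my-kernel" are assumptions made for this example; consult kernel-builder --help for the actual syntax.

```python
import shlex

# Subcommands that the release notes list as accepting --variant.
VARIANT_SUBCOMMANDS = {"build", "devshell", "testshell"}


def kernel_builder_argv(subcommand, kernel_dir=None, variant=None):
    """Assemble an argument list for a kernel-builder invocation.

    Assumption: the variant is passed as `--variant <name>` and the kernel
    directory as a trailing positional argument. This helper is hypothetical
    and only illustrates the options described above.
    """
    argv = ["kernel-builder", subcommand]
    if variant is not None:
        if subcommand not in VARIANT_SUBCOMMANDS:
            # Only build, devshell, and testshell are listed as taking --variant.
            raise ValueError(f"{subcommand} is not listed as accepting --variant")
        argv += ["--variant", variant]
    if kernel_dir is not None:
        argv.append(kernel_dir)
    return argv


# "my-variant" and "./my-kernel" are placeholders, not real names.
print(shlex.join(kernel_builder_argv("build", "./my-kernel", variant="my-variant")))
```

The resulting list could be passed to subprocess.run; use kernel-builder list-variants to find the variant names that are valid for a given kernel.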

An installation script is also provided to help new users get a working kernel-builder environment set up quickly, including Nix, the binary cache, and the required trusted-user configuration. Go to the following page for information on how to get started:

https://huggingface.co/docs/kernels/main/en/builder/writing-kernels#quick-install

PyTorch 2.11 support

kernel-builder now supports Torch 2.11. Torch 2.9 support has been removed in accordance with our policy of supporting the two latest PyTorch versions.

TVM FFI kernels (tech preview)

kernels 0.13 adds support for TVM FFI kernels. TVM FFI aims to be a single ABI for multiple frameworks, such as Torch, JAX, NumPy, and CuPy. TVM FFI support is a tech preview, so details may still change: for instance, we might change the build.toml options for TVM FFI, the kernel source layout, or the provided helper functions.

The kernels examples directory provides ReLU and CUTLASS example kernels that use TVM FFI.

Card filling

kernel-builder now supports card filling. If the kernel source repository contains a CARD.md template, building a kernel will fill the template with details about the kernel. When a kernel is uploaded (with kernel-builder upload or kernel-builder build-and-upload), the card will be uploaded as the README.md of the Hub repository. The default card template can be generated with kernel-builder init.

kernels skills

We added a new CLI command for installing an agent-compatible skill. Use kernels skills add to install the skills for AI coding assistants like Claude, Codex, and OpenCode. For now, only the cuda-kernels skill is supported. Skill files are downloaded from the huggingface/kernels directory in this repository. ROCm kernel skills are on the way.

Local kernel overrides

Kernels can now be overridden locally without changing any get_kernel call sites. Set the LOCAL_KERNELS environment variable to a colon-separated list of org/repo=local_path pairs:

LOCAL_KERNELS=kernels-community/activation=/path/to/local/activation

This is useful for testing kernel changes locally before uploading them to the Hub.
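To make the format concrete, here is a minimal sketch of how such a value could be split into overrides. It is not the parser used by kernels; the function name parse_local_kernels and the second pair (my-org/my-kernel) are hypothetical and exist only for illustration.

```python
def parse_local_kernels(value):
    """Parse 'org/repo=path:org2/repo2=path2' into a {repo_id: path} dict.

    Illustrative only: this mirrors the documented format, not the
    library's actual implementation.
    """
    overrides = {}
    for entry in value.split(":"):
        if not entry:
            continue  # tolerate empty segments, e.g. a trailing colon
        repo_id, sep, local_path = entry.partition("=")
        if not sep or not repo_id or not local_path:
            raise ValueError(f"expected org/repo=local_path, got {entry!r}")
        overrides[repo_id] = local_path
    return overrides


# The second pair is a made-up example of an additional override.
example = (
    "kernels-community/activation=/path/to/local/activation"
    ":my-org/my-kernel=/tmp/my-kernel"
)
print(parse_local_kernels(example))
```

With the variable set this way, existing get_kernel calls for the listed repositories would pick up the local paths without any code changes.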

Separately, get_kernel now supports specifying the backend (#268). This is useful when running some operations on CPU while the rest of the model runs on a GPU.

More reliable uploads of kernels with a very large number of files

Large kernel uploads are now automatically split across multiple commits to stay within Hub limits, rather than failing or requiring manual intervention for kernels with many files.
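The general idea can be sketched as batching the file list and creating one commit per batch. The release notes do not state the actual limit or batching strategy, so the function name split_into_commits and the max_files_per_commit parameter below are assumptions for illustration.

```python
def split_into_commits(paths, max_files_per_commit):
    """Split a list of file paths into consecutive batches.

    Each batch would be uploaded as its own commit. The batch size is a
    caller-supplied assumption, not the limit used by kernel-builder.
    """
    if max_files_per_commit < 1:
        raise ValueError("max_files_per_commit must be at least 1")
    return [
        paths[start:start + max_files_per_commit]
        for start in range(0, len(paths), max_files_per_commit)
    ]


# Hypothetical file list: 7 files with at most 3 files per commit.
files = [f"build/file_{i}.so" for i in range(7)]
for number, batch in enumerate(split_into_commits(files, 3), start=1):
    print(f"commit {number}: {len(batch)} files")
```

Batching by file count keeps every commit under a fixed size while preserving file order, so a failed upload can in principle be resumed from the first batch that was not committed.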

What's Changed

  • Use lowercase for ninja install in the Windows builder by @danieldk in #237
  • update Dockerfile override with monorepo by @drbh in #239
  • Ensure that metadata.json is correctly added to the output of Windows builds by @danieldk in #242
  • Set version to 0.12.2.dev0 by @danieldk in #238
  • update relative paths and readme cleanups by @drbh in #240
  • Move all kernel component handling to CMake functions by @danieldk in #243
  • Fix torchVersions argument of genKernelFlakeOutputs by @danieldk in #246
  • build2cmake: always generate kernel components for all backends by @danieldk in #245
  • Factor out render_binding and render_extensions by @danieldk in #248
  • Use single setup.py and move writing to common module by @danieldk in #250
  • Move writing of CMake utility files and ops wrapper to common by @danieldk in #249
  • Improve benchmark command by @drbh in #244
  • Factor out render_deps function by @danieldk in #251
  • Fix build set issues by @danieldk in #252
  • CI: relax timeouts for Hub-based tests by @danieldk in #254
  • Combine CMake preambles for all backends into a single preamble by @danieldk in #253
  • Remove the last backend-specific writer functions by @danieldk in #255
  • Ignore flake locks in examples by @danieldk in #257
  • Fix XPU build by @danieldk in #256
  • Fix the XPU compilation issue by @YangKai0616 in #258
  • Remove previous team members from authors by @julien-c in #261
  • feat: include benchmark dir in bundle by @drbh in #260
  • Remove backend-specific generation and also use CMake variant generation in Nix by @danieldk in #259
  • cmake: merge loops for handling Python and data extensions by @danieldk in #266
  • add init command that pulls template repo by @drbh in #247
  • Add the backend to the ops name by @danieldk in #267
  • Support local kernels in benchmark by @drbh in #265
  • Make CLI-related modules submodules of cli by @danieldk in #269
  • Add support for overriding kernels locally by @danieldk in #271
  • Fix versions torch dependency by @danieldk in #272
  • CMake: merge two condition blocks by @danieldk in #273
  • Upgrade GitHub Actions to latest versions by @salmanmkc in #232
  • Cleanup huggingface hub integration by @drbh in #274
  • Rename cutlass-sycl to sycl-tla by @YangKai0616 in #277
  • get_kernel: support specifying the backend by @danieldk in #268
  • feat: move template into project by @drbh in #275
  • build2cmake: add support for family suffix in CUDA capabilities by @danieldk in #280
  • Benchmark graphics by @drbh in #270
  • [FEATURE] add kernels skills add to the cli by @burtenshaw in #278
  • add cachix to flake and update buildSet by @drbh in #282
  • gen-flake-outputs: add ci-test package by @danieldk in #281
  • add utilities to generate template repo cards by @sayakpaul in #210
  • include repo_id in the card usage. by @sayakpaul in #284
  • Fix aarch64-linux and add it to CI by @danieldk in #286
  • chore: fix minor markdown backtick mistake by @HyperBlaze456 in #289
  • feat: enforce strict kernel name by @drbh in #290
  • pass revision to cmake template by @drbh in #291
  • builder: support no-arch builds without Nix by @danieldk in #288
  • fix: adjust the template publish workflow by @drbh in #295
  • fix: update template and init to use new repo and format by @drbh in #296
  • fix: adjust token for upload to hub by @drbh in #297
  • update init command to respect naming convention by @drbh in #294
  • feat: add init and benchmarks commands to docs by @drbh in #276
  • run make style and fix Makefile. by @sayakpaul in #301
  • feat: support redirection by @drbh in #293
  • Fix metadata filename for no-arch kernels by @danieldk in #305
  • build2cmake: add GPU archs to metadata.json by @danieldk in #304
  • ci-test: do not try to use a cache directory by @danieldk in #292
  • add terraform scripts by @sayakpaul in #303
  • Set version to 0.13.0.dev0 by @danieldk in #311
  • builder: fix python_dependencies.json path by @danieldk in #310
  • terraform docs. by @sayakpaul in #309
  • python3Packages.tvm-ffi: init at 0.19 by @danieldk in #318
  • build-and-upload: do not default to kernels-community by @danieldk in #306
  • [core]: implement upload to a new branch respecting versioning policy by @sayakpaul in #322
  • kernels: add support for tvm-ffi build variants by @danieldk in #320
  • Add support for tvm-ffi to build2cmake and the Nix builder by @danieldk in #328
  • Add CUDA 12.9 build variant for Torch 2.9 by @danieldk in #332
  • [core] feat: improve variant resolution by @sayakpaul in #330
  • [docs] start a doc on why kernels. by @sayakpaul in #325
  • Revert "[core] feat: improve variant resolution" by @sayakpaul in #334
  • minor edits to why kernels by @sayakpaul in #333
  • python3Packages.nvidia-cutlass-dsl: 4.3.0 -> 4.4.1 by @danieldk in #335
  • feat: upload in multiple commits when many files by @drbh in #300
  • [ci] don't trigger heavy builds for doc changes. by @sayakpaul in #337
  • build2cmake -> kernel-builder, move nix bits to nix-builder by @danieldk in #340
  • Fix Windows build by @danieldk in #341
  • Refactor variant handling and add CUDA fallback by @danieldk in #339
  • warn users when they are using older versions. by @sayakpaul in #342
  • Kernel docs followups by @sayakpaul in #287
  • kernel-builder generate -> kernel-builder create-pyproject by @danieldk in #344
  • kernel-builder: restructure Torch templates by @danieldk in #358
  • kernel-builder: add CUDA capability detection for tvm-ffi builds by @danieldk in #357
  • Assorted changes to prepare for FA4 support by @danieldk in #363
  • Refine dependency specification by @danieldk in #364
  • [ci] filter heavy ci on just doc related changes by @sayakpaul in #361
  • Fix a rename gone wrong: pyproject::compat -> pyproject::common by @danieldk in #366
  • [docs] add a page to discuss how kernels is integrated. by @sayakpaul in #360
  • replace workflow with huggingface/cuda-toolkit by @sayakpaul in #367
  • kernel-builder: pass number of nvcc threads when building for tvm-ffi by @danieldk in #369
  • kernel-builder: add support for tvm-ffi XPU kernels by @danieldk in #368
  • Fix link to terraform directory in writing-kernels.md by @tomaarsen in #373
  • trigger doc builds for release tags. by @sayakpaul in #374
  • add a workflow to fill docs for prev version tags. by @sayakpaul in #375
  • feat: remove docker files and references by @drbh in #371
  • Revert "add a workflow to fill docs for prev version tags." by @sayakpaul in #377
  • nix-builder: wire up get-kernel-check in tvm-ffi builds by @danieldk in #380
  • ReLU: example, fix unaligned loads, zero size arrays by @danieldk in #381
  • nix-builder: add tvm-ffi to the test shell for tvm-ffi kernels by @danieldk in #382
  • [builder] feat: implement upstream field for build.toml. by @sayakpaul in #379
  • kernel-builder: use tvm-ffi dynamic library extension by platform by @danieldk in #384
  • nix-builder: cache einops by @danieldk in #386
  • kernel-builder: add support for C++ dependencies in tvm-ffi kernels by @danieldk in #385
  • Make all kernel-builder subcommands take a directory by @danieldk in #387
  • [test] add user agent test suite and fix typo. by @sayakpaul in #365
  • move makefile to project root. by @sayakpaul in #390
  • kernel-builder: add completions subcommand for generating shell completions by @danieldk in #388
  • always upload kernel system card to main by @sayakpaul in #389
  • Fix handling of noarch and tvm-ffi variants by @danieldk in #391
  • kernel-builder: add devshell/testshell by @danieldk in #394
  • feat: always include backend in metadata by @drbh in #392
  • Move metadata.json and build.toml datastructures to separate crate by @danieldk in #395
  • kernel-builder: share common Nix options and add option to print build logs by @danieldk in #397
  • ci: speed up and improve kernels tests CI by @danieldk in #398
  • Add Torch 2.11 support for CUDA/CPU and remove Torch 2.9 by @danieldk in #399
  • feat: implement strict type validation using strict from huggingface_hub by @sayakpaul in #393
  • python3Packages.transformers: 4.57.3 -> 5.3.0 by @danieldk in #400
  • feat: add init and upload in kernel-builder cli by @drbh in #378
  • Add ROCm support for Torch 2.11 by @danieldk in #401
  • feat: remove init from python cli by @drbh in #407
  • Use kernel-builder upload in the Nix builder by @danieldk in #406
  • feat: remove card logic from python cli by @drbh in #408
  • Add XPU support for Torch 2.11 by @danieldk in #409
  • nix-builder: add Torch 2.11 supported ROCm archs by @danieldk in #411
  • Support fast kernel build variant enumeration and expose variants by @danieldk in #419
  • kernel-builder: add variants support by @danieldk in #420
  • add an onboarding script for new kernel-builder users by @sayakpaul in #418
  • Update nixpkgs and other flake dependencies by @danieldk in #425
  • [ci] e2e kernel builder tests by @sayakpaul in #416
  • nix-builder: incorrect variant use in torch attr by @danieldk in #426
  • kernels: add a function system_variants by @danieldk in #423
  • 🔒 Pin GitHub Actions to commit SHAs by @paulinebm in #424
  • kernel-builder: use tag argument instead of rev for tags in fetchFromGitHub by @NickCao in #432
  • nix-builder: fix typo causing blas to be used instead of lapack when useMKL is false by @NickCao in #433
  • nix-builder: set missing meta.sourceProvenance and other meta attributes by @NickCao in #434
  • Add JAX and NumPy examples by @danieldk in #431
  • kernels: remove unused init/generate-readme subcommands by @danieldk in #438
  • Update the docs to use kernel-builder in place of nix by @danieldk in #437

Full Changelog: v0.12.3...v0.13.0