
@alexreinking alexreinking released this Sep 16, 2020

We are pleased to announce the release of Halide 10.0.0!

This is a major update over the previous version, Halide 8.0.0, and contains many new features and a few breaking changes.

What happened to version 9?

For major version numbers, we now use the included LLVM version. We aim to release new versions of Halide at the same cadence as LLVM (every six months or so).


  • There are now multiple autoschedulers, and they have been reworked as plugins. They are each named for the research paper that produced them. The existing autoscheduler is now Mullapudi2016. See the generator documentation for more details.
  • The Adams2019 autoscheduler has been added. It is optimized for x86 CPUs and includes an autotuning mode.
  • The Li2018 autoscheduler has been added and generates CUDA schedules. It is optimized for pipelines using gradient descent features.


  • The CMake build has been rewritten. See the project's CMake documentation for details.
  • The minimum CMake version is now 3.16.
  • The old halide.cmake module has been removed in favor of find_package(Halide).
  • We no longer support the MinGW toolchain.

Language features

  • Added the atomic scheduling directive, which gives you another way to parallelize associative reductions (e.g. histograms or summations) by emitting atomic instructions when available (and compare-and-swap loops or locks when not).
  • Added support for horizontal vector reduction instructions, including dot-product instructions useful in machine learning, by combining the vectorize and atomic directives.
  • Integer division or mod by zero now returns zero instead of being undefined behavior.
  • The simplifier is now formally verified.
  • You can now store Funcs that are compute_at GPU blocks in global memory, which is useful if they won't fit in shared memory.
  • Allocation size inference is more precise in a variety of cases.
  • Various bugfixes for compute_with.
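
To make the idea behind the atomic directive concrete, here is a minimal sketch in plain C++ (it does not use Halide, and it is not Halide's generated code). The helper names `parallel_histogram` and `atomic_add_cas` are hypothetical. The first shows an associative reduction (a histogram) parallelized with atomic increments; the second shows the compare-and-swap loop style of fallback for a type that may lack a native atomic add.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper: builds a 256-bin histogram of `data` using
// `num_threads` worker threads (assumed to be at least 1).
std::array<int, 256> parallel_histogram(const std::vector<uint8_t> &data,
                                        unsigned num_threads) {
    std::array<std::atomic<int>, 256> bins;
    for (auto &b : bins) b.store(0);

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; t++) {
        workers.emplace_back([&, t] {
            // Each worker handles a strided slice of the input. Several
            // workers may hit the same bin, so the increment must be atomic.
            for (std::size_t i = t; i < data.size(); i += num_threads) {
                bins[data[i]].fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto &w : workers) w.join();

    std::array<int, 256> result;
    for (std::size_t i = 0; i < result.size(); i++) result[i] = bins[i].load();
    return result;
}

// Hypothetical helper: atomic float addition written as a
// compare-and-swap loop, for when no native atomic add is available.
void atomic_add_cas(std::atomic<float> &target, float value) {
    float expected = target.load();
    // On failure, compare_exchange_weak reloads `expected`, so we retry
    // with the freshly observed value until the swap succeeds.
    while (!target.compare_exchange_weak(expected, expected + value)) {
    }
}
```

Because addition is associative and commutative, the order in which workers apply their updates does not change the final histogram, which is why this kind of reduction can be parallelized safely.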

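The division rule above can be mirrored in ordinary C++ as follows. This is only a sketch of the zero-divisor case: the helper names `div_or_zero` and `mod_or_zero` are hypothetical, and the rounding of negative operands follows C++ (truncation toward zero), which is not claimed to match Halide.

```cpp
#include <cstdint>

// Hypothetical helper: integer division where a zero divisor yields 0
// instead of undefined behavior. Nonzero divisors use plain C++ division.
// (INT32_MIN / -1 still overflows in C++ and is not handled here.)
int32_t div_or_zero(int32_t a, int32_t b) {
    return b == 0 ? 0 : a / b;
}

// Hypothetical helper: integer modulo where a zero divisor yields 0.
int32_t mod_or_zero(int32_t a, int32_t b) {
    return b == 0 ? 0 : a % b;
}
```

With a defined result for the zero case, code that divides by a value that might be zero no longer needs a separate guard just to avoid undefined behavior.
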
Backends and targets

  • Better Direct3D 12 support.
  • Added support for macOS and Windows on ARM.
  • We no longer support the legacy buffer_t type.
  • Explicit support for Volta, Turing, and Ampere GPUs.
  • v8.0.0
  • 65c26cb

@abadams abadams released this Aug 27, 2019

New features since last release include:

  • Generate custom PyTorch ops from Halide pipelines
  • Automatic differentiation of Halide pipelines
  • A WebAssembly backend
  • A Direct3D backend
  • An opt-in caching allocator for CUDA to reduce the amount of time spent in cuMemAlloc and cuMemFree
  • float16 and bfloat16 support
  • Faster compilation of very large pipelines
  • New ways to assert properties of arguments, including unchecked assertions, and more aggressive simplifications that exploit these
  • The ability to place Funcs in stack/heap/shared/register memory explicitly with store_in
  • Runtime configuration of Generator inputs/outputs
  • Support for DMA transfers on Hexagon
  • Generate Python extension modules from Halide pipelines
  • Lower overhead when calling realize repeatedly on small pipelines
  • Optional strict floating point semantics for single expressions or entire pipelines
  • Producer-consumer task parallelism with Func::async
  • Numerous improvements to Halide::Runtime::Buffer. Consider replacing your custom halide_buffer_t wrappers with it.
  • Many more small improvements and bug fixes (it has been a while since our last release)
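
For readers unfamiliar with the bfloat16 format mentioned above: it keeps the sign, the full 8-bit exponent, and the top 7 mantissa bits of a 32-bit float. The following plain C++ sketch illustrates the format only; it is not Halide's implementation, the function names are hypothetical, and NaN inputs are not handled.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: converts a float to bfloat16 bits using
// round-to-nearest-even on the 16 discarded mantissa bits.
// Assumption: the input is not NaN (rounding could turn a NaN into infinity).
uint16_t float_to_bfloat16_bits(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));
    // Add 0x7FFF plus the lowest kept bit so that exact ties round to even.
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<uint16_t>(bits >> 16);
}

// Hypothetical helper: widens bfloat16 bits back to a float by
// zero-filling the low 16 mantissa bits. This direction is exact.
float bfloat16_bits_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}
```

Since bfloat16 shares float's exponent width, it covers the same dynamic range at much lower precision, whereas float16 trades range for extra mantissa bits.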

Edit: This release was renamed to use the included LLVM version instead of the date. It was formerly named Halide 2019/08/27.

Aug 26, 2019
Merge pull request #4174 from halide/srj-tidy
Remove unused 'using' decls to appease clang-tidy

@steven-johnson steven-johnson released this Feb 15, 2018

You probably want halide-linux-64-trunk, halide-mac-64-trunk, or halide-win-distro-64-trunk for Linux, OS X, and Windows respectively. For Linux, pay attention to the various GCC versions and download the build that matches your compiler version. You may get linker errors if you download the wrong one.

Notable changes include:

  • Scheduling:
    • New scheduling directive: compute_with
  • Codegen:
    • Better instruction selection for Hexagon
    • Less integer math in CUDA kernels
    • Support for warp shuffle instructions on CUDA
    • Support for MSAN in Clang
    • X86 Runtime: various AVX2 improvements
  • Fixes:
    • Buffer now uses halide_device_crop API from within the Buffer class instead of just discarding any device allocation when a Buffer is cropped
    • Auto-scheduler: unbounded function bugs
    • halide_print() now defaults to output to stdout rather than stderr
    • Various fixes to corner cases of Buffer<> with const types
  • API:
    • Completely rewrote Python bindings using PyBind11 (not yet complete but much more robust and well-supported)
    • Removed long-deprecated variants of gpu_tile()
    • Added IRMutator2, deprecated IRMutator
  • Apps:
    • Replaced apps/hexagon_matmul with apps/nn_ops, which provides fast implementations of common deep learning network operations on all platforms that Halide supports
  • Generators:
    • Revised LoopLevel to allow deferred-evaluation, making it easier to compose separate pieces of Halide code (e.g. when the compute_at or store_at may not be known yet)
    • Removed Generator::ScheduleParam entirely; added support for GeneratorParam instead
    • Simplified Stubs to no longer be stateful, but just a single "generate" method
  • Build:
    • All prebuilt libHalide versions (both static and dynamic) are now built with RTTI enabled (previously they were built with RTTI disabled)
    • Much better CMake support, including 'make distrib', 'make install', and better test targets
    • Drop support for LLVM 3.9