'goading' - platform-agnostic SIMD relying on autovectorization of small loops #264

@kfjahnke

As a follow-up to personal communication with @jan-wassenberg, I'd like to propose a technique here which I implemented as an intermediate layer between code intending to 'behave' in a SIMD fashion and the layer which implements the actual SIMD code. The reasoning goes like this:

  • to implement a SIMD algorithm efficiently, the program has to provide a SIMD-friendly structure
  • this structure is best expressed by using appropriate data types
  • generation of the actual SIMD code should remain an implementation detail, and allow for varying 'engines'
  • a system of generic types can be used to express the first point in a platform-agnostic way
  • and to speed things up, it can be specialized to use a more efficient engine where possible or necessary
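
The fourth point can be illustrated with a minimal sketch of such a generic type. The name lane_pack and its members are hypothetical and chosen only for this illustration; they are not taken from any library. The 'SIMD' operations are plain loops with a compile-time trip count, which is the kind of construct an autovectorizer has a good chance of recognizing.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical generic "SIMD-like" type: T is the elementary type,
// N the number of lanes. Storage is a small fixed-size array.
template <typename T, std::size_t N>
struct lane_pack
{
  T lane[N];

  // Fill all lanes with the same scalar value.
  static lane_pack broadcast(T v)
  {
    lane_pack r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = v;
    return r;
  }

  // Per-lane access with a debug-mode bounds check.
  T& operator[](std::size_t i) { assert(i < N); return lane[i]; }
  const T& operator[](std::size_t i) const { assert(i < N); return lane[i]; }

  // Element-wise arithmetic, written as small loops over all lanes.
  friend lane_pack operator+(const lane_pack& a, const lane_pack& b)
  {
    lane_pack r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
  }

  friend lane_pack operator*(const lane_pack& a, const lane_pack& b)
  {
    lane_pack r;
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
  }

  // Horizontal reduction: sum of all lanes.
  T sum() const
  {
    T s = T();
    for (std::size_t i = 0; i < N; ++i) s += lane[i];
    return s;
  }
};
```

Code written against such a type reads like SIMD code (for example, a * b + a on two lane_pack<float, 8> values), while the decision of how to turn the loops into machine code is left to the compiler.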

In my b-spline library I use this approach to good effect: I define a type vspline::simd_type which acts as a catch-all implementation of a class template structuring the intended SIMD programming. This template 'behaves' like a 'proper' SIMD type, but internally it simply holds a small fixed-size std::array-like container, and the 'SIMD' operations are implemented as small loops. The template mimics Vc's SimdArray. One layer 'above', the 'simdized' type used by vspline for a given elementary type inherits from Vc::SimdArray if possible, and from vspline::simd_type otherwise. The resulting type is meant to behave in the same way, no matter what its 'ancestry' is. This construction has several nice effects:

  • the code can be run without a SIMD library by relying entirely on vspline::simd_type
  • when using a SIMD library, one can fine-tune the specialization
  • scalar code can be enforced at the intermediate layer by specifying a vector size of one
  • platforms not covered by a given SIMD library can still be addressed
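
The layering described above could be sketched roughly as follows. This is not vspline's actual code: loop_engine, library_engine, library_covers and vector_type are hypothetical names, and library_engine is only a dependency-free stand-in for a library-backed type (the role Vc::SimdArray plays in vspline). The sketch assumes, purely for illustration, that the 'library' covers float only.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Generic fallback engine: plain storage, operations as small loops.
template <typename T, std::size_t N>
struct loop_engine
{
  T lane[N];
  T& operator[](std::size_t i) { assert(i < N); return lane[i]; }
  loop_engine& operator+=(const loop_engine& o)
  {
    for (std::size_t i = 0; i < N; ++i) lane[i] += o.lane[i];
    return *this;
  }
};

// Hypothetical stand-in for a library-backed engine. A real build would
// wrap a SIMD library type here; this placeholder reuses loops so that
// the sketch compiles without dependencies.
template <typename T, std::size_t N>
struct library_engine
{
  T lane[N];
  T& operator[](std::size_t i) { assert(i < N); return lane[i]; }
  library_engine& operator+=(const library_engine& o)
  {
    for (std::size_t i = 0; i < N; ++i) lane[i] += o.lane[i];
    return *this;
  }
};

// Assumption for this sketch: the "library" only covers float.
template <typename T> struct library_covers : std::false_type {};
template <> struct library_covers<float> : std::true_type {};

// Pick the specialized engine where possible, the generic one otherwise.
// A vector size of one always selects the generic engine, which then
// amounts to scalar code.
template <typename T, std::size_t N>
using engine_t = typename std::conditional<
    (library_covers<T>::value && (N > 1)),
    library_engine<T, N>,
    loop_engine<T, N>>::type;

// The type user code sees; it behaves the same whatever its base is.
template <typename T, std::size_t N>
struct vector_type : engine_t<T, N>
{
  static constexpr std::size_t size() { return N; }
};

// A kernel written once against the common interface.
template <typename V>
void accumulate(V& acc, const V& x) { acc += x; }
```

With these definitions, vector_type<float, 4> derives from library_engine, vector_type<double, 4> falls back to loop_engine, and vector_type<float, 1> enforces scalar code, all without changes to accumulate.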

The first point is especially important: it allows the SIMD-structured code to run on any machine. At first sight this looks mainly like a good option for testing, but closer inspection reveals more. Because the code's structure communicates the SIMD intent clearly, the compiler's optimizer can 'pick up' on it, and, if 'goading' the compiler works as intended, the resulting code may perform bona fide SIMD operations without the compiler being told explicitly how to arrive at this solution (for example, with intrinsics). It turns out that for a fair amount of code the compiler can provide near-optimal solutions, which it might not have found had the code merely stated its intent in scalar form. There is even a chance of getting better machine code: if a SIMD library is used to code SIMD instructions explicitly, this process may be opaque to the optimizer. The 'goading' approach is entirely transparent: there are no hard-to-penetrate intrinsics, and the optimizer can possibly restructure the machine code in a way a human programmer might find hard to fathom and write explicitly.
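
To make the idea of 'goading' concrete, here is a small sketch of a kernel structured in SIMD fashion without any intrinsics. The function eval_poly and the lane count of 8 are assumptions made for this example only. The data is processed in chunks of LANES values, and every step (load, compute, store) is a small loop with a fixed trip count. Whether a given compiler actually turns these loops into SIMD instructions depends on the compiler, its version and the flags used.

```cpp
#include <cassert>
#include <cstddef>

// Assumed lane count for this sketch; a real setup would pick it per
// elementary type and target.
constexpr std::size_t LANES = 8;

// Evaluate c[0] + c[1]*x + ... + c[degree]*x^degree for n input values
// using Horner's scheme. n must be a multiple of LANES.
void eval_poly(const float* c, std::size_t degree,
               const float* x, float* out, std::size_t n)
{
  assert(n % LANES == 0);
  for (std::size_t base = 0; base < n; base += LANES)
  {
    float vx[LANES], acc[LANES];

    // "load" one chunk of arguments
    for (std::size_t i = 0; i < LANES; ++i) vx[i] = x[base + i];

    // "broadcast" the highest coefficient to all lanes
    for (std::size_t i = 0; i < LANES; ++i) acc[i] = c[degree];

    // Horner steps: the same multiply-add is applied to every lane
    for (std::size_t k = degree; k-- > 0; )
      for (std::size_t i = 0; i < LANES; ++i)
        acc[i] = acc[i] * vx[i] + c[k];

    // "store" the results
    for (std::size_t i = 0; i < LANES; ++i) out[base + i] = acc[i];
  }
}
```

The inner loops state plainly that all lanes are independent and undergo the same operation, which is the information an optimizer needs in order to consider SIMD instructions.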

In my experience, clang++ is especially good at 'picking up' the SIMD intent in goading code, but this is a moving target: compilers evolve quickly, and my experience is limited.

SIMD libraries tend to offer a fallback to scalar code where 'hardware' SIMD can't be had, but this still requires the entire library to be present. Using a small header like vspline::simd_type.h, which has no dependencies, removes the dependency on a given library and makes SIMD usable even on exotic platforms. On the compiler side, it's helpful if the compiler writers specify which small-loop constructs they will likely vectorize, and clang++ offers such documentation.
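
As an illustration of such documented constructs, LLVM's auto-vectorization documentation lists, among others, reductions and loops with simple conditionals (if-conversion) as candidates. The two functions below are hypothetical examples of these patterns, not code from vspline. With clang++, the optimization remark flags -Rpass=loop-vectorize and -Rpass-missed=loop-vectorize can be used to check whether a given loop was vectorized.

```cpp
#include <cassert>
#include <cstddef>

// Reduction: all elements are folded into a single accumulator.
// An integer sum is used here; floating-point sums usually need the
// compiler's permission to reorder additions before they are vectorized.
int sum_ints(const int* a, std::size_t n)
{
  assert(a != nullptr || n == 0);
  int s = 0;
  for (std::size_t i = 0; i < n; ++i) s += a[i];
  return s;
}

// If-conversion: the branch in the loop body can be turned into a
// per-element select, so the loop remains a candidate for SIMD.
void clamp_negative(int* a, std::size_t n)
{
  assert(a != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] < 0) a[i] = 0;
}

// Possible way to inspect the outcome (the file name is a placeholder):
//   clang++ -O3 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c example.cc
```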

There is yet another aspect here: 'goading' expresses a SIMD intent, but the compiler, seeing the 'big picture', may decide that SIMD is inappropriate for a given construct, because its cost model indicates that the code is better run without SIMD instructions. This option may be blocked by explicit use of SIMD instructions, a case of overly zealous early specialization.

I hope that this little write-up can add to the effort to make SIMD programming more widespread, which I think is critical, especially for number-crunching applications. vspline::simd_type is FOSS, licensed under the Expat license. I'd welcome responses to this post; @jan-wassenberg proposed to start discussing SIMD-related topics on highway's issue tracker, so here we go.
