ISPC Development Guide

Alexey Nurmukhametov edited this page Aug 29, 2023 · 21 revisions

Build system

You may follow the steps below or use docker files (available for Linux only) as a reference on how to build LLVM and ISPC.

Prerequisites

  • CMake 3.15 or later - use your package manager to install or go to http://www.cmake.org/cmake/resources/software.html
  • Python 3.6 or later - use your package manager to install or go to http://www.python.org/download/
  • Bison 3.0 or later - use your package manager to install. Note that the default version that comes with macOS is too old (2.3); using Homebrew to get a newer one is recommended.
  • Flex 2.6 or later - use your package manager to install.
  • m4 1.4 or later - use your package manager to install.
  • git client to check out LLVM and ISPC sources accordingly. Make sure that git has access to the Internet, i.e. all proxy settings are in place.
On Windows, the Cygwin package manager is recommended for installing these tools, but other ways of installing the required dependencies should also work.

Additionally, on Linux you need:

  • ncurses 5.0 library or later - use your package manager to install.
  • libc library for all target platforms - use your package manager to install (e.g. `g++-multilib` on Debian/Ubuntu).
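
A quick way to confirm the tool versions listed above is a small script such as the following. This is only a sketch, not the repository's check_env.py: the names parse_version, meets_minimum, check_tools and MINIMUMS are invented for illustration, and it assumes each tool prints its version in response to --version.

```python
import re
import shutil
import subprocess

# Minimum versions taken from the prerequisites list above.
MINIMUMS = {
    "cmake": (3, 15),
    "python3": (3, 6),
    "bison": (3, 0),
    "flex": (2, 6),
    "m4": (1, 4),
}


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def meets_minimum(found, required):
    """Compare version tuples; a version that could not be parsed fails."""
    return found is not None and found >= required


def check_tools():
    """Print one status line per prerequisite tool."""
    for tool, required in MINIMUMS.items():
        if shutil.which(tool) is None:
            print(f"{tool}: not found on PATH")
            continue
        try:
            result = subprocess.run([tool, "--version"], capture_output=True,
                                    text=True, timeout=30)
        except (OSError, subprocess.SubprocessError):
            print(f"{tool}: could not be run")
            continue
        found = parse_version(result.stdout + result.stderr)
        status = "ok" if meets_minimum(found, required) else "too old or unknown"
        print(f"{tool}: {found} ({status})")


if __name__ == "__main__":
    check_tools()
```

With the macOS default Bison mentioned above, parse_version would yield (2, 3), which meets_minimum rejects against the required (3, 0).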

1. Building LLVM

First, build the LLVM headers and libraries and the clang compiler on your system. For ISPC to build successfully, LLVM must be built with the proper flags. The recommended way to build LLVM is to use the alloy.py script from the ispc repo. It passes all required flags depending on the LLVM version and applies the required patches. An alternative is to build it manually.

Note: all examples below are using LLVM 14.0 version but you can use any supported LLVM version.

[Recommended] Building LLVM using alloy.py script

  • You need to set the LLVM_HOME environment variable to the directory where your LLVM build will reside. It's also recommended to set the ISPC_HOME environment variable to the path of the ispc repository (i.e. where alloy.py is), but this is optional.
  • Run the script: alloy.py -b --version=14.0. Alloy will automatically patch known LLVM problems using patches from $ISPC_HOME/llvm_patches/* if the patch has the LLVM version being built in its name. The build takes about an hour on an average quad-core laptop. It is strongly recommended to use the --selfbuild switch on Linux/Mac OS; it ensures that both ISPC and the LLVM libraries are built with the same compiler. This switch is not supported on Windows at the moment. You may also want to use the --debug switch, which builds a debug version of LLVM.
Alloy will use three folders: $LLVM_HOME/llvm-version, $LLVM_HOME/build-version, and $LLVM_HOME/bin-version. If you want to name the folder yourself, you can use the --folder= switch; this will result in $LLVM_HOME/bin-your_folder_name, etc. If the folder $LLVM_HOME/bin-name already exists, alloy will report a fatal error, but you can use --force to rebuild LLVM in the existing directory.

Logs will be in alloy_results[date] directory.
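
The folder naming described above can be summarised as a small helper. This is an illustrative sketch only: alloy_dirs is a hypothetical function, not part of alloy.py, /opt/llvm is a placeholder for LLVM_HOME, and it assumes that --folder= replaces the version suffix in all three directory names, which is how the "etc." above is read here.

```python
import posixpath


def alloy_dirs(llvm_home, version, folder=None):
    """Return the source, build and install directories alloy.py is described as using."""
    suffix = folder if folder is not None else version
    return {
        "source": posixpath.join(llvm_home, f"llvm-{suffix}"),
        "build": posixpath.join(llvm_home, f"build-{suffix}"),
        "install": posixpath.join(llvm_home, f"bin-{suffix}"),
    }


# Placeholder LLVM_HOME; substitute your own directory.
for name, path in alloy_dirs("/opt/llvm", "14.0").items():
    print(f"{name}: {path}")
```

The "install" entry is the directory whose presence makes alloy report a fatal error unless --force is given.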

Building LLVM manually

  • git clone https://github.com/llvm/llvm-project.git llvm-14.0
  • cd llvm-14.0
  • git checkout -b llvmorg-14.0.1 llvmorg-14.0.1
  • Apply all patches for the required LLVM version from $ISPC_HOME/llvm_patches using git apply. To list the relevant patches, run, for example: ls $ISPC_HOME/llvm_patches | grep 14_0
  • Create build-14.0 and bin-14.0 directories next to llvm-14.0.
Configure LLVM build using CMake:

On Linux:

  • Run cmake from build-14.0 directory with the following options:
cmake  -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/absolute/path/to/bin-14.0 -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_DUMP=ON   -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_INSTALL_UTILS=ON  -DLLVM_TARGETS_TO_BUILD=AArch64\;ARM\;X86  -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=WebAssembly  ../llvm-14.0/llvm

On Mac OS:

  • Run cmake from build-14.0 directory with the following options:
cmake  -G "Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/absolute/path/to/bin-14.0 -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;libcxx;libcxxabi" -DLLVM_ENABLE_DUMP=ON   -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_INSTALL_UTILS=ON  -DLLVM_TARGETS_TO_BUILD=AArch64\;ARM\;X86  -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=WebAssembly  ../llvm-14.0/llvm

Run make, and then make install. You may want to speed up your build with -jN switch, where N is the number of parallel jobs to run.

On Windows:

  • Run cmake from build-14.0 directory with the following options:
cmake -G "Visual Studio 16" -Thost=x64 -DLLVM_LIT_TOOLS_DIR=/absolute/path/to/GnuWin32/bin -DCMAKE_INSTALL_PREFIX=/absolute/path/to/bin-14.0 -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_ENABLE_DUMP=ON -DLLVM_ENABLE_ASSERTIONS=ON -DLLVM_INSTALL_UTILS=ON -DLLVM_TARGETS_TO_BUILD=AArch64\;ARM\;X86 -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=WebAssembly ../llvm-14.0/llvm
  • Open the LLVM.sln file in the LLVM directory. Select the Release configuration, `x64` platform.
  • Right-click on ALL_BUILD in the Solution Explorer, and choose Build.
  • Right-click on INSTALL in the Solution Explorer, and choose Build. This copies the appropriate files into the install directory.
Once LLVM is installed, make sure that the bin directory it installs its binaries into is in your PATH. (In particular, the tools llvm-config and clang must be available to build ISPC.)
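
The three configure commands in this section share most of their options. As a rough sketch of the differences, the helper below assembles the argument list per host. llvm_cmake_args and its parameters are hypothetical names used for illustration; the options themselves are copied from the commands above. Because the list is meant to be passed to a process directly rather than through a shell, the semicolons need no backslash escaping.

```python
def llvm_cmake_args(host, install_prefix, source_dir, gnuwin32_bin=None):
    """Build the LLVM configure command for host in {"linux", "macos", "windows"}."""
    if host not in ("linux", "macos", "windows"):
        raise ValueError(f"unknown host: {host}")

    args = ["cmake"]
    if host == "windows":
        # Visual Studio generator with a 64-bit host toolchain.
        args += ["-G", "Visual Studio 16", "-Thost=x64"]
        if gnuwin32_bin is not None:
            args.append(f"-DLLVM_LIT_TOOLS_DIR={gnuwin32_bin}")
    else:
        args += ["-G", "Unix Makefiles"]

    # The macOS command additionally enables libcxx and libcxxabi.
    projects = "clang;libcxx;libcxxabi" if host == "macos" else "clang"
    args += [
        f"-DCMAKE_INSTALL_PREFIX={install_prefix}",
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DLLVM_ENABLE_PROJECTS={projects}",
        "-DLLVM_ENABLE_DUMP=ON",
        "-DLLVM_ENABLE_ASSERTIONS=ON",
        "-DLLVM_INSTALL_UTILS=ON",
        "-DLLVM_TARGETS_TO_BUILD=AArch64;ARM;X86",
        "-DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=WebAssembly",
        source_dir,
    ]
    return args


for part in llvm_cmake_args("linux", "/absolute/path/to/bin-14.0", "../llvm-14.0/llvm"):
    print(part)
```

The resulting list could be handed to subprocess.run(args, cwd="build-14.0") to run the configure step from the build directory.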

2. Building ISPC

2.1 Building ISPC for CPU development

You're now ready to build ISPC:

  • Create a build directory in the ISPC root folder: mkdir build
On Linux/Mac OS:
  • Run CMake from build directory with the following options: cmake -DCMAKE_INSTALL_PREFIX=/absolute/path/to/ispc/install/folder path/to/ispc/source/root
  • Run make, and then make install. You may want to speed up your build with -jN switch, where N is a number of parallel jobs to run.
Known issues and workarounds:
  • If you are building on RHEL 6.x or CentOS 6.x, and are seeing /usr/bin/ld: cannot find -lpthread during link of ispc, you will need to install the glibc-static package.
  • If you are building on RHEL 6.x or CentOS 6.x, and are seeing /usr/bin/ld: cannot find -lstdc++ during link of ispc, you will need to install the libstdc++-static package.
  • If you run into the following error: /usr/include/gnu/stubs.h:7:11: fatal error: 'gnu/stubs-32.h' file not found you will need to install the 32-bit version of the glibc-devel package as explained at http://stackoverflow.com/questions/7412548/gnu-stubs-32-h-no-such-file-or-directory.
  • The static repository may need to be explicitly enabled. For example on RHEL 6.x, edit /etc/yum.repos.d/redhat.repo and set enabled = 1 in the rhel-6-workstation-optional-rpms section.
On Windows:
  • Run cmake from build directory with the following options: cmake -G "Visual Studio 16" -Thost=x64 -DCMAKE_INSTALL_PREFIX=/absolute/path/to/ispc/install/folder path/to/ispc/source/root
  • Open ispc.sln from the build directory. Select the Release configuration and x64 platform and then build the solution.
  • Right-click on INSTALL in the Solution Explorer, and choose Build. This copies the appropriate files into the install directory.
Add /absolute/path/to/ispc/install/folder/bin to your PATH and you're ready to go!

A note regarding versions of LLVM: LLVM 13.0 and later are supported. ISPC usually can be built with recent versions of the LLVM development top-of-tree, though there are occasionally bugs with ToT. It is currently recommended that you build with LLVM 15.0 unless you have a specific reason for using any other LLVM branch.

ISPC-specific variables

  • ARM_ENABLED - build ISPC with ARM support
  • ISPC_INCLUDE_EXAMPLES - generate build targets for the ISPC examples
  • ISPC_INCLUDE_TESTS - generate build targets for the ISPC tests
  • ISPC_INCLUDE_UTILS - generate build targets for the utils
  • ISPC_PREPARE_PACKAGE - generate build targets for ISPC package. It also disables generation of build targets for examples and sets static linking for Linux packages.
You can specify path to required tools using the following variables:
  • BISON_EXECUTABLE - absolute path to Bison executable
  • FLEX_EXECUTABLE - absolute path to Flex executable
  • M4_EXECUTABLE - absolute path to m4 executable

2.2 Building ISPC with cross-OS compilation support

You can build ISPC with cross-OS compilation support by passing -DISPC_CROSS=ON on Linux/macOS and -DISPC_CROSS=ON -DISPC_GNUWIN32_PATH=/absolute/path/to/GnuWin32/bin on Windows to CMake command described in 2.1. Depending on your host system ISPC may be built for Windows, Linux, FreeBSD, macOS, Android, iOS, and PS targets. You can disable any target system by passing -DISPC_[WINDOWS|LINUX|FREEBSD|MACOS|ANDROID|IOS|PS]_TARGET=OFF to CMake. By default the following combinations are supported:

  • Windows host - Windows, Linux, FreeBSD, Android, PS targets
  • Linux host - Linux, FreeBSD, Android targets
  • macOS host - macOS, Android, Linux, iOS targets
Depending on which target OSes are selected you will be asked to provide additional paths:
  • ISPC_GNUWIN32_PATH - path to the root of [GnuWin32](http://gnuwin32.sourceforge.net/)
  • ISPC_MACOS_SDK_PATH - path to the root of Mac OS SDK (is available on macOS systems)
  • ISPC_ANDROID_NDK_PATH - path to the root of [Android NDK](https://developer.android.com/ndk)
  • ISPC_IOS_SDK_PATH - path to the root of iOS SDK (is available on macOS systems)
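
To illustrate how the cross-compilation switches combine, here is a small sketch that produces the CMake flags for a given host. cross_flags, enabled_targets, DEFAULT_TARGETS and KNOWN_TARGETS are names made up for this example; the flag spellings and the default host/target table come from the text above.

```python
# Default target OSes per host, as listed above.
DEFAULT_TARGETS = {
    "windows": ["WINDOWS", "LINUX", "FREEBSD", "ANDROID", "PS"],
    "linux": ["LINUX", "FREEBSD", "ANDROID"],
    "macos": ["MACOS", "ANDROID", "LINUX", "IOS"],
}
KNOWN_TARGETS = {"WINDOWS", "LINUX", "FREEBSD", "MACOS", "ANDROID", "IOS", "PS"}


def cross_flags(host, disable=(), gnuwin32_bin=None):
    """Return CMake flags enabling cross-OS support, with some targets turned off."""
    if host not in DEFAULT_TARGETS:
        raise ValueError(f"unknown host: {host}")
    unknown = sorted(set(disable) - KNOWN_TARGETS)
    if unknown:
        raise ValueError(f"unknown target(s): {unknown}")

    flags = ["-DISPC_CROSS=ON"]
    if host == "windows":
        if gnuwin32_bin is None:
            raise ValueError("Windows hosts also need the GnuWin32 bin path")
        flags.append(f"-DISPC_GNUWIN32_PATH={gnuwin32_bin}")
    flags += [f"-DISPC_{target}_TARGET=OFF" for target in disable]
    return flags


def enabled_targets(host, disable=()):
    """Default targets for the host that remain after disabling some of them."""
    return [t for t in DEFAULT_TARGETS[host] if t not in disable]


print(cross_flags("linux", disable=["FREEBSD"]))
# ['-DISPC_CROSS=ON', '-DISPC_FREEBSD_TARGET=OFF']
print(enabled_targets("linux", disable=["FREEBSD"]))
# ['LINUX', 'ANDROID']
```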

2.3 Building ISPC with Xe support

Xe-enabled build is supported on Windows and Linux, and it has three additional dependencies:

  • SPIRV-LLVM-Translator
  • vc-intrinsics
  • level-zero

Please use the exact commit SHAs used in the Dockerfile, and the same LLVM version with which you plan to build ISPC. The CMake commands used in the Dockerfile are applicable to both Windows and Linux. Install the build artifacts from SPIRV-LLVM-Translator and vc-intrinsics into one folder (XE_DEPS_DIR) and the artifacts from the level-zero build into another folder (L0_ROOT_DIR). Now you're ready to build ISPC with Xe support:

On Linux:

  • cmake .. -DXE_ENABLED=ON -DXE_DEPS_DIR=$XE_DEPS -DLEVEL_ZERO_ROOT=$L0_ROOT_DIR -DCMAKE_INSTALL_PREFIX=/absolute/path/to/ispc/install/folder && make install

On Windows:

  • cmake -G "Visual Studio 16" -Thost=x64 -DXE_ENABLED=ON -DXE_DEPS_DIR=%XE_DEPS% -DLEVEL_ZERO_ROOT=%L0_ROOT_DIR% -DCMAKE_INSTALL_PREFIX=/absolute/path/to/ispc/install/folder
  • Open Visual Studio and build the solution as described in 2.1

2.4 Building ISPC package with cross-compilation and Xe support

If you want to build a complete ISPC package with cross-compilation and Xe support, you need to have all dependencies prepared as described in sections 2.2 and 2.3, and run the following CMake commands:

On Linux:

  • cmake .. -DISPC_PREPARE_PACKAGE=ON -DISPC_CROSS=ON -DXE_ENABLED=ON -DXE_DEPS_DIR=$XE_DEPS -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/absolute/path/to/ispc/install/folder && make -j8 package

On Windows:

  • cmake -G "Visual Studio 16" -Thost=x64 -DXE_ENABLED=ON -DXE_DEPS_DIR=%XE_DEPS% -DLEVEL_ZERO_ROOT=%L0_ROOT_DIR% -DISPC_CROSS=ON -DISPC_GNUWIN32_PATH=/absolute/path/to/GnuWin32/bin -DISPC_PREPARE_PACKAGE=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/absolute/path/to/ispc/install/folder
  • Open ispc.sln from the build directory. Select the Release configuration and x64 platform and then build the solution.
  • Right-click on PACKAGE in the Solution Explorer, and choose Build. This copies the appropriate files into the install directory.

Test system

Test system structure

The system consists of three main scripts:

  • alloy.py - the main part of test system. This script executes other scripts and gathers their results.
  • run_tests.py - stability part of test system. The script reports the names of failing tests for a given target.
  • perf.py - performance part of test system. The script runs performance tests for CPU targets and reports out results for a given compiler.
The system also has some additional files:
  • fail_db.txt - file with list of known fails.
  • test_static.cpp - C++ file, which executes ISPC functions in stability testing for CPU targets.
  • test_static_l0.cpp - C++ file, which executes ISPC functions in stability testing for Xe targets.
  • test_static.isph - ISPC header containing CPU "export" wrappers for "task" functions used in tests.
  • perf.ini - configuration file for performance testing.
  • common.py - file with common functions of test system.
  • check_env.py - script which reports which tools, SDE and compilers are in your PATH, along with other relevant environment variables.
Tests are in four directories:
  • examples - performance tests. Each test is in its own folder.
  • tests - stability tests. Executed through test_static.cpp/test_static_l0.cpp.
  • tests_errors - negative stability tests (which should report an error).
  • tests/lit-tests - LLVM LIT regression tests

If you want to validate stability

If you want to check the stability of the ISPC compiler after your changes, use run_tests.py or alloy.py -r --only=stability. Note that you must set LLVM_HOME and ISPC_HOME for alloy.py.

run_tests.py executes all tests from the "tests" and "tests_errors" directories for a given target, arch and opt_level, and then reports which tests failed (runfail or compfail). It also checks the fail_db.txt file of known fails and reports whether each fail is new or known.

For a more general view, use alloy.py -r --only=stability. This runs run_tests.py for all or selected supported targets, archs, opt_levels and LLVM versions (note that generic targets run only with the x86-64 arch; this is by design). You then get a full report of fails, new_fails, passes and new_passes. If you don't have the appropriate LLVM version, alloy will silently download and build it. You can select combinations for your test runs with the option --only="" and select targets with --only-targets="". Note that alloy automatically detects the targets of your system and of SDE (if you set the SDE_HOME environment variable).

If you want to change fail_db.txt file by adding or deleting fails of current run you should use option --update-errors=F (update fails) or --update-errors=FP (update fails and passes) both in alloy.py and run_tests.py.
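
The relationship between known fails, new fails and new passes can be pictured with plain set arithmetic. This is a conceptual sketch only: it does not read fail_db.txt (whose format is not described here), and classify and the test names are hypothetical.

```python
def classify(current_fails, known_fails):
    """Split the fails of one run into known fails, new fails, and new passes."""
    current = set(current_fails)
    known = set(known_fails)
    return {
        "known_fails": sorted(current & known),
        "new_fails": sorted(current - known),   # failing now, not recorded before
        "new_passes": sorted(known - current),  # recorded as failing, passing now
    }


report = classify(
    current_fails=["tests/a.ispc", "tests/c.ispc"],
    known_fails=["tests/a.ispc", "tests/b.ispc"],
)
for kind, names in report.items():
    print(kind, names)
```

In these terms, --update-errors=F roughly corresponds to recording the new fails, and --update-errors=FP to also removing the new passes from the list of known fails.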

Test System Design

Each test in the tests folder is a self-contained ispc source file checking particular functionality; it must define two functions:

  • result(uniform float[]), which returns the result that the test function should return
  • a test function, named one of f_v, f_f, f_fu, f_fi, f_di, f_du, or f_duf. These names encode the signature of the test function.
The test script compiles each test function to an object file for CPU targets, or to SPIR-V or ZE Binary for Xe targets, and then runs the functions above, passing the test function particular values of particular types based on its signature. It then checks that the values returned from the call to result() match the values returned from the call to the test function; if they don't, the differing values are printed along with an error message.

To make this concrete, here is an example of a test (a cleaned-up version of tests/bool-float-typeconv.ispc). It does a quick sanity check of bool to float type conversion.

    #include "../test_static.isph"
    task void f_f(uniform float RET[], uniform float aUniform[]) {
        float a = aUniform[programIndex];
        RET[programIndex] = a < 3.;
    }
    
    task void result(uniform float RET[]) { 
        RET[programIndex] = 0; RET[0] = RET[1] = 1; 
    }

The test starts with #include "../test_static.isph", which contains CPU wrappers for the task functions used in tests. This is motivated by the different entry points of an ISPC program on CPU and Xe: on CPU it is an export function, on Xe it is a task. To have a unified set of tests for CPU and Xe, we use the task modifier in test functions and, for CPU, wrap them into export functions with a task launch inside using test_static.isph. For example, the wrapper for the f_f task will be:

    task void f_f(uniform float res[], uniform float vfloat[]);
    export void f_f_cpu_entry_point(uniform float res[], uniform float vfloat[]) { launch[1] f_f(res, vfloat); }

Now let's look into ISPC functions themselves. First, notice that the test function defined here is f_f. In addition to the array in which to store the result values computed by the function, the RET parameter, functions with the name f_f are also passed an array of floats, with values {1,2,3,4,5,6,7,8,9,10,11,12,13,14,15}. The test function converts this to a varying value a by directly indexing into this array with programIndex, giving a the value one in the first program instance and so forth. By inspection, we can see that the boolean test in the last line of f_f should evaluate to true for the first two program instances running, and false for all of the rest, and that the conversion of those bools to floats should put 1 in the first two program instances result values and zero in the rest. These, in turn, are the values that result() reports are expected.
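
The check described above can be modelled in a few lines of Python. This is not how test_static.cpp is implemented; it is a sketch that mimics the f_f example for an assumed gang size (width), with each list index standing in for one program instance. The function mismatches is a hypothetical helper.

```python
def f_f(a_uniform, width):
    """Model of the test function: RET[programIndex] = a < 3."""
    return [float(a_uniform[i] < 3.0) for i in range(width)]


def result(width):
    """Model of result(): zero everywhere, then one in the first two slots."""
    ret = [0.0] * width
    ret[0] = ret[1] = 1.0
    return ret


def mismatches(width):
    """Return (index, got, expected) for every lane where the outputs differ."""
    inputs = [float(i + 1) for i in range(width)]  # 1, 2, 3, ... as passed by the driver
    got, expected = f_f(inputs, width), result(width)
    return [(i, g, e) for i, (g, e) in enumerate(zip(got, expected)) if g != e]


for width in (4, 8, 16):
    diff = mismatches(width)
    print(width, "lanes:", "PASS" if not diff else diff)
```

Changing the comparison in f_f to a < 4 makes lane 2 differ, which is the kind of mismatch the real driver prints along with an error message.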

Here are the types and values passed as parameters by the test driver for functions with the various signatures listed above:

    task void f_v(uniform float RET[]);  // i.e. no parameters passed other than the output array

    // a[] = { 1, 2, 3, ... };
    task void f_f(uniform float RET[], uniform float a[]);

    // a[] = { 1, 2, 3, ... };
    // b = 5;
    task void f_fu(uniform float RET[], uniform float a[], float b);

    // a[] = { 1, 2, 3, ... };
    // b[] = { 2, 4, 6, ... };
    task void f_fi(uniform float RET[], uniform float a[], int b[]);

    // a[] = { 1, 2, 3, ... };
    // b[] = { 5, 6, 7, ... };
    task void f_di(uniform float RET[], uniform double a[], int b[]);

    // a[] = { 1, 2, 3, ... };
    // b = 5;
    task void f_du(uniform float RET[], uniform double a[], double b);

    // a[] = { 1, 2, 3, ... };
    // b = 5;
    task void f_duf(uniform float RET[], uniform double a[], float b);

There is a slightly different group of tests related to print (e.g. print_uf, print_f, print_fuf). Both the test function and result print their output to stdout, and run_tests.py checks that the output is correct.

Writing New Tests

New functionality should have targeted tests that exercise the set of features that the functionality introduces. If the functionality is in any way dependent on the mask, it's important to exercise a few cases like 'mask all on', 'mask all off', and 'mixed mask'.

Main usage of run_tests.py to understand stability

To run the tests, run the run_tests.py python script in the top-level ispc source directory.

If successful, the test script will print output like:

% ./run_tests.py
Executed 1517 / 1543 (26 skipped)

PASSRATE (1517/1517) = 100%

1517 / 1543 tests PASSED
0 / 1543 tests FAILED compilation
0 / 1543 tests FAILED execution
26 / 1543 tests SKIPPED

<List of skipped tests>

No new fails                            
%

If some tests fail, the test system will generate additional output indicating which tests failed and how they failed. The exit code is equal to the number of tests that failed; thus, if all tests pass, the exit code is 0.
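
If you drive run_tests.py from another script, the summary and the exit code are the two things to look at. The sketch below parses summary lines in the format shown above; parse_summary is a hypothetical helper, the regular expression assumes the output format matches that sample exactly, and it assumes that both compilation and execution failures count towards the exit code.

```python
import re

# Summary lines copied from the sample output above.
SAMPLE = """\
1517 / 1543 tests PASSED
0 / 1543 tests FAILED compilation
0 / 1543 tests FAILED execution
26 / 1543 tests SKIPPED
"""

SUMMARY_LINE = re.compile(
    r"^(\d+) / (\d+) tests (PASSED|FAILED compilation|FAILED execution|SKIPPED)[ \t]*$",
    re.MULTILINE,
)


def parse_summary(output):
    """Map each outcome named in the summary to its test count."""
    return {m.group(3): int(m.group(1)) for m in SUMMARY_LINE.finditer(output)}


counts = parse_summary(SAMPLE)
failed = counts["FAILED compilation"] + counts["FAILED execution"]
# Under the assumption stated above, this sum is the script's exit code.
print(counts)
print("expected exit code:", failed)
```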

Useful commands:

  • run_tests.py will run all tests and report about fails, passes, new fails, new passes
  • run_tests.py -a xe64 -t gen9-x16 --platform=skl will run all tests for gen9-x16 target and skl device, and report about fails, passes, new fails, new passes
  • run_tests.py --target=avx1-i32x16 --arch=x86 --no-opt will run all tests with the selected target, arch and opt level
  • run_tests.py --target=avx2-i32x16 --wrap-exe="sde -hsw -- " will run the target through SDE
  • run_tests.py --verify will verify file fail_db.txt
  • To add skip rules, add STRING to the test's .ispc file, where STRING is "// rule: [skip,] on" + [arch=archname,]*

Main usage of alloy.py to validate stability

  • alloy.py -r --only=stability will run tests with every combination of (all targets); -O2; (x86, x86-64); (LLVM 3.3, trunk).
  • alloy.py -r --only=stability --notify=mail@mail.com will send results to your e-mail
  • alloy.py -r --only=stability --update-errors=F will add new fails to fail_db.txt

Tips for running tests on Windows

When running on Windows, failing tests may cause a dialog window suggesting to find a solution on the web or debug a problem. This may be annoying. To turn it off, you need to do two things:

  • Turn debugging off by adding registry DWORD key "DontShowUI" equal to 1 in "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting". More info is here: https://docs.microsoft.com/en-us/windows/win32/wer/wer-settings
  • Turn problem reporting off. Start->Search for "Choose how to report problems"->Choose "Never check for solutions (not recommended)".

ISPC lit tests

In addition to functional tests described in previous sections, ISPC has a set of lit tests in tests/lit-tests based on LLVM test infrastructure. They are mainly used to check compiler code generation or to check compiler diagnostics. To write ISPC lit tests, follow the same rules as for LLVM FileCheck. ISPC lit tests can be run using make check-all on Linux or Tests/check-all target in Visual Studio on Windows. There is a set of features that you can use in lit tests if you need to set up execution rules:

  • WINDOWS_HOST, LINUX_HOST, MACOS_HOST - will run only on particular host
  • X86_ENABLED, ARM_ENABLED, WASM_ENABLED, XE_ENABLED - will execute test if ISPC was built with support of the particular target
  • LLVM_12_0+, LLVM_13_0+, LLVM_14_0+ - will execute test if ISPC was built with particular version of LLVM.
You can find the full list of supported features in tests/lit-tests/lit.cfg

Example of ISPC lit test:

// The test checks that cpu definitions (including all synonyms) are successfully consumed by compiler.

// RUN: %{ispc} %s -o %t.o --nostdlib --target=sse2-i32x4 --cpu=znver3
// RUN: %{ispc} %s -o %t.o --nostdlib --target=sse2-i32x4 --cpu=alderlake
// RUN: %{ispc} %s -o %t.o --nostdlib --target=sse2-i32x4 --cpu=adl
// RUN: %{ispc} %s -o %t.o --nostdlib --target=sse2-i32x4 --cpu=sapphirerapids
// RUN: %{ispc} %s -o %t.o --nostdlib --target=sse2-i32x4 --cpu=spr

// REQUIRES: X86_ENABLED
// REQUIRES: LLVM_12_0+

uniform int i;

void foo() {}

All compiler changes should be covered by lit tests.
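
To show how the RUN and REQUIRES lines drive a lit test, the sketch below extracts them from a test file and decides whether the test would run for a given set of features. It is a simplified model, not lit itself: real lit also performs substitutions such as %s and %t and accepts boolean expressions in REQUIRES. The helper names parse_directives and would_run are hypothetical.

```python
import re

DIRECTIVE = re.compile(r"^//\s*(RUN|REQUIRES):\s*(.*)$")


def parse_directives(source):
    """Collect RUN command lines and REQUIRES feature names from a test file."""
    runs, requires = [], []
    for line in source.splitlines():
        match = DIRECTIVE.match(line.strip())
        if match is None:
            continue
        kind, value = match.groups()
        if kind == "RUN":
            runs.append(value.strip())
        else:
            # Several features may be listed on one line, separated by commas.
            requires += [name.strip() for name in value.split(",") if name.strip()]
    return runs, requires


def would_run(requires, available):
    """In this model a test runs only if every required feature is available."""
    return all(name in available for name in requires)


TEST = """\
// RUN: %{ispc} %s -o %t.o --nostdlib --target=sse2-i32x4 --cpu=znver3
// REQUIRES: X86_ENABLED
// REQUIRES: LLVM_12_0+
void foo() {}
"""

runs, requires = parse_directives(TEST)
print(requires)  # ['X86_ENABLED', 'LLVM_12_0+']
print(would_run(requires, {"X86_ENABLED", "LLVM_12_0+", "LINUX_HOST"}))  # True
print(would_run(requires, {"ARM_ENABLED"}))  # False
```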

If you want to validate performance (applied to CPU only)

If you want to validate how your changes influence ISPC performance, use perf.py or alloy.py -r --only=performance. Note that you must set LLVM_HOME and ISPC_HOME for alloy.py.

If you want to measure performance of your changes use perf.py. Perf.py will build and run all tests listed in perf.ini from “examples/cpu” directory and report performance numbers. If you want to compare performance of two ISPC compilers you should use perf.py --ref=reference_compiler. This will generate a comparison report between two compiler versions.

If you want to compare two branches of ISPC (for example, the branch with your changes and master), use alloy.py -r --only=performance. This will build the newest LLVM if needed (note that LLVM will be built silently; if you want selfbuild or source from tar, use alloy.py -b first), build your ISPC compiler, switch to the "master" branch, build the reference compiler, and then execute perf.py. Logs will be in the alloy_results[date] directory. The option --compare-with=name_of_checkout_or_branch changes the reference branch.

If the results of the runs look suspicious, you can increase the number of runs using -nX (the switch is available in both alloy.py and perf.py).
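
As an illustration of why more runs help: a single timing can be distorted by other activity on the machine, so comparisons are usually based on a statistic over several runs. The sketch below uses the best (minimum) time of each set. This choice is an assumption made for illustration; the statistic perf.py actually reports is not described here, speedup is a hypothetical helper, and the timings are invented numbers.

```python
def speedup(reference_times, test_times):
    """Ratio of best reference time to best test time.

    A value above 1.0 means the compiler under test produced faster code.
    """
    return min(reference_times) / min(test_times)


# Hypothetical timings in seconds for one example, three runs each.
reference = [1.30, 1.21, 1.25]
candidate = [1.12, 1.40, 1.10]  # the 1.40 outlier does not affect the minimum

print(f"speedup: {speedup(reference, candidate):.2f}x")
```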

Main usage of perf.py to measure performance

  • perf.py will run each test from perf.ini three times and print the results
  • perf.py -n10 will run each test ten times. Use this if the results vary widely between runs.
  • perf.py -o to_excel.txt will write output file in machine-readable format.
  • perf.py --ref=ispc_ref will compare current ISPC with other ISPC called ispc_ref.
  • To skip tests, comment them out in perf.ini

Main usage of alloy.py to compare performance

  • alloy.py -r --only=performance will build test and ref compilers and run perf.py
  • alloy.py -r --only=performance -n10 will run each test ten times
  • alloy.py -r --only=performance --notify=mail@mail.com will send results to your e-mail
  • alloy.py -r --only=performance --compare-with="old_version" will compare to ISPC from the "old_version" branch