



Kokkos: Core Libraries

Kokkos Core implements a programming model in C++ for writing performance portable applications targeting all major HPC platforms. For that purpose it provides abstractions for both parallel execution of code and data management. Kokkos is designed to target complex node architectures with N-level memory hierarchies and multiple types of execution resources. It currently can use CUDA, HPX, OpenMP and Pthreads as backend programming models with several other backends in development.

Kokkos Core is part of the Kokkos C++ Performance Portability Programming EcoSystem, which also provides math kernels as well as profiling and debugging tools.

Learning about Kokkos

A programming guide can be found on the Wiki; the API reference is under development.

For questions, find us on Slack or open a GitHub issue.

For non-public questions send an email to crtrott(at)

A separate repository with extensive tutorial material can be found under

Furthermore, the 'example/tutorial' directory provides step-by-step tutorial examples which explain many of the features of Kokkos. They work with simple Makefiles. To build with g++ and OpenMP, simply type 'make' in the 'example/tutorial' directory. This will build all examples in the subfolders. To change the build options, refer to the compilation section of the Programming Guide.

To learn more about Kokkos consider watching one of our presentations:

Contributing to Kokkos

We welcome and encourage contributions from external developers. To contribute, please first open an issue describing the contribution and then issue a pull request against the develop branch. For larger features it may be good to get guidance from the core development team first through the GitHub issue.

Note that Kokkos Core is licensed under standard 3-clause BSD terms of use. This means that contributing to Kokkos allows anyone else to use your contributions, not just for public purposes but also for closed source commercial projects. For specifics see the LICENSE file contained in the repository or distribution.


Primary tested compilers on X86 are:

  • GCC 4.8.4
  • GCC 4.9.3
  • GCC 5.1.0
  • GCC 5.4.0
  • GCC 5.5.0
  • GCC 6.1.0
  • GCC 7.2.0
  • GCC 7.3.0
  • GCC 8.1.0
  • Intel 15.0.2
  • Intel 16.0.1
  • Intel 17.0.1
  • Intel 17.4.196
  • Intel 18.2.128
  • Clang 3.6.1
  • Clang 3.7.1
  • Clang 3.8.1
  • Clang 3.9.0
  • Clang 4.0.0
  • Clang 6.0.0 for CUDA (CUDA Toolkit 9.0)
  • Clang 7.0.0 for CUDA (CUDA Toolkit 9.1)
  • Clang 8.0.0 for CUDA (CUDA Toolkit 9.2)
  • PGI 18.7
  • NVCC 9.1 for CUDA (with gcc 6.1.0)
  • NVCC 9.2 for CUDA (with gcc 7.2.0)
  • NVCC 10.0 for CUDA (with gcc 7.4.0)
  • NVCC 10.1 for CUDA (with gcc 7.4.0)

Primary tested compilers on Power 8 are:

  • GCC 6.4.0 (OpenMP,Serial)
  • GCC 7.2.0 (OpenMP,Serial)
  • IBM XL 16.1.0 (OpenMP, Serial)
  • NVCC 9.2.88 for CUDA (with gcc 7.2.0 and XL 16.1.0)

Primary tested compilers on Intel KNL are:

  • Intel 16.4.258 (with gcc 4.7.2)
  • Intel 17.2.174 (with gcc 4.9.3)
  • Intel 18.2.199 (with gcc 4.9.3)

Primary tested compilers on ARM (Cavium ThunderX2) are:

  • GCC 7.2.0
  • ARM/Clang 18.4.0

Other compilers working:

  • X86:
    • Cygwin 2.1.0 64bit with gcc 4.9.3
    • GCC 8.1.0 (not warning free)

Known non-working combinations:

  • Power8:
    • Pthreads backend
  • ARM
    • Pthreads backend

Primary tested compilers pass in release mode with warnings treated as errors. They are also tested with a comprehensive set of backend combinations (i.e. OpenMP, Pthreads, Serial, OpenMP+Serial, ...). We are using the following set of flags:

  • GCC:

       -Wall -Wunused-parameter -Wshadow -pedantic
       -Werror -Wsign-compare -Wtype-limits
       -Wignored-qualifiers -Wempty-body
       -Wclobbered -Wuninitialized
  • Intel:

      -Wall -Wunused-parameter -Wshadow -pedantic
      -Werror -Wsign-compare -Wtype-limits
  • Clang:

      -Wall -Wunused-parameter -Wshadow -pedantic
      -Werror -Wsign-compare -Wtype-limits
  • NVCC:

      -Wall -Wunused-parameter -Wshadow -pedantic
      -Werror -Wsign-compare -Wtype-limits

Other compilers are tested occasionally, in particular when pushing from the develop branch to the master branch. These are tested less rigorously, without -Werror and only for a select set of backends.
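For contributors, the warning flags listed above mostly reward a few coding habits. The sketch below is a hypothetical helper, not part of Kokkos (the names `sum_positive` and `Tag` are made up for illustration). It is written so that it should stay clean under `-Wall -Wunused-parameter -Wshadow -Wsign-compare -Werror`: the loop index has the same unsigned type as `size()`, no local name shadows another, and the unused parameter is left unnamed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical tag type, used only to show an intentionally unused parameter.
struct Tag {};

// Sum the strictly positive entries of `values`.
// - The Tag parameter is unnamed, so -Wunused-parameter has nothing to report.
// - `i` is std::size_t, matching values.size(), so -Wsign-compare stays quiet.
// - No inner variable reuses an outer name, so -Wshadow stays quiet.
double sum_positive(const std::vector<double>& values, Tag /*unused*/) {
  double total = 0.0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    const double v = values[i];
    if (v > 0.0) {
      total += v;
    }
  }
  return total;
}
```

Compiling a file like this with the GCC flag set above is a quick local check before opening a pull request; the same idea applies to the other compilers, with their respective flag sets.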

Building and Installing Kokkos

Kokkos provides a CMake build system and a raw Makefile build system. The CMake build system is strongly encouraged and will be the most rigorously supported in future releases. Full details are given in the build instructions. Basic setups are shown here:


CMake

The best way to install Kokkos is using the CMake build system. Assuming Kokkos lives in $srcdir:

cmake $srcdir \
  -DCMAKE_CXX_COMPILER=$path_to_compiler \
  -DCMAKE_INSTALL_PREFIX=$path_to_install \
  -DKokkos_ARCH_HSW=On \
  -DKokkos_ENABLE_HWLOC=On

then simply type make install. The Kokkos CMake package will then be installed in $path_to_install to be used by downstream packages.

To validate the Kokkos build, configure with


and run make test after completing the build.

For your CMake project using Kokkos, code such as the following:

target_link_libraries(myTarget Kokkos::kokkos)

should be added to your CMakeLists.txt. Your configure should additionally include




for the install location given above.


Spack

An alternative to manually building with CMake is to use the Spack package manager. To get started, download the Spack repo.

A basic installation would be done as:
> spack install kokkos

Spack allows options and compilers to be tuned in the install command.

> spack install kokkos@3.0 %gcc@7.3.0 +openmp

This example illustrates the three most common parameters to Spack:

  • Variants: specified with, e.g., +openmp, a variant activates (or deactivates with, e.g., ~openmp) a certain option.
  • Version: immediately following kokkos, the @version can specify a particular Kokkos version to build.
  • Compiler: a default compiler will be chosen if not specified, but an exact compiler version can be given with the % option.
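To make the three parts of the spec concrete, here is a small sketch that splits a spec string such as the one above into package, version, compiler, and variants. It is a simplified illustration of the syntax only, not Spack's actual parser: `Spec` and `parse_spec` are hypothetical names, and the sketch assumes the parts are separated by whitespace, as in the example command.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Pieces of a spec such as "kokkos@3.0 %gcc@7.3.0 +openmp".
struct Spec {
  std::string package;                // e.g. "kokkos"
  std::string version;                // text after '@' on the package, may be empty
  std::string compiler;               // text after '%', e.g. "gcc@7.3.0", may be empty
  std::vector<std::string> enabled;   // variants given as +name
  std::vector<std::string> disabled;  // variants given as ~name
};

// Split a whitespace-separated spec into its parts (illustration only).
Spec parse_spec(const std::string& text) {
  Spec spec;
  std::istringstream in(text);
  std::string token;
  while (in >> token) {  // operator>> never yields an empty token
    if (token[0] == '%') {
      spec.compiler = token.substr(1);
    } else if (token[0] == '+') {
      spec.enabled.push_back(token.substr(1));
    } else if (token[0] == '~') {
      spec.disabled.push_back(token.substr(1));
    } else {
      // Package name, optionally followed by @version.
      const std::string::size_type at = token.find('@');
      spec.package = token.substr(0, at);
      if (at != std::string::npos) {
        spec.version = token.substr(at + 1);
      }
    }
  }
  return spec;
}
```

For the example command, this yields the package `kokkos`, the version `3.0`, the compiler `gcc@7.3.0`, and one enabled variant, `openmp`.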

For a complete list of Kokkos options, run:

> spack info kokkos

Spack currently installs packages to a location determined by a unique hash. This hash is not human readable, and normal Spack usage should not require you to reference the computer-generated install folder directly. More details are given in the build instructions. If you must know, you can locate Spack Kokkos installations with:

> spack find -p kokkos ...

where ... is the unique spec identifying the particular Kokkos configuration and version. More details can be found in the Kokkos Spack documentation or on the Spack website.

Raw Makefile

A bash script is provided to generate raw Makefiles. To install Kokkos as a library, create a build directory and run the following:

> $KOKKOS_PATH/generate_makefile.bash --prefix=$path_to_install

Once the Makefile is generated, run:

> make kokkoslib
> make install

To additionally run the unit tests:

> make build-test
> make test

Run generate_makefile.bash --help for more detailed options such as changing the device type for which to build.

Inline Builds vs. Installed Package

For individual projects, it may be preferable to build Kokkos inline rather than link to an installed package. The main reason is that you may otherwise need many different configurations of Kokkos installed depending on the required compile time features an application needs. For example there is only one default execution space, which means you need different installations to have OpenMP or Pthreads as the default space. Also for the CUDA backend there are certain choices, such as allowing relocatable device code, which must be made at installation time. Building Kokkos inline uses largely the same process as compiling an application against an installed Kokkos library.

For CMake, this means copying over the Kokkos source code into your project and adding add_subdirectory(kokkos) to your CMakeLists.txt.

For raw Makefiles, see the example benchmarks/bytes_and_flops/Makefile, which can be used with either an installed library or an inline build.

Kokkos and CUDA UVM

Kokkos supports UVM through a dedicated memory space called CudaUVMSpace. Allocations made with that space are accessible from both host and device. You can also tell Kokkos to use it as the default space for Cuda allocations. In either case, UVM comes with a number of restrictions:

  • You can't access allocations on the host while a kernel is potentially running. This will lead to segfaults. To avoid that you either need to call Kokkos::Cuda::fence() (or just Kokkos::fence()), after kernels, or you can set the environment variable CUDA_LAUNCH_BLOCKING=1.
  • On multi-socket, multi-GPU machines without NVLINK, UVM defaults to using zero copy allocations for technical reasons related to using multiple GPUs from the same process. If an executable doesn't do that (e.g. each MPI rank of an application uses a single GPU [can be the same GPU for multiple MPI ranks]), you can set CUDA_MANAGED_FORCE_DEVICE_ALLOC=1. This will enforce proper UVM allocations, but can lead to errors if more than a single GPU is used by a single process.

Citing Kokkos

If you publish work which mentions Kokkos, please cite the following paper:

  title = "Kokkos: Enabling manycore performance portability through polymorphic memory access patterns",
  journal = "Journal of Parallel and Distributed Computing",
  volume = "74",
  number = "12",
  pages = "3202 - 3216",
  year = "2014",
  note = "Domain-Specific Languages and High-Level Frameworks for High-Performance Computing",
  issn = "0743-7315",
  doi = "",
  url = "",
  author = "H. Carter Edwards and Christian R. Trott and Daniel Sunderland"


Under the terms of Contract DE-NA0003525 with NTESS, the U.S. Government retains certain rights in this software.
