Kokkos implements a programming model in C++ for writing performance portable
applications targeting all major HPC platforms. For that purpose it provides
abstractions for both parallel execution of code and data management.
Kokkos is designed to target complex node architectures with N-level memory
hierarchies and multiple types of execution resources. It currently can use
OpenMP, Pthreads and CUDA as backend programming models.
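
As a rough illustration of the two abstractions mentioned above, the sketch
below allocates a Kokkos::View (data management) and then fills and reduces it
with Kokkos::parallel_for and Kokkos::parallel_reduce (parallel execution).
It is a minimal sketch, not code taken from the Kokkos sources: the function
name sum_of_squares and the macro SKETCH_HAVE_KOKKOS are made up for this
example, and the serial fallback exists only so the snippet compiles when the
Kokkos headers are not available. When Kokkos is used, the caller is assumed
to have called Kokkos::initialize() beforehand and to call Kokkos::finalize()
afterwards.

```cpp
#include <cstddef>
#include <vector>

// Use Kokkos only when its header is visible; otherwise fall back to a
// serial loop so the snippet still compiles on its own.
#if defined(__has_include)
#  if __has_include(<Kokkos_Core.hpp>)
#    include <Kokkos_Core.hpp>
#    define SKETCH_HAVE_KOKKOS 1  // hypothetical macro, local to this sketch
#  endif
#endif

// Returns 0^2 + 1^2 + ... + (n-1)^2 for n >= 0.
// Assumption: with Kokkos, Kokkos::initialize() has already been called.
double sum_of_squares(int n) {
  double result = 0.0;
#if defined(SKETCH_HAVE_KOKKOS)
  // Data management: a 1-D array in the default memory space.
  Kokkos::View<double*> x("x", n);
  // Parallel execution: the loop body runs on the default execution space.
  Kokkos::parallel_for(n, KOKKOS_LAMBDA(const int i) {
    x(i) = static_cast<double>(i);
  });
  // Reduction into a host scalar; each thread adds into its own partial sum.
  Kokkos::parallel_reduce(n, KOKKOS_LAMBDA(const int i, double& partial) {
    partial += x(i) * x(i);
  }, result);
#else
  // Serial stand-in with the same result, used only when Kokkos is absent.
  std::vector<double> x(static_cast<std::size_t>(n));
  for (std::size_t i = 0; i < x.size(); ++i) x[i] = static_cast<double>(i);
  for (std::size_t i = 0; i < x.size(); ++i) result += x[i] * x[i];
#endif
  return result;
}
```

The same source is meant to run on whichever backend Kokkos was built for
(OpenMP, Pthreads or CUDA); the loop bodies do not name a backend.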

Kokkos is licensed under standard 3-clause BSD terms of use. For specifics
see the LICENSE file contained in the repository or distribution.

The core developers of Kokkos are Carter Edwards and Christian Trott
at the Computer Science Research Institute of the Sandia National
Laboratories.

The KokkosP interface and associated tools are developed by the Application
Performance Team and Kokkos core developers at Sandia National Laboratories.

To learn more about Kokkos consider watching one of our presentations:
GTC 2015:

A programming guide can be found under doc/Kokkos_PG.pdf. This is an initial version
and feedback is greatly appreciated.

A separate repository with extensive tutorial material can be found under 

If you have a patch to contribute please feel free to issue a pull request against
the develop branch. For major contributions it is better to contact us first
for guidance.

For questions please send an email to

For non-public questions send an email to
hcedwar(at)sandia.gov and crtrott(at)sandia.gov


Primary tested compilers on X86 are:
  GCC 4.7.2
  GCC 4.8.4
  GCC 4.9.2
  GCC 5.1.0
  Intel 14.0.4
  Intel 15.0.2
  Intel 16.0.1
  Intel 17.0.098
  Clang 3.5.2
  Clang 3.6.1
  Clang 3.9.0

Primary tested compilers on Power 8 are:
  GCC 5.4.0 (OpenMP, Serial)
  IBM XL 13.1.3 (OpenMP, Serial) (a workaround is in place to avoid a compiler bug)

Primary tested compilers on Intel KNL are:
  Intel 16.2.181 (with gcc 4.7.2)
  Intel 17.0.098 (with gcc 4.7.2)

Secondary tested compilers are:
  CUDA 7.0 (with gcc 4.7.2)
  CUDA 7.5 (with gcc 4.7.2)
  CUDA 8.0 (with gcc 5.3.0 on X86 and gcc 5.4.0 on Power8)
  CUDA/Clang 8.0 using Clang/Trunk compiler

Other compilers working:
  PGI 15.4
  Cygwin 2.1.0 64bit with gcc 4.9.3

Known non-working combinations:
  Pthreads backend

Primary tested compilers pass in release mode with warnings treated as errors.
They are also tested with a comprehensive set of backend combinations
(e.g. OpenMP, Pthreads, Serial, OpenMP+Serial, ...).
We use the following sets of flags:
GCC:   -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits
       -Wignored-qualifiers -Wempty-body -Wclobbered -Wuninitialized
Intel: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized
Clang: -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -Wuninitialized

Secondary compilers pass without -Werror.
Other compilers are tested occasionally, in particular when pushing from the
develop branch to the master branch, without -Werror and only for a select
set of backends.
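
For contributors, the short sketch below shows loop patterns that stay clean
under several of the warning flags listed above (-Wshadow, -Wsign-compare,
-Wtype-limits) when combined with -Werror. The function names sum_prefix and
reversed_copy are hypothetical and only serve this illustration; they are not
part of Kokkos.

```cpp
#include <cstddef>
#include <vector>

// Sums the first `count` entries of `values` (clamped to values.size()).
double sum_prefix(const std::vector<double>& values, std::size_t count) {
  const std::size_t n = count < values.size() ? count : values.size();
  double total = 0.0;
  // The index has the same unsigned type as values.size(), so the loop
  // condition does not mix signed and unsigned operands (-Wsign-compare).
  for (std::size_t i = 0; i < n; ++i) {
    // No local reuses the name of a parameter or outer variable (-Wshadow).
    total += values[i];
  }
  return total;
}

// Returns a copy of `values` in reverse order.
std::vector<double> reversed_copy(const std::vector<double>& values) {
  std::vector<double> out;
  out.reserve(values.size());
  // Testing `i >= 0` on an unsigned index is always true and is flagged by
  // -Wtype-limits; test-then-decrement avoids that comparison.
  for (std::size_t i = values.size(); i-- > 0;) {
    out.push_back(values[i]);
  }
  return out;
}
```

One way to check a file like this, assuming it is saved as example.cpp (a
hypothetical name), is to compile it with
g++ -std=c++11 -Wall -Wshadow -pedantic -Werror -Wsign-compare -Wtype-limits -c example.cpp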

====Getting started=========================================================

In the 'example/tutorial' directory you will find step-by-step tutorial
examples which explain many of the features of Kokkos. They work with
simple Makefiles. To build with g++ and OpenMP simply type 'make'
in the 'example/tutorial' directory. This will build all examples in the
subfolders. To change the build options refer to the compilation section
of the Programming Guide.

====Running Unit Tests======================================================

To run the unit tests create a build directory and run the following commands:

KOKKOS_PATH/generate_makefile.bash
make build-test
make test

Run KOKKOS_PATH/generate_makefile.bash --help for more detailed options such as
changing the device type for which to build.

====Install the library=====================================================

To install Kokkos as a library create a build directory and run the following
commands:

KOKKOS_PATH/generate_makefile.bash --prefix=INSTALL_PATH
make lib
make install

Run KOKKOS_PATH/generate_makefile.bash --help for more detailed options such as
changing the device type for which to build.


The CMake files contained in this repository require Tribits and are used
for integration with Trilinos. They do not currently support a standalone
CMake build.

====Kokkos and CUDA UVM====================================================

Kokkos supports UVM through a dedicated memory space called CudaUVMSpace.
Allocations made in that space are accessible from both host and device.
You can also tell Kokkos to use it as the default space for Cuda allocations.
In either case UVM comes with a number of restrictions:
(i) You can't access allocations on the host while a kernel is potentially
running. Doing so will lead to segfaults. To avoid that, either call
Kokkos::Cuda::fence() (or just Kokkos::fence()) after kernels, or set the
environment variable CUDA_LAUNCH_BLOCKING=1.
(ii) On multi-socket, multi-GPU machines, UVM defaults to zero-copy
allocations for technical reasons related to using multiple GPUs from the
same process. If an executable doesn't use multiple GPUs from one process
(e.g. each MPI rank of an application uses a single GPU, which can be the
same GPU for multiple MPI ranks), you can set CUDA_MANAGED_FORCE_DEVICE_ALLOC=1.
This will enforce proper UVM allocations, but can lead to errors if
more than a single GPU is used by a single process.
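
The fence requirement described above can be sketched as follows. This is a
minimal illustration under stated assumptions, not code from the Kokkos
sources: the function name last_entry_after_fill is hypothetical, the check
on KOKKOS_ENABLE_CUDA assumes that macro is defined by a CUDA-enabled Kokkos
build, and the plain host branch is only there so the snippet compiles
without Kokkos or CUDA. With Kokkos, the caller is assumed to have called
Kokkos::initialize() beforehand.

```cpp
#include <cstddef>
#include <vector>

#if defined(__has_include)
#  if __has_include(<Kokkos_Core.hpp>)
#    include <Kokkos_Core.hpp>
#  endif
#endif

// Fills a[i] = 2*i for i in [0, n) and returns a[n-1], read on the host.
// Requires n >= 1.
double last_entry_after_fill(int n) {
#if defined(KOKKOS_ENABLE_CUDA)  // assumption: set by a CUDA-enabled build
  // Allocation in UVM: the same View is usable from device and host code.
  Kokkos::View<double*, Kokkos::CudaUVMSpace> a("a", n);
  Kokkos::parallel_for(Kokkos::RangePolicy<Kokkos::Cuda>(0, n),
                       KOKKOS_LAMBDA(const int i) { a(i) = 2.0 * i; });
  // The kernel may still be running here. Wait for it before touching the
  // UVM allocation on the host; skipping this can lead to a segfault.
  Kokkos::fence();
  return a(n - 1);
#else
  // Host-only stand-in with the same result, used without Kokkos/CUDA.
  std::vector<double> a(static_cast<std::size_t>(n));
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = 2.0 * static_cast<double>(i);
  return a.back();
#endif
}
```

Setting CUDA_LAUNCH_BLOCKING=1 in the environment is the alternative named
above; the explicit fence keeps the synchronization visible in the source.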


Contributions to Kokkos are welcome. To contribute, please open an issue
where the feature request or bug can be discussed. Then issue a pull request
with your contribution. Pull requests must be issued against the develop branch.