Vector Particle-In-Cell (VPIC) Project
VPIC is a general-purpose particle-in-cell simulation code for modeling kinetic plasmas in one, two, or three spatial dimensions. It employs a second-order, explicit, leapfrog algorithm to update charged particle positions and velocities in order to solve the relativistic kinetic equation for each species in the plasma, along with a full Maxwell description for the electric and magnetic fields evolved via a second-order finite-difference-time-domain (FDTD) solve.
The VPIC code has been optimized for modern computing architectures and uses Message Passing Interface (MPI) calls for multi-node application as well as data parallelism using threads. VPIC employs a variety of short-vector, single-instruction-multiple-data (SIMD) intrinsics for high performance and has been designed so that the data structures align with cache boundaries.
The current feature set for VPIC includes a flexible input deck format capable of treating a wide variety of problems. These include: the ability to treat electromagnetic materials (scalar and tensor dielectric, conductivity, and diamagnetic material properties); multiple emission models, including user-configurable models; arbitrary, user-configurable boundary conditions for particles and fields; user-definable simulation units; a suite of "standard" diagnostics, as well as user-configurable diagnostics; a Monte-Carlo treatment of collisional processes capable of treating binary and unary collisions and secondary particle generation; and flexible checkpoint-restart semantics enabling VPIC checkpoint files to be read as input for subsequent simulations. VPIC has a native I/O format that interfaces with the high-performance visualization software EnSight and ParaView. While the common use cases for VPIC employ low-order particles on rectilinear meshes, a framework exists to treat higher-order particles and curvilinear meshes, as well as more advanced field solvers.
Researchers who use the VPIC code for scientific research are asked to cite the papers by Kevin Bowers listed below.
Bowers, K. J., B. J. Albright, B. Bergen, L. Yin, K. J. Barker and D. J. Kerbyson, "0.374 Pflop/s Trillion-Particle Kinetic Modeling of Laser Plasma Interaction on Roadrunner," Proc. 2008 ACM/IEEE Conf. Supercomputing (Gordon Bell Prize Finalist Paper). http://dl.acm.org/citation.cfm?id=1413435
K.J. Bowers, B.J. Albright, B. Bergen and T.J.T. Kwan, Ultrahigh performance three-dimensional electromagnetic relativistic kinetic plasma simulation, Phys. Plasmas 15, 055703 (2008); http://dx.doi.org/10.1063/1.2840133
K.J. Bowers, B.J. Albright, L. Yin, W. Daughton, V. Roytershteyn, B. Bergen and T.J.T. Kwan, Advances in petascale kinetic simulations with VPIC and Roadrunner, Journal of Physics: Conference Series 180, 012055 (2009)
Getting the Code
To check out the VPIC source, do the following:
git clone https://github.com/lanl/vpic.git
The stable release of VPIC exists on master, the default branch.
For more cutting-edge features, consider using the devel branch.
User contributions should target the devel branch.
The primary requirement to build VPIC is a C++11 capable compiler and an up-to-date version of MPI.
VPIC uses the CMake build system. To configure a build, do the following from the top-level source directory:
mkdir build
cd build
The ./arch directory also contains various cmake scripts (including specific build options) which can help with building, but the user is left to select which compiler they wish to use. The scripts are largely organized into folders by compiler, with specific flags and options set to match the target compiler.
Any of the arch scripts can be invoked by specifying the file name from inside a build directory:
After configuration, simply type:
make
Three scripts in the ./arch directory are of particular note: lanl-ats1-hsw, lanl-ats1-knl and lanl-cts1. These scripts
provide a default way to build VPIC on LANL ATS-1 clusters such as Trinity and Trinitite and LANL CTS-1 clusters. The LANL
ATS-1 clusters are the first generation of DOE Advanced Technology Systems and consist of a partition of dual socket Intel
Haswell nodes and a partition of single socket Intel Knights Landing nodes. The LANL CTS-1 clusters are the first generation
of DOE Commodity Technology Systems and consist of dual socket Intel Broadwell nodes running the TOSS 3.3 operating system.
The lanl-ats1-hsw, lanl-ats1-knl and lanl-cts1 scripts are heavily documented and can be configured to provide a large
variety of custom builds for their respective platform types. These scripts could also serve as a good starting point for
development of a build script for other platform types. Because these scripts also configure the user's build environment
via the use of module commands, the scripts run both the cmake and make commands.
From the user-created build directory, these scripts can be invoked as follows:
Advanced users may choose to instead invoke cmake directly and hand-select options. Documentation on valid ways
to select these options may be found in the lanl-ats1 and lanl-cts1 build scripts mentioned above.
GCC users should ensure the
-fno-strict-aliasing compiler flag is set (as shown in
Building an example input deck
After you have successfully built VPIC, you should have an executable in the bin directory called vpic (./bin/vpic). To build an executable from one of the sample input decks (found in ./sample), simply run:
./bin/vpic input_deck
where input_deck is the name of your sample deck. For example, to build the harris input deck in the sample subdirectory (assuming that your build directory is located in the top-level source directory):
Beginners are advised to read the harris deck thoroughly, as it provides many examples of common use cases.
Command Line Arguments
Note: Historic VPIC users should note that the format of command line arguments was changed in the first open source release. The equals symbol is no longer accepted, and two dashes are mandatory.
In general, command line arguments take the form --command value, in which two dashes are followed by a keyword, with a space delimiting the command and the value.
The following specific syntax is available to the users:
Threading (per MPI rank) can be enabled using the following syntax:
./binary.Linux --tpp n
where n specifies the number of threads per MPI rank. For example:
mpirun -n 2 ./binary.Linux --tpp 2
runs VPIC with two threads per MPI rank.
VPIC can restart from a checkpoint dump file, using the following syntax:
./binary.Linux --restore <path to file>
For example:
./binary.Linux --restore ./restart/restart0
restarts VPIC using the restart file ./restart/restart0.
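The argument format described above (two dashes, a keyword, then a space-separated value, with no equals sign) can be checked before launching a job. The sketch below is a hypothetical pre-flight helper, not part of VPIC; the function name check_vpic_args is an assumption, and it encodes only the rules stated in this section, so it assumes every argument takes a value.

```shell
# Hypothetical helper (not shipped with VPIC): verify that a list of
# arguments follows the "--keyword value" form described in this README.
check_vpic_args() {
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --*=*)
        # Old-style "--keyword=value" is no longer accepted.
        echo "rejected: '$1' uses an equals sign"
        return 1 ;;
      --?*)
        # A keyword must be followed by its value.
        if [ "$#" -lt 2 ]; then
          echo "rejected: '$1' has no value"
          return 1
        fi
        shift 2 ;;
      *)
        # Anything else (e.g. a single dash) is not a valid keyword.
        echo "rejected: '$1' does not start with two dashes"
        return 1 ;;
    esac
  done
  echo "ok"
}

check_vpic_args --tpp 2
check_vpic_args --restore ./restart/restart0
```

A wrapper script could call this helper on its arguments and only then invoke mpirun with the VPIC binary.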
Compile Time Arguments
Currently, the following options are exposed at compile time for the user's consideration:
Particle Array Resizing
OFF): Enable to disable the use of dynamic particle resizing
SET_MIN_NUM_PARTICLES(default 128 [4kb]): Set the minimum number of particles allowable when dynamically resizing
USE_PTHREADS: Use Pthreads for threading model, (default
USE_OPENMP: Use OpenMP for threading model
The following CMake variables control the vector implementation that VPIC uses for each SIMD width. Currently, 128-bit, 256-bit and 512-bit SIMD widths are supported. By default, each of these CMake variables is disabled, which means that an unvectorized reference implementation of functions will be used.
USE_V4_SSE: Enable 4 wide (128-bit) SSE
USE_V4_AVX: Enable 4 wide (128-bit) AVX
USE_V4_AVX2: Enable 4 wide (128-bit) AVX2
USE_V4_ALTIVEC: Enable 4 wide (128-bit) Altivec
USE_V4_PORTABLE: Enable 4 wide (128-bit) portable implementation
USE_V8_AVX: Enable 8 wide (256-bit) AVX
USE_V8_AVX2: Enable 8 wide (256-bit) AVX2
USE_V8_PORTABLE: Enable 8 wide (256-bit) portable implementation
USE_V16_AVX512: Enable 16 wide (512-bit) AVX512
USE_V16_PORTABLE: Enable 16 wide (512-bit) portable implementation
Several functions in VPIC have vector implementations for each of the three SIMD widths. Some have a vector implementation for only a single width. An example of the latter is move_p, which only has a reference implementation and a V4 implementation.
It is possible to have a single CMake vector variable configured as ON for each of the three supported SIMD vector widths. It is recommended to always have a CMake variable configured as ON for the 128 bit SIMD vector width so that move_p will be vectorized. In addition, it is recommended to configure as ON the CMake variable that is associated with the native SIMD vector width of the processor that VPIC is targeting. If a CMake variable is configured as ON for each of the three available SIMD vector widths, then for a given function in VPIC, the implementation which supports the largest SIMD vector length will be chosen. If a V16 implementation exists, it will be chosen. If a V16 implementation does not exist but V8 and V4 implementations exist, the V8 implementation will be chosen. If V16 and V8 implementations do not exist but a V4 implementation does, it will be chosen. If no SIMD vector implementation exists, the unvectorized reference implementation will be chosen.
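The fallback rule described above can be sketched as a small shell function. This is only an illustration of the selection order (V16, then V8, then V4, then the reference implementation); VPIC makes this choice at build time, and the function name select_impl and the lowercase width labels are assumptions made for the example.

```shell
# Illustrative sketch only: pick the widest vector implementation that is
# both enabled at configure time and available for a given function.
#   $1: widths enabled via CMake, e.g. "v4 v16"
#   $2: widths the function implements, e.g. "v4" for move_p
select_impl() {
  enabled="$1"
  available="$2"
  for w in v16 v8 v4; do
    # Skip widths that were not configured as ON.
    case " $enabled " in
      *" $w "*) ;;
      *) continue ;;
    esac
    # Use this width if the function has an implementation for it.
    case " $available " in
      *" $w "*) echo "$w"; return 0 ;;
    esac
  done
  # No usable vector implementation: fall back to the reference code.
  echo "reference"
}

select_impl "v4 v16" "v4 v8 v16"   # a function with all three widths
select_impl "v4 v16" "v4"          # move_p, which only has a V4 version
```

The second call shows why keeping a 128-bit variable ON is recommended: without it, a function such as move_p would fall back to the reference implementation.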
In summary, when using vector versions on a machine with 256 bit SIMD, the V4 and V8 implementations should be configured as ON. When using a machine with 512 bit SIMD, V4 and V16 implementations should be configured as ON. When choosing a vector implementation for a given SIMD vector length, the implementation that is closest to the SIMD instruction set for the targeted processor should be chosen. The portable versions are most commonly used for debugging the implementation of new intrinsics versions. However, the portable versions are generally more performant than the unvectorized reference implementation. So, one might consider using the V4_PORTABLE version on ARM processors until a V4_NEON implementation becomes available.
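As a rough sketch of the summary above, the helper below prints CMake options for a few common targets. The function name vpic_simd_flags and the target labels are hypothetical, and pairing USE_V4_AVX2 with USE_V16_AVX512 is an assumption about which 128-bit implementation is closest on an AVX512 processor; only the CMake variable names come from the list above.

```shell
# Hypothetical helper: print CMake vector options for a target instruction set.
# The target labels (sse, avx2, avx512) are made up for this example.
vpic_simd_flags() {
  case "$1" in
    sse)
      # 128-bit only: enable the V4 implementation.
      echo "-DUSE_V4_SSE=ON" ;;
    avx2)
      # 256-bit machine: V4 (so move_p is vectorized) plus V8.
      echo "-DUSE_V4_AVX2=ON -DUSE_V8_AVX2=ON" ;;
    avx512)
      # 512-bit machine: V4 plus V16 (V4 choice is an assumption).
      echo "-DUSE_V4_AVX2=ON -DUSE_V16_AVX512=ON" ;;
    *)
      # Unknown target: no options, i.e. the unvectorized reference build.
      echo "" ;;
  esac
}

# Show the options that would be passed to cmake from a build directory.
echo "cmake $(vpic_simd_flags avx2) .."
```

The flags are printed rather than executed so the output can be inspected, or pasted into one of the arch scripts, before configuring.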
VPIC_PRINT_MORE_DIGITS: Enable more digits in timing output of status reports
Particle sorting implementation
The CMake variable below allows building VPIC to use the legacy, thread serial implementation of the particle sort algorithm.
USE_LEGACY_SORT: Use legacy thread serial particle sort (default OFF)
The legacy particle sort implementation is the thread serial particle sort implementation from the legacy v407 version of VPIC. This implementation supports both in-place and out-of-place sorting of the particles. It is very competitive with the thread parallel sort implementation for a small number of threads per MPI rank, i.e. 4 or fewer, especially on KNL, because sorting the particles in-place allows the fraction of particles stored in High Bandwidth Memory (HBM) to remain stored in HBM. Also, the memory footprint of VPIC is reduced by the size of one particle array, which can be significant for particle-dominated problems.
The default particle sort implementation is a thread parallel implementation. Currently, it can only perform out-of-place sorting of the particles. It will be more performant than the legacy implementation when using many threads per MPI rank but uses more memory because of the out-of-place sort.
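One possible rule of thumb based on the trade-off above is sketched below. The function name vpic_sort_flag is hypothetical and the cut-off of 4 threads per MPI rank is taken from the text; since the text only says the legacy sort is very competitive at low thread counts, this is a starting point for benchmarking rather than a recommendation.

```shell
# Hypothetical helper: suggest a USE_LEGACY_SORT setting from the number of
# threads per MPI rank (the value later passed to --tpp).
vpic_sort_flag() {
  tpp="$1"
  if [ "$tpp" -le 4 ]; then
    # Few threads: the serial legacy sort is competitive and, with in-place
    # sorting, avoids the memory of a second particle array.
    echo "-DUSE_LEGACY_SORT=ON"
  else
    # Many threads: the default thread parallel sort is expected to be faster.
    echo "-DUSE_LEGACY_SORT=OFF"
  fi
}

vpic_sort_flag 2
vpic_sort_flag 16
```

For memory-bound, particle-dominated runs, the legacy sort may still be worth testing at higher thread counts because of its smaller footprint.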
Contributors are asked to be aware of the following workflow:
- Pull requests are accepted into devel upon tests passing
- master should reflect the stable state of the code
- Periodic releases will be made from
Feedback, comments, or issues can be raised through GitHub issues.
A mailing list for open collaboration can also be found here
Version release summary:
V1.2 (October 2020)
- Improved Neon intrinsics support
- Added Takizuka-Abe collision operator
- Threaded hydro_p pipelines
- Added unit documentation
V1.1 (March 2019)
- Added V8 and V16 functionality
- Improved documentation and build processes
- Significantly improved testing and correctness capabilities
This software has been approved for open source release and has been assigned LA-CC-15-109.
© (or copyright) 2020. Triad National Security, LLC. All rights reserved. This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S. Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.
VPIC is distributed under a BSD license.