GPU-accelerated molecular-dynamics force engine based on the Embedded Atom
Method (EAM). CREAM takes atomic coordinates, element types, and an EAM
potential, and returns energies and forces. It supports multi-element alloys
(.eam.alloy) and runs on native GPU (Vulkan / Metal / DX12) from a single
Rust codebase.
*These codes were developed using LLMs, mainly Claude.
CREAM is a pure force engine. It does not advance time, visualize trajectories,
integrate the equations of motion, or apply thermostats. Integration and
analysis happen in the host application; CREAM supplies the per-step forces
and energies the integrator needs. The recommended integration point for
atomistic-simulation users is the ASE Calculator interface, included in the
Python package.
| Capability | Status |
|---|---|
Single-element and multi-element .eam.alloy parsing |
Supported |
| Orthorhombic PBC | Supported |
| Triclinic PBC | Supported |
| Non-periodic (cluster) calculations | Supported |
| O(N²) AllPairs GPU backend | Supported |
| O(N) Cell List with CPU-built neighbour list | Supported |
O(N) Cell List fully on GPU (cellist_gpu feature) |
Supported |
| CPU-only engine for reference and GPU-less hosts | Supported |
| Python / ASE bindings | Supported |
CREAM consumes standard LAMMPS setfl (.eam.alloy) potential files. The
canonical source is the NIST Interatomic Potentials Repository:
Download the .eam.alloy file for your element(s) from NIST and pass its
path to CREAM. Common starting points:
| Potential file | Elements | NIST page |
|---|---|---|
Cu01.eam.alloy |
Cu | NIST Cu single-element potentials |
Mishin-Ni-Al-2009.eam.alloy |
Ni, Al | NIST Ni-Al binary alloy potentials |
No potential files are bundled with the public release. Make sure the file
you download matches the element order you pass to the calculator or
engine (CREAM's atom_types indices are 0-based into that order).
pip install eam-cream-pythonCREAM is distributed on PyPI as a source distribution (sdist) built
with maturin sdist. A pip install therefore compiles the Rust code
on your machine the first time; expect two or three minutes for the
initial build while wgpu and its transitive dependencies finish
compiling. Subsequent installs into the same Python environment reuse
the compiled cache.
What you need on your machine:
- A Rust toolchain (stable). The easiest install is rustup; any Rust ≥ 1.75 works.
- A C compiler that Rust's
cccrate can find (cc,clang, or MSVC on Windows). Rustup's default install bundles what you need. - ASE is installed automatically as a runtime dependency.
The build is configured (via pyproject.toml) to compile the GPU
cell-list pipeline by default (--features python,cellist_gpu), so
you get a wheel with the full GPU backend without having to set
anything on the command line.
This is the path you want whenever a GPU is available. It drives the
Rust + wgpu engine, the four-pass compute pipeline, and the Morton-
ordered cell list, all on the GPU.
from ase.build import bulk
from cream import CreamCalculator
atoms = bulk("Cu", "fcc", a=3.615) * (10, 10, 10) # 4 000 atoms
# `backend="gpu"` is the default; the argument is shown here for clarity.
# `use_cell_list=True` selects the O(N) Morton cell-list pipeline.
# Without it, AllPairs (O(N²)) is used — fine for N ≲ 2 000, too slow above.
atoms.calc = CreamCalculator(
"Cu01.eam.alloy",
backend="gpu",
use_cell_list=True,
)
energy = atoms.get_potential_energy() # eV (GPU)
forces = atoms.get_forces() # eV/Å, (N,3) (GPU)
per_atom = atoms.get_potential_energies() # eV, (N,) (GPU)
# Verify which backend is actually active.
print(atoms.calc.backend) # → "gpu"If no GPU adapter is available on the host, constructing a
CreamCalculator with backend="gpu" raises a ValueError with
a message starting GPU device lost:. The fail-fast behaviour is
deliberate — a silent fallback to CPU in the middle of an MD
production run would hide 10–100× performance regressions. Users
running on GPU-less hosts should select the CPU backend explicitly
(see below).
CreamCalculator exposes the parsed potential metadata directly. This
is the recommended way to confirm element order before building your
Atoms object.
calc = CreamCalculator("Mishin-Ni-Al-2009_eam.alloy", use_cell_list=True)
# Low-level engine is always accessible.
eng = calc._engine
print(eng.elements) # → ["Ni", "Al"] — atom_types indexing order
print(eng.n_elements) # → 2
print(eng.cutoff) # → 6.5 (Å, from the file header)
print(eng.backend) # → "gpu"Use the CPU backend on machines without a GPU, in CI runners, or when you need bit-for-bit reproducibility across reruns (the CPU path is deterministic; the GPU path is subject to floating-point accumulation order inside workgroups).
atoms.calc = CreamCalculator(
"Cu01.eam.alloy",
backend="cpu",
use_cell_list=True,
)
# Same call surface — only the compute device changes.
e = atoms.get_potential_energy()The CPU backend is rayon-parallel and uses the same Morton cell list as
the GPU path when use_cell_list=True. It scales well to several
million atoms on a modern workstation CPU; it is several × slower than
the GPU path on the same hardware for the same system.
If you are writing your own integrator, or wrapping CREAM from a
non-ASE workflow, call CreamEngine directly. It takes raw NumPy
arrays and returns energy, forces, and per-atom energies.
import numpy as np
from cream import CreamEngine
# Build the engine once — pipelines and potential tables are cached.
engine = CreamEngine(
"Cu01.eam.alloy",
use_cell_list=True,
backend="gpu", # "gpu" (default) or "cpu"
)
# Inputs:
# positions : (N, 3) float64, Cartesian coordinates in Å
# atom_types : (N,) int32, indices into engine.elements
# cell : (3, 3) float64, lattice vectors as rows (ASE/LAMMPS
# convention), or None for a non-periodic cluster
positions = np.array([[0.0, 0.0, 0.0],
[1.8, 1.8, 0.0]], dtype=np.float64)
atom_types = np.array([0, 0], dtype=np.int32)
cell = np.diag([10.0, 10.0, 10.0])
energy, forces, energy_per_atom = engine.compute(
positions, atom_types, cell,
)
# energy : float, eV
# forces : (N, 3) float64, eV/Å
# energy_per_atom : (N,) float64, eV (sum ≈ energy)The engine releases Python's GIL during compute(), so driving many
engines from threads for ensemble sampling is safe and scales with the
number of GPUs and CPU cores available.
For profiling or bug investigations the engine exposes
compute_with_debug, which returns every intermediate buffer of the
cell-list pipeline:
energy, forces, energy_per_atom, debug = engine.compute_with_debug(
positions, atom_types, cell,
)
# debug is None for backend="cpu" or when use_cell_list is False.
# Otherwise a dict with keys:
# n_atoms, n_cells, n_cells_pad, n_morton
# cell_ids (N,) uint32
# sorted_atoms (N,) uint32
# cell_start (n_morton+1,) uint32
# cell_counts (n_morton,) uint32
# reordered_positions (N, 4) float32
# reordered_types (N,) uint32
# densities (N,) float32
# debug_flags (32,) uint32| System size | Recommended settings |
|---|---|
| N ≲ 2 000 | backend="gpu", use_cell_list=False (AllPairs) |
| 2 000 ≲ N ≲ 10⁴ | backend="gpu", use_cell_list=True |
| 10⁴ ≲ N | backend="gpu", use_cell_list=True (mandatory for speed) |
| any N, no GPU host | backend="cpu", use_cell_list=True |
| debugging / CI | backend="cpu" for determinism |
Multi-element alloys work the same way — supply the corresponding
.eam.alloy file and make sure the element order in the file matches
the order of species in your Atoms object.
use cream::{
engine::ComputeEngine,
potential::{eam::EamPotential, NeighborStrategy},
ortho_cell,
};
use std::path::Path;
let mut engine = pollster::block_on(
ComputeEngine::new(NeighborStrategy::AllPairs)
)?;
let potential = EamPotential::from_file(Path::new("Cu01.eam.alloy"))?;
let positions = vec![[0.0f32, 0.0, 0.0, 0.0], [1.8, 1.8, 0.0, 0.0]];
let atom_types = vec![0u32, 0];
let cell = Some(ortho_cell(10.0, 10.0, 10.0));
let result = engine.compute_sync(&positions, &atom_types, cell, &potential)?;
println!("Energy: {:.6} eV", result.energy);CREAM dispatches four GPU compute passes per force evaluation. With the cell list enabled, three additional short passes build the neighbour structure first:
| Pass | Role |
|---|---|
cell_pass0a_assign |
Assign each atom a Morton (Z-order) cell code |
cell_pass0b_sort |
Count atoms per cell via atomic increment |
| CPU prefix sum | Compute cell_start[] boundaries over Morton space |
cell_pass0c_range |
Scatter atoms into sorted order |
eam_pass1_* |
Electron density ρᵢ = Σⱼ fβ(r) (Kahan-compensated) |
eam_pass2_* |
Embedding derivative, forces, and per-atom pair energies |
eam_pass3_reduce |
Tree-reduction of the energy (falls back to CPU for N ≥ 4096) |
Morton-code ordering improves GPU L2 cache hit rates in passes 1 and 2 because spatially adjacent cells map to adjacent indices in the sorted buffer.
All EAM tables (f_α(r), F_α(ρ), F'_α(ρ), φ_αβ(r), f'_α(r),
φ'_αβ(r)) are packed into a single storage buffer so that every shader
stage stays within WebGPU's eight-binding limit.
Detailed architecture documentation lives in docs/ and renders on
Read the Docs.
CREAM supports both orthorhombic and fully triclinic cells. The cell matrix
H stores lattice vectors as rows (the ASE / LAMMPS convention):
H = [[ax, ay, az], ← a
[bx, by, bz], ← b
[cx, cy, cz]] ← c
The minimum-image convention is applied in fractional coordinates
(s = d · H⁻¹ − round(s), then back to Cartesian). Passing None (Rust) or
cell=None (Python) selects the non-periodic cluster path.
| Tool | Version |
|---|---|
| Rust | stable (see rust-toolchain.toml) |
| Python | ≥ 3.9 |
| maturin | ≥ 1.5, < 2.0 |
| ASE | ≥ 3.22 (installed automatically with the Python package) |
git clone https://github.com/ss0832/cream.git
cd cream
maturin develop --release --features python,cellist_gpu--release is recommended even during development: the CPU engine is ~10×
slower in debug mode and some tests time out without it. The
cellist_gpu feature is baked into the default wheel configuration
(pyproject.toml → [tool.maturin].features); you only need to name it
explicitly when invoking maturin develop by hand.
# Python / ASE tests (no GPU needed)
pytest tests_python/ -v
# Rust unit and integration tests (CPU-only)
cargo test --release
# Include GPU tests (requires a GPU or software rasterizer such as llvmpipe / WARP)
cargo test --release -- --include-ignored# Rust
cargo fmt # format
cargo clippy --features python,cellist_gpu -- -D warnings # lint (zero warnings)
# Python
ruff check python/ tests_python/ --fix
ruff format python/ tests_python/
mypy python/cream/ --ignore-missing-imports --no-site-packagesmaturin build --release # binary wheel for the current platform
maturin sdist # source distribution
twine upload target/wheels/cream_python-*.tar.gzmaturin build picks up the feature list from pyproject.toml
(python,cellist_gpu), so no --features flag is needed at release
time.
The package name on PyPI is eam-cream-python. Install with
pip install eam-cream-python; import as from cream import ....
CREAM ships with a suite of diagnostic binaries under src/bin/. They are
not part of the library's public API but can be run directly to reproduce
published performance numbers or to investigate a suspected regression.
| Binary | What it does |
|---|---|
bench_cpu |
CPU engine throughput across a range of atom counts |
bench_gpu |
GPU vs CPU wall-clock sweep over 7 crystal systems × 23 sizes (up to ~10⁷ atoms) |
bench_nial |
Ni-Al alloy potential benchmark (2-element EAM) |
repro_cpu_cl |
Focused CPU AllPairs vs CPU CellList agreement sweep (minimal regression probe) |
diagnose_cellist |
Per-case comparison of GPU and CPU cell-list paths with shader counter dumps |
diagnose_mismatch |
Regression guard for historical cubic / tetragonal / orthorhombic MISMATCH cases |
diagnose_nial |
4-way comparison (CPU-AP / CPU-CL / GPU-AP / GPU-CL) on Mishin Ni-Al 2009 |
profile_gpu_v2 |
GPU workload metrics and per-phase wall-clock breakdown |
Typical invocation:
cargo run --release --bin bench_cpu
cargo run --release --features cellist_gpu --bin bench_gpuBenchmarks that require a 2-element EAM potential (bench_nial,
diagnose_nial) expect Mishin-Ni-Al-2009_eam.alloy at the project root —
download it from NIST first (see Getting an EAM potential above).
bench_gpuat N ≳ 10⁷ can exhaust host RAM because the harness retains force and reference arrays in double precision for multiple backends simultaneously. This is a benchmark-harness limitation, not an engine limit; the engine itself has been exercised up to N ≈ 1.6 × 10⁷ atoms.- The fully GPU-resident cell-list path (
--features cellist_gpu) subdivides its atom-count dispatch atN = 4 194 240(the WebGPU per-dimension workgroup limit × 64). At that boundary there is a small non-monotone step in wall-clock time; seedocs/known_limits.md.
The v0.1.0 release freezes the current public API and data structures. Work planned for subsequent releases:
- Browser / WebAssembly entry point — a WebGPU build is implemented in the source tree but has not yet been validated end-to-end on the current shader set, so it is not advertised as a supported path for v0.1.0.
- Command-line front-end (
cream-cli) — present in the source tree but similarly not yet validated against the current pipeline; it will be re-documented once that work completes.
cream/
├── src/
│ ├── lib.rs # Crate root, re-exports
│ ├── error.rs # CreamError (thiserror)
│ ├── engine.rs # ComputeEngine — GPU 4-pass dispatch
│ ├── cpu_engine.rs # CpuEngine — rayon-parallel CPU fallback
│ ├── cell_list.rs # Morton-ordered cell list (shared CPU/GPU helpers)
│ ├── neighbor_list.rs # CSR neighbour list used by the default CellList path
│ ├── reference.rs # O(N²) f64 CPU reference with PBC helpers
│ ├── python.rs # PyO3 bindings (feature = "python")
│ ├── wasm.rs # WASM bindings (experimental, not advertised)
│ ├── potential/
│ │ ├── mod.rs # GpuPotential trait, NeighborStrategy
│ │ └── eam.rs # EamPotential + .eam.alloy parser
│ ├── shaders/ # WGSL shaders (common + pass0/1/2/3 variants)
│ └── bin/ # Benchmarks and diagnostic tools
├── cream-cli/ # CLI frontend (experimental, not advertised)
├── python/cream/ # ASE Calculator wrapper
├── examples/ # Small example programs
├── tests/ # Rust integration tests
├── tests_python/ # Python / PyO3 tests
├── docs/ # Sphinx + MyST documentation source
├── pyproject.toml # maturin build config
├── Cargo.toml
└── rust-toolchain.toml
-
A. H. Larsen, J. J. Mortensen, J. Blomqvist, I. E. Castelli, R. Christensen, M. Dułak, J. Friis, M. N. Groves, B. Hammer, C. Hargus, E. D. Hermes, P. C. Jennings, P. B. Jensen, J. Kermode, J. R. Kitchin, E. L. Kolsbjerg, J. Kubal, K. Kaasbjerg, S. Lysgaard, J. B. Maronsson, T. Maxson, T. Olsen, L. Pastewka, A. Peterson, C. Rostgaard, J. Schiøtz, O. Schütt, M. Strange, K. S. Thygesen, T. Vegge, L. Vilhelmsen, M. Walter, Z. Zeng, and K. W. Jacobsen,
"The atomic simulation environment — a Python library for working with atoms,"
J. Phys.: Condens. Matter 29, 273002 (2017).
DOI: 10.1088/1361-648X/aa680e -
S. R. Bahn and K. W. Jacobsen,
"An object-oriented scripting interface to a legacy electronic structure code,"
Comput. Sci. Eng. 4, 56–66 (2002).
DOI: 10.1109/5992.998641
-
M. S. Daw and M. I. Baskes,
"Embedded-atom method: Derivation and application to impurities, surfaces, and other defects in metals,"
Phys. Rev. B 29, 6443–6453 (1984).
DOI: 10.1103/PhysRevB.29.6443 -
S. M. Foiles, M. I. Baskes, and M. S. Daw,
"Embedded-atom-method functions for the fcc metals Cu, Ag, Au, Ni, Pd, Pt, and their alloys,"
Phys. Rev. B 33, 7983–7991 (1986).
DOI: 10.1103/PhysRevB.33.7983 -
M. S. Daw, S. M. Foiles, and M. I. Baskes,
"The embedded-atom method: a review of theory and applications,"
Mater. Sci. Rep. 9, 251–310 (1993).
DOI: 10.1016/0920-2307(93)90001-U
- Y. Mishin, M. J. Mehl, and D. A. Papaconstantopoulos,
"Embedded-atom potential for B2-NiAl,"
Phys. Rev. B 65, 224114 (2002).
DOI: 10.1103/PhysRevB.65.224114
CREAM is dual-licensed. The Rust engine, PyO3 bindings, WGSL shaders,
and documentation are MIT-licensed. The ASE Calculator subclass in
python/cream/calculator.py is LGPL-3.0-or-later because it derives from
ASE (ase.calculators.calculator.Calculator), which itself is LGPL-3.0.
| Component | License |
|---|---|
Core Rust library (src/) |
MIT |
Python bindings (src/python.rs) |
MIT |
WGSL shaders (src/shaders/) |
MIT |
Documentation (docs/, README.md) |
MIT |
ASE Calculator wrapper (python/cream/calculator.py) |
LGPL-3.0-or-later |
The effect of this dual licensing on the distributed Python package
(cream-python) is that the binary wheel contains MIT-licensed compiled
code plus an LGPL-3.0-or-later Python file. If you redistribute or
link against cream-python, you must comply with the LGPL-3.0-or-later
terms for the calculator.py component — in particular, any modifications
to calculator.py must be made available under LGPL-3.0-or-later, and
users of your redistributed package must be able to relink against a
modified calculator.py. The LGPL does not require your own
application code to be open-sourced.
See LICENSE and LICENSE.LGPL for the full license texts.