Summary
When using SplinedPotential with the GPU backend (f32), potentials with small σ values (e.g. σ=2 Å for coarse-grained titratable site beads) produce inaccurate results at long separations, while the reference backend (exact f64) works correctly.
Context
In cif2top, coarse-grained protein models have two classes of beads:
- Backbone beads: σ = 4.5–6.78 Å (Calvados 3 model)
- Titratable site beads: σ = 2.0 Å (point-charge-like, minimal LJ)
When used with duello, the GPU backend fails to produce correct energies (doesn't decay to zero at long separations), while the reference backend works as expected.
Analysis
The issue stems from f32 precision loss in the spline pipeline:
-
Wide grid span: lower_cutoff = σ × 0.6, so for σ=2, r_min = 1.2 Å. With a typical cutoff of 20–50 Å, the spline grid spans a 17–42× range, vs 5–12× for backbone beads.
-
f64 → f32 coefficient truncation: Spline coefficients are computed in f64 but cast to f32 for GPU upload (gpu.rs:pack_coeffs). Near the steep LJ wall (r ≈ 1.2 Å, u ≈ 1500 kJ/mol), coefficients have large magnitudes. After f32 truncation, Horner evaluation accumulates rounding errors that propagate into the tail region.
-
PowerLaw2 grid concentration: With the default PowerLaw2 grid, points are concentrated near r_min, leaving the tail region (where energies should be near-zero) with fewer, wider polynomial intervals — amplifying f32 residual errors.
Potential fixes
- Per-pair validation: After f32 conversion, evaluate the splined potential at a few tail points and warn if the residual exceeds a threshold (e.g. 0.01 kJ/mol).
- Adaptive grid density: For pairs with large r_max/r_min ratios, automatically increase
n_points or switch grid type.
- f32 coefficient rescaling: Normalize coefficients per interval to reduce dynamic range before GPU upload, then rescale in the shader.
- Separate short/long-range splines: Split into two grids at the WCA crossover (2^(1/6)σ) to keep coefficient magnitudes bounded in each region.
Reproduction
# Generate coarse-grained topology with σ=2 titratable sites
cif2top 4lzt.cif convert -o t.xyz --top topology.yaml
# GPU backend — incorrect long-range behavior
duello scan --mol1 t.xyz --mol2 t.xyz --top topology.yaml \
--rmin 24 --rmax 40 --dr 1.0 -r 1.0 --backend gpu
# Reference backend — correct (decays to zero)
duello scan --mol1 t.xyz --mol2 t.xyz --top topology.yaml \
--rmin 24 --rmax 40 --dr 1.0 -r 1.0 --backend reference
Environment
- interatomic 0.5.0 (from git)
- macOS (Apple Silicon)
- Default
SplineConfig (2000 points, PowerLaw2, shift_energy=true)
Summary
When using
SplinedPotentialwith the GPU backend (f32), potentials with small σ values (e.g. σ=2 Å for coarse-grained titratable site beads) produce inaccurate results at long separations, while the reference backend (exact f64) works correctly.Context
In cif2top, coarse-grained protein models have two classes of beads:
When used with duello, the GPU backend fails to produce correct energies (doesn't decay to zero at long separations), while the
referencebackend works as expected.Analysis
The issue stems from f32 precision loss in the spline pipeline:
Wide grid span:
lower_cutoff = σ × 0.6, so for σ=2, r_min = 1.2 Å. With a typical cutoff of 20–50 Å, the spline grid spans a 17–42× range, vs 5–12× for backbone beads.f64 → f32 coefficient truncation: Spline coefficients are computed in f64 but cast to f32 for GPU upload (
gpu.rs:pack_coeffs). Near the steep LJ wall (r ≈ 1.2 Å, u ≈ 1500 kJ/mol), coefficients have large magnitudes. After f32 truncation, Horner evaluation accumulates rounding errors that propagate into the tail region.PowerLaw2 grid concentration: With the default
PowerLaw2grid, points are concentrated near r_min, leaving the tail region (where energies should be near-zero) with fewer, wider polynomial intervals — amplifying f32 residual errors.Potential fixes
n_pointsor switch grid type.Reproduction
Environment
SplineConfig(2000 points, PowerLaw2, shift_energy=true)