Skip to content

GPU spline accuracy degrades for small σ values #7

@mlund

Description

@mlund

Summary

When using SplinedPotential with the GPU backend (f32), potentials with small σ values (e.g. σ=2 Å for coarse-grained titratable site beads) produce inaccurate results at long separations, while the reference backend (exact f64) works correctly.

Context

In cif2top, coarse-grained protein models have two classes of beads:

  • Backbone beads: σ = 4.5–6.78 Å (Calvados 3 model)
  • Titratable site beads: σ = 2.0 Å (point-charge-like, minimal LJ)

When used with duello, the GPU backend fails to produce correct energies (doesn't decay to zero at long separations), while the reference backend works as expected.

Analysis

The issue stems from f32 precision loss in the spline pipeline:

  1. Wide grid span: lower_cutoff = σ × 0.6, so for σ=2, r_min = 1.2 Å. With a typical cutoff of 20–50 Å, the spline grid spans a 17–42× range, vs 5–12× for backbone beads.

  2. f64 → f32 coefficient truncation: Spline coefficients are computed in f64 but cast to f32 for GPU upload (gpu.rs:pack_coeffs). Near the steep LJ wall (r ≈ 1.2 Å, u ≈ 1500 kJ/mol), coefficients have large magnitudes. After f32 truncation, Horner evaluation accumulates rounding errors that propagate into the tail region.

  3. PowerLaw2 grid concentration: With the default PowerLaw2 grid, points are concentrated near r_min, leaving the tail region (where energies should be near-zero) with fewer, wider polynomial intervals — amplifying f32 residual errors.

Potential fixes

  • Per-pair validation: After f32 conversion, evaluate the splined potential at a few tail points and warn if the residual exceeds a threshold (e.g. 0.01 kJ/mol).
  • Adaptive grid density: For pairs with large r_max/r_min ratios, automatically increase n_points or switch grid type.
  • f32 coefficient rescaling: Normalize coefficients per interval to reduce dynamic range before GPU upload, then rescale in the shader.
  • Separate short/long-range splines: Split into two grids at the WCA crossover (2^(1/6)σ) to keep coefficient magnitudes bounded in each region.

Reproduction

# Generate coarse-grained topology with σ=2 titratable sites
cif2top 4lzt.cif convert -o t.xyz --top topology.yaml

# GPU backend — incorrect long-range behavior
duello scan --mol1 t.xyz --mol2 t.xyz --top topology.yaml \
  --rmin 24 --rmax 40 --dr 1.0 -r 1.0 --backend gpu

# Reference backend — correct (decays to zero)
duello scan --mol1 t.xyz --mol2 t.xyz --top topology.yaml \
  --rmin 24 --rmax 40 --dr 1.0 -r 1.0 --backend reference

Environment

  • interatomic 0.5.0 (from git)
  • macOS (Apple Silicon)
  • Default SplineConfig (2000 points, PowerLaw2, shift_energy=true)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions