Barnes-Hut calculations are wrong when ESPResSo is compiled in Debug mode #4774

Closed
jngrad opened this issue Aug 16, 2023 · 1 comment

jngrad commented Aug 16, 2023

When compiling CUDA code with the nvcc compiler and the -G flag (generate device debug symbols), the behavior of the Barnes-Hut algorithm changes. The octree permutation vector is wrong, which causes variables n and i to be assigned incorrect values. This affects the distance calculation in the following loop:

auto tmp = 0.0f; // compute distance squared
for (int l = 0; l < 3; l++) {
  dr[l] = -bhpara->r[3 * n + l] + bhpara->r[3 * i + l];
  tmp += dr[l] * dr[l];
}

As a result, the computed forces, torques and energies are wrong. The deviation from the correct value is random, and the forces on individual particles can differ by an order of magnitude. The total energy can be quite close to the real value, and a small fraction of the particles will have the correct forces (up to machine precision), so this bug is easy to overlook when tracking only the total energy or the force on a single particle that happens to be correct.
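
Because aggregate quantities can hide the problem, a per-particle comparison is more telling than a single norm over all particles. The following is a minimal sketch of such a check, not part of ESPResSo: the helper name `per_particle_relative_error`, the `floor` parameter and the sample force values are all hypothetical and chosen only for illustration. It accepts nested lists or NumPy arrays of shape (N, 3).

```python
import math


def per_particle_relative_error(reference, candidate, floor=1e-12):
    """Return one relative error per particle (hypothetical helper).

    Each input is a sequence of 3-vectors, e.g. nested lists or a NumPy
    array of shape (N, 3). `floor` avoids a division by zero when a
    reference vector vanishes.
    """
    errors = []
    for ref_vec, cand_vec in zip(reference, candidate):
        # Euclidean norm of the difference between the two vectors
        diff = math.sqrt(sum((c - r) ** 2 for r, c in zip(ref_vec, cand_vec)))
        # Euclidean norm of the reference vector, used as the scale
        scale = math.sqrt(sum(r ** 2 for r in ref_vec))
        errors.append(diff / max(scale, floor))
    return errors


# Made-up data: particle 0 matches, particle 1 is off by a factor of 10.
ref_forces = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
bad_forces = [[1.0, 0.0, 0.0], [0.0, 20.0, 0.0]]

errors = per_particle_relative_error(ref_forces, bad_forces)
worst = max(range(len(errors)), key=errors.__getitem__)
print(f"worst particle: {worst}, relative error: {errors[worst]:.3g}")
```

With real data, the reference would be the direct-sum result and the candidate the Barnes-Hut result. Reporting the worst particle avoids being misled by the few particles whose forces happen to be correct.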

Something might be fundamentally broken here; most likely undefined behavior is invoked. Running all tools from the NVIDIA Compute Sanitizer suite (memcheck, racecheck, initcheck, synccheck) didn't report any errors. The bug isn't reproducible with the nvcc -g flag (generate host debug symbols), nor with the clang --cuda-noopt-device-debug flag (generate device debug info).

Here is an MWE adapted from dawaanr-and-bh-gpu.py:

import numpy as np
import espressomd.magnetostatics

np.random.seed(42)

system = espressomd.System(box_l=[1, 1, 1])
system.box_l = [15., 15., 15.]
system.periodicity = 3 * [False]
system.time_step = 1E-4
system.cell_system.skin = 0.1

n_part = 3
part_pos = np.random.random((n_part, 3)) * system.box_l[0]
part_dip = np.random.random((n_part, 3)) * 1.3
system.part.add(pos=part_pos, dip=part_dip)

system.actors.add(espressomd.magnetostatics.DipolarDirectSumGpu(prefactor=1.))
system.integrator.run(steps=0, recalc_forces=True)
dawaanr_f = np.copy(system.part.all().f)
dawaanr_t = np.copy(system.part.all().torque_lab)
dawaanr_e = system.analysis.energy()["total"]

system.actors.clear()

system.actors.add(espressomd.magnetostatics.DipolarBarnesHutGpu(
    prefactor=1., epssq=200., itolsq=8.))
system.integrator.run(steps=0, recalc_forces=True)
bhgpu_f = np.copy(system.part.all().f)
bhgpu_t = np.copy(system.part.all().torque_lab)
bhgpu_e = system.analysis.energy()["total"]

assert np.linalg.norm(bhgpu_f - dawaanr_f) < 1e-6
assert np.linalg.norm(bhgpu_t - dawaanr_t) < 1e-6
assert np.linalg.norm(bhgpu_e - dawaanr_e) < 1e-6

ESPResSo was compiled with maxset and these CMake options:

CC=gcc-10 CXX=g++-10 CUDACXX=/usr/local/cuda-11.5/bin/nvcc /usr/bin/cmake .. \
  -D ESPRESSO_BUILD_WITH_CUDA=ON -D CUDAToolkit_ROOT=/usr/local/cuda-11.5 \
  -D ESPRESSO_BUILD_WITH_CCACHE=ON -D ESPRESSO_BUILD_WITH_STOKESIAN_DYNAMICS=ON -D ESPRESSO_BUILD_WITH_WALBERLA=ON \
  -D ESPRESSO_BUILD_WITH_WALBERLA_FFT=ON -D ESPRESSO_BUILD_WITH_WALBERLA_AVX=ON \
  -D ESPRESSO_BUILD_WITH_SCAFACOS=OFF -D ESPRESSO_BUILD_WITH_HDF5=OFF -D ESPRESSO_BUILD_WITH_GSL=ON  \
  -D CMAKE_CUDA_FLAGS="--compiler-bindir=/usr/bin/g++-10" -D CMAKE_BUILD_TYPE=Debug

jngrad commented Nov 16, 2023

Barnes-Hut is also broken on AMD GPUs: #3895
