Multiple simulations on the same GPU are interfering with each other! #13

raimis · 2020-07-23T17:35:37Z

If two or more simulations are running on the same GPU and using TorchForce, they produce incorrect results. Other GPU processes (e.g. VMD) also affect TorchForce, but the effects are less deterministic. Typically this manifests as random "explosions" of the system.

Versions:

OpenMM-Torch is built from master using the recipe from "Recipe for a conda package" (#9).
PyTorch 1.4.0 is from https://anaconda.org/anaconda/pytorch

The problem can be reproduced with:

from sys import stdout
from simtk import unit as u
import simtk.openmm as mm
from simtk.openmm import app

# A single particle of mass 1 amu
system = mm.System()
system.addParticle(1)

# Set to False to use the built-in CustomExternalForce as a reference
if True:
    import torch
    from openmmtorch import TorchForce

    # Harmonic potential E = x^2 + y^2 + z^2 as a TorchScript module
    class Force(torch.nn.Module):
        def forward(self, positions):
            return torch.sum(positions**2)
    torch.jit.script(Force()).save('model.pt')

    system.addForce(TorchForce('model.pt'))

else:
    # The same potential expressed with a native OpenMM force
    force = mm.CustomExternalForce('x^2+y^2+z^2')
    force.addParticle(0)
    system.addForce(force)

integrator = mm.VerletIntegrator(1*u.femtosecond)
platform = mm.Platform.getPlatformByName('CUDA')
# Every copy of this script runs on the same GPU (device 2)
properties = {'CudaPrecision': 'mixed', 'DeviceIndex': '2'}

simulation = app.Simulation(app.Topology(), system, integrator, platform, properties)
simulation.context.setPositions([[1, 2, 3]])
# Report the step, total energy and speed every 1000 steps
simulation.reporters.append(app.StateDataReporter(stdout, 1000, step=True, totalEnergy=True, speed=True, separator='\t'))
simulation.step(100000)
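
The script above runs a single simulation. One way to start several copies at the same moment is a small launcher like the sketch below. It is only an illustration: the helper name `run_concurrently` and the file name `test.py` are assumptions, not part of the original report. Because the script hard-codes `DeviceIndex`, every copy lands on the same GPU.

```python
import subprocess
import sys


def run_concurrently(command, copies=2):
    """Start `copies` instances of `command` at once and wait for all of them.

    Returns the list of exit codes, in launch order.
    """
    # Popen returns immediately, so all processes run side by side.
    procs = [subprocess.Popen(command) for _ in range(copies)]
    return [p.wait() for p in procs]


# Hypothetical usage, assuming the reproduction script is saved as test.py:
#     run_concurrently([sys.executable, "test.py"], copies=2)
```

Each child writes its own reporter output to the shared terminal, so redirecting each process to a separate file may make the energies easier to compare.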

If only one simulation is running, the total energy is conserved as expected:

#"Step" "Total Energy (kJ/mole)"        "Speed (ns/day)"
1000    14.000000179612432      0
2000    14.000005361777896      349
3000    14.000001366072164      367
4000    14.000005119669993      362
5000    14.000003331103173      336
6000    14.000002685719684      325
7000    14.000005601208187      328
8000    14.000000713319633      330
9000    14.000007006230264      338
10000   14.000000022274946      344
11000   14.00000683213267       349
12000   14.000000738332375      353
13000   14.00000512025951       350
14000   14.00000300080587       347
15000   14.00000395686458       332
16000   14.000004841749305      324
17000   14.000001304157916      318
18000   14.000006327369546      309
19000   14.000000236521004      303
20000   14.000007625364647      296
...

If two simulations are running, the energy conservation degrades:

#"Step" "Total Energy (kJ/mole)"        "Speed (ns/day)"
1000    14.000159708061787      0
2000    14.000443962602837      345
3000    14.000840991380645      335
4000    14.000969743602514      346
5000    14.001341178967788      344
6000    14.00135998002807       328
7000    14.001780500680749      313
8000    14.001861254934145      303
9000    14.001974845722069      297
10000   14.002129244844477      297
11000   14.002374428011779      299
12000   13.9968509309506        296
13000   14.020662965297788      291
14000   14.044255590957363      290
15000   14.058228920702625      291
16000   14.037448129767334      290
17000   14.030292677709525      283
18000   14.034618070407525      280
19000   14.037064627386965      278
20000   14.034805715731942      274
...

#"Step" "Total Energy (kJ/mole)"        "Speed (ns/day)"
1000    14.007740775328157      0
2000    14.015824312238987      260
3000    14.0087181449308        279
4000    14.022696349015083      284
5000    14.010502216207124      275
6000    14.012584191982722      265
7000    14.026440971902506      260
8000    14.030638631492987      260
9000    14.038191858222456      257
10000   14.043751931821303      252
11000   14.051324684105513      253
12000   14.055297832296048      252
13000   14.060540097186657      249
14000   14.08421743769247       250
15000   14.107282767704632      252
16000   14.101948425425924      253
17000   14.112720479074394      253
18000   14.130359066794343      252
19000   14.136352142606668      252
20000   14.146669114115653      252
...
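
To put a number on the degradation, the reporter output can be compared against the expected energy. The particle starts at rest at (1, 2, 3) in the potential x^2+y^2+z^2, so the total energy should stay at 1 + 4 + 9 = 14 kJ/mol. The sketch below parses a few rows copied from the tables above and reports the largest deviation; the helper names `parse_energies` and `max_drift` are illustrative assumptions.

```python
def parse_energies(report_text):
    """Extract the total-energy column from StateDataReporter output."""
    energies = []
    for line in report_text.splitlines():
        fields = line.split()
        # Skip blank lines, the '#' header and the '...' truncation marker.
        if not fields or fields[0].startswith("#") or fields[0] == "...":
            continue
        energies.append(float(fields[1]))
    return energies


def max_drift(energies, reference):
    """Largest absolute deviation of the energies from the reference value."""
    return max(abs(e - reference) for e in energies)


# Expected energy: potential at the initial position, zero kinetic energy.
reference = sum(x * x for x in [1.0, 2.0, 3.0])  # 14.0

# Rows copied from the single-simulation output (steps 1000 to 3000).
single = """#"Step" "Total Energy (kJ/mole)" "Speed (ns/day)"
1000    14.000000179612432      0
2000    14.000005361777896      349
3000    14.000001366072164      367
"""

# Rows copied from the first two-simulation output (steps 18000 to 20000).
double = """18000   14.034618070407525      280
19000   14.037064627386965      278
20000   14.034805715731942      274
"""

print("single:", max_drift(parse_energies(single), reference))
print("double:", max_drift(parse_energies(double), reference))
```

On these rows the single run stays within about 1e-5 kJ/mol of 14, while the concurrent run is off by more than 1e-2 kJ/mol.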

For reference, I tried the same setup with CustomExternalForce. No problems have been observed!

peastman · 2020-07-23T22:20:08Z

I'm having trouble running your script, because I can't import both torch and openmmtorch in the same script. We need a build of libtorch that uses the C++11 ABI. They provide those for the standalone C++ library, but all the builds that also include Python use the older ABI. Did you compile PyTorch from source? Or maybe you're using Windows, which doesn't have this issue?

raimis · 2020-07-24T10:27:44Z

I use PyTorch from the main Anaconda channel:

conda install pytorch=1.4.0 cudatoolkit=10.0

As mentioned in #9 (comment), libtorch ships with pytorch, just at a non-standard location.

peastman · 2020-07-24T15:42:42Z

Right, but it's compiled with the pre-C++11 ABI. When you download LibTorch from https://pytorch.org/get-started/locally/ they give you a choice of two packages, one with each ABI. But for conda packages they only provide the old ABI.

raimis · 2020-07-24T16:19:02Z

pytorch from the main Anaconda channel is built with the C++11 ABI. I'm using the conda compilers and have had no problems with the ABI.
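
For anyone wanting to check which ABI an installed PyTorch build uses, PyTorch exposes `torch.compiled_with_cxx11_abi()`. The sketch below wraps it in a hypothetical helper, `torch_uses_cxx11_abi`, which returns `None` when PyTorch is not installed; it is not something used in this thread.

```python
def torch_uses_cxx11_abi():
    """Return True/False for the ABI of the installed PyTorch, or None if absent."""
    try:
        import torch
    except ImportError:
        return None
    # True means the build was compiled with _GLIBCXX_USE_CXX11_ABI=1.
    return bool(torch.compiled_with_cxx11_abi())


print("PyTorch built with C++11 ABI:", torch_uses_cxx11_abi())
```

A plugin linked against libtorch generally needs to be compiled with the same ABI setting as the PyTorch build it is loaded alongside.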

peastman · 2020-07-24T18:14:20Z

Good to know. I'll try that version. Thanks!

raimis · 2020-07-28T09:46:23Z

@peastman Have you managed to reproduce the issue? Any ideas on how to fix it?

raimis · 2020-07-29T17:37:17Z

I have tried different versions of dependencies/compilers:

PyTorch 1.4.0 and 1.6.0
CUDA 10.0 and 10.2
GCC 7.5 and 9.2
CentOS 7 and Ubuntu 18.04

The problem persists!

@peastman we would appreciate it if you could help solve this. It is important and is blocking our progress.

peastman · 2020-07-29T18:50:05Z

I'm still struggling to get it to compile. I switched to the conda build of PyTorch, but when I try to compile the plugin I get pages of link errors like

/home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so: undefined reference to `vdLn'
/home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so: undefined reference to `vmsErf'
/home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so: undefined reference to `zgetri_'

raimis · 2020-07-30T09:16:39Z

I haven't seen such errors. My first guess: you are missing MKL.

raimis · 2020-07-30T13:15:00Z

I have tried to run one MD step under cuda-memcheck:

$ cuda-memcheck --tool initcheck python test.py
========= CUDA-MEMCHECK
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800000 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800020 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800040 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800060 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800080 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028000a0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028000c0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028000e0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800100 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800120 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800140 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800160 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800180 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028001a0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028001c0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028001e0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4ae9a00000 of size 12 bytes
=========     Uninitialized access at 0x7f4ae9a00000 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoDAsync_v2 + 0x219) [0x2d3229]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x3942e]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x15183]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 (cudaMemcpyAsync + 0x176) [0x56416]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native21copy_device_to_deviceERNS_14TensorIteratorEb + 0x428) [0x2088d48]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so [0x208a7dc]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xc70f7b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xc6e30f]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native5copy_ERNS_6TensorERKS1_b + 0x53) [0xc6fb43]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor5copy_ERKS0_b + 0x113) [0x13f4713]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native7pow_outERNS_6TensorERKS1_N3c106ScalarE + 0x7e9) [0xde2ac9]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3powERKNS_6TensorEN3c106ScalarE + 0x9d) [0xde43dd]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so [0x31d2434]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at3powERKNS_6TensorEN3c106ScalarE + 0x88) [0x1293c08]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2f1056b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor3powEN3c106ScalarE + 0x88) [0x141d558]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd9generated12PowBackward05applyEOSt6vectorIN2at6TensorESaIS5_EE + 0x195) [0x2d85915]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x33ab382]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine17evaluate_functionERSt10shared_ptrINS0_9GraphTaskEEPNS0_4NodeERNS0_11InputBufferERKS2_INS0_10ReadyQueueEE + 0x1655) [0x33a6405]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_mainERKSt10shared_ptrINS0_9GraphTaskEE + 0x239) [0x33a6c39]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb + 0x9a) [0x33a2d4a]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libstdc++.so.6 [0xc8421]
=========     Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x76db]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x3f) [0x121a3f]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000075d0 in _ZN2at6native27unrolled_elementwise_kernelIZZZZNS0_81_GLOBAL__N__57_tmpxft_0000082b_00000000_10_Activation_compute_75_cpp1_ii_150aa41218GeluCUDAKernelImplERNS_14TensorIteratorEENKUlvE_clEvENKUlvE2_clEvENKUlvE_clEvEUlN3c108BFloat16EE_NS_6detail5ArrayIPcLi2EEE16OffsetCalculatorILi1EjESG_NS0_6memory15LoadWithoutCastENSH_16StoreWithoutCastEEEviT_T0_T1_T2_T3_T4_
=========     by thread (2,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00400
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a547]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a5d7]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 (cudaLaunchKernel + 0x225) [0x51885]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15gpu_kernel_implI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x16f5) [0x1d3f475]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native10gpu_kernelI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x2f4) [0x1d40274]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native23gpu_kernel_with_scalarsI18__nv_hdl_wrapper_tILb0ELb1E11__nv_dl_tagIPFvRNS_14TensorIteratorEEXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEIEEEEvS5_RKT_ + 0x465) [0x1d42cc5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15mul_kernel_cudaERNS_14TensorIteratorE + 0x17a) [0x1c9c56a]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xc13807]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorES3_ + 0x65) [0xc0a4f5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorEN3c106ScalarE + 0x4b) [0xc0c6bb]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x136b0e4]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at3mulERKNS_6TensorEN3c106ScalarE + 0x88) [0x12935a8]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2f85788]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor3mulEN3c106ScalarE + 0x88) [0x141d118]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd9generated12PowBackward05applyEOSt6vectorIN2at6TensorESaIS5_EE + 0x1db) [0x2d8595b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x33ab382]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine17evaluate_functionERSt10shared_ptrINS0_9GraphTaskEEPNS0_4NodeERNS0_11InputBufferERKS2_INS0_10ReadyQueueEE + 0x1655) [0x33a6405]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_mainERKSt10shared_ptrINS0_9GraphTaskEE + 0x239) [0x33a6c39]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb + 0x9a) [0x33a2d4a]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libstdc++.so.6 [0xc8421]
=========     Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x76db]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x3f) [0x121a3f]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000075d0 in _ZN2at6native27unrolled_elementwise_kernelIZZZZNS0_81_GLOBAL__N__57_tmpxft_0000082b_00000000_10_Activation_compute_75_cpp1_ii_150aa41218GeluCUDAKernelImplERNS_14TensorIteratorEENKUlvE_clEvENKUlvE2_clEvENKUlvE_clEvEUlN3c108BFloat16EE_NS_6detail5ArrayIPcLi2EEE16OffsetCalculatorILi1EjESG_NS0_6memory15LoadWithoutCastENSH_16StoreWithoutCastEEEviT_T0_T1_T2_T3_T4_
=========     by thread (1,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00400
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a547]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a5d7]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 (cudaLaunchKernel + 0x225) [0x51885]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15gpu_kernel_implI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x16f5) [0x1d3f475]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native10gpu_kernelI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x2f4) [0x1d40274]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native23gpu_kernel_with_scalarsI18__nv_hdl_wrapper_tILb0ELb1E11__nv_dl_tagIPFvRNS_14TensorIteratorEEXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEIEEEEvS5_RKT_ + 0x465) [0x1d42cc5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15mul_kernel_cudaERNS_14TensorIteratorE + 0x17a) [0x1c9c56a]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xc13807]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorES3_ + 0x65) [0xc0a4f5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorEN3c106ScalarE + 0x4b) [0xc0c6bb]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x136b0e4]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at3mulERKNS_6TensorEN3c106ScalarE + 0x88) [0x12935a8]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2f85788]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor3mulEN3c106ScalarE + 0x88) [0x141d118]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd9generated12PowBackward05applyEOSt6vectorIN2at6TensorESaIS5_EE + 0x1db) [0x2d8595b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x33ab382]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine17evaluate_functionERSt10shared_ptrINS0_9GraphTaskEEPNS0_4NodeERNS0_11InputBufferERKS2_INS0_10ReadyQueueEE + 0x1655) [0x33a6405]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_mainERKSt10shared_ptrINS0_9GraphTaskEE + 0x239) [0x33a6c39]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb + 0x9a) [0x33a2d4a]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libstdc++.so.6 [0xc8421]
=========     Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x76db]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x3f) [0x121a3f]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000075d0 in _ZN2at6native27unrolled_elementwise_kernelIZZZZNS0_81_GLOBAL__N__57_tmpxft_0000082b_00000000_10_Activation_compute_75_cpp1_ii_150aa41218GeluCUDAKernelImplERNS_14TensorIteratorEENKUlvE_clEvENKUlvE2_clEvENKUlvE_clEvEUlN3c108BFloat16EE_NS_6detail5ArrayIPcLi2EEE16OffsetCalculatorILi1EjESG_NS0_6memory15LoadWithoutCastENSH_16StoreWithoutCastEEEviT_T0_T1_T2_T3_T4_
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00400
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a547]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a5d7]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 (cudaLaunchKernel + 0x225) [0x51885]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15gpu_kernel_implI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x16f5) [0x1d3f475]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native10gpu_kernelI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x2f4) [0x1d40274]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native23gpu_kernel_with_scalarsI18__nv_hdl_wrapper_tILb0ELb1E11__nv_dl_tagIPFvRNS_14TensorIteratorEEXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEIEEEEvS5_RKT_ + 0x465) [0x1d42cc5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15mul_kernel_cudaERNS_14TensorIteratorE + 0x17a) [0x1c9c56a]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xc13807]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorES3_ + 0x65) [0xc0a4f5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorEN3c106ScalarE + 0x4b) [0xc0c6bb]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x136b0e4]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at3mulERKNS_6TensorEN3c106ScalarE + 0x88) [0x12935a8]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2f85788]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor3mulEN3c106ScalarE + 0x88) [0x141d118]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd9generated12PowBackward05applyEOSt6vectorIN2at6TensorESaIS5_EE + 0x1db) [0x2d8595b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x33ab382]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine17evaluate_functionERSt10shared_ptrINS0_9GraphTaskEEPNS0_4NodeERNS0_11InputBufferERKS2_INS0_10ReadyQueueEE + 0x1655) [0x33a6405]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_mainERKSt10shared_ptrINS0_9GraphTaskEE + 0x239) [0x33a6c39]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb + 0x9a) [0x33a2d4a]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libstdc++.so.6 [0xc8421]
=========     Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x76db]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x3f) [0x121a3f]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000000f8 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c00
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x00000148 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c04
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000001b0 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c08
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4ae9a00600 of size 4 bytes
=========     Uninitialized access at 0x7f4ae9a00600 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoHAsync_v2 + 0x219) [0x2d2ff9]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x3940f]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x15183]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 (cudaMemcpyAsync + 0x176) [0x56416]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so [0x1e5fd61]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native24_local_scalar_dense_cudaERKNS_6TensorE + 0x37) [0x1e629b7]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so [0x31d2de5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so [0x32050c1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at19_local_scalar_denseERKNS_6TensorE + 0xee) [0x125f75e]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2e46adc]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x11da581]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at19_local_scalar_denseERKNS_6TensorE + 0xee) [0x125f75e]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native4itemERKNS_6TensorE + 0x13b) [0xe8bf0b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x13746b5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2e93b4d]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x11da581]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor4itemEv + 0xee) [0x140a9ae]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor4itemIdEET_v + 0x21) [0x140b401]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x828) [0x82a8]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= ERROR SUMMARY: 24 errors

I guess most of them are false positives, but these look suspicious:

========= Uninitialized __global__ memory read of size 4
=========     at 0x000000f8 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c00
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x00000148 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c04
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000001b0 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c08
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]

peastman · 2020-07-30T17:23:02Z

$ conda list | grep mkl
blas                      1.0                         mkl  
mkl                       2019.4                      243  
mkl-service               2.3.0            py37he904b0f_0  
mkl_fft                   1.1.0            py37h23d657b_0  
mkl_random                1.0.4            py37hd81dba3_0

It's there, but perhaps I need to do something extra to add it to the library path. This is a difference between the standard and conda builds. When I look at the version of libtorch downloaded from the PyTorch website it doesn't have that dependency:

$ ldd ~/libtorch/lib/libtorch_cpu.so 
	linux-vdso.so.1 =>  (0x00007fffd63de000)
	libgomp-753e6e92.so.1 => /home/peastman/libtorch/lib/libgomp-753e6e92.so.1 (0x00007fc60f3a6000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc60f189000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc60ef81000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc60ed6b000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc60eb67000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc60e85e000)
	libcudart-1b201d85.so.10.1 => /home/peastman/libtorch/lib/libcudart-1b201d85.so.10.1 (0x00007fc60e5df000)
	libc10.so => /home/peastman/libtorch/lib/libc10.so (0x00007fc60e38a000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc60e008000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc60dc3e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc61e8f9000)

But the conda build does:

$ ldd ~/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so 
	linux-vdso.so.1 =>  (0x00007ffca1ffd000)
	libgomp.so.1 => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libgomp.so.1 (0x00007f7d4e2df000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7d498b9000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7d496b1000)
	libgcc_s.so.1 => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libgcc_s.so.1 (0x00007f7d4e2a8000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7d494ad000)
	libmkl_intel_lp64.so => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libmkl_intel_lp64.so (0x00007f7d48935000)
	libmkl_gnu_thread.so => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libmkl_gnu_thread.so (0x00007f7d470e2000)
	libmkl_core.so => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libmkl_core.so (0x00007f7d42e0d000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7d42b04000)
	libc10.so => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/libc10.so (0x00007f7d4e25b000)
	libstdc++.so.6 => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libstdc++.so.6 (0x00007f7d42990000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7d425c6000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7d4e0e8000)

peastman · 2020-07-30T20:02:43Z

I finally managed to get it to compile. Now I get this error when I try to run the script:

simtk.openmm.OpenMMException: Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library. (initCUDA at /tmp/pip-req-build-w9kte7xz/aten/src/ATen/detail/CUDAHooksInterface.h:63)

None of these problems happen when using the official PyTorch builds. I've already spent hours on this with no clear end in sight. I'm going to switch back to the official build, then rework your script to split out creating the .pt file from running the simulation.

peastman · 2020-07-30T21:42:04Z

I can reproduce the results you're seeing, but it doesn't seem to have anything to do with multiple simulations. It just depends on the platform. With OpenCL or CPU I get excellent energy conservation:

#"Step"	"Total Energy (kJ/mole)"	"Speed (ns/day)"
1000	14.000000179612432	0
2000	14.000006315452213	504
...
99000	14.00000036979219	521
100000	14.000007236732628	521

With CUDA the energy conservation is worse:

#"Step"	"Total Energy (kJ/mole)"	"Speed (ns/day)"
1000	14.000029598017427	0
2000	14.000038045238302	274
...
99000	13.991048758717174	273
100000	13.991120108011508	273

Furthermore, with CUDA the numbers vary from one run to the next, which shouldn't be the case. I'll keep investigating.
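The difference between the two logs above can be put into numbers with a short script. This is only a sketch: `parse_energies` and `max_drift` are hypothetical helper names, and the log excerpts are copied from the tables in this comment (with the `...` elisions left in place), so the result only covers the rows that were posted.

```python
def parse_energies(log_text):
    """Return (step, total_energy) pairs from StateDataReporter output."""
    rows = []
    for line in log_text.splitlines():
        fields = line.split()
        # Skip the header line, blank lines and the "..." elision marker.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        rows.append((int(fields[0]), float(fields[1])))
    return rows


def max_drift(rows):
    """Largest absolute deviation of the energy from its first reported value."""
    reference = rows[0][1]
    return max(abs(energy - reference) for _, energy in rows)


# Excerpts copied from the tables above (hypothetical variable names).
opencl_log = """#"Step"\t"Total Energy (kJ/mole)"\t"Speed (ns/day)"
1000\t14.000000179612432\t0
2000\t14.000006315452213\t504
...
99000\t14.00000036979219\t521
100000\t14.000007236732628\t521
"""

cuda_log = """#"Step"\t"Total Energy (kJ/mole)"\t"Speed (ns/day)"
1000\t14.000029598017427\t0
2000\t14.000038045238302\t274
...
99000\t13.991048758717174\t273
100000\t13.991120108011508\t273
"""

opencl_drift = max_drift(parse_energies(opencl_log))
cuda_drift = max_drift(parse_energies(cuda_log))
print(f"OpenCL/CPU max drift: {opencl_drift:.2e} kJ/mol")
print(f"CUDA max drift:       {cuda_drift:.2e} kJ/mol")
```

On the rows shown, the CUDA drift is roughly three orders of magnitude larger than the OpenCL one.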

raimis · 2020-07-31T12:11:30Z

Thanks for your effort, @peastman! Just keep in mind that the problem is caused not only by a second simulation, but also by other processes on the GPU (e.g. VMD, Chrome, etc.).

peastman · 2020-07-31T19:54:39Z

The error seems to be happening when we copy the forces over from PyTorch's buffers to OpenMM's buffers. Usually it works correctly, but once in a while the forces come out as zero. This seems like a race condition, but scattering calls to cuCtxSynchronize() through the code doesn't fix it. I might just switch to downloading the forces to host and sending them back again, the same way the other platforms do. It's slower than copying them on the GPU, but at least it should work.

peastman · 2020-07-31T21:59:22Z

After changing that, the trajectory is identical to the one from OpenCL:

#"Step"	"Total Energy (kJ/mole)"	"Speed (ns/day)"
1000	14.000000179612432	0
2000	14.000005361777896	199
...
99000	14.00000036979219	199
100000	14.000007236732628	199

It's slower than before, but that's probably mostly because it's a trivial model. For a real system the overhead would be much less significant.
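To give a feel for the size of that overhead, the reported speeds can be converted into wall-clock time per step. This is a rough sketch that assumes the 1 fs Verlet timestep from the reproduction script and treats the 273 ns/day (GPU copy) and 199 ns/day (host round trip) figures as exact, even though reported speeds are noisy. The function name is hypothetical.

```python
SECONDS_PER_DAY = 86_400
TIMESTEP_FS = 1.0  # assumed: the 1 fs Verlet step from the reproduction script


def microseconds_per_step(ns_per_day, timestep_fs=TIMESTEP_FS):
    """Convert a reported speed in ns/day into wall-clock microseconds per step."""
    steps_per_day = ns_per_day * 1e6 / timestep_fs  # 1 ns = 1e6 fs
    return SECONDS_PER_DAY / steps_per_day * 1e6


gpu_copy = microseconds_per_step(273)   # CUDA run before the change
host_copy = microseconds_per_step(199)  # CUDA run after the change
overhead = host_copy - gpu_copy
print(f"{gpu_copy:.0f} us -> {host_copy:.0f} us per step (+{overhead:.0f} us)")
```

Under these assumptions the round trip costs on the order of 100 microseconds per step. That is a large share of the step time for this one-particle test, but it should be a much smaller share when evaluating the model itself takes milliseconds.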

peastman · 2020-07-31T23:02:04Z

The fix is in #15. Could you try it out and see if it now works for you?

raimis · 2020-08-03T12:38:30Z

Regarding #15, I think the problem is not a race condition but a multiple-context problem:

torch::jit::load creates another CUDA context (Force libtorch to use CUDA context pytorch/pytorch#31565)
Device pointers from different contexts cannot be mixed (https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#context), but this is what happens when we pass the coordinates to Torch and get the forces back. This explains the reported memory errors.
The kernel execution of different contexts is time-sliced (https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/index.html#multiple-contexts). This explains why the issue manifests as a synchronisation problem.

I guess a more correct fix would be one of:

Use the same context for OpenMM and Torch; or
Make copies with cuMemcpyPeer or cuMemcpyPeerAsync (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__MEM.html#group__CUDA__MEM)

raimis · 2020-08-05T16:18:05Z

In addition to the previous comments:

Actually torch::jit::load does not create another CUDA context, but uses the special primary context (https://docs.nvidia.com/cuda/cuda-driver-api/group__CUDA__PRIMARY__CTX.html#group__CUDA__PRIMARY__CTX)
The primary context is unique per device rather than per process. This means that separate processes running Torch on the same GPU share that context.
Intentionally or not, the tensor storing the gradients is somehow reused, so just making a copy of it (Fix a race condition by explicitly coping Torch forces #16) fixes the issue.
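The buffer-reuse hazard described in the last point can be illustrated with a pure-Python analogy. This is not the plugin's C++ code and makes no claim about how Torch manages gradient storage internally. `ReusingModel` is a hypothetical toy class whose only purpose is to show why holding a reference to a reused buffer gives stale or overwritten values, while an explicit copy does not.

```python
class ReusingModel:
    """Toy stand-in for a model that writes gradients into one reused buffer."""

    def __init__(self):
        self._grad = [0.0, 0.0, 0.0]

    def gradient(self, positions):
        # Gradient of sum(x**2) is 2*x, written in place into the shared buffer.
        for i, x in enumerate(positions):
            self._grad[i] = 2.0 * x
        return self._grad


model = ReusingModel()

# Holding a reference: a later evaluation silently overwrites the earlier result.
aliased = model.gradient([1.0, 2.0, 3.0])
model.gradient([0.0, 0.0, 0.0])
print("aliased:", aliased)

# Taking an explicit copy: the earlier result survives the later evaluation.
copied = list(model.gradient([1.0, 2.0, 3.0]))
model.gradient([0.0, 0.0, 0.0])
print("copied: ", copied)
```

In the analogy, the aliased result ends up as zeros after the second call, which is similar in spirit to the forces occasionally coming out as zero, while the copied result keeps the expected values.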

peastman mentioned this issue Jul 31, 2020

Avoid race condition while copying forces on GPU #15

Closed

This was referenced Aug 3, 2020

Force libtorch to use CUDA context pytorch/pytorch#31565

Open

Fix a race condition by explicitly coping Torch forces #16

Merged

peastman closed this as completed in #16 Aug 6, 2020

yaoyic mentioned this issue Sep 30, 2020

[BUGFIX] Synchronize CUDA context before force copying #17

Merged

JinyinZha mentioned this issue Sep 7, 2023

Is it possible to use openmm-torch to apply force on only several atoms or CVs #119

Closed

dmighty007 mentioned this issue Feb 11, 2024

OpenMMException when adding TorchForce to the system #134

Open
