Multiple simulations on the same GPU are interfering with each other! #13

Closed
raimis opened this issue Jul 23, 2020 · 19 comments · Fixed by #16

Comments

@raimis
Contributor

raimis commented Jul 23, 2020

If two or more simulations are running on the same GPU and using TorchForce, they produce incorrect results. Other GPU processes (e.g. VMD) also affect TorchForce, but the effects are less deterministic. Typically this manifests as random "explosions" of the system.

Versions:

The problem can be reproduced with:

from sys import stdout
from simtk import unit as u
import simtk.openmm as mm
from simtk.openmm import app

system = mm.System()
system.addParticle(1)

if True:  # Set to False to use the CustomExternalForce reference instead of TorchForce
    import torch
    from openmmtorch import TorchForce

    class Force(torch.nn.Module):
        def forward(self, positions):
            return torch.sum(positions**2)
    torch.jit.script(Force()).save('model.pt')

    system.addForce(TorchForce('model.pt'))

else:
    force = mm.CustomExternalForce('x^2+y^2+z^2')
    force.addParticle(0)
    system.addForce(force)

integrator = mm.VerletIntegrator(1*u.femtosecond)
platform = mm.Platform.getPlatformByName('CUDA')
properties = {'CudaPrecision': 'mixed', 'DeviceIndex': '2'}

simulation = app.Simulation(app.Topology(), system, integrator, platform, properties)
simulation.context.setPositions([[1, 2, 3]])
simulation.reporters.append(app.StateDataReporter(stdout, 1000, step=True, totalEnergy=True, speed=True, separator='\t'))
simulation.step(100000)
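
To trigger the interference, at least two copies of the script have to run on the same GPU at the same time. A minimal sketch of a launcher, assuming the script above is saved as `repro.py` (a hypothetical file name) and that every copy uses the same `DeviceIndex`. The helper name `run_concurrently` and the log file naming are illustrative only and are not part of OpenMM or openmm-torch:

```python
import subprocess
import sys


def run_concurrently(command, copies=2, log_prefix='run'):
    """Start `copies` instances of `command` at once and wait for all of them.

    Each instance writes stdout/stderr to its own log file
    (`<log_prefix>_<i>.log`), so the energy reports can be compared later.
    Returns the list of exit codes, in launch order.
    """
    running = []
    for i in range(copies):
        log = open(f'{log_prefix}_{i}.log', 'w')
        proc = subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT)
        running.append((proc, log))

    exit_codes = []
    for proc, log in running:
        exit_codes.append(proc.wait())
        log.close()
    return exit_codes


# Hypothetical usage: two simulations sharing one GPU.
# run_concurrently([sys.executable, 'repro.py'], copies=2)
```

Writing to separate log files rather than pipes keeps the processes from blocking on each other's output while they run.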

If only one simulation is running, the total energy is conserved as expected:

#"Step" "Total Energy (kJ/mole)"        "Speed (ns/day)"
1000    14.000000179612432      0
2000    14.000005361777896      349
3000    14.000001366072164      367
4000    14.000005119669993      362
5000    14.000003331103173      336
6000    14.000002685719684      325
7000    14.000005601208187      328
8000    14.000000713319633      330
9000    14.000007006230264      338
10000   14.000000022274946      344
11000   14.00000683213267       349
12000   14.000000738332375      353
13000   14.00000512025951       350
14000   14.00000300080587       347
15000   14.00000395686458       332
16000   14.000004841749305      324
17000   14.000001304157916      318
18000   14.000006327369546      309
19000   14.000000236521004      303
20000   14.000007625364647      296
...
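
The value of 14 kJ/mol is what one would expect: the model returns the sum of the squared coordinates, the particle starts at (1, 2, 3) and, assuming the default zero initial velocities, the total energy equals the initial potential energy, 1 + 4 + 9 = 14. A pure-Python sketch of the same harmonic system, integrated with velocity Verlet (closely related to, but not the same code as, OpenMM's VerletIntegrator), gives a feeling for how small the drift should be with a 1 fs step. The function names are illustrative, and the unit system (amu, nm, ps, kJ/mol) is assumed to match OpenMM's:

```python
def potential(pos):
    """U(r) = x^2 + y^2 + z^2, the same expression as in the model above."""
    return sum(c * c for c in pos)


def force(pos):
    """F = -dU/dr = -2 r."""
    return [-2.0 * c for c in pos]


def total_energy_after(steps, dt=0.001, mass=1.0):
    """Integrate with velocity Verlet and return kinetic + potential energy.

    Assumed units: mass in amu, positions in nm, dt in ps (0.001 ps = 1 fs).
    """
    pos = [1.0, 2.0, 3.0]
    vel = [0.0, 0.0, 0.0]  # assumption: zero initial velocities
    f = force(pos)
    for _ in range(steps):
        vel = [v + 0.5 * dt * fi / mass for v, fi in zip(vel, f)]
        pos = [p + dt * v for p, v in zip(pos, vel)]
        f = force(pos)
        vel = [v + 0.5 * dt * fi / mass for v, fi in zip(vel, f)]
    kinetic = 0.5 * mass * sum(v * v for v in vel)
    return kinetic + potential(pos)


if __name__ == '__main__':
    print(potential([1.0, 2.0, 3.0]))   # initial potential energy
    print(total_energy_after(20000))    # stays very close to the initial value
```

For this integrator and step size the energy error stays far below 1e-3 kJ/mol, so deviations of 0.01 kJ/mol or more point to something other than ordinary integration error.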

If two simulations are running, the energy conservation degrades:

#"Step" "Total Energy (kJ/mole)"        "Speed (ns/day)"
1000    14.000159708061787      0
2000    14.000443962602837      345
3000    14.000840991380645      335
4000    14.000969743602514      346
5000    14.001341178967788      344
6000    14.00135998002807       328
7000    14.001780500680749      313
8000    14.001861254934145      303
9000    14.001974845722069      297
10000   14.002129244844477      297
11000   14.002374428011779      299
12000   13.9968509309506        296
13000   14.020662965297788      291
14000   14.044255590957363      290
15000   14.058228920702625      291
16000   14.037448129767334      290
17000   14.030292677709525      283
18000   14.034618070407525      280
19000   14.037064627386965      278
20000   14.034805715731942      274
...
#"Step" "Total Energy (kJ/mole)"        "Speed (ns/day)"
1000    14.007740775328157      0
2000    14.015824312238987      260
3000    14.0087181449308        279
4000    14.022696349015083      284
5000    14.010502216207124      275
6000    14.012584191982722      265
7000    14.026440971902506      260
8000    14.030638631492987      260
9000    14.038191858222456      257
10000   14.043751931821303      252
11000   14.051324684105513      253
12000   14.055297832296048      252
13000   14.060540097186657      249
14000   14.08421743769247       250
15000   14.107282767704632      252
16000   14.101948425425924      253
17000   14.112720479074394      253
18000   14.130359066794343      252
19000   14.136352142606668      252
20000   14.146669114115653      252
...
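
To compare runs without reading the tables by eye, the reports can be reduced to a single number. A minimal sketch that parses the StateDataReporter output shown above and reports the largest deviation from a reference energy. The helper names `parse_energies` and `max_drift` are illustrative, and the reference of 14.0 is the initial energy of this particular test system:

```python
def parse_energies(report_text):
    """Extract the total-energy column from tab/space separated reporter output."""
    energies = []
    for line in report_text.splitlines():
        fields = line.split()
        # Skip the header line, blank lines, and "..." placeholders.
        if line.startswith('#') or len(fields) < 2:
            continue
        try:
            energies.append(float(fields[1]))
        except ValueError:
            continue
    return energies


def max_drift(energies, reference=14.0):
    """Largest absolute deviation of the total energy from `reference`."""
    return max(abs(e - reference) for e in energies)


if __name__ == '__main__':
    sample = '''#"Step" "Total Energy (kJ/mole)"        "Speed (ns/day)"
1000    14.007740775328157      0
2000    14.015824312238987      260
3000    14.0087181449308        279
...'''
    print(max_drift(parse_energies(sample)))
```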

For reference, I tried the same setup with CustomExternalForce. No problems have been observed!

@peastman
Member

I'm having trouble running your script, because I can't import both torch and openmmtorch in the same script. We need a build of libtorch that uses the C++11 ABI. They provide those for the standalone C++ library, but all the builds that also include Python use the older ABI. Did you compile PyTorch from source? Or maybe you're using Windows, which doesn't have this issue?

@raimis
Contributor Author

raimis commented Jul 24, 2020

I use PyTorch from the main Anaconda channel:

conda install pytorch=1.4.0 cudatoolkit=10.0 

As mentioned in #9 (comment), libtorch ships with the pytorch package, just at a non-standard location.

@peastman
Member

Right, but it's compiled with the pre-C++11 ABI. When you download LibTorch from https://pytorch.org/get-started/locally/ they give you a choice of two packages, one with each ABI. But for conda packages they only provide the old ABI.

@raimis
Contributor Author

raimis commented Jul 24, 2020

pytorch from the main Anaconda channel is built with the C++11 ABI. I'm using the conda compilers and have had no problems with the ABI.
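
Which ABI a given PyTorch build uses can be checked from Python rather than guessed from the channel. A small sketch using `torch.compiled_with_cxx11_abi()`; the wrapper name `torch_uses_cxx11_abi` is illustrative, and it returns `None` when PyTorch is not installed in the current environment:

```python
def torch_uses_cxx11_abi():
    """Return True/False for the ABI of the installed PyTorch, or None if absent."""
    try:
        import torch
    except ImportError:
        return None
    # True means the build was compiled with _GLIBCXX_USE_CXX11_ABI=1.
    return bool(torch.compiled_with_cxx11_abi())


if __name__ == '__main__':
    print(torch_uses_cxx11_abi())
```

A plugin linked against this libtorch would need to be compiled with the matching ABI setting.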

@peastman
Member

Good to know. I'll try that version. Thanks!

@raimis
Contributor Author

raimis commented Jul 28, 2020

@peastman Have you managed to reproduce the issue? Any ideas on how to fix it?

@raimis
Contributor Author

raimis commented Jul 29, 2020

I have tried different versions of dependencies/compilers:

  • PyTorch 1.4.0 and 1.6.0
  • CUDA 10.0 and 10.2
  • GCC 7.5 and 9.2
  • CentOS 7 and Ubuntu 18.04

The problem persists!

@peastman we would appreciate it if you could help solve this. It is important and is blocking our progress.

@peastman
Member

I'm still struggling to get it to compile. I switched to the conda build of PyTorch, but when I try to compile the plugin I get pages of link errors like

/home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so: undefined reference to `vdLn'
/home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so: undefined reference to `vmsErf'
/home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so: undefined reference to `zgetri_'

@raimis
Contributor Author

raimis commented Jul 30, 2020

I haven't seen such errors. My first guess: you are missing MKL.
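
The unresolved symbols fit that guess: `vdLn` and `vmsErf` belong to MKL's vector math functions, and `zgetri_` is a LAPACK routine that MKL also provides. A rough sketch for checking whether an MKL runtime library is present in the active environment, assuming a conda-style layout with shared libraries under `<prefix>/lib` on Linux or macOS. The helper name `find_mkl_runtime` is illustrative, and an empty result only means nothing was found at that assumed location:

```python
import glob
import os
import sys


def find_mkl_runtime(prefix=sys.prefix):
    """List libmkl_rt files under `<prefix>/lib` (assumed conda-style layout)."""
    pattern = os.path.join(prefix, 'lib', 'libmkl_rt*')
    return sorted(glob.glob(pattern))


if __name__ == '__main__':
    libs = find_mkl_runtime()
    print(libs if libs else 'No libmkl_rt found under ' + sys.prefix)
```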

@raimis
Contributor Author

raimis commented Jul 30, 2020

I have tried to run one MD step under cuda-memcheck:

$ cuda-memcheck --tool initcheck python test.py
========= CUDA-MEMCHECK
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800000 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800020 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800040 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800060 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800080 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028000a0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028000c0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028000e0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800100 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800120 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800140 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800160 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b02800180 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028001a0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028001c0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4b02800000 of size 512 bytes
=========     Uninitialized access at 0x7f4b028001e0 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoH_v2 + 0x1ec) [0x2d14ac]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZNK6OpenMM9CudaArray8downloadEPvb + 0x9b) [0x38b4b]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM25CudaUpdateStateDataKernel12setPositionsERNS_11ContextImplERKSt6vectorINS_4Vec3ESaIS4_EE + 0x276) [0x87806]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x41) [0xcc231]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM7Context12setPositionsERKSt6vectorINS_4Vec3ESaIS2_EE + 0x46) [0xc93e6]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0x13d488]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4ae9a00000 of size 12 bytes
=========     Uninitialized access at 0x7f4ae9a00000 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoDAsync_v2 + 0x219) [0x2d3229]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x3942e]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x15183]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 (cudaMemcpyAsync + 0x176) [0x56416]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native21copy_device_to_deviceERNS_14TensorIteratorEb + 0x428) [0x2088d48]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so [0x208a7dc]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xc70f7b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xc6e30f]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native5copy_ERNS_6TensorERKS1_b + 0x53) [0xc6fb43]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor5copy_ERKS0_b + 0x113) [0x13f4713]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native7pow_outERNS_6TensorERKS1_N3c106ScalarE + 0x7e9) [0xde2ac9]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3powERKNS_6TensorEN3c106ScalarE + 0x9d) [0xde43dd]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so [0x31d2434]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at3powERKNS_6TensorEN3c106ScalarE + 0x88) [0x1293c08]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2f1056b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor3powEN3c106ScalarE + 0x88) [0x141d558]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd9generated12PowBackward05applyEOSt6vectorIN2at6TensorESaIS5_EE + 0x195) [0x2d85915]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x33ab382]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine17evaluate_functionERSt10shared_ptrINS0_9GraphTaskEEPNS0_4NodeERNS0_11InputBufferERKS2_INS0_10ReadyQueueEE + 0x1655) [0x33a6405]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_mainERKSt10shared_ptrINS0_9GraphTaskEE + 0x239) [0x33a6c39]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb + 0x9a) [0x33a2d4a]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libstdc++.so.6 [0xc8421]
=========     Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x76db]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x3f) [0x121a3f]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000075d0 in _ZN2at6native27unrolled_elementwise_kernelIZZZZNS0_81_GLOBAL__N__57_tmpxft_0000082b_00000000_10_Activation_compute_75_cpp1_ii_150aa41218GeluCUDAKernelImplERNS_14TensorIteratorEENKUlvE_clEvENKUlvE2_clEvENKUlvE_clEvEUlN3c108BFloat16EE_NS_6detail5ArrayIPcLi2EEE16OffsetCalculatorILi1EjESG_NS0_6memory15LoadWithoutCastENSH_16StoreWithoutCastEEEviT_T0_T1_T2_T3_T4_
=========     by thread (2,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00400
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a547]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a5d7]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 (cudaLaunchKernel + 0x225) [0x51885]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15gpu_kernel_implI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x16f5) [0x1d3f475]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native10gpu_kernelI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x2f4) [0x1d40274]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native23gpu_kernel_with_scalarsI18__nv_hdl_wrapper_tILb0ELb1E11__nv_dl_tagIPFvRNS_14TensorIteratorEEXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEIEEEEvS5_RKT_ + 0x465) [0x1d42cc5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15mul_kernel_cudaERNS_14TensorIteratorE + 0x17a) [0x1c9c56a]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xc13807]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorES3_ + 0x65) [0xc0a4f5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorEN3c106ScalarE + 0x4b) [0xc0c6bb]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x136b0e4]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at3mulERKNS_6TensorEN3c106ScalarE + 0x88) [0x12935a8]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2f85788]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor3mulEN3c106ScalarE + 0x88) [0x141d118]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd9generated12PowBackward05applyEOSt6vectorIN2at6TensorESaIS5_EE + 0x1db) [0x2d8595b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x33ab382]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine17evaluate_functionERSt10shared_ptrINS0_9GraphTaskEEPNS0_4NodeERNS0_11InputBufferERKS2_INS0_10ReadyQueueEE + 0x1655) [0x33a6405]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_mainERKSt10shared_ptrINS0_9GraphTaskEE + 0x239) [0x33a6c39]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb + 0x9a) [0x33a2d4a]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libstdc++.so.6 [0xc8421]
=========     Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x76db]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x3f) [0x121a3f]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000075d0 in _ZN2at6native27unrolled_elementwise_kernelIZZZZNS0_81_GLOBAL__N__57_tmpxft_0000082b_00000000_10_Activation_compute_75_cpp1_ii_150aa41218GeluCUDAKernelImplERNS_14TensorIteratorEENKUlvE_clEvENKUlvE2_clEvENKUlvE_clEvEUlN3c108BFloat16EE_NS_6detail5ArrayIPcLi2EEE16OffsetCalculatorILi1EjESG_NS0_6memory15LoadWithoutCastENSH_16StoreWithoutCastEEEviT_T0_T1_T2_T3_T4_
=========     by thread (1,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00400
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a547]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a5d7]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 (cudaLaunchKernel + 0x225) [0x51885]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15gpu_kernel_implI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x16f5) [0x1d3f475]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native10gpu_kernelI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x2f4) [0x1d40274]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native23gpu_kernel_with_scalarsI18__nv_hdl_wrapper_tILb0ELb1E11__nv_dl_tagIPFvRNS_14TensorIteratorEEXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEIEEEEvS5_RKT_ + 0x465) [0x1d42cc5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15mul_kernel_cudaERNS_14TensorIteratorE + 0x17a) [0x1c9c56a]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xc13807]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorES3_ + 0x65) [0xc0a4f5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorEN3c106ScalarE + 0x4b) [0xc0c6bb]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x136b0e4]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at3mulERKNS_6TensorEN3c106ScalarE + 0x88) [0x12935a8]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2f85788]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor3mulEN3c106ScalarE + 0x88) [0x141d118]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd9generated12PowBackward05applyEOSt6vectorIN2at6TensorESaIS5_EE + 0x1db) [0x2d8595b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x33ab382]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine17evaluate_functionERSt10shared_ptrINS0_9GraphTaskEEPNS0_4NodeERNS0_11InputBufferERKS2_INS0_10ReadyQueueEE + 0x1655) [0x33a6405]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_mainERKSt10shared_ptrINS0_9GraphTaskEE + 0x239) [0x33a6c39]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb + 0x9a) [0x33a2d4a]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libstdc++.so.6 [0xc8421]
=========     Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x76db]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x3f) [0x121a3f]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000075d0 in _ZN2at6native27unrolled_elementwise_kernelIZZZZNS0_81_GLOBAL__N__57_tmpxft_0000082b_00000000_10_Activation_compute_75_cpp1_ii_150aa41218GeluCUDAKernelImplERNS_14TensorIteratorEENKUlvE_clEvENKUlvE2_clEvENKUlvE_clEvEUlN3c108BFloat16EE_NS_6detail5ArrayIPcLi2EEE16OffsetCalculatorILi1EjESG_NS0_6memory15LoadWithoutCastENSH_16StoreWithoutCastEEEviT_T0_T1_T2_T3_T4_
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00400
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a547]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x1a5d7]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 (cudaLaunchKernel + 0x225) [0x51885]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15gpu_kernel_implI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x16f5) [0x1d3f475]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native10gpu_kernelI18__nv_hdl_wrapper_tILb0ELb0E11__nv_dl_tagIPFvRNS_14TensorIteratorERKS2_ILb0ELb1ES3_IPFvS5_EXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEJEEEXadL_ZNS0_23gpu_kernel_with_scalarsISA_EEvS5_RKT_EELj2EEFffEJSB_fEEEEvS5_SI_ + 0x2f4) [0x1d40274]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native23gpu_kernel_with_scalarsI18__nv_hdl_wrapper_tILb0ELb1E11__nv_dl_tagIPFvRNS_14TensorIteratorEEXadL_ZNS0_15mul_kernel_cudaES5_EELj5EEFfffEIEEEEvS5_RKT_ + 0x465) [0x1d42cc5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native15mul_kernel_cudaERNS_14TensorIteratorE + 0x17a) [0x1c9c56a]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xc13807]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorES3_ + 0x65) [0xc0a4f5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native3mulERKNS_6TensorEN3c106ScalarE + 0x4b) [0xc0c6bb]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x136b0e4]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at3mulERKNS_6TensorEN3c106ScalarE + 0x88) [0x12935a8]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2f85788]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0xac4cc1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK3c1010Dispatcher4callIN2at6TensorEJRKS3_NS_6ScalarEEEET_RKNS_19TypedOperatorHandleIFS7_DpT0_EEESA_ + 0x12f) [0x131dccf]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor3mulEN3c106ScalarE + 0x88) [0x141d118]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd9generated12PowBackward05applyEOSt6vectorIN2at6TensorESaIS5_EE + 0x1db) [0x2d8595b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x33ab382]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine17evaluate_functionERSt10shared_ptrINS0_9GraphTaskEEPNS0_4NodeERNS0_11InputBufferERKS2_INS0_10ReadyQueueEE + 0x1655) [0x33a6405]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_mainERKSt10shared_ptrINS0_9GraphTaskEE + 0x239) [0x33a6c39]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN5torch8autograd6Engine11thread_initEiRKSt10shared_ptrINS0_10ReadyQueueEEb + 0x9a) [0x33a2d4a]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libstdc++.so.6 [0xc8421]
=========     Host Frame:/lib/x86_64-linux-gnu/libpthread.so.0 [0x76db]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (clone + 0x3f) [0x121a3f]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000000f8 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c00
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x00000148 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c04
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000001b0 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c08
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Host API memory access error at host access to 0x7f4ae9a00600 of size 4 bytes
=========     Uninitialized access at 0x7f4ae9a00600 on access by cudaMemcopy source.
=========     Saved host backtrace up to driver entry point at error
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuMemcpyDtoHAsync_v2 + 0x219) [0x2d2ff9]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x3940f]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 [0x15183]
=========     Host Frame:/home/user/conda/lib/plugins/../libcudart.so.10.2 (cudaMemcpyAsync + 0x176) [0x56416]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so [0x1e5fd61]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so (_ZN2at6native24_local_scalar_dense_cudaERKNS_6TensorE + 0x37) [0x1e629b7]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so [0x31d2de5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cuda.so [0x32050c1]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at19_local_scalar_denseERKNS_6TensorE + 0xee) [0x125f75e]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2e46adc]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x11da581]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at19_local_scalar_denseERKNS_6TensorE + 0xee) [0x125f75e]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZN2at6native4itemERKNS_6TensorE + 0x13b) [0xe8bf0b]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x13746b5]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x2e93b4d]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so [0x11da581]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor4itemEv + 0xee) [0x140a9ae]
=========     Host Frame:/home/user/libtorch/lib/libtorch_cpu.so (_ZNK2at6Tensor4itemIdEET_v + 0x21) [0x140b401]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x828) [0x82a8]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= ERROR SUMMARY: 24 errors

I guess most of them are false positives, but these look a bit suspicious:

========= Uninitialized __global__ memory read of size 4
=========     at 0x000000f8 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c00
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x00000148 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c04
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]
=========
========= Uninitialized __global__ memory read of size 4
=========     at 0x000001b0 in addForces
=========     by thread (0,0,0) in block (0,0,0)
=========     Address 0x7f4ae9a00c08
=========     Saved host backtrace up to driver entry point 
=========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x34e) [0x2d725e]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMCUDA.so (_ZN6OpenMM11CudaContext13executeKernelEP9CUfunc_stPPviij + 0x79) [0x3f929]
=========     Host Frame:/home/user/conda/lib/plugins/libOpenMMTorchCUDA.so (_ZN11TorchPlugin24CudaCalcTorchForceKernel7executeERN6OpenMM11ContextImplEbb + 0x807) [0x8287]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM11ContextImpl19calcForcesAndEnergyEbbi + 0xca) [0xccaba]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/../../../../libOpenMM.so (_ZN6OpenMM16VerletIntegrator4stepEi + 0x6e) [0x1747ce]
=========     Host Frame:/home/user/conda/lib/python3.7/site-packages/simtk/openmm/_openmm.cpython-37m-x86_64-linux-gnu.so [0xa874c]
=========     Host Frame:python (_PyMethodDef_RawFastCallKeywords + 0x1e0) [0x165c10]
=========     Host Frame:python (_PyCFunction_FastCallKeywords + 0x21) [0x165db1]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4dee) [0x1d15be]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x4a59) [0x1d1229]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0x387) [0x165497]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x14ea) [0x1cdcba]
=========     Host Frame:python (_PyFunction_FastCallKeywords + 0xfb) [0x16520b]
=========     Host Frame:python (_PyEval_EvalFrameDefault + 0x6a0) [0x1cce70]
=========     Host Frame:python (_PyEval_EvalCodeWithName + 0x2f9) [0x1152b9]
=========     Host Frame:python (PyEval_EvalCodeEx + 0x44) [0x1161d4]
=========     Host Frame:python (PyEval_EvalCode + 0x1c) [0x1161fc]
=========     Host Frame:python [0x22bf44]
=========     Host Frame:python (PyRun_FileExFlags + 0xa1) [0x2362b1]
=========     Host Frame:python (PyRun_SimpleFileExFlags + 0x1c3) [0x2364a3]
=========     Host Frame:python [0x2375d5]
=========     Host Frame:python (_Py_UnixMain + 0x3c) [0x2376fc]
=========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xe7) [0x21b97]
=========     Host Frame:python [0x1dc3c0]

@peastman
Member

$ conda list | grep mkl
blas                      1.0                         mkl  
mkl                       2019.4                      243  
mkl-service               2.3.0            py37he904b0f_0  
mkl_fft                   1.1.0            py37h23d657b_0  
mkl_random                1.0.4            py37hd81dba3_0

It's there, but perhaps I need to do something extra to add it to the library path. This is a difference between the standard and conda builds. When I look at the version of libtorch downloaded from the PyTorch website it doesn't have that dependency:

$ ldd ~/libtorch/lib/libtorch_cpu.so 
	linux-vdso.so.1 =>  (0x00007fffd63de000)
	libgomp-753e6e92.so.1 => /home/peastman/libtorch/lib/libgomp-753e6e92.so.1 (0x00007fc60f3a6000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc60f189000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007fc60ef81000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fc60ed6b000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc60eb67000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc60e85e000)
	libcudart-1b201d85.so.10.1 => /home/peastman/libtorch/lib/libcudart-1b201d85.so.10.1 (0x00007fc60e5df000)
	libc10.so => /home/peastman/libtorch/lib/libc10.so (0x00007fc60e38a000)
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc60e008000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc60dc3e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc61e8f9000)

But the conda build does:

$ ldd ~/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so 
	linux-vdso.so.1 =>  (0x00007ffca1ffd000)
	libgomp.so.1 => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libgomp.so.1 (0x00007f7d4e2df000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f7d498b9000)
	librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f7d496b1000)
	libgcc_s.so.1 => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libgcc_s.so.1 (0x00007f7d4e2a8000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f7d494ad000)
	libmkl_intel_lp64.so => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libmkl_intel_lp64.so (0x00007f7d48935000)
	libmkl_gnu_thread.so => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libmkl_gnu_thread.so (0x00007f7d470e2000)
	libmkl_core.so => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libmkl_core.so (0x00007f7d42e0d000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f7d42b04000)
	libc10.so => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/libc10.so (0x00007f7d4e25b000)
	libstdc++.so.6 => /home/peastman/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch//lib/../../../../libstdc++.so.6 (0x00007f7d42990000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7d425c6000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f7d4e0e8000)

@peastman
Member

I finally managed to get it to compile. Now I get this error when I try to run the script:

simtk.openmm.OpenMMException: Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library. (initCUDA at /tmp/pip-req-build-w9kte7xz/aten/src/ATen/detail/CUDAHooksInterface.h:63)

None of these problems happen when using the official PyTorch builds. I've already spent hours on this with no clear end in sight. I'm going to switch back to the official build, then rework your script to split out creating the .pt file from running the simulation.
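A minimal sketch of that split, reusing the calls from the script at the top of this issue. The function names `save_model` and `run_simulation` and the `device_index` parameter are hypothetical. The imports are kept inside each function on the assumption that the two stages run as separate processes, so the simulation stage does not need to import the Python `torch` package.

```python
def save_model(path='model.pt'):
    """Stage 1: script the model and write it to disk (needs Python torch only)."""
    import torch

    class Force(torch.nn.Module):
        def forward(self, positions):
            return torch.sum(positions**2)

    torch.jit.script(Force()).save(path)


def run_simulation(path='model.pt', device_index='0', steps=100000):
    """Stage 2: load the saved model through TorchForce and run it on CUDA."""
    from sys import stdout
    from simtk import unit as u
    import simtk.openmm as mm
    from simtk.openmm import app
    from openmmtorch import TorchForce

    system = mm.System()
    system.addParticle(1)
    system.addForce(TorchForce(path))

    integrator = mm.VerletIntegrator(1*u.femtosecond)
    platform = mm.Platform.getPlatformByName('CUDA')
    # device_index is an illustrative default; pick the GPU you actually use.
    properties = {'CudaPrecision': 'mixed', 'DeviceIndex': device_index}

    simulation = app.Simulation(app.Topology(), system, integrator, platform, properties)
    simulation.context.setPositions([[1, 2, 3]])
    simulation.reporters.append(app.StateDataReporter(
        stdout, 1000, step=True, totalEnergy=True, speed=True, separator='\t'))
    simulation.step(steps)
```

Call `save_model()` once in an environment that has PyTorch, then call `run_simulation()` from a separate Python process.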

@peastman
Member

I can reproduce the results you're seeing, but it doesn't seem to have anything to do with multiple simulations. It just depends on the platform. With OpenCL or CPU I get excellent energy conservation:

#"Step"	"Total Energy (kJ/mole)"	"Speed (ns/day)"
1000	14.000000179612432	0
2000	14.000006315452213	504
...
99000	14.00000036979219	521
100000	14.000007236732628	521

With CUDA the energy conservation is worse:

#"Step"	"Total Energy (kJ/mole)"	"Speed (ns/day)"
1000	14.000029598017427	0
2000	14.000038045238302	274
...
99000	13.991048758717174	273
100000	13.991120108011508	273

Furthermore, with CUDA the numbers vary from one run to the next, which shouldn't be the case. I'll keep investigating.
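For reference, the potential `sum(positions**2)` at (1, 2, 3) is 14 and the script sets no velocities, so the total energy should stay close to 14 kJ/mol. A rough way to put a number on the difference is the relative change between the first and last reported energies. This is only a crude two-point estimate from the values quoted above, and the helper name `relative_drift` is made up for illustration.

```python
# First and last total energies (kJ/mol) copied from the two tables above.
runs = {
    'OpenCL/CPU': (14.000000179612432, 14.000007236732628),
    'CUDA': (14.000029598017427, 13.991120108011508),
}

def relative_drift(first, last):
    """Signed change in total energy relative to the first reported value."""
    return (last - first) / first

drift = {name: relative_drift(*energies) for name, energies in runs.items()}
for name, value in drift.items():
    print(f'{name}: {value:+.2e}')
```

By this measure the CUDA run drifts roughly three orders of magnitude more than the OpenCL/CPU run.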

@raimis
Contributor Author

raimis commented Jul 31, 2020

Thanks for your effort, @peastman! Just keep in mind that the problem is caused not only by a second simulation, but also by other processes on the GPU (e.g. VMD, Chrome, etc.).

@peastman
Member

The error seems to be happening when we copy the forces over from PyTorch's buffers to OpenMM's buffers. Usually it works correctly, but once in a while the forces come out as zero. This seems like a race condition, but scattering calls to cuCtxSynchronize() through the code doesn't fix it. I might just switch to downloading the forces to host and sending them back again, the same way the other platforms do. It's slower than copying them on the GPU, but at least it should work.
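To see why an occasional all-zero force shows up as poor energy conservation, here is a toy pure-Python model of the same one-particle system. It is only a sketch. It uses velocity Verlet rather than OpenMM's integrator, and it zeroes the force on a fixed schedule (the hypothetical `drop_every` parameter) instead of reproducing whatever actually goes wrong on the GPU.

```python
def force(pos):
    # U = x^2 + y^2 + z^2, so F = -dU/dr = -2 r
    return [-2.0 * c for c in pos]

def energy(pos, vel, mass=1.0):
    kinetic = 0.5 * mass * sum(v * v for v in vel)
    potential = sum(c * c for c in pos)
    return kinetic + potential

def max_energy_deviation(n_steps, dt=0.001, drop_every=None, mass=1.0):
    """Largest |E(t) - E(0)| seen over the run (dt in ps, i.e. 1 fs)."""
    pos = [1.0, 2.0, 3.0]
    vel = [0.0, 0.0, 0.0]
    f = force(pos)
    e0 = energy(pos, vel, mass)
    worst = 0.0
    for step in range(1, n_steps + 1):
        vel = [v + 0.5 * dt * fi / mass for v, fi in zip(vel, f)]
        pos = [p + dt * v for p, v in zip(pos, vel)]
        f = force(pos)
        if drop_every and step % drop_every == 0:
            f = [0.0, 0.0, 0.0]  # emulate the forces arriving as zero
        vel = [v + 0.5 * dt * fi / mass for v, fi in zip(vel, f)]
        worst = max(worst, abs(energy(pos, vel, mass) - e0))
    return worst

clean = max_energy_deviation(20000)
dropped = max_energy_deviation(20000, drop_every=997)
print(f'clean: {clean:.2e}  with dropped forces: {dropped:.2e}')
```

Each dropped force removes one velocity kick, which changes the total energy by an amount that depends on where in the oscillation it happens, so the energy wanders instead of staying near 14.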

@peastman
Member

After changing that, the trajectory is identical to the one from OpenCL:

#"Step"	"Total Energy (kJ/mole)"	"Speed (ns/day)"
1000	14.000000179612432	0
2000	14.000005361777896	199
...
99000	14.00000036979219	199
100000	14.000007236732628	199

It's slower than before, but that's probably mostly because it's a trivial model. For a real system the overhead would be much less significant.
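As a rough worked example, the reported speeds can be converted into wall-clock time per step, using the 1 fs time step from the script. The numbers 273 and 199 ns/day are the CUDA speeds quoted in this thread. Treating the difference as a fixed per-step cost of the host round trip is an assumption, not something measured here.

```python
SECONDS_PER_DAY = 86400.0
STEPS_PER_NS = 1_000_000  # 1 fs time step, as in the script

def microseconds_per_step(ns_per_day):
    """Convert a reported speed in ns/day into wall-clock microseconds per step."""
    steps_per_day = ns_per_day * STEPS_PER_NS
    return SECONDS_PER_DAY / steps_per_day * 1e6

before = microseconds_per_step(273)  # CUDA, forces copied on the GPU
after = microseconds_per_step(199)   # CUDA, forces routed through the host
overhead = after - before
print(f'{before:.0f} us -> {after:.0f} us per step (+{overhead:.0f} us)')
```

That works out to roughly 118 microseconds of extra time per step. If that cost stays about the same for a larger model, it becomes a small fraction of a step that takes several milliseconds.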

@peastman
Member

The fix is in #15. Could you try it out and see if it now works for you?

@raimis
Contributor Author

raimis commented Aug 3, 2020

Regarding #15, I think the problem is not a race condition, but the multiple-context problem:

I guess a more correct fix would be:

@raimis
Contributor Author

raimis commented Aug 5, 2020

In addition to the previous comments:
