
CUDA platform error: CUDA_ERROR_INVALID_PTX #2251

Closed · tristanic opened this issue Jan 28, 2019 · 4 comments
@tristanic (Contributor)

This seems to be a problem either with my specific environment or with how OpenMM has been ported into ChimeraX, but there's not a lot for me to go on in the traceback. This happens with both OpenMM 7.2.2 and 7.3, and isn't specific to a given simulation (I get the same error if I try to run OpenMM's benchmark.py using ChimeraX's Python). Really just looking for pointers on where to start with debugging.

Traceback (most recent call last):
  File "/home/tic20/.local/share/ChimeraX/0.9/site-packages/chimerax/isolde/isolde.py", line 2482, in _start_sim_or_toggle_pause
    self.start_sim()
  File "/home/tic20/.local/share/ChimeraX/0.9/site-packages/chimerax/isolde/isolde.py", line 2508, in start_sim
    sm.start_sim()
  File "/home/tic20/.local/share/ChimeraX/0.9/site-packages/chimerax/isolde/openmm/openmm_interface.py", line 632, in start_sim
    sh.start_sim()
  File "/home/tic20/.local/share/ChimeraX/0.9/site-packages/chimerax/isolde/openmm/openmm_interface.py", line 1393, in start_sim
    self._prepare_sim()
  File "/home/tic20/.local/share/ChimeraX/0.9/site-packages/chimerax/isolde/openmm/openmm_interface.py", line 1355, in _prepare_sim
    integrator, platform)
  File "/opt/UCSF/ChimeraX-daily/lib/python3.7/site-packages/simtk/openmm/app/simulation.py", line 103, in __init__
    self.context = mm.Context(self.system, self.integrator, platform)
  File "/opt/UCSF/ChimeraX-daily/lib/python3.7/site-packages/simtk/openmm/openmm.py", line 12231, in __init__
    this = _openmm.new_Context(*args)
Exception: Error loading CUDA module: CUDA_ERROR_INVALID_PTX (218)

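For reference, the failing step can be reproduced outside ChimeraX/ISOLDE with a few lines of Python. This is a minimal sketch, not ISOLDE's actual setup: the one-particle system and VerletIntegrator are placeholders, but Context creation is where OpenMM compiles and loads its CUDA kernels, so it exercises the same code path:

    from simtk import openmm as mm
    from simtk import unit

    # A trivial one-particle system; no input files needed.
    system = mm.System()
    system.addParticle(1.0 * unit.amu)
    integrator = mm.VerletIntegrator(1.0 * unit.femtoseconds)
    platform = mm.Platform.getPlatformByName('CUDA')

    # Context creation compiles the CUDA kernels and loads the resulting PTX
    # into the driver; this is the call that raises CUDA_ERROR_INVALID_PTX.
    context = mm.Context(system, integrator, platform)
    print('CUDA context created OK')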
@peastman (Member)

Sorry, not much idea. That's the sort of error that mostly just shouldn't happen. It could be an error in the CUDA compiler, a corrupted file on disk, an attempt to load an out-of-date file, linking against one CUDA toolkit while using the compiler from a different version, or various other things. For what it's worth, here's where that error message gets generated:

https://github.com/pandegroup/openmm/blob/master/platforms/cuda/src/CudaContext.cpp#L675

Perhaps you can figure out what's wrong with the PTX it's trying to load.
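One quick sanity check for the mismatched-toolkit theory is to compare the compiler and driver versions in the exact environment OpenMM runs in. A rough sketch, assuming nvcc and nvidia-smi are on PATH (standard CUDA tools, nothing OpenMM-specific):

    import subprocess

    # Toolkit/compiler that will be used to build the PTX.
    print(subprocess.check_output(['nvcc', '--version']).decode())

    # Installed display driver, which must be new enough to load that PTX.
    print(subprocess.check_output(
        ['nvidia-smi', '--query-gpu=driver_version', '--format=csv,noheader']
    ).decode())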

@tristanic (Contributor, Author)

Haven't dug into the code yet, but I have compiled OpenMM 7.3 against CUDA 8.0 on my Fedora 25 machine (it seems I'll have to update to Fedora 27 if I want CUDA 9.2) and installed it into the ChimeraX environment. That works fine. On the other hand, running from an installation of OpenMM 7.3 in a fresh Anaconda virtualenv on my CentOS 7 machine (with the CUDA 9.2 library and bin dirs first in LD_LIBRARY_PATH and PATH respectively) gives the same CUDA_ERROR_INVALID_PTX. So it's an environment problem, nothing to do with ChimeraX. I guess the question now is: is this a problem with CentOS 7 in general, or just with my machine? Will try some debugging and see.
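A useful first check in a fresh environment like that is to confirm which OpenMM build and which platforms are actually being picked up. A sketch using standard OpenMM API calls:

    import os
    from simtk import openmm as mm

    # Which OpenMM build and CUDA paths does this environment actually see?
    print('OpenMM version:', mm.version.full_version)
    print('LD_LIBRARY_PATH:', os.environ.get('LD_LIBRARY_PATH'))

    # List the platforms OpenMM managed to load (CUDA should be among them).
    for i in range(mm.Platform.getNumPlatforms()):
        p = mm.Platform.getPlatform(i)
        print('Platform:', p.getName(), '- speed', p.getSpeed())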

@tristanic (Contributor, Author)

Edited CudaContext.cpp to report the nvcc command and the filename it fails on to stderr, then die immediately so I could catch the temp files. It dies on the very first one (.cu, .log and .ptx attached). So it's building the kernel with the correct nvcc version and writing it safely, but failing when it reads the PTX back in.

ptx.zip
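For anyone who would rather not patch CudaContext.cpp: OpenMM 7.x exposes CUDA platform properties that let you pin the compiler and choose where the intermediate files are written. A sketch, assuming the documented CudaCompiler and CudaTempDirectory property names (the nvcc path below is hypothetical, and OpenMM may still clean the files up after compiling, so this is a convenience rather than a full substitute for the stderr patch):

    from simtk import openmm as mm

    platform = mm.Platform.getPlatformByName('CUDA')

    # Pin the compiler OpenMM invokes (path is hypothetical).
    platform.setPropertyDefaultValue('CudaCompiler',
                                     '/usr/local/cuda-9.2/bin/nvcc')

    # Write the generated .cu/.ptx files to a known, easy-to-inspect place.
    platform.setPropertyDefaultValue('CudaTempDirectory', '/tmp/openmm-ptx')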

@tristanic (Contributor, Author)

At the end of all that, it's a boring old driver incompatibility. Updated my display driver from 390 to 415, and all is well.
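That outcome fits the PTX evidence: a 390-series driver supports CUDA up to roughly 9.1 and so cannot JIT-compile the newer PTX that a CUDA 9.2 nvcc emits, while the 415 series supports CUDA 10.0 and later PTX. The ISA a kernel was built for is recorded in the .version directive near the top of the file; a quick sketch for checking a dumped PTX file (the path is hypothetical, and the version mapping is approximate):

    # CUDA 9.2's nvcc emits ".version 6.2"; a 390-series driver only
    # understands PTX ISA up to roughly 6.1, hence CUDA_ERROR_INVALID_PTX.
    with open('/tmp/openmm-ptx/kernel.ptx') as f:  # hypothetical path
        for line in f:
            if line.startswith('.version'):
                print(line.strip())  # e.g. ".version 6.2"
                break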
