Compilation errors on jenkins #2648

Closed
peastman opened this issue Apr 20, 2020 · 14 comments

@peastman
Member

Jenkins recently started reporting these errors on CUDA (see for example #2615):

The following tests FAILED:
	 51 - TestCudaCMAPTorsionForceSingle (Failed)
	 52 - TestCudaCMAPTorsionForceMixed (Failed)
	 53 - TestCudaCMAPTorsionForceDouble (Failed)
Errors while running CTest
Test project /scratch/jenkins-gpu/workspace/openmm_openmm_PR-2615
    Start 51: TestCudaCMAPTorsionForceSingle
    Start 52: TestCudaCMAPTorsionForceMixed
1/3 Test #51: TestCudaCMAPTorsionForceSingle ...***Failed    1.22 sec
exception: Error launching CUDA compiler: 256
/tmp/openmmTempKernel0x55df8528abd0_5912.cu(770): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/tmp/openmmTempKernel0x55df8528abd0_5912.cu(771): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

2 errors detected in the compilation of "/tmp/tmpxft_00001763_00000000-6_openmmTempKernel0x55df8528abd0_5912.cpp1.ii".

    Start 53: TestCudaCMAPTorsionForceDouble
2/3 Test #52: TestCudaCMAPTorsionForceMixed ....***Failed    1.68 sec
exception: Error launching CUDA compiler: 256
/tmp/openmmTempKernel0x564deb468b40_5913.cu(771): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/tmp/openmmTempKernel0x564deb468b40_5913.cu(772): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

2 errors detected in the compilation of "/tmp/tmpxft_0000177f_00000000-6_openmmTempKernel0x564deb468b40_5913.cpp1.ii".

3/3 Test #53: TestCudaCMAPTorsionForceDouble ...***Failed    1.49 sec
exception: Error launching CUDA compiler: 256
/tmp/openmmTempKernel0x5574c59a3dc0_5995.cu(771): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/tmp/openmmTempKernel0x5574c59a3dc0_5995.cu(772): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

2 errors detected in the compilation of "/tmp/tmpxft_00001799_00000000-6_openmmTempKernel0x5574c59a3dc0_5995.cpp1.ii".

It thinks fmin() is a host function. But the CUDA documentation claims it's defined as a device function: https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__DOUBLE.html#group__CUDA__MATH__DOUBLE_1gc970b9542e2d3e8e5d1e3ebb6a705dde.

@peastman
Member Author

These tests pass when I run them locally. I'm using CUDA 10.2, compared to 10.1 on jenkins, but that shouldn't matter.

@YevChern
Contributor

I'm able to reproduce these errors with gcc 7.3.0, but not with 5.4.0. CUDA 10.0.130 is used in both cases.

@thtrummer
Contributor

When compiling a CUDA kernel with nvcc, the cuda_runtime.h header file is included implicitly. This in turn pulls in additional system headers.

For example on Windows:

λ nvcc -E empty.cu | grep -o ucrt.*h | sort | uniq
ucrt\\assert.h
ucrt\\corecrt.h
ucrt\\corecrt_malloc.h
ucrt\\corecrt_math.h
ucrt\\corecrt_memcpy_s.h
ucrt\\corecrt_memory.h
ucrt\\corecrt_search.h
ucrt\\corecrt_stdio_config.h
ucrt\\corecrt_terminate.h
ucrt\\corecrt_wstdio.h
ucrt\\corecrt_wstdlib.h
ucrt\\corecrt_wstring.h
ucrt\\corecrt_wtime.h
ucrt\\crtdbg.h
ucrt\\errno.h
ucrt\\malloc.h
ucrt\\math.h
ucrt\\stddef.h
ucrt\\stdio.h
ucrt\\stdlib.h
ucrt\\string.h
ucrt\\time.h

This includes the templated fmin function from C++11, which seems to be the preferred overload when one of the parameters is an integer.
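
The overload selection described above can be reproduced on the host without nvcc. What follows is only a minimal sketch in plain C++, not the OpenMM kernel source; `clampToLimit` and `checkClampToLimit` are hypothetical names introduced for illustration. It assumes a standard library that follows the C++11 rule for `<cmath>`: when the arguments to `fmin` are of mixed arithmetic types and one is an integer, a promoting overload is chosen and the result is `double`. Under nvcc, that promoting overload appears to be the host-only constexpr function named in the error message. Casting the integer explicitly is one plausible way to steer the call to a plain floating-point overload; it is not necessarily what #2694 changed.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Mixed int/float arguments: there is no exact fmin(float, float) match,
// so the C++11 promoting overload is selected and the result is double.
static_assert(std::is_same<decltype(std::fmin(1, 2.0f)), double>::value,
              "mixed int/float call promotes to double");

// Matching float arguments: the plain float overload is selected.
static_assert(std::is_same<decltype(std::fmin(1.0f, 2.0f)), float>::value,
              "float/float call stays float");

// Hypothetical helper: cast the integer explicitly so that both
// arguments have the same floating-point type.
inline float clampToLimit(float x, int limit) {
    return std::fmin(x, static_cast<float>(limit));
}

// Small self-check that can be called from any main().
inline void checkClampToLimit() {
    assert(clampToLimit(3.5f, 2) == 2.0f);  // limit is the smaller value
    assert(clampToLimit(1.5f, 2) == 1.5f);  // x is the smaller value
}
```

If the two `static_assert` lines compile, the toolchain is picking the promoting overload for the mixed call, which matches the behaviour described for the failing kernels.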

@cadalyjr

cadalyjr commented Jul 3, 2020

Hey OpenMM Devs - I'm experiencing a similar issue using CUDA 10.1. I did not compile OpenMM myself, but I can get the build info if it's helpful. The simulation runs fine with OpenCL.

Traceback (most recent call last):
  File "simulatePdb.py", line 16, in <module>
    simulation = Simulation(modeller.topology, system, integrator, platform)
  File "/homes/cdaly2/anaconda3/envs/openmm/lib/python3.7/site-packages/simtk/openmm/app/simulation.py", line 103, in __init__
    self.context = mm.Context(self.system, self.integrator, platform)
  File "/homes/cdaly2/anaconda3/envs/openmm/lib/python3.7/site-packages/simtk/openmm/openmm.py", line 18608, in __init__
    this = _openmm.new_Context(*args)
Exception: Error launching CUDA compiler: 256
/tmp/openmmTempKernel0x5564ab778e00_28902.cu(956): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/tmp/openmmTempKernel0x5564ab778e00_28902.cu(957): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

2 errors detected in the compilation of "/tmp/tmpxft_00007104_00000000-6_openmmTempKernel0x5564ab778e00_28902.cpp1.ii".

My Python script:

from simtk.openmm.app import *
from simtk.openmm import *
from simtk.unit import *
from sys import stdout

pdb = PDBFile('step1_pdbreader.pdb')
modeller = Modeller(pdb.topology, pdb.positions)
forcefield = ForceField('charmm36.xml', 'charmm36/water.xml')
#modeller.addHydrogens(forcefield)
platform = Platform.getPlatformByName('OpenCL')
modeller.addSolvent(forcefield, padding=1.4*nanometers, ionicStrength=0.01*molar)
system = forcefield.createSystem(modeller.topology, nonbondedMethod=PME,
                                    nonbondedCutoff=1.4*nanometers,
                                    constraints=HBonds)
integrator = LangevinIntegrator(300*kelvin, 1/picosecond, 0.002*picoseconds)
simulation = Simulation(modeller.topology, system, integrator, platform)
simulation.context.setPositions(modeller.positions)
simulation.minimizeEnergy()
simulation.reporters.append(PDBReporter('output.pdb', 100))
simulation.reporters.append(StateDataReporter('output.dat', 100,
    step=True, potentialEnergy=True, temperature=True, separator=' '))
simulation.reporters.append(StateDataReporter(stdout, 100,
    step=True, potentialEnergy=True, temperature=True, separator=' '))
levels = 100000
simulation.step(10000000)

@peastman
Member Author

peastman commented Jul 6, 2020

Thanks! This was actually fixed by #2694, so I'll go ahead and close it.

@peastman closed this as completed Jul 6, 2020

@cadalyjr

cadalyjr commented Jul 6, 2020

Hey Peastman - I looked at the fix in the previous GitHub post. While I'm happy the issue was solved, I'm a little unsure how to apply the fix, which seemed to be basically a permissions issue, to my case. From that description, it seemed to be a machine-specific issue, but I'm experiencing it as well.

@peastman
Member Author

peastman commented Jul 6, 2020

Just copy the changes from https://github.com/openmm/openmm/pull/2694/files. It's just minor changes to a few lines.

@cadalyjr

cadalyjr commented Jul 6, 2020

Thank you for your help, and for your patience with my ignorance. I'm having some trouble finding these files; it may be because I installed OpenMM via anaconda. As far as I can tell, via find ~anaconda3/ ".cc", there are no '.cc' files anywhere in my whole anaconda3 installation, including my OpenMM environment. Have I misunderstood something? Should I download OpenMM via git and compile it directly to get the updated files? Or should I just wait until these changes are released in the next version of OpenMM?

@peastman
Member Author

peastman commented Jul 6, 2020

If you want to make this change yourself, you need to compile from source. Otherwise, the changes will be in the next release.

@ajasja

ajasja commented Jul 29, 2020

Hi, is this in the latest release? I have openmm 7.4.2 py37_cuda101_rc_1 omnia

and still get

Traceback (most recent call last):
  File "run_openmm_simulation.py", line 57, in <module>
    simulation = Simulation(topology, system, integrator, platform)
  File "/home/ajasja/.conda/envs/pyro_scripts/lib/python3.7/site-packages/simtk/openmm/app/simulation.py", line 103, in __init__
    self.context = mm.Context(self.system, self.integrator, platform)
  File "/home/ajasja/.conda/envs/pyro_scripts/lib/python3.7/site-packages/simtk/openmm/openmm.py", line 18608, in __init__
    this = _openmm.new_Context(*args)
Exception: Error launching CUDA compiler: 256
/scratch/ajasja/5366501/openmmTempKernel0x55555c7c6a50_531860.cu(956): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

/scratch/ajasja/5366501/openmmTempKernel0x55555c7c6a50_531860.cu(957): error: calling a constexpr __host__ function("fmin") from a __global__ function("computeBondedForces") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.

@ajasja

ajasja commented Jul 29, 2020

(Also, probably not in a release yet. I was confused by the fact that the issue was opened in April, not solved in April :/)
Would a different CUDA version be a workaround?

@peastman
Member Author

Not yet. The fix was just merged on May 29. It seems to depend on what compiler you're using, but not on the CUDA version: #2648 (comment)

@ajasja

ajasja commented Jul 29, 2020

Thanks! Would switching the CUDA compiler help? I guess the only other option is compiling from source?

@ajasja

ajasja commented Jul 29, 2020

Downgrading to 7.4.0 at least lets me run in CUDA mode.
