
Enoki PTX linker error #70

Closed

PidgeyBE opened this issue Mar 29, 2020 · 12 comments

Comments

@PidgeyBE
Contributor

PidgeyBE commented Mar 29, 2020

Hey all!

First of all, thank you very much for publishing/releasing mitsuba2!
I wanted to start experimenting with inverse rendering and tried multiple platforms (Google Colab and my own hardware), but I keep running into the exact same issue everywhere:

import mitsuba                                                                              
mitsuba.set_variant('gpu_autodiff_rgb') 

# The C++ type associated with 'Float' is enoki::DiffArray<enoki::CUDAArray<float>> 
from mitsuba.core import Float 
import enoki as ek 

# Initialize a dynamic CUDA floating point array with some values 
x = Float([1, 2, 3])                                                                        
# Tell Enoki that we'll later be interested in gradients of 
# an as-of-yet unspecified objective function with respect to 'x' 
ek.set_requires_gradient(x) 

# Example objective function: sum of squares 
y = ek.hsum(x * x) 

PTX linker error:
ptxas fatal : SM version specified by .target is higher than default SM version assumed
cuda_check(): driver API error = 0400 "CUDA_ERROR_INVALID_HANDLE" in ../ext/enoki/src/cuda/jit.cu:253.

I've tried different GPUs, with the following results:

GPU            Driver version   CUDA version   Result   Compute capability
GeForce 940M   440.64           10.0.130       Fails    5.0
K80            418.67           10.0.130       Fails    3.7
Tesla P4       418.67           10.0.130       WORKS    6.1
P100           418.67           10.0.130       Fails    6.0

The odd thing is that the issue does not occur on the Tesla P4, but it does on all the others.

Does anyone have an idea what could be causing this and how I can fix it?

Thanks a lot! Pieterjan

@PidgeyBE
Contributor Author

PidgeyBE commented Mar 29, 2020

Could it be related to this?
https://github.com/mitsuba-renderer/mitsuba2/blob/dbcecba782a228fcda134558f9ae57fa91033967/resources/ptx/Makefile#L4

This seems to specify the compute capability of the GPU, and 61 does indeed match the P4: https://en.wikipedia.org/wiki/CUDA

Would it be possible to support more GPU architectures?
http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
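
(For context: nvcc can target several architectures in a single invocation by repeating the -gencode flag, with each entry embedding code for one virtual architecture into the resulting fat binary. The sketch below is a generic illustration, not taken from the Mitsuba Makefile; kernel.cu and the chosen architectures are placeholders.)

# Hypothetical multi-architecture build; embeds PTX for each listed
# virtual architecture so the driver can JIT-compile for the local GPU:
nvcc -gencode arch=compute_50,code=compute_50 \
     -gencode arch=compute_61,code=compute_61 \
     -gencode arch=compute_70,code=compute_70 \
     -c kernel.cu -o kernel.o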

@wjakob
Member

wjakob commented Mar 29, 2020

Hi,

You need a Maxwell-class GPU or newer to run Mitsuba's inverse renderer (e.g. a GeForce 1080). The bottleneck here is actually not Enoki (it could likely compile and run with a much lower compute capability) but OptiX, which requires Maxwell/Turing for the "RTGeometryTriangles" primitive that we depend on.

Best,
Wenzel

@PidgeyBE
Contributor Author

Hi Wenzel

Glad to receive a reaction from the legend himself!
I was reading this page: http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/

When you compile CUDA code, you should always compile only one '-arch' flag that matches your most used GPU cards. This will enable faster runtime, because code generation will occur during compilation.
If you only mention '-gencode', but omit the '-arch' flag, the GPU code generation will occur on the JIT compiler by the CUDA driver.

and was wondering whether this Makefile (https://github.com/mitsuba-renderer/mitsuba2/blob/dbcecba782a228fcda134558f9ae57fa91033967/resources/ptx/Makefile#L4) shouldn't contain the -arch flag to make it more performant?

@wjakob
Member

wjakob commented Mar 29, 2020

We compile to PTX (compute_..) instead of specific device code (sm_..) because this enables the resulting shared library to be moved between systems that potentially have different GPUs. The downside is minimal: a few hundred milliseconds of JIT compilation the first time Enoki is used. (The resulting native code is cached in ~/.nv in your home directory.)
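
(To illustrate the trade-off in nvcc terms; a sketch, where kernel.cu is a placeholder rather than a file from the repository:)

# Embed PTX only: portable across GPU generations; the driver JIT-compiles
# it on first use and caches the result (e.g. under ~/.nv/ComputeCache).
nvcc -arch=compute_61 -code=compute_61 -c kernel.cu -o kernel_ptx.o

# Embed native device code (SASS) for one architecture: no JIT cost at
# runtime, but the binary only runs on sm_61-class GPUs.
nvcc -arch=compute_61 -code=sm_61 -c kernel.cu -o kernel_sm.o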

@PidgeyBE
Contributor Author

PidgeyBE commented Mar 29, 2020

Aha, OK, thanks for the explanation! You say a Maxwell-class GPU (or newer) is needed for OptiX, but all the GPUs I tested (except the K80) are Maxwell or better...

I have the feeling that the code only runs on GPUs with compute capability 6.1, such as the P4 or the 1080...

@wjakob
Member

wjakob commented Mar 29, 2020

Aha, that could well be -- maybe we're setting the flag too strictly. Can you try just manually setting it to something smaller? What is the C.C. of the other Maxwell devices?

@PidgeyBE
Contributor Author

I've updated the table in my initial post with the C.C.s.
I've changed this file (https://github.com/mitsuba-renderer/mitsuba2/blob/dbcecba782a228fcda134558f9ae57fa91033967/resources/ptx/Makefile), replacing 61 with 50, and ran:

make clean
make all
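
(For reference, the edit amounts to lowering the target architecture in the nvcc flags, roughly as sketched below; the exact flag spelling in that Makefile may differ.)

# resources/ptx/Makefile, before:
#   -gencode arch=compute_61,code=compute_61
# after:
#   -gencode arch=compute_50,code=compute_50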

Now I'm rebuilding mitsuba2 on my laptop (GeForce 940M, Maxwell, with CC=5.0) 🤞
I know this hardware is not ideal, but I just want to do some basic tests at this point...

@PidgeyBE
Contributor Author

Still exactly the same error...

PTX linker error:
ptxas fatal : SM version specified by .target is higher than default SM version assumed
cuda_check(): driver API error = 0400 "CUDA_ERROR_INVALID_HANDLE" in ../ext/enoki/src/cuda/jit.cu:253.

I'm wondering where the "SM version" is still being defined...

@merlinND
Member

In CMake, there's a variable called ENOKI_CUDA_COMPUTE_CAPABILITY which you could try setting to match the change you made in the Makefile you pointed out:

set(ENOKI_CUDA_COMPUTE_CAPABILITY "61" CACHE STRING "Compute capability as specified by https://developer.nvidia.com/cuda-gpus")
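
(For example, since this is a cached variable, it can be overridden at configure time without editing the CMake files; a sketch, assuming an existing build directory:)

cd build
cmake -DENOKI_CUDA_COMPUTE_CAPABILITY=50 ..
cmake --build . --parallel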

@PidgeyBE
Contributor Author

Hey Merlin,
Thanks! That did the job (together with the other fix)!
I can now run the cbox example on my laptop's GeForce 940M! :)

Should I create a PR for this, or do you guys prefer keeping it as is?

@wjakob
Member

wjakob commented Mar 30, 2020

Yes, that would be great -- please make a PR that downgrades the compute capabilities to the minimal version known to work.

@PidgeyBE
Contributor Author

wjakob closed this as completed Mar 30, 2020