
Fill() regression in 0.3.9 #19

Closed
slayoo opened this issue Sep 13, 2021 · 18 comments

slayoo (Contributor) commented Sep 13, 2021

The following used to work up until 0.3.8 but fails with newer versions without any clear error message:

import ThrustRTC as T
v = T.device_vector("float", 10)
T.Fill(v, T.DVFloat(0.))

Here's an example on a fresh Google Colab GPU runtime:
(screenshot)

OK on 0.3.8:
(screenshot)

fynv (Owner) commented Sep 24, 2021

My impression is that 0.3.9 was a broken build that I uploaded to PyPI by accident, and PyPI doesn't allow overwriting published packages.

slayoo (Contributor) commented Sep 24, 2021

This problem occurs in all versions starting with 0.3.9, up to and including 0.3.13:
(screenshot)

fynv (Owner) commented Sep 25, 2021

Can't reproduce it so far...

fynv (Owner) commented Sep 25, 2021

Try deleting the file "__ptx_cache__.db". Chances are that there's some broken PTX code cached.
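The deletion step above can be scripted. This is a minimal sketch; the default path is the Colab location reported later in this thread, and the helper name is hypothetical, not part of ThrustRTC's API:

```python
import os

def remove_ptx_cache(path="/content/__ptx_cache__.db"):
    """Delete ThrustRTC's PTX cache file if present.

    Returns True if a file was removed, False if none existed,
    forcing ThrustRTC to recompile instead of reusing cached PTX.
    """
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```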

slayoo (Contributor) commented Sep 26, 2021

Thanks. I found the file on Colab at /content/__ptx_cache__.db, but removing it does not change the behavior.

fynv (Owner) commented Sep 26, 2021

There should be a detailed failure message printed to stdout from the C code, but in a Colab notebook you can't see it. We should check whether this also reproduces in an environment where stdout is visible.

slayoo (Contributor) commented Sep 26, 2021

Got it by executing Python through the shell:

!python -c "import ThrustRTC as T; v = T.device_vector('float', 10); T.Fill(v, T.DVFloat(0.))"


full output:

header_of_structs.h:
1	

saxpy.cu:
1	#define DEVICE_ONLY
2	#include "cstdint"
3	#include "cfloat"
4	#include "cuComplex.h"
5	#include "built_in.h"
6	#include "header_of_structs.h"
7	__device__ float _test;
8	

Errors:
nvrtc: error: invalid value for --gpu-architecture (-arch)

cuMemAlloc() failed with Error code: 2
Error Name: CUDA_ERROR_OUT_OF_MEMORY
Error Description: out of memory
header_of_structs.h:
1	struct _S_1bdca0bfccd39798
2	{
3	    typedef _S_1bdca0bfccd39798 CurType;
4	     view_vec;
5	    float value;
6	    template<class _T0>
7	    __device__ inline auto operator()(const _T0& idx)
8	    {
9	    view_vec[idx]=(decltype(view_vec)::value_t)value;    }
10	};
11	

saxpy.cu:
1	#define DEVICE_ONLY
2	#include "cstdint"
3	#include "cfloat"
4	#include "cuComplex.h"
5	#include "built_in.h"
6	#include "header_of_structs.h"
7	__device__ _S_1bdca0bfccd39798 _test;
8	__device__ size_t _res[3] = {(char*)&_test.view_vec - (char*)&_test, (char*)&_test.value - (char*)&_test, sizeof(_test)};
9	

Errors:
nvrtc: error: invalid value for --gpu-architecture (-arch)

header_of_structs.h:
1	struct _S_1bdca0bfccd39798
2	{
3	    typedef _S_1bdca0bfccd39798 CurType;
4	     view_vec;
5	    float value;
6	    template<class _T0>
7	    __device__ inline auto operator()(const _T0& idx)
8	    {
9	    view_vec[idx]=(decltype(view_vec)::value_t)value;    }
10	};
11	

saxpy.cu:
1	#define DEVICE_ONLY
2	#include "cstdint"
3	#include "cfloat"
4	#include "cuComplex.h"
5	#include "built_in.h"
6	#include "header_of_structs.h"
7	
8	extern "C" __global__
9	void saxpy(size_t n, _S_1bdca0bfccd39798 func)
10	{
11	    size_t tid =  threadIdx.x + blockIdx.x*blockDim.x;
12	    if(tid>=n) return;
13	    func(tid);
14	
15	}
16	

Errors:
nvrtc: error: invalid value for --gpu-architecture (-arch)

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.7/dist-packages/ThrustRTC/Transformations.py", line 4, in Fill
    check_i(native.n_fill(vec.m_cptr, value.m_cptr))
  File "/usr/local/lib/python3.7/dist-packages/ThrustRTC/Native.py", line 16, in check_i
    raise SystemError("An internal error happend")
SystemError: An internal error happend

fynv (Owner) commented Sep 27, 2021

What is the CUDA version and GPU type being used here?

In ThrustRTC, I use:
cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, cuDevice);
to get the major compute capability (major is clamped to [2, 7]). Then
sprintf(opt, "--gpu-architecture=compute_%d0", major);
is used to set the gpu-architecture.
The above process seems to have caused the issue.

Not very sure yet how to fix this. Most likely, the arch compute_70 is not supported by the CUDA (nvrtc) being used here. A CUDA version above 10 should not have such an issue.
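The arch-selection logic described above can be sketched in Python. This is an illustrative translation of the C snippet quoted in this comment, not the actual ThrustRTC source; the function name is hypothetical:

```python
def gpu_arch_option(major):
    """Mirror ThrustRTC's arch selection: clamp the major compute
    capability to [2, 7] and format it as an nvrtc --gpu-architecture flag."""
    major = max(2, min(7, major))
    return "--gpu-architecture=compute_%d0" % major

# A Tesla K80 reports compute capability 3.7, so ThrustRTC requests
# compute_30 -- an arch that nvrtc from CUDA 11.x no longer accepts.
print(gpu_arch_option(3))  # --gpu-architecture=compute_30
print(gpu_arch_option(9))  # clamped down: --gpu-architecture=compute_70
```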

slayoo (Contributor) commented Sep 27, 2021

Thanks

 from numba import cuda
 cuda.detect()

gives

Found 1 CUDA devices
id 0            b'Tesla K80'                              [SUPPORTED]
                      compute capability: 3.7
                           pci device id: 4
                              pci bus id: 0
Summary:
	1/1 devices are supported

True

slayoo (Contributor) commented Sep 27, 2021

and !nvcc --version gives:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

fynv (Owner) commented Sep 28, 2021

OK. That is compute_30 hitting the lower bound of CUDA 11.1.
You need to downgrade your CUDA to 10.0 or lower, or upgrade your GPU to Maxwell or higher.

In a Kepler + CUDA 11.1 setup, basically any CUDA code that requires compilation won't work.

For ThrustRTC, there are two different cases where you'll hit a compilation problem:
A. Volta GPU (or higher) + CUDA 8.0 or lower, where compute_70 exceeds the upper bound of the archs that CUDA supports
B. Kepler GPU (or lower) + CUDA 10.2 or higher, where compute_30 falls below the lower bound
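The two cases above can be captured in a small compatibility check. This is an illustrative sketch with approximate version bounds taken from this comment, not logic from the ThrustRTC source, and the function name is hypothetical:

```python
def rtc_compilation_ok(cc_major, cuda_version):
    """Rough check of whether runtime compilation can work, per the two
    failure cases described in this thread (approximate bounds)."""
    if cc_major >= 7 and cuda_version <= 8.0:
        return False  # case A: Volta+ GPU, compute_70 above nvrtc's upper bound
    if cc_major <= 3 and cuda_version >= 10.2:
        return False  # case B: Kepler- GPU, compute_30 below nvrtc's lower bound
    return True

print(rtc_compilation_ok(3, 11.1))  # False -- the Colab K80 setup in this issue
print(rtc_compilation_ok(3, 10.0))  # True  -- after downgrading CUDA
```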

slayoo (Contributor) commented Sep 28, 2021

Thanks. There is no control over hardware in Colab. I'll try getting a non-default CUDA version... but it's tricky to ask users to do that every time...

Why is it working then with ThrustRTC 0.3.8?

Could ThrustRTC do a better job of reporting this to users?

fynv (Owner) commented Sep 28, 2021

I think it was "working" in 0.3.8 only because some checking was missing. It might have silently given you incorrect results back then.

slayoo (Contributor) commented Sep 29, 2021

Confirming it works on Colab after downgrading CUDA.
Let me close this issue, then, and open a new one about reporting the need to change the CUDA version.

slayoo (Contributor) commented Sep 30, 2021

Downgrading CUDA does make the short example above work, but with the actual code from which I earlier extracted the minimal reproducer, I now get (with CUDA 10.0):

nvrtc: error: failed to load builtins for compute_30.

Any hints?
Thanks

fynv (Owner) commented Oct 1, 2021

Try export LD_LIBRARY_PATH="/usr/local/cuda/lib64".
This might solve it, or it could be incomplete support for an arch, which would be trickier...

fynv (Owner) commented Oct 1, 2021

In particular, if you downgraded CUDA by hand, you need to make sure that the libnvrtc-builtins.so from CUDA 10 is within LD_LIBRARY_PATH, not the one from CUDA 11.
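One way to sanity-check which builtins library the loader would find is to scan LD_LIBRARY_PATH for it. This is a quick diagnostic sketch (the function name is made up for illustration, and the expected CUDA 10 path is an assumption based on common install layouts):

```python
import glob
import os

def find_nvrtc_builtins(search_path=None):
    """List libnvrtc-builtins* files found on the given colon-separated
    search path (defaults to the LD_LIBRARY_PATH environment variable)."""
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = []
    for d in filter(None, search_path.split(":")):
        found.extend(glob.glob(os.path.join(d, "libnvrtc-builtins*")))
    return found

# After a manual downgrade, you'd want this to report a library under a
# CUDA 10 directory (e.g. /usr/local/cuda-10.0/lib64), not a CUDA 11 one.
print(find_nvrtc_builtins())
```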

slayoo (Contributor) commented Oct 1, 2021

It works out of the box with the default Colab CUDA and ThrustRTC 0.3.15!
Thank you!

@slayoo slayoo closed this as completed Oct 1, 2021