cuda issue #8

Closed

brad0taylor opened this issue Jan 19, 2017 · 7 comments

Comments

@brad0taylor

When I run this code:

```python
import torch

a = torch.Tensor(5, 3)  # construct a 5x3 matrix, uninitialized
b = torch.Tensor(5, 3)  # construct a 5x3 matrix, uninitialized
if torch.cuda.is_available():
    aa = a.cuda()
    bb = b.cuda()
    aa + bb
```

I get the following error message:

```
RuntimeError: cuda runtime error (8) : invalid device function at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.6_1484802121799/work/torch/lib/THC/generated/../generic/THCTensorMathPointwise.cu:246
```

@soumith
Member

soumith commented Jan 19, 2017

hey Brad. What GPU are you using?

@brad0taylor
Author

brad0taylor commented Jan 19, 2017

Hi - I'm using a GeForce GTX 950M with CUDA 8.0. The GPU install works OK with TensorFlow and Theano, and I also have cuDNN installed. I'm fairly new to this, so if you can point me to any diagnostics, I'd be happy to run them. Also, the error message comes from the last line of code.
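
For anyone hitting the same error: "invalid device function" usually means the installed binary does not include kernels for the GPU's compute capability, so a useful first diagnostic is to print what the GPU reports. A minimal sketch, assuming a PyTorch version that exposes these helpers (the 0.1.6 binaries in question may not):

```python
import torch

# Print each visible GPU and its compute capability; "invalid device function"
# points at a mismatch between this value and the archs the binary was built for.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        major, minor = torch.cuda.get_device_capability(i)
        print(f"GPU {i}: {name}, compute capability {major}.{minor}")
else:
    print("CUDA is not available")
```

For reference, the GeForce GTX 950M is a Maxwell part with compute capability 5.0, which is exactly the architecture discussed below.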

@soumith
Member

soumith commented Jan 19, 2017

@ngimel I compile the binaries with 5.2+PTX; in that case it should work on 5.0 via JITting, right? Or is PTX only generated forward (i.e. does 5.2+PTX only cover 5.2+ archs)?

@ngimel

ngimel commented Jan 19, 2017

JITting works only forward, so you should probably add compilation for 5.0.
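
To make the forward-only rule concrete: a binary shipped with "5.2+PTX" contains sm_52 machine code plus compute_52 PTX, and the driver can JIT that PTX only for devices of capability 5.2 or higher, so a 5.0 device such as the GTX 950M is left uncovered. A rough compatibility check along these lines, assuming a recent PyTorch that exposes torch.cuda.get_arch_list() (releases of the 0.1.x era did not):

```python
import torch

major, minor = torch.cuda.get_device_capability(0)
device_cc = major * 10 + minor       # e.g. 50 for a GTX 950M
archs = torch.cuda.get_arch_list()   # e.g. ['sm_52', 'compute_52']

# sm_XY entries are precompiled kernels; an exact match is a conservative check
# (same-major, later-minor devices can generally run them too).
sass_ok = f"sm_{device_cc}" in archs

# compute_XY entries are PTX: the driver can JIT them, but only forward, i.e.
# for devices with capability >= XY.
ptx_ok = any(a.startswith("compute_") and int(a.split("_")[1]) <= device_cc
             for a in archs)

print("binary usable on this GPU:", sass_ok or ptx_ok)
```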

@soumith
Member

soumith commented Jan 19, 2017

Oh OK, good to know. I'll add 5.0 to the list as well.

@brad0taylor it's a screw-up on my end. I'll fix it by issuing new binaries by tomorrow, so you can reinstall and it'll work. If you want to try things right away, please install from source: https://github.com/pytorch/pytorch/blob/master/README.md#from-source

@brad0taylor
Author

brad0taylor commented Jan 19, 2017

@soumith thanks very much. I'll try again tomorrow.

@soumith
Member

soumith commented Jan 23, 2017

This has been fixed.

@soumith soumith closed this as completed Jan 23, 2017
malfet added a commit that referenced this issue Jun 16, 2023
Should prevent crashes during NCCL initialization.

If `data_parallel_tutorial.py` is executed without this option, it would segfault in `ncclShmOpen` while executing `nn.DataParallel(model)`.

For posterity:
```
% nvidia-smi 
Fri Jun 16 20:46:45 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla M60           Off  | 00000000:00:1B.0 Off |                    0 |
| N/A   41C    P0    37W / 150W |    752MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 00000000:00:1C.0 Off |                    0 |
| N/A   36C    P0    38W / 150W |    418MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla M60           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   41C    P0    38W / 150W |    418MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla M60           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   35C    P0    38W / 150W |    418MiB /  7680MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

% NCCL_DEBUG=INFO python data_parallel_tutorial.py 
Let's use 4 GPUs!
c825878acf65:32373:32373 [0] NCCL INFO cudaDriverVersion 12010
c825878acf65:32373:32373 [0] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
c825878acf65:32373:32373 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
NCCL version 2.14.3+cuda11.7
c825878acf65:32373:32443 [0] NCCL INFO NET/IB : No device found.
c825878acf65:32373:32443 [0] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
c825878acf65:32373:32443 [0] NCCL INFO Using network Socket
c825878acf65:32373:32445 [2] NCCL INFO Using network Socket
c825878acf65:32373:32446 [3] NCCL INFO Using network Socket
c825878acf65:32373:32444 [1] NCCL INFO Using network Socket
c825878acf65:32373:32446 [3] NCCL INFO Trees [0] -1/-1/-1->3->2 [1] -1/-1/-1->3->2
c825878acf65:32373:32445 [2] NCCL INFO Trees [0] 3/-1/-1->2->1 [1] 3/-1/-1->2->1
c825878acf65:32373:32443 [0] NCCL INFO Channel 00/02 :    0   1   2   3
c825878acf65:32373:32444 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
c825878acf65:32373:32443 [0] NCCL INFO Channel 01/02 :    0   1   2   3
c825878acf65:32373:32443 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
Bus error (core dumped)

(lldb) bt
* thread #1, name = 'python', stop reason = signal SIGBUS
  * frame #0: 0x00007effcd6b0ded libc.so.6`__memset_avx2_erms at memset-vec-unaligned-erms.S:145
    frame #1: 0x00007eff3985e425 libnccl.so.2`ncclShmOpen(char*, int, void**, void**, int) at shmutils.cc:52
    frame #2: 0x00007eff3985e377 libnccl.so.2`ncclShmOpen(shmPath="/dev/shm/nccl-7dX4mg", shmSize=9637888, shmPtr=0x00007efe4a59ac30, devShmPtr=0x00007efe4a59ac38, create=1) at shmutils.cc:61
    frame #3: 0x00007eff39863322 libnccl.so.2`::shmRecvSetup(comm=<unavailable>, graph=<unavailable>, myInfo=<unavailable>, peerInfo=<unavailable>, connectInfo=0x00007efe57fe3fe0, recv=0x00007efe4a05d2f0, channelId=0, connIndex=0) at shm.cc:110
    frame #4: 0x00007eff398446a4 libnccl.so.2`ncclTransportP2pSetup(ncclComm*, ncclTopoGraph*, int, int*) at transport.cc:33
    frame #5: 0x00007eff398445c0 libnccl.so.2`ncclTransportP2pSetup(comm=0x0000000062355ab0, graph=0x00007efe57fe6a40, connIndex=0, highestTransportType=0x0000000000000000) at transport.cc:89
    frame #6: 0x00007eff398367cd libnccl.so.2`::initTransportsRank(comm=0x0000000062355ab0, commId=<unavailable>) at init.cc:790
    frame #7: 0x00007eff398383fe libnccl.so.2`::ncclCommInitRankFunc(job_=<unavailable>) at init.cc:1089
    frame #8: 0x00007eff3984de07 libnccl.so.2`ncclAsyncJobMain(arg=0x000000006476e6d0) at group.cc:62
    frame #9: 0x00007effce0bf6db libpthread.so.0`start_thread + 219
    frame #10: 0x00007effcd64361f libc.so.6`__GI___clone at clone.S:95
```
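
For context, the call path described in the commit note above can be reproduced with a much smaller script than the tutorial; a minimal sketch (the model here is made up for illustration; only the `nn.DataParallel` wrapping and the NCCL_DEBUG setting come from the report):

```python
import os
os.environ.setdefault("NCCL_DEBUG", "INFO")  # same debug output as in the log above

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(5, 2)

if torch.cuda.device_count() > 1:
    # Wrapping the model in DataParallel is the step where the reported
    # segfault in ncclShmOpen occurred.
    model = nn.DataParallel(model)

model.to(device)
```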