Cuda9 updates #2263

csarofeen · 2017-07-31T18:39:43Z

Regrouped the CUDA 9 fixes and cleaned the branch so it wouldn't show other commits in the PR.
Also added cuDNN 7 grouped convolution support, and an hgemm fix that CUDA 9 needs on pre-Maxwell hardware.

CPU implementation of L_p feature pooling

GPU implementation of L_p feature pooling

Work around bug in msvc compiler in win32 mode

apaszke

I reviewed the parts I understood. I'll need to skim through the cuDNN 7 and NCCL 2 docs to check the rest.

test/test_nn.py

-        output.backward(grad_output)
+        types = (torch.FloatTensor,)
+        if TEST_CUDA:
+            types += (torch.cuda.FloatTensor,)


torch/csrc/cudnn/Conv.cpp

@@ -238,11 +242,11 @@ struct algorithm_search<cudnnConvolutionBwdFilterAlgo_t> {
         CUDNN_CONVOLUTION_BWD_FILTER_ALGO_0,
         CUDNN_CONVOLUTION_BWD_FILTER_ALGO_1,
         CUDNN_CONVOLUTION_BWD_FILTER_ALGO_FFT,
+         CUDNN_CONVOLUTION_BWD_FILTER_ALGO_3,
+         CUDNN_CONVOLUTION_BWD_FILTER_ALGO_WINOGRAD_NONFUSED,


torch/csrc/cudnn/Conv.cpp

  if (groupIdx > 0) {
    long size = 1;
    for (int i = dim; i < tensor->nDimension; ++i) {
      size *= tensor->size[i];
    }
    ptr += elementSize * size * groupIdx / groups;
-  }
+    }
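
For readers following the pointer arithmetic in the snippet above, here is a minimal Python sketch of the same offset calculation. The function name `group_byte_offset` and its parameters are hypothetical, chosen only for illustration; it assumes a contiguous tensor, which is what the pointer arithmetic in the diff implies.

```python
from functools import reduce
from operator import mul


def group_byte_offset(sizes, dim, group_idx, groups, element_size):
    """Byte offset to the start of group `group_idx`.

    Mirrors the C++ snippet: multiply the sizes of dimensions dim..end,
    then scale by group_idx / groups. Assumes contiguous storage.
    """
    if group_idx == 0:
        return 0
    # Number of elements spanned by dimensions dim, dim+1, ..., last
    size = reduce(mul, sizes[dim:], 1)
    # Same left-to-right integer arithmetic as the C++ expression
    return element_size * size * group_idx // groups


# Weight of shape (8, 2, 3, 3), float32 (4 bytes), split into 4 groups on dim 0:
# each group holds 2 * 2 * 3 * 3 = 36 elements, i.e. 144 bytes.
print(group_byte_offset([8, 2, 3, 3], dim=0, group_idx=1, groups=4, element_size=4))
```

With these inputs, group 1 starts 144 bytes in and group 3 starts 432 bytes in, so consecutive groups are one group-sized slice apart.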


tpankaj · 2017-08-07T00:43:35Z

I got an error while compiling your pull request. It was at the very end, something about not finding HalfTensor. I can rerun it and get the full error message if this isn't a known issue.

csarofeen · 2017-08-07T15:49:23Z

@tpankaj Please run `python setup.py clean` and `rm -rf build`, then try again. If the error persists, please post it here. Thanks!

tpankaj · 2017-08-10T04:53:30Z

The error persists. Here it is, compiled on an Ubuntu 16.04 system with CUDA 9 and cuDNN 7.

Grabbing  src/nccl.h                          > /home/ubuntu/pytorch/torch/lib/build/nccl/include/nccl.h
Compiling src/libwrap.cu                      > /home/ubuntu/pytorch/torch/lib/build/nccl/obj/libwrap.o
Compiling src/core.cu                         > /home/ubuntu/pytorch/torch/lib/build/nccl/obj/core.o
Compiling src/all_gather.cu                   > /home/ubuntu/pytorch/torch/lib/build/nccl/obj/all_gather.o
Compiling src/all_reduce.cu                   > /home/ubuntu/pytorch/torch/lib/build/nccl/obj/all_reduce.o
src/common_kernel.h(42): error: class "__half" has no member "x"

src/common_kernel.h(42): error: class "__half" has no member "x"

src/common_kernel.h(55): error: class "__half" has no member "x"

src/common_kernel.h(55): error: class "__half" has no member "x"

src/common_kernel.h(42): error: class "__half" has no member "x"

src/common_kernel.h(42): error: class "__half" has no member "x"

src/common_kernel.h(55): error: class "__half" has no member "x"

src/common_kernel.h(55): error: class "__half" has no member "x"

src/copy_kernel.h(28): error: class "__half" has no member "x"

src/copy_kernel.h(28): error: class "__half" has no member "x"

src/copy_kernel.h(28): error: class "__half" has no member "x"

src/copy_kernel.h(28): error: class "__half" has no member "x"

6 errors detected in the compilation of "/tmp/tmpxft_00003042_00000000-11_all_gather.compute_61.cpp1.ii".

ngimel · 2017-08-10T05:07:45Z

The nccl subtree needs to be updated, cc @soumith, @apaszke. Alternatively, you can install nccl on your system (you need the dev packages, so that you have the nccl header); then the nccl subtree won't be compiled.
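
A quick way to check whether an nccl header is visible on the system is sketched below. This is only a sanity check under stated assumptions, not how the PyTorch build locates nccl: the helper `find_nccl_header` and the `CANDIDATE_DIRS` list are hypothetical, and the right directories depend on how nccl was installed.

```python
import os


def find_nccl_header(search_dirs):
    """Return the first directory in `search_dirs` that contains nccl.h, else None."""
    for directory in search_dirs:
        if os.path.isfile(os.path.join(directory, "nccl.h")):
            return directory
    return None


# Hypothetical locations; adjust for your distribution and install method.
CANDIDATE_DIRS = ["/usr/include", "/usr/local/include", "/usr/local/cuda/include"]

print(find_nccl_header(CANDIDATE_DIRS))
```

If this prints `None`, the dev package is probably missing or installed somewhere outside the listed directories.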

TeslasGhost · 2017-08-10T05:46:05Z

Hi,

Not sure if this is the same problem as the one mentioned above, but I attempted to build from source using CUDA 9 and cuDNN 7. It seems to get pretty far in the build process, and then towards the end I get...

Scanning dependencies of target nccl
[100%] Generating lib/libnccl.so
Grabbing  src/nccl.h                          > /home/ubuntu/pytorch/torch/lib/build/nccl/include/nccl.h
Compiling src/libwrap.cu                      > /home/ubuntu/pytorch/torch/lib/build/nccl/obj/libwrap.o
Compiling src/core.cu                         > /home/ubuntu/pytorch/torch/lib/build/nccl/obj/core.o
Compiling src/all_gather.cu                   > /home/ubuntu/pytorch/torch/lib/build/nccl/obj/all_gather.o
Compiling src/all_reduce.cu                   > /home/ubuntu/pytorch/torch/lib/build/nccl/obj/all_reduce.o
src/common_kernel.h(42): error: class "__half" has no member "x"

src/common_kernel.h(42): error: class "__half" has no member "x"

src/common_kernel.h(55): error: class "__half" has no member "x"

src/common_kernel.h(55): error: class "__half" has no member "x"

src/copy_kernel.h(28): error: class "__half" has no member "x"

src/copy_kernel.h(28): error: class "__half" has no member "x"

src/common_kernel.h(42): error: class "__half" has no member "x"

src/common_kernel.h(42): error: class "__half" has no member "x"

src/common_kernel.h(55): error: class "__half" has no member "x"

src/common_kernel.h(55): error: class "__half" has no member "x"

src/copy_kernel.h(28): error: class "__half" has no member "x"

src/copy_kernel.h(28): error: class "__half" has no member "x"

6 errors detected in the compilation of "/tmp/tmpxft_00004d18_00000000-11_all_gather.compute_61.cpp1.ii".
Makefile:121: recipe for target '/home/ubuntu/pytorch/torch/lib/build/nccl/obj/all_gather.o' failed
make[3]: *** [/home/ubuntu/pytorch/torch/lib/build/nccl/obj/all_gather.o] Error 1
make[3]: *** Waiting for unfinished jobs....
6 errors detected in the compilation of "/tmp/tmpxft_00004d29_00000000-11_all_reduce.compute_61.cpp1.ii".
Makefile:121: recipe for target '/home/ubuntu/pytorch/torch/lib/build/nccl/obj/all_reduce.o' failed
make[3]: *** [/home/ubuntu/pytorch/torch/lib/build/nccl/obj/all_reduce.o] Error 1
ptxas warning : Too big maxrregcount value specified 96, will be ignored
ptxas warning : Too big maxrregcount value specified 96, will be ignored
CMakeFiles/nccl.dir/build.make:60: recipe for target 'lib/libnccl.so' failed
make[2]: *** [lib/libnccl.so] Error 2
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/nccl.dir/all' failed
make[1]: *** [CMakeFiles/nccl.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

Can someone kindly confirm that this is also a nccl subtree issue?
[EDIT]: It appears that it is. I have no idea how to install NCCL Header on my 'system', but I will look into it now. Any other help would be greatly appreciated. :)

tpankaj · 2017-08-10T05:47:14Z

@TeslasGhost That looks like the exact error I got.

tpankaj · 2017-08-10T05:51:35Z

@ngimel I'm continuing to get an error now that I've installed nccl and nccl dev package for CUDA 9:

CMake Error at /home/ubuntu/anaconda3/share/cmake-3.6/Modules/ExternalProject.cmake:1924 (message):
  No download info given for 'nccl_external' and its source directory:

   /home/ubuntu/pytorch/torch/lib/gloo/third-party/nccl

  is not an existing non-empty directory.  Please specify one of:

   * SOURCE_DIR with an existing non-empty directory
   * URL
   * GIT_REPOSITORY
   * HG_REPOSITORY
   * CVS_REPOSITORY and CVS_MODULE
   * SVN_REVISION
   * DOWNLOAD_COMMAND
Call Stack (most recent call first):
  /home/ubuntu/anaconda3/share/cmake-3.6/Modules/ExternalProject.cmake:2473 (_ep_add_download_command)
  cmake/External/nccl.cmake:16 (ExternalProject_Add)
  cmake/Dependencies.cmake:53 (include)
  CMakeLists.txt:49 (include)


-- Configuring incomplete, errors occurred!
See also "/home/ubuntu/pytorch/torch/lib/build/gloo/CMakeFiles/CMakeOutput.log".

Is there a source directory for that version of nccl that I need to point it to?

ngimel · 2017-08-10T16:24:26Z

Apparently Findnccl.cmake in gloo subtree is not finding your install of nccl. https://github.com/pytorch/pytorch/blob/master/torch/lib/gloo/cmake/Modules/Findnccl.cmake

csarofeen · 2017-08-10T17:12:08Z

I opened an issue describing the steps required to use a user-installed nccl; please see #2375.

futurely · 2017-08-15T15:33:44Z

When will this be updated and merged? cuDNN 7 grouped convolution support resolves the high-priority issue #1708 and is independent of the CUDA 9 updates. If the two feature sets were separated, both would be easier to merge.

ngimel · 2017-08-15T16:01:59Z

Unfortunately it does not. For depthwise-separable convolutions, https://github.com/szagoruyko/pyinn is much better, and for other grouped convolutions the current cuDNN version provides only modest improvements.
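
For context on the distinction drawn above, a depthwise convolution is the grouped case where the number of groups equals the number of input channels. The sketch below counts weights for a 2D convolution under stated assumptions (square kernel, no bias); the function name `conv2d_weight_count` is hypothetical and is not part of any library.

```python
def conv2d_weight_count(in_channels, out_channels, kernel_size, groups=1):
    """Number of weights in a 2D convolution with a square kernel and no bias."""
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    # Each output channel only sees in_channels / groups input channels
    return out_channels * (in_channels // groups) * kernel_size * kernel_size


dense = conv2d_weight_count(64, 64, 3)                 # ordinary convolution
grouped = conv2d_weight_count(64, 64, 3, groups=4)     # grouped convolution
depthwise = conv2d_weight_count(64, 64, 3, groups=64)  # one group per input channel
print(dense, grouped, depthwise)
```

The weight count says nothing about speed on its own; it only shows how much smaller each per-group problem becomes as the group count grows.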

wickedfoo and others added 6 commits July 12, 2017 14:21

cpu lp pooling

2520459

LP pooling kernels

feddb03

Added UpSampling module and associated tests.

4d45ce7

Merge pull request #1259 from wickedfoo/feature_lp_pooling

26a0b9a

CPU implementation of L_p feature pooling

Merge pull request #477 from wickedfoo/feature_lp_pooling

f194ac1

GPU implementation of L_p feature pooling

replace long long types with size_t (#1267)

e25b3d7

Work around bug in msvc compiler in win32 mode

apaszke reviewed Aug 1, 2017


soumith added 5 commits August 2, 2017 22:44

add 2d and 3d dilated full Convolution

6e6dca0

add 2d and 3d dilated full Convolution

a565b77

remove limitations on output_padding in Conv* routines

814b65d

remove limitations on output_padding in Conv* routines

74e5328

fix Conv3d non-contiguous weight bug

70c95db

tpankaj mentioned this pull request Aug 11, 2017

Unify checking for libnccl in setup.py and build_all.sh #2375

Closed

soumith force-pushed the cuda9 branch from 807fe27 to b6c9a4a Compare August 24, 2017 21:01

soumith approved these changes Aug 25, 2017


soumith force-pushed the cuda9 branch 2 times, most recently from cbf60dd to e024f41 Compare August 25, 2017 11:25

csarofeen added 2 commits August 25, 2017 07:27

Updates for CUDA 9

bc93d79

Updates for CUDA 9

d112cbd

soumith and others added 8 commits August 25, 2017 07:31

cuda 9 hgemm fix

01adebe

Updates for CUDA 9

ec86d0b

cudnn 7 grouped convolutions

51b6035

Disable persistent BN for cudnn < 7.0.3

802ddd9

Merge commit '01adebea1c0cb9aa704e50a9d14507b0fab5939f'

b3d2a35

fix leaking symbols from THNN

e4c05c2

Merge commit 'e4c05c2b5f3dbc121c0cf4bb78d15540412dcd3c'

ecc7579

Merge commit 'd112cbd7f675a8ffde3a8995ac37c69a4c84e5df'

0d7d79a

soumith force-pushed the cuda9 branch from e024f41 to 0d7d79a Compare August 25, 2017 11:40

soumith merged commit 0d7d79a into pytorch:master Aug 25, 2017

ezyang added the open source label Jun 24, 2019

csarofeen deleted the cuda9 branch February 12, 2020 13:32

IvanYashchuk pushed a commit to IvanYashchuk/pytorch that referenced this pull request Jan 5, 2023

Move IR printing definition to the class, step 2: Exprs (pytorch#2263)

797e396
