Error building k2 v1.13 with pytorch:22.01 #916

GNroy · 2022-02-09T06:33:55Z

I'm trying to build k2 v1.13 from source (python3 setup.py install) inside the pytorch:22.01 container (nvcr.io/nvidia/pytorch:22.01-py3), and I get the following errors when compiling mutual_information.cu:

[100%] Building CUDA object k2/python/csrc/CMakeFiles/_k2.dir/torch/mutual_information_cuda.cu.o
/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(534): error: namespace "thrust::cub" has no member "CacheModifiedInputIterator"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(534): error: too few arguments for class template "thrust::detail::conditional"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(534): error: expected an identifier

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(536): error: expected a ";"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(576): error: namespace "thrust::cub" has no member "BlockLoad"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(576): error: expected a ";"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(593): error: namespace "thrust::cub" has no member "BlockStore"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(593): error: expected a ";"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(635): error: namespace "thrust::cub" has no member "PtxVersion"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(642): error: namespace "thrust::cub" has no member "SyncStream"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(647): error: namespace "thrust::cub" has no member "CTA_SYNC"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(663): error: namespace "thrust::cub" has no member "UnitWord"

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(663): error: expected an identifier

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(667): error: identifier "DeviceWord" is undefined

/usr/local/cuda/include/thrust/system/cuda/detail/core/util.h(670): error: "DeviceWord" is not a type name

15 errors detected in the compilation of "/tmp/pip-install-g7ig1yhx/k2_5d3497e7876e4f82bd5f05bd7bdc1677/k2/python/csrc/torch/mutual_information.cu".
make[3]: *** [k2/python/csrc/CMakeFiles/_k2.dir/build.make:216: k2/python/csrc/CMakeFiles/_k2.dir/torch/mutual_information.cu.o] Error 1

In the same setup, the k2 v1.11 build succeeded.

P.S. Sorry if the issue is not related to k2.

danpovey · 2022-02-09T06:50:54Z

I suspect it's a mismatch between the CUDA on your path, which is a system-installed CUDA, and whatever that version of PyTorch was intended to be used with. I believe when we include PyTorch we also get CUDA headers, including those of cub, and this can cause problems if we don't have the exact same version as the NVCC we are using.
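
One way to test this hypothesis is to compare the CUDA release reported by the nvcc on the PATH with the CUDA version PyTorch was built against. The following is a minimal sketch, not part of k2: the helper names are hypothetical, and it assumes that `nvcc --version` prints a line containing "release X.Y" and that `torch.version.cuda` is a string such as "11.6".

```python
import re
import shutil
import subprocess


def parse_nvcc_release(version_text):
    """Return the 'major.minor' release found in `nvcc --version` output, or None."""
    match = re.search(r"release\s+(\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return f"{match.group(1)}.{match.group(2)}"


def same_cuda_release(nvcc_release, torch_cuda):
    """Compare only the major.minor parts; a missing value counts as a mismatch."""
    if not nvcc_release or not torch_cuda:
        return False
    return nvcc_release.split(".")[:2] == torch_cuda.split(".")[:2]


def check_build_environment():
    """Return (nvcc_release, torch_cuda, versions_match) for the current environment."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        # No nvcc on PATH, so there is nothing to compare against.
        return None, None, False
    output = subprocess.run(
        [nvcc, "--version"], capture_output=True, text=True, check=False
    ).stdout
    nvcc_release = parse_nvcc_release(output)
    try:
        import torch  # imported lazily so the parsing helpers work without PyTorch
        torch_cuda = torch.version.cuda
    except ImportError:
        torch_cuda = None
    return nvcc_release, torch_cuda, same_cuda_release(nvcc_release, torch_cuda)
```

Calling `check_build_environment()` inside the container returns both versions and whether they agree. Both are reported as 11.6 later in this thread, so a matching result does not rule out a header-level incompatibility on its own.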

csukuangfj · 2022-02-09T06:56:59Z

I'm trying to build k2 v1.13 from source (python3 setup.py install) inside of pytorch:22.01 container

Could you provide some information about the container, e.g. the PyTorch version and the CUDA version?

GNroy · 2022-02-09T07:03:28Z

Could you provide some information about the container, e.g. the PyTorch version and the CUDA version?

CUDA 11.6.r11.6
PyTorch 1.11.0a0+bfe5ad2

(Edit) More info at https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_22-01.html
(Edit2) print(torch.version.cuda) gives 11.6
(Edit3) /usr/local/cuda links to /etc/alternatives/cuda -> /usr/local/cuda-11.6
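
The symlink chain in Edit3 can be listed programmatically, which helps confirm which toolkit directory /usr/local/cuda actually resolves to. This is a small sketch using only the Python standard library; `symlink_chain` is a hypothetical helper name, not something provided by k2 or PyTorch.

```python
import os


def symlink_chain(path):
    """Follow symlinks one hop at a time and return every path visited."""
    chain = [path]
    seen = {path}
    while os.path.islink(path):
        target = os.readlink(path)
        if not os.path.isabs(target):
            # Relative targets are resolved against the directory holding the link.
            target = os.path.join(os.path.dirname(path), target)
        path = os.path.normpath(target)
        if path in seen:
            # Stop on a symlink loop instead of spinning forever.
            break
        seen.add(path)
        chain.append(path)
    return chain


if __name__ == "__main__":
    print(" -> ".join(symlink_chain("/usr/local/cuda")))
```

If the path is not a symlink, or does not exist, the returned list contains only the input path.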

csukuangfj · 2022-02-09T07:14:09Z

I will try to reproduce your issue with torch 1.11.0 + CUDA 11.5 and then try to fix it.
(The nightly PyTorch wheels at https://download.pytorch.org/whl/nightly/torch_nightly.html support only up to CUDA 11.5.)

GNroy · 2022-02-09T09:02:24Z

@csukuangfj I just tried the torch 1.11.0 + CUDA 11.5 combination in pytorch:21.12 (the previous container).
The k2 v1.13 build succeeded in that setup.
This suggests the issue is related either to CUDA 11.6 or to the pytorch:22.01 container itself.

To reproduce the issue:

docker run --gpus all -it nvcr.io/nvidia/pytorch:22.01-py3
K2_MAKE_ARGS="-j" pip install git+https://github.com/k2-fsa/k2@v1.13#egg=k2

GNroy mentioned this issue on Feb 9, 2022:
K2 losses NVIDIA/NeMo#3351 (Merged)

This was referenced on Feb 9, 2022:
Avoid using thrust:: directly, use THRUST_NS_QUALIFIER:: instead pytorch/pytorch#72582 (Open)
Fix building for CUDA 11.6 #917 (Merged)

csukuangfj closed this as completed in #917 on Feb 10, 2022
