
Compatibility patch for CUDA Toolkit 11.0 and PyTorch 1.8 #41

Merged (1 commit) on Jul 2, 2021

Conversation

@yihuajack (Contributor) commented Jun 12, 2021

The changes in CMakeLists.txt are a compatibility patch for CUDA Toolkit 11.0. Since 11.0, cub has been bundled with the CUDA Toolkit, so using the custom cub triggers an error. This may fix #34.
For the usage of ${CUDA_INCLUDE_DIRS}, ${CUDA_VERSION_MAJOR}, and VERSION_LESS, refer to https://cmake.org/cmake/help/latest/module/FindCUDA.html
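The version guard described above can be sketched roughly as follows (the thirdparty/cub path and target layout are assumptions here; adapt them to the actual project tree):

```cmake
find_package(CUDA REQUIRED)
# CUDA 11.0+ ships its own cub; only fall back to a vendored copy on older toolkits.
if(${CUDA_VERSION_MAJOR} VERSION_LESS 11)
  include_directories(${PROJECT_SOURCE_DIR}/thirdparty/cub)  # assumed location of the bundled cub
endif()
include_directories(${CUDA_INCLUDE_DIRS})
```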

The changes in dreamplace/ops are a compatibility patch for PyTorch 1.8.
Regarding the changes in torch.h to the definitions of rfft() and irfft(): PyTorch 1.7 and 1.8 implement these functions in an at::fft namespace instead of at, to be consistent with the style of NumPy (see https://github.com/pytorch/pytorch/wiki/The-torch.fft-module-in-PyTorch-1.7, which recommends using at::view_as_complex), although neither the documentation of previous versions nor that of PyTorch 1.7/1.8 clearly states the differences between the two versions of the FFTs and inverse FFTs. Note that in PyTorch 1.7 the old ffts and iffts still work but are deprecated, and they are removed entirely in PyTorch 1.8. Newer PyTorch also warns that AT_ERROR is deprecated, so TORCH_CHECK_VALUE is used here instead. Due to the limitations of std::value_or() (or c10::value_or()), an if/else block is needed when signal_ndim == 1.
However, in dct2_fft2.cpp and dct2_fft2_cuda.cpp, at::irfft receives its last argument as a <brace-enclosed initializer list>. This was fine in previous versions, where the argument was declared as IntArrayRef signal_sizes (see torch/include/ATen/Functions.h), but in newer versions it is declared as c10::optional<c10::IntArrayRef> (IntArrayRef is just ArrayRef<int64_t>, i.e. ArrayRef<long int>), whose constructors fail to match the <brace-enclosed initializer list> (see pytorch/pytorch#43545). A <brace-enclosed initializer list> has no type, so it cannot participate in template argument deduction (see https://en.cppreference.com/w/cpp/language/list_initialization). Using double braces resolves this (see http://gavinchou.github.io/experience/summary/syntax/initializer_list, https://stackoverflow.com/questions/49261221/call-constructor-of-class-with-brace-enclosed-initilizer-list, and Item 7 of Scott Meyers' "Effective Modern C++").

In torch.h, global_swap_cuda.cpp, and k_reorder_cuda.cpp, the changes to DREAMPLACE_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...) and DISPATCH_CUSTOM_TYPES(TYPE, NAME, ...) follow a change introduced in PyTorch 1.8.0, which can be checked here: pytorch/pytorch@v1.7.1...v1.8.0.

@limbo018 (Owner)

Hi Yihua, I appreciate your efforts. Could you make sure the patch passes unittest/ops/dct_unittest.py? I tried it and it failed. The patch is required to pass this unit test before it can be committed.

@yihuajack (Contributor, Author)

I'll work on this

@limbo018 (Owner)

> I'll work on this

Hi Yihua, I spent some time fixing the problems in your patch and merged it into the develop branch. It now passes the unit test and global placement, but I still hit a problem in detailed placement (a runtime failure with CUDA 11; it works fine with older CUDA versions).

You can check my fix in the develop branch. If no other issues are reported, I will merge into the master branch this week.

@yihuajack (Contributor, Author)

Appreciate your effort! But on the upstream develop branch I hit a weird problem: when generating dct.stamp, it reported ModuleNotFoundError: No module named 'torch'. This never happened before. I checked that CMake does use /usr/bin/python, and I can import torch with /usr/bin/python in a terminal. I can also successfully run python ${SETUP_PY} build manually under dct. What could be the reason for this? It's really troublesome...

@limbo018 (Owner)

Hi, I have replaced all setup.py.in files with a pure CMake build system in the latest commits (521844) to the develop branch. (This had been done internally for a while, but had not been pushed to the public version.) It no longer relies on the stamping scheme. You can give it a try and see whether you still encounter the problem.

@yihuajack (Contributor, Author)

Glad to find that this version builds successfully.

@limbo018 merged commit c43958f into limbo018:master on Jul 2, 2021
@yihuajack deleted the yihuajack/patch branch on Jul 9, 2021
Merging this pull request may close: Questions on updating to Latest CUDA/cub/thrust