Conversation

peterbell10 (Collaborator) commented Oct 17, 2021

Stack from ghstack:

This guts `THCState` to simply be an empty struct, as well as:

  • moving `THCState_getPeerToPeerAccess` and its cache into ATen (see the sketch below)
  • cleaning up dead code in `THCGeneral.cpp`
  • moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`

Differential Revision: D31721648
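
For context, here is a minimal sketch of what a peer-to-peer access cache living in ATen might look like; the names, file placement, and error-checking macros are illustrative assumptions, not the actual code from this PR:

```cpp
// Hypothetical sketch only: a device-pair cache for peer-to-peer access queries,
// kept in ATen rather than in THCState. Not the exact code from this PR.
#include <c10/cuda/CUDAException.h>   // C10_CUDA_CHECK
#include <c10/util/Exception.h>       // TORCH_CHECK
#include <cuda_runtime.h>
#include <vector>

namespace at { namespace cuda { namespace detail {

static int64_t num_devices_ = -1;
static std::vector<int8_t> p2p_access_cache_;  // -1 = unknown, 0 = no, 1 = yes

void init_p2p_access_cache(int64_t num_devices) {
  num_devices_ = num_devices;
  p2p_access_cache_.assign(num_devices * num_devices, -1);
}

bool get_p2p_access(int dev, int dev_to_access) {
  TORCH_CHECK(dev >= 0 && dev < num_devices_, "invalid device: ", dev);
  TORCH_CHECK(dev_to_access >= 0 && dev_to_access < num_devices_,
              "invalid peer device: ", dev_to_access);
  auto& cached = p2p_access_cache_[dev * num_devices_ + dev_to_access];
  if (cached == -1) {
    // Query the driver once per device pair and memoize the answer.
    int can_access = 0;
    C10_CUDA_CHECK(cudaDeviceCanAccessPeer(&can_access, dev, dev_to_access));
    cached = can_access ? 1 : 0;
  }
  return cached != 0;
}

}}}  // namespace at::cuda::detail
```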

pytorch-probot bot commented Oct 17, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/pytorch/pytorch/blob/919050f67b85579ea8d33ec16a9d539b363fe8fa/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default,ciflow/cuda

| Workflow | Labels | Status |
| --- | --- | --- |
| **Triggered Workflows** | | |
| libtorch-linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | ✅ triggered |
| libtorch-linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux | ✅ triggered |
| linux-bionic-cuda10.2-py3.9-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | ✅ triggered |
| linux-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla | ✅ triggered |
| linux-vulkan-bionic-py3.6-clang9 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan | ✅ triggered |
| linux-xenial-cuda10.2-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow | ✅ triggered |
| linux-xenial-cuda11.3-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-clang7-asan | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers | ✅ triggered |
| linux-xenial-py3.6-clang7-onnx | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx | ✅ triggered |
| linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| linux-xenial-py3.6-gcc7-bazel-test | ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux | ✅ triggered |
| periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled | ✅ triggered |
| periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck | ✅ triggered |
| periodic-linux-xenial-cuda11.1-py3.6-gcc7 | ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled | ✅ triggered |
| periodic-win-vs2019-cuda11.1-py3 | ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win | ✅ triggered |
| win-vs2019-cpu-py3 | ciflow/all, ciflow/cpu, ciflow/default, ciflow/win | ✅ triggered |
| win-vs2019-cuda11.3-py3 | ciflow/all, ciflow/cuda, ciflow/default, ciflow/win | ✅ triggered |
| **Skipped Workflows** | | |
| parallelnative-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |
| puretorch-linux-xenial-py3.6-gcc5.4 | ciflow/all, ciflow/cpu, ciflow/linux | 🚫 skipped |

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and triggering the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot (Contributor) commented Oct 17, 2021

💊 CI failures summary and remediations

As of commit 919050f (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-cuda10.2-py3.6-gcc7 / test (multigpu, 1, 1, linux.16xlarge.nvidia.gpu) (1/1)

Step: "Test"

2021-10-18T13:36:39.2683136Z AssertionError: RuntimeError not raised
2021-10-18T13:36:39.2673856Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 2898, in wrapper
2021-10-18T13:36:39.2674697Z     return func(*args, **kwargs)
2021-10-18T13:36:39.2675768Z   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 112, in wrapper
2021-10-18T13:36:39.2676657Z     return func(*args, **kwargs)
2021-10-18T13:36:39.2677620Z   File "/var/lib/jenkins/workspace/test/distributed/test_c10d_nccl.py", line 2397, in test_nccl_timeout
2021-10-18T13:36:39.2678834Z     process_group.allreduce(torch.rand(10).cuda(self.rank)).wait(timeout=timedelta(seconds=1))
2021-10-18T13:36:39.2679784Z   File "/opt/conda/lib/python3.6/unittest/case.py", line 203, in __exit__
2021-10-18T13:36:39.2680543Z     self._raiseFailure("{} not raised".format(exc_name))
2021-10-18T13:36:39.2681492Z   File "/opt/conda/lib/python3.6/unittest/case.py", line 135, in _raiseFailure
2021-10-18T13:36:39.2682335Z     raise self.test_case.failureException(msg)
2021-10-18T13:36:39.2683136Z AssertionError: RuntimeError not raised
2021-10-18T13:36:39.2684669Z ✅ 181 Passed
2021-10-18T13:36:39.2685191Z 💨 11 Skipped
2021-10-18T13:36:39.2685723Z 🚨 1 Failed
2021-10-18T13:36:39.2901628Z ##[group]Run # Remove any previous test reports if they exist
2021-10-18T13:36:39.2903483Z rm -f test-reports-*.zip
2021-10-18T13:36:39.2904162Z zip -r "test-reports-${FILE_SUFFIX}.zip" test -i '*.xml'


peterbell10 (Collaborator, Author) commented:

@pytorchbot ciflow rerun -l ciflow/cuda

ngimel (Collaborator) commented Oct 17, 2021

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ngimel (Collaborator) commented Oct 17, 2021

I'm getting the same error in internal builds:

c10::Error: detail::magma_init_fnINTERNAL ASSERT FAILED at "caffe2/aten/src/ATen/cuda/detail/CUDAHooks.cpp":75, please report a bug to PyTorch. Cannot initilaize magma, init routine not set
Exception raised from initCUDA at caffe2/aten/src/ATen/cuda/detail/CUDAHooks.cpp:75 (most recent call first):
# 0  c10::get_backtrace[abi:cxx11](unsigned long, unsigned long, bool)
# 1  std::_Function_handler<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > (), c10::(anonymous namespace)::GetFetchStackTrace()::$_0>::_M_invoke(std::_Any_data const&)
# 2  c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
# 3  c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, char const*)
# 5  at::cuda::detail::CUDAHooks::initCUDA() const
# 6  at::Context::lazyInitCUDA()::{lambda()#1}::operator()() const
# 7  __pthread_once_slow
# 8  void std::call_once<at::Context::lazyInitCUDA()::{lambda()#1}>(std::once_flag&, at::Context::lazyInitCUDA()::{lambda()#1}&&)
# 9  at::Context::lazyInitCUDA()
# 10 at::(anonymous namespace)::(anonymous namespace)::wrapper__empty_strided(c10::ArrayRef<long>, c10::ArrayRef<long>, c10::optional<c10::ScalarType>, c10::optional<c10::Layout>, c10::optional<c10::Device>, c10::optional<bool>)

Any advice on why that would happen?

peterbell10 (Collaborator, Author) commented:

That means `AT_MAGMA_ENABLED()` is true but `magma_init_fn` hasn't been set. That must mean the initializer in `cuda/BatchLinearAlgebra.cpp` hasn't run yet. I'm not sure how that could happen, though.

The old code just ignored it if this happened, so it's possible this was happening silently before.
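
For reference, here is a rough sketch of the registration pattern being described; the names and file layout are assumptions for illustration, not verbatim PyTorch code. `BatchLinearAlgebra.cpp` assigns the init routine during static initialization, and `CUDAHooks::initCUDA` later expects it to be set:

```cpp
// Rough sketch of the registration pattern under discussion; illustrative only.
#include <functional>
#include <stdexcept>

namespace at { namespace cuda { namespace detail {
// In the real tree this would live in a header/source pair; defined here so the
// sketch is self-contained.
std::function<void()> magma_init_fn;
}}}

// cuda/BatchLinearAlgebra.cpp would register the routine at static-initialization time:
namespace {
struct MagmaInitializer {
  MagmaInitializer() {
    // In PyTorch this would wrap magma_init(); stubbed out here.
    at::cuda::detail::magma_init_fn = []() { /* magma_init(); */ };
  }
} magma_initializer;
}  // namespace

// CUDAHooks::initCUDA() then runs lazily (via std::call_once in lazyInitCUDA) and
// expects the registration to have already happened:
void initCUDA_sketch() {
  if (at::cuda::detail::magma_init_fn == nullptr) {
    throw std::runtime_error("Cannot initialize magma, init routine not set");
  }
  at::cuda::detail::magma_init_fn();
}
```

If `magma_init_fn` lives in a different translation unit, the order in which its own dynamic initialization runs relative to `MagmaInitializer` is unspecified, which is the static-initialization-order hazard discussed below.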

ngimel (Collaborator) commented Oct 18, 2021

I've added prints to verify that the initializer in `cuda/BatchLinearAlgebra.cpp` has run before the initialization is performed in `CUDAHooks`, and that the function is correctly set in the initializer and can be called, yet in `CUDAHooks` it somehow appears to be unset again. I didn't check whether it was properly set before this PR.
Edit: checked that before this PR, `magma_init` in THC was successfully called.

peterbell10 added a commit to peterbell10/pytorch that referenced this pull request Oct 18, 2021
This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`

ghstack-source-id: e3a38ee
Pull Request resolved: pytorch#66765

Differential Revision: [D31721648](https://our.internmc.facebook.com/intern/diff/D31721648)

[ghstack-poisoned]
peterbell10 (Collaborator, Author) commented:

My next best guess would be static initialization order issues. I've changed the variable from a `std::function` to a static function pointer so it doesn't need a constructor.
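
A minimal illustration of that change (again a sketch under assumptions, not the exact diff): a namespace-scope function pointer is constant-initialized, so no dynamic initializer in another translation unit can later overwrite a registration that has already happened.

```cpp
// Illustrative sketch of the fix, not the exact PR diff.

// Before: a std::function has a constructor, so it is *dynamically* initialized.
// If that dynamic initialization happens to run after BatchLinearAlgebra.cpp's
// registration (the cross-TU order is unspecified), it silently resets the
// registration back to an empty function.
//   std::function<void()> magma_init_fn;

// After: a plain function pointer needs no constructor. It is constant-initialized
// to nullptr before any dynamic initialization runs, so regardless of which TU is
// dynamically initialized first, the registering assignment is never wiped out.
void (*magma_init_fn)() = nullptr;
```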

ngimel (Collaborator) commented Oct 18, 2021

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ngimel (Collaborator) commented Oct 18, 2021

This seems to be working on a local repro. I'll wait for all the tests to run and then land.

facebook-github-bot (Contributor) commented:

@ngimel merged this pull request in 8637556.

wconstab pushed a commit that referenced this pull request Oct 20, 2021
Summary:
Pull Request resolved: #66765

This guts `THCState` to simply be an empty struct, as well as:
- moving `THCState_getPeerToPeerAccess` and its cache into `ATen`.
- cleaning up dead code in `THCGeneral.cpp`
- moving `THCudaInit` and `THCMagma_init` into `CUDAHooks::initCUDA`

Test Plan: Imported from OSS

Reviewed By: zou3519

Differential Revision: D31721648

Pulled By: ngimel

fbshipit-source-id: 772b24787656a95f9e3fcb287d912b1c3400f32d
facebook-github-bot deleted the gh/peterbell10/176/head branch October 22, 2021 14:17
yihuajack added a commit to yihuajack/OpenPCDet that referenced this pull request Jul 24, 2022
sshaoshuai pushed a commit to open-mmlab/OpenPCDet that referenced this pull request Aug 13, 2022
* feat: support torch>=1.11

Fix #900.
Support PyTorch version >= 1.11. Referring to pytorch/pytorch#66765 and https://github.com/pytorch/pytorch/wiki/TH-to-ATen-porting-guide.

* fix: Remove preproc torch version check macros
FeedOnMilkT pushed a commit to FeedOnMilkT/OpenPCDet-AttentionEnhanced that referenced this pull request Mar 17, 2025
* feat: support torch>=1.11

Fix #900.
Support PyTorch version >= 1.11. Referring to pytorch/pytorch#66765 and https://github.com/pytorch/pytorch/wiki/TH-to-ATen-porting-guide.

* fix: Remove preproc torch version check macros