Static initialization issue when building LibTorch (CPU) as static lib on Windows #83255
Thanks for that. I gave up on building it statically earlier. There are a few CUDA ops that don't get registered, too.
I haven't tried building with CUDA yet. I might get around to it next week and check whether this approach fixes those issues as well.
@cristianPanaite could you help @1enn0 merge the above PR?
To proceed with merging the PR, you have to sign the EasyCLA; here you can see that the EasyCLA check is not passing. Keep in mind that you must use the same email you used for this PR, and also set your email visibility to public (you can do that in Settings -> Emails). Check these things before signing the CLA. After these steps, the PR should be ready to merge.
@cristianPanaite I made a new PR #90133 for this. Could you help merge it?
Fixes pytorch#83255. Code comes from pytorch#83258 after fixing merge conflicts.
Pull Request resolved: pytorch#90133
Approved by: https://github.com/soumith, https://github.com/malfet
The `TORCH_LIBRARY_IMPL` registrations in `OpsImpl.cpp` need to happen after `ProcessGroup` is registered as a torch class, which happens in `Ops.cpp`. However, the order of the registrations is undefined between the two files. If the registration in `OpsImpl.cpp` runs before the one in `Ops.cpp`, we get a crash at program launch similar to #83255. This happens in our internal build.

This PR moves the contents of `OpsImpl.cpp` to the end of `Ops.cpp`, because according to the omniscient lord of ChatGPT:

<img width="600" alt="2022-12-04_19-25" src="https://user-images.githubusercontent.com/1381301/205542847-3535b319-3c2a-4e8e-bc11-27913f6afb39.png">

Pull Request resolved: #90149
Approved by: https://github.com/kwen2501, https://github.com/H-Huang, https://github.com/soumith
…)" This reverts commit 6a01394.
🐛 Describe the bug
When compiling LibTorch as a static lib (`BUILD_SHARED_LIBS=OFF`) on Windows 11, there is an issue with static initialization. Here is a small example of how to reproduce this issue:

1. Create Conda env
2. Build LibTorch and e.g. `op_registration_test`

For simplicity, we are building LibTorch (CPU only, without MKL etc.) and a single test target from `Caffe2_CPU_TEST_SRCS`. I am using plain CMake, Ninja and the MSVC compiler.

When running `pytorch-build/bin/op_registration_test.exe` in the console, you will see no output. When taking a closer look by opening the executable in Visual Studio and running it under the debugger, we get an error message. For more detail, see the full call stack here.
The problem seems to be that some of the statically initialized classes (`LinearPackedParamsBase` and `ConvPackedParamsBase`, to be specific) are not initialized before some of their uses in a `TORCH_LIBRARY_IMPL` definition. The problematic sections are in:

- `aten/src/ATen/native/quantized/qlinear_unpack.cpp`
- `aten/src/ATen/native/quantized/qconv_unpack.cpp`
- `aten/src/ATen/native/quantized/cpu/qlinear_prepack.cpp`
- `aten/src/ATen/native/quantized/cpu/qlinear_dynamic.cpp`
- `aten/src/ATen/native/quantized/cpu/qlinear.cpp`
- `aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_unpack.cpp`
- `aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear_prepack.cpp`
- `aten/src/ATen/native/ao_sparse/quantized/cpu/qlinear.cpp`
Specifically, calling the construct-on-first-use initialization functions (`register_linear_params()` and `register_conv_params<>()`) from the corresponding namespace at the top of the `TORCH_LIBRARY_IMPL` body in the above files fixes the issue. I have a PR ready that can fix this, but I am wondering if there might be a better solution to this problem 😃

Versions
The `collect_env.py` script does not give any useful information here, as I am trying to build libtorch. Instead, I am including the summary output of CMake.

cc @peterjc123 @mszhanyi @skyline75489 @nbcsm