Static linking master bug #21737

Closed
1 of 5 tasks
ezyang opened this issue Jun 13, 2019 · 20 comments

Labels
module: build (Build system issues)
module: static linking (Related to statically linked libtorch; we dynamically link by default)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@ezyang
Contributor

ezyang commented Jun 13, 2019

Statically linking PyTorch doesn't work (well). This master bug is to track all of the individual bugs related to this problem, as well as record up-to-date instructions for how to do static linking on master.

Blockers:

Other issues:

How to create a statically linked executable as of 5a7e2cc aka Jun 13 (WORK IN PROGRESS: these instructions use hacks; as we fix the build process, these instructions will change. Our current general strategy is to edit PyTorch's top-level CMakeLists.txt to add the build rules for the binary we want to build, and then do a "normal" static library build.)

  1. Edit CMakeLists.txt in the top-level PyTorch directory and add CMake rules for the binary you want to build. The rules quoted here contain workarounds for a number of bugs in our build:
add_executable(example-app "example-app.cpp")
target_link_libraries(example-app PRIVATE -Wl,--whole-archive -Wl,--no-as-needed torch caffe2::mkl )
set_property(TARGET example-app PROPERTY CXX_STANDARD 11)

A pretty good example-app.cpp might be:

#include <torch/torch.h>
#include <iostream>

int main() {
  torch::Tensor tensor = torch::rand({2, 3});
  std::cout << tensor << std::endl;
}

Also, apply this patch:

diff --git a/caffe2/CMakeLists.txt b/caffe2/CMakeLists.txt
index 39b685c5b..f21575ff6 100644
--- a/caffe2/CMakeLists.txt
+++ b/caffe2/CMakeLists.txt
@@ -631,7 +631,7 @@ if (NOT INTERN_BUILD_MOBILE OR NOT BUILD_CAFFE2_MOBILE)
 
 
   if (NOT NO_API)
-    target_include_directories(torch PRIVATE
+    target_include_directories(torch PUBLIC
       ${TORCH_SRC_DIR}/csrc/api
       ${TORCH_SRC_DIR}/csrc/api/include)
   endif()
@@ -689,9 +689,9 @@ IF (USE_TBB)
 ENDIF()
 
 
-  target_include_directories(torch PRIVATE ${ATen_CPU_INCLUDE})
+  target_include_directories(torch PUBLIC ${ATen_CPU_INCLUDE})
 
-  target_include_directories(torch PRIVATE
+  target_include_directories(torch PUBLIC
     ${TORCH_SRC_DIR}/csrc)
 
   target_include_directories(torch PRIVATE
  2. mkdir build and cd into it.
  3. Run cmake for a static build, disabling MKL-DNN, which is known not to work: cmake -GNinja -DBUILD_SHARED_LIBS=OFF -DUSE_MKLDNN=OFF .. (if it can't run the code gen script, add -DPYTHON_EXECUTABLE=$(which python)) (make works too, but ninja is better from a recompilation avoidance perspective)
  4. Run ninja example-app
  5. Your binary now lives in build/bin/example-app. Here is my ldd output (a condensed command sketch follows it):
$ ldd build/bin/example-app 
        linux-vdso.so.1 (0x00007ffef5ddf000)
        libgomp.so.1 => /scratch/ezyang/pytorch-tmp-env/lib/libgomp.so.1 (0x00007f60f34f3000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f60dd1e6000)
        libgcc_s.so.1 => /scratch/ezyang/pytorch-tmp-env/lib/libgcc_s.so.1 (0x00007f60f34b0000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f60dcfe2000)
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f60dcdd7000)
        libnvToolsExt.so.1 => /public/apps/cuda/10.0/lib64/libnvToolsExt.so.1 (0x00007f60dcbce000)
        libmkl_intel_lp64.so => /scratch/ezyang/pytorch-tmp-env/lib/libmkl_intel_lp64.so (0x00007f60dc06f000)
        libmkl_gnu_thread.so => /scratch/ezyang/pytorch-tmp-env/lib/libmkl_gnu_thread.so (0x00007f60da824000)
        libmkl_core.so => /scratch/ezyang/pytorch-tmp-env/lib/libmkl_core.so (0x00007f60d65da000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f60d63bb000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f60d601d000)
        libmpi_cxx.so.20 => /usr/lib/x86_64-linux-gnu/libmpi_cxx.so.20 (0x00007f60d5e03000)
        libmpi.so.20 => /usr/lib/x86_64-linux-gnu/libmpi.so.20 (0x00007f60d5b11000)
        libcudart.so.10.0 => /public/apps/cuda/10.0/lib64/libcudart.so.10.0 (0x00007f60d5897000)
        libcusparse.so.10.0 => /public/apps/cuda/10.0/lib64/libcusparse.so.10.0 (0x00007f60d1e2f000)
        libcurand.so.10.0 => /public/apps/cuda/10.0/lib64/libcurand.so.10.0 (0x00007f60cdcc8000)
        libcufft.so.10.0 => /public/apps/cuda/10.0/lib64/libcufft.so.10.0 (0x00007f60c7814000)
        libcudnn.so.7 => /public/apps/cudnn/v7.4/cuda/lib64/libcudnn.so.7 (0x00007f60b289e000)
        libcublas.so.10.0 => /public/apps/cuda/10.0/lib64/libcublas.so.10.0 (0x00007f60ae308000)
        libstdc++.so.6 => /scratch/ezyang/pytorch-tmp-env/lib/libstdc++.so.6 (0x00007f60f3369000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f60adf17000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f60f32f4000)
        libopen-rte.so.20 => /usr/lib/x86_64-linux-gnu/libopen-rte.so.20 (0x00007f60adc8f000)
        libopen-pal.so.20 => /usr/lib/x86_64-linux-gnu/libopen-pal.so.20 (0x00007f60ad9dd000)
        libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f60ad7a0000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f60ad59d000)
        libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f60ad393000)
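
For reference, here are the steps above condensed into a single shell session (a minimal sketch, assuming you run it from the PyTorch checkout root and have already made the CMakeLists.txt edits from step 1):

# from the PyTorch checkout root (paths are an assumption about your setup)
mkdir build && cd build
# static build with MKL-DNN disabled; add -DPYTHON_EXECUTABLE=$(which python)
# if CMake cannot run the code gen script
cmake -GNinja -DBUILD_SHARED_LIBS=OFF -DUSE_MKLDNN=OFF ..
ninja example-app
ldd bin/example-app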
ezyang added the module: static linking label on Jun 13, 2019
@ezyang
Contributor Author

ezyang commented Jun 13, 2019

cc @marchss

@ezyang
Contributor Author

ezyang commented Jun 13, 2019

cc @Palmitoxico

@ezyang
Contributor Author

ezyang commented Jun 13, 2019

@jacobmou

jacobmou commented Dec 17, 2019

Hi @ezyang, I'm wondering if this issue is fixed in the 1.2 release?

I have hit this issue at runtime after building libtorch with static linking in a completely different build environment. However, dynamic linking works like a charm.

By the way, the error message is along the lines of: C++ exception with description "No functions are registered for schema aten::zeros(int[] size, *, ScalarType? dtype=None,.

@ezyang
Contributor Author

ezyang commented Dec 18, 2019

We haven't explicitly fixed this, but the mobile work done recently has made the state of static linking better on master. I haven't checked how many of the issues I've listed here are still a problem.

The error message you're seeing looks consistent with the static initializers being pruned away. Try the -Wl,--whole-archive -Wl,--no-as-needed flags I suggested above.
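
In a downstream CMakeLists.txt that links a static libtorch, that advice might look like the following (a minimal sketch mirroring the rules in the first comment; the target name my-app and the caffe2::mkl dependency are assumptions about your project):

add_executable(my-app "my-app.cpp")
# --whole-archive keeps every object in the static archives, so the operator
# registration static initializers (e.g. for aten::zeros) are not dropped
target_link_libraries(my-app PRIVATE -Wl,--whole-archive -Wl,--no-as-needed torch caffe2::mkl)
set_property(TARGET my-app PROPERTY CXX_STANDARD 11)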

cc @dreiss @ljk53

@kikirizki

kikirizki commented Jun 2, 2020

Any update on this?

@Nintorac
Contributor

Nintorac commented Jun 13, 2020

Hey,
I am having issues getting a static build to work.
I'm not particularly experienced with CMake, so maybe I am completely missing something.

I'm trying to build the static libraries from the PyTorch 1.5rc1 release branch.

I have added PyTorch as a submodule of my project and then updated my CMakeLists.txt to contain the following lines:

add_subdirectory(pytorch)
target_link_libraries(NeuralDX7 PRIVATE -Wl,--whole-archive -Wl,--no-as-needed torch caffe2::mkl)
set_property(TARGET NeuralDX7 PROPERTY CXX_STANDARD 11)

None of the lines mentioned in the diff in the OP of this issue were present, so I skipped that step; maybe that is related to @ezyang's most recent comment in this issue?

Next I configure the build with

cmake -B build/ -GUnix\ Makefiles -DBUILD_SHARED_LIBS=OFF -DUSE_MKLDNN=OFF -DBUILD_PYTHON=OFF

which results in an error:

-- Configuring incomplete, errors occurred!
See also "/home/nintorac/local_audio/NeuralDX7-plugin/build/CMakeFiles/CMakeOutput.log".
See also "/home/nintorac/local_audio/NeuralDX7-plugin/build/CMakeFiles/CMakeError.log".

I have uploaded the file /home/nintorac/local_audio/NeuralDX7-plugin/build/CMakeFiles/CMakeError.log here

It seems unable to find pthreads and AVX, both of which I'm pretty sure should be available.
I am running Debian Buster with an Intel i7-4770HQ CPU.

@ezyang
Contributor Author

ezyang commented Jun 15, 2020

Have you managed to successfully compile a non-statically-linked version of PyTorch? The logs you posted strongly suggest you are missing some relevant devel libraries.
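
To give a concrete idea of what "relevant devel libraries" can mean on Debian Buster, something along these lines is a plausible starting point (the package list is my assumption, not taken from this thread; adjust for your setup):

# toolchain plus the Python pieces PyTorch's code generation needs (assumed package set)
sudo apt-get install build-essential cmake git python3 python3-dev python3-yaml python3-setuptools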

@Nintorac
Contributor

Nintorac commented Jun 16, 2020

Not yet, though I've gotten a little further :)

I built a Docker container to run the build in and managed to get the configure step to work within that.

I had to modify this line target_link_libraries(NeuralDX7 PRIVATE -Wl,--whole-archive -Wl,--no-as-needed torch caffe2::mkl) to be target_link_libraries(NeuralDX7 PRIVATE -Wl,--whole-archive -Wl,--no-as-needed torch caffe2).

Will changing caffe2::mkl->caffe2 result in a performance hit?

and then during the build I get this error:

Traceback (most recent call last):
  File "/opt/pytorch/caffe2/contrib/aten/gen_op.py", line 219, in <module>
    decls = yaml.load(read(os.path.join(args.yaml_dir, 'Declarations.yaml')), Loader=Loader)
  File "/opt/pytorch/caffe2/contrib/aten/gen_op.py", line 60, in read
    with open(filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/build/pytorch/caffe2/../aten/src/ATen/Declarations.yaml'
make[2]: *** [pytorch/caffe2/CMakeFiles/__aten_op_header_gen.dir/build.make:63: pytorch/caffe2/contrib/aten/aten_op.h] Error 1
make[1]: *** [CMakeFiles/Makefile2:3306: pytorch/caffe2/CMakeFiles/__aten_op_header_gen.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

Maybe related to #16444? I have tried re-running cmake several times but it hasn't worked yet. I might have a go at using Ninja instead of Makefiles tomorrow, though I am not sure if this will be compatible with the JUCE stuff. Open to other suggestions though!

Thanks
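
One hedged guess, based on the code gen note in the first comment: a missing Declarations.yaml means the ATen code generation has not produced it yet, and the instructions above suggest explicitly pointing CMake at your Python interpreter when the code gen script cannot run (whether that is the cause in this Docker setup is an assumption):

cmake -B build/ -GNinja -DBUILD_SHARED_LIBS=OFF -DUSE_MKLDNN=OFF -DBUILD_PYTHON=OFF -DPYTHON_EXECUTABLE=$(which python3)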

@Nintorac
Contributor

Nintorac commented Jun 16, 2020

Same issue with ninja

[0/2] Re-checking globbed directories...
[1/1241] Generating contrib/aten/aten_op.h
FAILED: pytorch/caffe2/contrib/aten/aten_op.h 
cd /opt/build/pytorch/caffe2 && /usr/bin/python3 /opt/pytorch/caffe2/contrib/aten/gen_op.py --aten_root=/opt/pytorch/caffe2/../aten --template_dir=/opt/pytorch/caffe2/contrib/aten --yaml_dir=/opt/build/pytorch/caffe2/../aten/src/ATen --install_dir=/opt/build/pytorch/caffe2/contrib/aten
Traceback (most recent call last):
  File "/opt/pytorch/caffe2/contrib/aten/gen_op.py", line 219, in <module>
    decls = yaml.load(read(os.path.join(args.yaml_dir, 'Declarations.yaml')), Loader=Loader)
  File "/opt/pytorch/caffe2/contrib/aten/gen_op.py", line 60, in read
    with open(filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/build/pytorch/caffe2/../aten/src/ATen/Declarations.yaml'
ninja: build stopped: subcommand failed.
nintorac@musicbeast:~/nintorac_audio/NeuralDX7-plugin$ docker run --rm -v $(pwd)/CMakeLists.txt:/opt/CMakeLists.txt -v $(pwd)/pytorch:/opt/pytorch -v $(pwd)/JUCE:/opt/JUCE -v $(pwd)/Source:/opt/Source -v $(pwd)/build:/opt/build -v $(pwd)/NeuralDX7.jit:/opt/NeuralDX7.jit builder cmake --build build -j1
[0/2] Re-checking globbed directories...
[1/1241] Generating contrib/aten/aten_op.h
FAILED: pytorch/caffe2/contrib/aten/aten_op.h 
cd /opt/build/pytorch/caffe2 && /usr/bin/python3 /opt/pytorch/caffe2/contrib/aten/gen_op.py --aten_root=/opt/pytorch/caffe2/../aten --template_dir=/opt/pytorch/caffe2/contrib/aten --yaml_dir=/opt/build/pytorch/caffe2/../aten/src/ATen --install_dir=/opt/build/pytorch/caffe2/contrib/aten
Traceback (most recent call last):
  File "/opt/pytorch/caffe2/contrib/aten/gen_op.py", line 219, in <module>
    decls = yaml.load(read(os.path.join(args.yaml_dir, 'Declarations.yaml')), Loader=Loader)
  File "/opt/pytorch/caffe2/contrib/aten/gen_op.py", line 60, in read
    with open(filename, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/build/pytorch/caffe2/../aten/src/ATen/Declarations.yaml'
ninja: build stopped: subcommand failed.

@ezyang
Contributor Author

ezyang commented Jun 16, 2020

If you can't build a normal version of PyTorch, then the problem is probably not related to static linking. I'd post in https://discuss.pytorch.org/ with a more detailed description of how you are setting up your build environment and look for help there. A normal build of PyTorch should work without any problems, so this is almost certainly an environment problem.

@Nintorac
Contributor

Nintorac commented Jun 18, 2020

A normal build works fine. I have created a repo to reproduce the issue.

Should I still move to the forum?

@kikirizki

kikirizki commented Jun 18, 2020

@ezyang I have the same problem as @Nintorac

@powderluv
Contributor

powderluv commented Jul 14, 2020

Top of master fails with a tensorpipe dependency on libuv:

-- Configuring done
CMake Error: install(EXPORT "tensorpipe-targets" ...) includes target "tensorpipe" which requires target "uv_a" that is not in any export set.

@powderluv
Contributor

powderluv commented Jul 14, 2020

@ezyang here is a first-pass fix to include PyTorch with add_subdirectory(). There are a few more fixes required and I will push them up next.

#41387

@b93901190

b93901190 commented Jul 27, 2020

Top of master fails with a tensorpipe dependency on libuv:

-- Configuring done
CMake Error: install(EXPORT "tensorpipe-targets" ...) includes target "tensorpipe" which requires target "uv_a" that is not in any export set.

I also have this issue when building libtorch from source.
Does anyone have a solution for this?

@gemfield
Contributor

gemfield commented Sep 27, 2020

@b93901190 update your repo and try again:

git submodule sync
git submodule update --init --recursive

@wuziq

wuziq commented Aug 11, 2021

@b93901190 update your repo and try again:

git submodule sync
git submodule update --init --recursive

I tried this for the v1.6.0 branch and I still get this error: CMake Error: install(EXPORT "tensorpipe-targets" ...) includes target "tensorpipe" which requires target "uv_a" that is not in any export set. Is there a workaround for v1.6.0?

@wuziq

wuziq commented Aug 11, 2021

@b93901190 update your repo and try again:

git submodule sync
git submodule update --init --recursive

I tried this for the v1.6.0 branch and I still get this error: CMake Error: install(EXPORT "tensorpipe-targets" ...) includes target "tensorpipe" which requires target "uv_a" that is not in any export set. Is there a workaround for v1.6.0?

I tried -DUSE_TENSORPIPE:BOOL=OFF, which gets me past the configure step, but when I build, I get this error:

[ 74%] Building CXX object caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/pickler.cpp.o
In file included from /workspaces/Labs/torch/pytorch/torch/csrc/distributed/rpc/rref_context.h:8,
                 from /workspaces/Labs/torch/pytorch/torch/csrc/jit/serialization/pickler.cpp:4:
/workspaces/Labs/torch/pytorch/torch/csrc/distributed/rpc/utils.h:3:10: fatal error: tensorpipe/core/message.h: No such file or directory
    3 | #include <tensorpipe/core/message.h>
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [caffe2/CMakeFiles/torch_cpu.dir/build.make:14673: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/serialization/pickler.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:2961: caffe2/CMakeFiles/torch_cpu.dir/all] Error 2
make: *** [Makefile:146: all] Error 2
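
Worth noting that the failing header lives under torch/csrc/distributed/rpc, i.e. the distributed RPC code still expects TensorPipe even though it was disabled. A hedged sketch of a configure line that also turns off the distributed components (USE_DISTRIBUTED is a standard PyTorch build option, but I have not verified this combination against v1.6.0):

# assumes the usual static-build flags from earlier in this thread
cmake -GNinja -DBUILD_SHARED_LIBS=OFF -DUSE_MKLDNN=OFF -DUSE_TENSORPIPE=OFF -DUSE_DISTRIBUTED=OFF ..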

@yaadhavraajagility

yaadhavraajagility commented Oct 14, 2021

#66640

I am able to build it, but I get linker errors at runtime with my application.
