Installer not setting rpath for MAGMA (OS X w/ GPU) #27409

elbamos · 2019-10-04T22:53:19Z

🐛 Bug

The installation scripts aren't adding the MAGMA library path as an rpath in the built dylib. This goes back at least as far as 1.1 and is still present in current master.

It's easily fixable post-install with install_name_tool -add_rpath /usr/local/magma/lib /path/to/libtorch.dylib (actually in 1.1 it's the caffe_gpu dylib), but of course this should be set properly by the installer.
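
For anyone scripting the workaround above, here is a minimal Python sketch. The helper names are hypothetical, and the rpath and dylib paths are the ones from this report. It skips the call when the rpath is already recorded, since install_name_tool refuses to add an LC_RPATH that would duplicate an existing one.

```python
import subprocess

def add_rpath_command(dylib, rpath, existing_rpaths):
    """Build the install_name_tool call, or return None if rpath is already present.

    existing_rpaths is assumed to be the list of LC_RPATH entries already in
    the dylib (for example, collected from `otool -l` output).
    """
    already = {p.rstrip("/") for p in existing_rpaths}
    if rpath.rstrip("/") in already:
        return None
    return ["install_name_tool", "-add_rpath", rpath, dylib]

def apply_workaround(dylib, rpath, existing_rpaths):
    """Run the command on macOS; does nothing if the rpath is already set."""
    cmd = add_rpath_command(dylib, rpath, existing_rpaths)
    if cmd is not None:
        subprocess.run(cmd, check=True)

# Show the command that would be run for a dylib with no MAGMA rpath yet.
print(add_rpath_command("/path/to/libtorch.dylib", "/usr/local/magma/lib", []))
```

This only automates the manual fix; it does not address why the build leaves the rpath out.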

To Reproduce

Steps to reproduce the behavior:

1. Compile 1.1 or later on OS X with GPU and MAGMA support.
2. Launch python and import torch.

Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 13:42:17)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Volumes/home500/anaconda/envs/pytorch1.2/lib/python3.6/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: dlopen(/Volumes/home500/anaconda/envs/pytorch1.2/lib/python3.6/site-packages/torch/_C.cpython-36m-darwin.so, 9): Library not loaded: @rpath/libmagma.so
  Referenced from: /Volumes/home500/anaconda/envs/pytorch1.2/lib/python3.6/site-packages/torch/lib/libtorch.dylib
  Reason: image not found
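
For context on the error: dyld resolves an @rpath/... install name by substituting each LC_RPATH entry recorded in the loading image, in order, and the load fails when none of the resulting paths exists. The Python sketch below models that lookup in simplified form. The function name, the example loader directory, and the example rpath lists are illustrative assumptions, not taken from dyld or from the PyTorch build.

```python
import posixpath

def rpath_candidates(install_name, rpaths, loader_dir):
    """Return the paths a loader would try for an @rpath-relative install name.

    Simplified model: only @loader_path expansion is handled, and
    environment overrides such as DYLD_LIBRARY_PATH are ignored.
    """
    prefix = "@rpath/"
    if not install_name.startswith(prefix):
        return [install_name]  # absolute install names are used as-is
    leaf = install_name[len(prefix):]
    candidates = []
    for rpath in rpaths:
        # @loader_path stands for the directory of the image doing the loading.
        expanded = rpath.replace("@loader_path", loader_dir)
        candidates.append(posixpath.normpath(posixpath.join(expanded, leaf)))
    return candidates

# Hypothetical rpath list without a MAGMA entry: no candidate points into
# /usr/local/magma/lib, so libmagma.so would not be found.
print(rpath_candidates("@rpath/libmagma.so", ["@loader_path"], "/opt/torch/lib"))

# With the extra rpath, one candidate is the real location of the library.
print(rpath_candidates("@rpath/libmagma.so",
                       ["@loader_path", "/usr/local/magma/lib/"],
                       "/opt/torch/lib"))
```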

Expected behavior

import torch should return silently, without raising an exception, and torch should then run properly.

Environment

PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 10.0

OS: Mac OSX 10.13.6
GCC version: Could not collect
CMake version: version 3.14.0

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: Could not collect
GPU models and configuration: GeForce GTX 1080 Ti
Nvidia driver version: 1.1.0
cuDNN version: Probably one of the following:
/usr/local/cuda/lib/libcudnn.7.dylib
/usr/local/cuda/lib/libcudnn_static.a

Versions of relevant libraries:
[pip3] numpy==1.16.4
[conda] blas 1.0 mkl
[conda] gpytorch 0.3.5 pypi_0 pypi
[conda] mkl 2019.4 233
[conda] mkl-include 2019.4 233
[conda] mkl-service 2.3.0 py36hfbe908c_0
[conda] mkl_fft 1.0.14 py36h5e564d8_0
[conda] mkl_random 1.1.0 py36ha771720_0
[conda] torch 1.1.0 pypi_0 pypi
[conda] torchfile 0.1.0 pypi_0 pypi
[conda] torchnet 0.0.4 pypi_0 pypi
[conda] torchtext 0.4.0 pypi_0 pypi
[conda] torchvision 0.4.0a0+d31eafa pypi_0 pypi

pietern · 2019-10-08T12:40:23Z

Thanks for reporting. I suppose this is only an issue if MAGMA is installed in a non-system path?

Is this something you could submit a PR for?

cc @soumith for MAGMA

elbamos · 2019-10-09T20:54:53Z

Yes, I'm sure it wouldn't arise if MAGMA was installed in a system path. There isn't a package installer on OSX that supports MAGMA, which needs to get compiled against the system's CUDA anyway. MAGMA from source wants to install at /usr/local/magma.

The pytorch build process knows to look for, and properly finds, MAGMA at that path.

The pytorch build process has become so complex at this point, I'm reluctant to submit a PR that would touch it. Also, since not many of the recent master builds are passing CI, I wouldn't really have an effective way of testing the PR against platforms other than my own.

pietern · 2019-11-20T13:55:34Z

@soumith I think you're more familiar with magma et al. Who should take a look at this?

soumith · 2019-11-21T22:44:23Z

in terms of cmake / rpath, maybe @xuhdev would know.

xuhdev · 2019-11-21T22:52:26Z

Could you try from the latest source? A lot of things have changed since then, and I doubt whether it still exists in the latest version. For the old version, I don't think it hurts to stick to your workaround (i.e., install_name_tool -add_rpath /usr/local/magma/lib /path/to/libtorch.dylib).

elbamos · 2019-11-22T19:00:29Z

@xuhdev I just tested fea963d, and the issue is still there.

I think what's going on is that the installer expects magma to have been installed via a python package and therefore to be accessible from the python library path.

xuhdev · 2019-11-22T19:36:13Z

Thanks for the info. I'll try to look into this on Monday.

xuhdev · 2019-11-25T21:27:59Z

Did you install from the source? If so, would you mind showing the output of

grep MAGMA_LIBRARIES build/CMakeCache.txt

elbamos · 2019-12-01T22:45:45Z

@xuhdev

MAGMA_LIBRARIES:FILEPATH=/usr/local/magma/lib/libmagma.so

That is where they live.
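
As a rough illustration of what that cache entry implies for the rpath, the sketch below reads a KEY:TYPE=VALUE entry from CMakeCache.txt content and derives the directory that would need to be on the rpath. The function names are hypothetical and this is not how the PyTorch build computes its rpaths.

```python
import posixpath

def cache_value(cache_text, key):
    """Return the value of `key` from CMakeCache.txt content, or None."""
    for line in cache_text.splitlines():
        line = line.strip()
        # Skip blanks and the two comment styles used in CMakeCache.txt.
        if not line or line.startswith(("#", "//")):
            continue
        name_and_type, sep, value = line.partition("=")
        if not sep:
            continue
        if name_and_type.split(":", 1)[0] == key:
            return value
    return None

def rpath_dirs(libraries_value):
    """Directories of the absolute library paths in a ';'-separated CMake list."""
    dirs = []
    for entry in libraries_value.split(";"):
        if entry.startswith("/"):
            directory = posixpath.dirname(entry)
            if directory not in dirs:
                dirs.append(directory)
    return dirs

# The entry reported in this thread.
sample = "MAGMA_LIBRARIES:FILEPATH=/usr/local/magma/lib/libmagma.so\n"
print(rpath_dirs(cache_value(sample, "MAGMA_LIBRARIES")))
```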

xuhdev · 2019-12-02T21:18:17Z

(Sorry for asking more questions; because I can't reproduce this issue, I have to rely on your info.)

Could you show the path printed from otool -L /path/to/libtorch.dylib, both before and after you run the install_name_tool workaround?

elbamos · 2019-12-02T23:03:46Z

@xuhdev Hey I'm happy to help any way I can! (Sorry for the delay to your prior question - I was out of town on business.)

Here's what I get from a fresh compile:

/Volumes/home500/anaconda/envs/pytorch1.3/lib/python3.6/site-packages/torch/lib/libtorch.dylib:
	@rpath/libtorch.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libcudart.10.0.dylib (compatibility version 0.0.0, current version 10.0.130)
	@rpath/libiomp5.dylib (compatibility version 5.0.0, current version 5.0.0)
	@rpath/libmkl_intel_lp64.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libmkl_intel_thread.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libmkl_core.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)
	@rpath/libc10_cuda.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libnvrtc.10.0.dylib (compatibility version 0.0.0, current version 10.0.130)
	@rpath/libnvToolsExt.1.dylib (compatibility version 0.0.0, current version 1.0.0)
	@rpath/libcusparse.10.0.dylib (compatibility version 0.0.0, current version 10.0.130)
	@rpath/libcurand.10.0.dylib (compatibility version 0.0.0, current version 10.0.130)
	@rpath/libmagma.so (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libcudnn.7.dylib (compatibility version 0.0.0, current version 7.6.4)
	@rpath/libc10.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libcufft.10.0.dylib (compatibility version 0.0.0, current version 10.0.145)
	@rpath/libcublas.10.0.dylib (compatibility version 0.0.0, current version 10.0.130)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.0)

And after install_name_tool, of course the otool -L output doesn't change:

/Volumes/home500/anaconda/envs/pytorch1.3/lib/python3.6/site-packages/torch/lib/libtorch.dylib:
	@rpath/libtorch.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libcudart.10.0.dylib (compatibility version 0.0.0, current version 10.0.130)
	@rpath/libiomp5.dylib (compatibility version 5.0.0, current version 5.0.0)
	@rpath/libmkl_intel_lp64.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libmkl_intel_thread.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libmkl_core.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1252.50.4)
	@rpath/libc10_cuda.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libnvrtc.10.0.dylib (compatibility version 0.0.0, current version 10.0.130)
	@rpath/libnvToolsExt.1.dylib (compatibility version 0.0.0, current version 1.0.0)
	@rpath/libcusparse.10.0.dylib (compatibility version 0.0.0, current version 10.0.130)
	@rpath/libcurand.10.0.dylib (compatibility version 0.0.0, current version 10.0.130)
	@rpath/libmagma.so (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libcudnn.7.dylib (compatibility version 0.0.0, current version 7.6.4)
	@rpath/libc10.dylib (compatibility version 0.0.0, current version 0.0.0)
	@rpath/libcufft.10.0.dylib (compatibility version 0.0.0, current version 10.0.145)
	@rpath/libcublas.10.0.dylib (compatibility version 0.0.0, current version 10.0.130)
	/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 400.9.0)

xuhdev · 2019-12-02T23:32:36Z

Oops, I'm sorry, I meant otool -l /path/to/libtorch.dylib

elbamos · 2019-12-02T23:48:58Z

Thought you might...

Before:
before.txt

After:
after.txt

And the diff is:

4c4
<  0xfeedfacf 16777223          3  0x00           6    33       3224 0x00918085
---
>  0xfeedfacf 16777223          3  0x00           6    34       3264 0x00918085
494a495,498
> Load command 33
>           cmd LC_RPATH
>       cmdsize 40
>          path /usr/local/magma/lib/ (offset 12)
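
The diff can be sanity-checked with a short Python sketch: one function pulls LC_RPATH entries out of otool -l text, and another estimates the size of an LC_RPATH load command. The function names are hypothetical. The 8-byte padding is an assumption about how load commands are aligned in 64-bit images; it is consistent with the cmdsize of 40 and with the header change from 33 to 34 commands and from 3224 to 3264 bytes shown in the diff.

```python
def lc_rpaths(otool_l_text):
    """Extract the path of every LC_RPATH load command from `otool -l` output."""
    paths = []
    in_rpath = False
    for raw in otool_l_text.splitlines():
        line = raw.strip()
        if line.startswith("cmd "):
            in_rpath = (line == "cmd LC_RPATH")
        elif in_rpath and line.startswith("path "):
            # Drop the "path " label and the trailing " (offset N)" note.
            paths.append(line[len("path "):].rsplit(" (offset", 1)[0])
            in_rpath = False
    return paths

def rpath_cmdsize(path, align=8):
    """Estimated size of an LC_RPATH command: 12-byte header plus NUL-terminated path, padded."""
    raw = 12 + len(path.encode()) + 1
    return (raw + align - 1) // align * align

after_fragment = """Load command 33
          cmd LC_RPATH
      cmdsize 40
         path /usr/local/magma/lib/ (offset 12)
"""
print(lc_rpaths(after_fragment))
print(rpath_cmdsize("/usr/local/magma/lib/"), 3264 - 3224)
```

Applied to the full before.txt, the first function would show directly whether any LC_RPATH entries exist prior to the workaround.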

xuhdev · 2019-12-04T21:04:06Z

When you built PyTorch, did you have DYLD_LIBRARY_PATH, DYLD_FALLBACK_LIBRARY_PATH, or LIBRARY_PATH set? Is the path to libmagma.so in any of these variables?

elbamos · 2019-12-05T02:02:11Z

Nope, and nope.

[pytorch1.3] master(+1/-1)+* ± env | grep LIBRARY
CAML_LD_LIBRARY_PATH=/Users/aelberg/.opam/system/lib/stublibs:/usr/local/lib/ocaml/stublibs
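
The same check can be written as a small Python sketch, under the assumption that only the three variables asked about matter. The function name is hypothetical. Note that CAML_LD_LIBRARY_PATH, the only match from grep above, is an OCaml setting and is not one of them.

```python
import os

LINKER_VARS = ("DYLD_LIBRARY_PATH", "DYLD_FALLBACK_LIBRARY_PATH", "LIBRARY_PATH")

def vars_containing(directory, env):
    """For each linker-related variable, report whether it lists `directory`."""
    wanted = directory.rstrip("/")
    result = {}
    for name in LINKER_VARS:
        # Unset variables are treated as empty lists of directories.
        entries = [e.rstrip("/") for e in env.get(name, "").split(":") if e]
        result[name] = wanted in entries
    return result

# Check the current environment for the MAGMA directory from this report.
print(vars_containing("/usr/local/magma/lib", dict(os.environ)))
```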

xuhdev · 2019-12-05T21:15:23Z

I have no idea what's going on in your situation. Your RPATH is empty after the build. I'll probably revisit this once I have some other ideas. Thanks for the info so far, though!

elbamos · 2019-12-05T21:49:48Z

Would a build log of some kind help?

xuhdev · 2019-12-05T22:06:54Z

@elbamos Sure; let's see whether we can sniff something out there.

elbamos · 2019-12-12T16:51:46Z

@xuhdev Here you go:
buildlog.zip

pietern added labels on Oct 8, 2019: module: build (build system issues), module: cuda (related to torch.cuda, and CUDA support in general), module: macos (Mac OS related issues), triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
