
Can we use Conda environment for installing torch? #341

Closed
issactoast opened this issue Nov 1, 2020 · 14 comments


@issactoast
Contributor

I am on Windows 10 using WSL2, which requires CUDA 11.0.

I can install PyTorch in a Conda environment under WSL2, but I can't use torch from R. I think the inability to combine a virtual environment with torch for R is costing the package a lot of potential users.

@dfalbel
Member

dfalbel commented Nov 1, 2020

Hi @issactoast

Currently we rely on LibTorch 1.5, which does not support CUDA 11.0, but the next version of torch will use LibTorch 1.7, so CUDA 11.0 will be supported.

Just to make sure I understand: are you suggesting that you should be able to conda install torch-r, or that the R package should use the same LibTorch that is packaged with PyTorch?

@issactoast
Contributor Author

Hello @dfalbel

Thanks for the quick response. Either approach would solve the problem, but enabling conda install r-torch looks like the quicker solution for Windows users who want GPU support inside a conda environment.

@dfalbel
Member

dfalbel commented Nov 3, 2020

OK! I'll need some help from the community on that. I have never submitted R packages to conda and wouldn't know where to start, sorry!

@izahn

izahn commented Nov 22, 2020

Building conda packages from CRAN is easy; basic instructions are at https://docs.conda.io/projects/conda-build/en/latest/user-guide/tutorials/build-r-pkgs.html

I built a conda package and uploaded it to https://anaconda.org/izahn/r-torch

This package works well on my Arch Linux system, but fails on RHEL 7 with this error message:

Torch failed to start, restart your R session to try again. 
/opt/R/library/4.0/torch/deps/liblantern.so - /lib64/libm.so.6: version `GLIBC_2.23' not found
(required by /opt/R/library/4.0/torch/deps/./libtorch_cpu.so)

The conda installation has a copy of libm.so.6, but it seems that libtorch insists on using the one at /lib64/libm.so.6. Any ideas about how to help it find the libm.so.6 in the conda environment instead?

@dfalbel
Member

dfalbel commented Nov 23, 2020

That's awesome @izahn ! Thanks!

Doesn't the "version `GLIBC_2.23' not found" message mean that we need an updated version of glibc in this environment? Searching for that message suggests that updating glibc might solve it.
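
As a quick diagnostic for this kind of error (a sketch, not part of torch or conda), the Python snippet below extracts the required symbol version from a loader message and compares it with the glibc that platform.libc_ver() reports for the running process. The function names are hypothetical. It only tells you what the host glibc provides; it cannot tell you which libm.so.6 the dynamic loader will actually pick for liblantern.so.

```python
import platform
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def required_glibc(message: str) -> Optional[Version]:
    """Extract the GLIBC_X.Y symbol version named in a loader error, if any."""
    match = re.search(r"GLIBC_(\d+)\.(\d+)", message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def host_glibc() -> Optional[Version]:
    """Return (major, minor) of the glibc in this process, or None if not glibc."""
    lib, ver = platform.libc_ver()
    match = re.match(r"(\d+)\.(\d+)", ver)
    if lib != "glibc" or match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def glibc_is_new_enough(host: Optional[Version], required: Optional[Version]) -> bool:
    """True if nothing is required, or the host glibc is at least the required one."""
    if required is None:
        return True
    if host is None:
        return False
    return host >= required  # tuple comparison: (2, 17) < (2, 23)


if __name__ == "__main__":
    error = "/lib64/libm.so.6: version `GLIBC_2.23' not found"
    need, have = required_glibc(error), host_glibc()
    print(f"requires {need}, host has {have}, ok: {glibc_is_new_enough(have, need)}")
```

On RHEL 7, which ships glibc 2.17, the check fails, which is consistent with the error above.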

@izahn

izahn commented Nov 23, 2020

@dfalbel the conda build system includes glibc (or at least libm.so.6); the problem is that torch::install_torch() tries to use the host system version at /lib64/libm.so.6 instead. So in a sense I have updated glibc; I just don't know how to tell torch::install_torch() to use the updated version.

@skeydan
Collaborator

skeydan commented Nov 24, 2020

Have you tried setting LD_LIBRARY_PATH?
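
A minimal sketch of this suggestion, assuming the goal is to make the dynamic loader search the active conda environment's lib directory (conda sets CONDA_PREFIX when an environment is activated) before the system directories. The helper name with_conda_libs is hypothetical. This is for runtime experiments only, since the reply reports that it broke the conda build itself. A newer libm.so.6 on its own may also still fail if it needs a matching libc.so.6.

```python
import os
from typing import Dict, Mapping


def with_conda_libs(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env with $CONDA_PREFIX/lib first on LD_LIBRARY_PATH."""
    new_env = dict(env)
    prefix = new_env.get("CONDA_PREFIX")
    if not prefix:
        return new_env  # no active conda environment: leave the copy unchanged
    lib_dir = os.path.join(prefix, "lib")
    existing = new_env.get("LD_LIBRARY_PATH", "")
    # Keep existing entries, dropping empty ones and duplicates of lib_dir.
    rest = [p for p in existing.split(os.pathsep) if p and p != lib_dir]
    new_env["LD_LIBRARY_PATH"] = os.pathsep.join([lib_dir] + rest)
    return new_env
```

The loader reads LD_LIBRARY_PATH when a process starts, so the modified environment has to be passed to a new R process. For example, subprocess.run(["R", "-e", "library(torch); torch_tensor(1)"], env=with_conda_libs(os.environ)) does this; exporting the variable in the shell before launching R is equivalent.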

@izahn

izahn commented Nov 24, 2020

I tried setting LD_LIBRARY_PATH, but that caused other errors during the build process itself (even before the install_torch() part).

This seems to be where conda packaging gets more complicated. The conda build system uses a sysroot, as described in https://docs.conda.io/projects/conda-build/en/latest/resources/define-metadata.html#host and conda/conda-build#3696. I'm a bit out of my depth here, but as far as I can tell, building the torch R package uses the conda sysroot, while install_torch() doesn't know about it and tries to use host system libraries.

Is it possible to build the torch libraries instead of installing the pre-built ones with install_torch()? I think (or at least hope) if we could do that the conda build system would kick in and use the correct libraries.

@dfalbel
Member

dfalbel commented Nov 24, 2020

Yes, you can build libtorch with instructions here: https://github.com/pytorch/pytorch/blob/master/docs/libtorch.rst#building-libtorch-using-cmake

And lantern (the C interface to libtorch that we use in the R package) here: https://github.com/mlverse/torch/blob/master/tools/buildlantern.R

Maybe you could also point to the lib included in the torch conda package (conda install torch) by setting the TORCH_HOME env var, since they might have fixed that somehow?
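
Trying this first requires the directory containing the shared libraries that a PyTorch install ships, which is the lib subdirectory of the installed torch Python package. The sketch below locates it without importing PyTorch, using importlib.machinery.PathFinder; the function name is hypothetical. It has not been tested whether the R package accepts this directory through TORCH_HOME: depending on the layout the R package expects, TORCH_HOME may need to be this directory or its parent. The bundled libtorch may also differ from the LibTorch version lantern was built against (1.5 at this point in the thread).

```python
import os
from importlib.machinery import PathFinder
from typing import List, Optional


def find_pytorch_lib_dir(search_path: Optional[List[str]] = None) -> Optional[str]:
    """Return <site-packages>/torch/lib if an installed torch package has one.

    search_path defaults to sys.path; PyTorch itself is never imported.
    """
    spec = PathFinder.find_spec("torch", search_path)
    if spec is None or not spec.submodule_search_locations:
        return None  # not found, or "torch" is a plain module rather than a package
    for pkg_dir in spec.submodule_search_locations:
        lib_dir = os.path.join(pkg_dir, "lib")
        if os.path.isdir(lib_dir):
            return lib_dir
    return None
```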

@izahn

izahn commented Feb 16, 2021

OK, I've made some progress on this front and submitted a conda package recipe at conda-forge/staged-recipes#13992

Setting up CUDA packages for conda is more complicated, so this is CPU-only for now. I do hope to add CUDA support in the future.

Finally, I'd love some help maintaining the conda package; let me know if you are interested and I'll add you to the maintainers list.

@izahn

izahn commented Feb 17, 2021

Further update: I've given up for now on packaging torch for conda. Fundamentally, conda doesn't want repackaged binaries, and the torch package doesn't make it easy to install without them. I fought with it for a while, but kept ending up with

> library('torch'); torch_tensor(1)
Warning message:
Torch failed to start, restart your R session to try again. /home/conda/staged-recipes/build_artifacts/r-torch_1613595975690/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehol/lib/R/library/torch/deps/liblantern.so - libc10.so: cannot open shared object file: No such file or directory 

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: cpp_torch_float32()
 2: initialize(...)
 3: torch_dtype$new(cpp_torch_float32())
 4: torch_float()
 5: methods$initialize(self, self$private, ...)
 6: Tensor$new(data, dtype, device, requires_grad, pin_memory)
 7: torch_tensor(1)
An irrecoverable exception occurred. R is aborting now ...
/home/conda/staged-recipes/build_artifacts/r-torch_1613595975690/test_tmp/run_test.sh: line 7:  4273 Segmentation fault      (core dumped) $R -e "library('torch'); torch_tensor(1)"

or similar.

@dfalbel
Member

dfalbel commented Feb 17, 2021

Hi @izahn,

Thanks for your efforts and sorry it didn't work!

> torch package doesn't make it easy to install without repackaged binaries

What should we change in the R package to solve this? Is it related to separated compilation steps for libtorch and liblantern?
I am not sure if conda allows this, but in theory we could deliver both binaries in an inst/deps/ folder.

We could perhaps download the binaries in this script:

https://github.com/izahn/staged-recipes/blob/89827cfedbceec76e3ced3d937732ad6518dc642/recipes/r-torch/build.sh

and patch the .Rbuildignore to allow the binaries to be included in the built package.
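
A minimal sketch of the .Rbuildignore patch, written in Python so a build script could call it. Each non-blank line of .Rbuildignore is a regular expression matched against file paths. The rule that currently excludes the downloaded binaries is not shown here, so the "deps" substring in the example is a hypothetical placeholder; check the package's actual .Rbuildignore first. The function name is also hypothetical.

```python
from pathlib import Path
from typing import List


def drop_ignore_rules(path: Path, needle: str) -> List[str]:
    """Remove every .Rbuildignore line containing `needle`; return the removed lines."""
    lines = path.read_text().splitlines()
    kept = [line for line in lines if needle not in line]
    removed = [line for line in lines if needle in line]
    # Rewrite the file, keeping a trailing newline when any rules remain.
    path.write_text("\n".join(kept) + ("\n" if kept else ""))
    return removed
```

For example, drop_ignore_rules(Path(".Rbuildignore"), "deps") would run before R CMD build, and the returned lines could be echoed to the build log to record what was un-ignored.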

Is it possible to see the logs for the builds?

@izahn

izahn commented Feb 18, 2021

> Hi @izahn,
>
> Thanks for your efforts and sorry it didn't work!
>
> > torch package doesn't make it easy to install without repackaged binaries
>
> What should we change in the R package to solve this? Is it related to separated compilation steps for libtorch and liblantern?
> I am not sure if conda allows this, but in theory we could deliver both binaries in an inst/deps/ folder.

I'm still relatively new to conda packaging and not totally sure how it works. The package build process definitely flags the pre-built libraries, though. I tried telling it to ignore them in https://github.com/conda-forge/staged-recipes/pull/13992/files#diff-f21c0b2e0f37c9ea8dac5100f7bcecab20c39783571405ff1d7425d4beea380aR22, which kind of works. My (admittedly limited) understanding is that conda-forge wants to build everything itself so that everything is built with the same toolchain.

> We could perhaps download the binaries in this script:
>
> https://github.com/izahn/staged-recipes/blob/89827cfedbceec76e3ced3d937732ad6518dc642/recipes/r-torch/build.sh
>
> and patch the .Rbuildignore to allow the binaries to be included in the built package.

Maybe; I don't know whether that will help.

> Is it possible to see the logs for the builds?

There are some older logs (probably not helpful, from before I realized I actually needed at least torch_tensor(1) in the tests) at
https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=278862&view=results . The "passing" tests there would have failed on torch_tensor(1) I'm pretty sure.

I restarted the CI so you can see the result of my latest effort at https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=279239&view=logs&j=6f142865-96c3-535c-b7ea-873d86b887bd&t=22b0682d-ab9e-55d7-9c79-49f3c3ba4823

@issactoast
Contributor Author

Closing this. For future reference, torch with GPU support can be used on WSL2 with Ubuntu 18.04.


4 participants