PyTorch 1.8.1 on conda is 1.27GB #687

Open
soumith opened this issue Apr 2, 2021 · 5 comments
Labels
conda Conda related issues

Comments

@soumith
Member

soumith commented Apr 2, 2021

Given that on conda we link against the conda-provided CUDA toolkit and the conda-provided MKL, our PyTorch conda binaries should not be this large, so there seems to be a bug somewhere.
Please check.

@orionr
Contributor

orionr commented Apr 5, 2021

cc @malfet and @seemethere

@malfet
Contributor

malfet commented Apr 5, 2021

Update: we link statically with cuDNN when shipping to conda, because neither https://anaconda.org/anaconda/cudnn nor https://anaconda.org/nvidia/cudnn has the versions we depend on. (And cuDNN for CUDA 11.1 is much bigger than the one for 10.2.)
Quick CUBIN size comparison:

$ ~/git/torch-builder/analytics/cubinsizes.py unp-10.2/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so 
Analyzing unp-10.2/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so
.nv_fatbin size 986.6MiB
  ptx_37: 189.6MiB
  sm_37: 72.5MiB
  sm_50: 139.9MiB
  sm_60: 147.6MiB
  sm_61: 137.5MiB
  sm_70: 151.0MiB
  sm_75: 134.0MiB
  sm_35: 14.5MiB
__nv_relfatbin size 395.6KiB
  ptx_37: 43.2KiB
  sm_37: 54.5KiB
  sm_50: 59.2KiB
  sm_60: 59.5KiB
  sm_61: 59.5KiB
  sm_70: 60.0KiB
  sm_75: 59.6KiB
$ ~/git/torch-builder/analytics/cubinsizes.py unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so 
Analyzing unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cu.so
.nv_fatbin size 1.2GiB
  ptx_37: 234.9MiB
  sm_37: 87.2MiB
  sm_50: 146.9MiB
  sm_60: 148.7MiB
  sm_61: 132.6MiB
  sm_70: 112.7MiB
  sm_75: 96.9MiB
  sm_80: 111.8MiB
  sm_86: 110.5MiB
__nv_relfatbin size 0.0B
$ ~/git/torch-builder/analytics/cubinsizes.py unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cpp.so 
Analyzing unp-11.1/lib/python3.7/site-packages/torch/lib/libtorch_cuda_cpp.so
.nv_fatbin size 663.0MiB
  ptx_37: 4.8MiB
  sm_37: 9.2MiB
  sm_50: 45.1MiB
  sm_60: 54.0MiB
  sm_61: 54.6MiB
  sm_70: 82.8MiB
  sm_75: 75.7MiB
  sm_80: 96.0MiB
  sm_86: 95.8MiB
  sm_35: 20.4MiB
  ptx_70: 124.7MiB
__nv_relfatbin size 576.4KiB
  ptx_37: 55.2KiB
  sm_37: 58.2KiB
  sm_50: 64.9KiB
  sm_60: 65.7KiB
  sm_61: 65.7KiB
  sm_70: 67.1KiB
  sm_75: 66.6KiB
  sm_80: 66.5KiB
  sm_86: 66.5KiB
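
To compare these listings numerically, the per-architecture sizes can be summed with a small parser. The following is only a sketch: it assumes the text layout shown above (a `<section> size <value><unit>` header followed by indented `<arch>: <value><unit>` lines), and the names `parse_report` and `ptx_share` are hypothetical helpers, not part of `cubinsizes.py` or torch-builder.

```python
import re

# Multipliers for the unit suffixes that appear in the listings above.
UNITS = {"B": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3}

# Section header, e.g. ".nv_fatbin size 986.6MiB" (assumed layout).
SECTION_RE = re.compile(r"^(\S+) size\s+([\d.]+)(B|KiB|MiB|GiB)\s*$")
# Indented per-architecture entry, e.g. "  sm_70: 151.0MiB" (assumed layout).
ENTRY_RE = re.compile(r"^\s+(\w+):\s+([\d.]+)(B|KiB|MiB|GiB)\s*$")


def parse_report(text):
    """Return {section_name: {arch: size_in_bytes}} for one report."""
    sections = {}
    current = None
    for line in text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            current = header.group(1)
            sections.setdefault(current, {})  # keep empty sections too
            continue
        entry = ENTRY_RE.match(line)
        if entry and current is not None:
            arch, value, unit = entry.groups()
            sections[current][arch] = float(value) * UNITS[unit]
    return sections


def ptx_share(archs):
    """Fraction of a section's bytes taken by PTX entries (ptx_*)."""
    total = sum(archs.values())
    ptx = sum(size for name, size in archs.items() if name.startswith("ptx_"))
    return ptx / total if total else 0.0


# The .nv_fatbin section of the CUDA 10.2 listing, copied from above.
SAMPLE = """\
Analyzing unp-10.2/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so
.nv_fatbin size 986.6MiB
  ptx_37: 189.6MiB
  sm_37: 72.5MiB
  sm_50: 139.9MiB
  sm_60: 147.6MiB
  sm_61: 137.5MiB
  sm_70: 151.0MiB
  sm_75: 134.0MiB
  sm_35: 14.5MiB
"""

report = parse_report(SAMPLE)
fatbin = report[".nv_fatbin"]
total_mib = sum(fatbin.values()) / UNITS["MiB"]
print(f"total: {total_mib:.1f} MiB, PTX share: {ptx_share(fatbin):.1%}")
```

Run on the 10.2 listing, the per-architecture entries add up to the reported 986.6MiB section size, with the `ptx_37` entry accounting for roughly a fifth of it.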

@soumith
Member Author

soumith commented Apr 5, 2021

In that case, if we are linking to system CuDNN, it has to be pruned first, I guess.

@malfet
Contributor

malfet commented Apr 5, 2021

@soumith we can't prune CuDNN for 11.1, as pruning results in an unusable library. See the following comment, which reproduces the problem with CuBLAS; CuDNN is similarly affected: pytorch/pytorch#53336 (comment)

@rgommers
Contributor

cudnn in conda-forge is up-to-date and is currently being maintained by NVIDIA: https://anaconda.org/conda-forge/cudnn. As is cudatoolkit: https://github.com/conda-forge/cudatoolkit-feedstock. As a result, the conda-forge CUDA 11.2 packages for PyTorch are only 630 MB: https://anaconda.org/conda-forge/pytorch/files.

Given how much better conda-forge is maintained than defaults, and that it has significantly more users by now (I estimate 10x more, based on Python and NumPy download numbers), I think it's time to switch to relying on conda-forge.

rgommers added the conda (Conda related issues) label on Jul 29, 2021