
unify installation (attempt #2) #1129

Merged
merged 24 commits into from Jun 2, 2022

Conversation

kousu
Contributor

@kousu kousu commented May 17, 2022

Checklist

GitHub

  • I've given this PR a concise, self-descriptive, and meaningful title
  • I've linked relevant issues in the PR body
  • I've applied the relevant labels to this PR
  • I've assigned a reviewer

PR contents

Description

Use the newish --extra-index-url option, pointing at the index torch put up at https://download.pytorch.org/whl, in preference to --find-links. This means a lot of the installation gymnastics can be streamlined.

Streamline the installation by moving all dependencies back into setup.py. This essentially redoes #1000 but without the disallowed direct .whl links or the pip install ivadomed[cpu] / pip install ivadomed[gpu] extras.

Effects:

Dropping requirements_gpu.txt is, IMO, the biggest change here. Instead of trying to second-guess torch, we will conform to their decision to publish CPU-only builds for Windows/Mac, but CUDA 10.* GPU builds for Linux, on PyPI.

The CUDA 10 GPU builds are a waste of space for the majority of Linux users, and broken for a minority (i.e. us, when working on romane.neuro.polymtl.ca) who need CUDA 11 instead; on the other hand, the CPU builds mean Windows users can't train models if that's something they want to try to help out with.

But @kanishk16 noticed that torch provides an escape hatch to cover all these cases now, one that's better than us trying to handle it in our setup.py (i.e. #1000's experiment with pip install ivadomed[cpu] / pip install ivadomed[gpu]): torch hosts custom pip repos people can use like

pip install --extra-index-url https://download.pytorch.org/whl/cpu ivadomed # e.g. for linux users without GPUs
pip install --extra-index-url https://download.pytorch.org/whl/cu102 ivadomed # e.g. for Windows users with GPUs
pip install --extra-index-url https://download.pytorch.org/whl/cu113 ivadomed # e.g. for Linux users with very new GPUs

This PR clears the way for people to use those by getting ivadomed out of the business of assuming what hardware people have. torch's default (= PyPI) builds work for 99% of systems; people on the bleeding edge (notably: our romane.neuro.polymtl.ca system, or people who want to do deep learning natively on Windows) can adapt their pip.conf; but that's not, and shouldn't be, in ivadomed's scope. We will document some of this in a followup that's already started (#1125) but, again, mostly we should stick close to torch's decisions to avoid friction.
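For instance, adapting pip.conf could look like this hypothetical snippet (file location varies by platform; the CUDA tag is an example and should match the machine's installed driver):

```ini
# e.g. ~/.config/pip/pip.conf on Linux (%APPDATA%\pip\pip.ini on Windows);
# cu113 is an example tag -- pick the one matching the installed CUDA.
[global]
extra-index-url = https://download.pytorch.org/whl/cu113
```

With this in place, a plain `pip install ivadomed` would consider the extra index automatically, with no ivadomed-side logic involved.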

(please ignore the branch name; it was a naive name from the first version of this branch, something like 'torch-install' would have been better)

Linked issues

Resolves #996, Fixes #1130 as well as fixes #861 which is part of fixing spinalcordtoolbox/spinalcordtoolbox#3790.

To be followed by #1125.

@kousu kousu requested a review from kanishk16 May 17, 2022 05:46
@kousu kousu added the dependencies (Pull requests that update a dependency file) and installation (category: installation-related stuff) labels May 17, 2022

Contributor

@kanishk16 kanishk16 left a comment


Apparently, we need to upgrade to torch at least 1.10 (would need to confirm the minimum version but definitely not 1.8.0) to be able to use the --extra-index-url flag

As of 17.05.2022:
(two screenshots, not captured in this text export)

@kousu
Contributor Author

kousu commented May 17, 2022

Apparently, we need to upgrade to torch at least 1.10 (would need to confirm the minimum version but definitely not 1.8.0) to be able to use the --extra-index-url flag

As of 17.05.2022: (two screenshots, not captured in this text export)

Drat. Good point.

Maybe it's not so bad? I think they only updated their documentation at that version; the older versions are available in this repo ("index") too.

It's not obvious from clicking on https://download.pytorch.org/whl/cu113 what's available, but I realized I could get pip to tell me what is there by giving it --index-url (to make it ignore PyPI) and telling it an intentionally bad version. Here's the results:

(test2) p115628@joplin:~$ pip install --index-url https://download.pytorch.org/whl/cu101 torch==1.7.7
Looking in indexes: https://download.pytorch.org/whl/cu101
ERROR: Could not find a version that satisfies the requirement torch==1.7.7 (from versions: 1.4.0, 1.5.0+cu101, 1.5.1+cu101, 1.6.0+cu101, 1.7.0+cu101, 1.7.1+cu101, 1.8.0+cu101, 1.8.1+cu101)
ERROR: No matching distribution found for torch==1.7.7
(test2) p115628@joplin:~$ pip install --index-url https://download.pytorch.org/whl/cu110 torch==1.7.7
Looking in indexes: https://download.pytorch.org/whl/cu110
ERROR: Could not find a version that satisfies the requirement torch==1.7.7 (from versions: 1.7.0+cu110, 1.7.1+cu110)
ERROR: No matching distribution found for torch==1.7.7
(test2) p115628@joplin:~$ pip install --index-url https://download.pytorch.org/whl/cu111 torch==1.7.7
Looking in indexes: https://download.pytorch.org/whl/cu111
ERROR: Could not find a version that satisfies the requirement torch==1.7.7 (from versions: 1.8.0+cu111, 1.8.1+cu111, 1.9.0+cu111, 1.9.1+cu111, 1.10.0+cu111, 1.10.1+cu111, 1.10.2+cu111)
ERROR: No matching distribution found for torch==1.7.7
(test2) p115628@joplin:~$ pip install --index-url https://download.pytorch.org/whl/cu112 torch==1.7.7
Looking in indexes: https://download.pytorch.org/whl/cu112
ERROR: Could not find a version that satisfies the requirement torch==1.7.7 (from versions: none)
ERROR: No matching distribution found for torch==1.7.7
(test2) p115628@joplin:~$ pip install --index-url https://download.pytorch.org/whl/cu113 torch==1.8.0
Looking in indexes: https://download.pytorch.org/whl/cu113
ERROR: Could not find a version that satisfies the requirement torch==1.8.0 (from versions: 1.10.0+cu113, 1.10.1+cu113, 1.10.2+cu113, 1.11.0+cu113)
ERROR: No matching distribution found for torch==1.8.0

So actually this form works:

(test2) p115628@joplin:~$ pip install --extra-index-url https://download.pytorch.org/whl/cu111 torch==1.8.1
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu111
Collecting torch==1.8.1
  Downloading https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp38-cp38-linux_x86_64.whl (1982.2 MB)
     |████████████████████████████████| 1982.2 MB 6.9 kB/s 
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting numpy
  Using cached numpy-1.22.3-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
Installing collected packages: typing-extensions, numpy, torch

So the trick is just that they don't have builds for every combination of CUDA/torch, only for the ones that overlapped in support in time. So I shouldn't have bumped this to CUDA 11.3 at the same time, actually; I'll undo that.
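The "only combinations that overlapped in time" constraint can be captured in a small lookup, seeded here with a subset of the versions pip reported above (the live indexes drift over time, so treat this as a May-2022 snapshot, not an authoritative matrix):

```python
# Snapshot (subset) of which torch versions each CUDA index carried,
# as observed via pip in this thread (May 2022). Indexes change over time.
AVAILABLE = {
    "cu101": {"1.7.1", "1.8.0", "1.8.1"},
    "cu111": {"1.8.0", "1.8.1", "1.9.0", "1.9.1",
              "1.10.0", "1.10.1", "1.10.2"},
    "cu113": {"1.10.0", "1.10.1", "1.10.2", "1.11.0"},
}

def indexes_for(torch_version: str) -> list:
    """CUDA index tags that carry a build of the given torch version."""
    return sorted(tag for tag, versions in AVAILABLE.items()
                  if torch_version in versions)
```

e.g. `indexes_for("1.8.1")` gives `["cu101", "cu111"]` but not cu113 — which is exactly why pinning torch==1.8.1 and bumping to CUDA 11.3 in the same change couldn't work.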


kousu added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this pull request May 17, 2022
#3790 won't work right until ivadomed/ivadomed#1129 is reviewed and published; but this lets us at least test in the meantime.

This commit should be reverted before this branch is merged.
@coveralls

coveralls commented May 19, 2022

Pull Request Test Coverage Report for Build 2426322688

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 68.853%

Totals Coverage Status
Change from base Build 2425390340: 0.0%
Covered Lines: 4094
Relevant Lines: 5946

💛 - Coveralls

@kanishk16
Contributor

kanishk16 commented May 26, 2022

@kousu I packaged & published a vanilla PyPI package, TINX, for an end-to-end sanity check.

TL;DR: the requirements.txt used in TINX is:

torch==1.8.2
torchvision==0.9.2

Also, I wanted to see whether we could unpin CUDA 11. And, it kinda works:

installing TINX on CPU
(tinxtest) [kanis GitBox ]$ pip install TINX==0.0.1 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cpu
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/lts/1.8/cpu
Collecting TINX==0.0.1
  Using cached TINX-0.0.1-py3-none-any.whl (2.6 kB)
Collecting torch==1.8.2
  Downloading https://download.pytorch.org/whl/lts/1.8/cpu/torch-1.8.2%2Bcpu-cp39-cp39-win_amd64.whl (191.7 MB)
     |████████████████████████████████| 191.7 MB 120 kB/s
Collecting torchvision==0.9.2
  Downloading https://download.pytorch.org/whl/lts/1.8/cpu/torchvision-0.9.2%2Bcpu-cp39-cp39-win_amd64.whl (847 kB)
     |████████████████████████████████| 847 kB 6.4 MB/s
Collecting numpy
  WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.HTTPSConnection object at 0x0000023DCB306E50>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed')': /packages/99/2e/de8dcdbe1579b96f202685d8811d864291c374ea5fce13c63c70fdaf905d/numpy-1.22.4-cp39-cp39-win_amd64.whl
  Downloading numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
     |████████████████████████████████| 14.7 MB 297 kB/s
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting pillow>=4.1.1
  Downloading Pillow-9.1.1-cp39-cp39-win_amd64.whl (3.3 MB)
     |████████████████████████████████| 3.3 MB 344 kB/s
Installing collected packages: typing-extensions, numpy, torch, pillow, torchvision, TINX
Successfully installed TINX-0.0.1 numpy-1.22.4 pillow-9.1.1 torch-1.8.2+cpu torchvision-0.9.2+cpu typing-extensions-4.2.0

installing TINX on CUDA10.2
(tinxtestcu102) [kanis GitBox ]$ conda list
# packages in environment at C:\Users\kanis\anaconda3\envs\tinxtestcu102:
#
# Name                    Version                   Build  Channel
ca-certificates           2022.4.26            haa95532_0
certifi                   2022.5.18.1      py39haa95532_0
openssl                   1.1.1o               h2bbff1b_0
pip                       21.2.4           py39haa95532_0
python                    3.9.12               h6244533_0
setuptools                61.2.0           py39haa95532_0
sqlite                    3.38.3               h2bbff1b_0
tzdata                    2022a                hda174b7_0
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
wheel                     0.37.1             pyhd3eb1b0_0
wincertstore              0.2              py39haa95532_2
(tinxtestcu102) [kanis GitBox ]$ pip install TINX==0.0.1 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu102
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/lts/1.8/cu102
Collecting TINX==0.0.1
  Using cached TINX-0.0.1-py3-none-any.whl (2.6 kB)
Collecting torch==1.8.2
  Downloading https://download.pytorch.org/whl/lts/1.8/cu102/torch-1.8.2%2Bcu102-cp39-cp39-win_amd64.whl (1388.7 MB)
     |████████████████████████████████| 1388.7 MB 15 kB/s
Collecting torchvision==0.9.2
  Downloading https://download.pytorch.org/whl/lts/1.8/cu102/torchvision-0.9.2%2Bcu102-cp39-cp39-win_amd64.whl (1.6 MB)
     |████████████████████████████████| 1.6 MB 273 kB/s
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting numpy
  Using cached numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
Collecting pillow>=4.1.1
  Using cached Pillow-9.1.1-cp39-cp39-win_amd64.whl (3.3 MB)
Installing collected packages: typing-extensions, numpy, torch, pillow, torchvision, TINX
Successfully installed TINX-0.0.1 numpy-1.22.4 pillow-9.1.1 torch-1.8.2+cu102 torchvision-0.9.2+cu102 typing-extensions-4.2.0
(tinxtestcu102) [kanis GitBox ]$ conda list
# packages in environment at C:\Users\kanis\anaconda3\envs\tinxtestcu102:
#
# Name                    Version                   Build  Channel
ca-certificates           2022.4.26            haa95532_0
certifi                   2022.5.18.1      py39haa95532_0
numpy                     1.22.4                   pypi_0    pypi
openssl                   1.1.1o               h2bbff1b_0
pillow                    9.1.1                    pypi_0    pypi
pip                       21.2.4           py39haa95532_0
python                    3.9.12               h6244533_0
setuptools                61.2.0           py39haa95532_0
sqlite                    3.38.3               h2bbff1b_0
tinx                      0.0.1                    pypi_0    pypi
torch                     1.8.2+cu102              pypi_0    pypi
torchvision               0.9.2+cu102              pypi_0    pypi
typing-extensions         4.2.0                    pypi_0    pypi
tzdata                    2022a                hda174b7_0
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
wheel                     0.37.1             pyhd3eb1b0_0
wincertstore              0.2              py39haa95532_2

installing TINX on CUDA11.1
(base) [kanis GitBox ]$ conda activate tinxtestcu111
(tinxtestcu111) [kanis GitBox ]$ pip install TINX==0.0.1 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/lts/1.8/cu111
Collecting TINX==0.0.1
  Using cached TINX-0.0.1-py3-none-any.whl (2.6 kB)
Collecting torchvision==0.9.2
  Downloading https://download.pytorch.org/whl/lts/1.8/cu111/torchvision-0.9.2%2Bcu111-cp39-cp39-win_amd64.whl (1.9 MB)
     |████████████████████████████████| 1.9 MB 2.2 MB/s
Collecting torch==1.8.2
  Downloading https://download.pytorch.org/whl/lts/1.8/cu111/torch-1.8.2%2Bcu111-cp39-cp39-win_amd64.whl (3057.4 MB)
     |████████████████████████████████| 3057.4 MB 997 bytes/s
Collecting numpy
  Using cached numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting pillow>=4.1.1
  Using cached Pillow-9.1.1-cp39-cp39-win_amd64.whl (3.3 MB)
Installing collected packages: typing-extensions, numpy, torch, pillow, torchvision, TINX
Successfully installed TINX-0.0.1 numpy-1.22.4 pillow-9.1.1 torch-1.8.2+cu111 torchvision-0.9.2+cu111 typing-extensions-4.2.0

Unfortunately, this doesn't work for CUDA11.3:

installing TINX on CUDA11.3
(tinxtestcu113) [kanis GitBox ]$ pip install TINX==0.0.1 --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu113
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/lts/1.8/cu113
Collecting TINX==0.0.1
  Using cached TINX-0.0.1-py3-none-any.whl (2.6 kB)
ERROR: Could not find a version that satisfies the requirement torch==1.8.2 (from tinx) (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0)
ERROR: No matching distribution found for torch==1.8.2

Weirdly, torch 1.8.1 and 1.9.0 both work on CUDA10.2, CUDA11.1 and CUDA11.3.

installing torch on CUDA10.2
(cu102) [kanis GitBox ]$ pip install --extra-index-url https://download.pytorch.org/whl/cu102 torch==1.8.1
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu102
Collecting torch==1.8.1
  Downloading https://download.pytorch.org/whl/cu102/torch-1.8.1%2Bcu102-cp39-cp39-win_amd64.whl (1387.0 MB)
     |████████████████████████████████| 1387.0 MB 10 kB/s
Collecting numpy
  Using cached numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Installing collected packages: typing-extensions, numpy, torch
Successfully installed numpy-1.22.4 torch-1.8.1+cu102 typing-extensions-4.2.0
installing torch on CUDA11.1
(cu111) [kanis GitBox ]$ pip install torch==1.8.1 --extra-index-url https://download.pytorch.org/whl/cu111
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu111
Collecting torch==1.8.1
  Using cached https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp39-cp39-win_amd64.whl (3055.6 MB)
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting numpy
  Using cached numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
Installing collected packages: typing-extensions, numpy, torch
Successfully installed numpy-1.22.4 torch-1.8.1+cu111 typing-extensions-4.2.0
installing torch on CUDA11.3
(cu113) [kanis GitBox ]$ pip install torch==1.8.1 --extra-index-url https://download.pytorch.org/whl/cu113
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
Collecting torch==1.8.1
  Using cached torch-1.8.1-cp39-cp39-win_amd64.whl (190.5 MB)
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting numpy
  Using cached numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
Installing collected packages: typing-extensions, numpy, torch
Successfully installed numpy-1.22.4 torch-1.8.1 typing-extensions-4.2.0

This is expected, as PyTorch 1.8.2 doesn't build wheels for CUDA 11.3. Anyway, we could at least unpin CUDA 11. Since there aren't many changes required as such, I guess we could expand the scope of this PR to include that prospect as well. Hope that isn't an issue.

As far as this PR is concerned, I'm inclined towards:

  • Unifying the installation process a bit
  • Unpinning CUDA11
  • Bumping up torch from 1.8.1 to 1.8.2 (LTS)

What do you think? If you agree, I'll update the PR with some extras_require for dev related dependencies from #1000 and then maybe you could test this PR on the neuropoly infra to ensure we aren't breaking anything.
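If the dev extras end up looking anything like #1000's split, the extras_require pattern being proposed would be roughly this — a guess at shape only, with invented package names:

```python
# Hypothetical extras_require: optional tooling only, installable as e.g.
# `pip install ivadomed[dev]`. Hardware selection deliberately stays out:
# that is what --extra-index-url is for. Package names are illustrative.
extras_require = {
    "dev": ["pytest", "flake8", "coverage"],
    "docs": ["sphinx", "sphinx-rtd-theme"],
}

# Would be passed as setup(..., extras_require=extras_require); note there
# is intentionally no "cpu" or "gpu" key, unlike #1000.
```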

@kousu
Contributor Author

kousu commented May 26, 2022

As far as this PR is concerned, I'm inclined towards:

* [x]  Unifying the installation process a bit

* [x]  Unpinning CUDA11

* [x]  ~Bumping up torch from 1.8.1 to 1.8.2 (LTS)~

What do you think? If you agree, I'll update the PR with some extras_require for dev related dependencies from #1000 and then maybe you could test this PR on the neuropoly infra to ensure we aren't breaking anything.

I think that's a great idea! spinalcordtoolbox/spinalcordtoolbox#3790 is currently pointed at this branch, so after you do your updates I should be able to just re-trigger our CI there to see how it behaves. Let me know when that's done :)

In the meantime I'm going to try out your TINX package on some different platforms to make sure I understand myself how torch's various packages behave.

@kousu
Contributor Author

kousu commented May 26, 2022

You're great to work with, @kanishk16. Inspired by your experiments above, I wrote a script that can scan what the torch repos currently publish. It uses the trick that --index-url will disable reading pypi.org, and the trick that pip will tell you what versions it considered if it fails -- and torch==9.9.9 will always fail.

for release_channel in "" lts/1.8/; do
  for cuda_major in 10 11; do
    for cuda_minor in `seq 0 7`; do
      (set -x
      pip install 'torch==9.9.9' --index-url https://download.pytorch.org/whl/${release_channel}cu${cuda_major}${cuda_minor} 2>&1) | egrep -o 'from versions: [^)]*' 
    done
  done
done
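The same inventory can be read without abusing pip's error output: each of these URLs is a PEP 503 "simple" repository, so (assuming the usual layout, with a per-project page at e.g. https://download.pytorch.org/whl/cu113/torch/) the wheel filenames can be scraped directly. A sketch, with the network fetch left out and the parsing demonstrated on a canned snippet:

```python
# Sketch: extract torch versions from a PEP 503 "simple" index page by
# matching wheel filenames (hrefs are URL-encoded, e.g. "+" as "%2B").
import re
from urllib.parse import unquote

def wheel_versions(index_html: str) -> list:
    """Version strings of every torch wheel linked on an index page."""
    return sorted({unquote(v)
                   for v in re.findall(r"torch-([0-9][^-]*)-cp", index_html)})

# Canned snippet standing in for a fetched page:
sample = (
    '<a href="torch-1.10.2%2Bcu113-cp38-cp38-linux_x86_64.whl">wheel</a>'
    '<a href="torch-1.11.0%2Bcu113-cp38-cp38-linux_x86_64.whl">wheel</a>'
)
```

To scan a live index you would fetch each page with urllib and feed the HTML through `wheel_versions`; the shell loop above remains the zero-dependency way to do the same thing.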

I tested it under different conda environments with different pythons installed:

(there's a lot of information here but I'll do my best to summarize it for you)

python3.10 on Linux
(py310) p115628@bireli:~$ for release_channel in "" lts/1.8/; do
  for cuda_major in 10 11; do
    for cuda_minor in `seq 0 7`; do
      (set -x
      pip install 'torch==9.9.9' --index-url https://download.pytorch.org/whl/${release_channel}cu${cuda_major}${cuda_minor} 2>&1) | egrep -o 'from versions: [^)]*' 
    done
  done
done
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu100
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu101
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu102
from versions: 1.11.0+cu102
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu103
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu104
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu105
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu106
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu107
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu110
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu111
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu112
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu113
from versions: 1.11.0+cu113
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu114
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu115
from versions: 1.11.0+cu115
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu116
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu117
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu100
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu101
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu102
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu103
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu104
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu105
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu106
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu107
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu110
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu111
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu112
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu113
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu114
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu115
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu116
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu117
from versions: none
python3.9 on Linux
(py39) p115628@bireli:~$ for release_channel in "" lts/1.8/; do
  for cuda_major in 10 11; do
    for cuda_minor in `seq 0 7`; do
      (set -x
      pip install 'torch==9.9.9' --index-url https://download.pytorch.org/whl/${release_channel}cu${cuda_major}${cuda_minor} 2>&1) | egrep -o 'from versions: [^)]*' 
    done
  done
done
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu100
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu101
from versions: 1.7.1+cu101, 1.8.0+cu101, 1.8.1+cu101
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu102
from versions: 1.7.1, 1.8.0, 1.8.1+cu102, 1.9.0+cu102, 1.9.1+cu102, 1.10.0+cu102, 1.10.1+cu102, 1.10.2+cu102, 1.11.0+cu102
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu103
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu104
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu105
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu106
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu107
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu110
from versions: 1.7.1+cu110
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu111
from versions: 1.8.0+cu111, 1.8.1+cu111, 1.9.0+cu111, 1.9.1+cu111, 1.10.0+cu111, 1.10.1+cu111, 1.10.2+cu111
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu112
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu113
from versions: 1.10.0+cu113, 1.10.1+cu113, 1.10.2+cu113, 1.11.0+cu113
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu114
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu115
from versions: 1.11.0+cu115
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu116
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu117
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu100
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu101
from versions: 1.8.2+cu101
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu102
from versions: 1.8.1+cu102, 1.8.2+cu102
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu103
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu104
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu105
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu106
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu107
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu110
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu111
from versions: 1.8.1+cu111, 1.8.2+cu111
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu112
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu113
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu114
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu115
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu116
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu117
from versions: none
python3.8 on Linux
(py38) p115628@bireli:~$ for release_channel in "" lts/1.8/; do
  for cuda_major in 10 11; do
    for cuda_minor in `seq 0 7`; do
      (set -x
      pip install 'torch==9.9.9' --index-url https://download.pytorch.org/whl/${release_channel}cu${cuda_major}${cuda_minor} 2>&1) | egrep -o 'from versions: [^)]*' 
    done
  done
done
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu100
from versions: 1.4.0+cu100
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu101
from versions: 1.4.0, 1.5.0+cu101, 1.5.1+cu101, 1.6.0+cu101, 1.7.0+cu101, 1.7.1+cu101, 1.8.0+cu101, 1.8.1+cu101
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu102
from versions: 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1+cu102, 1.9.0+cu102, 1.9.1+cu102, 1.10.0+cu102, 1.10.1+cu102, 1.10.2+cu102, 1.11.0+cu102
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu103
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu104
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu105
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu106
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu107
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu110
from versions: 1.7.0+cu110, 1.7.1+cu110
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu111
from versions: 1.8.0+cu111, 1.8.1+cu111, 1.9.0+cu111, 1.9.1+cu111, 1.10.0+cu111, 1.10.1+cu111, 1.10.2+cu111
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu112
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu113
from versions: 1.10.0+cu113, 1.10.1+cu113, 1.10.2+cu113, 1.11.0+cu113
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu114
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu115
from versions: 1.11.0+cu115
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu116
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu117
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu100
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu101
from versions: 1.8.2+cu101
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu102
from versions: 1.8.1+cu102, 1.8.2+cu102
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu103
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu104
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu105
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu106
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu107
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu110
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu111
from versions: 1.8.1+cu111, 1.8.2+cu111
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu112
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu113
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu114
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu115
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu116
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu117
from versions: none
python3.7 on Linux
(py37) p115628@bireli:~$ for release_channel in "" lts/1.8/; do
  for cuda_major in 10 11; do
    for cuda_minor in `seq 0 7`; do
      (set -x
      pip install 'torch==9.9.9' --index-url https://download.pytorch.org/whl/${release_channel}cu${cuda_major}${cuda_minor} 2>&1) | egrep -o 'from versions: [^)]*' 
    done
  done
done
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu100
from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0+cu100, 1.3.1+cu100, 1.4.0+cu100
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu101
from versions: 1.3.0, 1.3.1, 1.4.0, 1.5.0+cu101, 1.5.1+cu101, 1.6.0+cu101, 1.7.0+cu101, 1.7.1+cu101, 1.8.0+cu101, 1.8.1+cu101
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu102
from versions: 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1+cu102, 1.9.0+cu102, 1.9.1+cu102, 1.10.0+cu102, 1.10.1+cu102, 1.10.2+cu102, 1.11.0+cu102
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu103
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu104
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu105
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu106
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu107
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu110
from versions: 1.7.0+cu110, 1.7.1+cu110
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu111
from versions: 1.8.0+cu111, 1.8.1+cu111, 1.9.0+cu111, 1.9.1+cu111, 1.10.0+cu111, 1.10.1+cu111, 1.10.2+cu111
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu112
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu113
from versions: 1.10.0+cu113, 1.10.1+cu113, 1.10.2+cu113, 1.11.0+cu113
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu114
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu115
from versions: 1.11.0+cu115
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu116
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu117
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu100
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu101
from versions: 1.8.2+cu101
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu102
from versions: 1.8.1+cu102, 1.8.2+cu102
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu103
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu104
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu105
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu106
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu107
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu110
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu111
from versions: 1.8.1+cu111, 1.8.2+cu111
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu112
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu113
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu114
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu115
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu116
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu117
from versions: none
python3.6 on Linux
(py36) p115628@bireli:~$ for release_channel in "" lts/1.8/; do
  for cuda_major in 10 11; do
    for cuda_minor in `seq 0 7`; do
      (set -x
      pip install 'torch==9.9.9' --index-url https://download.pytorch.org/whl/${release_channel}cu${cuda_major}${cuda_minor} 2>&1) | egrep -o 'from versions: [^)]*' 
    done
  done
done
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu100
from versions: 1.0.0, 1.0.1, 1.0.1.post2, 1.1.0, 1.2.0, 1.3.0+cu100, 1.3.1+cu100, 1.4.0+cu100
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu101
from versions: 1.3.0, 1.3.1, 1.4.0, 1.5.0+cu101, 1.5.1+cu101, 1.6.0+cu101, 1.7.0+cu101, 1.7.1+cu101, 1.8.0+cu101, 1.8.1+cu101
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu102
from versions: 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1+cu102, 1.9.0+cu102, 1.9.1+cu102, 1.10.0+cu102, 1.10.1+cu102, 1.10.2+cu102
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu103
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu104
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu105
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu106
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu107
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu110
from versions: 1.7.0+cu110, 1.7.1+cu110
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu111
from versions: 1.8.0+cu111, 1.8.1+cu111, 1.9.0+cu111, 1.9.1+cu111, 1.10.0+cu111, 1.10.1+cu111, 1.10.2+cu111
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu112
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu113
from versions: 1.10.0+cu113, 1.10.1+cu113, 1.10.2+cu113
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu114
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu115
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu116
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/cu117
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu100
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu101
from versions: 1.8.2+cu101
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu102
from versions: 1.8.1+cu102, 1.8.2+cu102
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu103
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu104
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu105
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu106
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu107
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu110
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu111
from versions: 1.8.1+cu111, 1.8.2+cu111
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu112
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu113
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu114
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu115
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu116
from versions: none
+ pip install torch==9.9.9 --index-url https://download.pytorch.org/whl/lts/1.8/cu117
from versions: none

Also, if you install torch off pypi, these are the CUDA versions you get bundled:

python 3.10 on Linux: CUDA 10.2

(there's no 1.8 branch here)

(py310) p115628@bireli:~$ pip install torch
Collecting torch
  Downloading torch-1.11.0-cp310-cp310-manylinux1_x86_64.whl (750.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 750.6/750.6 MB 1.1 MB/s eta 0:00:00
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Installing collected packages: typing-extensions, torch
Successfully installed torch-1.11.0 typing-extensions-4.2.0
(py310) p115628@bireli:~$ python -c 'import torch; print(torch.version.cuda)'
/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py310/lib/python3.10/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:68.)
  example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
10.2
(py310) p115628@bireli:~$ pip install 'torch~=1.8.0'
ERROR: Could not find a version that satisfies the requirement torch~=1.8.0 (from versions: 1.11.0)
ERROR: No matching distribution found for torch~=1.8.0
python 3.9 on Linux: CUDA 10.2

(on both latest and the 1.8 branches)

(py39) p115628@bireli:~$ pip install torch
Collecting torch
  Downloading torch-1.11.0-cp39-cp39-manylinux1_x86_64.whl (750.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 750.6/750.6 MB 1.7 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in ./.conda/envs/py39/lib/python3.9/site-packages (from torch) (4.2.0)
Installing collected packages: torch
Successfully installed torch-1.11.0
(py39) p115628@bireli:~$ python -c 'import torch; print(torch.version.cuda)'
10.2
(py39) p115628@bireli:~$ pip install 'torch~=1.8.0'
Collecting torch~=1.8.0
  Using cached torch-1.8.1-cp39-cp39-manylinux1_x86_64.whl (804.1 MB)
Requirement already satisfied: numpy in ./.conda/envs/py39/lib/python3.9/site-packages (from torch~=1.8.0) (1.22.4)
Requirement already satisfied: typing-extensions in ./.conda/envs/py39/lib/python3.9/site-packages (from torch~=1.8.0) (4.2.0)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 1.11.0
    Uninstalling torch-1.11.0:
      Successfully uninstalled torch-1.11.0
Successfully installed torch-1.8.1
(py39) p115628@bireli:~$ python -c 'import torch; print(torch.version.cuda)'
10.2
python 3.8 on Linux: CUDA 10.2

(on both latest and the 1.8 branch)

(py38) p115628@bireli:~$ pip install torch
Collecting torch
  Downloading torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl (750.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 750.6/750.6 MB 1.5 MB/s eta 0:00:00
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Installing collected packages: typing-extensions, torch
Successfully installed torch-1.11.0 typing-extensions-4.2.0
(py38) p115628@bireli:~$ python -c 'import torch; print(torch.version.cuda)'
/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py38/lib/python3.8/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:68.)
  example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
10.2
(py38) p115628@bireli:~$ pip install 'torch~=1.8.0'
Collecting torch~=1.8.0
  Downloading torch-1.8.1-cp38-cp38-manylinux1_x86_64.whl (804.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 804.1/804.1 MB 1.6 MB/s eta 0:00:00
Collecting numpy
  Downloading numpy-1.22.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16.9/16.9 MB 9.3 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in ./.conda/envs/py38/lib/python3.8/site-packages (from torch~=1.8.0) (4.2.0)
Installing collected packages: numpy, torch
  Attempting uninstall: torch
    Found existing installation: torch 1.11.0
    Uninstalling torch-1.11.0:
      Successfully uninstalled torch-1.11.0
Successfully installed numpy-1.22.4 torch-1.8.1
(py38) p115628@bireli:~$ python -c 'import torch; print(torch.version.cuda)'
10.2
python 3.7 on Linux: CUDA 10.2

(in both latest torch and the 1.8 branch)

(py37) p115628@bireli:~$ pip install torch
Collecting torch
  Downloading torch-1.11.0-cp37-cp37m-manylinux1_x86_64.whl (750.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 750.6/750.6 MB 1.5 MB/s eta 0:00:00
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Installing collected packages: typing-extensions, torch
Successfully installed torch-1.11.0 typing-extensions-4.2.0
(py37) p115628@bireli:~$ python -c 'import torch; print(torch.version.cuda)'
/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py37/lib/python3.7/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:68.)
  example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
10.2
(py37) p115628@bireli:~$ pip install 'torch~=1.8'
Requirement already satisfied: torch~=1.8 in ./.conda/envs/py37/lib/python3.7/site-packages (1.11.0)
Requirement already satisfied: typing-extensions in ./.conda/envs/py37/lib/python3.7/site-packages (from torch~=1.8) (4.2.0)
(py37) p115628@bireli:~$ pip install 'torch~=1.8.0'
Collecting torch~=1.8.0
  Downloading torch-1.8.1-cp37-cp37m-manylinux1_x86_64.whl (804.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 804.1/804.1 MB 1.3 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in ./.conda/envs/py37/lib/python3.7/site-packages (from torch~=1.8.0) (4.2.0)
Collecting numpy
  Downloading numpy-1.21.6-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (15.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 15.7/15.7 MB 3.4 MB/s eta 0:00:00
Installing collected packages: numpy, torch
  Attempting uninstall: torch
    Found existing installation: torch 1.11.0
    Uninstalling torch-1.11.0:
      Successfully uninstalled torch-1.11.0
Successfully installed numpy-1.21.6 torch-1.8.1
(py37) p115628@bireli:~$ python -c 'import torch; print(torch.version.cuda)'
10.2
python 3.6 on Linux: CUDA 10.1

(there's no latest nor 1.8 branch; but the most recent torch uses CUDA 10.1)

(py36) p115628@bireli:~$ pip install torch
Collecting torch
  Downloading torch-1.4.0-cp36-cp36m-manylinux1_x86_64.whl (753.4 MB)
     |████████████████████████████████| 753.4 MB 17 kB/s              
Installing collected packages: torch
Successfully installed torch-1.4.0
(py36) p115628@bireli:~$ pip install numpy
Collecting numpy
  Downloading numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)
     |████████████████████████████████| 14.8 MB 12.0 MB/s            
Installing collected packages: numpy
Successfully installed numpy-1.19.5
(py36) p115628@bireli:~$ python -c 'import torch; print(torch.version.cuda)'
10.1

Summary:

  • The only realistically (i.e. widely) supported CUDA versions are CUDA 10.1, CUDA 10.2, CUDA 11.1 and CUDA 11.3

    There are a few CUDA 10.0 builds: 1.3.0+cu100, 1.3.1+cu100, 1.4.0+cu100; a couple of CUDA 11.0 builds: 1.7.0+cu110, 1.7.1+cu110; and a single lone CUDA 11.5 build: 1.11.0+cu115

  • The CUDA 11.3 builds are for torch>=1.10 at the moment.

    I bet they'll backport it to torch==1.8.2+cu113 and maybe torch==1.8.1+cu113 at some point since that's their LTS branch, they just haven't done it yet.

  • python3.10 is not compatible with torch 1.8. Maybe they'll backport it, but for the moment we have to maintain

    python_requires='>=3.6,<3.10',

  • There are 1.8.2 builds on the LTS repos, the non-LTS repos go 1.8.0, 1.8.1 then skip to 1.9.0.

    I wonder what the reason for the inconsistency is. The only intentional policy I could see for this is: "1.8 is dead, you only get updates to it if you're using the LTS repo, and the only updates are minor bugfixes or security patches". But even in that case surely they could add a symlink into the LTS repo from the non-LTS repo to have them share a file, and then everyone gets the bugfixes? Weird.

  • torch 1.8 through 1.10 support python3.6 through 3.9 well -- and CUDA 10 through CUDA 11.

    So we could (in a future PR) think about unpinning torch too?

    But there is an inconsistency: CUDA 10.2 goes up to torch 1.11 but CUDA 11.1 only goes up to torch 1.10; so, unpinned, someone who tried to get CUDA 11 might actually end up with CUDA 10, because under Python's version-ordering rules 1.10.2+cu111 < 1.11.0:

    CUDA 10.2 decides torch 1.11
    (py39) p115628@bireli:~$ pip install 'torch>=1.8' --extra-index-url https://download.pytorch.org/whl/cu102
    Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu102
    Collecting torch>=1.8
      Downloading https://download.pytorch.org/whl/cu102/torch-1.11.0%2Bcu102-cp39-cp39-linux_x86_64.whl (750.6 MB)
        ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/750.6 MB 10.3 MB/s eta 0:01:13
    
    CUDA 11.1 decides torch 1.11 from PyPI, which is actually CUDA 10.2
    (py39) p115628@bireli:~$ pip install 'torch>=1.8' --extra-index-url https://download.pytorch.org/whl/cu111
    Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu111
    Collecting torch>=1.8
      Using cached torch-1.11.0-cp39-cp39-manylinux1_x86_64.whl (750.6 MB)
    

    This is also a problem with 1.8, which is what you discovered above: because there's no torch==1.8.*+cu113 package, 1.8.1 on PyPI wins:

    CUDA 11.3 + torch~=1.8.0 decides torch from PyPI, which is actually CUDA 10.2
    (py39) p115628@bireli:~$ pip install 'torch~=1.8.0' --extra-index-url https://download.pytorch.org/whl/cu113
    Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
    Collecting torch~=1.8.0
      Using cached torch-1.8.1-cp39-cp39-manylinux1_x86_64.whl (804.1 MB)
    

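    The orderings above are just PEP 440: a local version label like +cu111 only breaks ties between otherwise-equal releases, so any higher release number from PyPI beats every +cuXXX build of an older release. A quick sanity check with the packaging library (assumed installed; pip vendors the same implementation for its own comparisons):

    ```python
    from packaging.version import Version

    # PEP 440: the local segment (+cu111) only outranks the *same* release
    # without a local segment; it never outranks a newer release.
    assert Version("1.10.2+cu111") < Version("1.11.0")
    assert Version("1.8.1") < Version("1.8.1+cu111") < Version("1.8.2")

    # So given both indexes, pip resolves torch>=1.8 to plain 1.11.0 from PyPI:
    print(max(Version(v) for v in ["1.10.2+cu111", "1.11.0"]))  # 1.11.0
    ```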
    I think the only way to avoid this is to cross-reference the output above, to make sure whatever commands are written into the ivadomed install instructions line up with the version of torch we're currently pinning. Because we're using 1.8.1, we can tell people that these are okay:

    pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cpu
    
    pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cu101 # but this is redundant
    
    pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cu102 # this is redundant; these packages are already on PyPI
    
    pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cu111
    
    pip install ivadomed --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu101 # but only if we change the pin to `torch~=1.8.1`, because there's currently only a 1.8.2 build here, weirdly; also see note about this being redundant
    
    pip install ivadomed --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu102 # ditto
    
    pip install ivadomed --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111
    

Ultimately we just want this to work for as many people as possible:

>>> torch.cuda.is_available()
True

if they have a GPU, and (as in spinalcordtoolbox/spinalcordtoolbox#3790) to avoid the wasted download (via --extra-index-url https://download.pytorch.org/whl/cpu) if they don't.

I don't know anything about Windows and GPUs, but as far as Linux goes, the builds on PyPI work for >90% of people, because they all come with CUDA 10.2 vendored inside -- that's why they're so large. It's only people with extremely recent GPUs that will need the helping hand of being told to use pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cu111.
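As a sketch of how that "helping hand" could be automated: the function below picks an index URL from the GPU's compute capability. `pick_torch_index` and the cutoff at compute capability 8.x are my own illustration, not anything ivadomed ships; the capability string would come from something like `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` on newer drivers.

```shell
# Hypothetical helper (not part of ivadomed): choose a pip index from the
# GPU's compute capability, so only the outliers pay for a custom torch build.
pick_torch_index() {
  cap="$1"   # e.g. "8.6"; empty means no NVIDIA GPU was found
  if [ -z "$cap" ]; then
    # no GPU: the CPU-only wheels avoid the ~700 MB CUDA download
    echo "https://download.pytorch.org/whl/cpu"
  elif [ "${cap%%.*}" -ge 8 ]; then
    # Ampere-class cards (sm_80 and up) need the CUDA 11 builds
    echo "https://download.pytorch.org/whl/cu111"
  else
    # everything else is covered by the CUDA 10.2 wheels already on PyPI
    echo "https://pypi.org/simple"
  fi
}

pick_torch_index "8.6"
```

which would then be used as `pip install ivadomed --extra-index-url "$(pick_torch_index "$cap")"`.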

Okay, gonna post this now and then let it settle and then maybe think how all this information should affect the installation docs.

Cheers 🍷

@kousu
Contributor Author

kousu commented May 26, 2022

This whole section

Screenshot 2022-05-26 at 17-38-10 Installation — ivadomed documentation

is backwards.

On this system, with GeForce GTX TITAN X cards, using PyPI's torch is fine:

bireli
p115628@bireli:~$ lspci -nn -vm -d 10DE::0300
Device: 01:00.0
Class:  VGA compatible controller [0300]
Vendor: NVIDIA Corporation [10de]
Device: GM200 [GeForce GTX TITAN X] [17c2]
SVendor:        eVga.com. Corp. [3842]
SDevice:        GM200 [GeForce GTX TITAN X] [2990]
PhySlot:        4
Rev:    a1
NUMANode:       0

Device: 02:00.0
Class:  VGA compatible controller [0300]
Vendor: NVIDIA Corporation [10de]
Device: GM200 [GeForce GTX TITAN X] [17c2]
SVendor:        NVIDIA Corporation [10de]
SDevice:        GM200 [GeForce GTX TITAN X] [1132]
PhySlot:        6
Rev:    a1
NUMANode:       0
p115628@bireli:~$ conda create -y -n py38- python==3.8
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.11.0
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base conda



## Package Plan ##

  environment location: /home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py38-

  added / updated specs:
    - python==3.8


The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu
  ca-certificates    conda-forge/linux-64::ca-certificates-2022.5.18.1-ha878542_0
  ld_impl_linux-64   conda-forge/linux-64::ld_impl_linux-64-2.36.1-hea4e1c9_2
  libffi             conda-forge/linux-64::libffi-3.2.1-he1b5a44_1007
  libgcc-ng          conda-forge/linux-64::libgcc-ng-12.1.0-h8d9b700_16
  libgomp            conda-forge/linux-64::libgomp-12.1.0-h8d9b700_16
  libstdcxx-ng       conda-forge/linux-64::libstdcxx-ng-12.1.0-ha89aaad_16
  libzlib            conda-forge/linux-64::libzlib-1.2.11-h166bdaf_1014
  ncurses            conda-forge/linux-64::ncurses-6.3-h27087fc_1
  openssl            conda-forge/linux-64::openssl-1.1.1o-h166bdaf_0
  pip                conda-forge/noarch::pip-22.1.1-pyhd8ed1ab_0
  python             conda-forge/linux-64::python-3.8.0-h357f687_5
  python_abi         conda-forge/linux-64::python_abi-3.8-2_cp38
  readline           conda-forge/linux-64::readline-8.1-h46c0cb4_0
  setuptools         conda-forge/linux-64::setuptools-62.3.2-py38h578d9bd_0
  sqlite             conda-forge/linux-64::sqlite-3.38.5-h4ff8645_0
  tk                 conda-forge/linux-64::tk-8.6.12-h27826a3_0
  wheel              conda-forge/noarch::wheel-0.37.1-pyhd8ed1ab_0
  xz                 conda-forge/linux-64::xz-5.2.5-h516909a_1
  zlib               conda-forge/linux-64::zlib-1.2.11-h166bdaf_1014



Downloading and Extracting Packages
ld_impl_linux-64-2.3 | 667 KB    | ############################################################################################################################ | 100% 
xz-5.2.5             | 343 KB    | ############################################################################################################################ | 100% 
readline-8.1         | 295 KB    | ############################################################################################################################ | 100% 
_libgcc_mutex-0.1    | 3 KB      | ############################################################################################################################ | 100% 
wheel-0.37.1         | 31 KB     | ############################################################################################################################ | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate py38-
#
# To deactivate an active environment, use
#
#     $ conda deactivate

p115628@bireli:~$ conda activate py38-
(py38-) p115628@bireli:~$ python -c 'import torch; print(torch.randn(30000, 10000, device=torch.device("cuda")))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'
(py38-) p115628@bireli:~$ pip install torch
Collecting torch
  Using cached torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl (750.6 MB)
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Installing collected packages: typing-extensions, torch
Successfully installed torch-1.11.0 typing-extensions-4.2.0
(py38-) p115628@bireli:~$ python -c 'import torch; print(torch.cuda.get_device_capability())'
/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py38-/lib/python3.8/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:68.)
  example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
(5, 2)
(py38-) p115628@bireli:~$ python -c 'import torch; print(torch.randn(30000, 10000, device=torch.device("cuda")))'
/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py38-/lib/python3.8/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:68.)
  example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
tensor([[-0.1883, -0.6621,  0.5812,  ..., -0.9289,  0.1802,  0.5258],
        [-1.7286, -2.0483,  1.2767,  ...,  0.3962,  0.1931, -1.0421],
        [ 1.1171,  0.8416,  0.2220,  ...,  0.5828,  0.1077,  0.5009],
        ...,
        [ 0.1562, -1.7454, -0.9132,  ..., -0.6853,  0.6716,  1.1607],
        [ 0.6536, -1.3897,  0.6077,  ...,  2.9900,  0.0282,  1.6492],
        [-0.6518,  0.6744,  0.6370,  ...,  1.3030, -0.1539,  1.6168]],
       device='cuda:0')

That (5, 2) refers to the compute capability of 5.2 that's listed on that site:

Screenshot 2022-05-26 at 17-51-47 CUDA GPUs - Compute Capability

It's not okay on this other system with RTX A6000 cards:

romane
p115628@romane:~$ lspci -nn -vm -d 10DE::0300
Device: 01:00.0
Class:  VGA compatible controller [0300]
Vendor: NVIDIA Corporation [10de]
Device: GA102GL [RTX A6000] [2230]
SVendor:        NVIDIA Corporation [10de]
SDevice:        GA102GL [RTX A6000] [1459]
Rev:    a1

Device: 41:00.0
Class:  VGA compatible controller [0300]
Vendor: NVIDIA Corporation [10de]
Device: GA102GL [RTX A6000] [2230]
SVendor:        NVIDIA Corporation [10de]
SDevice:        GA102GL [RTX A6000] [1459]
Rev:    a1

Device: 81:00.0
Class:  VGA compatible controller [0300]
Vendor: NVIDIA Corporation [10de]
Device: GA102GL [RTX A6000] [2230]
SVendor:        NVIDIA Corporation [10de]
SDevice:        GA102GL [RTX A6000] [1459]
Rev:    a1

Device: c1:00.0
Class:  VGA compatible controller [0300]
Vendor: NVIDIA Corporation [10de]
Device: GA102GL [RTX A6000] [2230]
SVendor:        NVIDIA Corporation [10de]
SDevice:        GA102GL [RTX A6000] [1459]
Rev:    a1
p115628@romane:~$ conda create -n py38- -y python==3.8
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: done


==> WARNING: A newer version of conda exists. <==
  current version: 4.11.0
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base conda



## Package Plan ##

  environment location: /home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py38-

  added / updated specs:
    - python==3.8


The following NEW packages will be INSTALLED:

  _libgcc_mutex      conda-forge/linux-64::_libgcc_mutex-0.1-conda_forge
  _openmp_mutex      conda-forge/linux-64::_openmp_mutex-4.5-2_gnu
  ca-certificates    conda-forge/linux-64::ca-certificates-2022.5.18.1-ha878542_0
  ld_impl_linux-64   conda-forge/linux-64::ld_impl_linux-64-2.36.1-hea4e1c9_2
  libffi             conda-forge/linux-64::libffi-3.2.1-he1b5a44_1007
  libgcc-ng          conda-forge/linux-64::libgcc-ng-12.1.0-h8d9b700_16
  libgomp            conda-forge/linux-64::libgomp-12.1.0-h8d9b700_16
  libstdcxx-ng       conda-forge/linux-64::libstdcxx-ng-12.1.0-ha89aaad_16
  libzlib            conda-forge/linux-64::libzlib-1.2.11-h166bdaf_1014
  ncurses            conda-forge/linux-64::ncurses-6.3-h27087fc_1
  openssl            conda-forge/linux-64::openssl-1.1.1o-h166bdaf_0
  pip                conda-forge/noarch::pip-22.1.1-pyhd8ed1ab_0
  python             conda-forge/linux-64::python-3.8.0-h357f687_5
  python_abi         conda-forge/linux-64::python_abi-3.8-2_cp38
  readline           conda-forge/linux-64::readline-8.1-h46c0cb4_0
  setuptools         conda-forge/linux-64::setuptools-62.3.2-py38h578d9bd_0
  sqlite             conda-forge/linux-64::sqlite-3.38.5-h4ff8645_0
  tk                 conda-forge/linux-64::tk-8.6.12-h27826a3_0
  wheel              conda-forge/noarch::wheel-0.37.1-pyhd8ed1ab_0
  xz                 conda-forge/linux-64::xz-5.2.5-h516909a_1
  zlib               conda-forge/linux-64::zlib-1.2.11-h166bdaf_1014


Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate py38-
#
# To deactivate an active environment, use
#
#     $ conda deactivate

p115628@romane:~$ conda activate py38-
(py38-) p115628@romane:~$ pip install torch
Collecting torch
  Using cached torch-1.11.0-cp38-cp38-manylinux1_x86_64.whl (750.6 MB)
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Installing collected packages: typing-extensions, torch
Successfully installed torch-1.11.0 typing-extensions-4.2.0
(py38-) p115628@romane:~$ python -c 'import torch; print(torch.cuda.get_device_capability())'
/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py38-/lib/python3.8/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:68.)
  example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py38-/lib/python3.8/site-packages/torch/cuda/__init__.py:145: UserWarning: 
NVIDIA RTX A6000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A6000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
(8, 6)
(py38-) p115628@romane:~$ python -c 'import torch; print(torch.randn(30000, 10000, device=torch.device("cuda")))'
/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py38-/lib/python3.8/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:68.)
  example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py38-/lib/python3.8/site-packages/torch/cuda/__init__.py:145: UserWarning: 
NVIDIA RTX A6000 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A6000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "<string>", line 1, in <module>
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Screenshot 2022-05-26 at 17-57-08 CUDA GPUs - Compute Capability

The A6000's compute capability of 8.6 is actually too new for the CUDA 10.2 build:

The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA RTX A6000 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

i.e. it only goes up to compute capability 7.0, not up to 8.6

I did #951 8 months ago specifically so people here in the lab could make use of romane. Somewhere along the way, I assume, that PR morphed into the install instructions saying "requires CUDA 11".

To repeat myself, I have a better solution now, if we can use it: I've made sure that system has the latest torch (with CUDA 11) installed system-wide

https://github.com/neuropoly/computers/blob/6c65de3ec3b0668474382d6e25f13b414f01d3e5/ansible/roles/neuropoly.gpu/tasks/main.yml#L48-L51

so simply not being in a venv makes it work:

(py38-) p115628@romane:~$ conda deactivate
p115628@romane:~$ python -c 'import torch; print(torch.randn(30000, 10000, device=torch.device("cuda")))'
tensor([[ 0.4567,  0.4432, -1.3155,  ...,  0.5862,  0.6272,  0.0838],
        [-0.2455, -1.7399,  1.0717,  ..., -1.4235,  0.1820, -1.9642],
        [-1.1659, -0.6563,  1.7811,  ...,  1.6540,  0.2502, -0.8320],
        ...,
        [-0.0515,  0.7311, -1.4324,  ...,  1.0228,  0.3162, -1.0383],
        [-0.1775, -0.5633,  0.5970,  ...,  0.6630,  1.1929, -0.4902],
        [-0.1143, -0.1719, -0.1860,  ..., -0.0066,  0.7317,  0.6888]],
       device='cuda:0')
p115628@romane:~$ python -c 'import torch; print(torch.__version__)'
1.11.0+cu113

This doesn't quite work because ivadomed tells everyone to start by making an empty env, so my system-wide install gets ignored. But it's still what I want to encourage: people should just use what's on PyPI, and install it system-wide (or at most, pip install --user). It saves time, bandwidth, storage (a lot of storage), user frustration, and debugging. 90%+ of people will have a GPU supported by the torch that is sitting on PyPI. CUDA 10.2 is on PyPI because it's the most widely supported version of CUDA at the moment. romane is an extreme outlier, and exceptions like those need to have sysadmins that can make sure the right torch gets installed.

With that in mind, I just discovered a way to do that for us. Here's my much better fix for romane: use /etc/pip.conf to make the extra repos apply to all envs, not just the system env:

(py38-) p115628@romane:~$ cat /etc/pip.conf 
[install]
extra-index-url = 
    https://download.pytorch.org/whl/cu111
    https://download.pytorch.org/whl/cu113

This is a system-specific patch, needed until CUDA 11 is common enough that it gets rolled into the packages on PyPI. It's something the person who owns the hardware needs to be aware of; ivadomed shouldn't be telling its users about this at all.
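Since /etc/pip.conf is plain INI, its structure can be sanity-checked with Python's stdlib configparser. This is just a sketch: it inlines the same two-URL config shown above (rather than reading /etc/pip.conf) so it runs anywhere; pip treats the multi-line value as a whitespace-separated list of URLs.

```python
import configparser

# The same config as on romane, inlined for illustration.
PIP_CONF = """
[install]
extra-index-url =
    https://download.pytorch.org/whl/cu111
    https://download.pytorch.org/whl/cu113
"""

parser = configparser.ConfigParser()
parser.read_string(PIP_CONF)

# configparser joins the indented continuation lines; splitting on
# whitespace recovers the URL list the way pip interprets it.
urls = parser.get("install", "extra-index-url").split()
print(urls)
```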

Demo that this behaves:
(py38-) p115628@romane:~$ pip uninstall -y torch
Found existing installation: torch 1.11.0
Uninstalling torch-1.11.0:
  Successfully uninstalled torch-1.11.0
(py38-) p115628@romane:~$ pip install torch
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu111, https://download.pytorch.org/whl/cu113
Collecting torch
  Downloading https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp38-cp38-linux_x86_64.whl (1637.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 GB 3.0 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in ./.conda/envs/py38-/lib/python3.8/site-packages (from torch) (4.2.0)
Installing collected packages: torch
Successfully installed torch-1.11.0+cu113
(py38-) p115628@romane:~$ python -c 'import torch; print(torch.randn(30000, 10000, device=torch.device("cuda")))'
/home/GRAMES.POLYMTL.CA/p115628/.conda/envs/py38-/lib/python3.8/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:68.)
  example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
tensor([[ 2.0732,  0.3504,  0.4453,  ..., -2.1887, -0.4107,  1.8200],
        [ 0.9452,  0.7018, -0.9494,  ..., -2.2149, -0.1643,  0.6272],
        [-0.1187, -0.0083,  0.4826,  ...,  0.4657, -1.0855, -0.6287],
        ...,
        [ 1.6792, -0.5234, -0.5107,  ...,  0.4370, -0.7986, -0.9397],
        [-0.2385, -1.0306,  0.7024,  ...,  0.8448, -1.2321, -1.9301],
        [ 0.8152, -0.6241, -1.0984,  ..., -1.3339,  0.6821, -1.3811]],
       device='cuda:0')

I'm going to apply that patch to romane, and then we should back #951 (and the messes it caused) out of ivadomed, including all of install section 3 :P

Even better, if you give a broader set of repos, you cover most possible torch versions:

root@romane:~# cat /etc/pip.conf 
[install]
# the NVIDIA RTX A6000 GPUs in this system are *too new*:
# they are "compute capability" 8.6, which is greater than the 7.0-compatible
# CUDA 10.2 libraries that that torch's standard distribution bundles.
# But torch provides these alternate repos with CUDA 11.x versions,
# if you need them:
extra-index-url = 
    https://download.pytorch.org/whl/cu110
    https://download.pytorch.org/whl/cu111
    https://download.pytorch.org/whl/cu112
    https://download.pytorch.org/whl/cu113
    https://download.pytorch.org/whl/cu114
(py38-) p115628@romane:~$ pip install torch~=1.7.0
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu110, https://download.pytorch.org/whl/cu111, https://download.pytorch.org/whl/cu112, https://download.pytorch.org/whl/cu113, https://download.pytorch.org/whl/cu114
Collecting torch~=1.7.0
  Downloading https://download.pytorch.org/whl/cu110/torch-1.7.1%2Bcu110-cp38-cp38-linux_x86_64.whl (1156.8 MB)
     ━━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/1.2 GB 117.2 MB/s eta 0:00:09
ERROR: Operation cancelled by user
(py38-) p115628@romane:~$ pip install torch~=1.8.0
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu110, https://download.pytorch.org/whl/cu111, https://download.pytorch.org/whl/cu112, https://download.pytorch.org/whl/cu113, https://download.pytorch.org/whl/cu114
Collecting torch~=1.8.0
  Downloading https://download.pytorch.org/whl/cu111/torch-1.8.1%2Bcu111-cp38-cp38-linux_x86_64.whl (1982.2 MB)
     ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/2.0 GB 69.7 MB/s eta 0:00:28^CERROR: Operation cancelled by user
     ━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/2.0 GB 69.7 MB/s eta 0:00:28
(py38-) p115628@romane:~$ pip install torch~=1.9.0
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu110, https://download.pytorch.org/whl/cu111, https://download.pytorch.org/whl/cu112, https://download.pytorch.org/whl/cu113, https://download.pytorch.org/whl/cu114
Collecting torch~=1.9.0
  Downloading https://download.pytorch.org/whl/cu111/torch-1.9.1%2Bcu111-cp38-cp38-linux_x86_64.whl (2041.3 MB)
     ━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/2.0 GB 21.1 MB/s eta 0:01:34
ERROR: Operation cancelled by user
(py38-) p115628@romane:~$ pip install torch~=1.10.0
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu110, https://download.pytorch.org/whl/cu111, https://download.pytorch.org/whl/cu112, https://download.pytorch.org/whl/cu113, https://download.pytorch.org/whl/cu114
Collecting torch~=1.10.0
  Downloading https://download.pytorch.org/whl/cu113/torch-1.10.2%2Bcu113-cp38-cp38-linux_x86_64.whl (1821.4 MB)
     ━━╸━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0.1/1.8 GB 28.7 MB/s eta 0:01:00
ERROR: Operation cancelled by user
(py38-) p115628@romane:~$ pip install torch~=1.11.0
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu110, https://download.pytorch.org/whl/cu111, https://download.pytorch.org/whl/cu112, https://download.pytorch.org/whl/cu113, https://download.pytorch.org/whl/cu114
Collecting torch~=1.11.0
  Using cached https://download.pytorch.org/whl/cu113/torch-1.11.0%2Bcu113-cp38-cp38-linux_x86_64.whl (1637.0 MB)
Requirement already satisfied: typing-extensions in ./.conda/envs/py38-/lib/python3.8/site-packages (from torch~=1.11.0) (4.2.0)
Installing collected packages: torch
^CERROR: Operation cancelled by user

Notice how with this in place 1.7 -> CUDA 11.0, 1.8 -> CUDA 11.1, 1.9 -> CUDA 11.1, 1.10 -> CUDA 11.3, 1.11 -> CUDA 11.3.
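That resolution, observed from the downloads above, can be codified as a small lookup table (purely descriptive of these logs; pip doesn't expose anything like this):

```python
# Which CUDA build pip picked for each torch series, per the logs above,
# given the broad extra-index-url list in /etc/pip.conf.
TORCH_TO_CUDA = {
    "1.7": "cu110",
    "1.8": "cu111",
    "1.9": "cu111",
    "1.10": "cu113",
    "1.11": "cu113",
}

for torch_series, cuda in TORCH_TO_CUDA.items():
    print(f"torch~={torch_series}.0 -> https://download.pytorch.org/whl/{cuda}")
```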

kousu added a commit that referenced this pull request May 26, 2022
By using ~=, we get broader compatibility: on systems using `--index-url https://download.pytorch.org/whl/lts/1.8/cu102` or `--index-url https://download.pytorch.org/whl/lts/1.8/cu111` people will get 1.8.2 instead of 1.8.1, and will continue to get patches into the future so long as 1.8.x is maintained.

See #1129 (comment) for the data that went into this.
@kousu
Contributor Author

kousu commented May 26, 2022

Actually, @kanishk16, would it be okay if we merged this without editing the documentation? We can start a follow-up PR to strip down the install docs. The sooner this is merged the sooner I can also merge spinalcordtoolbox/spinalcordtoolbox#3790.

Feel free to work in your suggestions to setup.py. I just think that we shouldn't touch installation.md (and maybe I should even revert the changes I did make): technical writing is really time consuming to do well and will stall this PR for another couple weeks at least.

@kanishk16
Contributor

kanishk16 commented May 27, 2022

Summary:

  • The only realistically (i.e. widely) supported CUDA versions are CUDA 10.1, CUDA 10.2, CUDA 11.1 and CUDA 11.3
    There's a few CUDA 10.0 builds: 1.3.0+cu100, 1.3.1+cu100, 1.4.0+cu100, and a couple CUDA 11.0 builds: 1.7.0+cu110, 1.7.1+cu110, and a single lone CUDA 11.5 build: 1.11.0+cu115

Out of all the widely supported CUDA versions, I'm inclined towards supporting CUDA 10.2, CUDA 11.1 and CUDA 11.3.

  • The CUDA 11.3 builds are for torch>=1.10 at the moment.
    I bet they'll backport it to torch==1.8.2+cu113 and maybe torch==1.8.1+cu113 at some point since that's their LTS branch, they just haven't done it yet.

Maybe or maybe not... With the release of 1.11.0, I'm afraid they could even change their LTS to 1.10.0 as it supports CUDA 10.2, CUDA 11.1 and CUDA 11.3 🤷‍♂️

  • python3.10 is not compatible with torch 1.8. Maybe they'll backport it, but for the moment we have to maintain
    python_requires='>=3.6,<3.10',
  • There are 1.8.2 builds on the LTS repos; the non-LTS repos go 1.8.0, 1.8.1, then skip to 1.9.0.
    I wonder what the reason for the inconsistency is. The only intentional policy I could see for this is: "1.8 is dead, you only get updates to it if you're using the LTS repo, and the only updates are minor bugfixes or security patches". But even in that case surely they could add a symlink into the LTS repo from the non-LTS repo to have them share a file, and then everyone gets the bugfixes? Weird.

Yup even I observed the same.

  • torch 1.8 through 1.10 well support python3.6 through 3.9 -- and CUDA 10 through CUDA 11.
    So we could (in a future PR) think about unpinning torch too?

Definitely loosening up torch a little more.

But there is an inconsistency: CUDA 10.2 goes up to torch 1.11 but CUDA 11.1 only goes up to torch 1.10; so, unpinned, someone who tried to get CUDA 11 might actually end up with a CUDA 10 build, because in version ordering 1.10.2+cu111 < 1.11.0:
CUDA 10.2 decides torch 1.11

(py39) p115628@bireli:~$ pip install 'torch>=1.8' --extra-index-url https://download.pytorch.org/whl/cu102
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu102
Collecting torch>=1.8
  Downloading https://download.pytorch.org/whl/cu102/torch-1.11.0%2Bcu102-cp39-cp39-linux_x86_64.whl (750.6 MB)
    ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.6/750.6 MB 10.3 MB/s eta 0:01:13

CUDA 11.1 decides torch 1.11 from PyPI, which is actually CUDA 10.2

(py39) p115628@bireli:~$ pip install 'torch>=1.8' --extra-index-url https://download.pytorch.org/whl/cu111
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu111
Collecting torch>=1.8
  Using cached torch-1.11.0-cp39-cp39-manylinux1_x86_64.whl (750.6 MB)

Looks like it's installing torch 1.11, but the CPU version

This is also a problem with 1.8, which is what you discovered above: because there's no torch==1.8.*+cu113 package, 1.8.1 on pypi wins:
CUDA 11.3 + torch~=1.8.0 decides torch from PyPI, which is actually CUDA 10.2

(py39) p115628@bireli:~$ pip install 'torch~=1.8.0' --extra-index-url https://download.pytorch.org/whl/cu113
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
Collecting torch~=1.8.0
  Using cached torch-1.8.1-cp39-cp39-manylinux1_x86_64.whl (804.1 MB)

Even here, if I'm not wrong, it appears to be installing the CPU version
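Why 1.10.2+cu111 loses to plain 1.11.0 follows from PEP 440 ordering: the release segment is compared first, and a local tag like +cu111 only breaks ties between otherwise identical releases. A minimal stdlib model of that rule, sufficient for these simple version strings (real pip applies the full spec via its vendored packaging library):

```python
def pep440_key(version: str):
    """Simplified PEP 440 sort key: release tuple first, then the
    local segment (e.g. '+cu111'), which only acts as a tie-breaker."""
    release, _, local = version.partition("+")
    return tuple(int(part) for part in release.split(".")), (local != "", local)

# The release segment dominates, so the cu111 build loses to plain 1.11.0 ...
assert pep440_key("1.10.2+cu111") < pep440_key("1.11.0")
# ... but with identical releases, the +local build sorts higher, which is
# why torch-1.8.1+cu102 beats plain torch-1.8.1 when both indexes offer it.
assert pep440_key("1.8.1+cu102") > pep440_key("1.8.1")
print("ordering matches what pip did above")
```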

I think the only way to avoid this is to cross-reference the output above, to make sure that whatever instructions end up in the ivadomed install docs line up with the version of torch you're currently using. Because we're using 1.8.1, we can tell people that these are okay:

pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cpu

pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cu101 # but this is redundant
pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cu102 # this is redundant; these packages are already on PyPI

I'm not sure if I follow, perhaps I'm missing something but:

(pypicu102) [kanis GitBox ]$ pip install 'torch~=1.8.0'
Collecting torch~=1.8.0
  Using cached torch-1.8.1-cp39-cp39-win_amd64.whl (190.5 MB)
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting numpy
  Using cached numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
Installing collected packages: typing-extensions, numpy, torch
Successfully installed numpy-1.22.4 torch-1.8.1 typing-extensions-4.2.0
(pypicu102) [kanis GitBox ]$ python -c 'import torch; print(torch.version.cuda)'
None
(pypicu102) [kanis GitBox ]$ python -c 'import torch; print(torch.cuda.is_available())'
False
(cu102) [kanis GitBox ]$ pip install 'torch~=1.8.0' --extra-index-url https://download.pytorch.org/whl/cu102
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu102
Collecting torch~=1.8.0
  Using cached https://download.pytorch.org/whl/cu102/torch-1.8.1%2Bcu102-cp39-cp39-win_amd64.whl (1387.0 MB)
Collecting numpy
  Using cached numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
Requirement already satisfied: typing-extensions in c:\users\kanis\anaconda3\envs\cu102\lib\site-packages (from torch~=1.8.0) (4.2.0)
Installing collected packages: numpy, torch
  Attempting uninstall: torch
    Found existing installation: torch 1.9.1
    Uninstalling torch-1.9.1:
      Successfully uninstalled torch-1.9.1
Successfully installed numpy-1.22.4 torch-1.8.1+cu102
(cu102) [kanis GitBox ]$ python -c 'import torch; print(torch.version.cuda)'
10.2

pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cu111

pip install ivadomed --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu101 # but only if we change the pin to `torch~=1.8.1`, because there's currently only a 1.8.2 build here, weirdly; also see note about this being redundant
pip install ivadomed --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu102 # ditto

I don't follow the above two either...

pip install ivadomed --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu111

Ultimately we just want this to work for as many people as possible:

>>> torch.cuda.is_available()
True

I second this.

if they have a GPU and (as in spinalcordtoolbox/spinalcordtoolbox#3790) to avoid the wasted download (--extra-index-url https://download.pytorch.org/whl/cpu) if they don't.

I don't know anything about Windows and GPUs, but as far as Linux goes, the builds on PyPI work for >90% of people, because they all come with CUDA 10.2 vendored inside -- that's why they're so large. It's only people with extremely recent GPUs that will need the helping hand of being told to use pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cu111.

🤔

Cheers 🍷

It's definitely a pleasure working with you @kousu!

@kanishk16
Contributor

Actually, @kanishk16, would it be okay if we merged this without editing the documentation? We can start a follow-up PR to strip down the install docs. The sooner this is merged the sooner I can also merge spinalcordtoolbox/spinalcordtoolbox#3790.

Guess what, I had the same thing in mind. Moreover, I started #1125 as a doc PR for #1000. After this finalizes, we could update it accordingly.

Feel free to work in your suggestions to setup.py. I just think that we shouldn't touch installation.md (and maybe I should even revert the changes I did make): technical writing is really time consuming to do well and will stall this PR for another couple weeks at least.

Great!

@kousu
Contributor Author

kousu commented May 27, 2022

CUDA 11.1 decides torch 1.11 from PyPI, which is actually CUDA 10.2

  Using cached torch-1.11.0-cp39-cp39-manylinux1_x86_64.whl (750.6 MB)

Looks like it's installing torch 1.11 but cpu version

  Using cached torch-1.8.1-cp39-cp39-manylinux1_x86_64.whl (804.1 MB)

Even here, if I'm not wrong it appears it's installing cpu version

On Linux these are the CUDA versions. The clue is that they're ~800MB: much larger than the cpu builds. Here's an example from yesterday:

(py39) p115628@bireli:~$ pip install torch
Collecting torch
  Downloading torch-1.11.0-cp39-cp39-manylinux1_x86_64.whl (750.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 750.6/750.6 MB 1.7 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in ./.conda/envs/py39/lib/python3.9/site-packages (from torch) (4.2.0)
Installing collected packages: torch
Successfully installed torch-1.11.0
(py39) p115628@bireli:~$ python -c 'import torch; print(torch.version.cuda)'
10.2
(py39) p115628@bireli:~$ pip install 'torch~=1.8.0'
Collecting torch~=1.8.0
  Using cached torch-1.8.1-cp39-cp39-manylinux1_x86_64.whl (804.1 MB)
Requirement already satisfied: numpy in ./.conda/envs/py39/lib/python3.9/site-packages (from torch~=1.8.0) (1.22.4)
Requirement already satisfied: typing-extensions in ./.conda/envs/py39/lib/python3.9/site-packages (from torch~=1.8.0) (4.2.0)
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 1.11.0
    Uninstalling torch-1.11.0:
      Successfully uninstalled torch-1.11.0
Successfully installed torch-1.8.1
(py39) p115628@bireli:~$ python -c 'import torch; print(torch.version.cuda)'
10.2
pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cu101 # but this is redundant
pip install ivadomed --extra-index-url https://download.pytorch.org/whl/cu102 # this is redundant; these packages are already on PyPI

So what I was saying is: on Linux there's never a reason to use https://download.pytorch.org/whl/cu10* because PyPI has those files already; on Linux you'd only ever need https://download.pytorch.org/whl/cpu or https://download.pytorch.org/whl/cu11*.

pip install ivadomed --extra-index-url https://download.pytorch.org/whl/lts/1.8/cu101 # but only if we change the pin to `torch~=1.8.1`, because there's currently only a 1.8.2 build here, weirdly; also see note about this being redundant

I don't follow the above two as well...

Sorry, I was getting tired by the time I wrote that. I just meant: the cu101 repo has torch==1.8.2+cu101, but not torch==1.8.1+cu101. It's a minor detail really, and I fixed it with:

torch~=1.8.1
torchvision~=0.9.1

I'm not sure if I follow, perhaps I'm missing something but:

(pypicu102) [kanis GitBox ]$ pip install 'torch~=1.8.0'
Collecting torch~=1.8.0
  Using cached torch-1.8.1-cp39-cp39-win_amd64.whl (190.5 MB)
Collecting typing-extensions
  Using cached typing_extensions-4.2.0-py3-none-any.whl (24 kB)
Collecting numpy
  Using cached numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
Installing collected packages: typing-extensions, numpy, torch
Successfully installed numpy-1.22.4 torch-1.8.1 typing-extensions-4.2.0
(pypicu102) [kanis GitBox ]$ python -c 'import torch; print(torch.version.cuda)'
None
(pypicu102) [kanis GitBox ]$ python -c 'import torch; print(torch.cuda.is_available())'
False
(cu102) [kanis GitBox ]$ pip install 'torch~=1.8.0' --extra-index-url https://download.pytorch.org/whl/cu102
Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu102
Collecting torch~=1.8.0
  Using cached https://download.pytorch.org/whl/cu102/torch-1.8.1%2Bcu102-cp39-cp39-win_amd64.whl (1387.0 MB)
Collecting numpy
  Using cached numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
Requirement already satisfied: typing-extensions in c:\users\kanis\anaconda3\envs\cu102\lib\site-packages (from torch~=1.8.0) (4.2.0)
Installing collected packages: numpy, torch
  Attempting uninstall: torch
    Found existing installation: torch 1.9.1
    Uninstalling torch-1.9.1:
      Successfully uninstalled torch-1.9.1
Successfully installed numpy-1.22.4 torch-1.8.1+cu102
(cu102) [kanis GitBox ]$ python -c 'import torch; print(torch.version.cuda)'
10.2

Ah okay!! So you have a Windows machine with working GPUs. Excellent. I only have access to Linux machines with GPUs. (I can probably work on rigging up Windows inside qemu with GPU passthrough?)

So Windows users do need https://download.pytorch.org/whl/cu10*. 🤔

This weird decision of torch's -- to make CUDA the default for their Linux builds but not for Windows -- is the reason I got into this issue in the first place, way back 2 years ago in spinalcordtoolbox/spinalcordtoolbox#2712. I think they must assume that if you're using Linux you're hardcore and/or rich and/or using a data center, where you can afford to waste 700MB if you don't have GPUs, but if you do have GPUs then the time spent remembering how to add --extra-index-url .... costs more than just wasting the 700MB; while if you're using Windows, you're probably a newbie and need to be hand-held. torch is Facebook's project, designed for Facebook Research, and their researchers are probably using Windows or macOS locally, without GPUs, but deploying on their internal data centers, which have both GPUs and very fast internet connections.

The rest of us have to work around their quirky assumptions.

Could Windows users also use https://download.pytorch.org/whl/cu111? Is there any harm in directing all Windows users to CUDA 11? If so the install instructions can be reduced to:

Run

pip install --upgrade pip
pip install ivadomed

If you have GPUs to accelerate your computations, verify they are working with:

python -c 'import torch; print(torch.cuda.is_available())'

If they are not, try

pip uninstall -y torch torchvision
pip install --extra-index-url https://download.pytorch.org/whl/cu111 ivadomed

because then that catches both people with Windows GPUs and Linux people with new GPUs with compute capability > 7.0.

And anyway ivadomed only really needs GPUs for training -- inference can be done on the CPU, and we're not really expecting Windows people to be participating in training, that's what we have romane for in the first place.

So somewhere I'd also like to include

If you do not own any GPUs, you should instead

pip install --extra-index-url https://download.pytorch.org/whl/cpu ivadomed

Together that covers as many combinations of OSes and hardware as possible, without overwhelming people with specific instructions. We can drop the "ivadomed needs CUDA 11" and the "ComputeCanada HPC" and the "ivadomed supports GPUs on Windows/Linux" parts.
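The decision tree in this proposal can be summed up as a toy helper (illustrative only: the function name is made up and nothing like it exists in ivadomed; the URL suffixes are the real torch repos discussed above):

```python
def torch_index_hint(os_name: str, has_gpu: bool, cuda11_only_gpu: bool = False) -> str:
    """Return the --extra-index-url (if any) to suggest, per the plan above:
    plain PyPI covers most cases; CPU-only users can skip the CUDA bloat;
    Windows GPUs and very new (compute capability > 7.0) GPUs need cu111."""
    if not has_gpu:
        return "https://download.pytorch.org/whl/cpu"
    if os_name == "windows" or cuda11_only_gpu:
        return "https://download.pytorch.org/whl/cu111"
    # Plain PyPI: the default Linux builds already bundle CUDA 10.2.
    return ""

print(torch_index_hint("linux", has_gpu=False))
print(torch_index_hint("windows", has_gpu=True))
print(torch_index_hint("linux", has_gpu=True, cuda11_only_gpu=True))  # e.g. romane's A6000s
```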

@kousu
Contributor Author

kousu commented May 27, 2022

And I've dropped the docs changes. Once we get this finalized then we can tackle summarizing what we learned in #1125 :)

@kousu kousu mentioned this pull request May 27, 2022
7 tasks
@kousu kousu merged commit 2f75bb1 into master Jun 2, 2022
@kousu kousu deleted the ng/cu113 branch June 2, 2022 07:10
@dyt811 dyt811 added this to the new release milestone Jun 2, 2022
kousu added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this pull request Jun 2, 2022
#3790 won't work
right until ivadomed/ivadomed#1129 is reviewed
and published; but this lets us at least test in the meantime.

This commit should be reverted before this branch is merged.
kousu added a commit to axondeepseg/axondeepseg that referenced this pull request Jul 25, 2022
ivadomed now <ivadomed/ivadomed#1129> specifies torch
as a standard dependency, without giving special manual instructions for it.

This means there's finally no need to pin a torch version here in axondeepseg;
previously, we had to match their version in our environment.yml because it was
a manual step in their install instructions. But now the metadata makes pip handle it.

ivadomed gave special install instructions to try to handle the complicated mess that is
hardware compatibility (i.e. CPU vs GPU vs hyper-modern CUDA-11-only GPUs).
But now they're just leaning on the 80%-20% rule of thumb: >80% of people are covered
by the default torch builds. And we know how to handle the <20% cases of people with
CUDA-11-only GPUs or Windows machines with GPUs by having them install with
'--extra-index-url = https://download.pytorch.org/whl/<hardware-specific-build>'

axondeepseg was pushing the CPU version of torch on people, assuming that most users
would not own GPUs and that in the Linux case it would be very wasteful, maybe even
make ADS uninstallable for some users. We gave our own manual instructions that
duplicated ivadomed's for how to switch to the GPU version. That process meant double
the downloads and likely mismatched versions, and maintenance work on our end
as we would have to update the manual instructions whenever ivadomed did.

Instead, follow torch's defaults, like ivadomed now does. But since Linux users with
CPU-only hardware are still fairly common, highlight --extra-index-url as an
optional *pre*step, allowing for the same final install but avoiding the double-download
and likely versioning mistakes.

Overall:

- Windows and macOS continue to have no GPU support
- Linux always has GPU support, even if it doesn't need it; this balloons the
  default install, but anyone using torch now has this same problem and the
  consensus seems to be just to pay for bigger disks that can tolerate the waste
- The relatively common case of Linux users without GPUs has an escape
  hatch to avoid the large download
- Windows users with GPUs have a way to use them
  (`--extra-index-url https://download.pytorch.org/whl/cu102`), but we don't
  document nor support this
- Linux/Windows users with super new CUDA-11-only GPUs also have a
  way to use them, which we also don't document or support directly

Fixes ivadomed/ivadomed#861 (comment)

Related: spinalcordtoolbox/spinalcordtoolbox#3790

In passing, remove the redundant parts of the `conda env create` line.

Co-authored-by: Mathieu Boudreau, PhD <emb6150@gmail.com>
Co-authored-by: Armand <83031821+hermancollin@users.noreply.github.com>
Labels
dependencies Pull requests that update a dependency file installation category: installation-related stuff
5 participants