unify installation (attempt #2) #1129
Conversation
Drat. Good point. Maybe it's not so bad? I think while they only updated their documentation at that version, the older versions are available in this repo ("index") too. It's not obvious from clicking on https://download.pytorch.org/whl/cu113 what's available, but I realized I could get
So actually, this form works:
So the trick is just that they don't have builds for every combination of CUDA/torch, only for the ones that overlapped in support in time. So I shouldn't have bumped this to CUDA 11.3 at the same time, actually; I'll undo that.
#3790 won't work right until ivadomed/ivadomed#1129 is reviewed and published; but this lets us at least test in the meantime. This commit should be reverted before this branch is merged.
@kousu I packaged & published a vanilla PyPI package, TINX, for an end-to-end sanity check. TL;DR: the
Also, I wanted to see if we could unpin CUDA 11. And it kinda works: installing TINX on CPU
installing TINX on CUDA 10.2
installing TINX on CUDA 11.1
Unfortunately, this doesn't work for CUDA 11.3: installing TINX on CUDA 11.3
Weirdly, torch 1.8.1 and 1.9.0 both work on CUDA 10.2 and CUDA 11.1: installing torch on CUDA 10.2
installing torch on CUDA 11.1
installing torch on CUDA 11.3
This is expected, as PyTorch 1.8.2 doesn't build wheels for CUDA 11.3. Anyway, we could at least unpin CUDA 11. Since there aren't many changes required, I guess we could expand the scope of this PR to include that prospect as well. Hope that isn't an issue. As far as this PR is concerned, I'm inclined towards:
What do you think? If you agree, I'll update the PR with some extras_require for dev-related dependencies from #1000, and then maybe you could test this PR on the neuropoly infra to ensure we aren't breaking anything.
I think that's a great idea! spinalcordtoolbox/spinalcordtoolbox#3790 is currently pointed at this branch, so after you do your updates I should be able to just re-trigger our CI there to see how it behaves. Let me know when that's done :) In the meantime I'm going to try out your TINX package on some different platforms to make sure I understand myself how torch's various packages behave.
You're great to work with, @kanishk16. Inspired by your experiments above, I wrote a script that can scan what the torch repos currently publish. It uses the trick that
I tested it under different platforms (there's a lot of information here but I'll do my best to summarize it for you): python3.10 on Linux
python3.9 on Linux
python3.8 on Linux
python3.7 on Linux
python3.6 on Linux
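The script itself isn't preserved in this scrape, but the trick it mentions is presumably scraping the repo index pages: download.pytorch.org/whl/&lt;repo&gt;/torch/ serves a plain HTML list of wheel links, and the filenames encode the torch version, build tag, and Python version. A minimal sketch of that idea (the URL layout and the regex here are assumptions, not the author's actual script):

```python
# Sketch: extract (torch version, build tag, python tag) triples from a
# torch repo index page. The repos publish pages of <a href="torch-...whl">
# links, with the CUDA/CPU tag embedded in the filename after "+" (often
# URL-encoded as %2B).
import re

WHEEL_RE = re.compile(
    r"torch-(?P<version>[0-9.]+)"      # e.g. 1.8.1
    r"(?:%2B|\+)(?P<tag>[a-z0-9.]+)"   # build tag, e.g. cu111 or cpu
    r"-(?P<python>cp\d+)"              # python tag, e.g. cp38
)

def scan(index_html):
    """Return the sorted set of builds advertised by an index page."""
    return sorted({m.group("version", "tag", "python")
                   for m in WHEEL_RE.finditer(index_html)})

# Example with a snippet of the kind of HTML these indexes return:
sample = """
<a href="torch-1.8.1%2Bcu111-cp38-cp38-linux_x86_64.whl">...</a>
<a href="torch-1.8.2%2Bcu111-cp39-cp39-linux_x86_64.whl">...</a>
"""
print(scan(sample))
```

In the real script the `sample` string would come from fetching `https://download.pytorch.org/whl/<repo>/torch/` for each repo of interest.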
Also, if you install torch off pypi, these are the CUDA versions you get bundled:
python 3.10 on Linux: CUDA 10.2 (there's no 1.8 branch here)
python 3.9 on Linux: CUDA 10.2 (on both the latest and the 1.8 branches)
python 3.8 on Linux: CUDA 10.2 (on both the latest and the 1.8 branch)
python 3.7 on Linux: CUDA 10.2 (in both the latest torch and the 1.8 branch)
python 3.6 on Linux: CUDA 10.1 (there's no latest nor 1.8 branch; the most recent torch there uses CUDA 10.1)
Summary:
Ultimately we just want this to work for as many people as possible: using their GPU if they have one, and (as in spinalcordtoolbox/spinalcordtoolbox#3790) avoiding the wasted download if they don't. I don't know anything about Windows and GPUs, but as far as Linux goes, the builds on PyPI work for >90% of people, because they all come with CUDA 10.2 vendored inside -- that's why they're so large. It's only people with extremely recent GPUs that will need the helping hand of being told to use `--extra-index-url`. Okay, gonna post this now, let it settle, and then maybe think about how all this information should affect the installation docs. Cheers 🍷
That is backwards. On this system, with GeForce GTX TITAN X cards, using PyPI's torch is fine (bireli). It's not okay on this other system with RTX A6000 cards (romane):
that card is actually too new for CUDA 10.2:
i.e. the build only goes up to compute capability 7.0, not up to 8.6. I did #951 8 months ago specifically so people here in the lab could make use of those cards. To repeat myself, I have a better solution now, if we can use it: I've made sure that system has the latest torch installed system-wide, so simply not being in a venv makes it work:
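The failure mode described here can be stated as a tiny check: a build is only usable if the card's compute capability doesn't exceed what the build was compiled for. (The 7.0 ceiling is the number quoted above for the PyPI CUDA 10.2 builds; the helper itself is hypothetical, and the TITAN X's compute capability of 5.2 is supplied here for illustration.)

```python
# Hypothetical check: does a torch build cover this card?
def build_supports(card_cc, max_build_cc):
    """True if the card's compute capability is within the build's range."""
    return card_cc <= max_build_cc

# GeForce GTX TITAN X (compute capability 5.2) vs the PyPI build: fine
print(build_supports(5.2, 7.0))   # True
# RTX A6000 (compute capability 8.6) vs the same build: too new
print(build_supports(8.6, 7.0))   # False
```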
This doesn't quite work because ivadomed tells everyone to start by making an empty env, so my system-wide install gets ignored. But it's still what I want to encourage: people should just use what's on PyPI, and install it system-wide (or at most, user-wide). With that in mind, I just discovered a way to do that for us; here's my much better fix:
This is a system-specific patch, needed until CUDA 11 is common enough that it gets rolled into the packages on PyPI. It's something the person who owns the hardware needs to be aware of; ivadomed shouldn't be telling its users about this at all. Here's a demo that this behaves:
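The patch and demo themselves aren't preserved in this scrape, but based on the `--extra-index-url` flag used throughout this thread, the system-wide change is presumably a pip.conf entry along these lines (the path and CUDA repo are chosen here as an example):

```ini
# /etc/pip.conf -- hypothetical reconstruction of the system-specific patch
[global]
extra-index-url = https://download.pytorch.org/whl/cu113
```

With this in place, every `pip install torch` on the machine consults the CUDA 11.3 repo in addition to PyPI, so individual projects don't need to know about the hardware.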
I'm going to apply that patch to romane. Even better, if you give a broader set of repos, you cover most possible torch versions:
Notice how with this in place: 1.7 -> CUDA 11.0, 1.8 -> CUDA 11.1, 1.9 -> CUDA 11.1, 1.10 -> CUDA 11.3, 1.11 -> CUDA 11.3.
By using ~=, we get broader compatibility: on systems using `--index-url https://download.pytorch.org/whl/lts/1.8/cu102` or `--index-url https://download.pytorch.org/whl/lts/1.8/cu111`, people will get 1.8.2 instead of 1.8.1, and will continue to get patches into the future so long as 1.8.x is maintained. See #1129 (comment) for the data that went into this.
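`~=` here is PEP 440's compatible-release operator: `torch~=1.8.1` means "at least 1.8.1, but staying within the 1.8 series". A stdlib-only sketch of that rule (this is an illustration of the semantics, not how pip actually implements it):

```python
# Compatible-release (~=) semantics from PEP 440, sketched with stdlib only:
# "~= X.Y.Z" accepts anything >= X.Y.Z whose version starts with X.Y.
def compatible(candidate, floor):
    cand = tuple(int(p) for p in candidate.split("."))
    low = tuple(int(p) for p in floor.split("."))
    # at or above the floor, and sharing all but the last component
    return cand >= low and cand[: len(low) - 1] == low[:-1]

print(compatible("1.8.2", "1.8.1"))  # True: a future 1.8.x patch is accepted
print(compatible("1.9.0", "1.8.1"))  # False: outside the 1.8 series
print(compatible("1.8.0", "1.8.1"))  # False: below the floor
```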
Actually, @kanishk16, would it be okay if we merged this without editing the documentation? We can start a follow-up PR to strip down the install docs. The sooner this is merged, the sooner I can also merge spinalcordtoolbox/spinalcordtoolbox#3790. Feel free to work your suggestions into setup.py. I just think that we shouldn't touch installation.md (and maybe I should even revert the changes I did make): technical writing is really time consuming to do well and would stall this PR for another couple of weeks at least.
Out of all the widely supported CUDA versions, I'm inclined towards supporting CUDA 10.2, CUDA 11.1 and CUDA 11.3.
Maybe, or maybe not... With the release of 1.11.0, I'm afraid they could even change their LTS to 1.10.0, as it supports CUDA 10.2, CUDA 11.1 and CUDA 11.3 🤷‍♂️
Yup, I observed the same.
Definitely loosening up torch a little more.
Looks like it's installing torch 1.11, but the CPU version.
Even here, if I'm not wrong, it appears to be installing the CPU version.
✅
I'm not sure if I follow, perhaps I'm missing something but:
✅
I don't follow the above two either...
✅
I second this.
🤔
It's definitely a pleasure working with you, @kousu!
Guess what, I had the same thing in mind. Moreover, I started #1125 as a docs PR for #1000. After this finalizes, we can update it accordingly.
Great!
On Linux these are the CUDA builds. The clue is that they're ~800 MB: much larger than the CPU builds. Here's an example from yesterday:
So what I was saying is: on Linux there's never a reason to use
Sorry, I was getting tired by the time I wrote that. I just meant: the cu101 repo has (see lines 24 to 25 in 2d7b598)
Ah okay!! So you have a Windows machine with working GPUs. Excellent. I only have access to Linux machines with GPUs. (I can probably work on rigging up Windows inside qemu with GPU passthrough?) So Windows users do need the extra index. This weird decision of torch -- to make CUDA the default for their Linux builds but not for Windows -- is the reason I got into this issue in the first place, way back 2 years ago in spinalcordtoolbox/spinalcordtoolbox#2712. I think they must assume that if you're using Linux you're hardcore and/or rich and/or using a data center, where you can afford to waste 700MB if you don't have GPUs, and where, if you do have GPUs, you can afford the time to remember how to add the extra index. The rest of us have to work around their quirky assumptions. Could Windows users also use `--extra-index-url https://download.pytorch.org/whl/cu102`?
because then that catches both people with Windows GPUs and Linux people with new GPUs with compute capability > 7.0. Anyway, somewhere I'd also like to include
Together that covers as many combinations of OSes and hardware as possible, without overwhelming people with specific instructions. We can drop the "ivadomed needs CUDA 11", the "ComputeCanada HPC", and the "ivadomed supports GPUs on Windows/Linux" parts.
And I've dropped the docs changes. Once we get this finalized, we can tackle summarizing what we learned in #1125 :)
into extras_requirements
It's redundant: they all got moved to the `dev` extra.
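A sketch of what "moved to the `dev` extra" looks like in setup.py terms (the `dev` name and the `torch~=1.8.1` pin are from this PR; the specific package lists here are illustrative, not the actual contents):

```python
# Hypothetical sketch of the dependency layout this PR moves to:
# everything lives in setup.py, and each former requirements_x.txt
# becomes an "extra".
install_requires = [
    "torch~=1.8.1",   # ~= lets pip pick up 1.8.2+ patch releases
    "torchvision",
    # ...the rest of the runtime dependencies...
]

extras_require = {
    "dev": [
        # the old requirements_dev.txt contents would go here
        "pytest",
        "flake8",
    ],
}

# These two dicts are what setuptools.setup(install_requires=...,
# extras_require=...) consumes; `pip install ivadomed[dev]` then pulls
# in the dev extra on top of the runtime deps.
print(sorted(extras_require))
```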
ivadomed now <ivadomed/ivadomed#1129> specifies torch as a standard dependency, without giving special manual instructions for it. This means there's finally no need to pin a torch version here in axondeepseg; previously, we had to match their version in our environment.yml because it was a manual step in their install instructions. But now the metadata makes pip handle it.

ivadomed gave special install instructions to try to handle the complicated mess that is hardware compatibility (i.e. CPU vs GPU vs hyper-modern CUDA-11-only GPUs). But now they're just leaning on the 80%-20% rule of thumb: >80% of people are covered by the default torch builds. And we know how to handle the <20% of people with CUDA-11-only GPUs or Windows machines with GPUs by having them install with '--extra-index-url = https://download.pytorch.org/whl/<hardware-specific-build>'.

axondeepseg was pushing the CPU version of torch on people, assuming that most users would not own GPUs and that in the Linux case the default would be very wasteful, maybe even making ADS uninstallable for some users. We gave our own manual instructions that duplicated ivadomed's for how to switch to the GPU version. That process meant double the downloads, likely mismatched versions, and maintenance work on our end, as we would have to update the manual instructions whenever ivadomed did.

Instead, follow torch's defaults, like ivadomed now does. But since Linux users with CPU-only hardware are still fairly common, highlight --extra-index-url as an optional *pre*-step, allowing for the same final install but avoiding the double-download and likely versioning mistakes.
Overall:

- Windows and macOS continue to have no GPU support
- Linux always has GPU support, even if it doesn't need it; this balloons the default install, but anyone using torch now has this same problem and the consensus seems to be just to pay for bigger disks that can tolerate the waste
- The relatively common case of Linux users without GPUs has an escape hatch to avoid the large download
- Windows users with GPUs have a way to use them (`--extra-index-url https://download.pytorch.org/whl/cu102`), but we don't document nor support this
- Linux/Windows users with super new CUDA-11-only GPUs also have a way to use them, which we also don't document nor support directly

Fixes ivadomed/ivadomed#861 (comment)
Related: spinalcordtoolbox/spinalcordtoolbox#3790

In passing, remove the redundant parts of the `conda env create` line.

Co-authored-by: Mathieu Boudreau, PhD <emb6150@gmail.com>
Co-authored-by: Armand <83031821+hermancollin@users.noreply.github.com>
Description
Use the newish --extra-index that torch put up at https://download.pytorch.org/whl in preference to --find-links. This means a lot of the installation gymnastics can be streamlined.

Streamline the installation by moving all dependencies back into setup.py. This essentially redoes #1000 but without the disallowed direct .whl links or the `pip install ivadomed[cpu]`/`pip install ivadomed[gpu]` extras.

Effects:

- Each `requirements_x.txt` becomes an "extra", giving a consistent installation process no matter whether people are installing from git, from pypi, or as a dependency of another package. There's no fumbling with `pip install -r requirements_dev.txt` or `pip install -r requirements_gpu.txt`.
- Drops `requirements_gpu.txt` entirely, declaring our dependency on `torch`/`torchvision` in `setup.py` directly. Doing this fixes "`whl` missing a dependency: `torchvision`" (#1130) and "New `torch` instructions will make it harder for downstream packages to depend on ivadomed's Python API" (#861), making sure downstream packages (like SCT) can safely depend on ivadomed. This even resolves "`requirements_dev.txt` currently installs PyTorch CPU, conflicting with an earlier step" (#996).
- `torch` is loosened to `torch~=1.8.1` (meaning 1.8.1, 1.8.2, 1.8.3, …), which enables `--extra-index-url https://download.pytorch.org/whl/....` for a wider set of platforms (see below).
- CUDA 11 is no longer mandatory -- it never really was.

Dropping `requirements_gpu.txt` is, IMO, the biggest change here. Instead of trying to second-guess torch, we will conform to their decision to put CPU-only Windows/Mac builds, but CUDA 10.* GPU Linux builds, on PyPI.

The CUDA 10 GPU builds are a waste of space for the majority of Linux users (e.g.), while for a minority (i.e. us, when working on `romane.neuro.polymtl.ca`) they are broken because those users need CUDA 11 instead; on the other hand, the CPU builds mean Windows users can't train models if that's something they want to try to help out with.

But @kanishk16 noticed that torch provides an escape hatch to cover all these cases now, one that's better than us trying to handle it in our setup.py (i.e. #1000's experiment with `pip install ivadomed[cpu]`/`pip install ivadomed[gpu]`): torch hosts custom pip repos people can use like

This PR clears the way for people to use those by getting ivadomed out of the business of assuming what hardware people have. `torch`'s default (=PyPI) builds work for 99% of systems; people on the bleeding edge (notably: our `romane.neuro.polymtl.ca` system, or people who want to do deep learning natively on Windows) can adapt their pip.conf to it; but that's not and shouldn't be in ivadomed's scope. We will document "some amount" of this in a followup that's already started (#1125) but, again, mostly we should stick close to torch's decisions to avoid the friction.

(Please ignore the branch name; it was a naive name from the first version of this branch. Something like 'torch-install' would have been better.)
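As an illustration of how those custom repos line up (the URLs are the ones quoted in this thread, plus torch's CPU-only repo; the helper function itself is hypothetical):

```python
# Hypothetical helper mapping a CUDA toolkit version to the matching
# torch pip repo, using the repo URLs discussed in this PR.
REPOS = {
    "cpu":  "https://download.pytorch.org/whl/cpu",
    "10.2": "https://download.pytorch.org/whl/cu102",
    "11.1": "https://download.pytorch.org/whl/cu111",
    "11.3": "https://download.pytorch.org/whl/cu113",
}

def extra_index_url(cuda_version):
    """Pick the repo to pass to `pip install --extra-index-url ...`."""
    try:
        return REPOS[cuda_version]
    except KeyError:
        raise ValueError(f"no known torch repo for CUDA {cuda_version}")

print(extra_index_url("11.3"))
```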
Linked issues
Resolves #996 and fixes #1130, as well as #861, which is part of fixing spinalcordtoolbox/spinalcordtoolbox#3790.
To be followed by #1125.