Pytorch 2.0.1 pypi wheel does not install dependent cuda libraries #100974

Closed

Martin4R opened this issue May 9, 2023 · 39 comments

Labels: module: binaries (Anything related to official binaries that we release to users), module: regression (It used to work, and now it doesn't), needs design, triage review
@Martin4R

Martin4R commented May 9, 2023

🐛 Describe the bug

With torch 2.0.1 the torch PyPI wheel no longer depends on the CUDA libraries. Therefore, when importing torch on a GPU-enabled machine, it fails with ValueError: libnvrtc.so.*[0-9].*[0-9] not found in the system path (stack trace at the end below).

When I show the dependency trees for torch=2.0.1 and torch=2.0.0 with poetry (installed on the same machine with the same dependency file as before), it becomes clear that torch 2.0.1 is missing the nvidia-* dependencies:

└── torch 2.0.1 
        ├── filelock * 
        ├── jinja2 * 
        │   └── markupsafe >=2.0 
        ├── networkx * 
        ├── sympy * 
        │   └── mpmath >=0.19 
        └── typing-extensions * 

└── torch 2.0.0 
        ├── filelock * 
        ├── jinja2 * 
        │   └── markupsafe >=2.0 
        ├── networkx * 
        ├── nvidia-cublas-cu11 11.10.3.66 
        │   ├── setuptools * 
        │   └── wheel * 
        ├── nvidia-cuda-cupti-cu11 11.7.101 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cuda-nvrtc-cu11 11.7.99 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cuda-runtime-cu11 11.7.99 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cudnn-cu11 8.5.0.96 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cufft-cu11 10.9.0.58 
        ├── nvidia-curand-cu11 10.2.10.91 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cusolver-cu11 11.4.0.1 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-cusparse-cu11 11.7.4.91 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── nvidia-nccl-cu11 2.14.3 
        ├── nvidia-nvtx-cu11 11.7.91 
        │   ├── setuptools * (circular dependency aborted here)
        │   └── wheel * (circular dependency aborted here)
        ├── sympy * 
        │   └── mpmath >=0.19 
        ├── triton 2.0.0 
        │   ├── cmake * 
        │   ├── filelock * (circular dependency aborted here)
        │   ├── lit * 
        │   └── torch * (circular dependency aborted here)
        └── typing-extensions * 

Here is the stack trace of the error at runtime:

File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/easyocr/recognition.py", line 2, in <module>
    import torch
File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 228, in <module>
  _load_global_deps()
File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 189, in _load_global_deps
   _preload_cuda_deps(lib_folder, lib_name)
File "/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 154, in _preload_cuda_deps
raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libnvrtc.so.*[0-9].*[0-9] not found in the system path ['/home/ray', '/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard', '/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages/ray/thirdparty_files', '/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/workers', '/home/ray/anaconda3/envs/myenv/lib/python3.10', '/home/ray/anaconda3/envs/myenv/lib/python3.10/lib-dynload', '/home/ray/anaconda3/envs/myenv/lib/python3.10/site-packages']

Versions

Version where the issue occurs is the pypi wheel of torch 2.0.1.

When trying to run python collect_env.py to collect the versions, two errors show up:

"OSError: libcurand.so.10: cannot open shared object file: No such file or directory"
During handling of the above exception, another exception occurred:
"ValueError: libnvrtc.so.*[0-9].*[0-9] not found in the system path"

cc @ezyang @gchanan @zou3519 @seemethere @malfet

@malfet added the high priority, module: binaries, and module: regression labels May 9, 2023
@malfet
Contributor

malfet commented May 9, 2023

Not sure about poetry, but I can't reproduce it with pip:

$ pip install torch
Collecting torch
  Downloading torch-2.0.1-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 619.9/619.9 MB 1.8 MB/s eta 0:00:00
Collecting filelock
  Using cached filelock-3.12.0-py3-none-any.whl (10 kB)
Collecting jinja2
  Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting nvidia-cuda-nvrtc-cu11==11.7.99
  Downloading nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.0/21.0 MB 68.5 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu11==11.7.99
  Using cached nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
Collecting nvidia-cublas-cu11==11.10.3.66
  Downloading nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 317.1/317.1 MB 4.4 MB/s eta 0:00:00
Collecting nvidia-cusparse-cu11==11.7.4.91
  Using cached nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
Collecting nvidia-curand-cu11==10.2.10.91
  Using cached nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
Collecting triton==2.0.0
  Using cached triton-2.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
Collecting sympy
  Downloading sympy-1.11.1-py3-none-any.whl (6.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.5/6.5 MB 110.2 MB/s eta 0:00:00
Collecting networkx
  Using cached networkx-3.1-py3-none-any.whl (2.1 MB)
Collecting nvidia-cudnn-cu11==8.5.0.96
  Downloading nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 557.1/557.1 MB 2.2 MB/s eta 0:00:00
Collecting typing-extensions
  Using cached typing_extensions-4.5.0-py3-none-any.whl (27 kB)
Collecting nvidia-cuda-cupti-cu11==11.7.101
  Using cached nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
Collecting nvidia-nvtx-cu11==11.7.91
  Using cached nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
Collecting nvidia-cufft-cu11==10.9.0.58
  Downloading nvidia_cufft_cu11-10.9.0.58-py3-none-manylinux1_x86_64.whl (168.4 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.4/168.4 MB 12.8 MB/s eta 0:00:00
Collecting nvidia-nccl-cu11==2.14.3
  Using cached nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
Collecting nvidia-cusolver-cu11==11.4.0.1
  Using cached nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
Requirement already satisfied: setuptools in ./miniconda3/envs/tmp/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch) (66.0.0)
Requirement already satisfied: wheel in ./miniconda3/envs/tmp/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch) (0.38.4)
Collecting cmake
  Using cached cmake-3.26.3-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (24.0 MB)
Collecting lit
  Downloading lit-16.0.3.tar.gz (138 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 138.0/138.0 kB 1.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting MarkupSafe>=2.0
  Downloading MarkupSafe-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Collecting mpmath>=0.19
  Using cached mpmath-1.3.0-py3-none-any.whl (536 kB)
Building wheels for collected packages: lit
  Building wheel for lit (setup.py) ... done
  Created wheel for lit: filename=lit-16.0.3-py3-none-any.whl size=88174 sha256=7d3679299d6300eadb79a8fc0740df64763a7893f9a3a86b1f0e241d37f8f65a
  Stored in directory: /home/nshulga/.cache/pip/wheels/d6/81/1c/a49ba782377339294cc45c9899927b61a92e58d6ad3ac942f7
Successfully built lit
Installing collected packages: mpmath, lit, cmake, typing-extensions, sympy, nvidia-nvtx-cu11, nvidia-nccl-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cufft-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, networkx, MarkupSafe, filelock, nvidia-cusolver-cu11, nvidia-cudnn-cu11, jinja2, triton, torch

@malfet
Contributor

malfet commented May 9, 2023

@Martin4R can you share a bit more info about your setup? Also, please run python3 -mtorch.utils.collect_env and share its output here.
OK, I can reproduce it locally using a very simple pyproject.toml:

[tool.poetry]
name = "foo"
version = "0.1.0"
description = ""
authors = ["Foo Bar <foo@bar.com>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.10"
torch = "2.0.1"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

@Martin4R
Author

Hi @malfet,
we use Conda to create a separate environment with just a specific Python version and then run Poetry to install the dependencies into the Conda environment. So the minimal pyproject.toml you posted for reproducing the error represents our case pretty well.
When I execute python3 -mtorch.utils.collect_env I get the same error I already mentioned in the "Versions" section of the description:

(base) azureuser@TestVM:~$ conda activate myenv
(myenv) azureuser@TestVM:~$ python --version
Python 3.10.8
(myenv) azureuser@TestVM:~$ python3 -mtorch.utils.collect_env
Traceback (most recent call last):
  File "/home/azureuser/.conda/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 168, in _load_global_deps
    ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
  File "/home/azureuser/.conda/envs/myenv/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcurand.so.10: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/azureuser/.conda/envs/myenv/lib/python3.10/runpy.py", line 187, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/azureuser/.conda/envs/myenv/lib/python3.10/runpy.py", line 110, in _get_module_details
    __import__(pkg_name)
  File "/home/azureuser/.conda/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 228, in <module>
    _load_global_deps()
  File "/home/azureuser/.conda/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 189, in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
  File "/home/azureuser/.conda/envs/myenv/lib/python3.10/site-packages/torch/__init__.py", line 154, in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
ValueError: libnvrtc.so.*[0-9].*[0-9] not found in the system path ['/home/azureuser', '/home/azureuser/.conda/envs/myenv/lib/python310.zip', '/home/azureuser/.conda/envs/myenv/lib/python3.10', '/home/azureuser/.conda/envs/myenv/lib/python3.10/lib-dynload', '/home/azureuser/.conda/envs/myenv/lib/python3.10/site-packages']

I found a workaround for now by manually listing the nvidia-* dependencies from torch 2.0.0 in my pyproject.toml together with torch 2.0.1, so that they always get installed. Our software then works with the GPU again.
My pyproject.toml:

[tool.poetry]
name = "foo"
version = "0.1.0"
description = ""
authors = ["Foo Bar <foo@bar.com>"]

[tool.poetry.dependencies]
python = "3.10.8"
torch = "2.0.1"
nvidia-cublas-cu11 = { version = "11.10.3.66", platform = 'linux' }
nvidia-cuda-cupti-cu11 = { version = "11.7.101", platform = 'linux' }
nvidia-cuda-nvrtc-cu11 = { version = "11.7.99", platform = 'linux' }
nvidia-cuda-runtime-cu11 = { version = "11.7.99", platform = 'linux' }
nvidia-cudnn-cu11 = { version = "8.5.0.96", platform = 'linux' }
nvidia-cufft-cu11 = { version = "10.9.0.58", platform = 'linux' }
nvidia-curand-cu11 = { version = "10.2.10.91", platform = 'linux' }
nvidia-cusolver-cu11 = { version = "11.4.0.1", platform = 'linux' }
nvidia-cusparse-cu11 = { version = "11.7.4.91", platform = 'linux' }
nvidia-nccl-cu11 = { version = "2.14.3", platform = 'linux' }
nvidia-nvtx-cu11 = { version = "11.7.91", platform = 'linux' }
triton = { version = "2.0.0", platform = 'linux' }

When I then run python3 -mtorch.utils.collect_env I get

Collecting environment information...
PyTorch version: 2.0.1+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: version 3.26.3
Libc version: glibc-2.31

Python version: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-1101-azure-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: Tesla K80
Nvidia driver version: 470.82.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   44 bits physical, 48 bits virtual
CPU(s):                          6
On-line CPU(s) list:             0-5
Thread(s) per core:              1
Core(s) per socket:              6
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           63
Model name:                      Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
Stepping:                        2
CPU MHz:                         2596.995
BogoMIPS:                        5193.99
Hypervisor vendor:               Microsoft
Virtualization type:             full
L1d cache:                       192 KiB
L1i cache:                       192 KiB
L2 cache:                        1.5 MiB
L3 cache:                        30 MiB
NUMA node0 CPU(s):               0-5
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt

Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==2.0.1
[pip3] torchvision==0.15.2
[conda] numpy                     1.23.5                   pypi_0    pypi
[conda] torch                     2.0.1                    pypi_0    pypi
[conda] torchvision               0.15.2                   pypi_0    pypi

@twoertwein
Contributor

twoertwein commented May 11, 2023

@malfet
It seems that the wheel itself contains the correct dependency information, but it is missing from https://pypi.org/pypi/torch/2.0.1/json
See python-poetry/poetry#7902 (comment)

edit: The JSON file for 2.0.0 contains all the dependencies: https://pypi.org/pypi/torch/2.0.0/json
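
For reference, the gap is easy to see by comparing the requires_dist field of those two JSON documents; a minimal sketch using only the public PyPI JSON API linked above:

# Compare the dependency metadata PyPI exposes for torch 2.0.0 vs 2.0.1
# via the same https://pypi.org/pypi/<name>/<version>/json endpoint.
import json
from urllib.request import urlopen

def requires_dist(version):
    with urlopen(f"https://pypi.org/pypi/torch/{version}/json") as resp:
        info = json.load(resp)["info"]
    return set(info.get("requires_dist") or [])

old, new = requires_dist("2.0.0"), requires_dist("2.0.1")
print("declared for 2.0.0 but missing from 2.0.1:")
for req in sorted(old - new):
    print(" ", req)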

@malfet
Contributor

malfet commented May 11, 2023

@twoertwein thank you for the information. I'm trying to figure out when this file is generated by PyPI and whether there is a way to update it without uploading a new binary. My suspicion is that it depends on the package upload order, and that with 2.0.0 we got lucky and uploaded the Linux package, which has all the deps, last...

@gautiervarjo

I encountered a related problem in the 2.0.0 release as well: pantsbuild/pants#18936 (comment)

The gist of this link is:

  • I use the Pantsbuild/PEX combination to manage my python dependencies.
  • PEX resolves dependencies and creates a multi-platform lockfile.
  • Platform-specific dependencies must be properly declared using environment markers, e.g. pytorch's Requires-Dist: triton (==2.0.0) ; platform_system == "Linux" and platform_machine == "x86_64"
  • PEX assumes that all the wheels for a specific dependency version (eg torch 2.0.0 for macosx/win/linux py3.8/3.9/...) have identical requirements, so that it does not need to download all those wheels to resolve dependencies.
  • With pytorch's PyPI wheels this assumption is broken. PEX downloads a single wheel, which will be missing the CUDA deps unless it got lucky and downloaded a Linux x86 wheel.

Could the PyPI wheels all have the same declared dependencies? Linux x86 already has appropriate environment markers, so the CUDA/triton deps that are specific to it would simply be no-ops for other platforms.
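
For what it's worth, markers like these can be evaluated locally with the packaging library, which is roughly what a resolver does when deciding whether a platform-specific requirement applies on the current machine; a minimal sketch (the requirement string mirrors the Requires-Dist line quoted above):

# Evaluate a PEP 508 environment marker locally, the way a resolver would
# decide whether a platform-specific requirement applies on this machine.
from packaging.requirements import Requirement

req = Requirement(
    'triton==2.0.0; platform_system == "Linux" and platform_machine == "x86_64"'
)
print(req.name, req.specifier)
# On macOS or Windows this prints False, i.e. the dependency is a no-op there
# even if it is declared in every wheel's metadata.
print("applies here:", req.marker.evaluate() if req.marker else True)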

@atalman
Contributor

atalman commented Jun 8, 2023

@malfet Should we try to address this issue using suggestion in this comment: python-poetry/poetry#7902 (comment)

As I understand it, dependencies on the JSON API are taken from the metadata in the first wheel that is uploaded.

But rather than relying on such wrinkles, it is better to put the same metadata in all uploaded distributions, and use markers for platform-specific variation

@mjachi

mjachi commented Jun 12, 2023

Following... I was having a related issue with GitHub Actions, and have not found anything specific to Actions or CI in general on this yet. The short version: despite these tests not using the GPU at all and, to my understanding, not installing with CUDA/GPU support, my pytest suite kept failing with the error below:

==================================== ERRORS ====================================
_____________ ERROR collecting tests/test_forward/test_forward.py ______________
{OMITTED TRACE}
E   OSError: libcufft.so.10: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:
{OMITTED PATH}:4: in <module>
    import torch
.venv/lib/python3.9/site-packages/torch/__init__.py:228: in <module>
    _load_global_deps()
.venv/lib/python3.9/site-packages/torch/__init__.py:189: in _load_global_deps
    _preload_cuda_deps(lib_folder, lib_name)
.venv/lib/python3.9/site-packages/torch/__init__.py:154: in _preload_cuda_deps
    raise ValueError(f"{lib_name} not found in the system path {sys.path}")
E   ValueError: libcublas.so.*[0-9] not found in the system path [OMITTED PATH]

However, copy/pasting all the dependencies explicitly as suggested above fixed it (for now, at least).

@thusithaC

Thanks for looking at this issue. For now, downgrading to 2.0.0 seems to work for me (poetry and pypi). Do you plan to fix the 2.0.1 installation, or do we have to wait for the next iteration?

sammlapp added a commit to kitzeslab/opensoundscape that referenced this issue Jun 18, 2023
torch 2.0.1 has known issues with installation and caused our CI to fail (see pytorch/pytorch#100974). I think the simplest solution is to just disallow the specific version 2.0.1
@sammlapp

I also saw this issue in a GitHub CI. It only occurred for the Ubuntu runners, not for macOS, and downgrading to torch 2.0.0 resolved it for me. Specifically, I've specified torch = ">=2.0.0, !=2.0.1" in pyproject.toml, because I assume the issue will be resolved in the next pytorch release after 2.0.1.
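
For what it's worth, a tiny sketch (using the packaging library) confirming that a constraint like this skips only 2.0.1 and keeps later releases:

# Check that ">=2.0.0, !=2.0.1" excludes only 2.0.1 and allows later releases.
from packaging.specifiers import SpecifierSet

spec = SpecifierSet(">=2.0.0,!=2.0.1")
for version in ("2.0.0", "2.0.1", "2.1.1"):
    print(version, "allowed:", spec.contains(version))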

@kyaryunha

I also solved it the same way as sammlapp. Thank you.

@realfresh

In case it helps anyone: if you are using Arch Linux, installing the nccl package made it start working for me.

pacman -S nccl

https://archlinux.org/packages/extra/x86_64/nccl/

obendidi added a commit to IsmaelMekene/Mistral-GPT that referenced this issue Oct 16, 2023
atalman added a commit to atalman/pytorch that referenced this issue Oct 19, 2023
Will fix package after publishing pytorch#100974
Poetry install requires all wheels on pypi to have same metadata. Hence including linux dependencies in all non-linux wheels

Pull Request resolved: pytorch#111042
Approved by: https://github.com/malfet
atalman added a commit that referenced this issue Oct 19, 2023
* Add pypi required metadata to all wheels except linux (#111042)

Will fix package after publishing #100974
Poetry install requires all wheels on pypi to have same metadata. Hence including linux dependencies in all non-linux wheels

Pull Request resolved: #111042
Approved by: https://github.com/malfet

* Regenerate workflows
@atalman
Contributor

atalman commented Oct 26, 2023

validation required, included in 2.1.1

@atalman
Contributor

atalman commented Nov 7, 2023

Validated following packages:
torch-2.1.1-cp3xx-cp3xx-manylinux_2_17_aarch64.manylinux2014_aarch64
torch-2.1.1-cp3xx-none-macosx_11_0_arm64
torch-2.1.1-cp3xx-none-macosx_10_9_x86_64
torch-2.1.1+cpu-cp3xx-cp3xx-win_amd64
torch-2.1.1+cu121.with.pypi.cudnn-cp3x-cp3x-linux_x86_64

kacperlukawski added a commit to qdrant/qdrant-haystack that referenced this issue Nov 14, 2023
kacperlukawski added a commit to qdrant/qdrant-haystack that referenced this issue Nov 14, 2023
* Do not restrict torch = "<=2.0.0"

* Restrict problematic pytorch versions

pytorch/pytorch#100974

* Set poetry to 1.7.0 in tests

* Add Python 3.11 to tests

* Rollback installing poetry 1.7.0

* Set PYTORCH_MPS_HIGH_WATERMARK_RATIO to 0.7 in tests

* Remove macos from workflow

* Allow python 3.12

* Rollback to python 3.8-3.11

* Enable macos in tests back again

* Add CPU version of torch in dev group

* Fix using pytorch CPU version

* Separate dev dependency to pytorch per platform

* Disable tests on macos
@atalman
Contributor

atalman commented Nov 15, 2023

Release 2.1.1 is out

curl -s https://pypi.org/pypi/torch/2.1.1/json | jq '.info.requires_dist'
[
  "filelock",
  "typing-extensions",
  "sympy",
  "networkx",
  "jinja2",
  "fsspec",
  "nvidia-cuda-nvrtc-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cuda-runtime-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cuda-cupti-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cudnn-cu12 (==8.9.2.26) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cublas-cu12 (==12.1.3.1) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cufft-cu12 (==11.0.2.54) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-curand-cu12 (==10.3.2.106) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cusolver-cu12 (==11.4.5.107) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-cusparse-cu12 (==12.1.0.106) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-nccl-cu12 (==2.18.1) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "nvidia-nvtx-cu12 (==12.1.105) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "triton (==2.1.0) ; platform_system == \"Linux\" and platform_machine == \"x86_64\"",
  "jinja2 ; extra == 'dynamo'",
  "opt-einsum (>=3.3) ; extra == 'opt-einsum'"

@dimbleby

https://download.pytorch.org/whl/cu121/torch-2.1.1%2Bcu121-cp310-cp310-linux_x86_64.whl unconditionally contains linux requirements, per python-poetry/poetry#8690 (comment)
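
One way to confirm that kind of thing without running a resolver is to read the Requires-Dist entries straight out of a downloaded wheel's METADATA; a rough sketch (the wheel filename below is a hypothetical local path):

# Read the Requires-Dist entries from a wheel's dist-info/METADATA.
import zipfile
from email.parser import Parser

# Hypothetical local filename; any torch wheel downloaded from PyPI or
# download.pytorch.org can be inspected the same way.
wheel_path = "torch-2.1.1+cu121-cp310-cp310-linux_x86_64.whl"

with zipfile.ZipFile(wheel_path) as whl:
    name = next(n for n in whl.namelist() if n.endswith(".dist-info/METADATA"))
    meta = Parser().parsestr(whl.read(name).decode("utf-8"))

for req in meta.get_all("Requires-Dist") or []:
    print(req)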

Halmoni100 pushed a commit to Halmoni100/pytorch that referenced this issue Nov 25, 2023
* Add pypi required metadata to all wheels except linux (pytorch#111042)

Will fix package after publishing pytorch#100974
Poetry install requires all wheels on pypi to have same metadata. Hence including linux dependencies in all non-linux wheels

Pull Request resolved: pytorch#111042
Approved by: https://github.com/malfet

* Regenerate workflows
@kventinel
Contributor

I have a similar error with pip install torch==2.1.1:

+ pip3 install nvidia-cuda-nvrtc-cu12==12.1.105
Looking in indexes: https://pypi.yandex-team.ru/simple/
Collecting nvidia-cuda-nvrtc-cu12==12.1.105
  Downloading https://pypi.yandex-team.ru/repo/default/download/nvidia-cuda-nvrtc-cu12/1312846/nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 10.6 MB/s eta 0:00:00
Installing collected packages: nvidia-cuda-nvrtc-cu12
Successfully installed nvidia-cuda-nvrtc-cu12-12.1.105
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
+ pip3 install -f https://download.pytorch.org/whl/cu121 torch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1
Looking in indexes: https://pypi.yandex-team.ru/simple/
Looking in links: https://download.pytorch.org/whl/cu121
Collecting torch==2.1.1
  Downloading https://pypi.yandex-team.ru/repo/default/download/torch/1311602/torch-2.1.1-cp310-cp310-manylinux1_x86_64.whl (670.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 670.2/670.2 MB 39.8 MB/s eta 0:00:00
Collecting torchvision==0.16.1
  Downloading https://pypi.yandex-team.ru/repo/default/download/torchvision/1311661/torchvision-0.16.1-cp310-cp310-manylinux1_x86_64.whl (6.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 6.8/6.8 MB 49.2 MB/s eta 0:00:00
Collecting torchaudio==2.1.1
  Downloading https://pypi.yandex-team.ru/repo/default/download/torchaudio/1311641/torchaudio-2.1.1-cp310-cp310-manylinux1_x86_64.whl (3.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.3/3.3 MB 44.8 MB/s eta 0:00:00
Collecting filelock (from torch==2.1.1)
  Downloading https://pypi.yandex-team.ru/repo/default/download/filelock/1299733/filelock-3.13.1-py3-none-any.whl (11 kB)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch==2.1.1) (4.5.0)
Collecting sympy (from torch==2.1.1)
  Downloading https://pypi.yandex-team.ru/repo/default/download/sympy/1201457/sympy-1.12-py3-none-any.whl (5.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.7/5.7 MB 19.2 MB/s eta 0:00:00
Collecting networkx (from torch==2.1.1)
  Downloading https://pypi.yandex-team.ru/repo/default/download/networkx/1299314/networkx-3.2.1-py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 19.4 MB/s eta 0:00:00
Collecting jinja2 (from torch==2.1.1)
  Downloading https://pypi.yandex-team.ru/repo/default/download/Jinja2/999427/Jinja2-3.1.2-py3-none-any.whl (133 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 133.1/133.1 kB 66.6 MB/s eta 0:00:00
Collecting fsspec (from torch==2.1.1)
  Downloading https://pypi.yandex-team.ru/repo/default/download/fsspec/1320536/fsspec-2023.12.0-py3-none-any.whl (168 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.9/168.9 kB 75.1 MB/s eta 0:00:00
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch==2.1.1) (12.1.105)
INFO: pip is looking at multiple versions of torch to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement nvidia-cuda-runtime-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64" (from torch) (from versions: none)
ERROR: No matching distribution found for nvidia-cuda-runtime-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64"

@vit-zikmund

@kventinel Could you try the default PyPI index? https://pypi.yandex-team.ru/simple/ is hardly the one.
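
For anyone debugging a similar setup, a rough sketch for checking whether a given index serves the nvidia-* projects at all (the index URL and project names below are just examples; point index at the mirror under test):

# Rough check of whether an index serves the CUDA wheel projects at all.
from urllib.error import HTTPError
from urllib.request import urlopen

index = "https://pypi.org/simple"  # example: the default PyPI simple index
for project in ("nvidia-cuda-runtime-cu12", "nvidia-cudnn-cu12", "nvidia-nccl-cu12"):
    url = f"{index}/{project}/"
    try:
        with urlopen(url) as resp:
            print(project, "->", resp.status)
    except HTTPError as err:
        print(project, "->", err.code)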

@kventinel
Contributor

@vit-zikmund, thanks. It helped me.

lars-reimann added a commit to Safe-DS/Library that referenced this issue Jan 11, 2024
### Summary of Changes

Bump `torch` & `torchvision` to the latest version to fix the issue
described in pytorch/pytorch#100974.
@snewcomer

snewcomer commented Mar 23, 2024

I came across this after upgrading from sentence-transformers 2.2.2. A dependency had torch in the range mentioned here and it was removed, so my pytests were failing. Adding torch explicitly solved the problem.
