How to install esmfold for A100 on cuda 10.1 #594

Closed
linda5mith opened this issue Jul 27, 2023 · 3 comments

linda5mith commented Jul 27, 2023

Can someone please help me to install esmfold?

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

nvidia-smi
Thu Jul 27 15:17:42 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  Off  | 00000000:00:07.0 Off |                    0 |
| N/A   32C    P0    55W / 500W |      7MiB / 81920MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       975      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
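
Note for readers: the two outputs above report different things. `nvcc --version` shows the installed CUDA toolkit (10.1 here), while the "CUDA Version" in the `nvidia-smi` header is the highest CUDA version the driver supports (12.0 here). A minimal sketch for pulling both numbers out of the text, assuming the output formats shown above; the function names are hypothetical helpers, not part of any library:

```python
import re

def parse_nvcc_release(nvcc_output):
    """Return the toolkit release as (major, minor), or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None

def parse_smi_cuda_version(smi_output):
    """Return the 'CUDA Version' from the nvidia-smi header, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None

# Sample lines copied from the outputs above.
nvcc_text = "Cuda compilation tools, release 10.1, V10.1.243"
smi_text = "| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |"

print("toolkit (nvcc):", parse_nvcc_release(nvcc_text))
print("driver supports up to:", parse_smi_cuda_version(smi_text))
```

The toolkit number is the one that matters when a package compiles CUDA extensions during installation.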

This is what I'm doing:

  1. conda create --name esmfold python=3.7
  2. conda activate esmfold
    Installing a CUDA-compatible version of PyTorch for CUDA 10.1:
  3. pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
  4. pip install "fair-esm[esmfold]"
Attempting uninstall: torch
    Found existing installation: torch 1.9.0+cu102
    Uninstalling torch-1.9.0+cu102:
      Successfully uninstalled torch-1.9.0+cu102
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchaudio 0.9.0 requires torch==1.9.0, but you have torch 2.0.1 which is incompatible.
torchvision 0.10.0+cu102 requires torch==1.9.0, but you have torch 2.0.1 which is incompatible.
  5. pip install 'dllogger @ git+https://github.com/NVIDIA/dllogger.git'
  6. pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@4b41059694619831a7db195b7e0988fc4ff3a307'
RuntimeError:
      The detected CUDA version (10.1) mismatches the version that was used to compile
      PyTorch (11.7). Please make sure to use the same CUDA versions.
      
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> openfold

OK, so I tried to resolve the torch/CUDA version conflict by running step 3 again:
  7. pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

Attempting uninstall: torch
    Found existing installation: torch 1.13.1
    Uninstalling torch-1.13.1:
      Successfully uninstalled torch-1.13.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
pytorch-lightning 1.9.5 requires torch>=1.10.0, but you have torch 1.9.0+cu111 which is incompatible.
Successfully installed torch-1.9.0+cu111
  8. pip install pytorch_lightning==1.5.10
  9. pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@4b41059694619831a7db195b7e0988fc4ff3a307'
File "/home/administrator/programs/miniconda3/envs/esmfold/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1682, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> openfold

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.
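
Note for readers: the pip conflicts in the steps above come from packages pinning different torch versions. A rough sketch that replays the requirements quoted in the log against the two torch versions that were installed. It is a simplification that handles only the `==` and `>=` forms seen here and is not pip's resolver; `version_tuple` and `satisfies` are hypothetical helpers:

```python
import re

def version_tuple(version):
    """'1.9.0+cu111' -> (1, 9, 0); the '+cu111' local label is dropped."""
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", public))

def satisfies(installed, requirement):
    """Check only the '==X' and '>=X' forms that appear in the log above."""
    op, wanted = requirement[:2], requirement[2:]
    if op == "==":
        return version_tuple(installed) == version_tuple(wanted)
    if op == ">=":
        return version_tuple(installed) >= version_tuple(wanted)
    raise ValueError("unsupported requirement: " + requirement)

# Requirements as quoted in the pip error messages above.
requirements = {
    "torchaudio 0.9.0": "==1.9.0",
    "torchvision 0.10.0": "==1.9.0",
    "pytorch-lightning 1.9.5": ">=1.10.0",
}

for torch_version in ("1.9.0+cu111", "2.0.1"):
    for package, requirement in requirements.items():
        status = "ok" if satisfies(torch_version, requirement) else "CONFLICT"
        print(torch_version, "|", package, "requires torch" + requirement, "|", status)
```

Versions are compared as integer tuples because a plain string comparison would rank "1.9.0" above "1.10.0". With these three pins no torch version satisfies everything at once, which is consistent with pytorch_lightning being pinned to an older release in step 8.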

Would really appreciate any help. I'm losing my mind. Thanks so much!

evgeni-nikolaev-bio commented Jul 31, 2023 via email

@imSeaton

I have the same problem when executing pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@4b41059694619831a7db195b7e0988fc4ff3a307'

linda5mith commented Sep 16, 2023

Hey, so there were important error lines in the output that I didn't include and that turned out to be relevant to my issue:

/data0/miniconda3/envs/esmfold/lib/python3.7/site-packages/torch/utils/cpp_extension.py:813: UserWarning: The detected CUDA version (11.5) has a minor version mismatch with the version that was used to compile PyTorch (11.3). Most likely this shouldn't be a problem.

warnings.warn(CUDA_MISMATCH_WARN.format(cuda_str_version, torch.version.cuda))
/data0/miniconda3/envs/esmfold/lib/python3.7/site-packages/torch/utils/cpp_extension.py:820: UserWarning: There are no g++ version bounds defined for CUDA version 11.5
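
Note for readers: the two messages in this thread suggest that a major-version mismatch between the toolkit and PyTorch's CUDA build (10.1 vs 11.7) aborts the extension build, while a minor mismatch (11.5 vs 11.3) only produces a warning. A small sketch that mirrors that observed behaviour; it is not PyTorch's own code, and `classify_cuda_mismatch` is a hypothetical name. In a real environment the second argument would come from `torch.version.cuda`:

```python
def classify_cuda_mismatch(toolkit, torch_cuda):
    """Compare 'major.minor' strings such as '11.5' and '11.3'."""
    tk_major, tk_minor = (int(p) for p in toolkit.split(".")[:2])
    pt_major, pt_minor = (int(p) for p in torch_cuda.split(".")[:2])
    if tk_major != pt_major:
        return "error"    # like the RuntimeError in the first comment
    if tk_minor != pt_minor:
        return "warning"  # like the UserWarning above
    return "ok"

print(classify_cuda_mismatch("10.1", "11.7"))  # error
print(classify_cuda_mismatch("11.5", "11.3"))  # warning
print(classify_cuda_mismatch("11.3", "11.3"))  # ok
```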

Ultimately my issue was resolved by downgrading my gcc/g++ version, following this tutorial. My working configuration:

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0

gcc --version
gcc (Ubuntu 10.5.0-1ubuntu1~22.04) 10.5.0
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I followed this table to find compatible CUDA and gcc/g++ versions.
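
Note for readers: a minimal sketch of checking a gcc version against a limit taken from such a table, assuming the `gcc --version` format shown above. The helper names are hypothetical, and `MAX_GCC_MAJOR` is a placeholder to be looked up for your own CUDA release; 10 is used here only because gcc 10.5 worked with CUDA 11.3 in this thread:

```python
import re

def parse_gcc_version(gcc_output):
    """Return (major, minor, patch) from the first line, or None."""
    lines = gcc_output.splitlines()
    if not lines:
        return None
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\s*$", lines[0])
    return tuple(int(g) for g in match.groups()) if match else None

def gcc_within_limit(gcc_version, max_major):
    """True if the gcc major version does not exceed the given limit."""
    return gcc_version[0] <= max_major

# Placeholder assumption: take the real limit from the compatibility table.
MAX_GCC_MAJOR = 10

gcc_text = (
    "gcc (Ubuntu 10.5.0-1ubuntu1~22.04) 10.5.0\n"
    "Copyright (C) 2020 Free Software Foundation, Inc."
)
version = parse_gcc_version(gcc_text)
print(version, "within limit:", gcc_within_limit(version, MAX_GCC_MAJOR))
```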

Hope this helps :)
