
alphafold2 doesn't work on RTX4090? #775

Open

Faezov opened this issue Jun 5, 2023 · 4 comments

Comments


Faezov commented Jun 5, 2023

I followed every step of the instructions.

I0605 11:20:06.397100 140157971002496 run_docker.py:235] Traceback (most recent call last):
I0605 11:20:06.397124 140157971002496 run_docker.py:235] File "/app/alphafold/run_alphafold.py", line 459, in <module>
I0605 11:20:06.397146 140157971002496 run_docker.py:235] app.run(main)
I0605 11:20:06.397182 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I0605 11:20:06.397204 140157971002496 run_docker.py:235] _run_main(main, args)
I0605 11:20:06.397226 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I0605 11:20:06.397248 140157971002496 run_docker.py:235] sys.exit(main(argv))
I0605 11:20:06.397271 140157971002496 run_docker.py:235] File "/app/alphafold/run_alphafold.py", line 444, in main
I0605 11:20:06.397292 140157971002496 run_docker.py:235] models_to_relax=FLAGS.models_to_relax)
I0605 11:20:06.397313 140157971002496 run_docker.py:235] File "/app/alphafold/run_alphafold.py", line 222, in predict_structure
I0605 11:20:06.397335 140157971002496 run_docker.py:235] random_seed=model_random_seed)
I0605 11:20:06.397356 140157971002496 run_docker.py:235] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I0605 11:20:06.397378 140157971002496 run_docker.py:235] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I0605 11:20:06.397400 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/_src/random.py", line 132, in PRNGKey
I0605 11:20:06.397438 140157971002496 run_docker.py:235] key = prng.seed_with_impl(impl, seed)
I0605 11:20:06.397461 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 267, in seed_with_impl
I0605 11:20:06.397483 140157971002496 run_docker.py:235] return random_seed(seed, impl=impl)
I0605 11:20:06.397505 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 580, in random_seed
I0605 11:20:06.397528 140157971002496 run_docker.py:235] return random_seed_p.bind(seeds_arr, impl=impl)
I0605 11:20:06.397550 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 329, in bind
I0605 11:20:06.397573 140157971002496 run_docker.py:235] return self.bind_with_trace(find_top_trace(args), args, params)
I0605 11:20:06.397596 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 332, in bind_with_trace
I0605 11:20:06.397617 140157971002496 run_docker.py:235] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I0605 11:20:06.397639 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 712, in process_primitive
I0605 11:20:06.397660 140157971002496 run_docker.py:235] return primitive.impl(*tracers, **params)
I0605 11:20:06.397680 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 592, in random_seed_impl
I0605 11:20:06.397703 140157971002496 run_docker.py:235] base_arr = random_seed_impl_base(seeds, impl=impl)
I0605 11:20:06.397723 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 597, in random_seed_impl_base
I0605 11:20:06.397745 140157971002496 run_docker.py:235] return seed(seeds)
I0605 11:20:06.397766 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 832, in threefry_seed
I0605 11:20:06.397787 140157971002496 run_docker.py:235] lax.shift_right_logical(seed, lax_internal._const(seed, 32)))
I0605 11:20:06.397808 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical
I0605 11:20:06.397829 140157971002496 run_docker.py:235] return shift_right_logical_p.bind(x, y)
I0605 11:20:06.397851 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 329, in bind
I0605 11:20:06.397871 140157971002496 run_docker.py:235] return self.bind_with_trace(find_top_trace(args), args, params)
I0605 11:20:06.397893 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 332, in bind_with_trace
I0605 11:20:06.397914 140157971002496 run_docker.py:235] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I0605 11:20:06.397935 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 712, in process_primitive
I0605 11:20:06.397955 140157971002496 run_docker.py:235] return primitive.impl(*tracers, **params)
I0605 11:20:06.397976 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/_src/dispatch.py", line 115, in apply_primitive
I0605 11:20:06.397998 140157971002496 run_docker.py:235] return compiled_fun(*args)
I0605 11:20:06.398021 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/_src/dispatch.py", line 200, in <lambda>
I0605 11:20:06.398042 140157971002496 run_docker.py:235] return lambda *args, **kw: compiled(*args, **kw)[0]
I0605 11:20:06.398062 140157971002496 run_docker.py:235] File "/opt/conda/lib/python3.7/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
I0605 11:20:06.398084 140157971002496 run_docker.py:235] out_flat = compiled.execute(in_flat)
I0605 11:20:06.398105 140157971002496 run_docker.py:235] jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function

And this is my Docker image:
(af2) bulat@AuroraR15:~/apps/AF230multimer/alphafold$ docker image inspect alphafold
[
{
"Id": "sha256:fcb14e253899d17d14b0b094c03db3262352ac946df6e6ccd75d8feed624fecf",
"RepoTags": [
"alphafold:latest"
],
"RepoDigests": [],
"Parent": "",
"Comment": "buildkit.dockerfile.v0",
"Created": "2023-06-05T11:11:19.751298916-04:00",
"Container": "",
"ContainerConfig": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": null,
"Cmd": null,
"Image": "",
"Volumes": null,
"WorkingDir": "",
"Entrypoint": null,
"OnBuild": null,
"Labels": null
},
"DockerVersion": "",
"Author": "",
"Config": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"NVARCH=x86_64",
"NVIDIA_REQUIRE_CUDA=cuda>=11.1 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=450,driver<451",
"NV_CUDA_CUDART_VERSION=11.1.74-1",
"NV_CUDA_COMPAT_PACKAGE=cuda-compat-11-1",
"CUDA_VERSION=11.1.1",
"LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64",
"NVIDIA_VISIBLE_DEVICES=all",
"NVIDIA_DRIVER_CAPABILITIES=compute,utility",
"NV_CUDA_LIB_VERSION=11.1.1-1",
"NV_NVTX_VERSION=11.1.74-1",
"NV_LIBNPP_VERSION=11.1.2.301-1",
"NV_LIBNPP_PACKAGE=libnpp-11-1=11.1.2.301-1",
"NV_LIBCUSPARSE_VERSION=11.3.0.10-1",
"NV_LIBCUBLAS_PACKAGE_NAME=libcublas-11-1",
"NV_LIBCUBLAS_VERSION=11.3.0.106-1",
"NV_LIBCUBLAS_PACKAGE=libcublas-11-1=11.3.0.106-1",
"NV_LIBNCCL_PACKAGE_NAME=libnccl2",
"NV_LIBNCCL_PACKAGE_VERSION=2.8.4-1",
"NCCL_VERSION=2.8.4-1",
"NV_LIBNCCL_PACKAGE=libnccl2=2.8.4-1+cuda11.1",
"NV_CUDNN_VERSION=8.0.5.39",
"NV_CUDNN_PACKAGE_NAME=libcudnn8",
"NV_CUDNN_PACKAGE=libcudnn8=8.0.5.39-1+cuda11.1"
],
"Cmd": null,
"Image": "",
"Volumes": null,
"WorkingDir": "/app/alphafold",
"Entrypoint": [
"/app/run_alphafold.sh"
],
"OnBuild": null,
"Labels": {
"com.nvidia.cudnn.version": "8.0.5.39",
"maintainer": "NVIDIA CORPORATION cudatools@nvidia.com"
},
"Shell": [
"/bin/bash",
"-o",
"pipefail",
"-c"
]
},
"Architecture": "amd64",
"Os": "linux",
"Size": 9924197318,
"VirtualSize": 9924197318,
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/sbxmecbsjt11l3qyccji7l5kf/diff:/var/lib/docker/overlay2/u5kuh6plaibllh9jpuomdyjhn/diff:/var/lib/docker/overlay2/t49o4ey3s5b7ekg1zxidppxbb/diff:/var/lib/docker/overlay2/xmtjesixrlc1mg14cowazbk0m/diff:/var/lib/docker/overlay2/irr673954hnwg04pfp20vq8lz/diff:/var/lib/docker/overlay2/l0vuaq33gpdbn7fzulqngf2kx/diff:/var/lib/docker/overlay2/suhsvcnbzjxh1peozvhn58noz/diff:/var/lib/docker/overlay2/50elvoyrfa07nd835wravqkbl/diff:/var/lib/docker/overlay2/kbmsqxhwpwyqmsj27pvb3ak43/diff:/var/lib/docker/overlay2/y266fduk6jzb7dd3sf1v7dbf9/diff:/var/lib/docker/overlay2/u15ulcsr1gdp7ok28cek67cg0/diff:/var/lib/docker/overlay2/16801172f93f69d9e0d740f62371ca7ba0bee729769a731b7d33c914651ae460/diff:/var/lib/docker/overlay2/f02078794c77293e590608047d0febff6f7dcf4d51e2404ea61e44c46b614017/diff:/var/lib/docker/overlay2/e8d3910161e3e31d142e6f458ea98a78e95fd34053da8a611aaf3c93e54e3289/diff:/var/lib/docker/overlay2/e1c4a091c1795ba8d134f7eaa59e83aff45105578a8b71f04d128753d1b57211/diff:/var/lib/docker/overlay2/691a7bdc38fb271d19270bfc90e903dcf73e8793819ece86b2aeba8d02df91a7/diff:/var/lib/docker/overlay2/0f852a4c5ac28097307f18e42b9c6967d35d5aa322290ab59fb34080b7b2e918/diff:/var/lib/docker/overlay2/7e7f687fede356633f48844f4f09572d730a8e21b2814b1bd00dae3974ee2821/diff:/var/lib/docker/overlay2/2a568af471f4f6a2bca78c0002c18d4baf5379e5838e8aaa8422761d5766e69e/diff",
"MergedDir": "/var/lib/docker/overlay2/whe6wf7mkchy8fvyt3m34yuc2/merged",
"UpperDir": "/var/lib/docker/overlay2/whe6wf7mkchy8fvyt3m34yuc2/diff",
"WorkDir": "/var/lib/docker/overlay2/whe6wf7mkchy8fvyt3m34yuc2/work"
},
"Name": "overlay2"
},
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:69f57fbceb1b420d7e4697e0f6514887b0805ee0059bea7d51e0a832962e74bf",
"sha256:d93d28fd98962ad87ccb1d4120928fbc75b263a3a02680e467c5dcf034165f3b",
"sha256:7f8e22e7b3728674b67db632eaf322f0f5d1d6f88fd81929b274deb0035271c2",
"sha256:e3c807b397c6b872bf574144054b276295027ec33db5398de56b7549831d1704",
"sha256:809a77dc6665ed2de39af508af0d277f5ceb071163811ec602f5a1fb069da095",
"sha256:b8f2664c41aaa22295786369c92b2e984d5273317a26989c6b3d84113deed13e",
"sha256:acfe97fde47e2d82ac4b727956c6ce0a12974c444a5473df70520136735e8c70",
"sha256:4bae95ca3a263e1388cb9bcdf4f5317ea17d0d384b3ac1b5cdb7f5018eaed8b9",
"sha256:5bac3e58e979d44d37323869e39ac1bd347e944ce2903f82842f4133db0071fd",
"sha256:98189b3865160391663b7d3450e56acd391d6fe1ecfad2d79503b71a7a9b27cd",
"sha256:d0d440ba7ef2b09bfa35770d274b4564b823912273c5722dc6eae58e7bcd71d3",
"sha256:b36ed72887ac02f3c772e6ac2267e28c57bdec280674ae64fff2ff54f8cc7946",
"sha256:a7e9e4c7cd5e0e43cfeee24a85ab8194aefd7f0f8b2bf04fd6fd88b2ee89a37f",
"sha256:b8074f4e7d22b866fcd09d60c5262b6b0f7f691d592c98fadf9514d9cf8b9839",
"sha256:a827e81751a7b3c72c87fdf4a160a746085ca77ae10fc4307a0d0b5d03d0bc54",
"sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef",
"sha256:756e5317e1d484ad1e77862430ec8150875a1e49ef66bc6f3b6b9a8523e85c7a",
"sha256:213a79d7456b950ae34c364a88ceb3c6600354cb852fdc605d59528e34dc71c1",
"sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef",
"sha256:f9f35834bc449ef3071cb17822df1748888e683c98a484372859f0f4984c184a"
]
},
"Metadata": {
"LastTagTime": "0001-01-01T00:00:00Z"
}
}
]


pcrady commented Jun 23, 2023

I'm having a similar issue. I think it's something to do with the CUDA version and the jax/jaxlib versions. I've been trying different combinations, but I keep getting errors.
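If it helps narrow this down, one way to see which jax/jaxlib versions ended up in the image and whether they can actually see the GPU is something like this (a minimal sketch, assuming the image is tagged alphafold and the NVIDIA container toolkit is installed; it bypasses the normal run_docker.py entrypoint):

$ # Override the AlphaFold entrypoint and query JAX from inside the container
$ docker run --rm --gpus all --entrypoint python3 alphafold \
    -c "import jax, jaxlib.version; print(jax.__version__, jaxlib.version.__version__); print(jax.devices())"

If the last print only lists a CPU device, JAX inside the container is not picking up CUDA at all.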


crshin commented Jun 23, 2023

I also use an RTX 4090. My driver version is 520.61.05, and the CUDA version is 11.8.
In my case, I solved the problem by editing the Dockerfile in the docker folder.
Here are the changes I made, from:

ARG CUDA=11.1.1
FROM nvidia/cuda:${CUDA}-cudnn8-runtime-ubuntu18.04

to:

ARG CUDA=11.8.0
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04

After making these changes, I rebuilt the Docker image and ran AlphaFold.
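For reference, the rebuild and run steps from the AlphaFold README look roughly like this (the FASTA path and $DOWNLOAD_DIR are placeholders for your own input and database directory):

$ # Rebuild the image with the updated docker/Dockerfile, then run via the helper script
$ docker build -f docker/Dockerfile -t alphafold .
$ python3 docker/run_docker.py \
    --fasta_paths=your_target.fasta \
    --max_template_date=2022-01-01 \
    --data_dir=$DOWNLOAD_DIR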

You might find these issues helpful as well:
#764
#646

I'm a newbie on GitHub and coding, so I don't know exactly why this change fixes it, but it solved the problem for me.
I hope this helps you solve the problem too.

@ChengkuiZhao
(Quoting @crshin's fix above.)

I am having the same problem and am trying different CUDA combinations. Hope this will work.

@ChengkuiZhao
(Quoting @crshin's fix above.)

Thanks! I finally got it working with your advice!
