A problem occurred while I was running the Alphafold #646

Closed
jhyeonv opened this issue Nov 29, 2022 · 6 comments

@jhyeonv

jhyeonv commented Nov 29, 2022

Hi.
A problem occurred while I was running AlphaFold.
Could I ask for help solving it?
Please check the command and log below, and if you need any more information, please let me know.

jhyeon@jhyeon-Ubuntu:~/Desktop/data/SW/alphafold$ sudo python3 docker/run_docker.py --fasta_paths=T.fa --max_template_date=2021-12-31 --data_dir=/mnt/8THDD/data/db/AFDB/ --model_preset=monomer --output_dir=/home/jhyeon/Desktop/data/alphatest
[sudo] password for jhyeon:
I1129 20:06:44.587526 140678175391744 run_docker.py:113] Mounting /home/jhyeon/Desktop/data/SW/alphafold -> /mnt/fasta_path_0
I1129 20:06:44.998342 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/uniref90 -> /mnt/uniref90_database_path
I1129 20:06:45.015172 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/mgnify -> /mnt/mgnify_database_path
I1129 20:06:45.015434 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB -> /mnt/data_dir
I1129 20:06:45.039951 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/pdb_mmcif/mmcif_files -> /mnt/template_mmcif_dir
I1129 20:06:45.040283 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/pdb_mmcif -> /mnt/obsolete_pdbs_path
I1129 20:06:45.058208 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/pdb70 -> /mnt/pdb70_database_path
I1129 20:06:45.099748 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/uniclust30/uniclust30_2018_08 -> /mnt/uniclust30_database_path
I1129 20:06:45.100375 140678175391744 run_docker.py:113] Mounting /mnt/8THDD/data/db/AFDB/bfd -> /mnt/bfd_database_path
I1129 20:06:47.216925 140678175391744 run_docker.py:255] /opt/conda/lib/python3.7/site-packages/haiku/_src/data_structures.py:37: FutureWarning: jax.tree_structure is deprecated, and will be removed in a future release. Use jax.tree_util.tree_structure instead.
I1129 20:06:47.217015 140678175391744 run_docker.py:255] PyTreeDef = type(jax.tree_structure(None))
I1129 20:06:47.668879 140678175391744 run_docker.py:255] I1129 11:06:47.668288 140554574919488 templates.py:857] Using precomputed obsolete pdbs /mnt/obsolete_pdbs_path/obsolete.dat.
I1129 20:06:50.314509 140678175391744 run_docker.py:255] I1129 11:06:50.314045 140554574919488 xla_bridge.py:353] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker:
I1129 20:06:50.395502 140678175391744 run_docker.py:255] I1129 11:06:50.394992 140554574919488 xla_bridge.py:353] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: Host CUDA Interpreter
I1129 20:06:50.395585 140678175391744 run_docker.py:255] I1129 11:06:50.395253 140554574919488 xla_bridge.py:353] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
I1129 20:06:50.395616 140678175391744 run_docker.py:255] I1129 11:06:50.395327 140554574919488 xla_bridge.py:353] Unable to initialize backend 'plugin': xla_extension has no attributes named get_plugin_device_client. Compile TensorFlow with //tensorflow/compiler/xla/python:enable_plugin_device set to true (defaults to false) to enable this.
I1129 20:07:01.872005 140678175391744 run_docker.py:255] I1129 11:07:01.871499 140554574919488 run_alphafold.py:377] Have 5 models: ['model_1_pred_0', 'model_2_pred_0', 'model_3_pred_0', 'model_4_pred_0', 'model_5_pred_0']
I1129 20:07:01.872115 140678175391744 run_docker.py:255] I1129 11:07:01.871589 140554574919488 run_alphafold.py:393] Using random seed 776916752095554131 for the data pipeline
I1129 20:07:01.872142 140678175391744 run_docker.py:255] I1129 11:07:01.871697 140554574919488 run_alphafold.py:161] Predicting T
I1129 20:07:01.872171 140678175391744 run_docker.py:255] I1129 11:07:01.871899 140554574919488 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmpyb0_qtih/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/T.fa /mnt/uniref90_database_path/uniref90.fasta"
I1129 20:07:01.908833 140678175391744 run_docker.py:255] I1129 11:07:01.908337 140554574919488 utils.py:36] Started Jackhmmer (uniref90.fasta) query
I1129 20:17:08.123620 140678175391744 run_docker.py:255] I1129 11:17:08.122851 140554574919488 utils.py:40] Finished Jackhmmer (uniref90.fasta) query in 606.214 seconds
I1129 20:17:08.124002 140678175391744 run_docker.py:255] I1129 11:17:08.123401 140554574919488 jackhmmer.py:133] Launching subprocess "/usr/bin/jackhmmer -o /dev/null -A /tmp/tmp3h2swg2s/output.sto --noali --F1 0.0005 --F2 5e-05 --F3 5e-07 --incE 0.0001 -E 0.0001 --cpu 8 -N 1 /mnt/fasta_path_0/T.fa /mnt/mgnify_database_path/mgy_clusters_2018_12.fa"
I1129 20:17:08.153515 140678175391744 run_docker.py:255] I1129 11:17:08.152948 140554574919488 utils.py:36] Started Jackhmmer (mgy_clusters_2018_12.fa) query
I1129 20:25:33.106717 140678175391744 run_docker.py:255] I1129 11:25:33.105955 140554574919488 utils.py:40] Finished Jackhmmer (mgy_clusters_2018_12.fa) query in 504.953 seconds
I1129 20:25:33.107178 140678175391744 run_docker.py:255] I1129 11:25:33.106705 140554574919488 hhsearch.py:85] Launching subprocess "/usr/bin/hhsearch -i /tmp/tmpi641el8l/query.a3m -o /tmp/tmpi641el8l/output.hhr -maxseq 1000000 -d /mnt/pdb70_database_path/pdb70"
I1129 20:25:33.138576 140678175391744 run_docker.py:255] I1129 11:25:33.138024 140554574919488 utils.py:36] Started HHsearch query
I1129 20:26:54.573565 140678175391744 run_docker.py:255] I1129 11:26:54.573027 140554574919488 utils.py:40] Finished HHsearch query in 81.435 seconds
I1129 20:26:54.582471 140678175391744 run_docker.py:255] I1129 11:26:54.581876 140554574919488 hhblits.py:128] Launching subprocess "/usr/bin/hhblits -i /mnt/fasta_path_0/T.fa -cpu 4 -oa3m /tmp/tmp0adsap7y/output.a3m -o /dev/null -n 3 -e 0.001 -maxseq 1000000 -realign_max 100000 -maxfilt 100000 -min_prefilter_hits 1000 -d /mnt/bfd_database_path/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt -d /mnt/uniclust30_database_path/uniclust30_2018_08"
I1129 20:26:54.614019 140678175391744 run_docker.py:255] I1129 11:26:54.613526 140554574919488 utils.py:36] Started HHblits query
I1129 20:39:47.502100 140678175391744 run_docker.py:255] I1129 11:39:47.501456 140554574919488 utils.py:40] Finished HHblits query in 772.888 seconds
I1129 20:39:47.511255 140678175391744 run_docker.py:255] I1129 11:39:47.510757 140554574919488 templates.py:878] Searching for template for: MAAAAAAAAAAAAAAAAAAAAAAAAA
I1129 20:39:49.477572 140678175391744 run_docker.py:255] I1129 11:39:49.476987 140554574919488 templates.py:268] Found an exact template match 4d10_F.
I1129 20:39:49.481890 140678175391744 run_docker.py:255] I1129 11:39:49.481489 140554574919488 templates.py:913] Skipped invalid hit 4D10_F COP9 SIGNALOSOME COMPLEX SUBUNIT 1; SIGNALING PROTEIN; 3.8A {HOMO SAPIENS}, error: None, warning: 4d10_F (sum_probs: 0.0, rank: 1): feature extracting errors: Template all atom mask was all zeros: 4d10_F. Residue range: 4-13, mmCIF parsing errors: {}
I1129 20:39:54.642663 140678175391744 run_docker.py:255] I1129 11:39:54.642101 140554574919488 templates.py:268] Found an exact template match 4v7h_BQ.
I1129 20:39:56.860277 140678175391744 run_docker.py:255] I1129 11:39:56.859714 140554574919488 templates.py:268] Found an exact template match 6j3y_W.
I1129 20:39:56.861339 140678175391744 run_docker.py:255] I1129 11:39:56.860958 140554574919488 templates.py:913] Skipped invalid hit 6J3Y_W Photosystem II reaction center protein; Photosystem, ELECTRON TRANSPORT; HET: LMG, HEM, DGD, SQD, LMU, BCR, CLA, OEX, PHO, PL9, LHG, A86; 3.3A {Chaetoceros gracilis}, error: None, warning: 6j3y_W (sum_probs: 0.0, rank: 3): feature extracting errors: Template all atom mask was all zeros: 6j3y_W. Residue range: 0-18, mmCIF parsing errors: {}
I1129 20:39:59.248605 140678175391744 run_docker.py:255] I1129 11:39:59.247956 140554574919488 templates.py:268] Found an exact template match 6j3z_w.
I1129 20:39:59.249619 140678175391744 run_docker.py:255] I1129 11:39:59.249154 140554574919488 templates.py:913] Skipped invalid hit 6J3Z_w Photosystem II reaction center protein; Photosystem, ELECTRON TRANSPORT; HET: LMG, HEM, DGD, SQD, LMU, BCR, CLA, OEX, PHO, PL9, LHG, A86; 3.6A {Chaetoceros gracilis}, error: None, warning: 6j3z_w (sum_probs: 0.0, rank: 4): feature extracting errors: Template all atom mask was all zeros: 6j3z_w. Residue range: 0-18, mmCIF parsing errors: {}
I1129 20:39:59.369462 140678175391744 run_docker.py:255] I1129 11:39:59.369046 140554574919488 templates.py:268] Found an exact template match 1m0u_B.
I1129 20:39:59.372590 140678175391744 run_docker.py:255] I1129 11:39:59.372143 140554574919488 templates.py:913] Skipped invalid hit 1M0U_B GST2 gene product (E.C.2.5.1.18); GST, Flight Muscle Protein, Sigma; HET: SO4, GSH; 1.75A {Drosophila melanogaster} SCOP: a.45.1.1, c.47.1.5, error: None, warning: 1m0u_B (sum_probs: 0.0, rank: 5): feature extracting errors: Template all atom mask was all zeros: 1m0u_B. Residue range: 0-23, mmCIF parsing errors: {}
I1129 20:39:59.973728 140678175391744 run_docker.py:255] I1129 11:39:59.973092 140554574919488 templates.py:268] Found an exact template match 1kn7_A.
I1129 20:40:02.077867 140678175391744 run_docker.py:255] I1129 11:40:02.077314 140554574919488 templates.py:268] Found an exact template match 6rfq_8.
I1129 20:40:04.344768 140678175391744 run_docker.py:255] I1129 11:40:04.336282 140554574919488 templates.py:268] Found an exact template match 6rfr_8.
I1129 20:40:08.787640 140678175391744 run_docker.py:255] I1129 11:40:08.786539 140554574919488 templates.py:268] Found an exact template match 6t59_s3.
I1129 20:40:08.790601 140678175391744 run_docker.py:255] I1129 11:40:08.790089 140554574919488 templates.py:913] Skipped invalid hit 6T59_s3 Ribosomal protein L8, uL3, uL4; TUBULIN, nascent chain-associated complex, ribosome-nascent; HET: MG; 3.11A {Oryctolagus cuniculus}, error: None, warning: 6t59_s3 (sum_probs: 0.0, rank: 9): feature extracting errors: Template all atom mask was all zeros: 6t59_s3. Residue range: 273-297, mmCIF parsing errors: {}
I1129 20:40:09.027597 140678175391744 run_docker.py:255] I1129 11:40:09.027103 140554574919488 templates.py:268] Found an exact template match 3lpj_B.
I1129 20:40:09.033292 140678175391744 run_docker.py:255] I1129 11:40:09.032789 140554574919488 templates.py:913] Skipped invalid hit 3LPJ_B Structure of BACE Bound to; Alzheimer's, Aspartyl protease, Hydrolase; HET: TLA, Z75; 1.79A {Homo sapiens}, error: None, warning: 3lpj_B (sum_probs: 0.0, rank: 10): feature extracting errors: Template all atom mask was all zeros: 3lpj_B. Residue range: 0-24, mmCIF parsing errors: {}
I1129 20:40:09.033350 140678175391744 run_docker.py:255] I1129 11:40:09.033004 140554574919488 pipeline.py:234] Uniref90 MSA size: 1 sequences.
I1129 20:40:09.033376 140678175391744 run_docker.py:255] I1129 11:40:09.033061 140554574919488 pipeline.py:235] BFD MSA size: 1 sequences.
I1129 20:40:09.033398 140678175391744 run_docker.py:255] I1129 11:40:09.033080 140554574919488 pipeline.py:236] MGnify MSA size: 1 sequences.
I1129 20:40:09.033423 140678175391744 run_docker.py:255] I1129 11:40:09.033095 140554574919488 pipeline.py:238] Final (deduplicated) MSA size: 1 sequences.
I1129 20:40:09.033465 140678175391744 run_docker.py:255] I1129 11:40:09.033219 140554574919488 pipeline.py:241] Total number of templates (NB: this can include bad templates and is later filtered to top 4): 4.
I1129 20:40:09.041668 140678175391744 run_docker.py:255] I1129 11:40:09.041206 140554574919488 run_alphafold.py:190] Running model model_1_pred_0 on T
I1129 20:40:10.478550 140678175391744 run_docker.py:255] I1129 11:40:10.478168 140554574919488 model.py:166] Running predict with shape(feat) = {'aatype': (4, 26), 'residue_index': (4, 26), 'seq_length': (4,), 'template_aatype': (4, 4, 26), 'template_all_atom_masks': (4, 4, 26, 37), 'template_all_atom_positions': (4, 4, 26, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 26), 'msa_mask': (4, 508, 26), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 26, 3), 'template_pseudo_beta_mask': (4, 4, 26), 'atom14_atom_exists': (4, 26, 14), 'residx_atom14_to_atom37': (4, 26, 14), 'residx_atom37_to_atom14': (4, 26, 37), 'atom37_atom_exists': (4, 26, 37), 'extra_msa': (4, 5120, 26), 'extra_msa_mask': (4, 5120, 26), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 26), 'true_msa': (4, 508, 26), 'extra_has_deletion': (4, 5120, 26), 'extra_deletion_value': (4, 5120, 26), 'msa_feat': (4, 508, 26, 49), 'target_feat': (4, 26, 22)}
I1129 20:40:10.573517 140678175391744 run_docker.py:255] 2022-11-29 11:40:10.572368: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.9
I1129 20:40:10.573782 140678175391744 run_docker.py:255] 2022-11-29 11:40:10.572425: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
I1129 20:40:10.586611 140678175391744 run_docker.py:255] 2022-11-29 11:40:10.585689: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
I1129 20:40:10.586843 140678175391744 run_docker.py:255] 2022-11-29 11:40:10.585779: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: INTERNAL: Could not find the corresponding function
I1129 20:40:10.594201 140678175391744 run_docker.py:255] Traceback (most recent call last):
I1129 20:40:10.594406 140678175391744 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 422, in <module>
I1129 20:40:10.594515 140678175391744 run_docker.py:255] app.run(main)
I1129 20:40:10.594610 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312, in run
I1129 20:40:10.594703 140678175391744 run_docker.py:255] _run_main(main, args)
I1129 20:40:10.594786 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258, in _run_main
I1129 20:40:10.594868 140678175391744 run_docker.py:255] sys.exit(main(argv))
I1129 20:40:10.594947 140678175391744 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 406, in main
I1129 20:40:10.595050 140678175391744 run_docker.py:255] random_seed=random_seed)
I1129 20:40:10.595142 140678175391744 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 199, in predict_structure
I1129 20:40:10.595224 140678175391744 run_docker.py:255] random_seed=model_random_seed)
I1129 20:40:10.595304 140678175391744 run_docker.py:255] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I1129 20:40:10.595379 140678175391744 run_docker.py:255] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I1129 20:40:10.595466 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/random.py", line 132, in PRNGKey
I1129 20:40:10.595538 140678175391744 run_docker.py:255] key = prng.seed_with_impl(impl, seed)
I1129 20:40:10.595611 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 267, in seed_with_impl
I1129 20:40:10.595727 140678175391744 run_docker.py:255] return random_seed(seed, impl=impl)
I1129 20:40:10.595801 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 580, in random_seed
I1129 20:40:10.595872 140678175391744 run_docker.py:255] return random_seed_p.bind(seeds_arr, impl=impl)
I1129 20:40:10.595945 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 329, in bind
I1129 20:40:10.596016 140678175391744 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params)
I1129 20:40:10.596088 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 332, in bind_with_trace
I1129 20:40:10.596163 140678175391744 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I1129 20:40:10.596238 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 712, in process_primitive
I1129 20:40:10.596311 140678175391744 run_docker.py:255] return primitive.impl(*tracers, **params)
I1129 20:40:10.596384 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 592, in random_seed_impl
I1129 20:40:10.596455 140678175391744 run_docker.py:255] base_arr = random_seed_impl_base(seeds, impl=impl)
I1129 20:40:10.596517 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 597, in random_seed_impl_base
I1129 20:40:10.596581 140678175391744 run_docker.py:255] return seed(seeds)
I1129 20:40:10.596646 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/prng.py", line 832, in threefry_seed
I1129 20:40:10.596710 140678175391744 run_docker.py:255] lax.shift_right_logical(seed, lax_internal._const(seed, 32)))
I1129 20:40:10.596774 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical
I1129 20:40:10.596839 140678175391744 run_docker.py:255] return shift_right_logical_p.bind(x, y)
I1129 20:40:10.596904 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 329, in bind
I1129 20:40:10.596969 140678175391744 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params)
I1129 20:40:10.597032 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 332, in bind_with_trace
I1129 20:40:10.597097 140678175391744 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I1129 20:40:10.597160 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 712, in process_primitive
I1129 20:40:10.597218 140678175391744 run_docker.py:255] return primitive.impl(*tracers, **params)
I1129 20:40:10.597279 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/dispatch.py", line 115, in apply_primitive
I1129 20:40:10.597339 140678175391744 run_docker.py:255] return compiled_fun(*args)
I1129 20:40:10.597402 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/dispatch.py", line 200, in <lambda>
I1129 20:40:10.597465 140678175391744 run_docker.py:255] return lambda *args, **kw: compiled(*args, **kw)[0]
I1129 20:40:10.597525 140678175391744 run_docker.py:255] File "/opt/conda/lib/python3.7/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
I1129 20:40:10.597588 140678175391744 run_docker.py:255] out_flat = compiled.execute(in_flat)
I1129 20:40:10.597648 140678175391744 run_docker.py:255] jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function
@peterdfields

I have been getting this same error since CUDA moved to v12. I think this will need either an update from the developer team or a downgrade of the system drivers.
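
The log above contains two lines that look most relevant to this failure: the warning that ptxas does not support CC 8.9, and the error about failing to get the PTX kernel "shift_right_logical". A minimal sketch for pulling those two details out of a saved log, assuming the output of run_docker.py has been captured as text. The function name `summarize_ptx_failure` is hypothetical, and the patterns are copied from the messages in this thread, so other XLA versions may word them differently.

```python
import re

# Patterns copied from the log lines in this thread (assumption: the wording
# is stable; other XLA/jaxlib builds may phrase these messages differently).
CC_WARNING = re.compile(r"ptxas does not support CC (\d+\.\d+)")
PTX_ERROR = re.compile(r'failed to get PTX kernel "([^"]+)"')


def summarize_ptx_failure(log_text):
    """Return the compute capabilities and kernel names mentioned in the log."""
    return {
        "unsupported_cc": sorted(set(CC_WARNING.findall(log_text))),
        "missing_kernels": sorted(set(PTX_ERROR.findall(log_text))),
    }


# Two lines shortened from the log in this issue.
sample = (
    "W asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; "
    "ptxas does not support CC 8.9\n"
    'E cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" from module: '
    "CUDA_ERROR_NOT_FOUND: named symbol not found\n"
)
print(summarize_ptx_failure(sample))
```

If both lists come back non-empty, the log shows the same pair of messages as the one posted here. This only matches text; it does not check the driver or the CUDA toolkit inside the image.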

@joshabramson
Collaborator

joshabramson commented Dec 23, 2022

Does this error persist when using AlphaFold v2.3.0?

@peterdfields

@joshabramson I think so. I did a fresh build with the newest Dockerfile. Here's the output from the point where the error starts:

I1223 20:04:51.789108 139688861775680 run_docker.py:255] I1224 01:04:51.788743 139711970813760 run_alphafold.py:191] Running model model_1_pred_0 on ECU03_1140
I1223 20:04:54.797782 139688861775680 run_docker.py:255] I1224 01:04:54.797097 139711970813760 model.py:165] Running predict with shape(feat) = {'aatype': (4, 117), 'residue_index': (4, 117), 'seq_length': (4,), 'template_aatype': (4, 4, 117), 'template_all_atom_masks': (4, 4, 117, 37), 'template_all_atom_positions': (4, 4, 117, 37, 3), 'template_sum_probs': (4, 4, 1), 'is_distillation': (4,), 'seq_mask': (4, 117), 'msa_mask': (4, 508, 117), 'msa_row_mask': (4, 508), 'random_crop_to_size_seed': (4, 2), 'template_mask': (4, 4), 'template_pseudo_beta': (4, 4, 117, 3), 'template_pseudo_beta_mask': (4, 4, 117), 'atom14_atom_exists': (4, 117, 14), 'residx_atom14_to_atom37': (4, 117, 14), 'residx_atom37_to_atom14': (4, 117, 37), 'atom37_atom_exists': (4, 117, 37), 'extra_msa': (4, 5120, 117), 'extra_msa_mask': (4, 5120, 117), 'extra_msa_row_mask': (4, 5120), 'bert_mask': (4, 508, 117), 'true_msa': (4, 508, 117), 'extra_has_deletion': (4, 5120, 117), 'extra_deletion_value': (4, 5120, 117), 'msa_feat': (4, 508, 117, 49), 'target_feat': (4, 117, 22)}
I1223 20:04:56.327672 139688861775680 run_docker.py:255] 2022-12-24 01:04:56.327116: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:231] Falling back to the CUDA driver for PTX compilation; ptxas does not support CC 8.9
I1223 20:04:56.327813 139688861775680 run_docker.py:255] 2022-12-24 01:04:56.327156: W external/org_tensorflow/tensorflow/compiler/xla/stream_executor/gpu/asm_compiler.cc:234] Used ptxas at ptxas
I1223 20:04:56.355589 139688861775680 run_docker.py:255] 2022-12-24 01:04:56.355236: E external/org_tensorflow/tensorflow/compiler/xla/stream_executor/cuda/cuda_driver.cc:628] failed to get PTX kernel "shift_right_logical" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
I1223 20:04:56.355756 139688861775680 run_docker.py:255] 2022-12-24 01:04:56.355273: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_stream_executor_client.cc:2153] Execution of replica 0 failed: INTERNAL: Could not find the corresponding function
I1223 20:04:56.480186 139688861775680 run_docker.py:255] Traceback (most recent call last):
I1223 20:04:56.480370 139688861775680 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 432, in <module>
I1223 20:04:56.480440 139688861775680 run_docker.py:255] app.run(main)
I1223 20:04:56.480502 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 312, in run
I1223 20:04:56.480558 139688861775680 run_docker.py:255] _run_main(main, args)
I1223 20:04:56.480613 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
I1223 20:04:56.480718 139688861775680 run_docker.py:255] sys.exit(main(argv))
I1223 20:04:56.480788 139688861775680 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 408, in main
I1223 20:04:56.480844 139688861775680 run_docker.py:255] predict_structure(
I1223 20:04:56.480898 139688861775680 run_docker.py:255] File "/app/alphafold/run_alphafold.py", line 199, in predict_structure
I1223 20:04:56.480951 139688861775680 run_docker.py:255] prediction_result = model_runner.predict(processed_feature_dict,
I1223 20:04:56.481002 139688861775680 run_docker.py:255] File "/app/alphafold/alphafold/model/model.py", line 167, in predict
I1223 20:04:56.481052 139688861775680 run_docker.py:255] result = self.apply(self.params, jax.random.PRNGKey(random_seed), feat)
I1223 20:04:56.481102 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/random.py", line 132, in PRNGKey
I1223 20:04:56.481152 139688861775680 run_docker.py:255] key = prng.seed_with_impl(impl, seed)
I1223 20:04:56.481202 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 267, in seed_with_impl
I1223 20:04:56.481253 139688861775680 run_docker.py:255] return random_seed(seed, impl=impl)
I1223 20:04:56.481304 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 580, in random_seed
I1223 20:04:56.481354 139688861775680 run_docker.py:255] return random_seed_p.bind(seeds_arr, impl=impl)
I1223 20:04:56.481404 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 329, in bind
I1223 20:04:56.481456 139688861775680 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params)
I1223 20:04:56.481508 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace
I1223 20:04:56.481559 139688861775680 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I1223 20:04:56.481609 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive
I1223 20:04:56.481659 139688861775680 run_docker.py:255] return primitive.impl(*tracers, **params)
I1223 20:04:56.481709 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 592, in random_seed_impl
I1223 20:04:56.481759 139688861775680 run_docker.py:255] base_arr = random_seed_impl_base(seeds, impl=impl)
I1223 20:04:56.481808 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 597, in random_seed_impl_base
I1223 20:04:56.481858 139688861775680 run_docker.py:255] return seed(seeds)
I1223 20:04:56.481911 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/prng.py", line 832, in threefry_seed
I1223 20:04:56.481950 139688861775680 run_docker.py:255] lax.shift_right_logical(seed, lax_internal._const(seed, 32)))
I1223 20:04:56.481987 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 515, in shift_right_logical
I1223 20:04:56.482024 139688861775680 run_docker.py:255] return shift_right_logical_p.bind(x, y)
I1223 20:04:56.482062 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 329, in bind
I1223 20:04:56.482100 139688861775680 run_docker.py:255] return self.bind_with_trace(find_top_trace(args), args, params)
I1223 20:04:56.482138 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 332, in bind_with_trace
I1223 20:04:56.482177 139688861775680 run_docker.py:255] out = trace.process_primitive(self, map(trace.full_raise, args), params)
I1223 20:04:56.482217 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/core.py", line 712, in process_primitive
I1223 20:04:56.482254 139688861775680 run_docker.py:255] return primitive.impl(*tracers, **params)
I1223 20:04:56.482291 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 115, in apply_primitive
I1223 20:04:56.482333 139688861775680 run_docker.py:255] return compiled_fun(*args)
I1223 20:04:56.482372 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 200, in <lambda>
I1223 20:04:56.482409 139688861775680 run_docker.py:255] return lambda *args, **kw: compiled(*args, **kw)[0]
I1223 20:04:56.482447 139688861775680 run_docker.py:255] File "/opt/conda/lib/python3.8/site-packages/jax/_src/dispatch.py", line 895, in _execute_compiled
I1223 20:04:56.482484 139688861775680 run_docker.py:255] out_flat = compiled.execute(in_flat)
I1223 20:04:56.482520 139688861775680 run_docker.py:255] jaxlib.xla_extension.XlaRuntimeError: INTERNAL: Could not find the corresponding function

@peterdfields

@joshabramson I was able to get AlphaFold to run by substituting nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 as the base image in the Dockerfile.

@joshabramson
Collaborator

Closing this for now, as it sounds like there is a workaround and the problem doesn't seem to be affecting all users.

@HanLiii

HanLiii commented Aug 15, 2023

For a 4090 machine, you need to change the following in the Dockerfile:

ARG CUDA=11.1.1 -> ARG CUDA=11.8.0
FROM nvidia/cuda:${CUDA}-cudnn8-runtime-ubuntu18.04 -> FROM nvidia/cuda:${CUDA}-cudnn8-devel-ubuntu20.04

Then rebuild.
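
The two Dockerfile changes above can also be applied with a short script instead of by hand. This is only a sketch: it assumes the Dockerfile contains one `ARG CUDA=...` line and one `FROM nvidia/cuda:...` line as quoted in this thread, and `patch_dockerfile` is a hypothetical helper name, not part of AlphaFold.

```python
import re

# Target values taken from the comments in this thread.
NEW_CUDA = "11.8.0"
NEW_BASE = "FROM nvidia/cuda:${CUDA}-cudnn8-devel-ubuntu20.04"


def patch_dockerfile(text):
    """Return Dockerfile text with the CUDA version and base image replaced."""
    # Only lines of the form "ARG CUDA=<value>" are rewritten.
    text = re.sub(r"^ARG CUDA=.*$", "ARG CUDA=" + NEW_CUDA, text, flags=re.MULTILINE)
    # A function replacement keeps the literal "${CUDA}" untouched.
    text = re.sub(r"^FROM nvidia/cuda:.*$", lambda m: NEW_BASE, text, flags=re.MULTILINE)
    return text


# The two original lines as quoted above.
original = (
    "ARG CUDA=11.1.1\n"
    "FROM nvidia/cuda:${CUDA}-cudnn8-runtime-ubuntu18.04\n"
)
patched = patch_dockerfile(original)
print(patched)
```

To use it on a real file, read the Dockerfile into a string, pass it through `patch_dockerfile`, write the result back, and rebuild the image the same way it was built the first time. Keeping a copy of the original file makes it easy to revert.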
