Apptainer (Singularity) container fails: Can't find ptxas binary #4

Closed
hseara opened this issue Aug 9, 2022 · 10 comments

@hseara

hseara commented Aug 9, 2022

Does anybody know how to fix this?

Version 2.2.2

2022-08-09 17:46:30.176360: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:75] Can't find ptxas binary in ${CUDA_DIR}/bin.  Custom ptxas location can be specified using $PATH.
2022-08-09 17:46:30.176464: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:76] Searched for CUDA in the following directories:
2022-08-09 17:46:30.176483: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79]   /opt/conda/lib/python3.7/site-packages/jaxlib/cuda
2022-08-09 17:46:30.176496: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79]   /usr/local/cuda-11.1
2022-08-09 17:46:30.176508: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79]   /usr/local/cuda
2022-08-09 17:46:30.176519: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:79]   .
2022-08-09 17:46:30.176529: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:81] You can choose the search directory by setting xla_gpu_cuda_data_dir in HloModule's DebugOptions.  For most apps, setting the environment variable XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda will work.
2022-08-09 17:46:30.176541: F external/org_tensorflow/tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:452] Can't find ptxas binary.  You can pass the flag --xla_gpu_unsafe_fallback_to_driver_on_ptxas_not_found to use the GPU driver for compiling ptx instead. However this option is discouraged and can lead to increased memory consumptions and other subtle runtime issues.
Fatal Python error: Aborted

Thread 0x00007fe858733740 (most recent call first):
  File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 360 in backend_compile
  File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 743 in _xla_callable
  File "/opt/conda/lib/python3.7/site-packages/jax/linear_util.py", line 262 in memoized_fun
  File "/opt/conda/lib/python3.7/site-packages/jax/interpreters/xla.py", line 593 in _xla_call_impl
  File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 606 in process_call
  File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 1563 in process
  File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 1551 in call_bind
  File "/opt/conda/lib/python3.7/site-packages/jax/core.py", line 1560 in bind
  File "/opt/conda/lib/python3.7/site-packages/jax/_src/api.py", line 427 in cache_miss
  File "/opt/conda/lib/python3.7/site-packages/jax/_src/traceback_util.py", line 183 in reraise_with_filtered_traceback
  File "/app/alphafold/alphafold/model/model.py", line 167 in predict
  File "/app/alphafold/run_alphafold.py", line 199 in predict_structure
  File "/app/alphafold/run_alphafold.py", line 406 in main
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 258 in _run_main
  File "/opt/conda/lib/python3.7/site-packages/absl/app.py", line 312 in run
  File "/app/alphafold/run_alphafold.py", line 422 in <module>
INFO:    Cleaning up image...
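
The log above suggests pointing XLA at a CUDA directory through `XLA_FLAGS=--xla_gpu_cuda_data_dir=/path/to/cuda`. A minimal Python sketch of that suggestion follows. The helper name `with_cuda_data_dir` is hypothetical, and the directory `/usr/local/cuda-11.1` is taken from the search list in the log; whether it actually contains `bin/ptxas` inside a given container is an assumption to verify.

```python
import os


def with_cuda_data_dir(env, cuda_dir):
    """Return a copy of env whose XLA_FLAGS sets --xla_gpu_cuda_data_dir to cuda_dir.

    Hypothetical helper for illustration; it is not part of JAX or AlphaFold.
    """
    new_env = dict(env)
    prefix = "--xla_gpu_cuda_data_dir="
    existing = new_env.get("XLA_FLAGS", "").split()
    # Drop any earlier value of the same flag so there is exactly one setting.
    kept = [flag for flag in existing if not flag.startswith(prefix)]
    new_env["XLA_FLAGS"] = " ".join(kept + [prefix + cuda_dir])
    return new_env


if __name__ == "__main__":
    # Assumed path: one of the directories XLA reported searching in the log.
    os.environ.update(with_cuda_data_dir(os.environ, "/usr/local/cuda-11.1"))
    print(os.environ["XLA_FLAGS"])
```

The variable has to be set before JAX compiles anything, and it has to be visible to the process running inside the container, not only on the host.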
@hseara changed the title from "container fails: Can't find ptxas binary" to "Apptainer (Singularity) container fails: Can't find ptxas binary" on Aug 9, 2022
@prehensilecode
Owner

@hseara Can you provide more detail? What did you try to do that produced that error message?

@hseara
Author

hseara commented Aug 10, 2022

I was just executing alphafold with the following command:

python /hpcg/local/soft/af2/alphafold-2.2.2/singularity/run_singularity.py --data_dir=/hpcg/local/soft/af2/db/ --fasta_paths=complete_construct.fasta --max_template_date=2022-11-01

I even obtained the following relaxed and unrelaxed models:

msas
features.pkl
result_model_1_pred_0.pkl
result_model_2_pred_0.pkl
result_model_3_pred_0.pkl
result_model_4_pred_0.pkl
relaxed_model_1_pred_0.pdb
relaxed_model_2_pred_0.pdb
relaxed_model_3_pred_0.pdb  
relaxed_model_4_pred_0.pdb
unrelaxed_model_1_pred_0.pdb
unrelaxed_model_2_pred_0.pdb
unrelaxed_model_3_pred_0.pdb
unrelaxed_model_4_pred_0.pdb

Any ideas why it failed then?

@prehensilecode
Owner

Sorry, no idea. Are you able to try the Docker version? I don't have access to Docker, just Podman.

I am trying to determine whether this is specifically a Singularity issue or a general container issue that also affects Docker. The Singularity image was built almost exactly like the Docker image, as far as the NVIDIA drivers and dev toolkit go.

If it is something that affects both container formats, then you should ask at https://github.com/deepmind/alphafold

@hseara
Author

hseara commented Aug 10, 2022

I am using AlmaLinux 9. I only have access to Podman, and I could not get GPU support working with it, which is why I moved to Singularity. Therefore, I cannot tell whether the problem is Singularity-specific or caused by something else in my system that would also arise with Docker. If I manage to get Docker working, I will post again.

@prehensilecode
Owner

Possibly related: google/trax#249

I'll have to look closer at the Singularity recipe.

@prehensilecode
Owner

Can you try using this branch? https://github.com/prehensilecode/alphafold_singularity/tree/cuda-path

@prehensilecode
Owner

Otherwise, tell me how to reproduce your case, including where you got the file complete_construct.fasta.

@prehensilecode
Owner

Can you try using this branch? https://github.com/prehensilecode/alphafold_singularity/tree/cuda-path

The new recipe, which adds the proper CUDA path, seems to work:

$  singularity shell -C alphafold.sif
Singularity> which ptxas
/usr/local/cuda-11.1/bin/ptxas
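
The same check can be scripted from Python inside the container. The sketch below is a rough diagnostic, not a reproduction of XLA's actual search logic: the function name `find_ptxas` is hypothetical, and the directory list is copied from the error log earlier in this thread.

```python
import os
import shutil

# Directories XLA reported searching in the error log from this issue.
LOGGED_CUDA_DIRS = [
    "/opt/conda/lib/python3.7/site-packages/jaxlib/cuda",
    "/usr/local/cuda-11.1",
    "/usr/local/cuda",
    ".",
]


def find_ptxas(cuda_dirs, path=None):
    """Return a path to an executable ptxas, or None if none is found.

    Checks <dir>/bin/ptxas for each candidate CUDA directory first,
    then falls back to a PATH lookup (path=None means the current $PATH).
    """
    for cuda_dir in cuda_dirs:
        candidate = os.path.join(cuda_dir, "bin", "ptxas")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return shutil.which("ptxas", path=path)


if __name__ == "__main__":
    found = find_ptxas(LOGGED_CUDA_DIRS)
    print(found if found else "ptxas not found")
```

If this reports that ptxas is missing, the container is likely to hit the same fatal error at JAX compile time.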

@hseara
Author

hseara commented Aug 16, 2022

Hi, sorry for the delayed answer. I was on vacation.

I have run the branch you suggested, and now it works. I am running a batch of FASTA files to be sure, and I will report back if I see the same behavior again. Otherwise, the issue is solved for me.

Will you merge the changes from the branch, or should we keep using the branch in the future?

@prehensilecode
Owner

I made a new release that includes the fix: https://github.com/prehensilecode/alphafold_singularity/releases/tag/v2.2.2-1
