Output Killed with no other information #1
Thanks for reporting. If you add the
Thanks!

```
$ cgdms simulate -i 1CRN.txt -o traj.pdb -s predss -n 1.2e7 -d cpu
Step 1 / 12000000 - acc 0.005 - vel 0.024 - energy -44.03 ( -21.59 -15.59 -6.85 ) - Cα RMSD 32.59
```

```
$ python
Python 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:18)
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.rand(5, 5, device="cuda")
tensor([[0.9750, 0.8992, 0.7012, 0.1458, 0.7875],
        [0.1238, 0.1129, 0.4178, 0.7608, 0.2411],
        [0.3505, 0.2031, 0.9376, 0.4649, 0.3073],
        [0.5086, 0.2415, 0.9404, 0.9678, 0.4551],
        [0.7188, 0.8842, 0.8739, 0.2875, 0.8161]], device='cuda:0')
```
I'm on a cluster. I tested a few GPU types and realized that
Seems like a weird one, possibly related to the CUDA version too. I tried running
I directly gave it a try with PyTorch v1.12. I had compatibility issues with the A100 and memory issues with the V100.

With A100:

```
$ cgdms simulate -i 1CRN.txt -o traj.pdb -s predss -n 1.2e7
/miniconda3/envs/cgdms3/lib/python3.9/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA A100-SXM-80GB with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the NVIDIA A100-SXM-80GB GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Killed
```

With V100:

```
$ cgdms simulate -i 1CRN.txt -o traj.pdb -s predss -n 1.2e7
/miniconda3/envs/cgdms3/lib/python3.9/site-packages/cgdms/cgdms.py:532: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /opt/conda/conda-bld/pytorch_1656352616446/work/torch/csrc/utils/tensor_new.cpp:204.)
  coords[len(atoms) * i + ai] = torch.tensor(
Killed
$ logout
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=21647000.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: gpu213-14: task 0: Out Of Memory
```

Installing with PyTorch v1.11 now.
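The `cgdms.py:532` UserWarning above describes a general PyTorch performance pattern rather than a cgdms-specific bug: constructing a tensor from a Python list of numpy arrays is slow. A minimal illustration with made-up shapes (the real code fills a preallocated `coords` tensor element by element):

```python
import numpy as np
import torch

# Hypothetical stand-in for per-atom coordinate arrays, shape (3,) each
coord_list = [np.random.rand(3) for _ in range(1000)]

# Slow path (this is what triggers the UserWarning):
#   torch.tensor(coord_list)
# Fix suggested by the warning: collapse to a single ndarray first,
# then do one conversion to a tensor.
coords = torch.as_tensor(np.stack(coord_list))
print(coords.shape)  # torch.Size([1000, 3])
```

Note this warning is about speed; whether it relates to the cgroup OOM kill is a separate question.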
Thanks! With PyTorch v1.11 I can run on the V100 with 256 GB of memory, but I still get the memory warning above.
With the A100, I still have the CUDA capability issue.
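For the A100 side: the earlier warning says this wheel only supports architectures up to sm_75, i.e. it was built against an older CUDA toolkit, while the A100 (sm_80) needs a CUDA 11+ build of PyTorch. One way to get such a build is to install a `+cu11x` wheel; the exact command below is illustrative, so check https://pytorch.org/get-started/locally/ for the current one:

```shell
# Illustrative only: a PyTorch 1.12 wheel built for CUDA 11.3, which
# includes sm_80 kernels for the A100. Verify the current command at
# https://pytorch.org/get-started/locally/ before running.
pip install torch==1.12.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
```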
Great. The CUDA compatibility issue sounds like a PyTorch issue rather than an issue with this software. I'm not sure about the memory issue, but if it runs okay then it can probably be ignored.
Hi, thanks for the excellent work.

I installed the tool and can run the file generation command:

```
cgdms makeinput -i 1CRN.pdb -s 1CRN.ss2 > 1CRN.txt
```

But I cannot run the `simulate` command. The output is just one word: `Killed`. Could you please take a look and advise how to debug?

Best,
Roden
My conda environment is attached:
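When the only output is `Killed`, a general first debugging step on Linux (not specific to cgdms) is to confirm the kernel's OOM killer was responsible, assuming you can read the kernel log on the machine:

```shell
# Look for OOM-killer entries in the kernel log (may need root/sudo).
# A line like "Out of memory: Killed process <pid> (python)" confirms the
# process was killed for exceeding available memory.
dmesg | grep -i -E "out of memory|oom-kill|killed process" | tail -n 5
```

On a Slurm cluster the same information usually appears in the job's stderr as an `oom-kill event` message, as seen later in this thread.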