ray Aborted on a cluster managed by slurm #14426
Comments
Can you show us logs inside
@rkooo567
[2021-03-02 06:37:44,220 E 53311 53360] dlmalloc.cc:112: mmap failed with error: Cannot allocate memory
And the following log when using Slurm sbatch:
[2021-03-02 06:41:44,781 E 621 655] dlmalloc.cc:112: mmap failed with error: Cannot allocate memory
Dear @rkooo567, I found where the problem was. As I suspected in my first post, it is related to Slurm, since I am using a cluster. The error was in the command line used to submit the job:
After that, the program worked fine without any problem. So it looks like, on a cluster managed by Slurm, we have to go through Slurm for Ray to work properly. But the problem remains when we want to launch a small program without submitting a job to Slurm.
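For anyone landing here with the same `mmap failed with error: Cannot allocate memory` log, below is a minimal sketch of one possible workaround. It assumes the failure comes from Ray trying to reserve more memory for its object store than the login node or the Slurm allocation permits, which this thread does not confirm. The helper `ray_init_kwargs` and its `object_store_percent` parameter are hypothetical names made up for illustration. `SLURM_CPUS_PER_TASK` and `SLURM_MEM_PER_NODE` are environment variables Slurm sets inside a job when `--cpus-per-task` and `--mem` are requested, and `num_cpus` and `object_store_memory` are existing `ray.init()` arguments.

```python
import os


def ray_init_kwargs(env=None, object_store_percent=30, default_cpus=1):
    """Build keyword arguments for ray.init() from Slurm environment variables.

    Hypothetical helper for illustration, not part of Ray or Slurm.
    """
    env = os.environ if env is None else env

    # Use the CPU count Slurm granted to this task; fall back to a small
    # default when running outside a job (e.g. on a login node).
    kwargs = {"num_cpus": int(env.get("SLURM_CPUS_PER_TASK", default_cpus))}

    # SLURM_MEM_PER_NODE is expressed in megabytes and is only present when
    # the job requested memory with --mem. Give the object store a share of it.
    mem_mb = env.get("SLURM_MEM_PER_NODE")
    if mem_mb is not None:
        mem_bytes = int(mem_mb) * 1024 * 1024
        kwargs["object_store_memory"] = mem_bytes * object_store_percent // 100

    return kwargs


# Intended usage inside the job script (requires Ray to be installed):
#   import ray
#   ray.init(**ray_init_kwargs())
```

Outside a Slurm job the helper only sets `num_cpus`, so on a login node you would probably still need to pass an explicit, small `object_store_memory` value yourself.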
Hi, I'm a bot from the Ray team :) To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months. If there is no further activity in the 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public slack channel.
What is the problem?
Ray aborts right after the call to ray.init().
Sys: Linux cedar1.cedar.computecanada.ca 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 GNU/Linux
Python 3.8
ray version : '1.1.0'
Reproduction (REQUIRED)
Note:
It may be related to Slurm, since this is a cluster. But according to the cluster information, small programs can be executed without using the job scheduler.
Edit: I also used Slurm to launch my program as a job, but I got the same problem.
My script is as follows: