ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors (-0.01 GB) is less than 0% of total. You can adjust these settings with ray.init(memory=<bytes>, object store memory=<bytes> #11966

Open
2 tasks
LianShuaiLong opened this issue Nov 12, 2020 · 21 comments
Labels
core Issues that should be addressed in Ray Core core-ux docs An issue or change related to documentation fix-error-msg This issue has a bad error message that should be improved. P1 Issue that should be fixed within a few weeks question Just a question :)

Comments

@LianShuaiLong

LianShuaiLong commented Nov 12, 2020

What is the problem?

When I run Ray on an ML platform, the following error occurs:

ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors (-0.01 GB) is less than 0% of total. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>)

Can you tell me approximately what values I should set for memory and object_store_memory?
Thanks.
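The wording of the error suggests a simple subtraction: whatever remains after the object store and Redis reservations is what tasks and actors get. A minimal sketch of that arithmetic follows. The function name and the figures are hypothetical and chosen only to reproduce a small negative result; this is not Ray's internal code.

```python
GB = 10**9


def memory_left_for_tasks(node_memory, object_store_memory, redis_memory):
    """Bytes left for tasks and actors after the two reservations."""
    return node_memory - object_store_memory - redis_memory


# Hypothetical node: the reservations slightly exceed what is available.
left = memory_left_for_tasks(
    node_memory=4 * GB,
    object_store_memory=3 * GB,
    redis_memory=1 * GB + 10**7,
)
print(f"{left / GB:.2f} GB")  # negative, like the -0.01 GB in the error
```

If the reservations add up to more than the memory the node reports, the remainder goes negative, which is the situation the message describes.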

Ray version and other system information (Python version, TensorFlow version, OS):

Reproduction (REQUIRED)

Please provide a script that can be run to reproduce the issue. The script should have no external library dependencies (i.e., use fake or mock data / environments):

If we cannot run your script, we cannot fix your issue.

  • I have verified my script runs in a clean environment and reproduces the issue.
  • I have verified the issue also occurs with the latest wheels.
@LianShuaiLong LianShuaiLong added bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) labels Nov 12, 2020
@rkooo567 rkooo567 added question Just a question :) fix-docs P1 Issue that should be fixed within a few weeks and removed bug Something that is supposed to be working; but isn't triage Needs triage (eg: priority, bug/not-bug, and owning component) P1 Issue that should be fixed within a few weeks labels Nov 12, 2020
@rkooo567
Contributor

This is a great question! Usually these memory settings are configured automatically by Ray. Did you set any custom values when you ran ray.init?

@rkooo567
Contributor

Also cc @richardliaw

@LianShuaiLong
Author

This is a great question! Usually these memory settings are configured automatically by Ray. Did you set any custom values when you ran ray.init?

I only set num_cpus=16.

@rkooo567 rkooo567 added the fix-error-msg This issue has a bad error message that should be improved. label Nov 13, 2020
@rkooo567 rkooo567 added this to the Better Error Messages milestone Nov 13, 2020
@rkooo567
Contributor

@ericl Do you know what's the recommended setup?

Also, what's your total memory @LianShuaiLong?

@LianShuaiLong
Author

@ericl Do you know what's the recommended setup?

Also, what's your total memory @LianShuaiLong?

I run it on a machine learning platform. I set 16 CPUs, 8 GPUs, and 48 GB of memory for all my training trials.

@LianShuaiLong
Author

@ericl Do you know what's the recommended setup?
Also, what's your total memory @LianShuaiLong?

I run it on a machine learning platform. I set 16 CPUs, 8 GPUs, and 48 GB of memory for all my training trials.

I find that Ray starts successfully when I pass no parameters to ray.init(). Why?

@rkooo567
Contributor

Hmm, so are you saying

ray.init() # works
ray.init(num_cpus) # doesn't work

?

@LianShuaiLong
Author

Hmm, so are you saying

ray.init() # works
ray.init(num_cpus) # doesn't work

?

Yep, it fails when I set num_cpus, _temp_dir, or _memory.
By the way, I run my experiment on a machine learning platform, and it works well on my own computer.

@LianShuaiLong
Author

Since I got this error again, I am reopening this issue.

@rahulmadanraju

Is there a way to get around this error? I have upgraded Ray to 1.8 as well, but the issue still shows up.

The error remains with both ray.init() and ray.init(num_cpus).

@rkooo567
Contributor

What if you pass num_cpus=4 or something similar? (Provide a keyword argument instead of a positional one.)
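To illustrate why the keyword form matters: as far as I know, the first positional parameter of ray.init is address and the resource options are keyword-only, so a bare positional value would not be read as a CPU count. The stand-in below is hypothetical (init_like is not a Ray function) and only mimics that assumed signature shape.

```python
def init_like(address=None, *, num_cpus=None):
    """Hypothetical stand-in mimicking the assumed shape of ray.init."""
    return {"address": address, "num_cpus": num_cpus}


print(init_like(4))           # {'address': 4, 'num_cpus': None}
print(init_like(num_cpus=4))  # {'address': None, 'num_cpus': 4}
```

With the real library, the keyword call would be ray.init(num_cpus=4).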

@rahulmadanraju

rahulmadanraju commented Nov 11, 2021

It's still the same; I initially had the same setup.
Ray is initialized in JupyterLab, and the Jupyter kernel often restarts the moment execution reaches function.remote().

Debugging showed the error mentioned above:
ValueError: After taking into account object store and redis memory usage, the amount of memory on this node available for tasks and actors (-0.01 GB) is less than 0% of total. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>)

I did an initial check of the node and its resources. They are as follows:
ray.available_resources()

{'node:172.18.0.24': 1.0,
 'CPU': 4.0,
 'memory': 3957492942.0,
 'object_store_memory': 1978746470.0}

ray.nodes()

[{'NodeID': 'c17c3cfcfee9490830d2e8e1b49ed0c3f021bb7b6a6ac9fd7c0d5aa9',
  'Alive': True,
  'NodeManagerAddress': '172.18.0.24',
  'NodeManagerHostname': '6f7ac0772b5f',
  'NodeManagerPort': 45627,
  'ObjectManagerPort': 36912,
  'ObjectStoreSocketName': '/tmp/ray/session_2021-11-11_10-52-33_830664_27560/sockets/plasma_store',
  'RayletSocketName': '/tmp/ray/session_2021-11-11_10-52-33_830664_27560/sockets/raylet',
  'MetricsExportPort': 58984,
  'alive': True,
  'Resources': {'memory': 3957492942.0,
   'node:172.18.0.24': 1.0,
   'object_store_memory': 1978746470.0,
   'CPU': 4.0}}]
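For readability, the memory figures in the output above can be converted to GiB. This is a small sketch that assumes the 'memory' and 'object_store_memory' entries are byte counts; resources_in_gib is a hypothetical helper, not part of Ray.

```python
GIB = 1024**3


def resources_in_gib(resources):
    """Convert the byte-valued memory entries of a resource dict to GiB."""
    keys = ("memory", "object_store_memory")
    return {k: resources[k] / GIB for k in keys if k in resources}


# Values copied from the ray.available_resources() output above.
reported = {
    "node:172.18.0.24": 1.0,
    "CPU": 4.0,
    "memory": 3957492942.0,
    "object_store_memory": 1978746470.0,
}
for name, gib in resources_in_gib(reported).items():
    print(f"{name}: {gib:.2f} GiB")
```

Under that assumption the node reports roughly 3.7 GiB for tasks and actors and about half of that for the object store.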

@rkooo567
Contributor

Hmm, can you also tell me the memory size of the machine/container you run Jupyter on?
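One way to check this from inside a Linux container is to read the cgroup memory limit. The sketch below assumes the common cgroup v2 and v1 file locations; they may differ or be absent on a given platform, and container_memory_limit is a hypothetical helper name.

```python
from pathlib import Path

# Common cgroup locations for a container's memory limit (v2 first, then v1).
CGROUP_LIMIT_FILES = (
    "/sys/fs/cgroup/memory.max",
    "/sys/fs/cgroup/memory/memory.limit_in_bytes",
)


def container_memory_limit(paths=CGROUP_LIMIT_FILES):
    """Return the first readable cgroup memory limit in bytes, or None."""
    for path in paths:
        try:
            text = Path(path).read_text().strip()
        except OSError:
            continue  # file missing or unreadable, try the next location
        if text.isdigit():  # cgroup v2 reports "max" when unlimited
            return int(text)
    return None


limit = container_memory_limit()
print("no cgroup limit found" if limit is None else f"{limit / 1024**3:.2f} GiB")
```

A limit much smaller than the host's physical memory would be worth mentioning in the reply.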

@rahulmadanraju

image

@orcahmlee
Contributor

This is a great question! Usually these memory settings are configured automatically by Ray. Did you set any custom values when you ran ray.init?

Hi @rkooo567,
Could you provide more details about how Ray sets the memory automatically?
Is there any documentation that describes it, or any other reference I can check?

@bveeramani bveeramani added docs An issue or change related to documentation and removed fix-docs labels May 24, 2022
@scottsun94
Contributor

scottsun94 commented Oct 17, 2022

@rkooo567 Bump this.
cc: @jjyao on the documentation feedback.

@chongxiaoc

Any documentation about correctly initializing Ray inside a docker container? Hitting same issue here.

@scottsun94
Contributor

cc: @DmitriGekhtman

@chongxiaoc

Any documentation about correctly initializing Ray inside a docker container? Hitting same issue here.

After upgrading to Ray 2.0, issue is gone on my side.

@jjyao jjyao added the core Issues that should be addressed in Ray Core label Oct 25, 2022
@oscartackstrom

Same issue for me with Ray 2.0.0 when calling ray.init() in a Jupyter notebook from VS Code. Sometimes I instead get an error like: Attempting to cap object store memory usage at 58042368 bytes, but the minimum allowed is 78643200 bytes.

@rkooo567
Contributor

rkooo567 commented Nov 17, 2022

When you don't specify the object store memory, Ray uses 20% of available memory. I think your machine doesn't have enough available memory (20% of available memory is less than 80 MB).

You can manually specify object_store_memory to avoid this:

ray.init(object_store_memory=<bytes>)

The minimum you should specify is 78643200 bytes (75 MiB).

A better solution is to use an instance that has more available memory.
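As a rough sketch of the manual setting, the helper below takes a fraction of available memory but never goes below the minimum quoted in the error message. The helper names are hypothetical, the 0.2 default simply mirrors the 20% figure mentioned in this comment, and start_ray needs Ray installed.

```python
MIN_OBJECT_STORE_BYTES = 78643200  # minimum quoted in the error message above


def pick_object_store_memory(available_bytes, fraction):
    """Take a fraction of available memory, but never go below the minimum."""
    return max(int(available_bytes * fraction), MIN_OBJECT_STORE_BYTES)


def start_ray(available_bytes, fraction=0.2):
    """Start Ray with an explicit object store size (requires ray installed)."""
    import ray  # imported lazily so the helper above works without Ray

    return ray.init(
        object_store_memory=pick_object_store_memory(available_bytes, fraction)
    )


# A hypothetical node where the chosen fraction falls below the minimum.
print(pick_object_store_memory(290_000_000, 0.2))  # 78643200
```

Raising the object store to the minimum on a node this small leaves less memory for tasks and actors, so a larger instance is still the safer option.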

10 participants