
Redis has started but no raylets have registered yet. #8152

Closed
lizelive opened this issue Apr 23, 2020 · 8 comments
Labels
question Just a question :) stale The issue is stale. It will be closed within 7 days unless there are further conversation

Comments

lizelive commented Apr 23, 2020

I started a Ray cluster with ray start --head --redis-port=6379 and connect to it from my Jupyter notebook using ray.init(address='auto', redis_password='5241590000000000').

Then I try to add an additional node to the cluster using ray.init(address='10.0.0.4:6379', redis_password='5241590000000000'),

but I get Redis has started but no raylets have registered yet. See below for the full log.

It works when started from the command line.
Thanks!

ray 0.8.3, Ubuntu 16.04.6 LTS
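A likely source of the confusion (my reading, not confirmed in this thread): ray.init(address=...) connects a driver process to an existing cluster; it does not start a raylet on the machine, so it cannot "add a node". Joining a machine to the cluster is done with ray start on that machine. A sketch, assuming Ray 0.8.x's CLI and the head address and password from the report:

```shell
# On the head node:
ray start --head --redis-port=6379

# On each machine that should join the cluster as a worker node
# (this starts a raylet, which then registers with Redis):
ray start --address=10.0.0.4:6379 --redis-password=5241590000000000

# In a driver process (notebook or script) on a node that already runs
# a raylet -- this only connects to the cluster, it does not add a node:
python -c "import ray; ray.init(address='auto', redis_password='5241590000000000')"
```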

Full output:

2020-04-23 20:12:43,199	WARNING worker.py:792 -- When connecting to an existing cluster, _internal_config must match the cluster's _internal_config.
2020-04-23 20:12:43,208	WARNING services.py:183 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-04-23 20:12:44,213	WARNING services.py:183 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-04-23 20:12:45,218	WARNING services.py:183 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-04-23 20:12:46,223	WARNING services.py:183 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-04-23 20:12:47,227	WARNING services.py:183 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?


The experiment failed. Finalizing run...
Cleaning up all outstanding Run operations, waiting 300.0 seconds
1 items cleaning up...
Cleanup took 0.17613768577575684 seconds
Starting the daemon thread to refresh tokens in background for process with pid = 138
Traceback (most recent call last):
  File "node.py", line 2, in <module>
    ray.init(address='10.0.0.4:6379', redis_password='5241590000000000')
  File "/azureml-envs/azureml_4f5506ddd0f6ef8dca81d6df2a7ac40d/lib/python3.6/site-packages/ray/worker.py", line 809, in init
    connect_only=True)
  File "/azureml-envs/azureml_4f5506ddd0f6ef8dca81d6df2a7ac40d/lib/python3.6/site-packages/ray/node.py", line 125, in __init__
    redis_password=self.redis_password)
  File "/azureml-envs/azureml_4f5506ddd0f6ef8dca81d6df2a7ac40d/lib/python3.6/site-packages/ray/services.py", line 176, in get_address_info_from_redis
    redis_address, node_ip_address, redis_password=redis_password)
  File "/azureml-envs/azureml_4f5506ddd0f6ef8dca81d6df2a7ac40d/lib/python3.6/site-packages/ray/services.py", line 159, in get_address_info_from_redis_helper
    "Redis has started but no raylets have registered yet.")
RuntimeError: Redis has started but no raylets have registered yet.
@lizelive lizelive added the question Just a question :) label Apr 23, 2020

lizelive commented Apr 23, 2020

Similar to #5437, but it works from the command line.

@lizelive lizelive reopened this Apr 25, 2020

maxgillett commented May 22, 2020

Any insight into how to fix this? On the worker node, I'm using openconnect/ocproxy to set up a VPN connection that redirects SSH traffic destined for the head node to another port (ssh_port) on localhost (openconnect --script-tun --script "ocproxy -L ssh_port:headnode:22" vpnhostname), and tunneling to the head node by running ssh -L 0.0.0.0:redis_port:localhost:redis_port -p ssh_port user@headnode -N -T. I can call ray start --address=localhost:redis_port with no issue, but in both cases (command line and the Python ray.init call) the worker node doesn't show up in the head node's available resources. Unfortunately, I can't give the head node a public IP.

It seems the issue involves network address translation (see #5437 (comment)): the worker node I'm setting up is missing from client_table, while the other worker nodes are included.
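One thing worth checking in NAT setups like this (a hypothetical fix, not confirmed in this thread): the raylet registers with the node's autodetected local IP, which the head node may not be able to reach back through the tunnel. The ray start CLI exposes a --node-ip-address flag to pin the address a node advertises:

```shell
# Hypothetical sketch for the tunneled setup above: tell the raylet to
# advertise an address the head node can actually reach, instead of the
# VPN-local address it would autodetect. <reachable-ip> and <password>
# are placeholders for the specific environment.
ray start --address=localhost:redis_port \
    --node-ip-address=<reachable-ip> \
    --redis-password=<password>
```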


stale bot commented Nov 12, 2020

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Nov 12, 2020
@lizelive

Is this issue still a problem in 1.0?

@stale stale bot removed the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Nov 23, 2020

Chickenmarkus commented Dec 21, 2020

Is this issue still a problem in 1.0?

Yes, it is.

$ docker run --name head -d -it rayproject/ray ray start --head --block
e452d51da9c0961bc180aefd1dfea2ee9081f2c60f856f1f132101a9d95f690f
$ docker logs head
Local node IP: 172.17.0.2
[...]

$ docker run --rm -it rayproject/ray
(base) root@28a7ebee46f3:/# python -c "import ray; ray.init(address='172.17.0.2:6379')"
2020-12-21 10:10:21,974 INFO worker.py:651 -- Connecting to existing Ray cluster at address: 172.17.0.2:6379
2020-12-21 10:10:21,990 WARNING services.py:202 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-12-21 10:10:22,999 WARNING services.py:202 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-12-21 10:10:24,009 WARNING services.py:202 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-12-21 10:10:25,020 WARNING services.py:202 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
2020-12-21 10:10:26,031 WARNING services.py:202 -- Some processes that the driver needs to connect to have not registered with Redis, so retrying. Have you run 'ray start' on this node?
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/root/anaconda3/lib/python3.7/site-packages/ray/worker.py", line 759, in init
    connect_only=True)
  File "/root/anaconda3/lib/python3.7/site-packages/ray/node.py", line 163, in __init__
    redis_password=self.redis_password))
  File "/root/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 195, in get_address_info_from_redis
    redis_address, node_ip_address, redis_password=redis_password)
  File "/root/anaconda3/lib/python3.7/site-packages/ray/_private/services.py", line 178, in get_address_info_from_redis_helper
    "Redis has started but no raylets have registered yet.")
RuntimeError: Redis has started but no raylets have registered yet.

By contrast, the command line works:

root@587f2b85b776:/# ray start --address 172.17.0.2:6379
Local node IP: 172.17.0.4
2020-12-21 10:11:27,562 WARNING services.py:1560 -- WARNING: The object store is using /tmp instead of /dev/shm because /dev/shm has only 67108864 bytes available. This may slow down performance! You may be able to free up space by deleting files in /dev/shm or terminating any running plasma_store_server processes. If you are inside a Docker container, you may need to pass an argument with the flag '--shm-size' to 'docker run'.

--------------------
Ray runtime started.
--------------------

To terminate the Ray runtime, run
  ray stop
root@587f2b85b776:/# python -c "import ray; ray.init(address='auto')"
2020-12-21 10:14:23,615 INFO worker.py:651 -- Connecting to existing Ray cluster at address: 172.17.0.2:6379

The version of ray inside the container is 1.0.1.post1.
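The failing container in this repro runs no raylet at all, so the driver has nothing local to register with; joining the container as a node first (as in the second snippet) is the working path. A sketch combining both steps, with the --shm-size flag that the startup warning recommends (the size is illustrative):

```shell
# Start the head node with a larger shared-memory segment for the object store:
docker run --name head --shm-size=2g -d -it rayproject/ray ray start --head --block

# In the second container, join the cluster as a node before calling ray.init:
docker run --rm --shm-size=2g -it rayproject/ray bash -c \
  "ray start --address=172.17.0.2:6379 && python -c \"import ray; ray.init(address='auto')\""
```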

stale bot commented Apr 20, 2021

@stale stale bot added the stale The issue is stale. It will be closed within 7 days unless there are further conversation label Apr 20, 2021

stale bot commented May 4, 2021

Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

@stale stale bot closed this as completed May 4, 2021
@gefei456

I'm seeing the same issue.
