[Bug] [Ray Client] [M1] RuntimeError: Starting up Server Failed! with empty logs on M1 chip #18876

Closed
2 tasks done
tas17 opened this issue Sep 24, 2021 · 11 comments
Labels
  • bug: Something that is supposed to be working; but isn't
  • core-client: ray client related issues
  • P1: Issue that should be fixed within a few weeks
  • QS: Quantsight triage label

Comments

tas17 commented Sep 24, 2021

Search before asking

  • I searched the issues and found no similar issues.

Ray Component

Ray Core

What happened + What you expected to happen

What happened:
While trying to run a simple Ray program, I get the error: RuntimeError: Starting up Server Failed! Check ray_client_server_[port].err on the cluster.

:/# cat tmp/ray/session_2021-09-24_08-09-35_797329_1/logs/ray_client_server.err

INFO:ray.util.client.server.server:Starting Ray Client server on 0.0.0.0:10001
INFO:ray.util.client.server.proxier:New data connection from client a3d9f822957747bb8854e073795c8f99:
ERROR:ray.util.client.server.proxier:SpecificServer startup failed for client: a3d9f822957747bb8854e073795c8f99
INFO:ray.util.client.server.proxier:SpecificServer started on port: 23000 with PID: 125 for client: a3d9f822957747bb8854e073795c8f99
ERROR:ray.util.client.server.proxier:Server startup failed for client: a3d9f822957747bb8854e073795c8f99, using JobConfig: <ray.job_config.JobConfig object at 0x403696e160>!
ERROR:ray.util.client.server.proxier:Timeout waiting for channel for a3d9f822957747bb8854e073795c8f99
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 269, in get_channel
    grpc.channel_ready_future(
  File "/usr/local/lib/python3.8/site-packages/grpc/_utilities.py", line 139, in result
    self._block(timeout)
  File "/usr/local/lib/python3.8/site-packages/grpc/_utilities.py", line 85, in _block
    raise grpc.FutureTimeoutError()
grpc.FutureTimeoutError
WARNING:ray.util.client.server.proxier:Retrying Logstream connection. 1 attempts failed.
ERROR:ray.util.client.server.proxier:Timeout waiting for channel for a3d9f822957747bb8854e073795c8f99
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 269, in get_channel
    grpc.channel_ready_future(
  File "/usr/local/lib/python3.8/site-packages/grpc/_utilities.py", line 139, in result
    self._block(timeout)
  File "/usr/local/lib/python3.8/site-packages/grpc/_utilities.py", line 85, in _block
    raise grpc.FutureTimeoutError()
grpc.FutureTimeoutError
WARNING:ray.util.client.server.proxier:Retrying Logstream connection. 2 attempts failed.
ERROR:ray.util.client.server.proxier:Timeout waiting for channel for a3d9f822957747bb8854e073795c8f99
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/ray/util/client/server/proxier.py", line 269, in get_channel
    grpc.channel_ready_future(
  File "/usr/local/lib/python3.8/site-packages/grpc/_utilities.py", line 139, in result
    self._block(timeout)
  File "/usr/local/lib/python3.8/site-packages/grpc/_utilities.py", line 85, in _block
    raise grpc.FutureTimeoutError()
grpc.FutureTimeoutError
WARNING:ray.util.client.server.proxier:Retrying Logstream connection. 3 attempts failed.
ERROR:ray.util.client.server.proxier:Unable to find channel for client: a3d9f822957747bb8854e073795c8f99
WARNING:ray.util.client.server.proxier:Retrying Logstream connection. 4 attempts failed.
ERROR:ray.util.client.server.proxier:Unable to find channel for client: a3d9f822957747bb8854e073795c8f99
WARNING:ray.util.client.server.proxier:Retrying Logstream connection. 5 attempts failed.

:/# cat tmp/ray/session_2021-09-24_08-09-35_797329_1/logs/ray_client_server_23000.err

INFO:ray.util.client.server.server:Starting Ray Client server on 0.0.0.0:23000
INFO:ray.util.client.server.server:25 idle checks before shutdown.
INFO:ray.util.client.server.server:20 idle checks before shutdown.
INFO:ray.util.client.server.server:15 idle checks before shutdown.
INFO:ray.util.client.server.server:10 idle checks before shutdown.
INFO:ray.util.client.server.server:5 idle checks before shutdown.

What I expected:
The program to run successfully, with no errors in these logs.

Reproduction script

Reproduce:

docker-compose build
docker-compose up
python main.py

On a MacBook Air with M1 chip

docker --version
Docker version 20.10.5, build 55c4c88

docker-compose --version
docker-compose version 1.29.0, build 07737305

python --version
Python 3.9.7

pip freeze

attrs==21.2.0
click==8.0.1
filelock==3.0.12
grpcio==1.40.0
msgpack==1.0.2
numpy==1.21.2
protobuf==3.18.0
PyYAML==5.4.1
ray==1.6.0
redis==3.5.3
six==1.16.0

Dockerfile:

FROM python:3.9.7

RUN pip install ray==1.6.0

docker-compose.yaml:

version: "3.5"

services:
  rayhead:
    platform: "linux/amd64"
    build: .
    shm_size: '2gb'
    entrypoint: [ '/usr/local/bin/ray']
    command: ['start', '--head', '--address=127.0.0.1:6379', '--redis-password=truc', '--block', '--node-ip-address=127.0.0.1']
    ports:
      - "8265:8265"
      - "6379:6379"
      - "10001:10001"

  ray-worker:
    platform: "linux/amd64"
    build: .
    shm_size: '2gb'
    entrypoint: [ '/usr/local/bin/ray']
    command: ['start', '--address=rayhead:6379', '--redis-password=truc', '--block']
    depends_on:
      - "rayhead"

main.py:

import ray

ray.init("ray://127.0.0.1:10001")


@ray.remote
def hello():
    print("hello on remote")
    return "hello"


r = hello.remote()
print(ray.get(r))
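
For comparison, here is a minimal run that bypasses the Ray client server entirely (a sketch; it assumes Ray is also installed directly on the host rather than only in the containers). If this works while the ray:// connection above fails, the problem is isolated to the client-server proxy inside the container.

import ray

# Start a local, in-process Ray instance instead of connecting through ray://.
ray.init()


@ray.remote
def hello():
    print("hello on remote")
    return "hello"


r = hello.remote()
print(ray.get(r))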

Anything else

It works on a Mac without an M1 chip.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
tas17 added the bug and triage labels on Sep 24, 2021
tas17 changed the title from "[Bug] Cannot start ray on my MacBook Air with M1 chip" to "[Bug] RuntimeError: Starting up Server Failed! with empty logs on M1 chip" on Sep 24, 2021

gjoliver (Member) commented:

Ray does not work on ARM chips yet.
Dev support for M1 is being worked on. You can follow this issue: #16621

tas17 commented Sep 24, 2021

Thank you for your answer.

Is it relevant to add that I was using Docker with Rosetta, which, as far as I understand, should behave as if I were not using an M1 chip?
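
For what it's worth, here is a quick way to check what architecture the emulated container actually reports (a sketch; the rayhead service name comes from the compose file above, and check_arch.py is just a hypothetical file name):

import platform

# Run inside the running head container, for example:
#   docker-compose exec rayhead python check_arch.py
# With platform: "linux/amd64" in docker-compose.yaml, this should print
# "x86_64" even on an M1 host, because the image runs under emulation.
print(platform.machine())
print(platform.platform())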

gjoliver reopened this on Sep 24, 2021
gjoliver added the m1 label and removed the triage label on Sep 24, 2021
gjoliver (Member) commented:

Oh, OK, let's track this issue then.
Your test script is extremely simple, so those translated binaries probably aren't working at all.
The more issues we see on M1, the more effort there will be to prioritize support.

stale bot commented Jan 22, 2022

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

The stale bot added the stale label on Jan 22, 2022

stale bot commented Feb 7, 2022

Hi again! This issue will be closed because there has been no further activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

The stale bot closed this as completed on Feb 7, 2022
wjrforcyber (Contributor) commented:

Saw this exact same issue again; how is everything going?

architkulkarni commented Oct 21, 2022

@wjrforcyber could you please post the script you used to reproduce the error? Is it the same script in the original post in this issue?

The stale bot removed the stale label on Oct 21, 2022
wjrforcyber (Contributor) commented:

> @wjrforcyber could you please post the script you used to reproduce the error? Is it the same script in the original post in this issue?

Yes, my error produces exactly the same log as in the original post; I think you have already seen my reply in #19792. Thanks for reopening this.

architkulkarni changed the title from "[Bug] RuntimeError: Starting up Server Failed! with empty logs on M1 chip" to "[Bug] [Ray Client] [M1] RuntimeError: Starting up Server Failed! with empty logs on M1 chip" on Oct 24, 2022
architkulkarni added the core-client and triage labels on Oct 24, 2022
hora-anyscale added the QS and P1 labels and removed the triage label on Dec 14, 2022

mattip commented Mar 29, 2023

There are arm64 wheels on PyPI for ray, so pip install ray should work. Could you try them out?
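
If it helps, here is a minimal way to check a native install (a sketch; it assumes a native arm64 Python environment on the Mac rather than the emulated amd64 containers from the original repro):

import platform

import ray

# On Apple silicon with a native arm64 wheel, this should print "arm64".
print(platform.machine())
print(ray.__version__)

# Start a local Ray instance; no ray:// client connection is involved here.
ray.init()
print(ray.cluster_resources())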

anyscalesam (Collaborator) commented:

Closing since no response; @tas17 please reopen if you still run into issues.
