Closed
Labels: question ("Just a question :)")
Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): 16.04
- Ray installed from (source or binary): pip
- Ray version: 0.7.0
- Python version: 3.6.7
- Exact command to reproduce:
 
Describe the problem
I am following the private cluster setup instructions, but only the head node starts. A few observations:
- This seems similar to issue #3408, "[autoscaler] Workers can silently fail to start."
- Adding initialization_commands: [] fixes the KeyError mentioned in issue #4559, "[autoscaler] KeyError when starting private cluster."
Source code / logs
cluster_name: tesq_cluster
min_workers: 48
max_workers: 48
initial_workers: 48
provider:
    type: local
    head_ip: ip1
    worker_ips: [ip2, ip3, ip4]
auth:
    ssh_user: tesq
    ssh_private_key: /home/me/.ssh/keys/local_user
file_mounts: {}
setup_commands: []
initialization_commands: []
head_setup_commands: []
worker_setup_commands: []
head_start_ray_commands:
    - source activate py3_prod && ray stop
    - echo 'I am here' >> /home/tesq/new_file.txt
    - source activate py3_prod && ulimit -c unlimited && ray start --head --redis-port=6379
worker_start_ray_commands:
    - echo 'I am there' >> /home/tesq/new_file.txt
    - source activate py3_prod && ray stop
    - echo 'I am there' >> /home/tesq/new_file.txt
    - source activate py3_prod && ray start --redis-address=ip1:6379
After that, only the head node starts, and the file new_file.txt is created only on the head node.
Example output of ray.global_state.client_table():
{'ClientID': 'a7ce937ffcbece9b25a779fa126ba47edef27267',
  'IsInsertion': True,
  'NodeManagerAddress': 'ip1',
  'NodeManagerPort': 45759,
  'ObjectManagerPort': 34107,
  'ObjectStoreSocketName': '/tmp/ray/session_2019-05-30_15-51-46_16481/sockets/plasma_store',
  'RayletSocketName': '/tmp/ray/session_2019-05-30_15-51-46_16481/sockets/raylet',
  'Resources': {'GPU': 3.0, 'CPU': 24.0}},
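
To make the symptom easier to check, here is a minimal sketch that compares the client table against the IPs listed in the cluster config. The helper name missing_nodes is hypothetical, and it assumes that an entry with IsInsertion set to True corresponds to a node that has joined the cluster. The sample table only mirrors the single entry shown above.

```python
def missing_nodes(client_table, expected_ips):
    """Return the expected IPs that have no live entry in the client table."""
    live = {
        entry["NodeManagerAddress"]
        for entry in client_table
        # Assumption: IsInsertion=True marks a node that has joined the cluster.
        if entry.get("IsInsertion", False)
    }
    return sorted(set(expected_ips) - live)


# Sample data mirroring the output above. On a live cluster, pass the result
# of ray.global_state.client_table() instead.
sample_table = [
    {
        "ClientID": "a7ce937ffcbece9b25a779fa126ba47edef27267",
        "IsInsertion": True,
        "NodeManagerAddress": "ip1",
    },
]

print(missing_nodes(sample_table, ["ip1", "ip2", "ip3", "ip4"]))
```

With the table above, the three worker IPs from the config are reported as missing, which matches the observation that only the head node is running.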
Update:
This seems very similar to issue #3190, but the files monitor.err and monitor.out are empty.
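
For anyone who wants to confirm the empty monitor logs on their own head node, the following is a rough sketch that lists the size of each monitor log file. It assumes the logs sit in a logs directory next to the sockets directory of each session under /tmp/ray (inferred from the socket paths above); the function name monitor_log_sizes is hypothetical.

```python
import glob
import os


def monitor_log_sizes(session_root="/tmp/ray"):
    """Map each monitor.* log file under session_root to its size in bytes."""
    # Assumption: logs are stored at <session_root>/session_*/logs/.
    pattern = os.path.join(session_root, "session_*", "logs", "monitor.*")
    return {path: os.path.getsize(path) for path in sorted(glob.glob(pattern))}


if __name__ == "__main__":
    for path, size in monitor_log_sizes().items():
        print(f"{size:>10d}  {path}")
```

A size of 0 for both monitor.err and monitor.out would match what is reported here.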
gimzmoe