Closed
Labels
bug: Something that is supposed to be working; but isn't
Description
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux 16
 - Ray installed from (source or binary): Source
 - Ray version: master
 - Python version: 3.6.6
 - Exact command to reproduce:
 
ray.global_state.available_resources() on a small cluster
Describe the problem
ray.global_state.available_resources() hangs after a node is removed from the cluster (see In [8] at the end of the log below).
Source code / logs
...
  'NodeManagerAddress': '169.229.49.172',
  'NodeManagerPort': 38033,
  'ObjectManagerPort': 42375,
  'ObjectStoreSocketName': '/tmp/plasma_store65771381',
  'RayletSocketName': '/tmp/raylet8912751',
  'Resources': {'GPU': 0.0, 'CPU': 2.0}},
 {'ClientID': 'ed085bb78046ccbc423fb6587ab1fcbf838514da',
  'IsInsertion': True,
  'NodeManagerAddress': '169.229.49.173',
  'NodeManagerPort': 32821,
  'ObjectManagerPort': 34913,
  'ObjectStoreSocketName': '/tmp/plasma_store60872177',
  'RayletSocketName': '/tmp/raylet24528210',
  'Resources': {'GPU': 0.0, 'CPU': 2.0}}]
In [7]: The node with client ID ed085bb78046ccbc423fb6587ab1fcbf838514da has been marked dead because the monitor has missed too many heartbeats from it.
In [7]: ray.global_state.client_table()
Out[7]:
[{'ClientID': 'c9f0b409ca553018762fadba26de1dffa4338478',
  'IsInsertion': True,
  'NodeManagerAddress': '169.229.49.172',
  'NodeManagerPort': 38033,
  'ObjectManagerPort': 42375,
  'ObjectStoreSocketName': '/tmp/plasma_store65771381',
  'RayletSocketName': '/tmp/raylet8912751',
  'Resources': {'GPU': 0.0, 'CPU': 2.0}},
 {'ClientID': 'ed085bb78046ccbc423fb6587ab1fcbf838514da',
  'IsInsertion': True,
  'NodeManagerAddress': '169.229.49.173',
  'NodeManagerPort': 32821,
  'ObjectManagerPort': 34913,
  'ObjectStoreSocketName': '/tmp/plasma_store60872177',
  'RayletSocketName': '/tmp/raylet24528210',
  'Resources': {'GPU': 0.0, 'CPU': 2.0}},
 {'ClientID': 'ed085bb78046ccbc423fb6587ab1fcbf838514da',
  'IsInsertion': False,
  'NodeManagerAddress': '',
  'NodeManagerPort': 0,
  'ObjectManagerPort': 0,
  'ObjectStoreSocketName': '',
  'RayletSocketName': '',
  'Resources': {}}]
In [8]: ray.global_state.available_resources()
...
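In the client table output above, the removed node (client ID ed085bb7...) appears twice: once with 'IsInsertion': True and once with 'IsInsertion': False. The following is only a minimal sketch of how a caller might reconcile such entries to find the nodes that are still alive. It is not Ray's implementation and does not explain the hang; live_client_ids is a hypothetical helper, and the sample entries are trimmed copies of the log above.

```python
def live_client_ids(client_table):
    """Return the client IDs whose most recent entry is an insertion.

    Hypothetical helper, not part of Ray. Assumes entries are ordered
    oldest to newest, as they appear in the log above.
    """
    latest = {}
    for entry in client_table:
        # A later entry for the same ClientID overrides an earlier one.
        latest[entry["ClientID"]] = entry["IsInsertion"]
    return [cid for cid, alive in latest.items() if alive]


# Trimmed copies of the entries from Out[7] above.
client_table = [
    {"ClientID": "c9f0b409ca553018762fadba26de1dffa4338478",
     "IsInsertion": True,
     "Resources": {"GPU": 0.0, "CPU": 2.0}},
    {"ClientID": "ed085bb78046ccbc423fb6587ab1fcbf838514da",
     "IsInsertion": True,
     "Resources": {"GPU": 0.0, "CPU": 2.0}},
    {"ClientID": "ed085bb78046ccbc423fb6587ab1fcbf838514da",
     "IsInsertion": False,
     "Resources": {}},
]

print(live_client_ids(client_table))
# prints ['c9f0b409ca553018762fadba26de1dffa4338478']
```

Filtering this way leaves only the node at 169.229.49.172, which matches the monitor message that the other node was marked dead.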