Describe the bug
On Ray master, if an actor created with max_restarts=-1 is restarting, calling one of its methods raises an exception instead of pending in the caller, which breaks Ray worker failover.
To Reproduce
To help us reproduce this bug, please provide the information below:
Your Python version: 3.7.9
The version of Mars you use: master
Versions of crucial packages, such as ray, numpy, scipy and pandas: ray master
Full stack of the error.
Minimized code to reproduce the error.
import time

import ray

ray.init()


@ray.remote(max_restarts=-1)
class A:
    def __init__(self):
        # Exit immediately if this instance is a restart, keeping the actor
        # in the restarting state.
        if ray.get_runtime_context().was_current_actor_reconstructed:
            import os
            os._exit(-1)
        print(ray.get_runtime_context().was_current_actor_reconstructed)
        time.sleep(3)

    def f(self):
        return 1

    def f1(self):
        time.sleep(30)


a = A.remote()
print(ray.get(a.f.remote()))
r = a.f1.remote()
# Kill the actor but allow it to be restarted.
ray.kill(a, no_restart=False)
try:
    ray.get(r)
except Exception:
    pass
# Expected to pend while the actor restarts; instead this raises.
print(ray.get(a.f.remote()))
Expected behavior
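Calling an actor method while the actor is restarting should pend in the caller until the actor is available again. This can be fixed by specifying max_retries=-1 when calling actor methods.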
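To illustrate what a retry limit of -1 means for the caller, here is a plain-Python sketch of the semantics under stated assumptions. It is not Ray's implementation: FlakyActor, ActorRestartingError and call_with_retries are hypothetical names, and the real Ray retry option and how it is passed (for example the actor option max_task_retries) may differ between Ray versions.

```python
import time


class ActorRestartingError(Exception):
    """Hypothetical error raised while the simulated actor is restarting."""


class FlakyActor:
    """Hypothetical stand-in for an actor that fails the first
    `restart_calls` calls while restarting, then serves normally."""

    def __init__(self, restart_calls):
        self.restart_calls = restart_calls  # calls that fail before restart completes

    def f(self):
        if self.restart_calls > 0:
            self.restart_calls -= 1
            raise ActorRestartingError("actor is restarting")
        return 1


def call_with_retries(method, max_retries, delay=0.0):
    """Call `method`, retrying on ActorRestartingError.

    max_retries=-1 retries without limit, so the caller effectively waits
    until the actor is back; max_retries=0 surfaces the first error.
    """
    attempt = 0
    while True:
        try:
            return method()
        except ActorRestartingError:
            if max_retries != -1 and attempt >= max_retries:
                raise
            attempt += 1
            time.sleep(delay)


# Without retries the caller sees the error right away (the reported bug).
try:
    call_with_retries(FlakyActor(restart_calls=3).f, max_retries=0)
except ActorRestartingError:
    print("raised")  # prints "raised"

# With unlimited retries the call pends until the actor is usable again.
print(call_with_retries(FlakyActor(restart_calls=3).f, max_retries=-1))  # prints 1
```

The design choice is that -1 is a sentinel for "no limit", matching how max_restarts=-1 is used for the actor in the report; any non-negative value caps the number of extra attempts.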
Additional context