[Bug] test_detached_actor does not handle messages from sockets properly #617

Closed
kevin85421 opened this issue Oct 5, 2022 · 4 comments · Fixed by #619
Labels
bug Something isn't working

Comments

@kevin85421
Member

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ci

What happened + What you expected to happen

```python
s._sock.sendall(b'''
def get_detached_actor():
    return ray.get_actor("a")
a = retry_with_timeout(get_detached_actor)
def get_new_value():
    return ray.get(a.ready.remote())
res2 = retry_with_timeout(get_new_value)
if res1 != res2:
    print('successful: {} {}'.format(res1, res2))
    sys.exit(0)
else:
    print('failed: {} {}'.format(res1, res2))
    raise Exception('failed')
''')
count = 0
while count < 90:
    try:
        buf = s._sock.recv(4096)
        logger.info(buf.decode())
        if buf.decode().find('successful') != -1:
            break
        if buf.decode().find('failed') != -1:
            raise Exception('test failed {}'.format(buf.decode()))
    except Exception as e:
        pass
    time.sleep(1)
    count += 1
if count >= 90:
    raise Exception('failed to run script')
```

(Line numbers below refer to tests/compatibility-test.py.) L351 checks whether the received message contains "successful"; if it does, the loop exits. We expected the "successful" string to be produced by L339 (`print('successful: {} {}'.format(res1, res2))`). However, the received message is "Unable to connect to GCS at 10.244.0.8:6379. Check that (1) Ray GCS with matching version started successfully at the specified address, and (2) there is no firewall setting preventing access", which matches only because it contains "started successfully". As a result, the loop exits before the script has reported its result.

The following log is printed by L350 (`logger.info(buf.decode())`).

INFO:__main__:Unable to connect to GCS at 10.244.0.8:6379. Check that (1) Ray GCS with matching version started successfully at the specified address, and (2) there is no firewall setting preventing access.
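One possible way to make the check stricter, sketched under assumptions: `VerdictReader` and `wait_for_verdict` below are hypothetical helpers, not part of KubeRay. They treat only complete lines that start with `successful: ` or `failed: ` (the prefixes the test script prints) as a verdict, and they buffer partial data because a single `recv(4096)` may return part of a line or several lines at once. The receive function is passed in as a parameter so the logic can be exercised without a container.

```python
import re
import time
from typing import Callable, Optional

# Hypothetical verdict format, matching the test script's
# print('successful: ...') / print('failed: ...') calls. Anchoring the
# pattern at the start of a line means text such as Ray's
# "... started successfully ..." GCS error no longer counts as a verdict.
VERDICT_RE = re.compile(r"^(successful|failed): ")


class VerdictReader:
    """Buffer raw socket chunks and look for a verdict in complete lines.

    A single recv(4096) may return part of a line or several lines, so the
    trailing partial line is kept until its newline arrives.
    """

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, chunk: bytes) -> Optional[str]:
        """Return "successful" or "failed" once a verdict line arrives, else None."""
        self._pending += chunk
        # Everything before the last b"\n" is complete; the rest stays buffered.
        *lines, self._pending = self._pending.split(b"\n")
        for raw in lines:
            # strip() also drops a "\r" that a TTY may add before "\n".
            line = raw.decode(errors="replace").strip()
            match = VERDICT_RE.match(line)
            if match:
                return match.group(1)
        return None


def wait_for_verdict(recv: Callable[[], bytes], attempts: int = 90,
                     delay: float = 1.0) -> str:
    """Poll recv() until a verdict line appears or the attempts run out."""
    reader = VerdictReader()
    for _ in range(attempts):
        try:
            chunk = recv()
        except OSError:  # e.g. socket.timeout while the script is still running
            chunk = b""
        verdict = reader.feed(chunk)
        if verdict is not None:
            return verdict
        time.sleep(delay)
    raise TimeoutError("no verdict line received")
```

In the test, the call could then look like `verdict = wait_for_verdict(lambda: s._sock.recv(4096))`, followed by raising if `verdict == "failed"`. Keeping that `raise` outside any broad `try`/`except Exception: pass` also matters: in the current loop, the `raise` on "failed" is caught by that same `except` clause and silently ignored.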

Reproduction script

python3 tests/compatibility-test.py RayFTTestCase.test_detached_actor 2>&1 | tee log

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@kevin85421 kevin85421 added the bug Something isn't working label Oct 5, 2022
@kevin85421 kevin85421 self-assigned this Oct 5, 2022
@kevin85421
Member Author

cc @DmitriGekhtman

@DmitriGekhtman
Collaborator

cc @wilsonwang371

@wilsonwang371
Collaborator

Good catch! I can make some changes, or are you planning to make a patch? @kevin85421

@kevin85421
Member Author

@wilsonwang371 I plan to make a patch. I hope to learn more about the KubeRay E2E tests by fixing this issue. Thank you!

3 participants