
Come up with a reconnect/failover mechanism #9

Closed
michaelosthege opened this issue Aug 5, 2022 · 0 comments
Labels
question Further information is requested

michaelosthege (Owner) commented Aug 5, 2022

This is the relevant traceback when the server disconnects while streaming:

  File "...\aesara_federated\common.py", line 131, in evaluate
    logp, *gradients = self._client.evaluate(*inputs, use_stream=use_stream)
  File "...\aesara_federated\service.py", line 203, in evaluate
    output = loop.run_until_complete(eval_task)
  File "...\aefenv\lib\asyncio\base_events.py", line 646, in run_until_complete
    return future.result()
  File "...\aesara_federated\service.py", line 219, in _streamed_evaluate
    response = await self._lazy_stream.recv_message()
  File "...\aefenv\lib\site-packages\grpclib\client.py", line 427, in recv_message
    with self._wrapper:
  File "...\aefenv\lib\site-packages\grpclib\utils.py", line 70, in __exit__
    raise self._error
grpclib.exceptions.StreamTerminatedError: Connection lost

And this is the error when each request is sent as an independent (unary) message, i.e. with use_stream=False:

  File "...\aesara_federated\common.py", line 131, in evaluate
    logp, *gradients = self._client.evaluate(*inputs, use_stream=use_stream)
  File "...\aesara_federated\service.py", line 203, in evaluate
    output = loop.run_until_complete(eval_task)
  File "...\aefenv\lib\asyncio\base_events.py", line 646, in run_until_complete
    return future.result()
  File "...\aesara_federated\rpc.py", line 54, in evaluate
    return await self._unary_unary(
  File "...\aefenv\lib\site-packages\betterproto\grpc\grpclib_client.py", line 85, in _unary_unary
    response = await stream.recv_message()
  File "...\aefenv\lib\site-packages\grpclib\client.py", line 425, in recv_message
    await self.recv_initial_metadata()
  File "...\aefenv\lib\site-packages\grpclib\client.py", line 367, in recv_initial_metadata
    with self._wrapper:
  File "...\aefenv\lib\site-packages\grpclib\utils.py", line 70, in __exit__
    raise self._error
grpclib.exceptions.StreamTerminatedError: Connection lost
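
On the unary path (`use_stream=False`), every call is a separate request. A reconnect can therefore be as simple as retrying the whole call. The sketch below does this with exponential backoff and jitter. It assumes repeating an evaluation is harmless, and it treats `StreamTerminatedError` and the built-in `ConnectionError` family as transient (`ConnectionRefusedError` and `ConnectionResetError` are subclasses of the latter). This is a judgement call, not a grpclib rule. `call_with_retry` and its parameters are hypothetical.

```python
import asyncio
import random
from typing import Awaitable, Callable, Tuple, Type, TypeVar

try:
    from grpclib.exceptions import StreamTerminatedError
except ImportError:  # stand-in so the sketch runs without grpclib installed
    class StreamTerminatedError(Exception):  # type: ignore[no-redef]
        pass

T = TypeVar("T")

# Errors treated as transient in this sketch (an assumption): a dropped stream,
# or a refused/reset TCP connection while the server restarts.
RETRYABLE: Tuple[Type[BaseException], ...] = (StreamTerminatedError, ConnectionError)


async def call_with_retry(
    make_call: Callable[[], Awaitable[T]],
    attempts: int = 4,
    base_delay: float = 0.5,
) -> T:
    """Await a fresh call from `make_call`, retrying transient failures with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            # A coroutine can only be awaited once, so build a new one per attempt.
            return await make_call()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Exponential backoff (base, 2*base, 4*base, ...) scaled by jitter in [0.5, 1).
            delay = base_delay * 2**attempt * (0.5 + random.random() / 2)
            await asyncio.sleep(delay)
    raise AssertionError("unreachable")
```

Usage would look like `await call_with_retry(lambda: stub.evaluate(request))`, where `stub` and `request` are placeholders for the client object and the request message. The function takes a factory rather than an awaitable because a coroutine object can be awaited only once.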

⚠ Note that with the demo example, use_stream=True takes 40 seconds for the parallelized MCMC sampling while use_stream=False takes 51 seconds.
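
The title also asks for failover. The sketch below assumes several servers host the same model, so any of them can answer a given request. It keeps a list of `(host, port)` endpoints and rotates to the next one whenever a call fails with a connection error. `FailoverCaller`, `make_call` and the example endpoints are hypothetical. In a real client, `make_call` might create or reuse a `grpclib.client.Channel(host, port)` for the endpoint it receives and issue the request on that channel.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Sequence, Tuple, TypeVar

try:
    from grpclib.exceptions import StreamTerminatedError
except ImportError:  # stand-in so the sketch runs without grpclib installed
    class StreamTerminatedError(Exception):  # type: ignore[no-redef]
        pass

T = TypeVar("T")
Endpoint = Tuple[str, int]  # (host, port)


class FailoverCaller:
    """Hypothetical helper: stick to one server, move to the next when it drops."""

    def __init__(self, endpoints: Sequence[Endpoint]) -> None:
        if not endpoints:
            raise ValueError("need at least one endpoint")
        self._endpoints: List[Endpoint] = list(endpoints)
        self._index = 0

    @property
    def current(self) -> Endpoint:
        return self._endpoints[self._index]

    async def call(self, make_call: Callable[[Endpoint], Awaitable[T]]) -> T:
        last_error: Optional[BaseException] = None
        # Try each endpoint at most once per call, starting with the current one.
        for _ in range(len(self._endpoints)):
            try:
                return await make_call(self.current)
            except (StreamTerminatedError, ConnectionError) as err:
                last_error = err
                self._index = (self._index + 1) % len(self._endpoints)
        raise ConnectionError("all endpoints failed") from last_error


if __name__ == "__main__":
    async def fake_evaluate(endpoint: Endpoint) -> str:
        host, port = endpoint
        if host == "node-a":  # pretend this server just went away
            raise StreamTerminatedError("Connection lost")
        return f"logp from {host}:{port}"

    caller = FailoverCaller([("node-a", 50051), ("node-b", 50051)])
    print(asyncio.run(caller.call(fake_evaluate)))  # logp from node-b:50051
```

The caller stays on the last endpoint that answered, so later calls go straight to `node-b` instead of probing the dead server first.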

michaelosthege added the "question" (Further information is requested) label Aug 5, 2022
michaelosthege changed the title "Diagnose what happens to the gRPC client when the server disconnects" to "Come up with a reconnect/failover mechanism" Aug 5, 2022
michaelosthege added a commit that referenced this issue Sep 11, 2022
michaelosthege added a commit that referenced this issue Sep 11, 2022