Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reconnect disconnected channels automatically #11154

Merged
merged 1 commit into from May 22, 2017

Conversation

@kpayson64
Copy link
Contributor

commented May 16, 2017

Submission pending on #10813

The memory leak described in that poll request is likely the cause of #9461,
and without a fix this will introduce that leak to all channels.

@nathanielmanistaatgoogle
Copy link
Member

left a comment

Just to double-check, because I think it might be slightly amazing: you've confirmed that the test fails in the absence of the non-test portion of this change?

src/python/grpcio/grpc/_channel.py Outdated
@@ -926,6 +926,11 @@ def __init__(self, target, options, credentials):
self._call_state = _ChannelCallState(self._channel)
self._connectivity_state = _ChannelConnectivityState(self._channel)

# Temporary work around UNAVAILABLE issues

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

This should be a TODO (possibly a TODO(https://github.com/grpc/grpc/issues/11043)?).

This comment has been minimized.

Copy link
@kpayson64

kpayson64 May 16, 2017

Author Contributor

I think we can close out #11043 after this change.
#9884 is the "Make a proper fix in c-core"

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

I don't think this change is sufficient to close #11043 because #11043 is a wrapped-languages issue and this is a Python-only change.

#9884 might be the right issue for scoping the temporary-ness of this workaround; I think I'm a little thrown by the way it's being called "retries" and being described as a feature when I keep encountering it as "not trying hard enough the first time" and thinking of it as a bug.

src/python/grpcio/grpc/_channel.py Outdated
@@ -969,4 +974,5 @@ def stream_stream(self,
_common.encode(method), request_serializer, response_deserializer)

def __del__(self):
_unsubscribe(self._connectivity_state, self._dummy_callback)

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

How certain are you that this is necessary? What does this do that the _moot call on the next line doesn't also do? If we can eliminate this call, we can also eliminate the maintenance of the dummy callback as an instance field, right?

This comment has been minimized.

Copy link
@kpayson64

kpayson64 May 16, 2017

Author Contributor

Removed, thanks.

src/python/grpcio_tests/tests/unit/_reconnect_test.py Outdated
from tests.unit.framework.common import test_constants

_REQUEST = b'\x00\x00\x00'
_RESPONSE = b'\x00\x00\x00'

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

Just for test sanity's sake let's make the value of this constant different from the value of _REQUEST.

This comment has been minimized.

Copy link
@kpayson64

kpayson64 May 16, 2017

Author Contributor

Done

src/python/grpcio_tests/tests/unit/_reconnect_test.py Outdated

def testReconnect(self):
multi_callable = self._channel.unary_unary(_UNARY_UNARY)
self.assertEquals(_RESPONSE, multi_callable(_REQUEST))

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

Always leave the unnecessary ess off the end of assertEqual.

This comment has been minimized.

Copy link
@kpayson64

kpayson64 May 16, 2017

Author Contributor

Done

src/python/grpcio_tests/tests/unit/_reconnect_test.py Outdated
def testReconnect(self):
multi_callable = self._channel.unary_unary(_UNARY_UNARY)
self.assertEquals(_RESPONSE, multi_callable(_REQUEST))
self._server.stop(None)

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

"Rebalance" the statements in this test so that nothing in the test method "undoes" what was done in setUp and also so that nothing in the test method "does" what will later be undone in tearDown. (I think this will mean that what will be left in your test will be only instantiating the thread pool, at which point you may choose to simply not bother with having a separate setUp/tearDown.)

A unit test with a setUp/tearDown and also only a single test method is particularly susceptible to premature generalization.

This comment has been minimized.

Copy link
@kpayson64

kpayson64 May 16, 2017

Author Contributor

Removed setUp/tearDown

src/python/grpcio_tests/tests/unit/_reconnect_test.py Outdated
def tearDown(self):
self._server.stop(None)

def testReconnect(self):

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

We haven't been good about this in the past but it's on my list of things to clean up (and in the meantime, to avoid digging deeper): use the PEP8-style name test_reconnect.

This comment has been minimized.

Copy link
@kpayson64

kpayson64 May 16, 2017

Author Contributor

Done

src/python/grpcio_tests/tests/unit/_reconnect_test.py Outdated

_UNARY_UNARY = '/test/UnaryUnary'

def handle_unary_unary(unused_request, unused_servicer_context):

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

Missing leading underscore on this name.

This comment has been minimized.

Copy link
@kpayson64

kpayson64 May 16, 2017

Author Contributor

Done

@kpayson64

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2017

When the non-test changes are removed, the test fails:

  File "src/python/grpcio_tests/tests/unit/_reconnect_test.py", line 70, in testReconnect
    self.assertEquals(_RESPONSE, multi_callable(_REQUEST))
  File "/usr/local/google/home/kpayson/grpc2/py27/local/lib/python2.7/site-packages/grpc/_channel.py", line 507, in __call__
    return _end_unary_response_blocking(state, call, False, deadline)
  File "/usr/local/google/home/kpayson/grpc2/py27/local/lib/python2.7/site-packages/grpc/_channel.py", line 455, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
_Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Endpoint read failed)>

@kpayson64 kpayson64 force-pushed the kpayson64:hack_for_disconnects branch May 16, 2017

@kpayson64

This comment has been minimized.

Copy link
Contributor Author

commented May 16, 2017

@nathanielmanistaatgoogle Addressed comments, back to you

# TODO(https://github.com/grpc/grpc/issues/9884)
# Temporary work around UNAVAILABLE issues
# Remove this once c-core has retry support
_subscribe(self._connectivity_state, lambda *args: None, None)

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

What's the significance of returning None, None rather than just None?

This comment has been minimized.

Copy link
@kpayson64

kpayson64 May 19, 2017

Author Contributor

The second none is the try_to_connect arg, not part of the lambda.

@@ -926,6 +926,11 @@ def __init__(self, target, options, credentials):
self._call_state = _ChannelCallState(self._channel)
self._connectivity_state = _ChannelConnectivityState(self._channel)

# TODO(https://github.com/grpc/grpc/issues/9884)

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

Typography nitpick: missing colon after closing parenthesis and start the prose of the comment on the same line rather than line-wrapping after the TODO(...):.

Huh. I guess the colon isn't required. Aww phooey; I've always liked it.

src/python/grpcio/grpc/_channel.py Outdated
@@ -926,6 +926,11 @@ def __init__(self, target, options, credentials):
self._call_state = _ChannelCallState(self._channel)
self._connectivity_state = _ChannelConnectivityState(self._channel)

# Temporary work around UNAVAILABLE issues

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

I don't think this change is sufficient to close #11043 because #11043 is a wrapped-languages issue and this is a Python-only change.

#9884 might be the right issue for scoping the temporary-ness of this workaround; I think I'm a little thrown by the way it's being called "retries" and being described as a feature when I keep encountering it as "not trying hard enough the first time" and thinking of it as a bug.

@nathanielmanistaatgoogle
Copy link
Member

left a comment

I'm a little skittish about spawning a thread per channel, but... okay.

src/python/grpcio_tests/tests/unit/_reconnect_test.py Outdated
channel = grpc.insecure_channel(
'localhost:%d' % port)
multi_callable = channel.unary_unary(_UNARY_UNARY)
self.assertEquals(_RESPONSE, multi_callable(_REQUEST))

This comment has been minimized.

Copy link
@nathanielmanistaatgoogle

nathanielmanistaatgoogle May 16, 2017

Member

assertEqual without the trailing ess here too.

This comment has been minimized.

Copy link
@kpayson64

kpayson64 May 19, 2017

Author Contributor

Done

@kpayson64 kpayson64 changed the base branch from master to v1.3.x May 19, 2017

@kpayson64 kpayson64 force-pushed the kpayson64:hack_for_disconnects branch 2 times, most recently May 19, 2017

@kpayson64 kpayson64 force-pushed the kpayson64:hack_for_disconnects branch to 8f7bc54 May 22, 2017

@kpayson64 kpayson64 merged commit 866190f into grpc:v1.3.x May 22, 2017

13 of 16 checks passed

Bazel Full Build & Tests Build finished.
Details
UBsan C Build finished.
Details
gRPC_interop_pull_requests Build finished.
Details
Asan C Build finished.
Details
Asan C++ Build finished.
Details
Basic Tests Linux Build finished.
Details
Basic Tests Mac Build finished.
Details
Basic Tests Windows Build finished.
Details
Bazel Basic Build Build finished.
Details
Msan C Build finished.
Details
Portability Tests Linux Build finished.
Details
Portability Tests Windows Build finished.
Details
Tsan C Build finished.
Details
Tsan C++ Build finished.
Details
cla/google All necessary CLAs are signed
gRPC_performance_pull_requests Build finished.
Details

@lock lock bot locked as resolved and limited conversation to collaborators Jan 23, 2019

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
3 participants
You can’t perform that action at this time.