zhangkun83
Contributor

If all RR servers are unhealthy, at least one connection may be
CONNECTING at any given moment, which keeps RR in CONNECTING
indefinitely. It's better to stay in TRANSIENT_FAILURE in that
case so that fail-fast RPCs can fail fast.

The same changes have been made for RoundRobinLoadBalancer in #6657
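
To illustrate the intent of the description above, here is a minimal sketch of a "sticky" TRANSIENT_FAILURE rule. It is not the grpclb code from this PR: the class `StickyStateSketch`, its nested `State` enum (a stand-in for `io.grpc.ConnectivityState`), and the `update` method are all hypothetical names, and applying the rule to the aggregate state rather than per subchannel is an assumption made for brevity.

```java
// Hypothetical sketch; names and structure are assumptions, not the grpclb implementation.
final class StickyStateSketch {
  // Stand-in for io.grpc.ConnectivityState so the sketch compiles without gRPC.
  enum State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE }

  private State current = State.IDLE;

  // Recomputes the overall state from the subchannel states, but refuses to
  // leave TRANSIENT_FAILURE for CONNECTING. Only READY (or a fresh
  // TRANSIENT_FAILURE) replaces a reported TRANSIENT_FAILURE.
  State update(State... subchannelStates) {
    boolean anyReady = false;
    boolean anyConnecting = false;
    for (State s : subchannelStates) {
      if (s == State.READY) {
        anyReady = true;
      } else if (s == State.CONNECTING || s == State.IDLE) {
        anyConnecting = true;
      }
    }
    State aggregate;
    if (anyReady) {
      aggregate = State.READY;
    } else if (anyConnecting) {
      aggregate = State.CONNECTING;
    } else {
      aggregate = State.TRANSIENT_FAILURE;
    }
    // Sticky rule: a reconnect attempt alone does not clear the failure.
    if (current == State.TRANSIENT_FAILURE && aggregate == State.CONNECTING) {
      return current;
    }
    current = aggregate;
    return current;
  }
}
```

With this rule, a balancer whose backends keep cycling between TRANSIENT_FAILURE and CONNECTING continues to report TRANSIENT_FAILURE, so fail-fast RPCs are rejected immediately instead of being buffered.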

…READY

@zhangkun83 zhangkun83 requested a review from ejona86 January 15, 2021 21:59
@zhangkun83 zhangkun83 merged commit 23d2796 into grpc:master Jan 15, 2021
@zhangkun83 zhangkun83 deleted the grpclb-staytransientfailure branch January 15, 2021 23:19
// Switch subchannel1 to TRANSIENT_FAILURE, making the general state TRANSIENT_FAILURE too.
Status error = Status.UNAVAILABLE.withDescription("error1");
deliverSubchannelState(subchannel1, ConnectivityStateInfo.forTransientFailure(error));
inOrder.verify(helper).updateBalancingState(eq(TRANSIENT_FAILURE), pickerCaptor.capture());
Contributor

@voidzcy voidzcy Mar 11, 2021

This test is weird. subchannel2 is still connecting, so the overall state should be CONNECTING. This is mostly because subchannels with CONNECTING state are ignored when aggregating the overall state.

So, observing TRANSIENT_FAILURE isn't related to your change. And I believe that should be fixed. However, for this test case specifically, you should bring all subchannels into TRANSIENT_FAILURE and then perform the test for behaviors related to your change.

Member

It looks like we should stop ignoring CONNECTING with this change (and use the same logic as RR and elsewhere)
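
The difference the two review comments point at can be sketched as two aggregation rules. This is only one reading of the discussion, not code from grpclb or RoundRobinLoadBalancer: `AggregationSketch`, its `State` enum, and both method names are hypothetical.

```java
// Hypothetical sketch contrasting two ways to aggregate subchannel states.
final class AggregationSketch {
  // Stand-in for io.grpc.ConnectivityState so the sketch compiles without gRPC.
  enum State { IDLE, CONNECTING, READY, TRANSIENT_FAILURE }

  // Skips CONNECTING subchannels: one failed subchannel is enough to
  // report TRANSIENT_FAILURE even while another is still connecting.
  static State aggregateIgnoringConnecting(State... states) {
    boolean sawFailure = false;
    for (State s : states) {
      if (s == State.READY) {
        return State.READY;
      }
      if (s == State.TRANSIENT_FAILURE) {
        sawFailure = true;
      }
    }
    return sawFailure ? State.TRANSIENT_FAILURE : State.CONNECTING;
  }

  // Counts CONNECTING (and IDLE) subchannels: TRANSIENT_FAILURE is reported
  // only when no subchannel is READY and none is still trying.
  static State aggregateCountingConnecting(State... states) {
    boolean sawConnecting = false;
    for (State s : states) {
      if (s == State.READY) {
        return State.READY;
      }
      if (s == State.CONNECTING || s == State.IDLE) {
        sawConnecting = true;
      }
    }
    return sawConnecting ? State.CONNECTING : State.TRANSIENT_FAILURE;
  }
}
```

For the situation in the test (subchannel1 in TRANSIENT_FAILURE, subchannel2 still CONNECTING), the first rule yields TRANSIENT_FAILURE and the second yields CONNECTING, which is why the reviewer suggests driving every subchannel into TRANSIENT_FAILURE before exercising the new behavior.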

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 10, 2021