Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clientconn: go idle if conn closed after preface received #5714

Merged
merged 1 commit into from Oct 18, 2022

Conversation

fuweid
Copy link
Contributor

@fuweid fuweid commented Oct 12, 2022

Fixes #5688

The select branch will be selected randomly, if there are several ready
branches. If the preface has been received and then connection is
closed, the createTransport might hit the connClosed.Done() branch.
Ideally, the subConn should go idle and reconnect.

RELEASE NOTES:

  • client: fix race that could lead to an incorrect connection state if it was closed immediately after the server's HTTP/2 preface was received

@fuweid
Copy link
Contributor Author

fuweid commented Oct 13, 2022

ping @dfawley ~

The select branch will be selected randomly, if there are several ready
branches. If the preface has been received and then connection is
closed, the `createTransport` might hit the `connClosed.Done()` branch.
Ideally, the subConn should go idle and reconnect.

Fixes: grpc#5688

Signed-off-by: Wei Fu <fuweid89@gmail.com>
@fuweid
Copy link
Contributor Author

fuweid commented Oct 17, 2022

Hi @zasweq would you mind to take a look on this? Thanks

We are hitting this issue if my understanding is correct ~

@dfawley dfawley closed this Oct 17, 2022
@dfawley dfawley reopened this Oct 17, 2022
@dfawley
Copy link
Contributor

dfawley commented Oct 17, 2022

Sorry, I was out of town for a few days; I can review this.

Close+reopen to make GA run the recently-added Go 1.19 tests.

@dfawley dfawley assigned dfawley and unassigned zasweq Oct 17, 2022
@dfawley dfawley requested review from dfawley and removed request for zasweq October 17, 2022 23:02
@dfawley
Copy link
Contributor

dfawley commented Oct 17, 2022

I believe in the event this case of the select occurs, onClose should be called, which will set our state to IDLE. Are you not seeing that?

(By the same logic, I think setting the state to IDLE in the case where the preface was received but connClose.HasFired() is true should also be unnecessary.)

@dfawley
Copy link
Contributor

dfawley commented Oct 17, 2022

Ideally, the subConn should go idle and reconnect.

Maybe there's a misunderstanding as to the functionality here? gRPC will not reconnect automatically unless there is a new RPC or ClientConn.Connect() is called.

@fuweid
Copy link
Contributor Author

fuweid commented Oct 17, 2022

Hi @dfawley thanks for the review.

In the onClose function, if the hcStarted == false, it will not set it IDLE. I am trying to add some sleep time to reproduce it in that issue #5688. Hope it can help~

if !hcStarted || hctx.Err() != nil {
			// We didn't start the health check or set the state to READY, so
			// no need to do anything else here.
			//
			// OR, we have already cancelled the health check context, meaning
			// we have already called onClose once for this transport.  In this
			// case it would be dangerous to clear the transport and update the
			// state, since there may be a new transport in this addrConn.
			return
		}

gRPC will not reconnect automatically unless there is a new RPC or ClientConn.Connect() is called.

For the unary call, the caller's goroutine is just waiting to get subConn. The gRPC client's background loop-goroutine will try to connect it if the subConn is idle.

sc.Connect()

@fuweid
Copy link
Contributor Author

fuweid commented Oct 17, 2022

It seems that the CI issue https://github.com/grpc/grpc-go/actions/runs/3269279382/jobs/5376604126 is not related to this change.

@dfawley
Copy link
Contributor

dfawley commented Oct 17, 2022

Flaky test is #5631

@fuweid fuweid closed this Oct 17, 2022
@fuweid fuweid reopened this Oct 17, 2022
Copy link
Contributor

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I see. I think this is fine, then. What we really need to do is make the preface receipt part of creating the transport, synchronously, to avoid all this extra complexity. But this fixes the small race that we have, so LGTM.

@dfawley dfawley assigned easwars and unassigned fuweid Oct 17, 2022
@dfawley dfawley requested a review from easwars October 17, 2022 23:58
@fuweid
Copy link
Contributor Author

fuweid commented Oct 18, 2022

reopen to rerun CI.

@easwars easwars merged commit 79ccdd8 into grpc:master Oct 18, 2022
37 of 38 checks passed
@easwars
Copy link
Contributor

easwars commented Oct 18, 2022

Thanks for your contribution @fuweid !!

@fuweid fuweid deleted the fix-5688 branch October 18, 2022 23:59
@fuweid
Copy link
Contributor Author

fuweid commented Oct 18, 2022

Thanks for your contribution @fuweid !!

Thanks for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

the subConn state is stuck in CONNECTING if connection is lose after preface is received
4 participants