Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/tls: handshake race between read and write goroutines in go1.7 #17101

Closed
zellyn opened this issue Sep 14, 2016 · 5 comments
Closed

crypto/tls: handshake race between read and write goroutines in go1.7 #17101

zellyn opened this issue Sep 14, 2016 · 5 comments
Assignees
Milestone

Comments

@zellyn
Copy link

@zellyn zellyn commented Sep 14, 2016

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go1.7

What operating system and processor architecture are you using (go env)?

linux

What did you do?

We have a simple Stubby-alike RPC protocol ("Sake"), which spins up input and output goroutines. We don't currently force a handshake, so input and output race to handshake. (I'll be fixing that after I file this bug.)

  1. "Read" and "Write" goroutines race to handshake, racing for in at the point where the handshake routine tries to give up handshakeMutex and grab in.
  2. "Read" gets in and completes handshake.
  3. "Read" gets in, and blocks forever, waiting for input. Because this is a simple client-server protocol, with client-initiated communication, after the connection is established and the handshake is finished, nothing will ever be read except in response to a client-initiated request.
  4. "Write" gets stuck forever trying to take in inside of Handshake().

The comment here https://github.com/golang/go/blob/5a589904a3/src/crypto/tls/conn.go#L1204 exactly describes the problem.

It was introduced in this commit: af125a5#diff-ef0187d9cfe69a02cab179f844c8e712R1167

What did you expect to see?

Successful connections and communications.

What did you see instead?

The output thread stuck indefinitely until a redeploy of the other end tears down all the connections.

Notes

In our staging environment, I have a test client running on two hosts, each making 300 connections to a test server. It took me six or seven restarts to see this problem happen on one connection, so let's guess one attempt in two thousand. Ish.

An apology: it should be possible to reproduce in a test case, but I haven't the time to write it right now: set up a simple server, using TLS, and repeatedly try to connect pairs of input/output goroutines. A watchdog goroutine checks for hung connections.

@agl

@bradfitz bradfitz added this to the Go1.7.2 milestone Sep 14, 2016
@bradfitz

This comment has been minimized.

Copy link
Contributor

@bradfitz bradfitz commented Sep 14, 2016

Tagging as Go 1.7.2 since I believe this is a regression from Go 1.6. But this might be Go 1.8 material instead. I'll let @agl decide.

@bradfitz bradfitz changed the title TLS handshake race between read and write goroutines in go1.7 crypto/tls: handshake race between read and write goroutines in go1.7 Sep 14, 2016
@zellyn

This comment has been minimized.

Copy link
Author

@zellyn zellyn commented Sep 14, 2016

It's definitely a regression since go1.5.3, but unfortunately, I haven't tested on go1.6.

@agl

This comment has been minimized.

Copy link
Contributor

@agl agl commented Sep 14, 2016

And I though that I might have gotten away with caving on support renegotiation without any nasty bugs :(

Will look at this today.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Sep 14, 2016

CL https://golang.org/cl/29164 mentions this issue.

@gopherbot gopherbot closed this in 254169d Sep 22, 2016
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Oct 17, 2016

CL https://golang.org/cl/31268 mentions this issue.

gopherbot pushed a commit that referenced this issue Oct 17, 2016
…te handshake.

After renegotiation support was added (af125a5) it's possible for a
Write to block on a Read when racing to complete the handshake:
   1. The Write determines that a handshake is needed and tries to
      take the neccesary locks in the correct order.
   2. The Read also determines that a handshake is needed and wins
      the race to take the locks.
   3. The Read goroutine completes the handshake and wins a race
      to unlock and relock c.in, which it'll hold when waiting for
      more network data.

If the application-level protocol requires the Write to complete before
data can be read then the system as a whole will deadlock.

Unfortunately it doesn't appear possible to reverse the locking order of
c.in and handshakeMutex because we might read a renegotiation request at
any point and need to be able to do a handshake without unlocking.

So this change adds a sync.Cond that indicates that a goroutine has
committed to doing a handshake. Other interested goroutines can wait on
that Cond when needed.

The test for this isn't great. I was able to reproduce the deadlock with
it only when building with -race. (Because -race happened to alter the
timing just enough.)

Fixes #17101.

Change-Id: I4e8757f7b82a84e46c9963a977d089f0fb675495
Reviewed-on: https://go-review.googlesource.com/29164
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-on: https://go-review.googlesource.com/31268
ceseo added a commit to powertechpreview/go that referenced this issue Dec 1, 2016
…te handshake.

After renegotiation support was added (af125a5) it's possible for a
Write to block on a Read when racing to complete the handshake:
   1. The Write determines that a handshake is needed and tries to
      take the neccesary locks in the correct order.
   2. The Read also determines that a handshake is needed and wins
      the race to take the locks.
   3. The Read goroutine completes the handshake and wins a race
      to unlock and relock c.in, which it'll hold when waiting for
      more network data.

If the application-level protocol requires the Write to complete before
data can be read then the system as a whole will deadlock.

Unfortunately it doesn't appear possible to reverse the locking order of
c.in and handshakeMutex because we might read a renegotiation request at
any point and need to be able to do a handshake without unlocking.

So this change adds a sync.Cond that indicates that a goroutine has
committed to doing a handshake. Other interested goroutines can wait on
that Cond when needed.

The test for this isn't great. I was able to reproduce the deadlock with
it only when building with -race. (Because -race happened to alter the
timing just enough.)

Fixes golang#17101.

Change-Id: I4e8757f7b82a84e46c9963a977d089f0fb675495
Reviewed-on: https://go-review.googlesource.com/29164
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-on: https://go-review.googlesource.com/31268
@golang golang locked and limited conversation to collaborators Oct 17, 2017
FiloSottile pushed a commit to FiloSottile/go that referenced this issue Oct 12, 2018
After renegotiation support was added (af125a5) it's possible for a
Write to block on a Read when racing to complete the handshake:
   1. The Write determines that a handshake is needed and tries to
      take the neccesary locks in the correct order.
   2. The Read also determines that a handshake is needed and wins
      the race to take the locks.
   3. The Read goroutine completes the handshake and wins a race
      to unlock and relock c.in, which it'll hold when waiting for
      more network data.

If the application-level protocol requires the Write to complete before
data can be read then the system as a whole will deadlock.

Unfortunately it doesn't appear possible to reverse the locking order of
c.in and handshakeMutex because we might read a renegotiation request at
any point and need to be able to do a handshake without unlocking.

So this change adds a sync.Cond that indicates that a goroutine has
committed to doing a handshake. Other interested goroutines can wait on
that Cond when needed.

The test for this isn't great. I was able to reproduce the deadlock with
it only when building with -race. (Because -race happened to alter the
timing just enough.)

Fixes golang#17101.

Change-Id: I4e8757f7b82a84e46c9963a977d089f0fb675495
Reviewed-on: https://go-review.googlesource.com/29164
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
FiloSottile pushed a commit to FiloSottile/go that referenced this issue Oct 12, 2018
After renegotiation support was added (af125a5) it's possible for a
Write to block on a Read when racing to complete the handshake:
   1. The Write determines that a handshake is needed and tries to
      take the neccesary locks in the correct order.
   2. The Read also determines that a handshake is needed and wins
      the race to take the locks.
   3. The Read goroutine completes the handshake and wins a race
      to unlock and relock c.in, which it'll hold when waiting for
      more network data.

If the application-level protocol requires the Write to complete before
data can be read then the system as a whole will deadlock.

Unfortunately it doesn't appear possible to reverse the locking order of
c.in and handshakeMutex because we might read a renegotiation request at
any point and need to be able to do a handshake without unlocking.

So this change adds a sync.Cond that indicates that a goroutine has
committed to doing a handshake. Other interested goroutines can wait on
that Cond when needed.

The test for this isn't great. I was able to reproduce the deadlock with
it only when building with -race. (Because -race happened to alter the
timing just enough.)

Fixes golang#17101.

Change-Id: I4e8757f7b82a84e46c9963a977d089f0fb675495
Reviewed-on: https://go-review.googlesource.com/29164
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.