internal: remove transportMonitor, replace with callbacks #2032
Conversation
@@ -1515,6 +1517,20 @@ func (ac *addrConn) tearDown(err error) {
	if channelz.IsOn() {
		channelz.RemoveEntry(ac.channelzID)
	}

	// TODO(deklerk) is this bad?
	transports := map[int32]transport.ClientTransport{}
Naive way to get around needing the transports outside the lock (GracefulClose takes the lock), but not wanting to cause data races.
Is there a better way to do this?
@@ -261,6 +261,11 @@ func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, meth
	return nil, err
}

	// TODO(deklerk) what??
	if t == nil {
		continue
I couldn't figure out why this is happening, but sometimes I would get nil, true, nil which blows up a little below when we call t.NewStream. This conditional fixes the problem, but I thought I'd bring it up. I suspect it's a race between the transport closing - and getting deleted - and getTransport looking for, finding, and returning it.
If so, is this fix appropriate or should I make getTransport and transport deletion synchronized? Alternatively, should I document getTransport's race with closing transport? (again, all this predicated on my guess above)
clientconn.go
Outdated
	}
	if connected {
		return nil
	if t == nil {
I'm not sure this solves the problem appropriately. AFAICT there's a race between the reader goroutine closing and onDeadline. This puts a plaster over the problem, but I think there's still a chance t.Close just below could happen incorrectly. Is that accurate? Is there a better way to handle this? Do I need to do more synchronization?
clientconn.go
Outdated
	defer shutdownLock.Unlock()

	if timer != nil {
		timer.Stop()
Is this appropriate? I'm a bit confused by the godoc in timer.Stop; it seems to suggest that we need to drain the channel, too. Is that right?
This seems fine.
Yes, that's right. It's only failing in 1.10 because we don't run the vet/lint/etc checks in other builds.
@dfawley Is there a way to see what in clientconn.go caused it to fail? Or to repro locally? edit:
What version is your local go? ...Now I'm even more confused. The output shows it grepping for "unsafe", but if I look in your branch, your vet.sh doesn't have the change to do that yet. I think we should remove the "-l" parameter from the command line so we can see the full diffs. You can do that in your branch, anyway, to get travis to output it.
Missed that. Are you not running 1.10.1? That's what travis used.
Updated and rebased with master, still not seeing anything with
Force-pushed from 233a399 to 86498f2
Reviewed it partially. Looks good, mostly. We might need to update the structure a little bit. Let's chat when you're in SVL again.
@@ -1174,152 +1179,217 @@ func (ac *addrConn) resetTransport() error {
	ac.ready = nil
Update documentation for this function?
Done
clientconn.go
Outdated
	shutdownLock := &sync.Mutex{}
	var timer *time.Timer
	ac.mu.Lock()
	if !ac.connectDeadline.IsZero() {
When can this be zero?
Ah, never. Removed conditional
clientconn.go
Outdated
	defer shutdownLock.Unlock()

	if timer != nil {
		timer.Stop()
This seems fine.
clientconn.go
Outdated
	newTrID := atomic.AddInt32(&ac.transportIdx, 1)

	onGoAway := func() {
		shutdownLock.Lock()
What does this lock protect?
It protects timer. I'll tighten it up.
clientconn.go
Outdated
	shutdownLock.Lock()
	defer shutdownLock.Unlock()

	if timer != nil {
If connectionDeadline is always non-zero, we can get rid of the timer != nil checks.
You're right. Done.
clientconn.go
Outdated
	}
	t.Close()
Instead of this closure, how about having the transport's reader goroutine wait on connectionDeadline as well while it waits on reading the settings frame?
Just so I'm clear, since the diff is a bit hard to read; you're suggesting that we remove onDeadline and move the onDeadline logic into the transport reader goroutine by way of passing connectDeadline down to the transport?
clientconn.go
Outdated
	newTr, err := transport.NewClientTransport(connectCtx, ac.cc.ctx, target, copts, onPrefaceReceipt)

	ac.mu.Lock()
	delete(ac.transports, newTrID)
Should we also update ac.transport = nil?
Good call. Done
clientconn.go
Outdated
	case <-ac.ctx.Done():
	}

	err = ac.resetTransport()
resetTransport() always resets connectionDeadline, but if we haven't exhausted our list of backends we'd like to use the same deadline.
I believe that is the current behavior, no?
clientconn.go
Outdated
	}
	return ac.resetTransport()
I'd rather not replace the loop with recursion.
Done
clientconn.go
Outdated
	case <-connectCtx.Done():
		// Didn't receive server preface, must kill this new transport now.
		grpclog.Warningf("grpc: addrConn.createTransport failed to receive server preface before deadline.")
		newTr.Close()
The onDeadline() closure will do this for us now, so this can be removed.
Done
@MakMukhi Updated from recursion to loop. edit: I should be able to get to your other couple of comments in the next couple of days.
Just realized I was pushing the wrong branch, heh. The updates I promised above are now actually here :)
clientconn.go
Outdated
@@ -1,3 +1,11 @@
/**
Whoops, I should and will remove this
Force-pushed from b0fd5be to 88d55c0
@MakMukhi PTAL.
This removes a goroutine (in which transportMonitor was running) and the backoff and addrs for loops. Instead we use callbacks to handle the deadline, goaway, and close logic that transport monitor handled before, and we descend into recursive resetTransport calls (from the callbacks) instead of loops. As a result, backoffIdx and addrIdx now live on addrConn. Also, transport.Close now blocks until the addrConn that initiated it is connected - this is because t.Close calls onClose, whose last step is to begin reconnecting. In the event of a shutdown, all reconnect logic ends and t.Close will return.
- Refactor to use reader goroutine for transport reset
- Rename back to resetTransport
- Trim down onDeadline extensively; consolidate a lot of logic into resetTransport
- General cleanup, codereview fixes, and optimizations
Force-pushed from 88d55c0 to 2725a51
Force-pushed from b1b4ba9 to b1ba044
Now that the location differs, I'm closing this and re-opening in a new, clean PR.
This removes a goroutine (in which transportMonitor was running) and the backoff and addrs loops. Instead we use callbacks to handle the deadline, goaway, and close logic that transport monitor handled before, and we descend into recursive createTransport calls instead of loops.
As a result, backoffIdx and addrIdx now live on addrConn. Also, transport.Close now blocks until the addrConn that initiated it is connected - this is because t.Close calls onClose, whose last step is to begin reconnecting. In the event of a shutdown, all reconnect logic ends and t.Close will return.