New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
give a long gregor reconnect backoff to devices that don't send chats #9720
Conversation
d139a3a
to
189afc9
Compare
Will replace the "dummy revendor" commit after keybase/go-framed-msgpack-rpc#129 lands. |
@mmaxim this is ready for your review. |
189afc9
to
e10089f
Compare
Upstream landed, and I've replaced the dummy revendor. Will add a corresponding KBFS PR now. (I expect Android CI will be broken until both of these land.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple q's
go/chat/server.go
Outdated
@@ -928,6 +928,8 @@ func (h *Server) PostLocal(ctx context.Context, arg chat1.PostLocalArg) (res cha | |||
return chat1.PostLocalRes{}, err | |||
} | |||
|
|||
RecordChatSend(h.G()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we put this in BlockingSender.Send
?
go/chat/active.go
Outdated
func RecordChatSend(g *globals.Context) { | ||
err := g.LocalChatDb.PutObj(lastSendTimeDbKey(), nil, time.Now().Unix()) | ||
if err != nil { | ||
g.Log.Warning("Failed to store chat last send time: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make all these prints debugs so they don't show up on the CLI.
go/service/gregor.go
Outdated
// given a long backoff. | ||
lastSendTime := chat.GetLastSendTime(g) | ||
if now.Sub(lastSendTime) < chat.ActiveIntervalAfterSend { | ||
return GregorConnectionShortRetryInterval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we wait at all for active people?
668c717
to
09a7814
Compare
Ok, the current state of this PR:
|
09a7814
to
f980a9f
Compare
General comment to use |
f980a9f
to
840af7d
Compare
Made @mmaxim's suggested logging changes and rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks! Two points:
1.) Is there anyway we can test this? I was thinking a unit test for active.go itself (without any connection business) could be useful.
2.) I think this should hold until everything is released.
840af7d
to
ca9e382
Compare
Added a test and rebased. Landing after this passes CI and some hand testing. |
This should mitigate some of the reconnect flood that the gregor servers have to deal with when they restart. Most idle clients in the wild aren't participating in chat, and don't need to reconnect very aggressively. There were a few different heuristics we could've used here, and others we might want to use in the future. One in particular we almost chose was "has this user ever received a message". However, we sometimes send system-wide messages, like when Linux updates are broken, which could confuse that heuristic. Chat sending is a surer sign of activity than receiving, and it also has the benefit of being individual-device-specific. Two things had to change to make this work. First, we had to configure a chat-activity-based backoff (using a couple new keys in LevelDB). Second, we had to make sure that the backoff was respected on reconnect, which required the new ForceInitialBackoff ConnectionOpts parameter upstream, since we don't keep a persisitent Connection object after disconnects.
ca9e382
to
069968a
Compare
This should mitigate some of the reconnect flood that the gregor servers
have to deal with when they restart. Most idle clients in the wild
aren't participating in chat, and don't need to reconnect very
aggressively.
There were a few different heuristics we could've used here, and others
we might want to use in the future. One in particular we almost chose
was "has this user ever received a message". However, we sometimes send
system-wide messages, like when Linux updates are broken, which could
confuse that heuristic. Chat sending is a surer sign of activity than
receiving, and it also has the benefit of being
individual-device-specific.
Two things had to change to make this work. First, we had to configure a
chat-activity-based backoff (using a couple new keys in LevelDB).
Second, we had to make sure that the backoff was respected on reconnect,
which required the new ForceInitialBackoff ConnectionOpts parameter
upstream, since we don't keep a persisitent Connection object after
disconnects.