Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

agent: shutdown if the http server goroutine exits #8234

Merged
merged 1 commit into from
Sep 3, 2020

Conversation

dnephin
Copy link
Contributor

@dnephin dnephin commented Jul 2, 2020

Fixes #6677

x/sync/errgroup is designed to handle exactly this problem. I have used it in the past and found that it handled the problem well.

Eventually we may be able to use a single errgroup.Group as the core of the agent. That would require that every goroutine shutdown on a cancelled context (instead of calling Shutdown). Until that is possible I've put it into a new apiServers struct and exposed the channel so that command can be notified when one of them exits, and proceed with the shutdown.

As a follow up to this PR I think we can move at least the dnsServers into apiServers, which should further shrink agent.Agent. Maybe it would be appropriate to move the grpcServers in as well?

The retryJoiner for Lan/Wan pools also works the same way. So using this errgroup for them would help shrink the Agent even further.

@dnephin dnephin force-pushed the dnephin/shutdown-on-http-server-error branch from a77b864 to 97ff981 Compare July 2, 2020 23:06
@dnephin dnephin force-pushed the dnephin/shutdown-on-http-server-error branch from 97ff981 to f41b3e8 Compare July 2, 2020 23:44
start := func(proto string, addrs []net.Addr) error {
listeners, err := a.startListeners(addrs)
if err != nil {
return err
}
ln = append(ln, listeners...)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was moved up from line 1178. I believe the old code would not properly close the listener that caused the error because it was not appended to the list until after an error is returned.

}()

select {
case addr := <-notif:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This select/case no longer exists. All this was doing was waiting for a goroutine to start. The server would not actually be started when the addr is sent on the channel.

With this change, a failure to server will cause a shutdown, so we now properly handle the case without the need for the notif channel.

@dnephin dnephin force-pushed the dnephin/shutdown-on-http-server-error branch from f41b3e8 to 1ac74a4 Compare July 2, 2020 23:58
@dnephin dnephin requested a review from a team July 7, 2020 15:49
@dnephin dnephin force-pushed the dnephin/unembed-HTTPServer-Server branch from 9a3617e to df40882 Compare July 9, 2020 20:42
Base automatically changed from dnephin/unembed-HTTPServer-Server to master July 9, 2020 21:42
@dnephin dnephin force-pushed the dnephin/shutdown-on-http-server-error branch from 1ac74a4 to 0a7a2c8 Compare July 9, 2020 21:49
@dnephin dnephin requested a review from a team July 21, 2020 20:14
Copy link
Member

@hanshasselberg hanshasselberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about errgroup 🎉! LGTM

Remove Server field from HTTPServer. The field is no longer used.
@dnephin
Copy link
Contributor Author

dnephin commented Sep 3, 2020

I amended the last commit to fix a merge conflict and added more godoc to apiServers.

@dnephin dnephin merged commit 964fa81 into master Sep 3, 2020
@dnephin dnephin deleted the dnephin/shutdown-on-http-server-error branch September 3, 2020 20:44
dnephin added a commit that referenced this pull request Sep 4, 2020
https server.

In #8234 I changed a few tests to use TestAgent.HTTPAddr() to find the
addr used in the test. Due to the way HTTPAddr() was implemented these
tests were passing, but I think the pass was incidental. HTTPAddr() was
not matching any servers, and was instead returning the last server,
which happened to be the one these tests wanted.

This commit fixes the implementation of HTTPAddr to panic if no match
was found. The tests which require an HTTPS server are changed to use
a new firstAddr() to look up the correct address.
@banks banks mentioned this pull request Sep 23, 2020
15 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

consul may continue running without a working HTTP server
3 participants