Restarting Postgres server used for backend while Teleport is running results in odd behaviour #13215

Closed
webvictim opened this issue Jun 6, 2022 · 3 comments

@webvictim
Contributor

Expected behavior:
If you restart the Postgres server which is providing Teleport's backend database, Teleport should reconnect and recover gracefully.

Current behavior:
Restarting the Postgres server didn't appear to cause any errors by itself, but attempting an SSO login via GitHub shortly afterwards resulted in an error:

Jun 06 14:02:13 artemis teleport[2436596]: 2022-06-06T14:02:13-03:00 DEBU [AUTH]      Retrieved 1 teams for GitHub user webvictim. auth/github.go:443
Jun 06 14:02:13 artemis teleport[2436596]: 2022-06-06T14:02:13-03:00 DEBU [GITHUB]    Claims: &types.GithubClaims{Username:"webvictim", OrganizationToTeams:map[string][]string{"test-org":[]string{"admins"}}, Teams:[]string{"admins"}}. auth/github.go:461
Jun 06 14:02:13 artemis teleport[2436596]: 2022-06-06T14:02:13-03:00 DEBU [GITHUB]    Generating dynamic identity github/webvictim with logins: [admin access editor]. auth/github.go:371
Jun 06 14:02:13 artemis teleport[2436596]: 2022-06-06T14:02:13-03:00 INFO [AUDIT]     user.login attributes:map[webvictim-org:[admins]] cluster_name:teleport.example.com code:T1001W ei:0 error:[user "webvictim" is not found] event:user.login message:user "webvictim" is not found method:github success:false time:2022-06-06T17:02:13.125Z uid:b368ef6c-567f-4360-95>
Jun 06 14:02:13 artemis teleport[2436596]: 2022-06-06T14:02:13-03:00 ERRO [WEB]       Error while processing callback. auth:github error:[
Jun 06 14:02:13 artemis teleport[2436596]: ERROR REPORT:
Jun 06 14:02:13 artemis teleport[2436596]: Original Error: *trace.NotFoundError user "webvictim" is not found
Jun 06 14:02:13 artemis teleport[2436596]: Stack Trace:
Jun 06 14:02:13 artemis teleport[2436596]: Caught:
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/httplib/httplib.go:142 github.com/gravitational/teleport/lib/httplib.ConvertResponse
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/auth/clt.go:278 github.com/gravitational/teleport/lib/auth.(*Client).PostJSON
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/auth/clt.go:1258 github.com/gravitational/teleport/lib/auth.(*Client).ValidateGithubAuthCallback
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/web/apiserver.go:1114 github.com/gravitational/teleport/lib/web.(*Handler).githubCallback
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/web/apiserver.go:2562 github.com/gravitational/teleport/lib/web.(*Handler).WithMetaRedirect.func1
Jun 06 14:02:13 artemis teleport[2436596]:         /tmp/gomodcache/github.com/gravitational/httprouter@v1.3.1-0.20220408074523-c876c5e705a5/router.go:399 github.com/julienschmidt/httprouter.(*Router).ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/net/http/server.go:2090 net/http.StripPrefix.func1
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/net/http/server.go:2047 net/http.HandlerFunc.ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/web/apiserver.go:454 github.com/gravitational/teleport/lib/web.NewHandler.func1
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/net/http/server.go:2047 net/http.HandlerFunc.ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /tmp/gomodcache/github.com/gravitational/httprouter@v1.3.1-0.20220408074523-c876c5e705a5/router.go:460 github.com/julienschmidt/httprouter.(*Router).ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/web/apiserver.go:215 github.com/gravitational/teleport/lib/web.(*APIHandler).ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /tmp/gomodcache/github.com/gravitational/oxy@v0.0.0-20211213172937-a1ba0900a4c9/ratelimit/tokenlimiter.go:118 github.com/gravitational/oxy/ratelimit.(*TokenLimiter).ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /tmp/gomodcache/github.com/gravitational/oxy@v0.0.0-20211213172937-a1ba0900a4c9/connlimit/connlimit.go:75 github.com/gravitational/oxy/connlimit.(*ConnLimiter).ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/net/http/server.go:2879 net/http.serverHandler.ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/net/http/server.go:1930 net/http.(*conn).serve
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/runtime/asm_arm64.s:1133 runtime.goexit
Jun 06 14:02:13 artemis teleport[2436596]: User Message: user "webvictim" is not found
Jun 06 14:02:13 artemis teleport[2436596]: ] web/apiserver.go:1116

It's possible that this would have resolved itself if I'd waited a little longer (this was probably ~20-30 seconds after the Postgres server restarted), but it seemed odd that only certain operations were erroring. Everything was fine after I restarted Teleport.

Bug details:

  • Teleport version: 9.2.3

webvictim added the bug label Jun 6, 2022
r0mant assigned alistanis and unassigned r0mant Jun 16, 2022

@alistanis
Contributor

This appears to be an issue with the postgres driver we're using: jackc/pgx#672

I know lib/pq receives the connection close from Postgres's administrative broadcast on shutdown, stores the error on the connection object in the connection pool, and simply attempts to reconnect the next time a command is executed. Reading through that issue, it appears that pgx still lacks that capability in a stable release.

It looks like the pgx maintainer did put a fix into an alpha release 19 days ago, though: jackc/pgx#672 (comment)

I took a look through that code and it looks to me like exactly the type of code we really want to vet before merging from upstream.

@webvictim @r0mant Should we keep this issue open until we can update the dependency once it's hit a stable release?

@webvictim
Contributor Author

@alistanis Sounds good, yes! Thanks for investigating.

@r0mant
Collaborator

r0mant commented Dec 7, 2022

We removed the Postgres backend, so this is no longer a concern.

r0mant closed this as completed Dec 7, 2022