Restarting Postgres server used for backend while Teleport is running results in odd behaviour #13215

webvictim · 2022-06-06T18:44:14Z

Expected behavior:
If you restart the Postgres server which is providing Teleport's backend database, Teleport should reconnect and recover gracefully.

Current behavior:
Restarting the Postgres server doesn't appear to cause any errors by itself, but attempting an SSO login via GitHub afterwards resulted in an error:

Jun 06 14:02:13 artemis teleport[2436596]: 2022-06-06T14:02:13-03:00 DEBU [AUTH]      Retrieved 1 teams for GitHub user webvictim. auth/github.go:443
Jun 06 14:02:13 artemis teleport[2436596]: 2022-06-06T14:02:13-03:00 DEBU [GITHUB]    Claims: &types.GithubClaims{Username:"webvictim", OrganizationToTeams:map[string][]string{"test-org":[]string{"admins"}}, Teams:[]string{"admins"}}. auth/github.go:461
Jun 06 14:02:13 artemis teleport[2436596]: 2022-06-06T14:02:13-03:00 DEBU [GITHUB]    Generating dynamic identity github/webvictim with logins: [admin access editor]. auth/github.go:371
Jun 06 14:02:13 artemis teleport[2436596]: 2022-06-06T14:02:13-03:00 INFO [AUDIT]     user.login attributes:map[webvictim-org:[admins]] cluster_name:teleport.example.com code:T1001W ei:0 error:[user "webvictim" is not found] event:user.login message:user "webvictim" is not found method:github success:false time:2022-06-06T17:02:13.125Z uid:b368ef6c-567f-4360-95>
Jun 06 14:02:13 artemis teleport[2436596]: 2022-06-06T14:02:13-03:00 ERRO [WEB]       Error while processing callback. auth:github error:[
Jun 06 14:02:13 artemis teleport[2436596]: ERROR REPORT:
Jun 06 14:02:13 artemis teleport[2436596]: Original Error: *trace.NotFoundError user "webvictim" is not found
Jun 06 14:02:13 artemis teleport[2436596]: Stack Trace:
Jun 06 14:02:13 artemis teleport[2436596]: Caught:
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/httplib/httplib.go:142 github.com/gravitational/teleport/lib/httplib.ConvertResponse
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/auth/clt.go:278 github.com/gravitational/teleport/lib/auth.(*Client).PostJSON
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/auth/clt.go:1258 github.com/gravitational/teleport/lib/auth.(*Client).ValidateGithubAuthCallback
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/web/apiserver.go:1114 github.com/gravitational/teleport/lib/web.(*Handler).githubCallback
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/web/apiserver.go:2562 github.com/gravitational/teleport/lib/web.(*Handler).WithMetaRedirect.func1
Jun 06 14:02:13 artemis teleport[2436596]:         /tmp/gomodcache/github.com/gravitational/httprouter@v1.3.1-0.20220408074523-c876c5e705a5/router.go:399 github.com/julienschmidt/httprouter.(*Router).ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/net/http/server.go:2090 net/http.StripPrefix.func1
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/net/http/server.go:2047 net/http.HandlerFunc.ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/web/apiserver.go:454 github.com/gravitational/teleport/lib/web.NewHandler.func1
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/net/http/server.go:2047 net/http.HandlerFunc.ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /tmp/gomodcache/github.com/gravitational/httprouter@v1.3.1-0.20220408074523-c876c5e705a5/router.go:460 github.com/julienschmidt/httprouter.(*Router).ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /go/src/github.com/gravitational/teleport/lib/web/apiserver.go:215 github.com/gravitational/teleport/lib/web.(*APIHandler).ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /tmp/gomodcache/github.com/gravitational/oxy@v0.0.0-20211213172937-a1ba0900a4c9/ratelimit/tokenlimiter.go:118 github.com/gravitational/oxy/ratelimit.(*TokenLimiter).ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /tmp/gomodcache/github.com/gravitational/oxy@v0.0.0-20211213172937-a1ba0900a4c9/connlimit/connlimit.go:75 github.com/gravitational/oxy/connlimit.(*ConnLimiter).ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/net/http/server.go:2879 net/http.serverHandler.ServeHTTP
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/net/http/server.go:1930 net/http.(*conn).serve
Jun 06 14:02:13 artemis teleport[2436596]:         /opt/go/src/runtime/asm_arm64.s:1133 runtime.goexit
Jun 06 14:02:13 artemis teleport[2436596]: User Message: user "webvictim" is not found
Jun 06 14:02:13 artemis teleport[2436596]: ] web/apiserver.go:1116

It's possible that this would have resolved itself if I'd waited a little longer (this was probably ~20-30 seconds after the Postgres server restarted) but it seemed odd that only certain operations were erroring. Everything was fine after I restarted Teleport.

Bug details:

Teleport version: 9.2.3

alistanis · 2022-07-15T10:41:36Z

This appears to be an issue with the postgres driver we're using: jackc/pgx#672

I know lib/pq will receive the connection close from Postgres's administrative broadcast on shutdown, store the error on the connection object in the connection pool, and simply reconnect on the next attempt to execute a command. Reading through that issue, it appears that pgx still lacks that capability in a stable release.

It looks like the maintainer did put a fix into an alpha release 19 days ago, though: jackc/pgx#672 (comment)

I took a look through that code and it looks to me like exactly the type of code we really want to vet before merging from upstream.

@webvictim @r0mant Should we keep this issue open until we can update the dependency once it's hit a stable release?

webvictim · 2022-08-29T18:37:36Z

@alistanis Sounds good, yes! Thanks for investigating.

r0mant · 2022-12-07T17:42:27Z

We removed the Postgres backend, so this is no longer a concern.

webvictim added the bug label Jun 6, 2022

webvictim assigned r0mant Jun 6, 2022

r0mant assigned alistanis and unassigned r0mant Jun 16, 2022

r0mant closed this as completed Dec 7, 2022
