Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connect_timeout is not obeyed for sslmode=allow|prefer #1672

Closed
smaher-edb opened this issue Jul 5, 2023 · 5 comments
Closed

connect_timeout is not obeyed for sslmode=allow|prefer #1672

smaher-edb opened this issue Jul 5, 2023 · 5 comments
Labels

Comments

@smaher-edb
Copy link
Contributor

smaher-edb commented Jul 5, 2023

Describe the bug
connect_timeout given in conn string is not obeyed if sslmode is not specified or equals sslmode=allow|prefer. It takes twice the amount specified by connect_timeout in conn string. While this behavior is correct if multi-host is provided in conn string, it doesn't look correct in case of single host. This behavior is also not matching with libpq (example given below).

To Reproduce
Steps to reproduce the behavior:

package main

import (
	"context"
	"fmt"
	"os"

	"github.com/jackc/pgx/v4"
)

func main() {
	// invalid port is provided to reproduce the connect timeout behavior here
	urlExample := "host=3.110.94.120 port=7432 dbname=mydb user=postgres connect_timeout=2"
	conn, err := pgx.Connect(context.Background(), urlExample)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Unable to connect to database: %v\n", err)
		os.Exit(1)
	}
	defer conn.Close(context.Background())

	var name string
	err = conn.QueryRow(context.Background(), "select 'xyz'").Scan(&name)
	if err != nil {
		fmt.Fprintf(os.Stderr, "QueryRow failed: %v\n", err)
		os.Exit(1)
	}

	fmt.Println(name)
}

Output

Without sslmode it takes 4s instead of 2s.

$ date; go run cmd/main.go; date
Wed Jul  5 13:17:00 IST 2023
Unable to connect to database: failed to connect to `host=3.110.94.120 user=postgres database=mydb`: dial error (timeout: dial tcp 3.110.94.120:7432: i/o timeout)
exit status 1
Wed Jul  5 13:17:04 IST 2023

If sslmode=require is provided in connection string it correctly takes 2s.

$ date; go run cmd/main.go; date
Wed Jul  5 13:17:31 IST 2023
Unable to connect to database: failed to connect to `host=3.110.94.120 user=postgres database=mydb`: dial error (timeout: dial tcp 3.110.94.120:7432: i/o timeout)
exit status 1
Wed Jul  5 13:17:33 IST 2023

Expected behavior
Expectation is to match the behavior of libpq/psql.

$ date; psql -d "host=3.110.94.120 port=7432 dbname=mydb user=postgres connect_timeout=2"; date
Wed Jul  5 08:09:03 UTC 2023
psql: error: connection to server at "3.110.94.120", port 7432 failed: timeout expired
Wed Jul  5 08:09:05 UTC 2023

Actual behavior
When connection string has single host and no sslmode specified (or sslmode=allow or prefer) then connect_timeout given in connection string should be obeyed. Currently, it is taking two times that value. It seems to be happening because to implement these 2 sslmode, code correctly tries to connect to the server first with TLSConfig and if it fails then without TLSConfig. However, while doing this connect_timeout should still be obeyed. Otherwise it increases the actual timeout by multiple of 2. (most of the time when timeout error occurs it is expected to fail with or without TLSConfig)

In case of multi-host (-d "host=host1,host2 ..."), this behavior is correct as it is also given here.

For example, if you specify two hosts and connect_timeout is 5, each host will time out if no connection is made within 5 seconds, so the total time spent waiting for a connection might be up to 10 seconds.

However, it doesn't look ok if single host is provided and sslmode is not specified (default is prefer). libpq behavior in this case is also different than multi-host behavior (example given above).

Version

  • Go: go version go1.20.4 darwin/amd64
  • PostgreSQL: PostgreSQL 14.8 on x86_64-pc-linux-gnu, compiled by gcc (Debian 8.3.0-6) 8.3.0, 64-bit
  • pgx: require github.com/jackc/pgx/v4 v4.18.11

Additional context
I think this behavior got introduced because the code path for supporting multi-host as well as sslmode=allow|prefer is same. Both of them seems to be using fallbackConfig.

With following temporary fix it seems to be working fine. However, not sure whether it is the correct way to do it.

diff --git a/pgconn.go b/pgconn.go
--- pgconn.go
+++ pgconn.go
@@ -155,14 +155,17 @@
 	}
 
 	foundBestServer := false
 	var fallbackConfig *FallbackConfig
-	for _, fc := range fallbackConfigs {
+	for i, fc := range fallbackConfigs {
 		// ConnectTimeout restricts the whole connection process.
 		if config.ConnectTimeout != 0 {
-			var cancel context.CancelFunc
-			ctx, cancel = context.WithTimeout(octx, config.ConnectTimeout)
-			defer cancel()
+			// create new context first time or when previous host was different
+			if i == 0 || (fallbackConfigs[i].Host != fallbackConfigs[i-1].Host) {
+				var cancel context.CancelFunc
+				ctx, cancel = context.WithTimeout(octx, config.ConnectTimeout)
+				defer cancel()
+			}
 		} else {
 			ctx = octx
 		}
 		pgConn, err = connect(ctx, config, fc, false)
@smaher-edb smaher-edb added the bug label Jul 5, 2023
@jackc
Copy link
Owner

jackc commented Jul 8, 2023

I think that would work.

smaher-edb added a commit to smaher-edb/pgconn that referenced this issue Jul 13, 2023
connect_timeout given in conn string was not obeyed if sslmode is not specified or equals sslmode=allow|prefer. It took twice the amount of time specified by connect_timeout in conn string. While this behavior is correct if multi-host is provided in conn string, it doesn't look correct in case of single host. This behavior was also not matching with libpq.

Ref: [1672](jackc/pgx#1672)
smaher-edb added a commit to smaher-edb/pgx that referenced this issue Jul 13, 2023
connect_timeout given in conn string was not obeyed if sslmode is not specified (default is prefer) or equals sslmode=allow|prefer. It took twice the amount of time specified by connect_timeout in conn string. While this behavior is correct if multi-host is provided in conn string, it doesn't look correct in case of single host. This behavior was also not matching with libpq.

The root cause was to implement sslmode=allow|prefer conn are tried twice. First with TLSConfig and if that doesn't work then without TLSConfig. The fix for this issue now uses the same context if same host is being tried out. This change won't affect the existing multi-host behavior.

This PR goal is to close issue [jackc/issues/1672](jackc#1672)
@smaher-edb
Copy link
Contributor Author

I think that would work.

Thanks for the confirmation. I've created these 2 PRs for v4 and v5. Please take a look at it. This is the first time so I'm not sure whether needs to do anything else here.

jackc pushed a commit to jackc/pgconn that referenced this issue Jul 15, 2023
connect_timeout given in conn string was not obeyed if sslmode is not specified or equals sslmode=allow|prefer. It took twice the amount of time specified by connect_timeout in conn string. While this behavior is correct if multi-host is provided in conn string, it doesn't look correct in case of single host. This behavior was also not matching with libpq.

Ref: [1672](jackc/pgx#1672)
jackc pushed a commit that referenced this issue Jul 15, 2023
connect_timeout given in conn string was not obeyed if sslmode is not specified (default is prefer) or equals sslmode=allow|prefer. It took twice the amount of time specified by connect_timeout in conn string. While this behavior is correct if multi-host is provided in conn string, it doesn't look correct in case of single host. This behavior was also not matching with libpq.

The root cause was to implement sslmode=allow|prefer conn are tried twice. First with TLSConfig and if that doesn't work then without TLSConfig. The fix for this issue now uses the same context if same host is being tried out. This change won't affect the existing multi-host behavior.

This PR goal is to close issue [/issues/1672](#1672)
@jackc
Copy link
Owner

jackc commented Jul 15, 2023

LGTM - merged both.

@jackc jackc closed this as completed Jul 15, 2023
@smaher-edb
Copy link
Contributor Author

smaher-edb commented Jul 17, 2023

Thanks @jackc. Not sure what is the standard release cycle/date but as we currently need this in pgx/v4 is it possible to release this patch through v1.14.1 of pgconn or any idea when the next release will be?

@jackc
Copy link
Owner

jackc commented Jul 20, 2023

Just released pgconn v1.14.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants