pgx v5 beta 5 available for testing #1273

Closed
jackc opened this issue Aug 6, 2022 · 37 comments

@jackc
Owner
jackc commented Aug 6, 2022

I'm pleased to announce that the first beta release of pgx v5 is available.

The code is in branch v5-dev. There are a significant number of changes. Check out the changelog to see what's new.

Try it with:

go get github.com/jackc/pgx/v5@v5.0.0-beta.1

If no major issues are reported I plan to release v5.0.0 in September.


There have only been minor changes since v5.0.0-alpha.5.

  • Renamed pgxpool.NewConfig() to NewWithConfig()
  • Cancelling the context that caused pgxpool to start establishing a connection no longer interrupts the connection attempt.
  • Added pgxpool.Pool.Reset() (and prerequisite puddle update)
  • Ported sslpassword support in pgconn from pgx v4
  • Documentation updates
  • Miscellaneous bug fixes
@drakkan
Contributor

drakkan commented Aug 6, 2022

Hello, I just tested beta1 with SFTPGo by replacing lib/pq with pgx. All tests pass, but I'm a little worried about the increase in binary size. I'm using pgx as a database/sql driver:

  • current binary size (go 1.19 and lib/pq 1.10.6): 36389448 bytes
  • v5.0.0-alpha.1: 38133192 bytes
  • v5.0.0-alpha.2: 38141384 bytes
  • v5.0.0-alpha.3: 39964072 bytes
  • v5.0.0-alpha.4: 40070792 bytes
  • v5.0.0-beta.1: 40136328 bytes

Can we at least get back to the binary size increase of alpha.1/.2? I'm using pgx as a database/sql driver since SFTPGo also supports MySQL/MariaDB and SQLite with the same code, so no pgx-specific features are used.

Just for reference here are my changes:

diff --git a/go.mod b/go.mod
index 1c46e2f..3a1ae65 100644
--- a/go.mod
+++ b/go.mod
@@ -34,10 +34,10 @@ require (
        github.com/hashicorp/go-hclog v1.2.2
        github.com/hashicorp/go-plugin v1.4.4
        github.com/hashicorp/go-retryablehttp v0.7.1
+       github.com/jackc/pgx/v5 v5.0.0-beta.1
        github.com/jlaffaye/ftp v0.0.0-20201112195030-9aae4d151126
        github.com/klauspost/compress v1.15.9
        github.com/lestrrat-go/jwx v1.2.25
-       github.com/lib/pq v1.10.6
        github.com/lithammer/shortuuid/v3 v3.0.7
        github.com/mattn/go-sqlite3 v1.14.14
        github.com/mhale/smtpd v0.8.0
@@ -46,7 +46,7 @@ require (
        github.com/pires/go-proxyproto v0.6.2
        github.com/pkg/sftp v1.13.5
        github.com/pquerna/otp v1.3.0
-       github.com/prometheus/client_golang v1.12.2
+       github.com/prometheus/client_golang v1.13.0
        github.com/robfig/cron/v3 v3.0.1
        github.com/rs/cors v1.8.3-0.20220619195839-da52b0701de5
        github.com/rs/xid v1.4.0
@@ -114,6 +114,8 @@ require (
        github.com/hashicorp/hcl v1.0.0 // indirect
        github.com/hashicorp/yamux v0.1.1 // indirect
        github.com/inconshreveable/mousetrap v1.0.0 // indirect
+       github.com/jackc/pgpassfile v1.0.0 // indirect
+       github.com/jackc/pgservicefile v0.0.0-20200714003250-2b9c44734f2b // indirect
        github.com/jmespath/go-jmespath v0.4.0 // indirect
        github.com/klauspost/cpuid/v2 v2.1.0 // indirect
        github.com/kr/fs v0.1.0 // indirect
git diff internal/dataprovider/pgsql.go
diff --git a/internal/dataprovider/pgsql.go b/internal/dataprovider/pgsql.go
index 8bdad15..9618e9c 100644
--- a/internal/dataprovider/pgsql.go
+++ b/internal/dataprovider/pgsql.go
@@ -26,8 +26,8 @@ import (
        "strings"
        "time"
 
-       // we import lib/pq here to be able to disable PostgreSQL support using a build tag
-       _ "github.com/lib/pq"
+       // we import pgx here to be able to disable PostgreSQL support using a build tag
+       _ "github.com/jackc/pgx/v5/stdlib"
 
        "github.com/drakkan/sftpgo/v2/internal/logger"
        "github.com/drakkan/sftpgo/v2/internal/version"
@@ -189,7 +189,7 @@ func init() {
 
 func initializePGSQLProvider() error {
        var err error
-       dbHandle, err := sql.Open("postgres", getPGSQLConnectionString(false))
+       dbHandle, err := sql.Open("pgx", getPGSQLConnectionString(false))
        if err == nil {
                providerLog(logger.LevelDebug, "postgres database handle created, connection string: %#v, pool size: %v",
                        getPGSQLConnectionString(true), config.PoolSize)

Thanks for your hard work, really appreciated!

@jackc
Owner Author

jackc commented Aug 6, 2022

Glad it worked!

I've looked into the size increase and have found where it is occurring, but it's not obvious to me what can or should be done about it.

It is the combination of a few things.

PostgreSQL supports a simple and an extended protocol with which to execute queries, and the extended protocol can be used in several ways. pgx supports multiple query execution modes (pgx.QueryExecMode) for the various use cases.
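
For example, the mode can be set per connection or per query. A minimal, hedged sketch (it assumes a DATABASE_URL environment variable):

package main

import (
	"context"
	"log"
	"os"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()

	// Set the default execution mode for the whole connection.
	config, err := pgx.ParseConfig(os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	config.DefaultQueryExecMode = pgx.QueryExecModeSimpleProtocol

	conn, err := pgx.ConnectConfig(ctx, config)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	// Or override the mode for a single query by passing it before the arguments.
	var greeting string
	err = conn.QueryRow(ctx, "select $1::text", pgx.QueryExecModeExec, "hello").Scan(&greeting)
	if err != nil {
		log.Fatal(err)
	}
	log.Println(greeting)
}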

In pgx's normal mode of operation it uses the extended protocol and gets the statement description before executing a statement. This means it knows the type of every parameter. This enables one of pgx's most convenient features: you can add support for encoding and decoding 3rd party types without modifying those types. e.g. https://github.com/shopspring/decimal can be used directly without a wrapper type. This is only possible because pgx knows the underlying PostgreSQL type.

However, pgx also supports the simple protocol, which never knows the PostgreSQL parameter types (and one of the extended protocol modes lacks parameter type knowledge as well). These modes are necessary for people behind connection pools such as PgBouncer that do not support prepared statements. In these modes, pgx needs some way of determining what type to encode each parameter as. Thus: https://pkg.go.dev/github.com/jackc/pgx/v5@v5.0.0-beta.1/pgtype#Map.RegisterDefaultPgType. It takes a Go value and associates it with the name of a PostgreSQL type. e.g.

m.RegisterDefaultPgType(int32(0), "int4")

This builds a map that is used to decide how to encode parameters.

pgx also supports arrays. PostgreSQL arrays are extremely flexible: they support multidimensional arrays, arbitrary lower bounds, and NULL values. However, that is inconvenient for applications that don't need it (presumably the overwhelming majority). So pgx provides the generic types Array[T] for complete PostgreSQL compatibility and FlatArray[T] for what an application probably really wants. Then for both of those there is the question of whether you want the element to be a T or a *T; pgx works with both.
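
As a brief illustration of the two (a sketch assuming a github.com/jackc/pgx/v5/pgtype import and an open conn):

// FlatArray[T] is a plain slice and covers the common case.
var tags pgtype.FlatArray[string]
err := conn.QueryRow(ctx, "select array['a','b','c']::text[]").Scan(&tags)

// Array[T] preserves full PostgreSQL semantics (dimensions and lower
// bounds); *T elements allow NULLs.
var grid pgtype.Array[*int32]
err = conn.QueryRow(ctx, "select '{{1,NULL},{3,4}}'::int4[]").Scan(&grid)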

pgx needs to register all those generic types for the execution modes that lack known parameter types. So it does this for each type:

func registerDefaultPgTypeVariants[T any](m *Map, name string) {
	arrayName := "_" + name

	var value T
	m.RegisterDefaultPgType(value, name)  // T
	m.RegisterDefaultPgType(&value, name) // *T

	var sliceT []T
	m.RegisterDefaultPgType(sliceT, arrayName)  // []T
	m.RegisterDefaultPgType(&sliceT, arrayName) // *[]T

	var slicePtrT []*T
	m.RegisterDefaultPgType(slicePtrT, arrayName)  // []*T
	m.RegisterDefaultPgType(&slicePtrT, arrayName) // *[]*T

	var arrayOfT Array[T]
	m.RegisterDefaultPgType(arrayOfT, arrayName)  // Array[T]
	m.RegisterDefaultPgType(&arrayOfT, arrayName) // *Array[T]

	var arrayOfPtrT Array[*T]
	m.RegisterDefaultPgType(arrayOfPtrT, arrayName)  // Array[*T]
	m.RegisterDefaultPgType(&arrayOfPtrT, arrayName) // *Array[*T]

	var flatArrayOfT FlatArray[T]
	m.RegisterDefaultPgType(flatArrayOfT, arrayName)  // FlatArray[T]
	m.RegisterDefaultPgType(&flatArrayOfT, arrayName) // *FlatArray[T]

	var flatArrayOfPtrT FlatArray[*T]
	m.RegisterDefaultPgType(flatArrayOfPtrT, arrayName)  // FlatArray[*T]
	m.RegisterDefaultPgType(&flatArrayOfPtrT, arrayName) // *FlatArray[*T]
}

This means that a lot of code is generated for each type pgx registers (59 at present) regardless of whether the application actually uses it. Removing these registrations saves 2.3 MB (about 1.9 MB of which is array support).

pgx tries very hard to have the same behavior regardless of the query execution mode and succeeds in almost all cases. The only way I see to save that 2.3 MB is to move this code to a different package and require the application to perform some additional step if it plans to use one of those non-default query execution modes. It would probably be less than 10 additional lines of code for the affected applications, but it bothers me for it to not "just work" out of the box. But 2.3 MB wasted for most users bothers me too... I don't know...

@drakkan
Contributor

drakkan commented Aug 7, 2022

@jackc thanks for your detailed explanation and hard work.

I want to add a little more context.
Some SFTPGo users consider it "heavy" or "light" based on its binary size.
If I add features that require a new third-party library and the binary size increases, I get comments like this:

you are adding too many features, I don't want them, SFTPGo is getting very "heavy"

Obviously the binary size does not necessarily affect runtime speed, but to avoid comments like this I pay attention to binary size.

The latest versions of Go (apart from 1.19) have greatly reduced the size of the SFTPGo binary, so the developers of Go seem to be paying attention too.

lib/pq is barely maintained; I've been preparing the switch to pgx for a while and I think it is inevitable.

For example, I added SQLState to lib/pq so that cockroach-go no longer needs to import lib/pq, which avoids its inclusion in the SFTPGo binary once I switch to pgx.

I think that pgx should "just work" out of the box. What do you think about adding a build tag (pgx<something>) to avoid the inclusion of these additional types and features?

I mean something like this

https://github.com/drakkan/sftpgo/blob/main/internal/dataprovider/pgsql_disabled.go#L15
https://github.com/drakkan/sftpgo/blob/main/internal/dataprovider/pgsql.go#L15

so users that don't need these features can decide to add this build tag and reduce the final binary size.
Do you think it will complicate the pgx code base too much?
Maybe this way we can have a binary size comparable to lib/pq.

This method works fine in SFTPGo, default build:

go build -trimpath -ldflags "-s -w -X github.com/drakkan/sftpgo/v2/internal/version.commit=`git describe --always --dirty` -X github.com/drakkan/sftpgo/v2/internal/version.date=`date -u +%FT%TZ`" -o sftpgo
ls -la sftpgo
-rwxr-xr-x 1 nicola nicola 36405960  7 ago 08.00 sftpgo

minimal build:

go build -trimpath -tags nogcs,nos3,noportable,nobolt,nomysql,nopgsql,nosqlite,nometrics,noazblob -ldflags "-s -w -X github.com/drakkan/sftpgo/v2/internal/version.commit=`git describe --always --dirty` -X github.com/drakkan/sftpgo/v2/internal/version.date=`date -u +%FT%TZ`" -o sftpgo
ls -la sftpgo
-rwxr-xr-x 1 nicola nicola 23363584  7 ago 08.01 sftpgo

thanks!

@jackc
Owner Author

jackc commented Aug 13, 2022

@drakkan Build tags seem to work. Just pushed 8256ab1, which adds the nopgxregisterdefaulttypes tag.
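
For an application importing pgx, the tag is passed at build time, e.g.:

go build -tags nopgxregisterdefaulttypes -o app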

@drakkan
Contributor

drakkan commented Aug 13, 2022

@drakkan Build tags seem to work. Just pushed 8256ab1, which adds the nopgxregisterdefaulttypes tag.

Thanks! I confirm that it works:

  • lib/pq, sftpgo binary size: 36546720
  • pgx no build tag, size: 40285376
  • pgx with the above build tag, size: 38499552

We still have about 2 MB more than lib/pq, but it's better than before.

jackc changed the title from "pgx v5 beta 1 available for testing" to "pgx v5 beta 2 available for testing" Aug 13, 2022
@jackc
Owner Author

jackc commented Aug 13, 2022

Just tagged v5.0.0-beta.2.

Here are the changes since v5.0.0-beta.1:

  • Fix build on Windows.
  • Add build tag to eliminate default type registration code to reduce binary size.
  • CopyFrom now works with string values that need to be converted to binary format.
  • Fail-safe timeout for background pool connections does not override connect_timeout.
  • connect_timeout applies to each connect attempt individually when multiple hosts are specified instead of the entire process. This now matches PostgreSQL libpq behavior.

@drakkan
Contributor

drakkan commented Aug 13, 2022

@drakkan Build tags seem to work. Just pushed 8256ab1, which adds the nopgxregisterdefaulttypes tag.

@jackc I did a quick test using pgx v4: the sftpgo binary size is 38536672, so the new build tag offers no improvement over v4.

@cristaloleg
Contributor

BTW, are there huge performance wins between v4 and v5?

@vikstrous2

FYI: We ran into this bug introduced in v5: #1281

@jameshartig
Contributor

The LazyConnect option has been removed. Pools always lazily connect.

Can there be an optional method that blocks until createIdleResources finishes? There could be a generic method Ready(context.Context) that blocks until an internal channel is closed or the context is cancelled.

We don't want to start our application accepting requests until our pool has at least MinConns.

@jackc
Owner Author

jackc commented Aug 17, 2022

BTW, are there huge performance wins between v4 and v5 ?

@cristaloleg There may be minor changes one way or another but performance should be similar in most cases.

@jackc
Owner Author

jackc commented Aug 17, 2022

Can there be an optional method that blocks until the createIdleResources finishes? There could be a generic method Ready(context.Context) that blocks until a channel internally is closed or the context is cancelled.

@jameshartig We could, but this can be done easily in application code with a function that calls Acquire n times and doesn't Release them until it has acquired min conns.
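
A minimal sketch of that approach (warmUpPool is a hypothetical helper, not a pgx API; assumes context and github.com/jackc/pgx/v5/pgxpool imports):

// warmUpPool blocks until min connections have been established,
// returning the first connect error, if any.
func warmUpPool(ctx context.Context, pool *pgxpool.Pool, min int) error {
	conns := make([]*pgxpool.Conn, 0, min)
	defer func() {
		// Release everything only after all connections exist.
		for _, c := range conns {
			c.Release()
		}
	}()
	for i := 0; i < min; i++ {
		c, err := pool.Acquire(ctx)
		if err != nil {
			return err
		}
		conns = append(conns, c)
	}
	return nil
}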

@jameshartig
Contributor

@jameshartig We could, but this can be done easily in application code with a function that calls Acquire n times and doesn't Release them until it has acquired min conns.

If I'm understanding the code correctly, that would result in more than MinConns being created, since createIdleResources creates MinConns asynchronously, and while that's happening Acquire could also create connections if it's called before createIdleResources runs or before those connections are ready.

@jackc
Owner Author

jackc commented Aug 17, 2022

If I'm understanding the code correctly, that would result in more than MinConns being created, since createIdleResources creates MinConns asynchronously, and while that's happening Acquire could also create connections if it's called before createIdleResources runs or before those connections are ready.

It might, I'd have to look at the code to be sure... but I don't see how that would really matter. When it is done the actual number of connections will be >= minconns and <= maxconns.

If you really wanted to be sure you didn't create extra connections you could loop over a Stat() call until the right number of conns was created. I initially suggested the Acquire() style so you would get an error if a connect failed.
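
A sketch of the Stat() variant, inside a function that takes a ctx and returns an error (the polling interval is arbitrary; assumes the time package):

// Wait until the pool reports at least MinConns established connections.
for pool.Stat().TotalConns() < pool.Config().MinConns {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(100 * time.Millisecond):
	}
}
return nil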

@stephensli

stephensli commented Aug 17, 2022

Hi @jackc,

Regarding the new type system, what's the best implementation flow for handling cases like the following: failed to encode args[0]: unable to encode []string{"3ffffd64-3521-4aac-8555-5a816c43467e"} into binary format for _uuid (OID 2951): cannot find encode plan, where the written value is a string slice targeting a UUID column? I'm already using the github.com/vgarvardt/pgx-google-uuid/v5 registration.

Example query would be:

select id from table where id = any($1)

where id is a uuid.


It also appears that scanning a JSON value that is empty results in a scan error.

Example

select data->'property' from table;
if err := rows.Scan(&item.Property); err != nil {
    return nil, err
}

error: can't scan into dest[0]: unexpected end of JSON input. Previously this would result in the default value. Is this change expected?


This seems to extend to many places that used to handle NULL values but no longer do; another example:

row := db.QueryRow(ctx, "select deleted_at from table where id = $1", id)
var timeStamp time.Time
err := row.Scan(&timeStamp)

error: can't scan into dest[0]: cannot scan NULL into *time.Time

@jackc
Owner Author

jackc commented Aug 20, 2022

@stephensli

Regarding []string that holds UUIDs: github.com/vgarvardt/pgx-google-uuid/v5 does not know how to encode a string. This is normally okay because pgx sends strings directly to PostgreSQL in the text format. However, pgx's array support works by using the type to encode each element. Since it doesn't know how to encode a string, it fails.

There are several possible solutions:

  • Use a []uuid.UUID instead (see the sketch after this list).
  • Have PostgreSQL do the conversion. e.g. select id from table where id = any(($1::text[])::uuid[])
  • github.com/vgarvardt/pgx-google-uuid/v5 could add support for encoding string (or you could wrap it and add that support).
  • Perhaps pgx could special-case []string like it does string and send it as a text-formatted array. However, I think there are unsolvable edge cases. e.g. []string{"abc", "def"} is encoded as {abc,def} for a PostgreSQL text[] but as ["abc", "def"] if encoded into a PostgreSQL json.
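
A sketch of the first option, assuming github.com/google/uuid with the pgx-google-uuid adapter registered and an open conn:

// The registered uuid codec knows how to encode each element,
// so a []uuid.UUID encodes cleanly into uuid[].
ids := []uuid.UUID{uuid.MustParse("3ffffd64-3521-4aac-8555-5a816c43467e")}
rows, err := conn.Query(ctx, "select id from table where id = any($1)", ids)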

This seems to extend to many places that used to handle NULL values but no longer do; another example:

row := db.QueryRow(ctx, "select deleted_at from table where id = $1", id)
var timeStamp time.Time
err := row.Scan(&timeStamp)

error: can't scan into dest[0]: cannot scan NULL into *time.Time

That is the same behavior as v4 -- and it is correct behavior. There is a difference between SQL NULL and the zero value.

I suspect the JSON example is the same -- trying to scan a SQL NULL into a type that cannot handle it.
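
For nullable columns the usual approach is to scan into a pointer, e.g. (a sketch mirroring the earlier example):

row := db.QueryRow(ctx, "select deleted_at from table where id = $1", id)
var timeStamp *time.Time
err := row.Scan(&timeStamp)
// timeStamp is nil when deleted_at is NULL, otherwise it points to the value.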

@stephensli

stephensli commented Aug 20, 2022

@jackc Thank you for getting back to me. I did have a quick look at wrapping github.com/vgarvardt/pgx-google-uuid/v5 to add that support, but I was short on time while verifying a v5 upgrade and was struggling with the new type system.

Additionally, regarding the second point, I can confirm this did work in v4. We did try to upgrade to v4, but the new memory pinning caused a large amount of memory allocation; in a production environment, not rotating the connections caused memory crashes.

If this is not meant to work and should have errored in v4, the error was not bubbling up.

@jackc
Owner Author

jackc commented Aug 20, 2022

@stephensli

Additionally, regarding the second point, I can confirm this did work in v4.

That's not what my test shows:

package main

import (
	"context"
	"log"
	"os"
	"time"

	"github.com/jackc/pgx/v4"
)

func main() {
	ctx := context.Background()

	conn, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL"))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	var timeStamp time.Time
	err = conn.QueryRow(ctx, "select null::timestamptz").Scan(&timeStamp)
	if err != nil {
		log.Fatal(err)
	}

	log.Println(timeStamp)
}
jack@glados ~/dev/pgx_issues/v4-pgx ±master⚡ » go run main.go
2022/08/20 12:27:59 can't scan into dest[0]: cannot assign NULL to *time.Time
exit status 1

@stephensli

stephensli commented Aug 20, 2022

@jackc I have some repeatable examples. I should have been clearer, as I was focusing on JSON values, sorry. This might now be by design, but it requires a lot of annoying tweaks like coalesce((data->'stub')::bool, false) to handle these edge cases, something not easily possible with time.

JSON Parsing

The following example works in v3 and v4 but no longer works in v5. It shows how JSON values returned as NULL no longer fall back to the zero value; instead they fail to parse with the following error: can't scan into dest[X]: unexpected end of JSON input

type Example struct {
	id        int
	stub      bool
	timestamp time.Time
}

func main() {
	var example Example

	query := "select id, data->'stub', data->'SourceTimestamp' from example where id=$1"
	rows := conn.QueryRow(ctx, query, 1)

	scanErr := rows.Scan(&example.id, &example.stub, &example.timestamp)
	must(scanErr)
}

Schema

create table example
(
   data jsonb,
   id   serial
       constraint example_pk
           primary key
);

create unique index example_id_uindex
   on example (id);

@jackc
Owner Author

jackc commented Aug 20, 2022

@stephensli Thank you. That example was very helpful. I was able to step through the code in v4 and v5 and find the difference.

In v4, when a SQL NULL::json is assigned to x, it is assigned with json.Unmarshal([]byte("null"), x). However, to my great surprise, json.Unmarshal() has the following documented behavior:

The JSON null value unmarshals into an interface, map, pointer, or slice by setting that Go value to nil. Because null is often used in JSON to mean “not present,” unmarshaling a JSON null into any other Go type has no effect on the value and produces no error.

This leads to this behavior:

b := true
err := json.Unmarshal([]byte("null"), &b)
fmt.Println(b, err) // => true <nil>

I would have expected an error or for b to be set to the zero value.

Frankly, this strikes me as wrong behavior, especially in the context of scanning rows. It means that if you were reading and processing row by row into a single record, a value from one row could leak into the next. e.g. in your structure, if row 1 had a stub value but row 2 did not, row 2 would see row 1's value. It also means there is different behavior scanning a PostgreSQL NULL::json or NULL::jsonb than any other kind of NULL.

So I'm inclined to say that the v5 behavior is more correct (though it should have a better error message). On the other hand, it would be changing long-standing behavior that kind of follows the json.Unmarshal() precedent (though with surprising and possibly unintended implications).

Not really sure...

@buni

buni commented Aug 22, 2022

Hi @jackc,

I ran into an issue while upgrading to the latest v5 beta. I have a struct that looks something like this:

type RPS struct {
	ID       string          `db:"id"`
	Name     string          `db:"name"`
	Timer    time.Duration   `db:"timer"`
	MinEntry decimal.Decimal `db:"min_entry"` // this uses the shopspring decimal lib
}

Since upgrading to v5 I get this error on Insert: failed to encode args[3]: unable to encode \"99.9\" into binary format for numeric (OID 1700): cannot find encode plan

@stephensli

@buni looks like you are missing the adapter: https://github.com/jackc/pgx-shopspring-decimal. You can set it up with an AfterConnect function.

import (
	"context"

	shop "github.com/jackc/pgx-shopspring-decimal"
	"github.com/jackc/pgx/v5"
)

func AfterConnectHook(_ context.Context, conn *pgx.Conn) error {
	shop.Register(conn.TypeMap())
	return nil
}
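
For pgxpool users, a sketch of wiring the hook into the pool configuration (assumes context, os, and github.com/jackc/pgx/v5/pgxpool imports):

config, err := pgxpool.ParseConfig(os.Getenv("DATABASE_URL"))
if err != nil {
	return err
}
// Run the type registration on every new connection the pool opens.
config.AfterConnect = AfterConnectHook
pool, err := pgxpool.NewWithConfig(context.Background(), config)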

@jackc
Owner Author

jackc commented Aug 23, 2022

@stephensli and @buni As it happens, there was a common fix that solved both encoding []string into uuid[] and encoding decimal.Decimal when using database/sql compatibility (i.e. github.com/jackc/pgx-shopspring-decimal is not in use). The solution was to fall back to the text format if encoding in the binary format failed.

@stephensli Regarding the json / NULL issue: after considering it more over the last couple of days, I am convinced that the new v5 behavior is correct and the v4 behavior is a bug. I have fixed the error message in v5 to make it clear that the problem is scanning SQL NULL into a value that doesn't support it.

@jackc
Owner Author

jackc commented Aug 23, 2022

Just tagged v5.0.0-beta.3.

Here are the changes since v5.0.0-beta.2

  • Fixed a bug where the column names (FieldDescription.Name) of a result set would be corrupted if read after a large amount of row data (Make FieldDescription.Name a string to avoid corruption #1281). Resolving this involved introducing a new type pgconn.FieldDescription that has a string Name field and using that in pgx and pgconn instead of the pgproto3.FieldDescription which has a []byte Name. This should not have any impact on most users.
  • Merged SNI support for SSL connections from v4.
  • Fixed a bug where non-string values that can be converted to a string (e.g. database/sql/driver.Valuer or elements of a slice) could not be used as a query argument when the preferred format for the PostgreSQL type was binary. It now converts the value to a string and uses the PostgreSQL text format for the argument.
  • Improved error message when failing to scan a NULL::json.
  • Standardized casing for NULL in error messages.

@stephensli

Thank you for the update Jack!

jackc changed the title from "pgx v5 beta 2 available for testing" to "pgx v5 beta 3 available for testing" Aug 23, 2022
@pashagolub
Contributor

Hello @jackc,

Thanks for your work. I'm trying to update pgxmock to support pgx/v5. The issue is that the pgconn.CommandTag declaration has a private field:

// CommandTag is the status text returned by PostgreSQL for a query.
type CommandTag struct {
	s string
}

Thus I cannot initialize the result inside my library to mock it (nor can I create a result expectation to compare against later), e.g.

func NewResult(op string, rowsAffected int64) pgconn.CommandTag {
	return pgconn.CommandTag{fmt.Sprint(op, rowsAffected)}
}

I would suggest either declaring CommandTag as

// CommandTag is the status text returned by PostgreSQL for a query.
type CommandTag struct {
	S string
}

or providing a public constructor (the existing one is private):

func (pgConn *PgConn) MakeCommandTag(buf []byte) CommandTag {
	return CommandTag{s: string(buf)}
}

Kind regards

jackc added a commit that referenced this issue Aug 24, 2022: "Useful for mocking and testing."
@jackc
Owner Author

jackc commented Aug 24, 2022

@pashagolub Fair enough. I added the function pgconn.NewCommandTag().
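
A quick sketch of how a mock can use it:

// NewCommandTag builds the tag from its status string; RowsAffected
// parses the trailing number.
tag := pgconn.NewCommandTag("UPDATE 3")
fmt.Println(tag.String())       // UPDATE 3
fmt.Println(tag.RowsAffected()) // 3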

@jackc
Owner Author

jackc commented Sep 3, 2022

Just tagged v5.0.0-beta.4.

Here are the changes since v5.0.0-beta.3:

  • Added NewCommandTag() to allow for mocking
  • Fix JSON scan not completely overwriting destination
  • Simplified implementation of the connection pool continuing the establishment of a connection in the background when an Acquire is canceled. These background connection attempts are immediately canceled when the pool is closed instead of blocking the close call.

Overall, this is a very small change since beta 3. However, any change to tricky concurrent code like the connection pool is worth thorough testing.

Assuming no further issues arise this will be the final beta release.

jackc changed the title from "pgx v5 beta 3 available for testing" to "pgx v5 beta 4 available for testing" Sep 3, 2022
@pashagolub
Contributor

pashagolub commented Sep 5, 2022

Not sure if this is a bug or a feature.

Consider this:

type S struct {
	F1 int
	F2 string
}

type SuperS struct {
	S
	F3 bool
}

Now, if I try to

rows, _ := pge.ConfigDb.Query(ctx, "select 1, 'foo', true")
dest, err := pgx.CollectRows(rows, pgx.RowToStructByPos[SuperS])

I got the error: got 3 values, but dst struct has only 2 fields

Should pgx unwrap embedded structs?

UPD:
This solves the error but I'm not sure if this is clear enough for a user:

rows, _ := pge.ConfigDb.Query(ctx, "select ROW(1, 'foo'), true")
dest, err := pgx.CollectRows(rows, pgx.RowToStructByPos[SuperS])

UPD2:
On the other hand, a pgx.RowToStructByName that reads property names and/or db tags might help, e.g.

type S struct {
	f1 int `db:"int_field"`
	f2 string
}

UPD3:
Capitalized field names

@jackc
Owner Author

jackc commented Sep 5, 2022

@pashagolub

As you found, by using ROW() pgx can read PostgreSQL composite values into structs. I think this is the correct approach. Or you could use a custom RowTo function.

I don't think it is possible to do what you originally suggested because somehow we would need to know whether to scan the first field into S as a whole or to scan it into S.f1.

Also, I'm assuming the private fields are just a mistake in the example code -- obviously they would need to be public for any mapping based on reflection to work at all.

@jameshartig
Contributor

I don't think it is possible to do what you originally suggested because somehow we would need to know whether to scan the first field into S as a whole or to scan it into S.f1.

Most other libraries (mongo, json, etc.) handle this by scanning into f1 because S is embedded. If it were S S (as in a field named S of type S) then you'd scan into S.

@pashagolub
Contributor

pashagolub commented Sep 6, 2022

Also, I'm assuming the private fields are just a mistake in the example code -- obviously they would need to be public for any mapping based on reflection to work at all.

Yes, my bad. Of course they should be public. I was shrinking real-life code down to a minimal example. :) Thanks for noticing.

@jackc
Owner Author

jackc commented Sep 6, 2022

Most other libraries (mongo, json, etc.) handle this by scanning into f1 because S is embedded. If it were S S (as in a field named S of type S) then you'd scan into S.

Oh, that's interesting. Hadn't thought about that distinction.

Surprised to find out that is what the json library does too. More magical than I would have expected.

Just added support for this in ee2622a. I didn't really want to make a change this close to the final release, but since changing it later would be a breaking change, it seemed worthwhile doing now.

jackc changed the title from "pgx v5 beta 4 available for testing" to "pgx v5 beta 5 available for testing" Sep 6, 2022
@jackc
Owner Author

jackc commented Sep 6, 2022

Just tagged v5.0.0-beta.5.

The only change is pgx.RowToStructByPos now scans into embedded struct fields like the json package does.
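
With that change, the earlier example works without ROW() (a sketch reusing the structs from above; assumes an open conn):

type S struct {
	F1 int
	F2 string
}

type SuperS struct {
	S  // embedded: its fields are flattened, as encoding/json does
	F3 bool
}

rows, _ := conn.Query(ctx, "select 1, 'foo', true")
dest, err := pgx.CollectRows(rows, pgx.RowToStructByPos[SuperS])
// dest[0] == SuperS{S: S{F1: 1, F2: "foo"}, F3: true}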

@pashagolub
Contributor

hey @jackc,

thanks a lot for your fix.

I want to propose a RowToStructByName implementation. I'm not sure if you're interested in including it at this stage, but if you are, I can create a proper PR against the latest beta.

Here is the implementation followed by tests with full coverage:

// RowToStructByName returns a T scanned from row. T must be a struct. T must have the same number of named public fields as
// the row has fields. The row and T fields will be matched by name.
func RowToStructByName[T any](row pgx.CollectableRow) (T, error) {
	var value T
	err := row.Scan(&namedStructRowScanner{ptrToStruct: &value})
	return value, err
}

// RowToAddrOfStructByName returns the address of a T scanned from row. T must be a struct. T must have the same number of
// named public fields as the row has fields. The row and T fields will be matched by name.
func RowToAddrOfStructByName[T any](row pgx.CollectableRow) (*T, error) {
	var value T
	err := row.Scan(&namedStructRowScanner{ptrToStruct: &value})
	return &value, err
}

type namedStructRowScanner struct {
	ptrToStruct any
}

func (rs *namedStructRowScanner) ScanRow(rows pgx.Rows) error {
	dst := rs.ptrToStruct
	dstValue := reflect.ValueOf(dst)
	if dstValue.Kind() != reflect.Ptr {
		return fmt.Errorf("dst not a pointer")
	}

	dstElemValue := dstValue.Elem()
	scanTargets, err := rs.appendScanTargets(dstElemValue, nil, rows.FieldDescriptions())

	if err != nil {
		return err
	}

	for i, t := range scanTargets {
		if t == nil {
			return fmt.Errorf("struct doesn't have corresponding row field %s", rows.FieldDescriptions()[i].Name)
		}
	}

	return rows.Scan(scanTargets...)
}

const structTagKey = "db"

func fieldPosByName(fldDescs []pgconn.FieldDescription, field string) (i int) {
	i = -1
	for i, desc := range fldDescs {
		if strings.EqualFold(desc.Name, field) {
			return i
		}
	}
	return
}

func (rs *namedStructRowScanner) appendScanTargets(dstElemValue reflect.Value, scanTargets []any, fldDescs []pgconn.FieldDescription) ([]any, error) {
	var err error
	dstElemType := dstElemValue.Type()

	if scanTargets == nil {
		scanTargets = make([]any, len(fldDescs))
	}

	for i := 0; i < dstElemType.NumField(); i++ {
		sf := dstElemType.Field(i)
		if sf.PkgPath != "" && !sf.Anonymous {
			// Field is unexported, skip it.
			continue
		}
	// Handle anonymous struct embedding, but do not try to handle embedded pointers.
		if sf.Anonymous && sf.Type.Kind() == reflect.Struct {
			scanTargets, err = rs.appendScanTargets(dstElemValue.Field(i), scanTargets, fldDescs)
			if err != nil {
				return nil, err
			}
		} else {
			dbTag, dbTagPresent := sf.Tag.Lookup(structTagKey)
			if dbTagPresent {
				dbTag = strings.Split(dbTag, ",")[0]
			}
			if dbTag == "-" {
				// Field is ignored, skip it.
				continue
			}
			colName := dbTag
			if !dbTagPresent {
				colName = sf.Name
			}
			fpos := fieldPosByName(fldDescs, colName)
			if fpos == -1 || fpos >= len(scanTargets) {
				return nil, fmt.Errorf("cannot find field %s in returned row", colName)
			}
			scanTargets[fpos] = dstElemValue.Field(i).Addr().Interface()
		}
	}

	return scanTargets, err
}

Tests:

package row_test

import (
	"context"
	"testing"

	"github.com/jackc/pgx/v5"
	"github.com/stretchr/testify/assert"
)

func TestRowToStructByNameEmbeddedStruct(t *testing.T) {
	type Name struct {
		Last  string `db:"last_name"`
		First string `db:"first_name"`
	}

	type person struct {
		Ignore    bool `db:"-"`
		unexported bool
		Name
		Age int32
	}

	ConfigDb := <init the database connection>
	ctx := context.Background()

	rows, _ := ConfigDb.Query(ctx, `select 'John' as first_name, 'Smith' as last_name, n as age from generate_series(0, 9) n`)
	slice, err := pgx.CollectRows(rows, pgx.RowToStructByName[person])
	assert.NoError(t, err)

	assert.Len(t, slice, 10)
	for i := range slice {
		assert.Equal(t, "Smith", slice[i].Name.Last)
		assert.Equal(t, "John", slice[i].Name.First)
		assert.EqualValues(t, i, slice[i].Age)
	}

	// check missing fields in a returned row
	rows, _ = ConfigDb.Query(ctx, `select 'Smith' as last_name, n as age from generate_series(0, 9) n`)
	_, err = pgx.CollectRows(rows, pgx.RowToStructByName[person])
	assert.ErrorContains(t, err, "cannot find field first_name in returned row")

	// check missing field in a destination struct
	rows, _ = ConfigDb.Query(ctx, `select 'John' as first_name, 'Smith' as last_name, n as age, null as ignore from generate_series(0, 9) n`)
	_, err = pgx.CollectRows(rows, pgx.RowToAddrOfStructByName[person])
	assert.ErrorContains(t, err, "struct doesn't have corresponding row field ignore")
}

@jackc
Owner Author

jackc commented Sep 7, 2022

@pashagolub Yes, I'm interested in that feature. But I don't want to add any new features before the release of v5.0.0. Probably allow for a few weeks after release to fix any bugs that slipped through, too. So a PR is fine but I probably won't get to look at it for a month or so or until v5.1.0.

@jackc
Owner Author

jackc commented Sep 17, 2022

v5.0.0 has been released!

The only changes since v5.0.0-beta.5 are:

  • Fix atomic alignment on 32-bit platforms
  • Fix long-standing bug in pgconn.Timeout(). It no longer considers context.Canceled as a timeout error.

jackc closed this as completed Sep 17, 2022
jackc unpinned this issue Sep 17, 2022