Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle copy from stdin query #64

Open
hectorj opened this issue Oct 8, 2023 · 8 comments
Open

handle copy from stdin query #64

hectorj opened this issue Oct 8, 2023 · 8 comments

Comments

@hectorj
Copy link

hectorj commented Oct 8, 2023

Hi, and first of all: thank you for your work! 馃檪

I am trying to use pgx's CopyFrom method on a server created with psql-wire.

Reading the spec here https://www.postgresql.org/docs/15/protocol-flow.html#PROTOCOL-COPY, the server is supposed to send a CopyInResponse message before receiving the data via multiple CopyData messages, ending with a CopyDone message.

I don't see a way to handle that currently inside a ParseFn, but maybe I missed something?

The query reaches my parser, and the wire.PreparedStatementFn is called just after, but then we can't properly process the whole operation.

Calling writer.Complete("CopyInResponse") makes the pgx client return without error, but with 0 rows inserted (and we never see the data to be copied server-side).

I'm not sure how this should be handled.

Maybe a different handler? That would make psql-wire responsible for parsing and recognizing a copy from query, I'm not sure that's great.

Or change the wire.DataWriter to be able to send the CopyInResponse and read the next messages? That may be better.

This may be a "won't fix", but I wanted to at least have a discussion about it.

Even if this hurdle falls, I know pgx forces the binary format on copy from so I won't be out of the woods yet 馃槃

@jeroenrinzema
Copy link
Owner

馃憢 Hi @hectorj, thanks for opening an issue!

I indeed have not yet included support for the COPY protocol flow inside psql-wire. It is a feature I am interested in implementing. I would like to try to keep the server implementation as simple as possible, with the possibility of defining more complex flows if needed. It appears that the COPY protocol flow is not backwards compatible and would require its own handler methods. The COPY handler could accept a reader and writer interface, allowing the server to read or write rows from/to the client.

Support for text/binary values has been implemented inside the encoder. It is possible to easily define the encoding type while reading incoming messages.

It would be great if you could help me set up a small example project using the COPY protocol. I would like to inspect the packages being sent back and forth between the client and server using the psql-proxy CLI.

@hectorj
Copy link
Author

hectorj commented Oct 9, 2023

I tried setting up an example here: https://github.com/hectorj/psql-proxy-copy-example

But I keep hitting this error message when connecting to the proxy: ERROR unexpected error while reading a typed client message err="message size 1396920893, bigger than maximum allowed message size 16777216"

I don't have this issue with psql-wire or when connecting directly to postgres, so I'm not sure what I missed.

@jeroenrinzema
Copy link
Owner

jeroenrinzema commented Oct 9, 2023

Ah, this might be a bug. This error is often thrown when an unexpected message is read from the client. I will take a look at this later today.

The psql proxy is unaware of the message flows. It might be that untyped messages are send during the COPY protocol flow.

@hectorj
Copy link
Author

hectorj commented Oct 14, 2023

I think it was due to SASL authentication.

Setting POSTGRES_HOST_AUTH_METHOD: md5 in the postgres container removes the error. Not much logged though, I'll keep digging.

Edit: completely wrong

@hectorj
Copy link
Author

hectorj commented Oct 14, 2023

Edit : removed because of many false assumptions

@hectorj
Copy link
Author

hectorj commented Oct 14, 2023

OK, I just had to set the tls flag even though I was using sslmode=disable.

Now here are the full proxy logs:

2023/10/14 16:48:33 INFO proxy listening address=:2345
2023/10/14 16:48:39 INFO incoming connection, dialing PostgreSQL server!
2023/10/14 16:48:39 INFO starting sniffing the PSQL packages
2023/10/14 16:48:39 INFO --> version=196608
2023/10/14 16:48:39 INFO <<- type=R msg="\x00\x00\x00\nSCRAM-SHA-256\x00\x00"
2023/10/14 16:48:39 INFO ->> type=p msg="SCRAM-SHA-256\x00\x00\x00\x00 n,,n=,r=CHzF7wyhaIC7QKYeU9KVkVi5"
2023/10/14 16:48:39 INFO <<- type=R msg="\x00\x00\x00\vr=CHzF7wyhaIC7QKYeU9KVkVi5wMPqcMqo0zKQD4t8h3ENdh0n,s=CV3OW7JkDKmPL7sfcEBvXw==,i=4096"
2023/10/14 16:48:39 INFO ->> type=p msg="c=biws,r=CHzF7wyhaIC7QKYeU9KVkVi5wMPqcMqo0zKQD4t8h3ENdh0n,p=eI+tM3zXgwD2TdcGz+a5BWsIyN9TO2yV0sRaCriuFP0="
2023/10/14 16:48:39 INFO <<- type=R msg="\x00\x00\x00\fv=yI9bfwCkkeyc+1u/PmNoMymUj9bLD5yVbgvdU11AVsY="
2023/10/14 16:48:39 INFO <<- type=R msg="\x00\x00\x00\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="application_name\x00\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="client_encoding\x00UTF8\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="DateStyle\x00ISO, MDY\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="default_transaction_read_only\x00off\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="in_hot_standby\x00off\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="integer_datetimes\x00on\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="IntervalStyle\x00postgres\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="is_superuser\x00on\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="server_encoding\x00UTF8\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="server_version\x0015.2 (Debian 15.2-1.pgdg110+1)\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="session_authorization\x00user\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="standard_conforming_strings\x00on\x00"
2023/10/14 16:48:39 INFO <<- type=S msg="TimeZone\x00Etc/UTC\x00"
2023/10/14 16:48:39 INFO <<- type=K msg="\x00\x00\x00K/\xb5J1"
2023/10/14 16:48:39 INFO <<- type=Z msg=I
2023/10/14 16:48:39 INFO ->> type=Q msg="DROP TABLE IF EXISTS t CASCADE;\x00"
2023/10/14 16:48:39 INFO <<- type=C msg="DROP TABLE\x00"
2023/10/14 16:48:39 INFO <<- type=Z msg=I
2023/10/14 16:48:39 INFO ->> type=Q msg="CREATE TABLE public.t (id serial primary key, field1 text null, field2 int not null, field3 jsonb not null)\x00"
2023/10/14 16:48:39 INFO <<- type=C msg="CREATE TABLE\x00"
2023/10/14 16:48:39 INFO <<- type=Z msg=I
2023/10/14 16:48:39 INFO ->> type=P msg="stmtcache_1\x00select \"field1\", \"field2\", \"field3\" from \"public\".\"t\"\x00\x00\x00"
2023/10/14 16:48:39 INFO ->> type=D msg="Sstmtcache_1\x00"
2023/10/14 16:48:39 INFO ->> type=S msg=""
2023/10/14 16:48:39 INFO <<- type=1 msg=""
2023/10/14 16:48:39 INFO <<- type=t msg="\x00\x00"
2023/10/14 16:48:39 INFO <<- type=T msg="\x00\x03field1\x00\x00\x00@\x18\x00\x02\x00\x00\x00\x19\xff\xff\xff\xff\xff\xff\x00\x00field2\x00\x00\x00@\x18\x00\x03\x00\x00\x00\x17\x00\x04\xff\xff\xff\xff\x00\x00field3\x00\x00\x00@\x18\x00\x04\x00\x00\x0e\xda\xff\xff\xff\xff\xff\xff\x00\x00"
2023/10/14 16:48:39 INFO <<- type=Z msg=I
2023/10/14 16:48:39 INFO ->> type=Q msg="copy \"public\".\"t\" ( \"field1\", \"field2\", \"field3\" ) from stdin binary;\x00"
2023/10/14 16:48:39 INFO <<- type=G msg="\x01\x00\x03\x00\x01\x00\x01\x00\x01"
2023/10/14 16:48:39 INFO ->> type=d msg="PGCOPY\n\xff\r\n\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x03\x00\x00\x00\ttest text\x00\x00\x00\x04\x00\x00\x00*\x00\x00\x00\x11\x01{\"some\": \"json\"}\x00\x03\xff\xff\xff\xff\x00\x00\x00\x04\x00\x00\x00\a\x00\x00\x00\x05\x01null"
2023/10/14 16:48:39 INFO ->> type=c msg=""
2023/10/14 16:48:39 INFO <<- type=C msg="COPY 2\x00"
2023/10/14 16:48:39 INFO <<- type=Z msg=I

My example repo is now functional if you want to try it.

@jeroenrinzema
Copy link
Owner

Hi @hectorj thanks for the update! I apologize, but I'm currently dealing with some personal matters this week that require my attention. I will take a look at this issue as soon as possible. It seems the COPY flow simply has to be implemented for the messages to be handled correctly.

@hectorj
Copy link
Author

hectorj commented Oct 17, 2023

Hey, no worry: you don't have to apologize for not putting benevolent work first ^^

I'm actually testing another lower-level approach (closer to what psql-proxy does) for my project, so there is really no hurry at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants