Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SELECT: literal variable values are coerced into strings #194

Closed
phoe opened this issue Mar 30, 2019 · 24 comments
Closed

SELECT: literal variable values are coerced into strings #194

phoe opened this issue Mar 30, 2019 · 24 comments

Comments

@phoe
Copy link
Contributor

phoe commented Mar 30, 2019

SQL/TEST> (pomo:query "SELECT $1;" 1)
(("1"))

I put in an integer and received a string.
Is this expected? Are types not preserved across SELECT queries?

Related to ruricolist/cl-yesql#22

@phoe
Copy link
Contributor Author

phoe commented Mar 30, 2019

I thought that maybe my SQL readtable got somewhat wonky, but I reproduced it on a clean Lisp image:

CL-USER> (ql:quickload :postmodern)
To load "postmodern":
  Load 1 ASDF system:
    postmodern
; Loading "postmodern"
.
(:POSTMODERN)
CL-USER> (pomo:connect-toplevel "database" "user" "password" "localhost")
; No value
CL-USER> (pomo:query "SELECT $1;" 1)
(("1"))
1

@sabracrolleton
Copy link
Collaborator

What happens if you actually select an integer from a table?

@phoe
Copy link
Contributor Author

phoe commented Mar 30, 2019

This one works fine.

SQL/TEST> (pomo:query "SELECT id FROM persona;" :single)
2
1

@phoe phoe changed the title SELECT values are coerced into strings SELECT: literal values are coerced into strings Mar 30, 2019
@phoe phoe changed the title SELECT: literal values are coerced into strings SELECT: literal variable values are coerced into strings Mar 30, 2019
@sabracrolleton
Copy link
Collaborator

Looking into it. By the way, can you check whether the one line commit I just made fixes your dropped polish character problem?

@phoe
Copy link
Contributor Author

phoe commented Mar 30, 2019

Yes, #191 is fixed now. Thank you!

@sabracrolleton
Copy link
Collaborator

While I chase this, does the following work for you?

(pomo:query "select $1::integer" 1)

@phoe
Copy link
Contributor Author

phoe commented Mar 30, 2019

Yes, explicit casting works.

SQL/TEST> (pomo:query "select $1::integer" 1)
((1))

This would imply that string values are somehow already passed to Postgres as placeholder values. (Keep in mind I'm no specialist in the matter though.)

@phoe
Copy link
Contributor Author

phoe commented Mar 30, 2019

Take a look at this.

I make a (break) right before https://github.com/marijnh/Postmodern/blob/master/cl-postgres/messages.lisp#L173

SQL/TEST> (CL-POSTGRES:EXEC-PREPARED POSTMODERN:*DATABASE* "" (LIST "true")
                                     'CL-POSTGRES:LIST-ROW-READER))

This is the stack frame:

  0: (CL-POSTGRES::BIND-MESSAGE #<CL+SSL::SSL-STREAM for #<FD-STREAM for "socket 127.0.0.1:47770, peer: 127.0.0.1:5432" {10069F1873}>> "" #(T) #<unavailable argument>)
      Locals:
        N-PARAMS = 1
        NAME = ""
        PARAM-FORMATS = #(0)
        PARAM-SIZES = #(4)
        PARAM-VALUES = #("true")
        RESULT-FORMATS = #(T)
        SOCKET = #<CL+SSL::SSL-STREAM for #<FD-STREAM for "socket 127.0.0.1:47770, peer: 127.0.0.1:5432" {10069F1873}>>
        SB-C::X#2 = 1

However, if I:

SQL/TEST> (CL-POSTGRES:EXEC-PREPARED POSTMODERN:*DATABASE* "" (LIST T)
                                     'CL-POSTGRES:LIST-ROW-READER))

This is the stack frame:

  0: (CL-POSTGRES::BIND-MESSAGE #<CL+SSL::SSL-STREAM for #<FD-STREAM for "socket 127.0.0.1:47770, peer: 127.0.0.1:5432" {10069F1873}>> "" #(T) #<unavailable argument>)
      Locals:
        N-PARAMS = 1
        NAME = ""
        PARAM-FORMATS = #(0)
        PARAM-SIZES = #(4)
        PARAM-VALUES = #("true")
        RESULT-FORMATS = #(T)
        SOCKET = #<CL+SSL::SSL-STREAM for #<FD-STREAM for "socket 127.0.0.1:47770, peer: 127.0.0.1:5432" {10069F1873}>>
        SB-C::X#2 = 1

The variables in these frames are identical. It seems to me that the symbol T and the string "true", during the message-binding step, are being transformed into the same string value.

And if Postgres receives a string value, then it would explain why Postgres returns a string value.

Edit: The #<unavailable argument> in each case contains the parameters list, so either (T) or ("true").

@phoe
Copy link
Contributor Author

phoe commented Mar 30, 2019

And this, in turn, would mean that the culprit is at https://github.com/marijnh/Postmodern/blob/master/cl-postgres/sql-string.lisp#L177 which serializes T into "true", because https://github.com/marijnh/Postmodern/blob/master/cl-postgres/messages.lisp#L167 orders it to.

After https://github.com/marijnh/Postmodern/blob/master/cl-postgres/messages.lisp#L167 there is no longer any difference between the symbol T and the string "true".

@phoe
Copy link
Contributor Author

phoe commented Mar 30, 2019

When sending the parse message, Postmodern omits all parameter types.

https://github.com/marijnh/Postmodern/blob/master/cl-postgres/messages.lisp#L100 sends two null bytes for what seems to be the last entry at https://github.com/crate/crate/blob/master/sql/src/main/java/io/crate/protocols/postgres/PostgresWireProtocol.java#L464

This most likely causes Postgres to interpret them however it wants - in this case, as strings.

@phoe
Copy link
Contributor Author

phoe commented Mar 30, 2019

When executing (CL-POSTGRES:PREPARE-QUERY POSTMODERN:*DATABASE* "" "SELECT $1"), the message sent to Postgres is #(80 0 0 0 17 0 83 69 76 69 67 84 32 36 49 0 0 0) which is:

80 ;; #\P
00 00 00 17 ;; length
00 ;; null statement name
83 69 76 69 67 84 32 36 49 00 ;; query, (flex:octets-to-string #(83 69 76 69 67 84 32 36 49)) ;=> "SELECT $1"
00 00 ;; no parameter types

@phoe
Copy link
Contributor Author

phoe commented Mar 31, 2019

By means of hacking the cl-postgres::parse-message function, I have instead sent the following message, #(80 0 0 0 17 0 83 69 76 69 67 84 32 36 49 0 0 1 0 0 0 16).

80 ;; #\P
00 00 00 21 ;; new length, 4 bytes longer due to one parameter size
00 ;; null statement name
83 69 76 69 67 84 32 36 49 00 ;; query, (flex:octets-to-string #(83 69 76 69 67 84 32 36 49)) ;=> "SELECT $1"
00 01 ;; one parameter type
00 00 00 16 ;; 16 = boolean

And voila!

SQL/TEST> (CL-POSTGRES:EXEC-PREPARED POSTMODERN:*DATABASE* "" (LIST T)
                                     'CL-POSTGRES:LIST-ROW-READER)
((T))

Conclusion: we prepare the queries wrong by completely skipping parameter types. The other Postgres wire clients I found, e.g. written in Lua and Java, do not skip that information.

@phoe
Copy link
Contributor Author

phoe commented Mar 31, 2019

The question is: why does BIND-MESSAGE, which seemingly processes a list of return formats, not catch this bug? Because it seems that the variable PARAM-FORMATS used inside its inner loop is bound an array of zeroes, too, which means "unspecified format".

@phoe
Copy link
Contributor Author

phoe commented Mar 31, 2019

Or, rather:

                        (etypecase param
                         (string
                          (set-param 0 (enc-byte-length param) param))
                         ((vector (unsigned-byte 8))
                          (set-param 1 (length param) param)))

And set-param is:

          :do (flet ((set-param (format size value)
                       (setf (aref param-formats i) format
                             (aref param-sizes i) size
                             (aref param-values i) value)))

In other words, PARAM-FORMATS is an array that can only contain zeroes or ones. Therefore, BIND-MESSAGE can only request "whatever"s or octet-vectors.

@phoe
Copy link
Contributor Author

phoe commented Mar 31, 2019

No, I misunderstood. param-formats MUST be either 0 or 1 since it's either text or binary. See https://github.com/crate/crate/blob/master/sql/src/main/java/io/crate/protocols/postgres/PostgresWireProtocol.java#L480

Therefore, it is impossible to have generic prepared queries like SELECT $1;. And that should be documented somewhere to avoid surprises like this my one right now.

Wherever possible, PARSE-MESSAGE should declare types ahead of time. And this means that pomo:query called with parameters should grab the types of these params and stuff them into PREPARE-QUERY, before EXECUTE-QUERY is then called with the concrete values.

Also, it does sound logical that all types should be declared ahead of time - preparing a query sounds like it would involve preparing the result types, too.

Sorry for the ranting - I've been trying to document my findings here as I dig deeper and deeper.

@phoe
Copy link
Contributor Author

phoe commented Mar 31, 2019

A possible solution I see is to create a new generic function with method signatures similar to TO-SQL-STRING that maps Lisp types to Postgres OIDs. That value can then optionally be used in parse-message.

@phoe
Copy link
Contributor Author

phoe commented Mar 31, 2019

Confirmed - in the prepared statement of SELECT $1; Postgres expects a text parameter, that is then returned as-is. So, we get text in return.

db=> PREPARE foo AS SELECT $1;
PREPARE
db=> SELECT * from pg_prepared_statements;
 name |         statement         |         prepare_time          | parameter_types | from_sql 
------+---------------------------+-------------------------------+-----------------+----------
 foo  | PREPARE foo AS SELECT $1; | 2019-03-31 11:57:23.387068+02 | {text}          | t

Therefore the issue is not with how prepared statements work in Postgres. The issue is with how pomo:query has all the data to compute the argument types and pass them to Postgres during the parsing step, so it can get proper result types during the binding step - and yet it doesn't do that.

@sabracrolleton
Copy link
Collaborator

My apologies for not being responsive. It has been a bad year. Did you come up with a patch or are we still at the problem stage? Your analysis is much appreciated.

@phoe
Copy link
Contributor Author

phoe commented Dec 1, 2019

No problem, hope that things are going steady for you.

I haven't produced any patch for that yet.

@phoe
Copy link
Contributor Author

phoe commented Dec 1, 2019

I think that the proper solution would be non-trivial.

  • We have queries such as SELECT $1 where $1 can be of any Postgres type - integer, string, date, etc.. Postmodern currently does not send out information about the argument types, which means that Postgres actually always does an implicit conversion from string to the proper type when reading a Postmodern query - even in case of SELECT $1 < 4 where $1 is 2, the actual value in $1 is a string '2' that Postgres then implicitly converts back to its numeric value.
  • This means that we cannot properly prepare any single generic query that takes in any arguments and then expect a proper return type of that query, since Postmodern has no means of telling apart generic queries from non-generic ones..
  • This means that we will have to use multiple prepared statements with the same query text but different argument types - for instance, SELECT $1 where $1 is a string, then where $1 is an integer, ..., ...
  • Once that is done, we will need to do something akin to CLOS dispatch, where we find the Postgres types of the arguments provided to pomo:query, dispatch to the statement prepared with the proper types (and create one if it does not yet exist), and finally execute it.

I think that the above methodology is going to work, though it might be sub-optimal since I am not a Postgres person. Therefore I'd need someone to verify this for me before any implementation attempt is made.

@sabracrolleton
Copy link
Collaborator

Continuing in the documenting the issue.

In theory we can loop through the parameter arguments and decide which ones to send as binary, passing the arguments through the following chain and then deciding in parse-message what we can send in binary and the oid necessary to tell postgresql what is coming.

|- query (postmodern/query)
|----- prepare-query (cl-postgres/public)
|---------- send-parse (cl-postgres/protocal)
|--------------- parse-message (cl-postgres/messages)

While parse-message should be able to specify the number of parameters which Right now I am not quite in sync with postgresql in the formatting of the messages so I need to solve that before I move forward. FYI Postgresql message format details are found here:

https://www.postgresql.org/docs/current/protocol-flow.html
https://www.postgresql.org/docs/current/protocol-message-types.html
https://www.postgresql.org/docs/current/protocol-message-formats.html

Specifically parse-message should be sending a message meeting the following:

Parse (F)
Byte1('P')
Identifies the message as a Parse command.

Int32
Length of message contents in bytes, including self.

String
The name of the destination prepared statement (an empty string selects the unnamed prepared statement).

String
The query string to be parsed.

Int16
The number of parameter data types specified (can be zero). Note that this is not an indication of the number of parameters that might appear in the query string, only the number that the frontend wants to prespecify types for.

Then, for each parameter, there is the following:

Int32
Specifies the object ID of the parameter data type. Placing a zero here is equivalent to leaving the type unspecified.

BTW, the object id for a four byte integer is 23 but postgresql does not guarantee that the oid for other non-integer data types will not change. What fun.

@sabracrolleton
Copy link
Collaborator

Getting there slowly. Sidetracked slightly by a request for scram-sha-256 authentication.

@sabracrolleton
Copy link
Collaborator

For those following at home and wondering if this has been dropped, I now have code that works for boolean, short, regular and long integers, single and double floats and text in regular queries, prepared queries (with and without reconnects) and dao queries.

Dates are going to be tricky both because common lisp has one format and the simple-date and local-time libraries have their own formats and because postgresql accepts dates in a lot of different text string formats.

Does anyone have suggestions on the minimum set of types that should be handled?

@sabracrolleton
Copy link
Collaborator

Resolved in this weeks commit allowing parameters to be passed in binary format on a connection by connection basis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants