Skip to content

Implement Unknown encoding for query parameters #836

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 19, 2022

Conversation

ruslantalpa
Copy link
Contributor

@ruslantalpa ruslantalpa commented Nov 1, 2021

After looking at @NAlexPear #747 PR in relation to #746, it looks like it's changing a lot of function signatures exposed by this library (which probably is not desirable).

Here i propose another solution with the basic ideas being:
Provide a struct like this pub struct Unknown<'a>(pub &'a (dyn ToSql + Sync));
and amend the ToSql trait with encode_format which has a default implementation and only Unknown overrides it
With this additional functionality, encode_bind can selectively switch between binary and text encoding.

As an implementation detail (which can be changed), Unknown requires a method to get the string representation of the inner type so i added an additional as_string function to the ToSql trait and for most of the basic types the function was implemented in the simple_to macro using format!("{}", ..., which looks a bit like a hack but it is a starting point.

I would have liked to somehow implement it inside the Unknown without the need for as_string function, but i am not sure how to get the concrete type out of &(dyn ToSql + Sync) variable

All of this provides the following interface
client.query("select 10 + $1 + $2", &[&10, &Unknown(&"20")])
where the last $2 parameter would be text encoded and pg would infer it's type

@sfackler please let me know if you are interested in this approach and would like me to clean this up (implement as_string for everything that needs it and write up some tests.


/// A Wrapper type used for sending query parameters encoded as unknown.
#[derive(Debug)]
pub struct Unknown<'a>(pub &'a (dyn ToSql + Sync));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of using ToSql in here if the only method being used is as_string? It seems like just using &str would suffice.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name Unknown doesn't seem right to me. There is no "unknown" encoding in the Postgres protocol, just binary and text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the point of using ToSql ...

You are right, it should be just pub struct Unknown<'a>(pub &'a str); (which would also eliminate the need for as_string in the ToSql trait)
I arrived at this while thinking "i have a Vec<&'a (dyn ToSql + Sync)> and i want to map a function over it and wrap each value in Unknown so that it's encoded as string" but that is not the best way as you pointed out, that initial vector in my case is anyway conceptually a Vec<String> anyway so your observation is correct.

The name Unknown doesn't seem right to me...

The name here signifies not the way we are sending the value but the (pseudo) "type" of the value we are sending, just the way we are conceptually sending over int/text/... we are sending in this case a unknown variable type that only incidentally needs to be encoded in text mode.

@@ -769,6 +769,40 @@ pub trait ToSql: fmt::Debug {
ty: &Type,
out: &mut BytesMut,
) -> Result<IsNull, Box<dyn Error + Sync + Send>>;

/// Specify the encode format
fn encode_format(&self) -> i16 { 1 }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should use an enum rather than a raw integer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look into it and fix it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon the question (lack of knowledge), what kind of enum?
I looked over here and the function seems to expect I: IntoIterator<Item = i16>

I'll change to whatever you think it's best

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make whatever Format you'd like and then add a From impl for i16. Here's what I did over on my fork -> https://github.com/NAlexPear/rust-postgres/blob/d8d4264786204bb1ef9d54d2fa61460674917514/postgres-types/src/lib.rs#L791

@sfackler
Copy link
Owner

sfackler commented Nov 2, 2021

How useful is this really on its own without the ability to retrieve results from the database in the text format?

@ruslantalpa
Copy link
Contributor Author

How useful is this really on its own without the ability to retrieve results from the database in the text format?

I am not sure i fully understand the question. Are you referring to using the simple query protocol from pg?
In my case i construct queries that always return ::text columns while the parameters are of different types.

My motivation (and it seems @NAlexPear too) is to let the pg handle type inference.
A few reasons in no particular order:

  • In some case, one does not always work with a "known" schema so the parameter types are not trivial to determine at query creation time (think of products like Hasura) in which case it's very helpful to send over an unknown and let pg determine the type of $1 and do the conversion. This becomes really problematic when the target database uses custom types(enums) so you cant have (i think) at runtime code that handles these situation as it seems in this lib custom types need to have predefined structs for them that derive ToSql
  • Some types (mostly from extensions) do not support binary encoding but they all provide a mechanism to cast an unknown literal which would mean with this small addition one can use all the custom types directly. In some cases probably, as you mentioned somewhere, a double cast from TEXT can be used though maybe not all types support that? and anyway $1 looks better then $1::MONEY::TEXT :) )
  • Meta reason: pg protocol allows for such a use-case (sending over some params as text encoded), they probably have better reasons then mine as to why this functionality exists, so why not let the client library also support this event if we can't come up with very compelling use-cases (if it were a big refactor i would also be suspicious, and pushing back against bloat is also a valid reason, but it seems to be a minimal change with no obvious downsides and some potential upsides)

@NAlexPear do you have other reasons why having the ability to send over some params as text encoded is needed (and the current lib can not handle those usecases)?

ps: @sfackler thank you for the feedback and for considering this.

@ruslantalpa
Copy link
Contributor Author

I've refactored the Unknown struct per your suggestion so now it's more of a convenience feature (which can be removed from the lib since the user can define it himself if needed).

The core functionality/flexibility can be provided only by having the fn encode_format(&self) -> i16 { 1 } function in the trait
and the line

let (param_formats, params):(Vec<_>, Vec<_>) = params.into_iter().map(|p| (p.borrow_to_sql().encode_format(),p)).unzip();

that takes into the consideration the output of this function when encoding parameters.

@NAlexPear
Copy link
Contributor

Thanks for taking a crack at this @ruslantalpa!

do you have other reasons why having the ability to send over some params as text encoded is needed (and the current lib can not handle those usecases)?

The ability to let Postgres infer types is the big one, as you've already pointed out, as well as the ability to support parameter types that don't have easy or intuitive Rust implementations. But I don't think that was the thrust of @sfackler's question when they asked:

How useful is this really on its own without the ability to retrieve results from the database in the text format?

I think the answer here is "still useful" for the type inference case, but only "marginally useful" for the type pass-through case. The most common use for those pass-through types would be as column types, and rare is the column that's modified but never read again.

Given those cases, it would make the most sense to me to allow for an interface that handles both inputs and outputs, since the latter often (but not always) depends on the former. Whether type inference of inputs on its own is a worth feature would be up to @sfackler, but it wouldn't cover my usecases on its own.

@ruslantalpa
Copy link
Contributor Author

ruslantalpa commented Nov 3, 2021

@NAlexPear do i understand correctly that in your usecase you also need to be able to read some columns where the pg type of that column does not support binary encoding? If that is the case maybe this might be an automated way to support them:

Amend the types::Type struct with a field called supports_binary_encoding and have that default to true but when it gets to a situation where we encounter an oid that is not predefined here have the TYPEINFO_QUERY query also fetch the typreceive and typsend fields and based on them set the supports_binary_encoding field of the type.

Now, since we have the binary encoding support stored in the Type struct, it's possible now to selectively request text encoding for the columns in the encode_bind function from query.rs

With this setup it think it makes it possible for a user to define ToSql and FromSql traits for types that require sending and receiving columns encoded in text mode.

Would that cover the usecase?

@sfackler
Copy link
Owner

sfackler commented Nov 5, 2021

Checking if the type has binary conversion functions is an interesting idea! It's not perfect (e.g. if you upgrade an extension and it adds a binary format that'd break your ability to read values), but it seems like it's probably the best we can do without enormous changes to how the query API works.

I think we'd want to allow ToSql to pick if it's using text or binary format and FromSql would be forced to use binary unless the type only supports text mode.

@ruslantalpa
Copy link
Contributor Author

Actually i think the core of the idea is not necessarily the "auto" part but it's that the Type has info about the the encoding it wants.

For now, the type detection first considers the ones implemented in the lib and after the ones attached to the client here but what if it was the other way around? This way you can "replace" the default type implementations dynamically by registering your own (which incidentally can request text encoding even if the default lib mode is in binary).

Anyway, leaving my comment about the order aside (can be considered later) do you want me to add in this PR the implementation for the supports_binary_encoding idea and the changes to the TYPEINFO_QUERY (along with some tests and documentation)?

@sfackler
Copy link
Owner

sfackler commented Nov 5, 2021

Let's leave that to a separate PR.

For this one, I think we just need encode_format to return an enum.

I'm also not sold on the name Unknown. It may be best to not add a type like that at all to this library for now?

@ruslantalpa
Copy link
Contributor Author

@sfackler sorry for the delay,
I removed the Unknown part and added the Format enum (thank you @NAlexPear )

@ruslantalpa ruslantalpa changed the title Implement Unknown encoding for query parameters WIP Implement Unknown encoding for query parameters Dec 16, 2021
@ruslantalpa
Copy link
Contributor Author

@sfackler is this PR ok as it is or do you want me to make some changes or it's not something you want to merge (no rush, whenever you have time)?
Thank you

@sfackler sfackler merged commit 09e8656 into sfackler:master Jul 19, 2022
@sfackler
Copy link
Owner

Merged - sorry for the delay!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants