
Implement support for 1-dim arrays for PostgreSQL #110

Closed
wants to merge 5 commits into from

Conversation

@oeb25 (Contributor) commented Feb 21, 2020

Add support for 1-dimensional arrays for PostgreSQL

This pull request attempts to implement the foundation for encoding and decoding arrays for PostgreSQL, resolving #48 and #82.

One solution I came up with after a bit of experimentation was an ArrayEncoder<T> and an ArrayDecoder<T>, allowing elements to be encoded and decoded one at a time.

Included in this pull request is encoding of the standard 1-dimensional Postgres array types (int[], text[], uuid[], ...) from slices, as well as decoding into Vecs.

The rest of this description covers the details of the implementation, along with the problems and considerations that came up.

Encoding

Supporting incremental encoding of the array makes it easier to encode anything that implements Iterator<Item = T>. Creating the implementations for slices and Vecs becomes trivial with this approach, as can be seen from the actual implementation:

impl<T> Encode<Postgres> for [T]
where
    T: Encode<Postgres>,
    Postgres: HasSqlType<T>,
{
    fn encode(&self, buf: &mut Vec<u8>) {
        let mut encoder = ArrayEncoder::new(buf);
        for item in self {
            encoder.push(item);
        }
    }
}

ArrayEncoder works by counting the elements as they are pushed/encoded, and updating the dimensions field of the array header on Drop.

This means the buffer is potentially invalid while elements are still being pushed, but as soon as the buffer is extracted from the encoder (which it must be before it can be read), the encoder is dropped and the length is written. Alternatively, we could update the length field on each push, but updating it only once at the end could be marginally faster. I do not know which method is preferred.
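To make the Drop-based patching concrete, here is a minimal, self-contained sketch of the idea, hardcoded to i32 elements. The struct name matches the PR, but the field names, the int4 OID, and the header layout written here are my assumptions about the wire format (ndim, flags, elemtype, then per-dimension length and lower bound), not the PR's actual code:

```rust
// Hypothetical sketch of ArrayEncoder's Drop-based length patching.
// Hardcoded to i32 (int4, OID 23 assumed); field names are illustrative.
struct ArrayEncoder<'a> {
    buf: &'a mut Vec<u8>,
    len_offset: usize, // where the reserved dimension-length field lives
    count: i32,
}

impl<'a> ArrayEncoder<'a> {
    fn new(buf: &'a mut Vec<u8>) -> Self {
        buf.extend_from_slice(&1i32.to_be_bytes()); // ndim = 1
        buf.extend_from_slice(&0i32.to_be_bytes()); // flags: no null bitmap
        buf.extend_from_slice(&23i32.to_be_bytes()); // elemtype OID (int4, assumed)
        let len_offset = buf.len();
        buf.extend_from_slice(&0i32.to_be_bytes()); // dimension length, patched on Drop
        buf.extend_from_slice(&1i32.to_be_bytes()); // lower bound
        ArrayEncoder { buf, len_offset, count: 0 }
    }

    fn push(&mut self, item: i32) {
        self.buf.extend_from_slice(&4i32.to_be_bytes()); // element byte length
        self.buf.extend_from_slice(&item.to_be_bytes());
        self.count += 1;
    }
}

impl Drop for ArrayEncoder<'_> {
    fn drop(&mut self) {
        // Patch the reserved length field now that the final count is known.
        self.buf[self.len_offset..self.len_offset + 4]
            .copy_from_slice(&self.count.to_be_bytes());
    }
}
```

Until the encoder is dropped, the length field at `len_offset` still reads zero, which is exactly the "potentially invalid buffer" window described above.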

Implementing Encode for iterators of encodables

Allowing Encode for Iterator<Item = T> could reduce allocations and make it possible to map internal structures to Postgres-transferable types without allocating new Vecs or the like. An example of this:

struct Friend {
  name: &'static str,
  age: usize,
}

let friends = vec![
  Friend { name: "Joe", age: 34 },
  Friend { name: "Carl", age: 38 },
];

sqlx::query!(
  "update users set friend_names = $1",
  friends.iter().map(|friend| friend.name)
);

Unfortunately, trying to write this implementation conflicts with the blanket implementation impl<T> Encode for &T where T: Encode { ... }.

Any thoughts on how this could be solved? (Maybe creating a wrapper type EncodeIter(impl Iterator<Item = T>)?)
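For what the wrapper-type idea might look like: below is a self-contained sketch, where `Encode` is a stand-in trait (not sqlx's actual one) and `EncodeIter` is the hypothetical newtype. Because the newtype is a distinct local type, its impl cannot overlap with a blanket `impl<T> Encode for &T`:

```rust
// Stand-in trait for illustration; not sqlx's real Encode.
trait Encode {
    fn encode(&self, buf: &mut Vec<u8>);
}

impl Encode for i32 {
    fn encode(&self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&self.to_be_bytes());
    }
}

// Hypothetical wrapper: a distinct type, so no coherence conflict
// with a blanket impl for references.
struct EncodeIter<I>(I);

impl<I, T> Encode for EncodeIter<I>
where
    I: Iterator<Item = T> + Clone,
    T: Encode,
{
    fn encode(&self, buf: &mut Vec<u8>) {
        // Encode takes &self here, so clone the (cheap) iterator to consume it.
        for item in self.0.clone() {
            item.encode(buf);
        }
    }
}
```

The `I: Clone` bound is a workaround for `encode` taking `&self`; a by-value `Encode` (as suggested below in the review) would remove that wrinkle.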

Decoding

Decoding is implemented similarly to encoding. A decoder is constructed at the start of the buffer containing the bytes for the array; it parses the header, checks its validity, and then presents an Iterator<Item = T>, allowing for lazy decoding of the array. This makes the implementation of decoding into a Vec trivial:

impl<T> Decode<Postgres> for Vec<T>
where
    T: Decode<Postgres>,
    Postgres: HasSqlType<T>,
{
    fn decode(buf: &[u8]) -> Result<Self, DecodeError> {
        let decoder = ArrayDecoder::new(buf)?;
        decoder.collect()
    }
}

Note the Vec. This required implementing HasSqlType<Vec<T>> for Postgres for all of the array types, similar to the already existing HasSqlType<[T]> for Postgres. Since the size of most arrays coming from the database cannot be known at compile time, the dynamic allocation is unavoidable. I played with a handful of different approaches to implementing decode for [T; N], and some of the persistent considerations were:

  • How do we represent the array when it's partially decoded? Do we require T: Default? Do we use some sort of MaybeUninit?
  • How do we use const generics? Not knowing too much about const generics, it seems they are currently still unstable. Also, when trying to create an array with [T::default(); N] we get error: array lengths can't depend on generic parameters.
  • What if the array from Postgres contains more elements than N? Do we just discard the extras?
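One possible answer to all three bullets, assuming const generics are available: decode into a Vec first and then convert, returning an error on a length mismatch instead of requiring T: Default, touching MaybeUninit, or silently discarding extras. A minimal sketch (the function name is mine, not the PR's):

```rust
use std::convert::TryInto;

// Hypothetical helper: convert a decoded Vec into a fixed-size array,
// erroring on length mismatch rather than defaulting or truncating.
fn vec_to_array<T, const N: usize>(items: Vec<T>) -> Result<[T; N], String> {
    let len = items.len();
    items
        .try_into()
        .map_err(|_| format!("expected an array of length {}, got {}", N, len))
}
```

This costs one intermediate allocation, but sidesteps every partially-initialized-array question at once.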

Errors

With regard to errors when decoding, how we handle them must also be considered. Currently, if the bytes we get from Postgres contain an unexpected elemtype, have a number of dimensions other than one, or include a null bitmap (currently not implemented), the decoder just panics. An appropriate error should be returned for each case, but I do not know in what shape or form.

Similarly, when decoding of an element fails, should no elements be returned, or should the ones that succeeded be returned? Currently, it just stops at the first error and does not return any elements at all.

@abonander (Collaborator) commented:

If we change Encode to take self by-value and then implement it manually for references to types then we can implement it for T: IntoIterator where T::Item: Encode.

We can calculate dimensions statically, I'm thinking we just add a provided method dimensions() -> usize that returns 0 by default and for the T: IntoIterator impl we override it to return <T::Item as Encode>::dimensions() + 1, and then just call Self::dimensions(). That should be all that's necessary to support arbitrary numbers of dimensions.
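Sketching the provided-method idea with simplified trait shapes (by-value encode, a Vec impl standing in for the suggested T: IntoIterator impl; this is not sqlx's real API):

```rust
// Stand-in trait for illustration; encode takes self by-value as suggested.
trait Encode {
    fn encode(self, buf: &mut Vec<u8>);

    // Provided method: scalar types are 0-dimensional by default.
    fn dimensions() -> usize {
        0
    }
}

impl Encode for i32 {
    fn encode(self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&self.to_be_bytes());
    }
}

impl<T: Encode> Encode for Vec<T> {
    fn encode(self, buf: &mut Vec<u8>) {
        for item in self {
            item.encode(buf);
        }
    }

    // Each container level adds one dimension on top of the element's.
    fn dimensions() -> usize {
        <T as Encode>::dimensions() + 1
    }
}
```

The recursion terminates at any scalar impl, so nesting levels compose for free.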

Since the length is a fixed-width integer, I would definitely just wait to update the count until the end; if we fail to update it then we've panicked which is going to propagate all the way through and kill the code using the connection anyway. (That reminds me, our implementation of Drop for PoolConnection should probably always clear the connection's buffer and maybe just kill the connection entirely if it was dropped by a panic, but that's orthogonal to this PR).

For the decode side I think we're fine limiting to 1D arrays for now, and I also think we're going to mostly ignore fixed-length arrays until const generics are stable. We can wait to discuss the implementation of those at that point.

We definitely need to eliminate those panics in Decode though. For the null bitmap just return UnexpectedNullError (from crate::error) and we'll fix that one later. For the rest, you can return the error message as a String in DecodeError like this: Error::DecodeError(format!("your error message").into()). I'd maybe recommend wrapping this up in a macro in sqlx-core/src/error.rs:

#[allow(unused_macros)]
macro_rules! decode_err (
    ($($args:tt)*) => {
        $crate::error::DecodeError(format!($($args)*).into())
    }
);
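Usage of the suggested macro might look like the following self-contained sketch, where DecodeError is mocked as a plain String wrapper (the real type lives in sqlx-core and boxes its message, hence the `.into()` in the macro above), and `check_ndim` is an invented example function:

```rust
// Mocked error type for illustration; the real DecodeError is in sqlx-core.
#[derive(Debug, PartialEq)]
struct DecodeError(String);

macro_rules! decode_err (
    ($($args:tt)*) => {
        DecodeError(format!($($args)*))
    }
);

// Hypothetical validity check from the array decoder's header parsing.
fn check_ndim(ndim: i32) -> Result<(), DecodeError> {
    if ndim != 1 {
        return Err(decode_err!(
            "expected a 1-dimensional array, got {} dimensions",
            ndim
        ));
    }
    Ok(())
}
```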

And yeah, just return the error, don't worry about decoding the remaining elements. It's most likely going to be a programmer error anyway that I wouldn't expect people to try to recover from.

@abonander (Collaborator) commented:

Sorry about the delay, we've been busy with #111. We'll probably merge this into that once it's ready.

@@ -0,0 +1,202 @@
/// Encoding and decoding of Postgres arrays. Documentation of the byte format can be found [here](https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/utils/array.h;h=7f7e744cb12bc872f628f90dad99dfdf074eb314;hb=master#l6)
Review comment from a Collaborator:

I believe this is actually the description of the in-memory format. The binary wire format appears to be different based on the implementation of array_send here: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/arrayfuncs.c;h=7a4a5aaa86dc1c8cffa2d899c89511dc317d485b;hb=master#l1547

Vec<i32>, [i32],
Vec<i64>, [i64],
Vec<f32>, [f32],
Vec<f64>, [f64],
Review comment from a Collaborator:

The format for this should be Vec<T> | &[T]

@mehcode mehcode added this to the 0.3.1 milestone Mar 19, 2020
@mehcode (Member) commented Mar 20, 2020

Working on rebasing this now

@mehcode (Member) commented Mar 21, 2020

Merged here: ec27b65


Thanks again for the work here.

@mehcode mehcode closed this Mar 21, 2020
demurgos added a commit to demurgos/sqlx that referenced this pull request Mar 16, 2021
This commit updates the `postgres::connection::describe` module to add full support for domain types. Domain types were previously confused with their category, which caused invalid oid resolution.

Fixes launchbadge#110