
Implement support for 1-dim arrays for PostgreSQL #110

Closed
wants to merge 5 commits into from

Conversation

@oeb25 (Contributor) commented Feb 21, 2020

Add support for 1-dimensional arrays for PostgreSQL

This pull request attempts to implement the foundation for encoding and decoding arrays for PostgreSQL, resolving #48 and #82.

One solution I came up with after a bit of experimentation was an ArrayEncoder<T> and an ArrayDecoder<T>, allowing elements to be encoded and decoded one at a time.

Included in this pull request is encoding of the standard 1-dimensional Postgres array types (int[], text[], uuid[], ...) from slices, as well as decoding into Vecs.

The rest of this description covers the details of the implementation, along with the problems and considerations that came up.

Encoding

Supporting incremental encoding of the array makes it easier to encode anything that implements Iterator<Item = T>. Creating the implementations for slices and Vecs becomes trivial with this approach, as can be seen from the actual implementation:

impl<T> Encode<Postgres> for [T]
where
    T: Encode<Postgres>,
    Postgres: HasSqlType<T>,
{
    fn encode(&self, buf: &mut Vec<u8>) {
        let mut encoder = ArrayEncoder::new(buf);
        for item in self {
            encoder.push(item);
        }
    }
}

ArrayEncoder works by counting the elements as they are pushed/encoded, and updating the dimensions field of the array header on Drop.

This means the buffer is potentially invalid while elements are still being pushed, but as soon as the buffer is extracted from the encoder (which it must be before it can be read), the encoder is dropped and the length is written. Alternatively, we could update the length field on each push, but updating it only once at the end could be marginally faster. I do not know which method is preferred.
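To make the Drop-based patching concrete, here is a minimal, self-contained sketch of the idea, hardcoded to i32 elements. The struct name matches the PR, but the field names, the int4 OID, and the header layout written here are my assumptions about the wire format (ndim, flags, elemtype, then per-dimension length and lower bound), not the PR's actual code:

```rust
// Hypothetical sketch of ArrayEncoder's Drop-based length patching.
// Hardcoded to i32 (int4, OID 23 assumed); field names are illustrative.
struct ArrayEncoder<'a> {
    buf: &'a mut Vec<u8>,
    len_offset: usize, // where the reserved dimension-length field lives
    count: i32,
}

impl<'a> ArrayEncoder<'a> {
    fn new(buf: &'a mut Vec<u8>) -> Self {
        buf.extend_from_slice(&1i32.to_be_bytes()); // ndim = 1
        buf.extend_from_slice(&0i32.to_be_bytes()); // flags: no null bitmap
        buf.extend_from_slice(&23i32.to_be_bytes()); // elemtype OID (int4, assumed)
        let len_offset = buf.len();
        buf.extend_from_slice(&0i32.to_be_bytes()); // dimension length, patched on Drop
        buf.extend_from_slice(&1i32.to_be_bytes()); // lower bound
        ArrayEncoder { buf, len_offset, count: 0 }
    }

    fn push(&mut self, item: i32) {
        self.buf.extend_from_slice(&4i32.to_be_bytes()); // element byte length
        self.buf.extend_from_slice(&item.to_be_bytes());
        self.count += 1;
    }
}

impl Drop for ArrayEncoder<'_> {
    fn drop(&mut self) {
        // Patch the reserved length field now that the final count is known.
        self.buf[self.len_offset..self.len_offset + 4]
            .copy_from_slice(&self.count.to_be_bytes());
    }
}
```

Until the encoder is dropped, the length field at `len_offset` still reads zero, which is exactly the "potentially invalid buffer" window described above.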

Implementing Encode for iterators of encodables

Allowing Encode for Iterator<Item = T> could reduce allocations and make it possible to map internal structures to Postgres-transferable types without allocating new Vecs or the like. An example of this:

struct Friend {
  name: &'static str,
  age: usize,
}

let friends = vec![
  Friend { name: "Joe", age: 34 },
  Friend { name: "Carl", age: 38 },
];

sqlx::query!(
  "update users set friend_names = $1",
  friends.iter().map(|friend| friend.name)
);

Unfortunately, trying to write this implementation conflicts with the blanket implementation impl<T> Encode for &T where T: Encode { ... }.

Any thoughts on how this could be solved? (Maybe creating a wrapper type EncodeIter(impl Iterator<Item = T>)?)
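For what the wrapper-type idea might look like: below is a self-contained sketch, where `Encode` is a stand-in trait (not sqlx's actual one) and `EncodeIter` is the hypothetical newtype. Because the newtype is a distinct local type, its impl cannot overlap with a blanket `impl<T> Encode for &T`:

```rust
// Stand-in trait for illustration; not sqlx's real Encode.
trait Encode {
    fn encode(&self, buf: &mut Vec<u8>);
}

impl Encode for i32 {
    fn encode(&self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&self.to_be_bytes());
    }
}

// Hypothetical wrapper: a distinct type, so no coherence conflict
// with a blanket impl for references.
struct EncodeIter<I>(I);

impl<I, T> Encode for EncodeIter<I>
where
    I: Iterator<Item = T> + Clone,
    T: Encode,
{
    fn encode(&self, buf: &mut Vec<u8>) {
        // Encode takes &self here, so clone the (cheap) iterator to consume it.
        for item in self.0.clone() {
            item.encode(buf);
        }
    }
}
```

The `I: Clone` bound is a workaround for `encode` taking `&self`; a by-value `Encode` (as suggested below in the review) would remove that wrinkle.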

Decoding

Decoding is implemented similarly to encoding. A decoder is constructed at the start of the buffer containing the bytes for the array; it parses the header, checks its validity, and then presents an Iterator<Item = T>, allowing for lazy decoding of the array. This makes the implementation of decoding into a Vec trivial:

impl<T> Decode<Postgres> for Vec<T>
where
    T: Decode<Postgres>,
    Postgres: HasSqlType<T>,
{
    fn decode(buf: &[u8]) -> Result<Self, DecodeError> {
        let decoder = ArrayDecoder::new(buf)?;
        decoder.collect()
    }
}

Note the Vec. This required implementing HasSqlType<Vec<T>> for Postgres for all of the array types, similar to the already existing HasSqlType<[T]> for Postgres. Since the size of most arrays coming from the database cannot be known at compile time, the dynamic allocation is unavoidable. I played with a handful of different approaches to implementing decode for [T; N], and some of the persistent considerations were:

  • How do we represent the array when it's partially decoded? Do we require T: Default? Do we use some sort of MaybeUninit?
  • How do we use const generics? Not knowing too much about const generics, it seems they are currently still unstable. Also, when trying to create an array with [T::default(); N] we get error: array lengths can't depend on generic parameters.
  • What if the array from Postgres contains more elements than N? Do we just discard the extras?
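One possible answer to all three bullets, assuming const generics are available: decode into a Vec first and then convert, returning an error on a length mismatch instead of requiring T: Default, touching MaybeUninit, or silently discarding extras. A minimal sketch (the function name is mine, not the PR's):

```rust
use std::convert::TryInto;

// Hypothetical helper: convert a decoded Vec into a fixed-size array,
// erroring on length mismatch rather than defaulting or truncating.
fn vec_to_array<T, const N: usize>(items: Vec<T>) -> Result<[T; N], String> {
    let len = items.len();
    items
        .try_into()
        .map_err(|_| format!("expected an array of length {}, got {}", N, len))
}
```

This costs one intermediate allocation, but sidesteps every partially-initialized-array question at once.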

Errors

With regard to errors when decoding, how we handle them must also be considered. Currently, if the bytes we get from Postgres contain an unexpected elemtype, have a number of dimensions other than one, or include a null bitmap (currently not implemented), the decoder just panics. An appropriate error should be returned for each case, but I do not know in what shape or form.

Similarly, when decoding of an element fails, should no elements be returned, or should the ones that succeeded be returned? Currently, it just stops at the first error and does not return any elements at all.

@abonander (Collaborator) commented:

If we change Encode to take self by-value and then implement it manually for references to types then we can implement it for T: IntoIterator where T::Item: Encode.

We can calculate dimensions statically, I'm thinking we just add a provided method dimensions() -> usize that returns 0 by default and for the T: IntoIterator impl we override it to return <T::Item as Encode>::dimensions() + 1, and then just call Self::dimensions(). That should be all that's necessary to support arbitrary numbers of dimensions.
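Sketching the provided-method idea with simplified trait shapes (by-value encode, a Vec impl standing in for the suggested T: IntoIterator impl; this is not sqlx's real API):

```rust
// Stand-in trait for illustration; encode takes self by-value as suggested.
trait Encode {
    fn encode(self, buf: &mut Vec<u8>);

    // Provided method: scalar types are 0-dimensional by default.
    fn dimensions() -> usize {
        0
    }
}

impl Encode for i32 {
    fn encode(self, buf: &mut Vec<u8>) {
        buf.extend_from_slice(&self.to_be_bytes());
    }
}

impl<T: Encode> Encode for Vec<T> {
    fn encode(self, buf: &mut Vec<u8>) {
        for item in self {
            item.encode(buf);
        }
    }

    // Each container level adds one dimension on top of the element's.
    fn dimensions() -> usize {
        <T as Encode>::dimensions() + 1
    }
}
```

The recursion terminates at any scalar impl, so nesting levels compose for free.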

Since the length is a fixed-width integer, I would definitely just wait to update the count until the end; if we fail to update it then we've panicked which is going to propagate all the way through and kill the code using the connection anyway. (That reminds me, our implementation of Drop for PoolConnection should probably always clear the connection's buffer and maybe just kill the connection entirely if it was dropped by a panic, but that's orthogonal to this PR).

For the decode side I think we're fine limiting to 1D arrays for now, and I also think we're going to mostly ignore fixed-length arrays until const generics are stable. We can wait to discuss the implementation of those at that point.

We definitely need to eliminate those panics in Decode though. For the null bitmap just return UnexpectedNullError (from crate::error) and we'll fix that one later. For the rest, you can return the error message as a String in DecodeError like this: Error::DecodeError(format!("your error message").into()). I'd maybe recommend wrapping this up in a macro in sqlx-core/src/error.rs:

#[allow(unused_macros)]
macro_rules! decode_err (
    ($($args:tt)*) => {
        $crate::error::DecodeError(format!($($args)*).into())
    }
);
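Usage of the suggested macro might look like the following self-contained sketch, where DecodeError is mocked as a plain String wrapper (the real type lives in sqlx-core and boxes its message, hence the `.into()` in the macro above), and `check_ndim` is an invented example function:

```rust
// Mocked error type for illustration; the real DecodeError is in sqlx-core.
#[derive(Debug, PartialEq)]
struct DecodeError(String);

macro_rules! decode_err (
    ($($args:tt)*) => {
        DecodeError(format!($($args)*))
    }
);

// Hypothetical validity check from the array decoder's header parsing.
fn check_ndim(ndim: i32) -> Result<(), DecodeError> {
    if ndim != 1 {
        return Err(decode_err!(
            "expected a 1-dimensional array, got {} dimensions",
            ndim
        ));
    }
    Ok(())
}
```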

And yeah, just return the error, don't worry about decoding the remaining elements. It's most likely going to be a programmer error anyway that I wouldn't expect people to try to recover from.

@abonander (Collaborator) commented:

Sorry about the delay, we've been busy with #111. We'll probably merge this into that once it's ready.

@@ -0,0 +1,202 @@
/// Encoding and decoding of Postgres arrays. Documentation of the byte format can be found [here](https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/include/utils/array.h;h=7f7e744cb12bc872f628f90dad99dfdf074eb314;hb=master#l6)
Review comment from a Collaborator:

I believe this is actually the description of the in-memory format. The binary wire format appears to be different based on the implementation of array_send here: https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/utils/adt/arrayfuncs.c;h=7a4a5aaa86dc1c8cffa2d899c89511dc317d485b;hb=master#l1547

Vec<i32>, [i32],
Vec<i64>, [i64],
Vec<f32>, [f32],
Vec<f64>, [f64],
Review comment from a Collaborator:

The format for this should be Vec<T> | &[T]

@mehcode mehcode added this to the 0.3.1 milestone Mar 19, 2020
@mehcode (Member) commented Mar 20, 2020

Working on rebasing this now

@mehcode (Member) commented Mar 21, 2020

Merged here: ec27b65


Thanks again for the work here.

@mehcode mehcode closed this Mar 21, 2020
demurgos added a commit to demurgos/sqlx that referenced this pull request Mar 16, 2021
This commit updates the `postgres::connection::describe` module to add full support for domain types. Domain types were previously confused with their category, which caused invalid oid resolution.

Fixes launchbadge#110