Read arbitrary trees of arrays of composite types, with specific type handlers for inner elements #3558
Without (yet) discussing the specific methods above... One general difficulty here is that the record wire representation contains field types but no names; this raises the question of how to match CLR properties (which have no specific ordering) to PG record fields (which come back in a specific order). In other words, imagine a CLR type with two int properties which we want to materialize from a PG record containing two ints - how would we get it right?

In that sense, composite seems like a better option compared to record, since the fields have a preset order and Npgsql pre-loads all these composite definitions. So EF can just ask Npgsql to materialize the composite into a given CLR type, and that happens recursively on its own (we already do that). It's true that right now we require the API consumer to map the CLR type to the PG composite in advance, but we could bring unmapped composites back in some way.

One last comment - since the goal here is the EF support, I would leave this part of the problem for later - there are probably going to be quite a few challenges inside of EF Core before we ever reach the problem of materialization. Put another way, if we can make everything work with pre-mapped composites (where all the support already exists), that's already a pretty amazing thing, and we can solve that last step when we get there. Just my two cents.
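For reference, the pre-mapped composite flow referred to above looks roughly like this (a sketch only: it assumes a `tag` composite type already exists in the database, an open connection `conn`, and the Npgsql 5-era global type mapper API):

```csharp
// Sketch of the existing pre-mapped composite approach (not the proposed record API).
// Assumes: CREATE TYPE tag AS (id int, tag_name text); has already been executed,
// and conn is an open NpgsqlConnection.
public class Tag
{
    public int Id { get; set; }
    public string TagName { get; set; }
}

// The CLR type is mapped to the PG composite in advance...
NpgsqlConnection.GlobalTypeMapper.MapComposite<Tag>("tag");

// ...after which Npgsql can materialize it, recursively for nested composites/arrays:
using var cmd = new NpgsqlCommand("SELECT tags FROM posts", conn);
using var reader = cmd.ExecuteReader();
while (reader.Read())
{
    var tags = reader.GetFieldValue<Tag[]>(0);
}
```

This is exactly the "map in advance" requirement being discussed: the materialization works because the composite's field order is known from the catalog before any query runs.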
It's true that this is just one of many tasks involving EF support for the array collection strategy. But anyway, I find it a bit hard to see how to use the pre-mapped composite types with EF, especially since they need to be added to the pg catalog first. Imagine someone writes a query like this:

```csharp
var blogs = context.Blogs.Select(b => new {
    b.Title,
    Posts = b.Posts.Select(p => new {
        p.Id,
        p.Text,
        Tags = p.Tags.Select(t => new {
            t.Id,
            t.TagName
        }).ToList()
    }).ToList()
}).ToList();
```

This will return a result set of rows containing two columns: Title (text) and Posts (record[]). Each record contains Id and Text, followed by another record[] containing records of Id and TagName. We can't require the user to go ahead and create these two composite types (so that these are used instead of the generic record oid) just because they write this particular query in their application. Or do you mean that EF should itself go ahead and create the needed composite types in the pg catalog for each query?
Also note that ORMs other than EF might be interested in this, and for some of them it may be easier to build this kind of support.
Note that every PostgreSQL table has an implicit composite type which describes its rows; Npgsql doesn't load these "table composites" by default - since there could be a huge number of tables - but you can turn that on.

I guess I'm proposing to defer the Npgsql ADO changes (how to read a record) until we're further along on the EF Core side; that would also give us a better idea of exactly which Npgsql API would best support the EF side.
That's true, and I think we'll definitely end up building something like this. I just personally don't have the impression that I see the entire picture of what this needs to look like yet, and at the point where we need to implement the actual EF shaper, things will be much clearer. But I'm definitely open to doing it your way if you feel strongly about it.
OK, you got me thinking about it despite what I wrote above :)

Another way to expose things in Npgsql is to allow consumers to request a strongly-typed ValueTuple from RecordHandler. This approach is only usable if the API consumer knows the order of the different fields in advance, which is indeed the case in EF (since it generated the query SQL). One possible disadvantage of a strongly-typed ValueTuple is the creation of a lot of CLR generic types because of the different parameter type combinations.

As an alternative (or possibly in addition?), we could expose something similar to a DbDataReader implementation, which would allow consumers to generically (or non-generically) access a field by index only (but not by name).

Note that all of the above only seems necessary to avoid boxing when reading value-type fields on records; otherwise the current object[] approach should work fine (that's another reason why we can just go ahead with the EF Core work and not worry about it too much right away). I don't think there's a need to go beyond records and provide a general-purpose visitation/deserialization API; unless I'm mistaken, consumers (such as EF) are again expected to know in advance exactly which fields and types they're reading. In the EF shaper, you'd effectively be generating code that walks over the specific tree structure read from Npgsql, knowing exactly what to expect.

Of course, this whole approach has the disadvantage that Npgsql returns the data in one form (with records), which must then be transformed/materialized into the user's POCO entity types. But any other approach - where Npgsql just gives back the POCOs directly - would require establishing the mapping with composites in advance. Anyway, just some random thoughts :)
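From the consumer side, the strongly-typed ValueTuple idea might look something like this (a sketch only: a generic `GetFieldValue` overload decoding records into tuples is hypothetical, not an existing Npgsql API; the point is the shape of the call):

```csharp
// Hypothetical: reading the Posts column from the earlier query as nested ValueTuples.
// The field order must be known in advance - which EF does know, since it generated the SQL.
var posts = reader.GetFieldValue<(int Id, string Text, (int Id, string TagName)[] Tags)[]>(1);

foreach (var (id, text, tags) in posts)
{
    // No boxing for the int fields; the arrays themselves are still heap-allocated, though.
}
```

Every distinct combination of field types instantiates a distinct generic type here, which is the generic-type-explosion concern mentioned above.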
Thanks for the thinking!
I was thinking of something like this too, but was a bit surprised that C# doesn't allow an arbitrary number of generic arguments, as C++ does; ValueTuple is therefore limited in its number of items. It seems this is worked around, however, by letting the last item consist of another, nested ValueTuple. So yes, this can indeed be a good idea that won't require any new API functions.

Note that even though ValueTuple is a struct (not heap-allocated), we will generally ask for a ValueTuple[], which will be allocated on the heap. So with this approach we still waste some memory that is to be thrown away immediately. I have no idea how important this is for performance, if at all - EF will create a bunch of objects anyway, which the user will probably discard quite quickly.
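The nesting workaround mentioned here is in fact how the C# compiler itself handles tuples with more than seven elements: `ValueTuple<T1,…,T7,TRest>` stores the remainder in its `Rest` field. A minimal illustration:

```csharp
using System;

class Demo
{
    static void Main()
    {
        // A 9-element tuple literal...
        var t = (1, 2, 3, 4, 5, 6, 7, 8, 9);

        // ...is actually ValueTuple<int,int,int,int,int,int,int, ValueTuple<int,int>>,
        // with elements 8 and 9 living in the nested Rest tuple.
        Console.WriteLine(t.Rest.Item1); // element 8
        Console.WriteLine(t.Item9);      // the compiler maps Item9 to Rest.Item2
    }
}
```

So arbitrarily wide records could in principle be represented, at the cost of the slightly more complicated decoding logic noted below.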
You mean something like
Just returning an object[] won't work, since then each element will be decoded into the default CLR type. A user could map a column in EF to use a different data type; that's why EF needs to be able to tell Npgsql which type each field should have.
Yeah, this is a known detail of ValueTuple which shouldn't block us (but will make things slightly more complicated to implement).
That's a good point. Ideally we wouldn't waste memory like this, so a visitor/deserialization API could indeed make sense to eliminate those heap allocations. And if we do that, it may as well expose the right API for record etc.
Yep, that's right (all of it). It's not unlike what we already do in NpgsqlDataReader (note also the importance of CommandBehavior.SequentialAccess!).
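For context, the existing reader pattern being compared to looks like this (a sketch assuming an open connection `conn`; under `CommandBehavior.SequentialAccess`, columns must be consumed strictly in ascending order, which is exactly the discipline a forward-only record reader would share):

```csharp
// Sketch: sequential, forward-only column access on an ordinary data reader.
// With CommandBehavior.SequentialAccess, fields must be read in ascending ordinal
// order, each at most once - the same constraint a nested record reader would impose.
using var cmd = new NpgsqlCommand("SELECT id, text FROM posts", conn);
using var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess);
while (reader.Read())
{
    var id = reader.GetFieldValue<int>(0);      // must be read before ordinal 1
    var text = reader.GetFieldValue<string>(1); // going back to ordinal 0 would throw
}
```

Sequential access lets the reader stream each field off the wire without buffering the whole row, which is also what makes it attractive for arbitrarily deep record trees.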
That's true, though I'd happily start with a solution that works for the default CLR types :)
OK, so I put up the initial implementation of the inner DbDataReader thing here:
@emil can you please submit this as a PR, so it's easier to review/comment?
Posted at #3604. |
For npgsql/efcore.pg#1691, we need a way to efficiently read a column of type `record[]`, which can potentially represent a unique structure for every query. This more or less deals with the same issue as unmapped composite types, but in this case the plain `record` oid is used.

Currently, a .NET `object[][]` is returned for a PG `record[]`, which is not good enough in case some element has a specific type mapping, for example when using EF. Note that an item in a record can contain another record, or an array of records, of arbitrary depth.
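A sketch of the status quo described above (assuming an open `NpgsqlConnection conn`):

```csharp
// Today, a record[] column comes back as object[][]: every element is decoded
// into its default CLR type, with no way to request a specific mapping per field.
using var cmd = new NpgsqlCommand("SELECT ARRAY[ROW(1, 'a'), ROW(2, 'b')]", conn);
using var reader = cmd.ExecuteReader();
reader.Read();
var records = (object[][])reader.GetValue(0);
// records[0][0] is a boxed int, records[0][1] is a string, and so on.
```

Every value-type field is boxed into an `object`, and the whole intermediate tree is allocated only to be thrown away after materialization.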
There are many approaches to deserialize this. Here are some options:

1. An API that takes a `Type` as input and constructs the given object. I believe this was kind of supported in Npgsql 4 (though not for inner types), but was dropped.
2. A lower-level API with explicit deserialization methods on the reader (such as `StartDeserializeComplexType`).

The last three methods throw if the deserialization state doesn't match what the user expects.
We can then use it like this:
Or let `StartDeserializeComplexType` return a new object having the methods above, to avoid cluttering `NpgsqlDataReader`, at the expense of an extra allocation (which, however, could be cached...).

While option 1 is probably easiest to use for normal users, it has performance drawbacks (due to reflection), as well as the potential inability to map record items, which are unnamed, to the correct property (since properties are unordered in C#, unless annotated to be laid out in a particular order). Also, if the returned object tree is only for intermediate use, we waste memory.
Option 2 is, in my opinion, a better fit for how EF materialization works, since EF dynamically generates .NET expression trees which are compiled to real functions. The materialization code can just generate the (verbose) code that iterates through the record tree and builds whatever entity objects it wants directly, without any boxing or creating a bunch of new .NET types.
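To make that shape concrete, here is a self-contained sketch of expression-based materialization against a stand-in reader. `RecordReader` and its `GetFieldValue` method are hypothetical placeholders for whatever API Npgsql ends up exposing; the point is only that a compiled expression can walk a known field order without reflection at read time:

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the proposed forward-only record reader.
public sealed class RecordReader
{
    private readonly object[] _fields;
    private int _pos;
    public RecordReader(object[] fields) => _fields = fields;
    public T GetFieldValue<T>() => (T)_fields[_pos++];
}

public class Tag
{
    public int Id { get; set; }
    public string TagName { get; set; }
}

class Demo
{
    static void Main()
    {
        // Built once per query shape; equivalent to:
        //   reader => new Tag { Id = reader.GetFieldValue<int>(),
        //                       TagName = reader.GetFieldValue<string>() }
        var reader = Expression.Parameter(typeof(RecordReader), "reader");
        var body = Expression.MemberInit(
            Expression.New(typeof(Tag)),
            Expression.Bind(typeof(Tag).GetProperty(nameof(Tag.Id))!,
                Expression.Call(reader, nameof(RecordReader.GetFieldValue), new[] { typeof(int) })),
            Expression.Bind(typeof(Tag).GetProperty(nameof(Tag.TagName))!,
                Expression.Call(reader, nameof(RecordReader.GetFieldValue), new[] { typeof(string) })));
        var materializer = Expression.Lambda<Func<RecordReader, Tag>>(body, reader).Compile();

        var tag = materializer(new RecordReader(new object[] { 42, "postgres" }));
        Console.WriteLine($"{tag.Id} {tag.TagName}"); // 42 postgres
    }
}
```

The bindings execute in declaration order, which is what lets a forward-only reader work: the generated code consumes fields in exactly the order the query produced them.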
Also note that option 1 could easily be provided as a helper function on top of option 2, if we want that.
Comments?