Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to use this plugin system without EF Core to read geometry? #2056

Closed
FransBouma opened this issue Jul 11, 2018 · 18 comments
Closed

How to use this plugin system without EF Core to read geometry? #2056

FransBouma opened this issue Jul 11, 2018 · 18 comments

Comments

@FransBouma
Copy link
Contributor

The issue

See: https://www.llblgen.com/tinyforum/Messages.aspx?ThreadID=24834

We use DbDataReader.GetValues() to read the current row. When a field is of type 'geometry' this fails. In v4 this is apparently solved with plugins, but I have a hard time with understanding how to enable these so a DbDataReader.GetValues() call obtains the value in the right type (any type will do, we can convert it to whatever using a type converter, as long as the value is read, which isn't happening).

Sorry if I sound a little frustrated but I find this a messy system: I can't take a dependency on any npgsql related assembly as our code is generic and uses DbProviderFactory to instantiate ADO.NET objects (also on netstandard20), so I can't tell the reader that some field is a geometry field (which is really not something I want to do anyway, as I don't have to do that with any database type, not even oracle) using Npgsql specific code, nor can I enable some Npgsql related assembly as a plugin. I'm referring to this: http://www.npgsql.org/doc/types/nts.html where it states:

// Place this at the beginning of your program to use NetTopologySuite everywhere (recommended)
NpgsqlConnection.GlobalTypeMapper.UseNetTopologySuite();

No-one writing generic code can use this. Is there any other way to enable this? I.e. use something as the default perhaps? (which might be a good idea, it's better to pick a type as the default and return geometry values as that type instead of solely rely on plugins which have to be setup using Npgsql specific code)

TIA.

@austindrenski
Copy link
Contributor

This is a bit out of my wheelhouse, but I took a look at the stack trace in the linked thread:

Generates the following error. The field 'LatLong' is mapped to Object at the designer so it should be OK.

NotSupportedException: The field 'LatLong' has type 'public.geometry', which is currently unknown to >Npgsql. You can retrieve it as a string by marking it as unknown, please see the FAQ.
Npgsql.NpgsqlDefaultDataReader.GetValue(int ordinal) in NpgsqlDefaultDataReader.cs
Npgsql.NpgsqlDataReader.GetValues(object[] values) in NpgsqlDataReader.cs
SD.LLBLGen.Pro.ORMSupportClasses.EntityMaterializerBase.MaterializeAsync(Func<IDataReader, object[], Exception, bool> valueReadErrorHandler, CancellationToken cancellationToken)
SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterCore.ExecuteMultiRowRetrievalQueryAsync(IRetrievalQuery queryToExecute, IEntityFactory2 entityFactory, IEntityCollection2 collectionToFill, IFieldPersistenceInfo[] fieldsPersistenceInfo, bool allowDuplicates, IEntityFields2 fieldsUsedForQuery, CancellationToken cancellationToken)
SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterCore.FetchEntityCollectionInternalAsync(QueryParameters parameters, CancellationToken cancellationToken)
SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterCore.FetchEntityCollectionAsync(QueryParameters parameters, CancellationToken cancellationToken)
SD.LLBLGen.Pro.ORMSupportClasses.DataAccessAdapterBase.ExecuteWithActiveRecoveryStrategyAsync(Func<Task> toExecute, CancellationToken cancellationToken)
SD.LLBLGen.Pro.LinqSupportClasses.LLBLGenProProvider2.ExecuteEntityProjectionAsync(QueryExpression toExecute, CancellationToken cancellationToken)
SD.LLBLGen.Pro.LinqSupportClasses.LLBLGenProProviderBase.ExecuteExpressionAsync(Expression handledExpression, Type typeForPostProcessing, bool performPostProcessing, CancellationToken cancellationToken)
SD.LLBLGen.Pro.LinqSupportClasses.LLBLGenProProviderBase.PerformExecuteAsync(Expression expression, Type typeForPostProcessing, bool performPostProcessing, CancellationToken cancellationToken)
SD.LLBLGen.Pro.LinqSupportClasses.LLBLGenProProviderBase.ExecuteAsync<TResult>(Expression expression, CancellationToken cancellationToken)
SD.LLBLGen.Pro.LinqSupportClasses.LLBLGenProQuery<T>.ExecuteAsync<TResult>(CancellationToken cancellationToken)
SD.LLBLGen.Pro.LinqSupportClasses.QueryableExtensionMethods.ToListAsync<TResult>(IQueryable<TResult> source, CancellationToken cancellationToken)

The Npgsql FAQ suggests a method for retrieving these as string values.

You can also specify text only for some columns in your resultset:

using (var cmd = new NpgsqlCommand(...)) {
 // Only the second field will be fetched as text
 cmd.UnknownResultTypeList = new[] { false, true };
 var reader = cmd.ExecuteReader();
 // Read everything as strings
}

Would that work in this case?

More generally, does LLBLGen allow for users to specify driver configurations?

/cc @YohDeadfall

@FransBouma
Copy link
Contributor Author

FransBouma commented Jul 11, 2018

We have a generic pipeline and don't reference ADO.NET providers directly, we use DbProviderFactory to instantiate instances, so what you suggest can't be done directly (as the connection is of type DbConnection, not NpgsqlConnection as we don't have a reference to Npgsql directly!).

We do have a 'workaround' way of re-trying the read with a method that's called when the pipeline catches an exception during the GetValues() call: the user can override that method, gets the reader and can e.g. cast the connection to NpgsqlConnection (if they take a dependency on npgsql that is) and do the dance you suggest.

This is however terribly inefficient as it is called after an exception is called. It's also odd that this is even necessary because if the user someone manages to enable the PostGIS plugin, it can retrieve the data properly. Not to mention it's very awkward to have it this way as the query pipeline is used with many queries and you then have to make sure the override you made works with the query that fails. This is a terrible stopgap for this problem and I don't want to suggest this to anyone, only as a last resort. (it was mainly implemented to overcome decimal overflows for oracle, which can be done with generic code, but this is a different case)

IMHO it then should be possible to have a default which pulls the spatial data into a default type (so a plugin that's enabled by default, so no more pesky exceptions no-one can use) so data reads simply work as expected. Like I said, what type the data comes back in, be it string, an array of bytes etc. I don't care, we can convert it to whatever we want, but we can't convert an exception ;) :)

So a default plugin, very basic, which pulls the data in a default type so no configuration needed, is IMHO the best way: people don't have to config anything, it works with DbProviderFactory created objects and existing generic code can work as it has always done without needing a stopgap exception for only Npgsql.

I know spatial types in .NET are a general nightmare and a massive clusterf*ck as there are several libs with spatial types and various pieces of .NET use different libs and can't agree on this. In that light I understand Npgsql picked a system which allows the user to use whatever library they want, which is great, but at the same time, it's not really useful for users who have a generic pipeline and use DbProviderFactory and generic code with Npgsql.

@austindrenski
Copy link
Contributor

austindrenski commented Jul 11, 2018

@FransBouma wrote: (emphasis added)

the user can override that method, gets the reader and can e.g. cast the connection to NpgsqlConnection (if they take a dependency on npgsql that is) and do the dance you suggest.

If the user is connecting to PostgreSQL with Npgsql, wouldn't they have already taken on the dependency?

@FransBouma wrote:

IMHO it then should be possible to have a default which pulls the spatial data into a default type (so a plugin that's enabled by default, so no more pesky exceptions no-one can use) so data reads simply work as expected. Like I said, what type the data comes back in, be it string, an array of bytes etc. I don't care, we can convert it to whatever we want, but we can't convert an exception ;) :)

@roji @YohDeadfall Given the new plugin system, is there any reason to not have the geo types map to string by default? (I'm thinking about the EF Core stopgap of mapping JSON to string.)

@YohDeadfall
Copy link
Contributor

It's not trivial to convert a WKB result to the corresponding WKT and it's much better to leave the job to specialized handlers/libraries. But I agree that it must be possible to retrieve data even there is no handler for the type. Currently it's possible for values stored in a text format, but not in a binary. I think that we should return a byte array at least when the field has a binary representation.

@FransBouma Will it solve your problem?

@austindrenski
Copy link
Contributor

austindrenski commented Jul 11, 2018

@YohDeadfall Ah, I assumed the wire protocol sent the WKT (in bytes) rather than the WKB.

I like the idea of returning at least a byte[], but I'm still a little confused about what blocks the normal configuration route.

@FransBouma Since the plugin can be configured globally, shouldn't a user application be able to call this at startup without altering any of their/your ORM code?

NpgsqlConnection.GlobalTypeMapper.UseNetTopologySuite();

It would require end users to know that a type plugin is required, but is that really a burden?

@FransBouma
Copy link
Contributor Author

FransBouma commented Jul 11, 2018

@austindrenski Hmm, that is indeed a good point. So in user code, they call UseNetTopologySuite(), and the ORM then utilizes that as a static variable is already initialized. It's worth a shot.

@YohDeadfall Binary, string, if the format is known, we can convert it to whatever other type using a type converter (same principle as Value converter in EF). For other users this too might be a good idea, i.e. have a default type, like the same format that was returned by v2? (using a plugin that's enabled by default unless UseNetTopologySuite() is called?)

I'll report back to the user who ran into this, the global call is the first thing to try.

If the user is connecting to PostgreSQL with Npgsql, wouldn't they have already taken on the dependency?

Not directly, as we're using DbProviderFactory, so there's no hard dependency on Npgsql, that's done through DbProviderFactory. Admitted, if the user uses .net core, then this is a moot point (as they have to pass the DbProviderFactory instance to our system) but nevertheless, the rest of the ORM (any other (micro)ORM has the same thing) is generic code that doesn't have ado.net provider specific code that's tied to a hard dependency (only small details like some properties you set through reflection)

@dodyg
Copy link

dodyg commented Jul 12, 2018

It works! This is on ASP.NET Core 2.1 (.NET Core)

geography-supports

Thank you all so much.

@FransBouma
Copy link
Contributor Author

Solved, by the trick brought forward by @austindrenski: at startup set the right plugin, as it's in user code, and a dependency on npgsql at that level is likely so it's not really a problem for most.

Still, there's room for improvement, e.g. a default plugin instead of an exception, for the situation where there's no dependency on npgsql possible.

@roji
Copy link
Member

roji commented Jul 12, 2018

Am coming a bit late to this discussion (which started only 19 hours ago!).

All the above comments are right (it's great to feel a bit redundant on Npgsql :)). @austindrenski's suggestion is the right one IMHO - the user sets up an Npgsql plugin without the ORM needing to be aware of anything.

However, it's true that this isn't a 100% complete solution. I have at least one other case when Npgsql isn't being used by a program, but rather as a GAC-installed ADO.NET driver from an existing program such as PowerBI or Excel. In this kind of scenario there simply isn't anywhere to place the plugin initialization code, so all you have is what Npgsql supports out of the box.

@FransBouma, you suggest to have a default plugin - this would solve the above. However, part of the idea of plugins is to not force extra dependencies on Npgsql users which they don't need or want (and which can cause dependency hell with their own dependencies). I think that forcing all Npgsql users to suddenly depend on NetTopologySuite would raise a lot of complaints (I for one wouldn't want that as a developer). But if this is an issue for enough people we can always have the option of bringing the NetTopologySuite plugin into Npgsql itself.

I'm definitely open if anyone has any other suggestions - specifically the PowerBI/Excel case still has no solution as far as I know.

Some final remarks regarding text and binary encoding... PostgreSQL does support both text and binary encoding for each type. Npgsql (by default) is a binary-only driver - values are always passed in binary on the wire. This is better for speed, and also saves us from dealing with variations in the textual representation based on session locale and other things (the binary representation is always the same). Npgsql 2 used to do text encoding and it a huge pain (as well as inefficient).

Now, it's still possible to force text transfer, you can use a special Npgsql-specific API, which tells Npgsql to request results in text instead of binary. This is obviously out of bounds to a generic ORM which doesn't know about Npgsql, and in general is only meant to be a rarely-used means of fetching a data type in text which Npgsql (and now its plugins) don't yet support. Even if an ORM could do this, it would mean the the user gets a string property with WKT which they then have to parse - that's not an optimal situation both in usability and perf.

@roji
Copy link
Member

roji commented Jul 12, 2018

Oh, one more comment on the suggestion to allow access as byte[]. There actually is an (undocumented) plugin called Npgsql.RawPostgis, which exposes PostGIS values as raw byte[]. I haven't documented it yet because I honestly wasn't sure whether it should exist or not... As a general rule, one should be able to use NetTopologySuite to read PostGIS values and proceed from there, converting to any format necessary etc. Note also that PostGIS doesn't exactly return WKB, but rather EWKB, which IIRC isn't 100% compatible.

It's technically possible to make this plugin the default behavior inside Npgsql, so byte[] would be returned by default (as @YohDeadfall suggested). I'm not sure this is such a good idea - it would definitely confuse new users who don't yet know how spatial works, sending them to parse the binary data themselves (I'd rather have the current exception that sends them to the documentation or to stackoverflow).

However, we do have the option of supporting byte[]. This would mean that untyped access (GetValue()) still throws like today (which I think is right), but GetFieldValue<byte[]>() would work. While this probably wouldn't help LLBLGen (which AFAIK uses untyped access), it would at least allow everyone unrestricted access to the raw value transmitted by PostgreSQL. We could even support this for any data type and not just PostGIS, just in case somebody's interested. But I think this would be a very exotic feature with very limited use, and would rather wait for it to be requested.

@FransBouma
Copy link
Contributor Author

@roji I think if GetFieldValue<> is required it's not going to happen anyway. My idea was to have a text based plugin as default which returns the data equivalent to v2. So if I call GetValues() on a reader and no plugin is activated, I get the default, which is v2 text for the geometry. This likely sucks in a lot of cases, but IMHO it's better than an exception which is unclear for people who use npgsql through an orm.

That said, I have no idea how many people will run into this regardless, it might be a very small group, so the time needed to implement all this might be better put elsewhere. I'm at least glad we have a proper way to enable it now in our system :)

@roji
Copy link
Member

roji commented Jul 13, 2018

My idea was to have a text based plugin as default which returns the data equivalent to v2. So if I call GetValues() on a reader and no plugin is activated, I get the default, which is v2 text for the geometry.

Unfortunately this isn't how PostgreSQL works... When you execute a query, you need to specify upfront, column-per-column, whether you want the results in binary or text. As I wrote above, unless you use a special Npgsql-specific API, Npgsql will always request binary on your behalf. So unless you use that API (which LLBLGen will never do), then by the time you're trying to call GetValues(), the query has already been sent requesting binary results, and there's no way to turn those binary results into a string representation - it's too late (unless Npgsql converts the binary to text internally, which means including an entire WKB->WKT decoder/encoder pipeline inside Npgsql... not a good idea).

There have been discussions about allowing clients to predefine, at the PostgreSQL session level, which types they want to receive in binary and which they want in text. This would remove the need to specify it on each separate query, allowing it to be configured once on startup. In theory, if this protocol capability existed, Npgsql could by default request to have geometry/geography in text, and only if a plugin is used we change that to requesting binary. But this is a nothing more than theoretical idea at this point, and would be a PostgreSQL feature rather than an Npgsql one. The only conceivable thing we can do is expose the binary data received from PostgreSQL (EWKB format) as a byte[], like I wrote above, and let the user deal with it - but I'm against that option for the reasons described above. In any case the user would need something like NetTopologySuite to decode the EWKB, so why not just use the plugin and have Npgsql do it for you...

To summarize, I can't really think of any improvement to the way things work at the moment (but would be glad to hear of any ideas).

@FransBouma
Copy link
Contributor Author

The default, built-in plugin I proposed sits at the same level as the NetTopologySuite plugin, so it gets the binary data and returns it in text form. Disclaimer: I might be talking complete nonsense here, but that's how I imagined it could work. the text form is in 1 format only, and if people need a different format, then they can use a different plugin, the default at least gives a value that's more workable than a binary blob.

I assume that's how the current plugins work too? They get the binary blob and transform it into an instance of various types depending on the data/column type?

@roji
Copy link
Member

roji commented Jul 13, 2018

The default, built-in plugin I proposed sits at the same level as the NetTopologySuite plugin, so it gets the binary data and returns it in text form. Disclaimer: I might be talking complete nonsense here, but that's how I imagined it could work. the text form is in 1 format only, and if people need a different format, then they can use a different plugin, the default at least gives a value that's more workable than a binary blob.

You're not talking complete nonsense at all :) First, if you're imagining a plugin that's at the same level as the NetTopologySuite plugin (so external to Npgsql!), then it has the same issues as NetTopologySuite plugin (or any other plugin) - you have to wire it up somewhere, which would force users to reference Npgsql and invoke some Npgsql-specific API (and excludes scenarios like PowerBI/Excel). If you go down this path, you may as well tell your users to use NetTopologySuite, which will happily provide you with the textual representation of the PostgreSQL geometry data.

The other option is to do the binary-to-text conversion inside Npgsql, in which case it's no longer a plugin (it's just like any other type that Npgsql can read natively, like date/time types). The problem here is how to implement it... To be sure this is clear, PostgreSQL doesn't send spatial data as strings encoded in UTF-8 - it really is a binary data encoding (called EWKB). Converting the binary data to text form (WKT) is a really non-trivial task, which as @YohDeadfall said should be done by a specialized spatial library (such as NetTopologySuite). If we implement EWKB->WKT conversion inside Npgsql, we'll have created a partial spatial implementation, which is something we want to avoid. Before 4.0 Npgsql actually had some spatial types and implement EWKB decoding itself (this is now available as the Npgsql.LegacyPostgis plugin), and part of the idea was to get rid of that: a spatial implementation isn't something that belongs in a database driver IMHO (that's what NTS is for).

Hope this clarifies everything (and also that I understood what you were proposing).

I assume that's how the current plugins work too? They get the binary blob and transform it into an instance of various types depending on the data/column type?

That's exactly right... reading an int is taking a 4-byte segment and decoding it to an int. The problem as I said above is that decoding PostGIS binary data (EWKB) and then reencoding it as text (WKT) is much more complicated than that.

@austindrenski
Copy link
Contributor

@roji wrote:

First, if you're imagining a plugin that's at the same level as the NetTopologySuite plugin (so external to Npgsql!), then it has the same issues as NetTopologySuite plugin (or any other plugin) - you have to wire it up somewhere, which would force users to reference Npgsql and invoke some Npgsql-specific API (and excludes scenarios like PowerBI/Excel).

(emphasis added)

Just to throw some spaghetti at the wall:

Is there anything we could do to support the PowerBI/Excel scenarios by offering a configuration file (e.g. ~/.npgsql) to dynamically load plugins?

@roji
Copy link
Member

roji commented Jul 15, 2018

Is there anything we could do to support the PowerBI/Excel scenarios by offering a configuration file (e.g. ~/.npgsql) to dynamically load plugins?

We could definitely do something like that... We could possibly even look at all assemblies in the GAC which start with Npgsql.* and dynamically load those - or look in a specific directory in your homedir, or via a config file as you suggest.

I'm always a bit stressed out by such auto-load mechanisms, especially for a library - someone dropping the right library in the right place could be a security problem. It's not a real or final argument or anything, just unease.

Either way I'd rather wait for people to complain first. Right now I only have the CrateDB folks, who have a plugin and really do want to use PowerBI, but I haven't received any other sort of complain - likely because programs such as Excel or PowerBI don't require the specialized types which the plugins provide. At least that's my guess.

Bottom line, I'd rather have people request something like this before going ahead and implementing it.

@austindrenski
Copy link
Contributor

We could possibly even look at all assemblies in the GAC which start with Npgsql.* and dynamically load those - or look in a specific directory in your homedir, or via a config file as you suggest.

I'm always a bit stressed out by such auto-load mechanisms, especially for a library - someone dropping the right library in the right place could be a security problem. It's not a real or final argument or anything, just unease.

Agreed on the unease, just wanted to put the idea out there. If there's ever a move to implement something like this, I would vote for constraining it to only plugins explicitly identified in a ~/.npgsql file.

@roji
Copy link
Member

roji commented Jul 15, 2018

Sounds good @austindrenski, let's keep it in mind in case this turns out to be a real issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants