Add type check to file writer and memory layer #37039
Before continuing the review, do we really want this? What do we win? Besides the memory provider, there are also sqlite and likely other providers that support generic values. I know that "we" GIS people normally tend to think in well defined data structures, but I can imagine that we are preventing legit workflows. |
@m-kuhn legit question that I've asked myself as well, and my answer is yes. What I'm really fixing here is the use case where a feature containing an incompatible attribute value is silently saved into a destination datasource without any error or warning. This leads to data loss and/or corruption, which cannot be accepted in any serious data processing workflow.
That is the wrong question: I would ask "what do we lose if we don't do that?" What I'm really fixing here is the use case where a feature containing an attribute value that is not compatible with a destination datasource is silently discarded or, worse, saved with a wrong value without any error or warning. This leads to data loss and/or corruption, which cannot be accepted in any serious data processing workflow.
I'm not trying to prevent saving (say) a datetime into a string (this is OK and tested here: https://github.com/qgis/QGIS/pull/37039/files#diff-947ce79ae5af26c9e67b3690e7106485R719). I'm dealing with impossible conversions that cause data loss or corruption (for example, a float with a fractional part to an integer). In any event, this should not normally happen: we are dealing with misconfigurations or programming errors in plugins.
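The distinction drawn above, between harmless conversions (datetime to string) and lossy ones (123.45 to integer), can be sketched in plain Python. This is only an illustration of the idea, not the check implemented in the PR: `is_lossless` is a hypothetical helper, and Python types stand in for QVariant field types.

```python
from datetime import datetime


def is_lossless(value, target_type):
    """Return True if `value` can be stored as `target_type` without losing data.

    Hypothetical helper for illustration only; it is not part of QGIS.
    """
    if value is None:
        return True  # NULL is treated as compatible with every field type
    if target_type is str:
        return True  # every value, including a datetime, has a string form
    if target_type is int:
        if isinstance(value, int):
            return True
        if isinstance(value, float):
            # 123.0 fits in an integer field, 123.45 does not
            return value.is_integer()
        return False
    if target_type is float:
        return isinstance(value, (int, float))
    return isinstance(value, target_type)


print(is_lossless(datetime(2020, 6, 1), str))  # datetime into a string field
print(is_lossless(123.45, int))                # fractional float into an integer field
```

Under this rule only conversions that would drop information are refused; widening and string conversions pass through.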
I fully agree on the problem and that the writer should transparently raise warnings/errors in such situations. That shouldn't affect the memory layer code though? |
Correct: I'm fixing two separate issues here:
they are related but not the same:
Is that the right level to solve this problem though?
I don't know, any better idea? referring to 1. and 2. above:
Is your intention to use the memory provider as a generic QVariant loose typed storage?
For the overhead, yes, that's the price we pay for robustness. |
Not at all right now, however ...
... so, what do we win with this?
Stick to what we have right now and let individual providers fix up things, one by one? I might not have the complete picture of the situation, but it looks like it's solving an edge case with a sledgehammer. |
In general I agree with @elpaso's thoughts that we should not allow storage of e.g. a completely non-numeric text string "abc" in a numeric field. I disagree about blocking double storage in int fields though (or similarly, datetime storage in date fields, date storage in datetime fields, etc). I think there are legitimate times we want to allow this (not least of which would be that using the field calculator should not force users to wrap an expression like
TBH - it seems like a big driver for this is making sure that 3rd party plugins do things correctly. And I wonder if we couldn't better achieve the same results in a more developer-friendly way by throwing AttributeError exceptions at the time values are being stored in a feature. E.g.
This would ultimately allow these issues to be caught right at the location of the error, making things much easier for PyQGIS devs to debug and fix. (On the same note, for 4.0 we should consider raising exceptions instead of returning False to addFeatures if the addFeatures operation fails!) |
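A rough sketch of what raising at assignment time could look like, in plain Python. `TypedFeature` is a hypothetical stand-in and not the PyQGIS `QgsFeature` API; the compatibility rule (a bare `isinstance` check) is deliberately crude and only serves to show where the exception surfaces.

```python
class TypedFeature:
    """Hypothetical feature with typed fields (illustration only, not PyQGIS)."""

    def __init__(self, fields):
        self._fields = dict(fields)  # field name -> expected Python type
        self._values = {name: None for name in self._fields}

    def __setitem__(self, name, value):
        expected = self._fields[name]
        # Fail where the bad value is assigned, not later at commit time.
        if value is not None and not isinstance(value, expected):
            raise AttributeError(
                f"field {name!r} expects {expected.__name__}, "
                f"got {type(value).__name__}: {value!r}"
            )
        self._values[name] = value

    def __getitem__(self, name):
        return self._values[name]


feature = TypedFeature({"count": int})
feature["count"] = 3          # accepted
try:
    feature["count"] = "abc"  # rejected at the point of assignment
except AttributeError as err:
    print(err)
```

With this shape, the traceback points at the line in the plugin that stored the wrong value, rather than at a later `addFeatures` call returning False.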
Big +1 for exception raising, if only SIP was more friendly with exception handling :\ let's see how shiboken will deal with that. |
@alexbruy would you please have a look at the failing processing tests? I checked one of the issues ( |
@alexbruy Travis is drunk as usual at this time of the day; you can have a look at this previous build: https://travis-ci.org/github/qgis/QGIS/builds/696353705#L1238
@alexbruy thank you! |
cb8aebc
to
fc44318
@alexbruy there are still some failures, but they seem to be just rounding errors. Mind having a look? If they are rounding errors, what would you suggest? Is there a way to make the tests more tolerant of rounding errors, or should I update the reference files (in that case I would appreciate some instructions on how to do that for processing tests)?
@nyalldawson @m-kuhn: ok, no sledgehammer. I'm now using the QgsField convertCompatible implementation to check for type compatibility, which means that narrowing casts and loss of precision are allowed. I'm still uncomfortable with a processing toolchain that silently accepts double -> int conversions, but as long as it is clearly documented I think we can live with that. For use cases that require stricter checking, it might be worthwhile to add a global setting that activates stricter type checking, but that's out of scope. (I'm closing the open comments as resolved because they no longer apply.)
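To make the relaxed rule concrete, here is a minimal plain-Python sketch of a lenient conversion: narrowing (double to int) succeeds with loss of precision, while impossible conversions ("abc" to int) are still rejected. `lenient_convert` is a hypothetical helper; the exact rounding behavior of QgsField's convertCompatible is not stated in this thread, so rounding to the nearest integer is an assumption made here.

```python
def lenient_convert(value, target_type):
    """Return (ok, converted_value).

    Hypothetical helper, not the QGIS implementation: narrowing is allowed,
    impossible conversions are reported with ok=False.
    """
    if value is None:
        return True, None  # NULL passes through unchanged
    if target_type is int and isinstance(value, int):
        return True, value
    try:
        if target_type is int:
            # Assumption: round to nearest rather than truncate.
            return True, int(round(float(value)))
        return True, target_type(value)
    except (TypeError, ValueError, OverflowError):
        return False, None


print(lenient_convert(123.45, int))  # narrowing is accepted
print(lenient_convert("abc", int))   # impossible conversion is rejected
```

A stricter global setting, as mentioned above, would amount to additionally refusing the first case.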
Oh, in my setup local test runs without errors. Seems these are rounding errors probably related to some differences in the Travis and local setup. Tried to reduce precision a bit in #37097, let's see if it helps |
4446e88
to
0b5ffb0
@alexbruy here is another bunch of rounding errors. Could you please check whether they are real issues or whether the tests can be adapted?
@alexbruy never mind, I'm fixing it myself.
@elpaso sorry, was away for several days and missed all emails/notifications. |
@alexbruy no problem! Your first patch pointed me to the right direction! |
841936e
to
c0c35a5
Fixes qgis#36715. Adds a method to check for QVariant conversions, and also checks for integral type narrowing so that, for example, floating point 123.45 does not get downcast to an integer without raising an error.
Co-authored-by: Matthias Kuhn <matthias@opengis.ch>
Long story short: calling provider's addFeatures is implemented for some providers in a way that will roll back all changes on errors, leaving the backend storage unchanged.
Adding a QgsFeatureSink flag to control this behavior allows certain providers to support partial feature addition.
The issue comes from QgsVectorDataProvider::commitChanges that is documented to leave the provider unchanged (roll back) on any error, giving the client code the possibility to fix errors (in the editing buffer) and re-commit.
Without a full rollback implementation in the memory provider and after the type check introduction in this PR we ended up with situations like this:
vl = ... an empty memory layer
self.assertTrue(vl.addFeatures([valid, invalid]))
self.assertFalse(vl.commitChanges())
self.assertEqual(vl.featureCount(), 1) <--- fails!
We actually had 3 features from vl.getFeatures(): [valid, invalid, valid] (the first from the provider, the second and third from the editing buffer).
On the other hand, QgsFeatureSink would probably assume that addFeatures will allow partial additions.
BTW: This is for sure the longest commit message I've ever written.
c0c35a5
to
a733f07
@m-kuhn @nyalldawson are you cool to merge now, or had we better wait?
Let's delay -- it's not a regression, and release is only a couple of days away.. |
Fixes #36715
Started as a simple fix and down into the rabbit hole of provider inconsistencies ...
This monster PR is now about making the memory provider behave like (some? many? who knows?) other providers with respect to type checking and commit changes.
From the commit message:
Long story short: calling provider's addFeatures
is implemented for some providers in a way that
will roll back all changes on errors, leaving
the backend storage unchanged.
Adding a QgsFeatureSink flag to control this
behavior allows certain providers to support
partial feature addition.
The issue comes from QgsVectorDataProvider::commitChanges
that is documented to leave the provider unchanged (roll
back) on any error, giving the client code the possibility
to fix errors (in the editing buffer) and re-commit.
Without a full rollback implementation in the memory
provider and after the type check introduction in this
PR we ended up with situations like this:
vl = ... an empty memory layer
self.assertTrue(vl.addFeatures([valid, invalid]))
self.assertFalse(vl.commitChanges())
self.assertEqual(vl.featureCount(), 1) <--- fails!
We actually had 3 features from vl.getFeatures():
[valid, invalid, valid] (the first from the provider
the second and third from the editing buffer).
On the other hand, QgsFeatureSink would probably assume
that addFeatures will allow partial additions.
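The two behaviors described in the commit message (full rollback on error versus partial addition) can be sketched with a toy store in plain Python. `MemoryStore` and its `rollback_on_error` parameter are hypothetical names chosen for illustration; they are not the QGIS memory provider or the actual QgsFeatureSink flag.

```python
class MemoryStore:
    """Toy single-field store (illustration only, not the QGIS memory provider)."""

    def __init__(self, field_type):
        self.field_type = field_type
        self.features = []

    def add_features(self, values, rollback_on_error=True):
        """Add values; return False if any value fails the type check."""
        accepted = []
        ok = True
        for value in values:
            if isinstance(value, self.field_type):
                accepted.append(value)
            else:
                ok = False
        # With rollback enabled, one bad value leaves the store untouched;
        # without it, the valid values are kept (partial addition).
        if ok or not rollback_on_error:
            self.features.extend(accepted)
        return ok


store = MemoryStore(int)
print(store.add_features([1, "bad"]), store.features)
print(store.add_features([1, "bad"], rollback_on_error=False), store.features)
```

The first call mirrors what commitChanges is documented to need (storage unchanged on error); the second mirrors what a sink that tolerates partial additions would expect.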