More heuristics to determine whether to iterate over target source #45339
bdc32da
to
dcfcfec
Compare
CI is failing for unrelated reasons; it is also failing on the master branch.
The test I've used in #45195 (comment), which is 43336 linestring features filtered by 10000 points (13,382 extracted), has these times: 75.63 seconds (1 minute 16 seconds) without this PR. I'd like to run another test, because it isn't clear to me whether the 10.28 timing was taken on a second run of the same algorithm, which could mean the geometries were already prepared.
I confirm that master, even on the second run of the same algorithm, still takes over 1 minute (about 1 minute 16 seconds).
I just tried again with the same input (~43k lines and 10k points, ~13k lines extracted with distance 0.05): 9.93 seconds with this PR. The only concern is that with the PR we have 13006 features extracted, while without it we have 13380. @nyalldawson, any idea?
I've found that dropping the setLimit(1) on the request fixes the issue, resulting in 13382 rows returned (as with master; I got my previous count wrong, master always matched ST_DWithin from PostGIS). So it looks like setLimit(1) ends up being applied at the wrong time (i.e. not finding the first reference geometry within distance, but some other first...).
Enabling SQL logging, I see that for each target feature a query like this is executed:
WHERE "p" && st_makeenvelope(13.10861315090370205,64.87761437708003598,13.20866626328804472,64.9776440982121386,4258)
Such a query makes a bounding-box comparison, which may be true for points which are NOT within the given distance. This could explain why
There's no way to predict how many features should be extracted with the bounding-box based operator, so we can't limit in that case (or the limit should be applied later). With OPERATOR
Is this problem worth a separate ticket? I suspect it can also be triggered without this PR (at least when targetSource iteration is selected, which is when target has fewer features than source, if I understand correctly).
Further debugging shows that ST_DWithin is never used on the PostgreSQL side, although I do see code intended to use it when Qgis::SpatialFilterType::DistanceWithin is used in the iterator, so something is problematic there as well (is the algorithm not passing the DistanceWithin spatial filter?).
The problem seems to be that spatial filters are mutually exclusive, so the PostgreSQL FeatureIterator ends up finding the BoundingBox filter and NOT the DistanceWithin filter:
This explains why setLimit(1) breaks the algorithm.
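To make the effect described above concrete, here is a minimal standalone sketch. It is not QGIS or PostGIS code: `Point`, `inBoundingBox`, `withinDistance` and the two `match...` helpers are hypothetical names, and it assumes the caller re-checks the exact distance on whatever single feature the limited request returns. Under those assumptions, limiting a bounding-box filter to one row can return a candidate that fails the exact test and hide a real match further down the list.

```cpp
#include <cmath>
#include <vector>

// Hypothetical 2D point type, for illustration only (not a QGIS class).
struct Point { double x; double y; };

// Bounding-box test: p lies inside the square of half-width d centred on c.
// This plays the role of the && operator seen in the logged query.
bool inBoundingBox(const Point &c, const Point &p, double d)
{
  return std::fabs(p.x - c.x) <= d && std::fabs(p.y - c.y) <= d;
}

// Exact test: p is within Euclidean distance d of c.
bool withinDistance(const Point &c, const Point &p, double d)
{
  return std::hypot(p.x - c.x, p.y - c.y) <= d;
}

// Limit applied to the bounding-box candidates: only the first candidate
// is fetched and then checked exactly, so a later real match is missed.
bool matchWithEarlyLimit(const Point &c, const std::vector<Point> &refs, double d)
{
  for (const Point &p : refs)
    if (inBoundingBox(c, p, d))
      return withinDistance(c, p, d); // stops at the first bbox hit
  return false;
}

// Limit applied after the exact test: stop at the first true match.
bool matchWithLateLimit(const Point &c, const std::vector<Point> &refs, double d)
{
  for (const Point &p : refs)
    if (inBoundingBox(c, p, d) && withinDistance(c, p, d))
      return true;
  return false;
}
```

With a centre at (0, 0), a distance of 1 and reference points (0.9, 0.9) and (0.5, 0), the first point is inside the box but outside the circle, so the early-limit variant reports no match while the late-limit variant finds the second point.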
I've filed #45352 to deal with the bug in the algorithm failing to extract all features.
dcfcfec
to
5bb1ee0
Compare
5bb1ee0
to
74b651c
Compare
//
// Possible reasons to iterate over target are considered here
//
do
Why not just "if/else if" here? The loop approach is a bit odd to me.
Given that we are looking for possible reasons to force a non-default iteration model, and there may be very many reasons to do so, an "if/else" chain could become very hard to read.
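For readers unfamiliar with the idiom being debated, the following is a small sketch of a single-pass `do { ... } while ( false )` block next to the equivalent "if/else if" chain. It is not the PR's code: `Checks`, `firstReason`, `secondReason` and both functions are hypothetical placeholders used only to show that the two shapes make the same decision.

```cpp
// Hypothetical flags standing in for "reasons" to leave the default model.
struct Checks
{
  bool firstReason;
  bool secondReason;
};

// Single-pass loop: each reason is one statement that ends the search
// with break, so adding a reason means adding one line.
bool forcedWithLoop(const Checks &c)
{
  bool forced = true;
  do
  {
    if (c.firstReason) break;
    if (c.secondReason) break;
    // further reasons would be listed here, one per line
    forced = false; // reached only when no reason applied
  } while (false);
  return forced;
}

// The same decision written as an if / else if chain.
bool forcedWithChain(const Checks &c)
{
  if (c.firstReason)
    return true;
  else if (c.secondReason)
    return true;
  return false;
}
```

Both functions return the same value for every combination of flags; the difference is purely one of readability as the list of reasons grows.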
do
{
  // reference needs reprojection, so we cannot iterate over target
  if ( targetSource->sourceCrs() != referenceSource->sourceCrs() ) break;
This logic has changed -- if the target crs != reference crs then we have to iterate over the target instead.
  if ( targetSource->sourceCrs() != referenceSource->sourceCrs() ) break;

  // distance is dynamic, so we cannot iterate over target
  if ( distanceProperty.isActive() ) break;
Again, this one should force iterating over target.
Thanks Nyall for pointing this out in qgis#45339 (comment) qgis#45339 (comment)
Thanks @nyalldawson, I've pushed a commit fixing the inverted conditions. The code is even more readable now, as each of the conditions forces iteration on target, although I'd like those comments to be more specific as to why target MUST be used in some cases (i.e. different CRSs or dynamic distance).
a2be482
to
1cb502a
Compare
Now, I'm still trying to figure out HOW it is possible that a QgsFeatureRequest constructed with spatialFeatureType()==2
Time to run a debugger to figure out what's really going on in there!
Thanks @strk!
Thank you @nyalldawson, but I'm afraid this change is exposing the bug reported in #45352 -- anyway, I will tackle it there (or in the associated PR #45384).
Following up on the performance tests reported in #45195 (comment), this PR adds more heuristics to determine how to iterate over target and reference sources, preferring to iterate over the source that is NOT of the POINT type.
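As a rough illustration of the kind of decision described here, the sketch below combines the rules mentioned in this conversation. It is not the merged implementation: `SourceInfo`, `IterateOver` and `chooseIteration` are hypothetical names, and both the order in which the rules are checked and the feature-count fallback are assumptions pieced together from the review comments.

```cpp
// Which source the algorithm loops over (hypothetical enum).
enum class IterateOver { Target, Reference };

// Hypothetical summary of a source, for illustration only.
struct SourceInfo
{
  bool isPoint;           // true when the geometry type is POINT
  long long featureCount; // number of features in the source
};

IterateOver chooseIteration(const SourceInfo &target,
                            const SourceInfo &reference,
                            bool crsDiffers,
                            bool distanceIsDynamic)
{
  // Per the review: differing CRSs or a dynamic distance force
  // iteration over the target.
  if (crsDiffers || distanceIsDynamic)
    return IterateOver::Target;

  // Per the PR description: prefer iterating over the source that is
  // NOT of the POINT type, when only one of the two is a point source.
  if (target.isPoint != reference.isPoint)
    return target.isPoint ? IterateOver::Reference : IterateOver::Target;

  // Assumed fallback: iterate over whichever source has fewer features.
  return target.featureCount < reference.featureCount
           ? IterateOver::Target
           : IterateOver::Reference;
}
```

With a linestring target of 43336 features and a point reference of 10000 features, same CRS and a fixed distance, this sketch picks the target, because it is the non-point source.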