More heuristics to determine whether to iterate over target source #45339

strk · 2021-09-30T10:30:40Z

Following up performance tests reported in #45195 (comment) this PR is adding more heuristics to determine how to iterate over target and reference sources by preferring to iterate over the source NOT being of the POINT type.

strk · 2021-09-30T11:13:29Z

CI is failing for unrelated reasons. Also failing in master branch

strk · 2021-09-30T14:24:37Z

The test I've used in #45195 (comment), which is 43336 linestring features filtered by 10000 points (13,382 extracted), has these times:

75.63 seconds (1 minute 16 seconds) -- without this PR
10.28 seconds -- with this PR

I'd like to run another test because it isn't clear to me if the 10.28 timing was taken on second run of the same algorithm, which could mean geoms were already prepared.

strk · 2021-09-30T14:26:01Z

I confirm that master, even on the second run of the same algorithm, still takes over 1 minute (~1m16secs)

strk · 2021-09-30T22:44:27Z

I just tried again, same input (~43k lines and 10k points, ~13k lines extracted with distance 0.05):

9.93 seconds with this PR
73.90 seconds without this PR

The only concern is that with the PR we have 13006 features extracted while without we have 13380,
which I can't explain at the moment. Maybe a bug in processByIteratingOverTargetSource ?

@nyalldawson any idea ?

strk · 2021-10-01T07:46:22Z

I've verified with PostGIS ST_DWithin and the number of LINESTRING records having at least one POINT record within the 0.05 distance is 13382, so 2 more than found by QGIS master and 376 more than with QGIS from this PR.

I suspect iterating over target is bogus.

Here's a picture comparing the extracted lines with master and with this PR:

The logs report:

Feature 1900599 in query_points found to be within distance currentDistance 0.05 from feature 41238 of edge_data
No feature in query_points found to be within distance currentDistance 0.05 from feature 41223 of edge_data

Feature 41223 is the green diagonal linestring on the right of the measurement box, which is clearly within 0.05 units of distance from multiple query points.

strk · 2021-10-01T07:58:47Z

I've found that dropping the setLimit(1) on the request fixes the issue, resulting in 13382 rows returned (as with master, I got my previous counting wrong, master always matched ST_DWithin from postgis).

So, it looks like setLimit(1) ends up being applied at the wrong time (ie: not finding the first reference geometry within distance, but some other first...)

strk · 2021-10-01T08:40:57Z

Enabling SQL logging I see that for each target feature, a query like this is executed:

WHERE "p" && st_makeenvelope(13.10861315090370205,64.87761437708003598,13.20866626328804472,64.9776440982121386,4258)

Such query makes a bounding-box comparison, which may be true for points which are NOT within the given distance. This could explain why limit(1) extracts less features because we could be fetching the first reference feature whose bounding box intersects our bounding box expanded by requested distance and later find it to be NOT within the distance.

There's no way to predict how many features should be extracted with the bounding box based operator, so can't limit in that case (or limit shoudl be applied later). With OPERATOR <-> and new enough PostgreSQL version (and PostGIS version) this should instead work.

Is this problem worth a separate ticket ? Because I suspect it can be triggered also without this PR (at least when targetSource iteration is selected, which is when target has less features than source if I understand correctly)

strk · 2021-10-01T09:30:23Z

Further debugging shows that ST_DWithin is never used at the PostgreSQL side, although I do see code intended to use it when Qgis::SpatialFilterType::DistanceWithin is used in the iterator, so something is problematic there as well (algorithm not passing DistanceWithin spatial filter ?)

strk · 2021-10-01T09:39:56Z

The problem seems to be that spatial filters are mutually exclusive, so the PostgreSQL FeatureIterator ends up finding the BoundingBox filter and NOT the DistanceWithin filter:

PostgresFeatureIterator found spatialFilterType being BoundingBox

This explain why setLimit(1) breaks the algorithm

strk · 2021-10-01T10:08:29Z

I've filed #45352 to deal with the bug in the algorithm failing to extract all features.

nyalldawson · 2021-10-14T00:52:36Z

src/analysis/processing/qgsalgorithmdistancewithin.cpp

+  //
+  // Possible reasons to iterate over target are considered here
+  //
+  do


why not just "if/else if" here? The loop approach is a bit odd to me.

Given we are looking for possible reasons to force a non-default iteration model and there may be very many reasons to do so, an "if/else" could become very hard to read

nyalldawson · 2021-10-14T00:54:40Z

src/analysis/processing/qgsalgorithmdistancewithin.cpp

+  do
+  {
+    // reference needs reprojection, so we cannot iterate over target
+    if ( targetSource->sourceCrs() != referenceSource->sourceCrs() ) break;


This logic has changed -- if the target crs != reference crs then we have to iterate over the target instead.

nyalldawson · 2021-10-14T00:55:07Z

src/analysis/processing/qgsalgorithmdistancewithin.cpp

+    if ( targetSource->sourceCrs() != referenceSource->sourceCrs() ) break;
+
+    // distance is dynamic, so we cannot iterate over target
+    if ( distanceProperty.isActive() ) break;


again this one should force iterating over target

Thanks Nyall for pointing this out in qgis#45339 (comment) qgis#45339 (comment)

strk · 2021-10-14T20:54:19Z

Thanks @nyalldawson I've pushed a commit fixing the inverted conditions. The code is even more readable now, as each of the conditions are forcing iteration on target, although I'd like those comments to be more specific as to why target MUST be used in some cases (ie: different CRSs or dynamic distance)

strk · 2021-10-15T15:56:45Z

Now, I'm still trying to figure out HOW it is possible that a QgsFeatureRequest constructed with spatialFeatureType()==2
and passed to QgsFeatureSource->getFeatures() becomes a Request with spatialFeatureType()==1:

XXXX request 0x7fe2f27fa4e0 for target source someData has spatialFilterType 2
XXX QgsFeatureRequest::setFilterRect called on request 0x7fe2d40129a8 having spatialFilterType == 2
XXX QgsPostgresFeatureSource::getFeatures called, request 0x7fe2d4012b18 has spatialFilterType 1

Time to run a debugger to figure out what's really going on in there!

nyalldawson · 2021-10-17T22:33:42Z

Thanks @strk !

strk · 2021-10-18T15:54:59Z

Thank you @nyalldawson but I'm afraid this change is exposing the bug reported in #45352 -- anyway, will tackle it there (or in the associated PR #45384)

strk requested a review from nyalldawson September 30, 2021 10:30

strk self-assigned this Sep 30, 2021

github-actions bot added this to the 3.22.0 milestone Sep 30, 2021

strk mentioned this pull request Sep 30, 2021

Reduce GEOS conversions and preparations of geometries #45195

Merged

strk force-pushed the distance-within-algorithm-heuristic branch 3 times, most recently from bdc32da to dcfcfec Compare September 30, 2021 11:02

strk mentioned this pull request Oct 1, 2021

DistanceWithin algorithm failing with PostgreSQL provider #45352

Closed

2 tasks

strk force-pushed the distance-within-algorithm-heuristic branch from dcfcfec to 5bb1ee0 Compare October 1, 2021 16:09

strk mentioned this pull request Oct 12, 2021

Fix DistanceWithin algorithm with PostgreSQL provider #45384

Closed

strk force-pushed the distance-within-algorithm-heuristic branch from 5bb1ee0 to 74b651c Compare October 12, 2021 17:05

nyalldawson reviewed Oct 14, 2021

View reviewed changes

strk added a commit to strk/QGIS that referenced this pull request Oct 14, 2021

Fix iteration logic for the dinamic distance and reprojection cases

a2be482

Thanks Nyall for pointing this out in qgis#45339 (comment) qgis#45339 (comment)

More heuristics to determine whether to iterate over target source

1cb502a

strk force-pushed the distance-within-algorithm-heuristic branch from a2be482 to 1cb502a Compare October 14, 2021 22:01

strk requested a review from nyalldawson October 14, 2021 22:02

nyalldawson approved these changes Oct 17, 2021

View reviewed changes

nyalldawson merged commit a7f7800 into qgis:master Oct 17, 2021

strk deleted the distance-within-algorithm-heuristic branch October 18, 2021 15:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More heuristics to determine whether to iterate over target source #45339

More heuristics to determine whether to iterate over target source #45339

strk commented Sep 30, 2021

strk commented Sep 30, 2021

strk commented Sep 30, 2021

strk commented Sep 30, 2021

strk commented Sep 30, 2021

strk commented Oct 1, 2021

strk commented Oct 1, 2021

strk commented Oct 1, 2021

strk commented Oct 1, 2021

strk commented Oct 1, 2021

strk commented Oct 1, 2021

nyalldawson Oct 14, 2021

strk Oct 14, 2021

nyalldawson Oct 14, 2021

nyalldawson Oct 14, 2021

strk commented Oct 14, 2021

strk commented Oct 15, 2021

nyalldawson commented Oct 17, 2021

strk commented Oct 18, 2021

More heuristics to determine whether to iterate over target source #45339

More heuristics to determine whether to iterate over target source #45339

Conversation

strk commented Sep 30, 2021

strk commented Sep 30, 2021

strk commented Sep 30, 2021

strk commented Sep 30, 2021

strk commented Sep 30, 2021

strk commented Oct 1, 2021

strk commented Oct 1, 2021

strk commented Oct 1, 2021

strk commented Oct 1, 2021

strk commented Oct 1, 2021

strk commented Oct 1, 2021

nyalldawson Oct 14, 2021

Choose a reason for hiding this comment

strk Oct 14, 2021

Choose a reason for hiding this comment

nyalldawson Oct 14, 2021

Choose a reason for hiding this comment

nyalldawson Oct 14, 2021

Choose a reason for hiding this comment

strk commented Oct 14, 2021

strk commented Oct 15, 2021

nyalldawson commented Oct 17, 2021

strk commented Oct 18, 2021