Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More heuristics to determine whether to iterate over target source #45339

Merged
merged 1 commit into from
Oct 17, 2021

Conversation

strk
Copy link
Contributor

@strk strk commented Sep 30, 2021

Following up performance tests reported in #45195 (comment) this PR is adding more heuristics to determine how to iterate over target and reference sources by preferring to iterate over the source NOT being of the POINT type.

@strk strk self-assigned this Sep 30, 2021
@github-actions github-actions bot added this to the 3.22.0 milestone Sep 30, 2021
@strk strk force-pushed the distance-within-algorithm-heuristic branch 3 times, most recently from bdc32da to dcfcfec Compare September 30, 2021 11:02
@strk
Copy link
Contributor Author

strk commented Sep 30, 2021

CI is failing for unrelated reasons. Also failing in master branch

@strk
Copy link
Contributor Author

strk commented Sep 30, 2021

The test I've used in #45195 (comment), which is 43336 linestring features filtered by 10000 points (13,382 extracted), has these times:

75.63 seconds (1 minute 16 seconds) -- without this PR
10.28 seconds -- with this PR

I'd like to run another test because it isn't clear to me if the 10.28 timing was taken on second run of the same algorithm, which could mean geoms were already prepared.

@strk
Copy link
Contributor Author

strk commented Sep 30, 2021

I confirm that master, even on the second run of the same algorithm, still takes over 1 minute (~1m16secs)

@strk
Copy link
Contributor Author

strk commented Sep 30, 2021

I just tried again, same input (~43k lines and 10k points, ~13k lines extracted with distance 0.05):

9.93 seconds with this PR
73.90 seconds without this PR

The only concern is that with the PR we have 13006 features extracted while without we have 13380,
which I can't explain at the moment. Maybe a bug in processByIteratingOverTargetSource ?

@nyalldawson any idea ?

@strk
Copy link
Contributor Author

strk commented Oct 1, 2021

I've verified with PostGIS ST_DWithin and the number of LINESTRING records having at least one POINT record within the 0.05 distance is 13382, so 2 more than found by QGIS master and 376 more than with QGIS from this PR.

I suspect iterating over target is bogus.

Here's a picture comparing the extracted lines with master and with this PR:
failed-distance

The logs report:

Feature 1900599 in query_points found to be within distance currentDistance 0.05 from feature 41238 of edge_data
No feature in query_points found to be within distance currentDistance 0.05 from feature 41223 of edge_data

Feature 41223 is the green diagonal linestring on the right of the measurement box, which is clearly within 0.05 units of distance from multiple query points.

@strk
Copy link
Contributor Author

strk commented Oct 1, 2021

I've found that dropping the setLimit(1) on the request fixes the issue, resulting in 13382 rows returned (as with master, I got my previous counting wrong, master always matched ST_DWithin from postgis).

So, it looks like setLimit(1) ends up being applied at the wrong time (ie: not finding the first reference geometry within distance, but some other first...)

@strk
Copy link
Contributor Author

strk commented Oct 1, 2021

Enabling SQL logging I see that for each target feature, a query like this is executed:

WHERE "p" && st_makeenvelope(13.10861315090370205,64.87761437708003598,13.20866626328804472,64.9776440982121386,4258)

Such query makes a bounding-box comparison, which may be true for points which are NOT within the given distance. This could explain why limit(1) extracts less features because we could be fetching the first reference feature whose bounding box intersects our bounding box expanded by requested distance and later find it to be NOT within the distance.

There's no way to predict how many features should be extracted with the bounding box based operator, so can't limit in that case (or limit shoudl be applied later). With OPERATOR <-> and new enough PostgreSQL version (and PostGIS version) this should instead work.

Is this problem worth a separate ticket ? Because I suspect it can be triggered also without this PR (at least when targetSource iteration is selected, which is when target has less features than source if I understand correctly)

@strk
Copy link
Contributor Author

strk commented Oct 1, 2021

Further debugging shows that ST_DWithin is never used at the PostgreSQL side, although I do see code intended to use it when Qgis::SpatialFilterType::DistanceWithin is used in the iterator, so something is problematic there as well (algorithm not passing DistanceWithin spatial filter ?)

@strk
Copy link
Contributor Author

strk commented Oct 1, 2021

The problem seems to be that spatial filters are mutually exclusive, so the PostgreSQL FeatureIterator ends up finding the BoundingBox filter and NOT the DistanceWithin filter:

PostgresFeatureIterator found spatialFilterType being BoundingBox

This explain why setLimit(1) breaks the algorithm

@strk
Copy link
Contributor Author

strk commented Oct 1, 2021

I've filed #45352 to deal with the bug in the algorithm failing to extract all features.

//
// Possible reasons to iterate over target are considered here
//
do
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just "if/else if" here? The loop approach is a bit odd to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we are looking for possible reasons to force a non-default iteration model and there may be very many reasons to do so, an "if/else" could become very hard to read

do
{
// reference needs reprojection, so we cannot iterate over target
if ( targetSource->sourceCrs() != referenceSource->sourceCrs() ) break;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic has changed -- if the target crs != reference crs then we have to iterate over the target instead.

if ( targetSource->sourceCrs() != referenceSource->sourceCrs() ) break;

// distance is dynamic, so we cannot iterate over target
if ( distanceProperty.isActive() ) break;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again this one should force iterating over target

strk added a commit to strk/QGIS that referenced this pull request Oct 14, 2021
@strk
Copy link
Contributor Author

strk commented Oct 14, 2021

Thanks @nyalldawson I've pushed a commit fixing the inverted conditions. The code is even more readable now, as each of the conditions are forcing iteration on target, although I'd like those comments to be more specific as to why target MUST be used in some cases (ie: different CRSs or dynamic distance)

@strk strk force-pushed the distance-within-algorithm-heuristic branch from a2be482 to 1cb502a Compare October 14, 2021 22:01
@strk strk requested a review from nyalldawson October 14, 2021 22:02
@strk
Copy link
Contributor Author

strk commented Oct 15, 2021

Now, I'm still trying to figure out HOW it is possible that a QgsFeatureRequest constructed with spatialFeatureType()==2
and passed to QgsFeatureSource->getFeatures() becomes a Request with spatialFeatureType()==1:

XXXX request 0x7fe2f27fa4e0 for target source someData has spatialFilterType 2
XXX QgsFeatureRequest::setFilterRect called on request 0x7fe2d40129a8 having spatialFilterType == 2
XXX QgsPostgresFeatureSource::getFeatures called, request 0x7fe2d4012b18 has spatialFilterType 1

Time to run a debugger to figure out what's really going on in there!

@nyalldawson
Copy link
Collaborator

Thanks @strk !

@nyalldawson nyalldawson merged commit a7f7800 into qgis:master Oct 17, 2021
@strk
Copy link
Contributor Author

strk commented Oct 18, 2021

Thank you @nyalldawson but I'm afraid this change is exposing the bug reported in #45352 -- anyway, will tackle it there (or in the associated PR #45384)

@strk strk deleted the distance-within-algorithm-heuristic branch October 18, 2021 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants