Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce GEOS conversions and preparations of geometries #45195

Merged
merged 2 commits into from
Oct 18, 2021

Conversation

strk
Copy link
Contributor

@strk strk commented Sep 22, 2021

This is a spin-off of PR #45057 which surfaced a problem with duplicated GEOS conversions and preparations.

In this PR the aim is to reduce the number of GEOS preparation as much as possible.
For testing I'm using a polygon layer containing 310 features and running the "Extract Within Distance" processing algorithm to extract features being within 10 units distance from features of a 1-feature point layer.

It was found, in #45057 (comment) that QGIS is running 1550 prepare operations.

The first commit in this PR (using shared_ptr for the GeometryEngine member of QgsFeatureRequest) reduces the preparation to 620 (less than half).

The 620 number probably means that for each of the 310 polygons we're re-preparing the same POINT from the points layer.

@strk strk self-assigned this Sep 22, 2021
@github-actions github-actions bot added this to the 3.22.0 milestone Sep 22, 2021
@strk
Copy link
Contributor Author

strk commented Sep 22, 2021

The idea proposed by @nyalldawson to put the GeometryEngine inside the QgsGeometry object sounds good to me, because it opens the door to allowing a QgsGeometry to be always used as prepared once it was prepared, in its whole lifetime.

Copy link
Member

@m-kuhn m-kuhn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@strk
Copy link
Contributor Author

strk commented Sep 22, 2021

With next commit I've used a shared_ptr for the mDistanceWithinEngine member of the QgsVectorLayerFeatureIterator class as well, and exposed a referenceGeometryEngine() method to the QgsFeatureRequest class. Doing so the number of GEOS preparation go down to 310 (one per feature in the 310 features layer). I'm not sure why it's not 311 (for the preparation of the point geometry) but maybe that's because QGIS avoids preparing POINTS (would make sense to avoid that)

@nirvn
Copy link
Contributor

nirvn commented Sep 22, 2021

@strk , do you have numbers in seconds saved? :) nice work!

@strk
Copy link
Contributor Author

strk commented Sep 22, 2021

Inverting the inputs to the algorithm results in a single preparation, so points are STILL prepared by QGIS, just evidently the layer to extract the geometries from is prepared, and the other is not (shouldn't it be the other way around ?)

@@ -405,6 +405,8 @@ will use the same CRS as the source layer/provider.
.. versionadded:: 3.22
%End

std::shared_ptr< QgsGeometryEngine > referenceGeometryEngine() const;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be exposed to sip... it's the sort of API we should reserve the ability to revise in future

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also keeping in mind that if we do end up storing prepared instances in QgsGeometryPrivate then this method will no longer be required)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hidden in next commit

@strk
Copy link
Contributor Author

strk commented Sep 22, 2021

@nirvn for timing I'm going to change dataset.
These numbers are with use of a LINESTRING layer containing 43336 features and a POINT layer containing 10000 features.
The changes in this PR seem to actually make the operation SLOWER, despite running 1/5 of the preparations

Extracting LINESTRINGs within 0.05 units from the POINTs :
Execution completed in 112.37 seconds (1 minute 52 seconds) -- BEFORE PR with debug logging to file, showing 50000 preparations
Execution completed in 115.15 seconds (1 minute 55 seconds) -- WITH PR with debug logging to file, showing 10000 preparations (1/5)
Execution completed in 71.33 seconds (1 minute 11 seconds) -- BEFORE PR (no debug logging)
Execution completed in 74.03 seconds (1 minute 14 seconds) -- WITH PR

Extracted features: 13,382

Extracting POINTs within 0.05 units from the LINESTRINGs:
Execution completed in 3.21 seconds BEFORE PR
Execution completed in 3.38 seconds WITH PR
Extracted features: 7,979

Note how much faster it is to extract POINT withinDistance of LINES. This is most likely due to the algorithm preparing only the "query layer" and not the "extraction layer" (might benefit from a re-thinking of it).

@strk
Copy link
Contributor Author

strk commented Sep 30, 2021

I've removed the non-topical commits from this PR, so it only contains the actual code to reduce conversions and preparations.
The other changes are now in their own PRs:

This reduces the number of GEOS conversions and preparations
upon copying the request (for example by QgsDistanceWithinAlgorithm)
Does so by exposing a referenceGeometryEngine method to
QgsFeatureRequest and copying the engine rather than creating
a new one on each iteration.
@strk
Copy link
Contributor Author

strk commented Oct 1, 2021

Rebased to current master to pass CI.

@github-actions
Copy link

The QGIS project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 14 days and is being automatically marked as "stale". If you think this pull request should be merged, please check

  • that all unit tests are passing

  • that all comments by reviewers have been addressed

  • that there is enough information for reviewers, in particular

    • link to any issues which this pull request fixes

    • add a description of workflows which this pull request fixes

    • add screenshots if applicable

  • that you have written unit tests where possible
    In case you should have any uncertainty, please leave a comment and we will be happy to help you proceed with this pull request.
    If there is no further activity on this pull request, it will be closed in a week.

@github-actions github-actions bot added the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Oct 16, 2021
@strk
Copy link
Contributor Author

strk commented Oct 16, 2021

This is still to be merged, but I wanted to see actual better numbers or it wouldn't make sense.
It would maybe help if @nyalldawson can find his test dataset to check this code against

@github-actions github-actions bot removed the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Oct 16, 2021
@strk strk removed the stale Uh oh! Seems this work is abandoned, and the PR is about to close. label Oct 16, 2021
@nyalldawson
Copy link
Collaborator

This looks good to me -- I'd be happy to see it merged now!

@strk
Copy link
Contributor Author

strk commented Oct 18, 2021

Oh well, let's merge it then, and delegate performance testing to the heuristics PR...

@strk strk marked this pull request as ready for review October 18, 2021 13:25
@strk strk merged commit 27ae2c9 into qgis:master Oct 18, 2021
@strk strk deleted the shared-engine-in-request branch October 18, 2021 13:27
@strk
Copy link
Contributor Author

strk commented Oct 21, 2021

For the record: I just re-run the test with current PR #45384 and the timing to extract the LINESTRING goes down to <16 seconds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants