Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a distanceWithin method to the QgsGeometryEngine virtual class #45057

Merged
merged 1 commit into from
Sep 21, 2021

Conversation

strk
Copy link
Contributor

@strk strk commented Sep 14, 2021

References #472

@@ -184,6 +184,13 @@ class CORE_EXPORT QgsGeometryEngine
*/
virtual double distance( const QgsAbstractGeometry *geom, QString *errorMsg = nullptr ) const = 0;

/**
* Checks if \geom is within \maxdistance distance from this * geometry
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Checks if \geom is within \maxdistance distance from this * geometry
* Checks if \a geom is within \a maxdistance distance from this geometry

/**
* Checks if \geom is within \maxdistance distance from this * geometry
*
* \since QGIS 3.21
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* \since QGIS 3.21
* \since QGIS 3.22

@@ -903,7 +903,7 @@ bool QgsVectorLayerFeatureIterator::postProcessFeature( QgsFeature &feature )

if ( result && mDistanceWithinEngine && feature.hasGeometry() )
{
result = mDistanceWithinEngine->distance( feature.geometry().constGet() ) <= mDistanceWithin;
result = mDistanceWithinEngine->distanceWithin( feature.geometry().constGet(), mDistanceWithin );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit it's more readable than before, but is that also optimized somehow compared with the previous version?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the intention is that optimisations will be added in a follow up (potentially by calling a newer GEOS method)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly

@strk
Copy link
Contributor Author

strk commented Sep 17, 2021

Accepted suggested correction and rebased against current master branch

@strk
Copy link
Contributor Author

strk commented Sep 17, 2021

@nyalldawson looking at the "Extract Within Distance" processing algorithm I've found the QgsGeos::prepareGeometry() function called 1550 times when asked to extract, from a layer of 331 polygons, all the ones within 10 units distance from a layer having a single geometry.

I would have expected at most 332 "prepare" calls, but got 1550. That's about 5 times as many calls to prepare than expected.
Looking at the Qgs::prepareGeometry I also see that the "prepared" version of the geometry is NOT retained, on second call (the method is not idempotent). Why is it so ?

I suspect the speed problem has simpler solutions than implementing a proper "DistanceWithin" method...

@strk
Copy link
Contributor Author

strk commented Sep 20, 2021

With further debugging I've found that extracting features from a 310 features layer being within X distance from a 1 feature layer results in 310 initializations of the VectorLayerFeatureIterator (one iterator per feature?). We get a single iterator if we extract features from the 1-feature layer being within X distance from the 310 features layer. I'd expect the contrary being true.

Anyway, there are definitely multiple preparations of the same geometry, from the logs:

VectorLayerFeatureIterator initialized with non-empty request reference geometry
QgsGeos::prepareGeometry preparing geom Polygon ((25 30, 25 30.19999999999999929, 25.19999999999999929 30.19999999999999929, 25.19999999999999929 30, 25 30))
QgsGeos::prepareGeometry preparing geom Polygon ((16.80000000000000071 29.80000000000000071, 16.80000000000000071 40.20000000000000284, 31.19999999999999929 40.20000000000000284, 31.19999999999999929 29.80000000000000071, 16.80000000000000071 29.80000000000000071))
QgsGeos::prepareGeometry preparing geom Polygon ((16.80000000000000071 29.80000000000000071, 16.80000000000000071 40.20000000000000284, 31.19999999999999929 40.20000000000000284, 31.19999999999999929 29.80000000000000071, 16.80000000000000071 29.80000000000000071))
QgsGeos::prepareGeometry preparing geom Polygon ((16.80000000000000071 29.80000000000000071, 16.80000000000000071 40.20000000000000284, 31.19999999999999929 40.20000000000000284, 31.19999999999999929 29.80000000000000071, 16.80000000000000071 29.80000000000000071))
QgsGeos::prepareGeometry preparing geom Polygon ((16.80000000000000071 29.80000000000000071, 16.80000000000000071 40.20000000000000284, 31.19999999999999929 40.20000000000000284, 31.19999999999999929 29.80000000000000071, 16.80000000000000071 29.80000000000000071))
VectorLayerFeatureIterator initialized with non-empty request reference geometry
QgsGeos::prepareGeometry preparing geom Polygon ((16.80000000000000071 29.80000000000000071, 16.80000000000000071 40.20000000000000284, 31.19999999999999929 40.20000000000000284, 31.19999999999999929 29.80000000000000071, 16.80000000000000071 29.80000000000000071))
QgsGeos::prepareGeometry preparing geom Polygon ((24.80000000000000071 29.80000000000000071, 24.80000000000000071 30, 25.19999999999999929 30, 25.19999999999999929 29.80000000000000071, 24.80000000000000071 29.80000000000000071))
QgsGeos::prepareGeometry preparing geom Polygon ((24.80000000000000071 29.80000000000000071, 24.80000000000000071 30, 25.19999999999999929 30, 25.19999999999999929 29.80000000000000071, 24.80000000000000071 29.80000000000000071))
QgsGeos::prepareGeometry preparing geom Polygon ((24.80000000000000071 29.80000000000000071, 24.80000000000000071 30, 25.19999999999999929 30, 25.19999999999999929 29.80000000000000071, 24.80000000000000071 29.80000000000000071))
QgsGeos::prepareGeometry preparing geom Polygon ((24.80000000000000071 29.80000000000000071, 24.80000000000000071 30, 25.19999999999999929 30, 25.19999999999999929 29.80000000000000071, 24.80000000000000071 29.80000000000000071))
VectorLayerFeatureIterator initialized with non-empty request reference geometry
QgsGeos::prepareGeometry preparing geom Polygon ((24.80000000000000071 29.80000000000000071, 24.80000000000000071 30, 25.19999999999999929 30, 25.19999999999999929 29.80000000000000071, 24.80000000000000071 29.80000000000000071))
QgsGeos::prepareGeometry preparing geom Polygon ((17 30, 17 40, 31 40, 31 30, 17 30))
QgsGeos::prepareGeometry preparing geom Polygon ((17 30, 17 40, 31 40, 31 30, 17 30))
QgsGeos::prepareGeometry preparing geom Polygon ((17 30, 17 40, 31 40, 31 30, 17 30))
QgsGeos::prepareGeometry preparing geom Polygon ((17 30, 17 40, 31 40, 31 30, 17 30))

You can see geometries tend to be prepared even 4-5 times each!

@strk
Copy link
Contributor Author

strk commented Sep 20, 2021

With gdb I'm looking at which path takes to geometry preparation. The very first preparation originates in QgsDistanceWithinAlgorithm::process calling QgsDistanceWithinAlgorithm::processByIteratingOverTargetSource calling QgsFeatureRequest::setDistanceWithin. This is when extracting, from the 310 POLYGON layer, all those polygons being within 10 units distance from a POINT layer with only 1 point in it. That very first preparation prepares a POLYGON geometry (Polygon ((25 30, 25 30.19999999999999929, 25.19999999999999929 30.19999999999999929, 25.19999999999999929 30, 25 30))).

The second preparation is again coming from QgsDistanceWithinAlgorithm::processByIteratingOverTargetSource and ends up preparing the same POLYGON. This time though the code path is with that method calling QgsFeatureRequest::operator=(QgsFeatureRequest const&) (assignment operator!), which in turn calls prepareGeometry again.

Third time preparation of SAME polygon comes again from an assignment operator (of QgsFeatureRequest).

Fourth preparation comes from QgsVectorLayerFeatureIterator::QgsVectorLayerFeatureIterator(QgsVectorLayerFeatureSource*, bool, QgsFeatureRequest const&) directly preparing a POLYGON (still the same).

Fifth preparation comes from QgsFeatureRequest::setDistanceWithin(QgsGeometry const&, double) -- and it is again the same geometry.

@strk
Copy link
Contributor Author

strk commented Sep 20, 2021

Is there a reason why QgsGeos::prepareGeometry resets the mGeosPrepared member rather than re-using it if already present ?

@strk
Copy link
Contributor Author

strk commented Sep 20, 2021

I pushed a commit turning cacheGeos and prepareGeometry methods idempotent, to reduce the preparations.
Do you want this to be in its own PR instead ?

@strk
Copy link
Contributor Author

strk commented Sep 20, 2021

I moved the idempotent GEOS caches in its own PR: #45147

@strk
Copy link
Contributor Author

strk commented Sep 20, 2021

Unfortunately the idempotent GEOS preparation methods do not reduce the number of preparations of the same geometries,
I guess it takes more care with the assignment operator, currently deleted. Those caches are kept as unique_ptr so the default assignment operator will probably reset (clear) the cache of the right operand. Maybe we should use shared pointers for those caches instead

@strk
Copy link
Contributor Author

strk commented Sep 20, 2021

Now backtraces with debug symbols:

First geometry preparation:

#0  QgsGeos::prepareGeometry (this=0x7ffff19ce420 <std::unique_ptr<QgsGeometryEngine, std::default_delete<QgsGeometryEngine> >::operator->() const+28>)
    at /usr/local/src/qgis/qgis/src/master/src/core/geometry/qgsgeos.cpp:214
#1  0x00007ffff1d7a951 in QgsFeatureRequest::setDistanceWithin (this=0x7fff5accc600, geometry=..., distance=100) at /usr/local/src/qgis/qgis/src/master/src/core/qgsfeaturerequest.cpp:132
#2  0x00007ffff01e5ca6 in QgsDistanceWithinAlgorithm::processByIteratingOverTargetSource(QgsProcessingContext const&, QgsFeatureSource*, QgsFeatureSource*, double, QgsProperty const&, std::function<void (QgsFeature const&)> const&, bool, QgsProcessingFeedback*, QgsExpressionContext&) (this=0x555557166230, context=..., targetSource=0x7fff4c0065c0, referenceSource=0x7fff4c003f50,
    distance=100, distanceProperty=..., handleFeatureFunction=..., onlyRequireTargetIds=false, feedback=0x7fffdc014350, expressionContext=...)
    at /usr/local/src/qgis/qgis/src/master/src/analysis/processing/qgsalgorithmdistancewithin.cpp:106
#3  0x00007ffff01e5910 in QgsDistanceWithinAlgorithm::process(QgsProcessingContext const&, QgsFeatureSource*, QgsFeatureSource*, double, QgsProperty const&, std::function<void (QgsFeature const&)> const&, bool, QgsProcessingFeedback*, QgsExpressionContext&) (this=0x555557166230, context=..., targetSource=0x7fff4c0065c0, referenceSource=0x7fff4c003f50, distance=100,
    distanceProperty=..., handleFeatureFunction=..., onlyRequireTargetIds=false, feedback=0x7fffdc014350, expressionContext=...)
    at /usr/local/src/qgis/qgis/src/master/src/analysis/processing/qgsalgorithmdistancewithin.cpp:52
#4  0x00007ffff01e91af in QgsExtractWithinDistanceAlgorithm::processAlgorithm (this=0x555557166230, parameters=..., context=..., feedback=0x7fffdc014350)
    at /usr/local/src/qgis/qgis/src/master/src/analysis/processing/qgsalgorithmdistancewithin.cpp:355
#5  0x00007ffff1a1df7b in QgsProcessingAlgorithm::runPrepared (this=0x555557166230, parameters=..., context=..., feedback=0x7fffdc014350)
    at /usr/local/src/qgis/qgis/src/master/src/core/processing/qgsprocessingalgorithm.cpp:533
#6  0x00007ffff1a28530 in QgsProcessingAlgRunnerTask::run (this=0x555557166160) at /usr/local/src/qgis/qgis/src/master/src/core/processing/qgsprocessingalgrunnertask.cpp:66

Second prepare geometry call:

#0  QgsGeos::prepareGeometry (this=0x7ffff19ce420 <std::unique_ptr<QgsGeometryEngine, std::default_delete<QgsGeometryEngine> >::operator->() const+28>)
    at /usr/local/src/qgis/qgis/src/master/src/core/geometry/qgsgeos.cpp:214
#1  0x00007ffff1d7a545 in QgsFeatureRequest::operator= (this=0x7fff5accc4c0, rh=...) at /usr/local/src/qgis/qgis/src/master/src/core/qgsfeaturerequest.cpp:76
#2  0x00007ffff01e5d1f in QgsDistanceWithinAlgorithm::processByIteratingOverTargetSource(QgsProcessingContext const&, QgsFeatureSource*, QgsFeatureSource*, double, QgsProperty const&, std::function<void (QgsFeature const&)> const&, bool, QgsProcessingFeedback*, QgsExpressionContext&) (this=0x555557166230, context=..., targetSource=0x7fff4c0065c0, referenceSource=0x7fff4c003f50,  
    distance=100, distanceProperty=..., handleFeatureFunction=..., onlyRequireTargetIds=false, feedback=0x7fffdc014350, expressionContext=...)
    at /usr/local/src/qgis/qgis/src/master/src/analysis/processing/qgsalgorithmdistancewithin.cpp:106
#3  0x00007ffff01e5910 in QgsDistanceWithinAlgorithm::process(QgsProcessingContext const&, QgsFeatureSource*, QgsFeatureSource*, double, QgsProperty const&, std::function<void (QgsFeature const&)> const&, bool, QgsProcessingFeedback*, QgsExpressionContext&) (this=0x555557166230, context=..., targetSource=0x7fff4c0065c0, referenceSource=0x7fff4c003f50, distance=100,               
    distanceProperty=..., handleFeatureFunction=..., onlyRequireTargetIds=false, feedback=0x7fffdc014350, expressionContext=...)
    at /usr/local/src/qgis/qgis/src/master/src/analysis/processing/qgsalgorithmdistancewithin.cpp:52
#4  0x00007ffff01e91af in QgsExtractWithinDistanceAlgorithm::processAlgorithm (this=0x555557166230, parameters=..., context=..., feedback=0x7fffdc014350)
    at /usr/local/src/qgis/qgis/src/master/src/analysis/processing/qgsalgorithmdistancewithin.cpp:355
#5  0x00007ffff1a1df7b in QgsProcessingAlgorithm::runPrepared (this=0x555557166230, parameters=..., context=..., feedback=0x7fffdc014350)
    at /usr/local/src/qgis/qgis/src/master/src/core/processing/qgsprocessingalgorithm.cpp:533
#6  0x00007ffff1a28530 in QgsProcessingAlgRunnerTask::run (this=0x555557166160) at /usr/local/src/qgis/qgis/src/master/src/core/processing/qgsprocessingalgrunnertask.cpp:66

What's happening is that the QgsFeatureRequest assignment operator is invoked by QgsDistanceWithinAlgorithm::processByIteratingOverTargetSource (for no apparent reason, btw) and such assignment operator is forcing creation of a NEW GeometryEngine, rather than reusing or copying the one from the right-hand side, and preparing the associated geometry again.

@nyalldawson
Copy link
Collaborator

@strk

Unfortunately the idempotent GEOS preparation methods do not reduce the number of preparations of the same geometries,

Right -- this is a potential optimisation which we should consider. Right now if you create a geometry engine from a QgsGeometry object you'll always get a new QgsGeos object, even though QgsGeometry objects themselves are implicitly shared and only invoke shallow copies (sharing the same underlying QgsGeometryPrivate object). Like you've found, this means that we often convert the exact same QgsGeometryPrivate objects to QgsGeos multiple times, and prepare each of these separately before discarding the results.

We should consider whether it's worth storing the QgsGeos representation inside QgsGeometryPrivate instead, so that we avoid the cost of multiple QGIS -> GEOS conversions and then everyone benefits from the first time this geometry is prepared.

There's a few steps to this:

  1. Add a std::unique_ptr< QgsGeometryEngine> engine to QgsGeometryPrivate
  2. Add a new const QgsGeometryEngine* QgsGeometry::constGeometryEngine() const method, which would return a const pointer to QgsGeometryPrivate::engine.get(). (This needs to be const as we'd otherwise need to detach the QgsGeometry on calling it.). Hide this from sip, as python doesn't respect const. On first call to constGeometryEngine we create the geometry engine and store it in QgsGeometryPrivate::engine, after that we just return the existing object. We'd also need to ensure that this engine is correctly deleted whenever the QgsGeometry object is modified so that it will be created again on the next call to constGeometryEngine. We should probably also have some logic here to ONLY store the engine for complex geometry types -- eg it's overkill to store this for point geometries!
  3. Make QgsGeometryEngine::prepareGeometry() const, so it can be called on the const pointer returned by constGeometryEngine() (QgsGeos::mGeosPrepared would need to be marked mutable to allow this). This means we'd now be able to call
QgsGeometry g = feature.geometry();
const QgsGeometryEngine* engine = g.constGeometryEngine();
engine->prepareGeometry();

and ALL other code paths using this same feature would already have a prepared geometry ready to go.
4. Upgrade the QgsGeometry methods like bool QgsGeometry::intersects( const QgsGeometry &geometry ) const so that instead of creating a new QgsGeos object these use the engine from QgsGeometry::constGeometryEngine (and accordingly they'll automagically start benefiting from any previous calls to prepare on this QgsGeometry).
5. Consider whether we should automatically prepare the engine in the high level calls like QgsGeometry::intersects. (Maybe with some logic so that we only do this for complex geometries?)

This would lead to HUGE speed ups across the board in QGIS, especially for geometry-based QGIS expressions, aggregates, etc.

Interested to hear your thoughts on this!

And use it from QgsVectorLayerFeatureIterator

References qgis#472

The current implementation is really just a wrapper around distance()
but opens the door for future improvements
@strk
Copy link
Contributor Author

strk commented Sep 21, 2021

@nyalldawson your solution sounds good to me, although I'm still not clear about the interaction between AbstractGeometry and Geometry and GeometryEngine at this stage. How about merging this PR and starting the refactoring work in another ?

I'm thinking that allowing non-repreparing copies of GeometryEngine could be a lighter change for the time being, which would mean using shared pointers instead of unique pointers for the geosGeom member

@nyalldawson nyalldawson merged commit acf302e into qgis:master Sep 21, 2021
@nyalldawson
Copy link
Collaborator

@strk sure, done!

I'm going to file a bug report collecting information on this, and I'll attach the datasets I was having issues with... Maybe there's something unique about them which will give further clues as to the poor performance

@strk
Copy link
Contributor Author

strk commented Sep 22, 2021

@nyalldawson that's perfect, looking forward for your dataset ! Meanwhile I'll continue with my little test to see how number of preparation can be reduced with shared_ptr

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants