-
-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Processing multithreaded vector #33365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/analysis/vector/abstracttool.cpp
Outdated
QgsFeatureIterator it = layer->getFeatures(); | ||
while ( it.nextFeature( currentFeature ) ) | ||
{ | ||
index.addFeature( currentFeature ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would not be better to bulk-load features from the iterator or source into spatial index instead of adding them one by one? It should be faster.
@mhugent this is great, thank you! Just a small request: please tell the QGIS community that you are working on a new feature in the future, it is a pure coincidence that I didn't start working on a similar improvement and we cannot afford duplicated efforts, do we? |
src/analysis/vector/abstracttool.cpp
Outdated
{ | ||
QgsFeature currentFeature; | ||
QgsFeatureRequest request; | ||
request.setFlags( QgsFeatureRequest::SubsetOfAttributes ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? We don't fetch any attributes here
QEventLoop evLoop; | ||
if ( feedback ) | ||
{ | ||
QObject::connect( feedback, SIGNAL( canceled() ), &fWatcher, SLOT( cancel() ) ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use new signal slot syntax.
src/analysis/vector/abstracttool.cpp
Outdated
QVector<QgsFeature *> featureList; | ||
for ( QgsFeatureId id : intersectIds ) | ||
{ | ||
QgsFeature *feature = new QgsFeature(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use a unique_ptr here.
Any reason why the class name(space)s do not have QGIS |
Please do not merge yet -- I want to do an in-depth review first |
Was this based on the existing methods from QgsOverlayUtils, or ported directly from elsewhere? I'm concerned if it's the latter case, as the QgsOverlayUtils methods have been heavily refined and well tested, and are proven robust... |
Actually looking at it, it's definitely not based on QgsOverlayUtils. I'm -1 for merge until this is updated to use the proven logic from QgsOverlayUtils. |
Very subtle @mhugent. My point still stands, you can't just dismiss all the hard work @wonder-sk and others have put into making the existing implementations rock solid. I'm definitely not against you doing this work, but stability and correct results >>> speed. |
@nyalldawson : While I agree that the existing tests should pass (PyQgsProcessingInPlace still fails, going to check why), the argument 'is not based on something' is very strange. |
Look at the code! There's multiple fixes and optimisations present in master that you've completely ignored in your code. |
In my experience, with most current open source tools (including my PR, QGIS and others), people run into problems with real world overlay operations. Maybe it helps if we use the precision parameter of geos? I need to do some tests over the christmas days to see if that helps |
Coming back to the 'constructive input'. Say which fixes and optimisations you miss and I'm happy to adapt the PR code. |
@wonder-sk travels these days but I think he would like to review these changes, please wait for his review before merging too. thanks |
I think what he is referring to is mostly code duplication which results in an additional long term maintenance overhead. (Sorry for not bringing more constructive input in here, I lack the time for an in depth review right now). |
It's not just code duplication -- it's that years of careful fixes and optimisations have been dropped by bypassing this code. @mhugent I can't really be more constructive then that -- the multithreaded implementation should be a multithreaded version of the current code, not a completely new implementation of these algorithms. When that's done I'll be happy to review in detail and provide specific comments, but until then I can't give any guidance more specific then that. |
Agreed with comments from @nyalldawson - if we need these algorithms to be multi-threaded, why not just adapt the existing implementation? The existing implementation also has the advantage of being simple and easy to reason about. There are just two "basic" algorithms: difference and intersections. The union is just 2x difference and 1x intersection, the sym.difference is 2x difference. The implementation in this PR has all the operations mixed together (as it was the case with the older processing implementation). To name a couple of improvements in the current QGIS implementation over the code in this PR:
What is the speedup that you gain with this implementation? I am also -1 to merge this PR - I think we should go the way of making the current implementation multi-threaded rather than starting from scratch with 1K of new lines of code. It is essentially just two functions (each under 100 lines) that would need to make parallelized. |
Tested if with two layers 'land coverage' and 'wood'. Both are PostGIS layers with a spatial index and the results are written to memory layer. The speedup is 2-4 times on my four-core laptop: intersection wald<->bodenbedeckung difference bodenbedeckung<->wald union bodenbedeckung<->wald symdiff wald<->Bodenbedeckung |
I'm planning to do more tests |
It is supported. In the multithreaded implementation the cancelation is done with QFuture instead of asking isCanceled() in the feature loop |
Agreed, this needs to be added
Thanks for the pointer. I'm going to test what the difference in performance is. |
Another critical improvement which the existing code has is that output results are "sanitized", automatically handling geometry collections in the correct way Honestly, just follow the advice @wonder-sk and I are giving, and port the existing implementation rather than continuing this from-scratch rewrite. It'll be much faster to get to a merge-ready state than pushing ahead with this implementation. |
Seems it is relatively straightforward to avoid the one-by-one feature fetching. I've made a quick prototype test with code that fetches the features in chunks (e.g. 100 features in one go). From this chunk, the job queue is populated and executed with multiple threads. Then the job queue is emptied and the next chunk is fetched. So the featureAtId calls are replaced by a nextFeature() - Iteration. I've tested it with an intersection of two layers, the same data in a local database and in a remote database on qgiscloud. The data has relatively few intersections so it should be a case where the IO is important (as opposite to a CPU-intensive use-case where we have many intersections and complex data): Intersection on Remote PostGIS-DB: Intersection same data local PostGIS-DB: |
Interesting - thank you for sharing these results! Sounds like a good speed gain. |
Given the promising results of the batch feature fetching, is it ok if I go on to update the PR in this direction? |
So here is the implementation with single threaded batch IO and multithreaded processing. The runtime is now a bit slower on local postgres datasources compared to the first version in this PR (since IO is single threaded), but faster on remote data sources. |
The QGIS project highly values your contribution and would love to see this work merged! Unfortunately this PR has not had any activity in the last 14 days and is being automatically marked as "stale". If you think this pull request should be merged, please check
|
While we hate to see this happen, this PR has been automatically closed because it has not had any activity in the last 21 days. If this pull request should be reconsidered, please follow the guidelines in the previous comment and reopen this pull request. Or, if you have any further questions, just ask! We love to help, and if there's anything the QGIS project can do to help push this PR forward please let us know how we can assist. |
Why was it never merged into master ? Need rework ? |
I don't know the reason it was never merged. All the feedback has been addressed and the unit tests passed |
Stale bot closed this PR :-/ But now seems more complex to merge as some processing cpp files changed |
@mhugent it has been probably lost in the mass. Can you rebase and open a new PR ? |
Basically it was never merged because it's an extremely complex change, and one which we cannot afford to have any regressions at all in. No one with the required expertise (@wonder-sk , myself or @m-kuhn) had the time required to properly review, which would likely take a good half day to do. I also have issues in the general design of the approach taken here. It cannot be extended to other processing algorithms, and I'd prefer a generic multithreading processing approach which can easily be extended to other algorithms (eg buffers, make valid, etc ) without having to have yet another multi threading implementation in qgis/processing. |
While I can't comment on the technical aspect of this matter, it seems really a pity that such important kind of improvement can't be added to QGIS considered the struggle we face since forever in having such basic operations runs at acceptable speeds with large datasets (and being bashed along the way when compared to other GIS packages). Can't you developers find some common ground and delineate a plan, eventually with the support of qgis.org? @mbernasocchi @andreasneumann @elpaso |
@gioman the situation is far from as straightforward as that. For instance, I strongly suspect that the recent thread on the mailing list relates to an operation being performed on a single very complex geometry. All the processing in this case is occurring in geos (which would explain a lack of progress updates - it's a single feature being processed). In this situation ALL software using geos will show the same lack of performance, including gdal and postgis. (Fingers crossed that the new geos overlay engine helps in cases like this!) |
@nyalldawson is there a rough timeline about when we could have QGIS shipped with that? |
@gioman it would need a new stable geos release first, and even then I'd suggest we play it safe and let the release stabilise before moving the windows builds to it. Geometry operations are just too fundamental a part of qgis, we can't use our users as a test bed for geos 😁 |
To support other tools which have a feature-by-feature processing (e.g. buffer), it is possible to create a new subclass of QgsAbstractTool (just as it is done with QgsDifferenceTool, QgsIntersectionTool, ... |
@nyalldawson this seems a polite way of saying "more than 1 year" :) Considered that we don't know how much this will improve speed I would really like to see a common push to make clip,union,intersection,difference,dissolve,buffer multi-threaded. As a user (that started to clip stuff with "ftools", so my pain comes from a long ago) I would support this effort with contribution, and I'm sure many more would do. |
This PR adds code to execute vector overlay operations (intersection, union, difference, symdifference) on multiple CPU cores. The tools are integrated into the processing framework.
Besides intersectiontool, uniontool, differencetool, symdifferencetool in this PR, we also have buffertool, sliverpolygontool.h, convexhulltool, dissolvetool. Going to integrate the latter tools in a second step.