Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not deadlock when calling when calling GetLayerVisibility #56131

Merged

Conversation

elpaso
Copy link
Contributor

@elpaso elpaso commented Feb 1, 2024

... from multiple threads.

Fix #53956

@elpaso elpaso added Expressions Related to the QGIS expression engine or specific expression functions Bug Either a bug report, or a bug fix. Let's hope for the latter! backport release-3_34 labels Feb 1, 2024
@github-actions github-actions bot added this to the 3.36.0 milestone Feb 1, 2024
@elpaso
Copy link
Contributor Author

elpaso commented Feb 1, 2024

@nyalldawson while I believe this patch is sound and valid, I am little concerned about the biggest picture.

This is deprecated https://github.com/qgis/QGIS/blob/master/src/core/expression/qgsexpressionutils.cpp#L55 but no alternative is suggested, it searches in the context's layerStore first but I couldn't find any call to QgsExpressionContextScope::addLayerStore and in practice there isn't any store in the tests I've run. Then it searches in the project instance with this comment https://github.com/qgis/QGIS/blob/master/src/core/expression/qgsexpressionutils.cpp#L119 but that's the code patch which is taken 100% of the times in the context of #53956.

Also, QgsExpressionUtils::getMapLayer is deprecated but the "thread safe" functions like runMapLayerFunctionThreadSafe appear to use the very same Qt::BlockingQueuedConnection which is the root cause of the deadlocks.

I'm a little confused about what should be done to properly assure we can retrieve a layer instance from an expression in a thread safe way.

Maybe we should consider what we intend to do with the retrieved layer, in the context of GetLayerVisibility we just use its pointer address to retrieve cached visibility info from a local cache so there is no danger of any race condition.

@nyalldawson
Copy link
Collaborator

@elpaso

Also, QgsExpressionUtils::getMapLayer is deprecated but the "thread safe" functions like runMapLayerFunctionThreadSafe appear to use the very same Qt::BlockingQueuedConnection which is the root cause of the deadlocks.

What's the wider context here? Is the expression being evaluated on the main thread? Or if not, what's the main thread itself blocked on?

@elpaso
Copy link
Contributor Author

elpaso commented Feb 2, 2024

@elpaso

Also, QgsExpressionUtils::getMapLayer is deprecated but the "thread safe" functions like runMapLayerFunctionThreadSafe appear to use the very same Qt::BlockingQueuedConnection which is the root cause of the deadlocks.

What's the wider context here? Is the expression being evaluated on the main thread? Or if not, what's the main thread itself blocked on?

Expressions for snapping point locator renderer test are all evaluated in separated threads that all get blocked by BlockingQueuedConnection, the main thread is blocked on QgsTask::waitForFinished, see:

immagine

When I pause the program after about 2 seconds from the initial click I can see about 70 threads running.

@nyalldawson
Copy link
Collaborator

Expressions for snapping point locator renderer test are all evaluated in separated threads that all get blocked by BlockingQueuedConnection, the main thread is blocked on QgsTask::waitForFinished, see:

Hmm, not simple to fix then. I'd probably try just chucking an ugly event loop in instead of that waitForFinished and see what happens.

@nyalldawson
Copy link
Collaborator

Or.... Thinking about this, should we just NOT snap if the results aren't ready so we aren't blocking at all?

@elpaso
Copy link
Contributor Author

elpaso commented Feb 2, 2024

Or.... Thinking about this, should we just NOT snap if the results aren't ready so we aren't blocking at all?

No, I think the current sequence of calls is correct, when you move the mouse on canvas after switching to digitizing mode the point locator kicks in and starts launching all the threads to collect the snapping locations.

Following your advice, this seems to work just fine:

--- a/src/core/qgspointlocator.cpp
+++ b/src/core/qgspointlocator.cpp
@@ -1008,7 +1008,14 @@ void QgsPointLocator::waitForIndexingFinished()
 {
   disconnect( mInitTask, &QgsPointLocatorInitTask::taskTerminated, this, &QgsPointLocator::onInitTaskFinished );
   disconnect( mInitTask, &QgsPointLocatorInitTask::taskCompleted, this, &QgsPointLocator::onInitTaskFinished );
-  mInitTask->waitForFinished();
+
+  const auto startTime { std::chrono::system_clock::now() };
+
+  // Same timeout of waitForFinished
+  while( mInitTask && mInitTask->isActive() && std::chrono::system_clock::now() - startTime < std::chrono::seconds( 30 ) )
+  {
+    qApp->processEvents();
+  }

It will probably crash if it times out but this is no different than it was without the patch.

I think we should merge this PR as it is because it fixes at least a few scenarios and provides a faster/safer implementation for the specific use case with no drawbacks.

The patch attached above fixes the general problem of running the locators in separate threads (which we cannot avoid) without deadlocks caused by function evaluations delegated to the main thread.

If you feel confident about the patch I can either make a separate PR or push it here.

@elpaso elpaso force-pushed the bugfix-gh53956-GetLayerVisibility-deadlock branch from 71b0bd9 to 1b71b39 Compare February 5, 2024 14:22
@elpaso
Copy link
Contributor Author

elpaso commented Feb 5, 2024

@nyalldawson took your advice and looking at the client code it seems that somehow relaxed wasn't passed to (only) some of the functions.

This updated PR should fix all similar scenarios.

Copy link

github-actions bot commented Feb 5, 2024

🪟 Windows builds ready!

Windows builds of this PR are available for testing here. Debug symbols for this build are available here.

(Built from commit fd45672)

@elpaso elpaso enabled auto-merge February 6, 2024 11:12
@elpaso elpaso merged commit b29e3fd into qgis:master Feb 6, 2024
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport release-3_34 Bug Either a bug report, or a bug fix. Let's hope for the latter! Expressions Related to the QGIS expression engine or specific expression functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QGis crashes when digitizing with snapping
2 participants