Improve Graph updaters shutdown #5092

Merged
merged 6 commits into from May 8, 2023

vpaturet
Contributor

@vpaturet vpaturet commented May 4, 2023

Summary

This PR ensures that graph updaters stop execution gracefully when the OTP server shuts down.
This is particularly important in a cloud environment where OTP instances are routinely started and stopped.

More specifically:

  • all updaters should exit their initialization sequence when receiving an interruption signal,
  • polling updaters should exit their polling loop when receiving an interruption signal,
  • listening updaters that perform retry logic when disconnected should exit their retry loop when receiving an interruption signal.
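
As a minimal sketch of the polling case (not the actual OTP code), the loop below checks the thread's interrupt flag on every iteration and restores the flag when `Thread.sleep` is interrupted. The class `PollingLoopSketch`, the method `pollUntilInterrupted` and its parameters are hypothetical names chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch; the names are illustrative and not taken from OTP. */
class PollingLoopSketch {

  /**
   * Runs the poll action repeatedly until the current thread is interrupted.
   * Returns the number of polls that completed without throwing.
   */
  static int pollUntilInterrupted(Runnable pollOnce, long intervalMillis) {
    int polls = 0;
    while (!Thread.currentThread().isInterrupted()) {
      try {
        pollOnce.run();
        polls++;
      } catch (RuntimeException e) {
        // A failed poll is reported and retried on the next iteration.
        System.err.println("Poll failed: " + e.getMessage());
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        // Restore the flag so the loop condition (and any caller) sees the interruption.
        Thread.currentThread().interrupt();
      }
    }
    return polls;
  }

  public static void main(String[] args) throws InterruptedException {
    AtomicInteger counter = new AtomicInteger();
    Thread updater = new Thread(() -> pollUntilInterrupted(counter::incrementAndGet, 10));
    updater.start();
    Thread.sleep(100);
    updater.interrupt(); // stands in for the shutdown signal
    updater.join(1000);
    System.out.println("Updater stopped: " + !updater.isAlive());
  }
}
```

The same pattern would apply to a retry loop in a listening updater: the retry condition includes the interrupt flag, and an `InterruptedException` ends the loop instead of triggering another retry.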

Additionally:

  • log messages during the shutdown sequence are clarified,
  • in a couple of places the code caught Throwable; it now catches Exception instead, to avoid hiding memory issues.
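
To illustrate the last point, here is a small sketch, assuming a hypothetical helper `runCatchingException`: a `catch (Exception e)` clause handles ordinary failures but lets an `Error` such as `OutOfMemoryError` propagate, whereas `catch (Throwable t)` would have intercepted it. The `OutOfMemoryError` is thrown by hand purely for demonstration.

```java
/** Hypothetical sketch; the names are illustrative and not taken from OTP. */
class CatchScopeSketch {

  /** Runs the task, handling only Exception; any Error propagates to the caller. */
  static String runCatchingException(Runnable task) {
    try {
      task.run();
      return "completed";
    } catch (Exception e) {
      return "handled " + e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    // An ordinary failure is handled inside the method.
    System.out.println(
      runCatchingException(() -> { throw new IllegalStateException("bad update"); })
    );
    // A simulated memory error is not an Exception, so it escapes the catch clause.
    try {
      runCatchingException(() -> { throw new OutOfMemoryError("simulated"); });
    } catch (OutOfMemoryError e) {
      System.out.println("Error propagated: " + e.getMessage());
    }
  }
}
```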

Issue

None

Unit tests

Documentation

No

@codecov

codecov bot commented May 4, 2023

Codecov Report

Patch coverage: 31.64% and project coverage change: +0.06 🎉

Comparison is base (ad3520f) 64.51% compared to head (668ca8f) 64.57%.

Additional details and impacted files
@@              Coverage Diff              @@
##             dev-2.x    #5092      +/-   ##
=============================================
+ Coverage      64.51%   64.57%   +0.06%     
- Complexity     14031    14064      +33     
=============================================
  Files           1725     1729       +4     
  Lines          67222    67374     +152     
  Branches        7200     7208       +8     
=============================================
+ Hits           43365    43507     +142     
- Misses         21434    21447      +13     
+ Partials        2423     2420       -3     
Impacted Files Coverage Δ
...anner/ext/legacygraphqlapi/LegacyGraphQLUtils.java 24.56% <0.00%> (ø)
...ygraphqlapi/datafetchers/LegacyGraphQLLegImpl.java 58.58% <0.00%> (+0.42%) ⬆️
...raphqlapi/generated/LegacyGraphQLDataFetchers.java 0.00% <ø> (ø)
...pplanner/ext/siri/SiriTimetableSnapshotSource.java 0.00% <0.00%> (ø)
...er/ext/siri/updater/SiriETGooglePubsubUpdater.java 0.00% <0.00%> (ø)
...pentripplanner/ext/siri/updater/SiriSXUpdater.java 0.00% <0.00%> (ø)
...t/siri/updater/azure/AbstractAzureSiriUpdater.java 0.00% <0.00%> (ø)
...tripplanner/ext/transmodelapi/TransmodelGraph.java 0.00% <0.00%> (ø)
...er/ext/transmodelapi/TransmodelGraphQLPlanner.java 0.00% <0.00%> (ø)
.../support/OTPProcessingTimeoutGraphQLException.java 0.00% <0.00%> (ø)
... and 29 more

... and 48 files with indirect coverage changes


@t2gran t2gran added this to the 2.4 milestone May 4, 2023
@vpaturet vpaturet marked this pull request as ready for review May 4, 2023 12:40
@vpaturet vpaturet requested a review from a team as a code owner May 4, 2023 12:40
@vpaturet vpaturet requested a review from t2gran May 4, 2023 12:40
Member

@t2gran t2gran left a comment

Only minor readability issues.

We should, however, refactor this code a bit. Thread handling is difficult and almost impossible to unit-test (tests often just create a false sense of security). So my suggestion is to refactor the Updater interfaces, break up the different phases into methods, and move all iteration and thread-handling logic to the "updater framework".
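
One possible shape for such a refactoring, purely as an illustration: the updater exposes its phases as plain methods, and a framework-side runner owns the loop, the sleep and the interruption handling. `PhasedUpdater`, `UpdaterRunnerSketch` and their methods are hypothetical names, not existing OTP interfaces.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical interface: the updater exposes phases and owns no threads or loops. */
interface PhasedUpdater {
  void setup() throws Exception;
  void runOnce() throws Exception;
  void teardown();
}

/** Hypothetical framework-side runner that owns iteration and interruption handling. */
class UpdaterRunnerSketch {

  static void run(PhasedUpdater updater, long intervalMillis) {
    try {
      updater.setup();
      while (!Thread.currentThread().isInterrupted()) {
        updater.runOnce();
        Thread.sleep(intervalMillis);
      }
    } catch (InterruptedException e) {
      // Shutdown requested: restore the flag and fall through to teardown.
      Thread.currentThread().interrupt();
    } catch (Exception e) {
      // In this sketch any other failure stops the updater.
      System.err.println("Updater failed: " + e.getMessage());
    } finally {
      updater.teardown();
    }
  }

  public static void main(String[] args) {
    List<String> events = new ArrayList<>();
    PhasedUpdater updater = new PhasedUpdater() {
      private int runs = 0;
      public void setup() { events.add("setup"); }
      public void runOnce() {
        events.add("run");
        // Simulate a shutdown signal arriving during the third run.
        if (++runs == 3) Thread.currentThread().interrupt();
      }
      public void teardown() { events.add("teardown"); }
    };
    run(updater, 1);
    System.out.println(events);
  }
}
```

With this split, each phase can be unit-tested as an ordinary method call, and only the runner needs tests that involve interruption.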

@leonardehrenfried leonardehrenfried self-requested a review May 4, 2023 14:37
vpaturet and others added 2 commits May 5, 2023 10:22
jtorin
jtorin previously approved these changes May 5, 2023
@vpaturet vpaturet requested a review from t2gran May 5, 2023 08:58
@@ -221,12 +221,12 @@ private Result<UpdateSuccess, UpdateError> apply(

/* commit */
return addTripToGraphAndBuffer(result.successValue(), journey, entityResolver);
} catch (Throwable t) {
Member

Ouch!

  try {
    if (updaterList.stream().allMatch(GraphUpdater::isPrimed)) {
      LOG.info("OTP UPDATERS INITIALIZED - OTP is ready for routing!");
      return;
    }
    //noinspection BusyWait
    Thread.sleep(1000);
- } catch (Exception e) {
+ } catch (RuntimeException e) {
Member

Do we really want to catch RuntimeException? Aren't those meant to stop the JVM?

Contributor

@jtorin jtorin May 5, 2023

It's fine. RuntimeException is the base class for all unchecked exceptions, many of which can occur during quite normal operation (NoSuchElementException, for example).

On this subject, while I agree that there are restrictions on its use, I don't think catching Throwable is automatically bad form. I tend to catch all errors at the top level of the module or thread and log them with as much pertinent state information as possible. If the alternative is to let the error propagate upwards to either the JVM or the execution framework (such as Spring or an executor service), I would rather log it myself to make sure the information goes through the logging system in use. (Propagation of stdout/stderr is not always set up correctly.)

Catching Throwable and then continuing execution is a no-no though, we'll agree on that.

Member

I realised that we caught Exception before, which is a superclass of RuntimeException, so this is catching fewer exceptions, not more.

Contributor

Indeed it is. Unless there is a specific reason, Exception would actually be better.

(That RuntimeException inherits from Exception always trips me up.)
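
For reference, a tiny sketch of the hierarchy being discussed, using a hypothetical helper `handlerFor`: a `catch (RuntimeException e)` clause handles unchecked exceptions such as `NoSuchElementException`, while a checked exception such as `InterruptedException` is only matched by the broader `catch (Exception e)`.

```java
import java.util.NoSuchElementException;

/** Hypothetical sketch; the names are illustrative and not taken from OTP. */
class ExceptionHierarchySketch {

  /** Throws the given exception and reports which catch clause handled it. */
  static String handlerFor(Exception thrown) {
    try {
      throw thrown;
    } catch (RuntimeException e) {
      return "RuntimeException handler";
    } catch (Exception e) {
      return "Exception handler";
    }
  }

  public static void main(String[] args) {
    // Unchecked: matched by the narrower clause.
    System.out.println(handlerFor(new NoSuchElementException()));
    // Checked: only the broader clause matches.
    System.out.println(handlerFor(new InterruptedException()));
    // RuntimeException is itself a subclass of Exception.
    System.out.println(Exception.class.isAssignableFrom(RuntimeException.class));
  }
}
```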

Contributor Author

It is always safe to catch RuntimeException instead of Exception, since compilation would fail if the block threw a checked exception that was left uncaught. In this part of the code, catching Exception is problematic because it can swallow InterruptedException and prevent a graceful shutdown. That is exactly what happened in this method.

But I can also replace:


        catch (RuntimeException e) {
          LOG.error(e.getMessage(), e);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          otpIsShuttingDown = true;
          LOG.info("OTP is shutting down, cancelling wait for updaters readiness.");
        }

by:

        catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          otpIsShuttingDown = true;
          LOG.info("OTP is shutting down, cancelling wait for updaters readiness.");
        } catch (Exception e) {
          LOG.error(e.getMessage(), e);
        }

which is equivalent and maybe less confusing.
What do you think?

Contributor

@vpaturet I think the suggested second form is indeed clearer.
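
For readers following the thread, here is a self-contained sketch of the behaviour under discussion, not the OTP method itself. The first variant catches only `Exception`, so an `InterruptedException` from `Thread.sleep` is reported as an ordinary error, the interrupt flag stays cleared, and the wait continues. The second variant uses the agreed catch order and stops waiting. The class, the method names and the `maxChecks` bound (added so the demo terminates) are assumptions.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch; the names are illustrative and not taken from OTP. */
class WaitForReadinessSketch {

  /** Problematic variant: returns the number of readiness checks performed. */
  static int waitSwallowingInterrupt(BooleanSupplier allPrimed, int maxChecks) {
    int checks = 0;
    while (checks < maxChecks) {
      try {
        checks++;
        if (allPrimed.getAsBoolean()) return checks;
        Thread.sleep(10);
      } catch (Exception e) {
        // InterruptedException lands here: the flag is already cleared and the loop goes on.
        System.err.println("Error while waiting: " + e);
      }
    }
    return checks;
  }

  /** Variant with the agreed catch order: interruption ends the wait. */
  static int waitHonouringInterrupt(BooleanSupplier allPrimed, int maxChecks) {
    int checks = 0;
    boolean shuttingDown = false;
    while (!shuttingDown && checks < maxChecks) {
      try {
        checks++;
        if (allPrimed.getAsBoolean()) return checks;
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // keep the flag visible to callers
        shuttingDown = true;
      } catch (Exception e) {
        System.err.println("Error while waiting: " + e);
      }
    }
    return checks;
  }

  public static void main(String[] args) {
    Thread.currentThread().interrupt(); // simulate a shutdown signal
    System.out.println("swallowing: " + waitSwallowingInterrupt(() -> false, 3) + " checks");
    Thread.currentThread().interrupt();
    System.out.println("honouring: " + waitHonouringInterrupt(() -> false, 3) + " checks");
  }
}
```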

@vpaturet vpaturet added the Entur test This is currently being tested at Entur label May 8, 2023
@vpaturet vpaturet merged commit 58c18ff into opentripplanner:dev-2.x May 8, 2023
5 checks passed
t2gran pushed a commit that referenced this pull request May 8, 2023
@t2gran t2gran deleted the improve_updater_shutdown branch November 21, 2023 16:04
Labels
Entur test This is currently being tested at Entur technical debt

4 participants