
Reattempt Dqlite start-up instead of worker restart #16129

Merged
merged 4 commits into from Aug 21, 2023



@manadart manadart commented Aug 21, 2023

JUJU-4510

We previously introduced back-stop behaviour for the Dqlite cluster whereby if we fail to start the local node, we request API server details and wait. If we get a message indicating that we are the last remaining node, we reconfigure the cluster. However, if we get a message indicating other cluster members, we return an error from the worker, resulting in a restart by the dependency engine.

It turns out it is possible to get into the latter situation when Dqlite is starting and does not process cluster changes quickly enough. This is under investigation, but in the meantime it makes more sense to simply retry starting Dqlite than to return an error.

The same behaviour will result, but with less disruption to the worker graph. It may also speed entry into HA.

Included are some cherry picks from main for test reorganisation.

QA steps

This cannot be reproduced consistently. When enabling HA, if establishing the cluster takes more than a minute, you will see the log message "unable to reconcile current controller and Dqlite cluster status; reattempting node start-up" instead of the worker returning an error.

Documentation changes

None.

Bug reference

In service of https://bugs.launchpad.net/juju/+bug/2015371.

this because we can access the dbReady channel directly. There are only 2 tests that result in a call to handover, so we just add the expectation to those.
server detail messages, we do not bounce the dbaccessor worker for messages indicating other cluster members.

Instead we try again to start Dqlite, potentially going through the same workflow.
@manadart manadart changed the title 3.2 dbaccessor tests Reattempt Dqlite start-up instead of worker restart Aug 21, 2023

@SimonRichardson SimonRichardson left a comment


Nice.

@manadart

/merge

@manadart

/build

@jujubot jujubot merged commit 6fa7051 into juju:3.2 Aug 21, 2023
19 of 21 checks passed
@manadart manadart deleted the 3.2-dbaccessor-tests branch August 21, 2023 12:05
@manadart manadart mentioned this pull request Aug 21, 2023
jujubot added a commit that referenced this pull request Aug 22, 2023
#16133

Zero-conflict merge to bring forward a single patch:
#16129
3 participants