Clean up stale HA nodes on restore #396
Conversation
Nudgy McNudgeface @github/backup-utils
lgtm
@lildude Apologies for the delay here, I'll have a look at this tonight.
bin/ghe-restore
Outdated
@@ -341,6 +341,16 @@ else
  fi
fi

# Clean up all stale replicas
if ! $CLUSTER; then
  inactive_nodes=$(echo "ghe-spokes server show inactive --json | jq -r '.[] | select(.host | contains(\"git-server\")).host' | sed 's/^git-server-//g'" | ghe-ssh "$GHE_HOSTNAME" -- /bin/bash)
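A note on the extraction pipeline above: `ghe-spokes` reports hosts as `git-server-<uuid>`, and the trailing `sed` strips that prefix to leave bare UUIDs. A minimal local sketch of just that step (the hostnames here are made up):

```shell
# Sample hostnames as the jq filter would emit them, one per line.
hosts="git-server-uuid-aaa
git-server-uuid-bbb"

# Strip the "git-server-" prefix, leaving bare node UUIDs.
uuids=$(echo "$hosts" | sed 's/^git-server-//g')
echo "$uuids"
```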
Unfortunately this fails to clean up the stale HA nodes if the target instance is in an unconfigured state (and the replica(s) were online at the time of the backup). This occurs because, when unconfigured, we don't automatically run `ghe-config-apply` near the end of the restore, which results in the nodes still appearing as online in the database. It takes a `ghe-config-apply` for these stale nodes to transition to an offline state.

As an alternative, could we drop the `inactive` from the `ghe-spokes server show` and instead just clean up any nodes that don't match the UUID that we restored (as found in `$GHE_RESTORE_SNAPSHOT_PATH/uuid`)?
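The UUID-based alternative could look roughly like the sketch below. The UUIDs are made up for illustration; in the real script, `restored_uuid` would come from `$GHE_RESTORE_SNAPSHOT_PATH/uuid`, `other_nodes` from the `ghe-spokes server show` pipeline, and each stale node would be removed by running a `ghe-spokes server destroy` command over `ghe-ssh` rather than collected into a variable:

```shell
# Hypothetical values standing in for the ghe-ssh/ghe-spokes output.
restored_uuid="uuid-bbb"
other_nodes="uuid-aaa uuid-bbb uuid-ccc"

# Any node whose UUID differs from the restored one is stale.
stale=""
for uuid in $other_nodes; do
  if [ "$uuid" != "$restored_uuid" ]; then
    # Real script would do something like:
    #   echo "ghe-spokes server destroy git-server-$uuid" | ghe-ssh "$GHE_HOSTNAME" -- /bin/bash
    stale="$stale $uuid"
  fi
done
echo "stale:$stale"
```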
Ooh, good point. I completely forgot about restoring to an unconfigured instance. Your suggestion makes sense too.
bin/ghe-restore
Outdated
inactive_nodes=$(echo "ghe-spokes server show inactive --json | jq -r '.[] | select(.host | contains(\"git-server\")).host' | sed 's/^git-server-//g'" | ghe-ssh "$GHE_HOSTNAME" -- /bin/bash)
if [ -n "$inactive_nodes" ]; then
  restored_uuid=$(cat $GHE_RESTORE_SNAPSHOT_PATH/uuid)
  other_nodes=$(echo "ghe-spokes server show --json | jq -r '.[] | select(.host | contains(\"git-server\")).host' | sed 's/^git-server-//g'" | ghe-ssh "$GHE_HOSTNAME" -- /bin/bash)
To reduce lines of code by 3, could we just add a `| grep -F -x -v \"$restored_uuid\"` to the end of the large collection of piped commands here? Or would this reduce the readability too much?
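What that `grep` would do, sketched locally with made-up UUIDs: `-F` treats the pattern as a fixed string (no regex metacharacters), `-x` requires a whole-line match, and `-v` inverts the match, so only the UUIDs that are *not* the restored node's survive:

```shell
# All node UUIDs, one per line, as the jq/sed pipeline would emit them (sample data).
all_nodes="uuid-aaa
uuid-bbb
uuid-ccc"
restored_uuid="uuid-bbb"

# Drop the restored node's UUID; whatever remains is stale.
stale=$(echo "$all_nodes" | grep -F -x -v "$restored_uuid")
echo "$stale"
```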
@snh done, and we got tests too 😄
…b/backup-utils into lildude/clean-old-nodes-on-restore
❤️
GitHub Enterprise 2.10 changed the method used to replicate repository, pages, and storage data between nodes in a high availability (HA) configuration. Since then, restoring a backup of an HA node could leave "extra" nodes behind when re-establishing replication after the restore.

This PR addresses that by cleaning up any stale node entries immediately after restoring the backup.
Fixes https://github.com/github/backup-utils/issues/390