Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up stale HA nodes on restore #396

Merged
merged 12 commits into from May 21, 2018
Merged

Conversation

lildude
Copy link
Member

@lildude lildude commented Apr 26, 2018

Since the change in the method used to replicate repository, pages and storage data between nodes in a high availability (HA) configuration in GitHub Enterprise 2.10, restoring a backup of the HA node could result in "extra" nodes when re-establishing replication after restoring.

This PR addresses that by cleaning up and stale node entries immediately after restoring the backup.

Fixes https://github.com/github/backup-utils/issues/390

@lildude lildude requested a review from a team April 26, 2018 13:12
@lildude
Copy link
Member Author

lildude commented May 4, 2018

Nudgy McNudgeface @github/backup-utils

Copy link

@anth1y anth1y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@snh snh self-requested a review May 14, 2018 08:29
@snh
Copy link
Member

snh commented May 14, 2018

@lildude Apologies for the delay here, I'll have a look at this tonight.

bin/ghe-restore Outdated
@@ -341,6 +341,16 @@ else
fi
fi

# Clean up all stale replicas
if ! $CLUSTER; then
inactive_nodes=$(echo "ghe-spokes server show inactive --json | jq -r '.[] | select(.host | contains(\"git-server\")).host' | sed 's/^git-server-//g'" | ghe-ssh "$GHE_HOSTNAME" -- /bin/bash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this fails to cleanup the stale HA nodes if the target instance is in an unconfigured state (and the replica(s) were online at the time of the backup).

This occurs because when unconfigured, we don't automatically run ghe-config-apply near the end of the restore, which results in the nodes still appearing as online in the database. It takes a ghe-config-apply for these stale nodes to transition to a offline state.

As an alternative, could we drop the inactive from the ghe-spokes server show and instead just cleanup any nodes that don't match the UUID that we restored (as found in $GHE_RESTORE_SNAPSHOT_PATH/uuid)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, good point. I completely forgot about restoring to an unconfigured instance. Your suggestion makes sense too.

bin/ghe-restore Outdated
inactive_nodes=$(echo "ghe-spokes server show inactive --json | jq -r '.[] | select(.host | contains(\"git-server\")).host' | sed 's/^git-server-//g'" | ghe-ssh "$GHE_HOSTNAME" -- /bin/bash)
if [ -n "$inactive_nodes" ]; then
restored_uuid=$(cat $GHE_RESTORE_SNAPSHOT_PATH/uuid)
other_nodes=$(echo "ghe-spokes server show --json | jq -r '.[] | select(.host | contains(\"git-server\")).host' | sed 's/^git-server-//g'" | ghe-ssh "$GHE_HOSTNAME" -- /bin/bash)
Copy link
Member

@snh snh May 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reduce lines of code by 3, could we just add a | grep -F -x -v \"$restored_uuid\" to the end of the large collection of piped commands here? Or would this reduce the readability too much?

@lildude
Copy link
Member Author

lildude commented May 17, 2018

@snh done, and we got tests too 😄

Copy link
Member

@snh snh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️

@lildude lildude merged commit dda5249 into master May 21, 2018
@lildude lildude deleted the lildude/clean-old-nodes-on-restore branch May 21, 2018 13:48
@lildude lildude mentioned this pull request Jun 22, 2018
pluehne pushed a commit to pluehne/backup-utils that referenced this pull request Aug 8, 2023
@bonsohi bonsohi mentioned this pull request Aug 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants