Clean up stale HA nodes on restore #396

Merged
merged 12 commits into from May 21, 2018
10 changes: 10 additions & 0 deletions bin/ghe-restore
@@ -341,6 +341,16 @@ else
fi
fi

# Clean up all stale replicas
if ! $CLUSTER; then
  inactive_nodes=$(echo "ghe-spokes server show inactive --json | jq -r '.[] | select(.host | contains(\"git-server\")).host' | sed 's/^git-server-//g'" | ghe-ssh "$GHE_HOSTNAME" -- /bin/bash)
Member

Unfortunately, this fails to clean up the stale HA nodes if the target instance is in an unconfigured state (and the replica(s) were online at the time of the backup).

This occurs because, when the target is unconfigured, we don't automatically run ghe-config-apply near the end of the restore, so the nodes still appear as online in the database. It takes a ghe-config-apply run for these stale nodes to transition to an offline state.

As an alternative, could we drop inactive from the ghe-spokes server show call and instead just clean up any nodes that don't match the UUID that we restored (as found in $GHE_RESTORE_SNAPSHOT_PATH/uuid)?
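
A minimal sketch of that suggestion, not the code that was merged. The helper name stale_uuids is hypothetical. It assumes that host names arrive one per line in the git-server-<uuid> form the existing sed expression already relies on, and that $GHE_RESTORE_SNAPSHOT_PATH/uuid holds a single UUID. Whether ghe-spokes server show --json without inactive returns the same JSON shape is also an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read host names on stdin (one per line) and print the
# UUID of every git-server-<uuid> host that does NOT match the restored UUID.
stale_uuids() {
  local restored_uuid="$1"
  # Keep only git-server-* hosts and strip the prefix, then drop the line
  # that exactly equals the restored UUID. grep exits 1 when nothing is
  # left, so "|| true" keeps callers running under "set -e" alive.
  sed -n 's/^git-server-//p' | grep -Fxv -- "$restored_uuid" || true
}

# Possible wiring inside ghe-restore (untested, assumptions as stated above):
#   restored_uuid=$(cat "$GHE_RESTORE_SNAPSHOT_PATH/uuid")
#   hosts=$(echo "ghe-spokes server show --json | jq -r '.[].host'" \
#     | ghe-ssh "$GHE_HOSTNAME" -- /bin/bash)
#   for uuid in $(echo "$hosts" | stale_uuids "$restored_uuid"); do
#     ghe-ssh "$GHE_HOSTNAME" -- "/usr/local/share/enterprise/ghe-cluster-cleanup-node $uuid" 1>&3
#   done
```

Filtering on the restored UUID rather than on node state means the result does not depend on ghe-config-apply having run, which is the case the comment above is concerned with.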

Member Author

Ooh, good point. I completely forgot about restoring to an unconfigured instance. Your suggestion makes sense too.

  if [ -n "$inactive_nodes" ]; then
    echo "Cleaning up stale nodes ..."
    for uuid in $inactive_nodes; do
      ghe-ssh "$GHE_HOSTNAME" -- "/usr/local/share/enterprise/ghe-cluster-cleanup-node $uuid" 1>&3
    done
  fi
fi

# Update the remote status to "complete". This has to happen before importing
# ssh host keys because subsequent commands will fail due to the host key