Modify 'restore-from-backup.sh' to work in multinode etcd cluster. #56692

Merged
merged 1 commit into from Dec 7, 2017
13 changes: 8 additions & 5 deletions cluster/restore-from-backup.sh
@@ -62,6 +62,9 @@ ETCD_API="$(echo $VERSION_CONTENTS | cut -d '/' -f 2)"
# NOTE: NAME HAS TO BE EQUAL TO WHAT WE USE IN --name flag when starting etcd.
NAME="${NAME:-etcd-$(hostname)}"

+INITIAL_CLUSTER="${INITIAL_CLUSTER:-${NAME}=http://localhost:2380}"
+INITIAL_ADVERTISE_PEER_URLS="${INITIAL_ADVERTISE_PEER_URLS:-http://localhost:2380}"

# Port on which etcd is exposed.
etcd_port=2379
event_etcd_port=4002
@@ -101,7 +104,7 @@ wait_for_cluster_healthy() {
# Wait until etcd and apiserver pods are down.
wait_for_etcd_and_apiserver_down() {
for i in $(seq 120); do
-etcd=$(docker ps | grep etcd | grep -v etcd-empty-dir | grep -v etcd-monitor | wc -l)
+etcd=$(docker ps | grep etcd-server | wc -l)
apiserver=$(docker ps | grep apiserver | wc -l)
# TODO: Theoretically it is possible, that apiserver and or etcd
# are currently down, but Kubelet is now restarting them and they
@@ -134,6 +137,8 @@ if ! wait_for_etcd_and_apiserver_down; then
exit 1
fi

+read -rsp $'Press enter when all etcd instances are down...\n'

# Create the sort of directory structure that etcd expects.
# If this directory already exists, remove it.
BACKUP_DIR="/var/tmp/backup"
@@ -185,15 +190,13 @@ elif [ "${ETCD_API}" == "etcd3" ]; then

# Run etcdctl snapshot restore command and wait until it is finished.
# setting with --name in the etcd manifest file and then it seems to work.
-# TODO(jsz): This command may not work in case of HA.
-image=$(docker run -d -v ${BACKUP_DIR}:/var/tmp/backup --env ETCDCTL_API=3 \
+docker run -v ${BACKUP_DIR}:/var/tmp/backup --env ETCDCTL_API=3 \
"gcr.io/google_containers/etcd:${ETCD_VERSION}" /bin/sh -c \
-"/usr/local/bin/etcdctl snapshot restore ${BACKUP_DIR}/${snapshot} --name ${NAME} --initial-cluster ${NAME}=http://localhost:2380; mv /${NAME}.etcd/member /var/tmp/backup/")
+"/usr/local/bin/etcdctl snapshot restore ${BACKUP_DIR}/${snapshot} --name ${NAME} --initial-cluster ${INITIAL_CLUSTER} --initial-advertise-peer-urls ${INITIAL_ADVERTISE_PEER_URLS}; mv /${NAME}.etcd/member /var/tmp/backup/"
if [ "$?" -ne "0" ]; then
echo "Docker container didn't started correctly"
exit 1
fi
-echo "Prepare container exit code: $(docker wait ${image})"
Member

Why do you remove this one?

Member Author

I removed '-d' from docker run so that stdout/stderr go directly to the console. That makes it much easier to see whether the restore worked and to debug it if it failed.

So 'docker wait' is no longer needed (if the command fails, the if above will catch it).
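
To make the foreground-run pattern concrete, here is a minimal sketch that needs no Docker. `run_step` is a hypothetical helper, not part of the script, and the `sh -c` calls are stand-ins for the container command. The last two lines illustrate a general shell caveat that seems relevant here: with `a; b` only the status of `b` is reported, so a check on `$?` sees a failure in `a` only if `b` also fails.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run one step in the foreground and check its exit
# status, similar in spirit to the `if [ "$?" -ne "0" ]` check in the script.
run_step() {
  # Output of the command goes straight to the console (no -d, no wait).
  if ! "$@"; then
    echo "step failed: $*" >&2
    return 1
  fi
  echo "step ok: $*"
}

# Stand-ins for the container command.
run_step sh -c 'exit 0'
run_step sh -c 'exit 3' || echo "failure caught"

# With `a; b` the shell reports only b's status, so a failed first command
# can be masked; `a && b` stops at the first failure instead.
sh -c 'false; true' && echo "masked: status 0"
sh -c 'false && true' || echo "propagated: non-zero status"
```

Whether chaining the restore and the `mv` with `&&` would suit this script is a judgment call for the maintainers; the sketch only shows how the two forms differ.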


rm -f "${BACKUP_DIR}/${snapshot}"
fi
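
As an illustration of how the new `INITIAL_CLUSTER` and `INITIAL_ADVERTISE_PEER_URLS` overrides might be set for a three-member cluster, here is a rough sketch. The member names, host names, and the `join_members` helper are all assumptions made for the example; only the comma-separated `name=peer-url` format of `--initial-cluster` comes from etcd itself.

```shell
#!/usr/bin/env bash

# Hypothetical helper: join "name=peer-url" pairs with commas, the format
# etcd's --initial-cluster flag accepts.
join_members() {
  local IFS=,
  printf '%s\n' "$*"
}

# Assumed member names and hosts, for illustration only.
INITIAL_CLUSTER="$(join_members \
  "etcd-master-a=http://master-a:2380" \
  "etcd-master-b=http://master-b:2380" \
  "etcd-master-c=http://master-c:2380")"

# Per-node values; NAME has to match the --name flag etcd is started with.
NAME="etcd-master-a"
INITIAL_ADVERTISE_PEER_URLS="http://master-a:2380"

# Export so that a script using ${VAR:-default} picks these up.
export NAME INITIAL_CLUSTER INITIAL_ADVERTISE_PEER_URLS
echo "${INITIAL_CLUSTER}"
```

Presumably each node would run the restore with the same `INITIAL_CLUSTER` but its own `NAME` and `INITIAL_ADVERTISE_PEER_URLS`. When none of the variables are set, the defaults in the script fall back to the previous single-node values on `localhost:2380`.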