WIP: backup and restore scripts #72

Status: Open. Wants to merge 10 commits into base branch: master.
3 changes: 3 additions & 0 deletions .gitignore
@@ -9,3 +9,6 @@ config/**/*

data/**/*
!data/.gitkeep

## backup directories
backup/
174 changes: 174 additions & 0 deletions bin/backup
@@ -0,0 +1,174 @@
#!/usr/bin/env bash

set -euo pipefail

#### Detect Toolkit Project Root ####
# if realpath is not available, create a semi-equivalent function
command -v realpath >/dev/null 2>&1 || realpath() {
[[ $1 = /* ]] && echo "$1" || echo "$PWD/${1#./}"
}
SCRIPT_PATH="$(realpath "${BASH_SOURCE[0]}")"
SCRIPT_DIR="$(dirname "$SCRIPT_PATH")"
TOOLKIT_ROOT="$(realpath "$SCRIPT_DIR/..")"
if [[ ! -d "$TOOLKIT_ROOT/bin" ]] || [[ ! -d "$TOOLKIT_ROOT/config" ]]; then
echo "ERROR: could not find root of overleaf-toolkit project (inferred project root as '$TOOLKIT_ROOT')"
exit 1
fi

TMP_ROOT_DIR="$TOOLKIT_ROOT/tmp"

IS_SERVER_PRO="$(grep -q 'SERVER_PRO=true' \
"$TOOLKIT_ROOT/config/overleaf.rc" && echo 'true' || echo 'false')"

function usage() {
cat <<EOF
Usage: bin/backup

Makes a backup of the data in this installation, and writes it to a
timestamped tar.gz file in the ./backup/ directory.

This file can then be consumed by the bin/restore script.

EOF
}

function wait-for-mongo () {
while ! "$TOOLKIT_ROOT/bin/docker-compose" exec -T mongo \
mongo --eval "db.version()" \
> /dev/null; do echo '[mongo is not ready]' && sleep 1; done
echo '[mongo is ready]'
}
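
The poll-and-sleep pattern in wait-for-mongo generalizes; a hypothetical helper (not part of this PR) that parameterizes the service name and the readiness probe might look like:

```shell
#!/usr/bin/env bash
# Sketch: generic readiness poller in the style of wait-for-mongo.
# The service name and probe command are parameters (hypothetical helper,
# not in the PR).
function wait-for () {
  local name="$1"; shift
  # keep probing until the command succeeds, then report readiness
  while ! "$@" > /dev/null 2>&1; do
    echo "[$name is not ready]" && sleep 1
  done
  echo "[$name is ready]"
}
```

wait-for-mongo would then reduce to something like `wait-for mongo "$TOOLKIT_ROOT/bin/docker-compose" exec -T mongo mongo --eval "db.version()"`.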

function create-tmp-dir () {
> Contributor: Could mktemp be used here?
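
Following that suggestion, a mktemp-based variant of create-tmp-dir could look like the sketch below; mktemp's template handles the uniqueness check that the script currently does by hand (the date-prefixed name is kept only for readability, and TMP_ROOT_DIR mirrors the variable in bin/backup):

```shell
#!/usr/bin/env bash
# Sketch: mktemp-based alternative to the hand-rolled temp-dir logic.
# TMP_ROOT_DIR mirrors the variable in bin/backup.
set -euo pipefail
TMP_ROOT_DIR="${TMP_ROOT_DIR:-$(mktemp -d)}"
mkdir -p "$TMP_ROOT_DIR"
# mktemp guarantees a fresh directory, replacing the manual
# date+random collision check.
tmp_dir="$(mktemp -d "$TMP_ROOT_DIR/backup-$(date '+%F-%H%M%S')-XXXXXX")"
mkdir -p "$tmp_dir/backup"
echo "$tmp_dir"
```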

local now
now="$(date '+%F-%H%M%S')"
local random_part
random_part="$(head -c 8 /dev/urandom | md5sum | cut -c 1-4)"
if ! [[ -d "$TMP_ROOT_DIR" ]]; then
mkdir "$TMP_ROOT_DIR"
fi
local tmp_dir="$TMP_ROOT_DIR/backup-$now-$random_part"
if [[ -d "$tmp_dir" ]]; then
echo "Error: temp directory '$tmp_dir' already exists" >&2
exit 1
fi
mkdir -p "$tmp_dir/backup"
echo "$tmp_dir"
}

function get-container-name () {
local name="$1"
"$TOOLKIT_ROOT/bin/docker-compose" ps | grep "$name" | cut -d ' ' -f 1 | head -n 1
}
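
get-container-name parses human-readable `docker-compose ps` output with grep and cut, which can misfire if one service name is a substring of another. A sketch of a variant that asks compose for the container ID directly (assuming the same docker-compose v1 CLI the toolkit wraps; `docker cp` accepts IDs as well as names):

```shell
#!/usr/bin/env bash
# Sketch: resolve a compose service to its container ID via `ps -q`
# instead of grepping human-readable `ps` output (hypothetical variant,
# not in the PR).
function get-container-id () {
  local name="$1"
  "$TOOLKIT_ROOT/bin/docker-compose" ps -q "$name"
}
```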

function dump-mongo () {
local tmp_dir="$1"
local mongo_tmp_dir="$tmp_dir/backup/mongo"
mkdir "$mongo_tmp_dir"

"$TOOLKIT_ROOT/bin/docker-compose" up -d mongo
> Contributor: Can we handle the case where there is an external Mongo DB here? I think there will be people running without the internal DBs.

> Contributor: Yeah, would be useful to have a --skip-mongo option for both backup/restore.

> Contributor: In addition, if we go with external redis, it could also have a --skip-redis.

> Contributor: I was thinking that we could look at the config in some way and back up the external mongo/redis using the same mechanism. Given that we've already checked the prerequisites for making a backup (the service is stopped so nothing will be in-flight) it makes sense to roll up the backup from the external DB at the same time. WDYT?
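
The --skip-mongo/--skip-redis idea from this thread could be sketched as simple flag parsing in _main; note the flag names come from the review discussion, not from the PR's code:

```shell
#!/usr/bin/env bash
# Sketch: optional --skip-mongo/--skip-redis flags (names proposed in
# review; not implemented in this PR).
SKIP_MONGO=false
SKIP_REDIS=false
for arg in "$@"; do
  case "$arg" in
    --skip-mongo) SKIP_MONGO=true ;;
    --skip-redis) SKIP_REDIS=true ;;
  esac
done
echo "skip_mongo=$SKIP_MONGO skip_redis=$SKIP_REDIS"
```

_main would then guard each dump step, e.g. `"$SKIP_MONGO" || dump-mongo "$tmp_dir"`.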

wait-for-mongo

# shellcheck disable=SC1004
"$TOOLKIT_ROOT/bin/docker-compose" exec mongo bash -lc '\
[[ -d /tmp/dump ]] && rm -rf /tmp/dump; \
cd /tmp && mongodump --quiet;'

docker cp "$(get-container-name mongo)":/tmp/dump \
"$mongo_tmp_dir/dump"

if [[ ! -d "$mongo_tmp_dir/dump" ]]; then
echo "Error: did not get mongo backup" >&2
exit 1
fi

# shellcheck disable=SC1004
"$TOOLKIT_ROOT/bin/docker-compose" exec mongo bash -lc '\
rm -rf /tmp/dump;'
}

function copy-data-files () {
> Contributor: Are data files really needed for the backup? Most of them would be intermediate compile results right?

> Contributor: I assumed that there would be e.g. filestore data here?

local tmp_dir="$1"
local sharelatex_tmp_dir="$tmp_dir/backup/data/sharelatex"
mkdir -p "$sharelatex_tmp_dir"

rsync -a "$TOOLKIT_ROOT/data/sharelatex/" "$sharelatex_tmp_dir"
}

function backup-tar () {
local tmp_dir="$1"
local backup_name
backup_name="$(basename "$tmp_dir")"
local tar_file="$TOOLKIT_ROOT/backup/${backup_name}.tar.gz"
echo "Writing backup to backup/$(basename "$tar_file")"
pushd "$tmp_dir" 1>/dev/null
tar zcvf "$tar_file" backup info.txt > /dev/null
popd 1>/dev/null
}

function write-info-file () {
local tmp_dir="$1"
cat <<EOF > "$tmp_dir/info.txt"
Backup info:
- time: $(date '+%F-%H%M%S')
- user: $(whoami)
- server pro: $IS_SERVER_PRO
EOF

}

function _main() {
## Help, and such
if [[ "${1:-null}" == '--help' ]] || [[ "${1:-null}" == "help" ]]; then
usage
exit 0
fi

## Get a temp directory
local tmp_dir
tmp_dir="$(create-tmp-dir)"
echo "Using temp directory: $tmp_dir"

## Stop docker services
echo "Stopping docker-compose services..."
"$TOOLKIT_ROOT/bin/docker-compose" stop 2>/dev/null

## Dump mongo
echo "Dumping mongo..."
> Contributor: Do we also need to back up redis?

> I think we have gone back and forth on this, the docs used to say it wasn't needed but were changed earlier this year: https://github.com/overleaf/overleaf/wiki/Backup-of-Data/_compare/dd55f820bf868464db70cbf8f584a608b6425c1b...f07e059281545c18907f9971e0fd18d9492b0127
>
> I believe it's the case that if the editor is closed, users are disconnected, and then everything is shut down cleanly (in that order), nothing left in Redis is absolutely essential. But since we don't currently have a way to close the editor and disconnect users from a script, it seems safer to back up both, given various problems that admins have run into on support. (I think scripting the close editor and disconnect users actions was being discussed elsewhere; if we add that to the backup script then maybe Redis backup is indeed not needed.)

dump-mongo "$tmp_dir"

## Copy data files
echo "Copying data/ files..."
copy-data-files "$tmp_dir"

## Add info file
echo "Writing info file..."
write-info-file "$tmp_dir"

## Prepare backup directory
[[ ! -d "$TOOLKIT_ROOT/backup" ]] && mkdir "$TOOLKIT_ROOT/backup"

## Archive structure:
## - backup/
## - mongo/
## - data/
## - ...
## - info.txt

## Create backup archive
echo "Creating tar.gz archive..."
backup-tar "$tmp_dir"

## Clean up temp dir
echo "Removing temp files..."
rm -rf "$tmp_dir"

## Stop docker services
echo "Stopping docker-compose services..."
"$TOOLKIT_ROOT/bin/docker-compose" stop 2>/dev/null

echo "Done"
exit 0
}

_main "$@"
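
The archive layout that backup-tar produces (backup/ plus info.txt at the top level) can be sanity-checked with a round trip; the sketch below mimics the same tar invocation shape, but all paths and contents are hypothetical stand-ins:

```shell
#!/usr/bin/env bash
# Sketch: round-trip check of the archive layout produced by backup-tar.
# Directory contents here are hypothetical placeholders.
set -euo pipefail
tmp="$(mktemp -d)"
mkdir -p "$tmp/backup/mongo/dump" "$tmp/backup/data/sharelatex"
echo "Backup info:" > "$tmp/info.txt"
# same shape as backup-tar: archive backup/ and info.txt from the tmp dir
tar -C "$tmp" -zcf "$tmp/archive.tar.gz" backup info.txt
# unpack into a fresh directory, as a restore script might
mkdir "$tmp/out"
tar -C "$tmp/out" -zxf "$tmp/archive.tar.gz"
```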
1 change: 1 addition & 0 deletions bin/doctor
@@ -118,6 +118,7 @@ function check_dependencies() {
perl
awk
openssl
rsync
)

for binary in "${binaries[@]}"; do