Snh/passive backup #340

Closed
wants to merge 631 commits into from

hazzalove

No description provided.

rubiojr and others added 30 commits May 30, 2016 10:11
Quiet audit-log restores if there are no indices
Routes for the gist repositories being restored are calculated remotely,
so we only need to rsync once per available storage server.

This needs some server-side support from GitHub Enterprise 2.6.3 (unreleased).

The change is backwards compatible and only affects cluster restores.
backup-utils will use the old/slower script when restoring to older
GitHub Enterprise versions.
Actually required when using `--files-from`

This reverts commit b110c66.
Simple benchmarking that logs the time it takes to restore data. Looks something like:

```
$ cat data/26-snap/current/benchmarks/benchmark.1464253606.log
ghe-import-mysql took: 7s
ghe-import-redis took: 43s
ghe-restore-repositories-dgit took: 11s
ghe-restore-alambic-cluster-ng took: 6s
ghe-restore-git-hooks-cluster took: 2s
ghe-restore-es-audit-log took: 0s
```

The log is timestamped and stored in the benchmarks directory,
along with the data that was restored.
The output is sent to fd 3, so it is only displayed when using the verbose flag (`-v`).
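The benchmark logging can be sketched as a small wrapper that times each restore step, writes a timestamped log and mirrors each line to fd 3. This is an illustration under assumptions: `GHE_VERBOSE`, `BENCHMARK_LOG` and `bm_run` are hypothetical names, not necessarily what backup-utils uses.

```shell
#!/usr/bin/env bash
# Sketch only: GHE_VERBOSE, BENCHMARK_LOG and bm_run are hypothetical names.
set -euo pipefail

# fd 3 carries verbose-only output: stderr in verbose mode, discarded otherwise.
if [ -n "${GHE_VERBOSE:-}" ]; then
  exec 3>&2
else
  exec 3>/dev/null
fi

# One log per run, named with the epoch timestamp (e.g. benchmark.1464253606.log).
BENCHMARK_LOG=${BENCHMARK_LOG:-benchmarks/benchmark.$(date +%s).log}

# bm_run NAME COMMAND [ARGS...]
# Runs COMMAND, appends "NAME took: Ns" to the log, echoes the same line to
# fd 3, and returns COMMAND's exit status.
bm_run() {
  local name=$1 start end status=0
  shift
  start=$(date +%s)
  "$@" || status=$?
  end=$(date +%s)
  mkdir -p "$(dirname "$BENCHMARK_LOG")"
  echo "$name took: $((end - start))s" | tee -a "$BENCHMARK_LOG" >&3
  return "$status"
}
```

Returning the wrapped command's status means timing a step never hides its failure from the calling restore script.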

This is consistent with the other scripts using rsync and also helpful for
support/troubleshooting.
Verbose rsync when restoring gist repositories
restore host keys for cluster environment as well
Revert "restore host keys for cluster environment as well"
Restores the Git over SSH (babeld) host keys and distributes them to all the nodes in the cluster.

Initially fixed by #228 and later reverted in #229. This new patch prevents the SSH host keys from being replaced, which broke `ghe-restore` in the process (new SSH connections failed after the host key restore).

Fixes https://github.com/github/backup-utils/226
Clusters: Restore Git over SSH host keys
We now use a tarball-based backup/restore approach, fixing
permission problems when backing up and restoring user-provided environments.

From @snh:

If files or directories within a hook environment are lacking the user read and/or write bits, then a number of issues arise:

* The hook environment cannot be backed up by the backup utilities, causing a backup failure, as the backup utilities need the user read bit to access all files and directories as the git user.

* Older backup snapshots cannot be pruned, as the backup utilities need the user write bit to remove all files and directories as the account that is running the backup utilities.

@dbussink, @WillAbides and @snh: thanks for the new implementation
and feedback.
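One way to picture the tarball approach, as a sketch with hypothetical function names and paths: the snapshot stores each hook environment as a single archive instead of a tree of files. Pruning then only deletes one regular file, which depends on the snapshot directory's permissions rather than on the modes of the files inside the environment. The sketch does not address the read side; it assumes whatever creates the archive can read every file in the environment.

```shell
#!/usr/bin/env bash
# Sketch only: function names and paths are hypothetical.
set -euo pipefail

# backup_hook_env ENV_DIR SNAPSHOT_DIR NAME
# Stores the whole environment as SNAPSHOT_DIR/NAME.tar.gz, so the snapshot
# holds one regular file whatever modes the environment's files have.
backup_hook_env() {
  local env_dir=$1 snapshot_dir=$2 name=$3
  mkdir -p "$snapshot_dir"
  tar -czf "$snapshot_dir/$name.tar.gz" -C "$env_dir" .
}

# restore_hook_env SNAPSHOT_DIR NAME TARGET_DIR
# Unpacks the archive; -p keeps the file modes recorded in it.
restore_hook_env() {
  local snapshot_dir=$1 name=$2 target_dir=$3
  mkdir -p "$target_dir"
  tar -xzpf "$snapshot_dir/$name.tar.gz" -C "$target_dir"
}
```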
We now rsync once per cluster node available instead of rsyncing each
Git repository individually.

Some simple benchmarks restoring a snapshot with 1183 repositories (13 GiB):

* Using backup-utils 2.6.1

```
real    20m34.923s
user    5m1.888s
sys     2m39.983s
```

* Using the new implementation

```
real    9m0.368s
user    2m46.912s
sys     1m18.746s
```

The old implementation restores roughly one repository per second, so
restoring backup snapshots with a large number of repositories over a fast
network benefits the most from this change.

Here's the time it takes to restore 8K repositories (~800 MiB), all of them
very similar in size (100K, with a single README file added):

* Using backup-utils 2.6.1

```
real    111m45.370s
user    7m54.829s
sys     6m38.247s
```

* Using the new implementation

```
real    6m20.087s
user    0m16.509s
sys     1m42.616s
```

In clusters with more than three Git server nodes, backup-utils 2.6.1 also
restores the repositories to every available Git server node. Only three
copies of a Git repository are necessary, so this patch also fixes that,
speeding up restores and reducing disk usage.

/cc @github/backup-utils
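The per-node plan with three copies can be sketched as follows. Assumptions: `plan_restore` and `COPIES` are hypothetical, and the round-robin placement is chosen only for illustration; the commit does not say how the nodes for each repository are picked. Each resulting list would then drive a single `rsync --files-from` per node rather than one transfer per repository.

```shell
#!/usr/bin/env bash
# Sketch only: plan_restore, COPIES and the round-robin placement are
# hypothetical; on a real cluster the appliance decides where each
# repository lives.
set -euo pipefail

COPIES=3  # replicas needed per repository

# plan_restore REPO_LIST OUT_DIR NODE...
# Assigns each repository in REPO_LIST to COPIES distinct nodes (or to every
# node when there are fewer) and appends it to OUT_DIR/<node>.list.
plan_restore() {
  local repo_list=$1 out_dir=$2
  shift 2
  local nodes=("$@")
  local n=${#nodes[@]} copies=$COPIES i=0 k repo
  if (( copies > n )); then copies=$n; fi
  mkdir -p "$out_dir"
  while read -r repo || [ -n "${repo:-}" ]; do
    [ -n "$repo" ] || continue
    # Repository number i goes to nodes i, i+1, ..., i+copies-1 (mod n).
    for (( k = 0; k < copies; k++ )); do
      printf '%s\n' "$repo" >> "$out_dir/${nodes[(i + k) % n]}.list"
    done
    i=$((i + 1))
  done < "$repo_list"
}
```

With five nodes and four repositories, each repository lands on exactly three nodes (twelve list entries in total) instead of on all five.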
Verbose rsync when using `ghe-restore` or `ghe-backup` with `-v`.

Print `ghe-hook-env-update` output when using verbose mode only.

/cc @dbussink
Clusters: speedup repositories restore
git-hooks backup/restore fixes
A patch release that includes performance improvements for cluster
restores, bug fixes and other improvements:

* git-hooks fixes #231
* Cluster: speedup repositories restore #232
* Cluster: restore Git over SSH keys #230
* Benchmark restores #219
Fixes a regression in backup-utils 2.6.2 when backing up GitHub Enterprise
clusters not using custom pre-receive Git hooks environments.
* Use headers for linkability
* Some other minor tweaks
buckelij and others added 28 commits May 26, 2017 15:09
Use default niceness for restores
Improve detection of failures in cluster backup rsync threads
Use existing Elasticsearch indices to speed up transfer during a restore
Include the user data directory in the benchmark name
Explicitly state OpenSSH in requirements
jeluhu pushed a commit that referenced this pull request Jun 12, 2023