Should I create a pr for automated backups? #467

Closed · hungrymonkey opened this issue Apr 23, 2020 · 17 comments

@hungrymonkey (Contributor) commented Apr 23, 2020

I saw the deficiencies section and I wonder if I should try to contribute back.

matrix-synapse-backup.service

[Unit]
Description=Backup service for Matrix Synapse

[Service]
Environment=AWS_BUCKET=s3://<your-aws-bucket>/matrix
Type=oneshot
ExecStartPre=/bin/sh -c 'docker run --rm --network=matrix \
				--env-file=/matrix/postgres/env-postgres-psql \
				postgres:12.1-alpine pg_dumpall -h matrix-postgres | gzip -c > /postgres.sql.gz'
ExecStart=/bin/sh -c 'aws s3 cp /postgres.sql.gz ${AWS_BUCKET}/$$(date +%%m-%%d-%%Y)/ && rm /postgres.sql.gz'
User=root
Group=systemd-journal

matrix-synapse-backup.timer

[Unit]
Description=Backup timer for Matrix Synapse

[Timer]
OnCalendar=Sun,Tue,Thu,Sat 02:00
Persistent=true

[Install]
WantedBy=timers.target
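
For reference, a unit pair like this would be activated by enabling the timer rather than the service; with Persistent=true, a run missed while the machine was off is made up at the next boot. A minimal sketch:

# Reload unit files, then enable and start the timer.
systemctl daemon-reload
systemctl enable --now matrix-synapse-backup.timer
# Confirm the next scheduled run.
systemctl list-timers matrix-synapse-backup.timer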

backup-acl.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<your-bucket>",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "<Restrict-IP>"
                    ]
                }
            }
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket>/matrix/*",
                "arn:aws:s3:::<your-bucket>/matrix"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "<Restrict-IP>"
                }
            }
        }
    ]
}

restore-acl.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<your-bucket>",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "<Restrict-IP>"
                    ]
                }
            }
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket>/matrix/*",
                "arn:aws:s3:::<your-bucket>/matrix"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "<Restrict-IP>"
                }
            }
        }
    ]
}
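
For completeness, a restore might look like the sketch below, assuming AWS_BUCKET is set as in the unit above and reusing the same network, env file, and image (the dump date path is a placeholder):

# Fetch a chosen dump and pipe it into psql in a throwaway container.
aws s3 cp ${AWS_BUCKET}/<date>/postgres.sql.gz /tmp/postgres.sql.gz
gunzip -c /tmp/postgres.sql.gz | docker run --rm -i --network=matrix \
    --env-file=/matrix/postgres/env-postgres-psql \
    postgres:12.1-alpine psql -h matrix-postgres
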
hungrymonkey changed the title from "Should I create a pr for automated backups" to "Should I create a pr for automated backups?" on Apr 23, 2020
@hungrymonkey (Contributor, Author) commented Apr 24, 2020

Hi,

I realize it is not so simple to add systemd timers, because start.yml assumes .service units:

- "ansible_facts.services[item + '.service']|default(none) is none or ansible_facts.services[item + '.service'].state != 'running'"

@spantaleev (Owner) commented:

Separate code could be added for the timer.

We have various cronjobs already, so using cronjobs may be a better approach, instead of introducing one more mechanism.


Otherwise: this code is pretty biased toward uploading stuff to S3, unencrypted.

Where is the aws tool coming from? Ideally, we'd also run that in a container, so that it's portable across all distributions that we support.
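
For illustration, a containerized run could look like the sketch below (an assumption about the eventual setup, not existing playbook code): the official amazon/aws-cli image uses aws as its entrypoint, credentials pass through from the host environment, and /tmp is mounted so the container can read a dump produced on the host.

docker run --rm \
    -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \
    -v /tmp:/tmp \
    amazon/aws-cli s3 cp /tmp/postgres.sql.gz s3://<your-bucket>/matrix/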


I wonder what we should do about backups, so that we can support more storage providers. Perhaps we should split the "backup creation" and "backup upload" parts.

I'm thinking we should have some matrix-backup role in any case, which would ultimately support multiple ways of backing up the system.

Not sure if there should only be one method to create the backup, though. I'd imagine that some people would love an encrypted tarball containing everything, which could then be copied to any storage provider, while others are fine with just dumping the database periodically and rsync-ing most of /matrix to some other machine they have (or to S3, etc.).

Somewhat larger deployments likely can't afford to frequently make archives of everything (thousands of files in the Synapse media repository, etc.).

@hungrymonkey (Contributor, Author) commented Apr 24, 2020

> I'm thinking we should have some matrix-backup role in any case, which would ultimately support multiple ways of backing up the system.

@spantaleev I just did it as an example. To be honest, there are too many choices for backups: we have duplicity for encrypted backups, rsync for remote copies, and borg to back up everything. I decided to present one choice. I acknowledge that my choice is currently biased towards using an IAM role for passwordless S3 bucket access.

http://duplicity.nongnu.org/

> Where is the aws tool coming from? Ideally, we'd also run that in a container, so that it's portable across all distributions that we support.

Distros already package awscli in their repos, and you can run it in a container or install it directly. The AWS CLI is designed to carry little ideological policy; its command-line interface should remain stable for the foreseeable future.

https://rpmfind.net/linux/rpm2html/search.php?query=aws-cli
https://packages.debian.org/sid/awscli
https://www.archlinux.org/packages/community/any/aws-cli/

> We have various cronjobs already, so using cronjobs may be a better approach, instead of introducing one more mechanism.

Is it OK to create a cron job which executes a systemd service file? In the future, I kinda want to be able to email backup failures.
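
For illustration, the generated /etc/cron.d entry would boil down to a single line (a sketch matching the cron task posted later in this thread); having cron start a oneshot unit keeps logs and failure state in systemd, where an OnFailure= handler could eventually send mail:

# /etc/cron.d/matrix-backup
0 2 * * 1 root systemctl start matrix-backup.service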

@spantaleev (Owner) commented:

It'd still be better if we refrain from installing aws-cli from package managers.

I think it's nice how the playbook currently installs some bare-minimum set of packages (see the matrix-base role) and then runs everything in containers, without messing with your system. Also, there are different versions in each distro's repository: while the base commands have stayed the same, there are things like "storage class" which aren't configurable on old versions, etc. It's better if we always run a new awscli in a container and not have to worry about which version is packaged by which distro, under what name, and so on.

It'd probably be nice if we can support access key/secret authentication to S3, for people who don't host their server on AWS.

And then, maybe people would even like configuring the endpoint, so they can point it at an S3-like alternative, like Digital Ocean Spaces. This should be fairly easy though: just another configurable switch.
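
For an S3-compatible provider, the only change to an upload would be the endpoint flag, e.g. (a sketch; Digital Ocean Spaces shown as the example endpoint):

aws s3 cp /tmp/postgres.sql.gz s3://<your-bucket>/matrix/ \
    --endpoint-url https://nyc3.digitaloceanspaces.com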

@hungrymonkey (Contributor, Author) commented May 3, 2020

Hi,

How do you envision rsync working? I've posted sample documentation below for scrutiny.

matrix-backup.service.j2

[Unit]
Description=Backup service for Matrix Synapse

[Service]
{% if matrix_backup_bucket_key_id %}
Environment=AWS_ACCESS_KEY_ID={{ matrix_backup_bucket_key_id }}
Environment=AWS_SECRET_ACCESS_KEY={{ matrix_backup_bucket_key_secret }}
{% endif %}
{% if matrix_backup_bucket %}
Environment=AWS_BUCKET={{ matrix_backup_bucket }}
{% endif %}

Type=oneshot
ExecStartPre=/bin/sh -c 'docker run --rm --network={{ matrix_docker_network }} \
                --env-file={{ matrix_postgres_base_path }}/env-postgres-server \
                {{ matrix_postgres_docker_image_to_use }} pg_dumpall -h matrix-postgres | gzip -c > /tmp/postgres.sql.gz'
{% if matrix_backup_bucket %}
ExecStart=/bin/sh -c 'docker run --rm \
                -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -v /tmp:/tmp \
                {{ matrix_backup_aws_cli_docker_image_to_use }} \
                s3 cp /tmp/postgres.sql.gz ${AWS_BUCKET}/$$(date +%%m-%%d-%%Y)/ \
{% if matrix_backup_bucket_endpoint %} \
                --endpoint-url {{ matrix_backup_bucket_endpoint }} \
{% endif %} && rm /tmp/postgres.sql.gz'
{% endif %}
{% if matrix_backup_rsync_target %}
ExecStart=/bin/sh -c 'rsync' ??
{% endif %}
User=root
Group=systemd-journal
- name: Creates a matrix synapse backup cron file under /etc/cron.d
  cron:
    name: Matrix Backup Service
    weekday: "1"
    minute: "0"
    hour: "2"
    user: root
    job: "systemctl start matrix-backup.service"
    cron_file: matrix-backup

Setting up Matrix Synapse backups (optional)

This playbook installs a weekly cron backup.

Variable Table

| Variable | Default | Example |
|----------|---------|---------|
| matrix_backup_enabled | false | true |
| matrix_backup_bucket | "" | "s3://bucketname/prefix/" |
| matrix_backup_bucket_endpoint | "" | "https://nyc3.digitaloceanspaces.com" |
| matrix_backup_bucket_awscli_docker_image_latest | "amazon/aws-cli:2.0.10" | "amazon/aws-cli:latest" |
| matrix_backup_bucket_key_id | "" | "AKIAQIOAVK3Q4HMXL272" |
| matrix_backup_bucket_key_secret | "" | "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" |
| matrix_backup_rsync_target | "" | ?? |

Method 1: Rsync

??

Method 2: S3 Compatible object store

Setup: S3 compatible buckets

A list of S3-compatible services: https://en.wikipedia.org/wiki/Amazon_S3#S3_API_and_competing_services

| Service Provider | Costs | Compatibility | Endpoint |
|------------------|-------|---------------|----------|
| AWS S3 | https://aws.amazon.com/s3/pricing/ | N/A | N/A |
| Digital Ocean Spaces | https://www.digitalocean.com/pricing/#Storage | https://developers.digitalocean.com/documentation/spaces/ | https://<region>.digitaloceanspaces.com |
| Azure Blob | https://azure.microsoft.com/en-us/pricing/details/storage/blobs/ | https://cloudblogs.microsoft.com/opensource/2017/11/09/s3cmd-amazon-s3-compatible-apps-azure-storage/ | Requires minio |
| Backblaze B2 | https://www.backblaze.com/b2/cloud-storage-pricing.html | https://www.backblaze.com/b2/docs/s3_compatible_api.html | https://s3.<region>.backblazeb2.com/ |
| Google Cloud Storage | https://cloud.google.com/storage/pricing | https://cloud.google.com/storage/docs/interoperability | https://storage.googleapis.com |
| Wasabi | https://wasabi.com/s3-compatible-cloud-storage/ | https://wasabi-support.zendesk.com/hc/en-us/articles/115001910791-How-do-I-use-AWS-CLI-with-Wasabi- | https://s3.wasabisys.com |
| IBM Cloud Object Storage | https://cloud.ibm.com/catalog/services/cloud-object-storage | https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-aws-cli | s3.<region>.cloud-object-storage.appdomain.cloud |
| Linode Object Storage | https://www.linode.com/pricing/#row--storage | https://www.linode.com/docs/platform/object-storage/bucket-versioning/ | http://<region>.linodeobjects.com |
| DreamHost | https://www.dreamhost.com/cloud/storage/ | https://help.dreamhost.com/hc/en-us/articles/360022654971-AWS-CLI-commands-to-manage-your-DreamObjects-data | https://objects-us-east-1.dream.io |

Preparation

1. Select an S3-compatible provider.
2. Create an S3 bucket.
3. Create a specialized IAM user with the permissions recorded below; a sketch of the IAM commands follows the two policy files. Users who deployed their Postgres instance on AWS EC2 can attach an IAM role instead, for passwordless S3 access.

Backup-acl.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<your-bucket>",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "<Restrict-IP>"
                    ]
                }
            }
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket>/matrix/*",
                "arn:aws:s3:::<your-bucket>/matrix"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "<Restrict-IP>"
                }
            }
        }
    ]
}

Restore-acl.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<your-bucket>",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "<Restrict-IP>"
                    ]
                }
            }
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket>/matrix/*",
                "arn:aws:s3:::<your-bucket>/matrix"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "<Restrict-IP>"
                }
            }
        }
    ]
}
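
One way to wire these policies to a dedicated user, sketched with standard AWS CLI IAM commands (user and policy names are placeholders):

# Create the backup policy from the JSON above and attach it to a new user.
aws iam create-policy --policy-name matrix-backup --policy-document file://backup-acl.json
aws iam create-user --user-name matrix-backup
aws iam attach-user-policy --user-name matrix-backup \
    --policy-arn arn:aws:iam::<account-id>:policy/matrix-backup
# Generate the key pair used as matrix_backup_bucket_key_id/_secret.
aws iam create-access-key --user-name matrix-backup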

Deploy Matrix S3 Backup

Using AWS IAM Role

Set matrix_backup_enabled and matrix_backup_bucket.

Using AWS IAM User

Set matrix_backup_enabled, matrix_backup_bucket, matrix_backup_bucket_key_id, and matrix_backup_bucket_key_secret.

S3 Compatible Services

Set matrix_backup_enabled, matrix_backup_bucket, matrix_backup_bucket_key_id, matrix_backup_bucket_key_secret, and matrix_backup_bucket_endpoint.
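
For example, the S3-compatible case would reduce to vars like these (a sketch using the proposed variables from the table above; values are placeholders):

matrix_backup_enabled: true
matrix_backup_bucket: "s3://bucketname/matrix"
matrix_backup_bucket_endpoint: "https://nyc3.digitaloceanspaces.com"
matrix_backup_bucket_key_id: "<access-key-id>"
matrix_backup_bucket_key_secret: "<secret-access-key>"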

@beardedlinuxgeek commented:

In the spirit of Synapse being self-hosted, I would prefer to see rsync instead of S3. Also, your backup only covers the database, right? It would be good to include an option for backing up files (i.e. images) as well.

@ptman (Contributor) commented Aug 14, 2020

It would indeed be nice not to need an S3 API endpoint, e.g. by using rsync, restic, or borgbackup.

@hungrymonkey (Contributor, Author) commented Aug 14, 2020

I apologize for not keeping my commits up to date because I have been busy with other things.

@ptman @beardedlinuxgeek I avoided rsync, restic, and borgbackup because everyone has their own backup workflow. Would you kindly document your workflow and your proposed Ansible vars for everyone's benefit? I want to avoid creating something nobody wants because I imagined the wrong user.

@ptman (Contributor) commented Aug 14, 2020

I think there are two parts:

  1. Create a dump (not only sql, include media as well)
  2. Optionally transfer the dump to a remote location (s3, duplicity, restic, borg, rclone, ...)

For 1, I would expect one to create timestamped dumps at a given interval (1d, 1h, whatever) and to clean up old ones (after 7d, 30d, 90d, 1y, or just keep the latest one, since you mostly care about dumping and then transferring them somewhere else).
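
A sketch of part 1 under those expectations, reusing the dump command from earlier in the thread (paths and the 30-day window are assumptions):

# Timestamped dump, then age-based cleanup of older dumps.
ts=$(date +%Y-%m-%d-%H%M)
docker run --rm --network=matrix --env-file=/matrix/postgres/env-postgres-psql \
    postgres:12.1-alpine pg_dumpall -h matrix-postgres \
    | gzip -c > /matrix/backups/postgres-$ts.sql.gz
find /matrix/backups -name 'postgres-*.sql.gz' -mtime +30 -delete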

And yes, this should be a PR, not code in comments.

@Hexalyse commented:

Have all the issues about a backup role been abandoned? I've seen two issues created by @hungrymonkey, but nothing seems to have been merged into the main branch.

What is the proposed best way of doing it? Is a backup of the Postgres database enough to reinstall Matrix from scratch with the playbook, import the database, and access the service with all the discussions again (albeit without the media files, but I don't care about those)?

@hungrymonkey (Contributor, Author) commented:

@Hexalyse The problem is that there isn't a best way to do backups. Unfortunately, all administrators have their own preference, and this playbook needs to be flexible enough to allow different workflows while still being something the maintainer feels comfortable including.

For this playbook, all you need is to back up the database and figure out a method to restore it. The maintainer has added a hard dependency on Postgres, so I assume it will be supported in the future.

@Hexalyse commented:

@hungrymonkey Sure... backup processes are opinionated. But wouldn't it be better to offer one by default, which users can either use or leave disabled while they do their own thing? Especially when you consider how many people don't set up backups until the day something happens and ...

@hungrymonkey (Contributor, Author) commented:

The only tool that could remotely serve as a default is plain rsync, but you still need separate, cheap, reliable storage.

@hungrymonkey (Contributor, Author) commented May 8, 2021

I have been reviewing the S3 counterpart of the container used in matrix-postgres-backup. The S3 container does not seem to be maintained anymore, and its variables leak go-cron variables. I wonder if I should leak them too.

spantaleev added a commit that referenced this issue Apr 19, 2022
ksnieck pushed a commit to ksnieck/matrix-docker-ansible-deploy that referenced this issue Jul 4, 2022
@ptman (Contributor) commented Feb 27, 2023

Borg backup support exists?

@spantaleev (Owner) commented:

I suppose that Borgbackup support is good enough and we can close this!
