Should I create a pr for automated backups? #467

Closed · hungrymonkey opened this issue Apr 23, 2020 · 17 comments

@hungrymonkey (Contributor) commented Apr 23, 2020

I saw the deficiencies section and I wonder if I should try to contribute back.

matrix-synapse-backup.service

[Unit]
Description=Backup service for Matrix Synapse

[Service]
Environment=AWS_BUCKET=s3://<your-aws-bucket>/matrix
Type=oneshot
ExecStartPre=/bin/sh -c 'docker run --rm --network=matrix \
				--env-file=/matrix/postgres/env-postgres-psql \
				postgres:12.1-alpine pg_dumpall -h matrix-postgres | gzip -c > /postgres.sql.gz'
ExecStart=/bin/sh -c 'aws s3 cp /postgres.sql.gz ${AWS_BUCKET}/$$(date +%%m-%%d-%%Y)/ && rm /postgres.sql.gz'
User=root
Group=systemd-journal

matrix-synapse-backup.timer

[Unit]
Description=Backup timer for Matrix Synapse

[Timer]
OnCalendar=Sun,Tue,Thu,Sat 02:00
Persistent=true

[Install]
WantedBy=timers.target
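
For reference, a unit pair like this would be activated by enabling the timer rather than the service; with Persistent=true, a run missed while the machine was off is made up at the next boot. A minimal sketch:

# Reload unit files, then enable and start the timer.
systemctl daemon-reload
systemctl enable --now matrix-synapse-backup.timer
# Confirm the next scheduled run.
systemctl list-timers matrix-synapse-backup.timer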

backup-acl.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<your-bucket>",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "<Restrict-IP>"
                    ]
                }
            }
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket>/matrix/*",
                "arn:aws:s3:::<your-bucket>/matrix"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "<Restrict-IP>"
                }
            }
        }
    ]
}

restore-acl.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<your-bucket>",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "<Restrict-IP>"
                    ]
                }
            }
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket>/matrix/*",
                "arn:aws:s3:::<your-bucket>/matrix"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "<Restrict-IP>"
                }
            }
        }
    ]
}
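
For completeness, a restore might look like the sketch below, assuming AWS_BUCKET is set as in the unit above and reusing the same network, env file, and image (the dump date path is a placeholder):

# Fetch a chosen dump and pipe it into psql in a throwaway container.
aws s3 cp ${AWS_BUCKET}/<date>/postgres.sql.gz /tmp/postgres.sql.gz
gunzip -c /tmp/postgres.sql.gz | docker run --rm -i --network=matrix \
    --env-file=/matrix/postgres/env-postgres-psql \
    postgres:12.1-alpine psql -h matrix-postgres
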
hungrymonkey changed the title from "Should I create a pr for automated backups" to "Should I create a pr for automated backups?" on Apr 23, 2020
@hungrymonkey (Contributor, Author) commented Apr 24, 2020

Hi,

I realize it is not so simple to add systemd timers, because start.yml assumes .service units:

- "ansible_facts.services[item + '.service']|default(none) is none or ansible_facts.services[item + '.service'].state != 'running'"

@spantaleev (Owner) commented:

Separate code could be added for the timer.

We have various cronjobs already, so using cronjobs may be a better approach, instead of introducing one more mechanism.


Otherwise: this code is pretty biased toward uploading stuff to S3, unencrypted.

Where is the aws tool coming from? Ideally, we'd also run that in a container, so that it's portable across all distributions that we support.
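
For illustration, a containerized run could look like the sketch below (an assumption about the eventual setup, not existing playbook code): the official amazon/aws-cli image uses aws as its entrypoint, credentials pass through from the host environment, and /tmp is mounted so the container can read a dump produced on the host.

docker run --rm \
    -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY \
    -v /tmp:/tmp \
    amazon/aws-cli s3 cp /tmp/postgres.sql.gz s3://<your-bucket>/matrix/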


I wonder what we should do about backups, so that we can support more storage providers. Perhaps we should split the "backup creation" and "backup upload" parts.

I'm thinking we should have some matrix-backup role in any case, which would ultimately support multiple ways of backing up the system.

Not sure if there should only be one method to create the backup, though. I'd imagine that some people would love an encrypted tarball containing everything, which could then be copied to any storage provider, while others are fine with just dumping the database periodically and rsync-ing most of /matrix to some other machine they have (or to S3, etc.).

Somewhat larger deployments likely can't afford to frequently make archives of everything (thousands of files in the Synapse media repository, etc.).

@hungrymonkey (Contributor, Author) commented Apr 24, 2020

> I'm thinking we should have some matrix-backup role in any case, which would ultimately support multiple ways of backing up the system.

@spantaleev I just did it as an example. To be honest, there are too many choices for backups: we have duplicity for encrypted backups, rsync for remote copies, and borg to back up everything. I decided to present one choice. I acknowledge that my choice is currently biased towards using an IAM role for passwordless S3 bucket access.

http://duplicity.nongnu.org/

> Where is the aws tool coming from? Ideally, we'd also run that in a container, so that it's portable across all distributions that we support.

Distros already package awscli in their repos, and you can run it in a container or install it directly. The AWS CLI is designed to carry little ideological policy; its command-line interface should remain stable for the foreseeable future.

https://rpmfind.net/linux/rpm2html/search.php?query=aws-cli
https://packages.debian.org/sid/awscli
https://www.archlinux.org/packages/community/any/aws-cli/

> We have various cronjobs already, so using cronjobs may be a better approach, instead of introducing one more mechanism.

Is it OK to create a cron job which executes a systemd service file? In the future, I kinda want to be able to email backup failures.
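
For illustration, the generated /etc/cron.d entry would boil down to a single line (a sketch matching the cron task posted later in this thread); having cron start a oneshot unit keeps logs and failure state in systemd, where an OnFailure= handler could eventually send mail:

# /etc/cron.d/matrix-backup
0 2 * * 1 root systemctl start matrix-backup.service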

@spantaleev (Owner) commented:

It'd still be better if we refrain from installing aws-cli from package managers.

I think it's nice how the playbook currently installs some bare-minimum set of packages (see the matrix-base role) and then runs everything in containers, without messing with your system. Also, there are different versions in each distro's repository: while the base commands have stayed the same, there are things like "storage class" which aren't configurable on old versions, etc. It's better if we always run a new awscli in a container and not have to worry about which version is packaged by which distro, under what name, and so on.

It'd probably be nice if we can support access key/secret authentication to S3, for people who don't host their server on AWS.

And then, maybe people would even like configuring the endpoint, so they can point it at an S3-like alternative, like Digital Ocean Spaces. This should be fairly easy though: just another configurable switch.
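
For an S3-compatible provider, the only change to an upload would be the endpoint flag, e.g. (a sketch; Digital Ocean Spaces shown as the example endpoint):

aws s3 cp /tmp/postgres.sql.gz s3://<your-bucket>/matrix/ \
    --endpoint-url https://nyc3.digitaloceanspaces.com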

@hungrymonkey (Contributor, Author) commented May 3, 2020

Hi,

How do you envision rsync working? I've posted sample documentation below for scrutiny.

matrix-backup.service.j2

[Unit]
Description=Backup service for Matrix Synapse

[Service]
{% if matrix_backup_bucket_key_id %}
Environment=AWS_ACCESS_KEY_ID={{ matrix_backup_bucket_key_id }}
Environment=AWS_SECRET_ACCESS_KEY={{ matrix_backup_bucket_key_secret }}
{% endif %}
{% if matrix_backup_bucket %}
Environment=AWS_BUCKET={{ matrix_backup_bucket }}
{% endif %}

Type=oneshot
ExecStartPre=/bin/sh -c 'docker run --rm --network={{ matrix_docker_network }} \
                --env-file={{ matrix_postgres_base_path }}/env-postgres-server \
                {{ matrix_postgres_docker_image_to_use }} pg_dumpall -h matrix-postgres | gzip -c > /tmp/postgres.sql.gz'
{% if matrix_backup_bucket %}
ExecStart=/bin/sh -c 'docker run --rm \
                -e AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY -v /tmp:/tmp \
                {{ matrix_backup_aws_cli_docker_image_to_use }} \
                s3 cp /tmp/postgres.sql.gz ${AWS_BUCKET}/$$(date +%%m-%%d-%%Y)/ \
{% if matrix_backup_bucket_endpoint %} \
                --endpoint-url {{ matrix_backup_bucket_endpoint }} \
{% endif %} && rm /tmp/postgres.sql.gz'
{% endif %}
{% if matrix_backup_rsync_target %}
ExecStart=/bin/sh -c 'rsync' ??
{% endif %}
User=root
Group=systemd-journal
- name: Creates a matrix synapse backup cron file under /etc/cron.d
  cron:
    name: Matrix Backup Service
    weekday: "1"
    minute: "0"
    hour: "2"
    user: root
    job: "systemctl start matrix-backup.service"
    cron_file: matrix-backup

Setting up Matrix Synapse backups (optional)

This playbook installs a weekly cron backup.

Variable Table

| Variable | Default | Example |
|----------|---------|---------|
| matrix_backup_enabled | false | true |
| matrix_backup_bucket | "" | "s3://bucketname/prefix/" |
| matrix_backup_bucket_endpoint | "" | "https://nyc3.digitaloceanspaces.com" |
| matrix_backup_bucket_awscli_docker_image_latest | "amazon/aws-cli:2.0.10" | "amazon/aws-cli:latest" |
| matrix_backup_bucket_key_id | "" | "AKIAQIOAVK3Q4HMXL272" |
| matrix_backup_bucket_key_secret | "" | "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" |
| matrix_backup_rsync_target | "" | ?? |

Method 1: Rsync

??

Method 2: S3 Compatible object store

Setup: S3 compatible buckets

A list of S3-compatible services: https://en.wikipedia.org/wiki/Amazon_S3#S3_API_and_competing_services

| Service Provider | Costs | Compatibility | Endpoint |
|------------------|-------|---------------|----------|
| AWS S3 | https://aws.amazon.com/s3/pricing/ | N/A | N/A |
| Digital Ocean Spaces | https://www.digitalocean.com/pricing/#Storage | https://developers.digitalocean.com/documentation/spaces/ | https://<region>.digitaloceanspaces.com |
| Azure Blob | https://azure.microsoft.com/en-us/pricing/details/storage/blobs/ | https://cloudblogs.microsoft.com/opensource/2017/11/09/s3cmd-amazon-s3-compatible-apps-azure-storage/ | Requires minio |
| Backblaze B2 | https://www.backblaze.com/b2/cloud-storage-pricing.html | https://www.backblaze.com/b2/docs/s3_compatible_api.html | https://s3.<region>.backblazeb2.com/ |
| Google Cloud Storage | https://cloud.google.com/storage/pricing | https://cloud.google.com/storage/docs/interoperability | https://storage.googleapis.com |
| Wasabi | https://wasabi.com/s3-compatible-cloud-storage/ | https://wasabi-support.zendesk.com/hc/en-us/articles/115001910791-How-do-I-use-AWS-CLI-with-Wasabi- | https://s3.wasabisys.com |
| IBM Cloud Object Storage | https://cloud.ibm.com/catalog/services/cloud-object-storage | https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-aws-cli | s3.<region>.cloud-object-storage.appdomain.cloud |
| Linode Object Storage | https://www.linode.com/pricing/#row--storage | https://www.linode.com/docs/platform/object-storage/bucket-versioning/ | http://<region>.linodeobjects.com |
| DreamHost | https://www.dreamhost.com/cloud/storage/ | https://help.dreamhost.com/hc/en-us/articles/360022654971-AWS-CLI-commands-to-manage-your-DreamObjects-data | https://objects-us-east-1.dream.io |

Preparation

1. Select an S3-compatible provider.
2. Create an S3 bucket.
3. Create a specialized IAM user with the permissions recorded below; a sketch of the IAM commands follows the two policy files. Users who deployed their Postgres instance on AWS EC2 can attach an IAM role instead, for passwordless S3 access.

Backup-acl.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<your-bucket>",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "<Restrict-IP>"
                    ]
                }
            }
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket>/matrix/*",
                "arn:aws:s3:::<your-bucket>/matrix"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "<Restrict-IP>"
                }
            }
        }
    ]
}

Restore-acl.json

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::<your-bucket>",
            "Condition": {
                "ForAnyValue:IpAddress": {
                    "aws:SourceIp": [
                        "<Restrict-IP>"
                    ]
                }
            }
        },
        {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:AbortMultipartUpload"
            ],
            "Resource": [
                "arn:aws:s3:::<your-bucket>/matrix/*",
                "arn:aws:s3:::<your-bucket>/matrix"
            ],
            "Condition": {
                "IpAddress": {
                    "aws:SourceIp": "<Restrict-IP>"
                }
            }
        }
    ]
}
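
One way to wire these policies to a dedicated user, sketched with standard AWS CLI IAM commands (user and policy names are placeholders):

# Create the backup policy from the JSON above and attach it to a new user.
aws iam create-policy --policy-name matrix-backup --policy-document file://backup-acl.json
aws iam create-user --user-name matrix-backup
aws iam attach-user-policy --user-name matrix-backup \
    --policy-arn arn:aws:iam::<account-id>:policy/matrix-backup
# Generate the key pair used as matrix_backup_bucket_key_id/_secret.
aws iam create-access-key --user-name matrix-backup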

Deploy Matrix S3 Backup

Using AWS IAM Role

Set matrix_backup_enabled and matrix_backup_bucket.

Using AWS IAM User

Set matrix_backup_enabled, matrix_backup_bucket, matrix_backup_bucket_key_id, and matrix_backup_bucket_key_secret.

S3 Compatible Services

Set matrix_backup_enabled, matrix_backup_bucket, matrix_backup_bucket_key_id, matrix_backup_bucket_key_secret, and matrix_backup_bucket_endpoint.
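
For example, the S3-compatible case would reduce to vars like these (a sketch using the proposed variables from the table above; values are placeholders):

matrix_backup_enabled: true
matrix_backup_bucket: "s3://bucketname/matrix"
matrix_backup_bucket_endpoint: "https://nyc3.digitaloceanspaces.com"
matrix_backup_bucket_key_id: "<access-key-id>"
matrix_backup_bucket_key_secret: "<secret-access-key>"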

@beardedlinuxgeek commented:

In the spirit of Synapse being self-hosted, I would prefer to see rsync instead of S3. Also, your backup only covers the database, right? It would be good to include an option for backing up files (i.e. images) as well.

@ptman (Contributor) commented Aug 14, 2020

It would indeed be nice not to need an S3 API endpoint, e.g. by using rsync, restic, or borgbackup.

@hungrymonkey (Contributor, Author) commented Aug 14, 2020

I apologize for not keeping my commits up to date because I have been busy with other things.

@ptman @beardedlinuxgeek I avoided rsync, restic, and borgbackup because everyone has their own backup workflow. Would you kindly document your workflow and your proposed Ansible vars for everyone's benefit? I want to avoid creating something nobody wants because I imagined the wrong user.

@ptman (Contributor) commented Aug 14, 2020

I think there are two parts:

  1. Create a dump (not only sql, include media as well)
  2. Optionally transfer the dump to a remote location (s3, duplicity, restic, borg, rclone, ...)

For 1, I would expect one to create timestamped dumps at a given interval (1d, 1h, whatever) and to clean up old ones (after 7d, 30d, 90d, 1y, or just keep the latest one, since you mostly care about dumping and then transferring them somewhere else).
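
A sketch of part 1 under those expectations, reusing the dump command from earlier in the thread (paths and the 30-day window are assumptions):

# Timestamped dump, then age-based cleanup of older dumps.
ts=$(date +%Y-%m-%d-%H%M)
docker run --rm --network=matrix --env-file=/matrix/postgres/env-postgres-psql \
    postgres:12.1-alpine pg_dumpall -h matrix-postgres \
    | gzip -c > /matrix/backups/postgres-$ts.sql.gz
find /matrix/backups -name 'postgres-*.sql.gz' -mtime +30 -delete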

And yes, this should be a PR, not code in comments.

@Hexalyse commented:

Have all the issues about a backup role been abandoned? I've seen two issues created by @hungrymonkey, but nothing seems to have been merged into the main branch.

What is the proposed best way of doing it? Is a backup of the Postgres database enough to reinstall Matrix from scratch with the playbook, import the database, and access the service with all the discussions again (albeit without the media files, but I don't care about those)?

@hungrymonkey (Contributor, Author) commented:

@Hexalyse The problem is that there isn't a best way to do backups. Unfortunately, all administrators have their own preference, and this playbook needs to be flexible enough to allow different workflows while still being something the maintainer feels comfortable including.

For this playbook, all you need is to back up the database and figure out a method to restore it. The maintainer has added a hard dependency on Postgres, so I assume it will be supported in the future.

@Hexalyse commented:

@hungrymonkey Sure... backup processes are opinionated. But wouldn't it be better to offer one by default, which users can either use or leave disabled while they do their own thing? Especially when you consider how many people don't set up backups until the day something happens and ...

@hungrymonkey (Contributor, Author) commented:

The only tool that could remotely serve as a default is plain rsync, but you still need separate, cheap, reliable storage.

@hungrymonkey (Contributor, Author) commented May 8, 2021

I have been reviewing the S3 counterpart of the container used in matrix-postgres-backup. The S3 container does not seem to be maintained anymore, and its variables leak go-cron variables. I wonder if I should leak them too.

spantaleev added a commit that referenced this issue Apr 19, 2022
ksnieck pushed a commit to ksnieck/matrix-docker-ansible-deploy that referenced this issue Jul 4, 2022
@ptman (Contributor) commented Feb 27, 2023

Borg backup support exists?

@spantaleev (Owner) commented:

I suppose that Borgbackup support is good enough and we can close this!
