
Add backup support for storage buckets #241

Closed
stgraber opened this issue Nov 27, 2023 · 6 comments
Labels: API (Changes to the REST API), Documentation (Documentation needs updating), Easy (Good for new contributors), Feature (New feature, not a bug)

@stgraber (Member)

Storage buckets in Incus provide a basic S3 API served by either MinIO or the Ceph Object Gateway.

The API itself doesn't support anything regarding snapshots or backups, leaving users to deal with that on their own with tools like rclone.

While lightweight snapshots aren't really an option we can consistently offer, offering the option to back up a bucket should be fine. This would then add support for the usual export/import combination of commands.
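
As a rough sketch of the user-facing side (the command names below are an assumption, modeled on the existing incus storage volume export/import commands):

incus storage bucket export default my-bucket ./my-bucket-backup.tar.gz
incus storage bucket import default ./my-bucket-backup.tar.gz my-bucket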

stgraber added the Feature (New feature, not a bug) and Easy (Good for new contributors) labels on Nov 27, 2023
@stgraber (Member, Author)

For this one, I'm thinking:

  • Add a /1.0/storage-pools/POOL/buckets/BUCKET/backups API similar to that provided for storage volumes
  • A bucket backup is just a tarball of the bucket's content + metadata (bucket config/keys)
  • On import, we create the bucket, apply the config/keys from the metadata and then upload all the files from the backup

For MinIO, we may be able to use the shortcut of directly peeking at the filesystem to avoid having to use S3 to download/upload everything, but for Ceph, I suspect we'll have to go through S3 directly.
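
For reference, a rough sketch of the endpoints this could add, mirroring the existing storage volume backup API (the exact paths are an assumption, not a final design):

GET    /1.0/storage-pools/POOL/buckets/BUCKET/backups
POST   /1.0/storage-pools/POOL/buckets/BUCKET/backups
GET    /1.0/storage-pools/POOL/buckets/BUCKET/backups/BACKUP
DELETE /1.0/storage-pools/POOL/buckets/BUCKET/backups/BACKUP
GET    /1.0/storage-pools/POOL/buckets/BUCKET/backups/BACKUP/export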

stgraber added the Documentation (Documentation needs updating) and API (Changes to the REST API) labels on Nov 28, 2023
@maveonair (Member) commented Dec 22, 2023

I am currently working on this feature to export storage buckets and have a question about the MinIO metadata.

As it turns out, MinIO stores its metadata in .minio.sys in the bucket directory of the pool, which also seems to contain the access keys. This directory would therefore be included when creating a backup of the storage bucket:

.minio.sys:

[screenshot: contents of the .minio.sys directory]

storage bucket keys:

[screenshot: the bucket's access keys]

I now assume that we can reuse the same logic to activate the bucket and extract the keys into our database as we do today during the recovery import, namely by using:

// ImportBucket takes an existing bucket on the storage backend and ensures that the DB records
// are restored as needed to make it operational with Incus.
// Used during the recovery import stage.
func (b *backend) ImportBucket(projectName string, poolVol *backupConfig.Config, op *operations.Operation) (revert.Hook, error) {

@stgraber would that be ok from your point of view or do you see a problem if a bucket is imported and the "original" metadata of MinIO is still included?

@stgraber (Member, Author)

I'd prefer not to depend on MinIO's own storage for this stuff.

Instead what I had in mind is this kind of layout for the tarball:

  • /
  • /backup.yaml
  • /bucket/FILES

With backup.yaml containing a copy of both the StorageBucket struct and the list of StorageBucketKeys from the database.

That way the backup is in no way MinIO-specific, making it possible to import it into a Ceph pool as well as allowing us to move to something other than MinIO in the future should we want to.
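
A minimal sketch of how that metadata could be represented on the Go side, assuming the existing api.StorageBucket and api.StorageBucketKey structs are serialized into backup.yaml as-is (the wrapper struct, its field names and the import path are illustrative, not the actual implementation):

package backup

import (
	"github.com/lxc/incus/shared/api"
)

// BucketBackupConfig is an illustrative shape for the backup.yaml written at
// the root of the tarball, next to the bucket/ directory holding the objects.
type BucketBackupConfig struct {
	Bucket *api.StorageBucket      `yaml:"bucket"`
	Keys   []*api.StorageBucketKey `yaml:"keys"`
}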

@stgraber (Member, Author)

That means that on restore, we'd create the DB records from the data in backup.yaml, initialize a new MinIO or Ceph-object instance, add in the keys from the DB records and then push the data back into the bucket.

@maveonair (Member)


Thank you very much for your advice.

At the moment I am using interface.BackupVolume(vol Volume, tarWriter *instancewriter.InstanceTarWriter, optimized bool, snapshots []string, op *operations.Operation) error to include the actual storage bucket content, which is the same mechanism used by the instance and storage volume backup implementations. As I understand it, everything within the specified volume is simply copied into the tarball.

So to achieve the layout you mentioned, can I not reuse interface.BackupVolume? It would be great if you could give me a hint, so that I don't end up re-implementing something that already exists.

Thanks in advance for your help.

@stgraber (Member, Author)

We're going to want two new functions in backend.go:

  • BackupBucket
  • CreateBucketFromBackup

I don't think we'll need new functions in the individual storage drivers as we should be able to do things at a higher level.

So basically, BackupBucket would use an S3 client to pull in all the data and write it into a tarball, and CreateBucketFromBackup would do the opposite: hit CreateBucket, put the keys in place, then use an S3 client to push all the data from the backup back into place.

This approach will work fine with both MinIO and Ceph, as the common thing they both provide is the S3 API.
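
To make the BackupBucket half more concrete, here is a minimal sketch of streaming a bucket's objects into a tarball purely over the S3 API using the minio-go client, which works against both MinIO and the Ceph Object Gateway. The helper name and the surrounding wiring are assumptions, not the actual Incus implementation:

package storage

import (
	"archive/tar"
	"context"
	"io"

	"github.com/minio/minio-go/v7"
)

// backupBucketObjects copies every object from the bucket into the tarball
// under bucket/, matching the layout discussed above.
func backupBucketObjects(ctx context.Context, client *minio.Client, bucketName string, tw *tar.Writer) error {
	for obj := range client.ListObjects(ctx, bucketName, minio.ListObjectsOptions{Recursive: true}) {
		if obj.Err != nil {
			return obj.Err
		}

		reader, err := client.GetObject(ctx, bucketName, obj.Key, minio.GetObjectOptions{})
		if err != nil {
			return err
		}

		err = tw.WriteHeader(&tar.Header{
			Name: "bucket/" + obj.Key,
			Mode: 0600,
			Size: obj.Size,
		})
		if err != nil {
			reader.Close()
			return err
		}

		_, err = io.Copy(tw, reader)
		reader.Close()
		if err != nil {
			return err
		}
	}

	return nil
}

CreateBucketFromBackup would then walk the tarball the other way: recreate the bucket and keys from backup.yaml, then feed each bucket/ entry from the tar reader to the S3 client's PutObject.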
