Mounting ZFS snapshot from S3 #32

Open
ad-m opened this issue Oct 7, 2019 · 1 comment

ad-m commented Oct 7, 2019

Once a ZFS snapshot has been taken, it is never modified again. Snapshots are a set of blocks that, once written, can no longer be changed.

Object storage is optimized for data that is "write-once, read-many, delete-eventually". Snapshots fully meet these assumptions.

In the case of Amazon S3, reading data can be comparably fast to reading from disks, which are themselves attached over the network internally anyway.

Have you considered implementing ZFS snapshot mounting directly from S3 to enable previewing backed-up data? If necessary, a local cache for some blocks could be considered. A similar approach is used by TrailDB ( http://tech.nextroll.com/blog/data/2016/11/29/traildb-mmap-s3.html ) and MezzFS ( https://medium.com/netflix-techblog/mezzfs-mounting-object-storage-in-netflixs-media-processing-platform-cda01c446ba ), with very interesting results.
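
To make the idea concrete, here is a minimal sketch of such lazy reading, assuming boto3 and purely illustrative bucket/key names and block size: only the byte ranges that are actually accessed are fetched from S3, and a small local cache keeps recently used blocks.

```python
# Sketch only: fetch S3 data lazily, one block at a time, with a small local
# cache. The bucket/key names and the 128 KiB block size are illustrative.
from functools import lru_cache

import boto3

BLOCK_SIZE = 128 * 1024
s3 = boto3.client("s3")


@lru_cache(maxsize=256)  # keep recently touched blocks locally
def read_block(bucket: str, key: str, index: int) -> bytes:
    start = index * BLOCK_SIZE
    end = start + BLOCK_SIZE - 1
    resp = s3.get_object(Bucket=bucket, Key=key, Range=f"bytes={start}-{end}")
    return resp["Body"].read()


# Only the blocks that are actually accessed are ever downloaded.
data = read_block("my-backup-bucket", "pool/dataset@snap-2019-10-07", 3)
```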

It is worth noting that AWS EBS volume snapshots are mounted online, and the necessary blocks are downloaded locally only when they are accessed ( https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html ). Implementing the ability to read backups in this way would open up similar possibilities.

I realize that it will not be possible to read blocks if the backup tool applies its own compression. However, I think that the potentially unlimited capacity of snapshots and direct access to them offset this limitation, especially since ZFS can provide compression and block-level encryption on its own.

ad-m commented Nov 11, 2020

Interested in this issue, I experimented with creating a ZFS filesystem on top of a block device provided by s3backer. This lets me store snapshots on S3 while still being able to consolidate the snapshots held there and delete them in any order. s3backer frees S3 space for blocks that have been TRIMmed and supports thin provisioning (you can create a 1 TB block device, and the space consumed on S3 is only the space actually used by ZFS).
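
For reference, here is a rough sketch of how such a setup can be put together; the bucket name, mount point, device size, block size and pool name are just examples, and it relies on s3backer exposing the virtual block device as a single file inside its FUSE mount point, which ZFS can use as a file-based vdev.

```python
# Sketch only: orchestrating an s3backer + ZFS experiment from Python.
# All names and sizes below are illustrative assumptions.
import subprocess

# Expose an S3 bucket as a 1 TB virtual block device. The device is thin
# provisioned: only blocks that are actually written consume S3 space.
subprocess.run([
    "s3backer",
    "--blockSize=128k",    # one S3 object per 128 KiB block
    "--size=1t",
    "my-backup-bucket",    # illustrative bucket name
    "/mnt/s3backer",
], check=True)

# Build a ZFS pool on the file that s3backer exposes; compression and
# encryption can then be handled by ZFS itself.
subprocess.run(["zpool", "create", "s3pool", "/mnt/s3backer/file"], check=True)
subprocess.run(["zfs", "set", "compression=lz4", "s3pool"], check=True)
```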

I see a risk to the consistency of the data written to the device, which follows from the very nature of s3backer. I note that there is a local cache to mitigate read-after-write consistency problems, and the backup workload has clearly separated read and write phases:

  • writes happen when a new snapshot is stored,
  • reads happen when a snapshot is restored (usually several hours or weeks after writing).

I can imagine introducing limits on how often the different operations may be performed, to guarantee data safety. In many cases the restore operation is latency tolerant (it may start 5-10 minutes later), especially when that delay is weighed against the benefits.
