Software defined distributed storage array with custom replication policies and strong emphasis on integrity and encryption



Software defined distributed storage array with custom replication policies and strong emphasis on integrity and encryption.

See screenshots to get a better picture.

Status: currently under heavy development. It already works robustly enough that I'm moving all my own files in (blobs currently cannot be deleted, so as long as the metadata DB is properly backed up, you can't lose data), but I wouldn't yet recommend it for anybody else.

Architecture

Ideas / goals

  • "RAID is not backup", so you would need backup on top of RAID anyway. But what if we designed for backup first and used the redundant backup storage as the primary source of truth?
  • Varasto works like GitHub: your directories are like GitHub repos (we call them collections), but Varasto makes automatic commits (= backup interval) against them. If you accidentally delete a file, you will find it in a previous collection revision. You can "clone" the collections you want to work on to your computer, and when you stop working on them you can tell Varasto to delete the local copy. The Varasto client will ensure that the Varasto server has the latest state before removing it. This way your end devices can remain almost stateless. Store only the things you are currently working on!
  • You don't need to clone collections if all you want to do is view files (such as look at photo albums, listen to music or watch movies) - Varasto server supports streaming too.
  • Works on Linux and Windows (mostly due to Go's awesomeness)
  • Integrity is the most important thing. Hashes are verified on writing to disk and on reading from disk.
  • Unified view of all of your data - never again have to remember which disk a particular thing was stored on! Got 200 terabytes of data spread across tens of disks? No problem!
  • Decoupling metadata from file content. You can move/rename files and folders and modify their metadata "offline", i.e. without touching the disk the actual file content is hosted on.
  • Configurable encryption. Each collection could have a separate encryption key, which is itself asymmetrically encrypted with your personal key, which never leaves your hardware security module. This way, if a hacker MITMs the connection or otherwise learns a collection-specific decryption key, she can't access your other collections. Particularly sensitive collections could even have a separate encryption key for each file.
  • Related to previous point, we should investigate doing as much as possible in the client or the browser, so perhaps the decryption keys don't even have to be known by the server.
  • Configurable replication policies per collection. Your family photo albums could be spread on 2 local disks and 1 AWS S3 bucket, while a movie you ripped from a Blu-ray could be only on one disk because in the event of a disk crash, it could be easily recreated.
  • Accesses your files by using platform-specific snapshotting (LVM on Linux, shadow copies on Windows)
  • Kind of like Git or Mercurial but for all of your data, and meant to store all of your data in collections (modeled as directories). Version control-like semantics for collection history, but "commits" are scheduled instead of explicit. This is meant to back up all your data and backups are useless unless they are automated.
  • By not operating on (lower) block device level we don't need the complexity of RAID or specialized filesystems like ZFS etc. We can use commodity hardware and any operating system to reach the desired goals of integrity and availability. If your hard drive ever crashes, would you like to try the recovery with striped RAID / parity bits on a specialized filesystem, or just a regular NTFS or EXT4?

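The per-collection replication policies mentioned in the list above could be modeled along these lines. This is only a sketch under assumptions: the `ReplicationPolicy` type, the `missingReplicas` helper and the volume names are hypothetical and do not come from the Varasto codebase.

```go
package main

import (
	"fmt"
	"sort"
)

// ReplicationPolicy names the volumes that every blob of a collection
// should be stored on. (Hypothetical type for this sketch.)
type ReplicationPolicy struct {
	Name    string
	Volumes []string
}

// missingReplicas returns, in sorted order, the volumes required by the
// policy that do not yet hold a copy of the blob.
func missingReplicas(p ReplicationPolicy, present []string) []string {
	have := make(map[string]bool, len(present))
	for _, v := range present {
		have[v] = true
	}
	missing := []string{}
	for _, v := range p.Volumes {
		if !have[v] {
			missing = append(missing, v)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// Irreplaceable data: two local disks plus one cloud bucket.
	photos := ReplicationPolicy{Name: "family-photos", Volumes: []string{"disk-a", "disk-b", "s3-bucket"}}
	// Easily recreated data: a single disk is enough.
	movies := ReplicationPolicy{Name: "ripped-movies", Volumes: []string{"disk-a"}}

	// Suppose a freshly uploaded blob exists only on disk-a so far.
	present := []string{"disk-a"}
	fmt.Println(photos.Name, "still needs copies on:", missingReplicas(photos, present))
	fmt.Println(movies.Name, "still needs copies on:", missingReplicas(movies, present))
}
```

A background job could call a function like this for each blob and copy data until the returned list is empty.
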
Inspired by & alternative software
