backer

wip distributed backup / file mirroring tool

Current state: Waiting for cyphernet as base building block.

Idea

Mirror your files between all your computers and servers, so when a hard drive breaks or a computer gets stolen you don't lose anything.

Like BitTorrent Sync, but open source and written in Node.js, so it runs on SmartOS too.

Like Dropbox, but your data is stored only on your own machines. No third party is involved, except possibly your internet providers.

Road map

  • ✘ merkle tree representation of sync folder (#7)
  • ✘ diff merkle trees to figure out changed files
  • ✘ simple but slow syncing
  • ✘ only send diffs for more performance
  • ✘ watch files and keep tree up to date
  • ✘ replicate tree changes
  • ✘ simple responsive web frontend
  • ✘ mac os menu bar application
  • ✘ installer
  • ✘ public facing website

Associated projects

Possible features

  • revisions
  • public sharing
  • relays (#6)

Data structure

The filesystem is represented as a merkle tree, so backer can efficiently figure out what files changed.
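
As a rough illustration of the idea, the sketch below builds such a tree with Node's built-in modules. The `merkle` helper, the node shape `{ name, type, hash, children }` and the choice of SHA-256 are assumptions made for this example, not backer's actual implementation.

```javascript
var crypto = require('crypto');
var fs = require('fs');
var path = require('path');

// Hash a buffer or string with SHA-256 (the hash function is an assumption).
function sha256(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Hypothetical helper: build a merkle tree node for the path `p`.
// A file's hash covers its contents; a directory's hash covers the
// names and hashes of its children, so any change below it bubbles up.
function merkle(p) {
  var name = path.basename(p);
  if (!fs.statSync(p).isDirectory()) {
    return { name: name, type: 'file', hash: sha256(fs.readFileSync(p)) };
  }
  // Sort entries so the same directory always yields the same hash.
  var children = fs.readdirSync(p).sort().map(function (entry) {
    return merkle(path.join(p, entry));
  });
  var summary = children.map(function (c) {
    return c.name + ':' + c.hash;
  }).join('\n');
  // The 'dir:' prefix keeps an empty directory distinct from an empty file.
  return { name: name, type: 'dir', hash: sha256('dir:' + summary), children: children };
}

// Usage (not run here): var tree = merkle('/path/to/sync/folder');
```

Reading every file on each run is what makes this version slow; a real implementation would probably cache hashes and only rehash what changed.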

Replication

On connection

When two or more nodes start replicating (i.e. mirroring, syncing) they need to figure out how their files differ, so they can exchange only what they really have to.

The most efficient way to do this - as far as I know - is to create a merkle tree (also known as hash tree) representing all the files and directories in your sync folder.

Since the file system is a tree this is a perfect fit!

When replication is kicked off, both nodes exchange their top hash, the hash of the root node. If that differs, they start going down the tree (i.e. going deeper into directories) and exchange hashes until they know which files are different and which are the same.

Then the individual files are diffed and those partial changes are exchanged and applied.
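
The descent described above can be sketched as a recursive comparison. To keep it short, both trees live in memory here, whereas real replication would exchange the hashes over the network. `diffTrees`, `byName` and the node shape `{ name, hash, children }` are hypothetical names chosen for this example.

```javascript
// Find the child node called `name`, or undefined if there is none.
function byName(nodes, name) {
  for (var i = 0; i < nodes.length; i++) {
    if (nodes[i].name === name) return nodes[i];
  }
}

// Compare two merkle tree nodes and return the paths of files that
// differ (changed, added or removed). Either node may be undefined
// when an entry exists on one side only. Files have no `children`.
function diffTrees(a, b, prefix) {
  prefix = prefix || '';
  // Equal hashes mean the whole subtree is identical: stop descending.
  if (a && b && a.hash === b.hash) return [];

  var changed = [];
  // A file on either side at this path counts as a differing file.
  if ((a && !a.children) || (b && !b.children)) changed.push(prefix);

  var aKids = (a && a.children) || [];
  var bKids = (b && b.children) || [];
  var names = Object.create(null);
  aKids.concat(bKids).forEach(function (c) { names[c.name] = true; });

  Object.keys(names).sort().forEach(function (name) {
    var childPath = prefix ? prefix + '/' + name : name;
    changed = changed.concat(
      diffTrees(byName(aKids, name), byName(bKids, name), childPath)
    );
  });
  return changed;
}
```

Only the paths returned here would then need a file-level diff; everything under a matching hash is never visited.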

On further changes

TODO

Encryption

Since backer is only used for transferring data and never stores it on any server/computer but the user's, only the transfer needs to be encrypted.

Encrypting your own hard drive is out of scope and there are plenty of tools that do this, one popular choice being TrueCrypt.

  • Maybe public/private key cryptography?
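
The scheme is still undecided. Purely as an illustration of encrypting data in transit, the sketch below seals each chunk with AES-256-GCM from Node's `crypto` module. Note that it uses a pre-shared symmetric key rather than the public/private keys floated above, and that `seal`, `unseal` and the `iv | tag | ciphertext` layout are assumptions for this example only.

```javascript
var crypto = require('crypto');

// Encrypt one chunk under a 32-byte pre-shared key (an assumption; how
// the key would be agreed on is left open). Output: iv | tag | ciphertext.
function seal(key, plaintext) {
  var iv = crypto.randomBytes(12); // fresh nonce for every chunk
  var cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  var body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]);
}

// Decrypt a chunk produced by seal(). Throws if the data was tampered
// with or the key is wrong, because the GCM auth tag will not verify.
function unseal(key, sealed) {
  var iv = sealed.subarray(0, 12);
  var tag = sealed.subarray(12, 28);
  var decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(sealed.subarray(28)), decipher.final()]);
}
```

Piping the replication stream through an existing encrypted transport such as TLS might turn out to be simpler than encrypting chunks by hand.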

JS api

For maximum composability the transport part of the API will just be a duplex stream that is to be piped to another backer's duplex stream. This way it works over TCP, WebSockets, in memory, or over any other network or streaming interface.

var backer = require('backer');

// two backer instances replicating in memory
var a = backer(__dirname + '/a').createStream();
var b = backer(__dirname + '/b').createStream();
a.pipe(b).pipe(a);

Over TCP it would look like this:

// computer A: accepts incoming connections
var net = require('net');
var backer = require('backer');

var back = backer(__dirname);
net.createServer(function(con) {
  con.pipe(back.createStream()).pipe(con);
}).listen(PORT);

// computer B: dials out to computer A and reconnects when the link drops
// (PORT and HOST are placeholders for computer A's address)
var backer = require('backer');
var reconnect = require('reconnect');

var back = backer(__dirname);
reconnect(function(con) {
  con.pipe(back.createStream()).pipe(con);
}).connect(PORT, HOST);

There will be events emitted on the backer instance, which can then for example be fed to a web frontend.

Resources

Possible dependencies

Collaborators

There's a lot still to be figured out, so if you either are a mad scientist who juggles chainsaws while writing distributed systems, or want to become one, this is the right place for you!

Plus, there will be a need for designs, blog posts, a public facing website, etc.!

License

MIT