gboudreau edited this page Jan 3, 2012 · 7 revisions

How Greyhole Works

Basics

  • The user defines their storage pool by listing the paths of all the disks they want to use.
  • When files are added to the Landing Zones of Greyhole shares, the Greyhole daemon moves those files into one or more of the paths defined as the storage pool. It then replaces the original file (on the share) with a symbolic link pointing to one of the copies created in the pool.
  • Samba is configured to treat symbolic links as opaque, i.e. files that are in fact symbolic links in the shares appear as normal files to clients.

In Detail

  • Samba logs all writes, renames & deletes happening in Greyhole shares, using the Greyhole VFS module. /var/spool/greyhole is where the logged operations are kept: the VFS module creates a new file in that folder for each logged operation.
  • The Greyhole daemon consumes those logs, and inserts tasks in a custom database (MySQL).
  • The Greyhole daemon acts on each task in sequential order:
    • For writes, it checks the (user-specified) number of copies to keep, and creates that many copies of the file on different disks in the storage pool. Once it has created the first copy, it replaces the original file with a symbolic link pointing to that copy.
    • For renames & deletes, it replicates the operation on all directories that are part of the storage pool.
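
The spool-to-task flow described above can be sketched in a few lines of Python. This is only an illustration, not Greyhole's actual code: the log file format (one "action share path" line per file), the function names import_spool and run_tasks, and the in-memory list standing in for the MySQL tasks table are all assumptions made for the example.

```python
import os
import tempfile

def import_spool(spool_dir, tasks):
    """Move every logged operation from the spool directory into the task list."""
    # Sorting by file name stands in for "oldest first" (assumption).
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        with open(path) as f:
            # Hypothetical log format: a single "action share path" line.
            action, share, rel_path = f.read().strip().split(" ", 2)
        tasks.append({"action": action, "share": share, "path": rel_path})
        os.remove(path)  # a log file is removed once it has been imported

def run_tasks(tasks, handlers):
    """Execute the queued tasks strictly in the order they were inserted."""
    done = []
    while tasks:
        task = tasks.pop(0)
        handlers[task["action"]](task)
        done.append((task["action"], task["path"]))
    return done

def demo():
    spool = tempfile.mkdtemp()  # stands in for /var/spool/greyhole
    logged = [("0001", "write Backups file1"),
              ("0002", "write Backups file2"),
              ("0003", "unlink Backups file1")]
    for name, line in logged:
        with open(os.path.join(spool, name), "w") as f:
            f.write(line + "\n")
    tasks = []  # stands in for the MySQL tasks table
    import_spool(spool, tasks)
    handlers = {"write": lambda t: None, "unlink": lambda t: None}
    return run_tasks(tasks, handlers), os.listdir(spool)
```

Calling demo() queues three operations, runs them in order, and leaves the spool directory empty. Keeping the import step separate from the execution step mirrors the split between the log files and the task table described above.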

Example

What better way to understand a system than a good example? Assume a server with two disks:

  • /mnt/hdd0: 1TB empty hard-drive
  • /mnt/hdd1: 2TB empty hard-drive

smb.conf:

[Backups]
    path = /shares/Backups
    dfree command = /usr/bin/greyhole-dfree
    vfs objects = greyhole

[RecordedTV]
    path = /shares/RecordedTV
    dfree command = /usr/bin/greyhole-dfree
    vfs objects = greyhole

greyhole.conf:

# Storage pool directories
storage_pool_directory = /mnt/hdd0/gh, min_free: 50gb
storage_pool_directory = /mnt/hdd1/gh, min_free: 10gb

# Shares
num_copies[Backups] = 2
num_copies[RecordedTV] = 1
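
To make the meaning of these settings concrete, here is a minimal Python sketch that reads the two kinds of lines shown above. It is not Greyhole's own parser: the function name parse_greyhole_conf is hypothetical, and the sketch assumes that only these two directives appear and that min_free is always given in gb, as in this example.

```python
import re

SAMPLE = """\
# Storage pool directories
storage_pool_directory = /mnt/hdd0/gh, min_free: 50gb
storage_pool_directory = /mnt/hdd1/gh, min_free: 10gb

# Shares
num_copies[Backups] = 2
num_copies[RecordedTV] = 1
"""

def parse_greyhole_conf(text):
    """Return (pool, copies) parsed from the directives shown in the example."""
    pool = {}    # pool directory -> min_free in GB
    copies = {}  # share name -> number of copies to keep
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "storage_pool_directory":
            # Assumes the "<path>, min_free: <n>gb" form used above.
            m = re.fullmatch(r"(\S+),\s*min_free:\s*(\d+)gb", value)
            if m:
                pool[m.group(1)] = int(m.group(2))
        else:
            m = re.fullmatch(r"num_copies\[(.+)\]", key)
            if m:
                copies[m.group(1)] = int(value)
    return pool, copies
```

With the sample above, the Backups share is kept in two copies (one per disk) and RecordedTV in a single copy.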

Example tasks:

When a remote computer (client) creates a file named file1 on the Backups share, for example:

mount_smb //server/Backups /mnt/Backups
echo 1 > /mnt/Backups/file1

Then the server running Greyhole will do this:

  1. (Samba) writes the file into /shares/Backups/file1 (as defined in smb.conf)
  2. (Samba VFS module) writes a log file in /var/spool/greyhole/; the file name will be a series of numbers
  3. (Greyhole daemon) reads the /var/spool/greyhole/ directory, and imports each file it finds into its MySQL database (tasks table); it then removes the files it has processed from /var/spool/greyhole/
  4. (Greyhole daemon) reads the tasks MySQL table, and executes each task in order. For the above Backups/file1 example, that would mean:
cp /shares/Backups/file1 /mnt/hdd1/gh/Backups/file1
rm /shares/Backups/file1
ln -s /mnt/hdd1/gh/Backups/file1 /shares/Backups/file1
cp /mnt/hdd1/gh/Backups/file1 /mnt/hdd0/gh/Backups/file1
  5. (Greyhole daemon) writes metadata in /mnt/hdd[0-1]/gh/.gh_metastore/Backups/file1
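
The cp / rm / ln -s / cp sequence from step 4 can be reproduced with the Python sketch below, using temporary directories in place of /shares and /mnt. The function name store_file is hypothetical, and taking the first num_copies pool directories in the order given is an assumption; how Greyhole actually picks disks is not covered here. The metadata step is left out because its format is not described on this page.

```python
import os
import shutil
import tempfile

def store_file(landing_zone, pool_dirs, share, name, num_copies):
    """Copy a landed file into the pool, then replace it with a symlink."""
    source = os.path.join(landing_zone, name)
    # Assumption: use the first num_copies pool directories, in order.
    targets = [os.path.join(d, share, name) for d in pool_dirs[:num_copies]]

    # First copy, then swap the original file for a symlink to that copy.
    os.makedirs(os.path.dirname(targets[0]), exist_ok=True)
    shutil.copy2(source, targets[0])
    os.remove(source)
    os.symlink(targets[0], source)

    # The remaining copies are made from the first copy.
    for target in targets[1:]:
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(targets[0], target)
    return targets

def demo():
    root = tempfile.mkdtemp()
    landing = os.path.join(root, "shares", "Backups")
    pool = [os.path.join(root, "hdd1", "gh"), os.path.join(root, "hdd0", "gh")]
    os.makedirs(landing)
    with open(os.path.join(landing, "file1"), "w") as f:
        f.write("1\n")
    targets = store_file(landing, pool, "Backups", "file1", num_copies=2)
    link = os.path.join(landing, "file1")
    with open(link) as f:
        through_link = f.read()  # reading through the symlink
    return os.path.islink(link), through_link, [os.path.isfile(t) for t in targets]
```

After demo() runs, the file in the landing zone is a symbolic link, reading it still returns the original content, and both pool directories hold a regular copy.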

Here are more examples of how the Greyhole daemon would execute rename or delete tasks:

Client: mv /mnt/Backups/file1 /mnt/Backups/file2

Server:

mv /mnt/hdd1/gh/Backups/file1 /mnt/hdd1/gh/Backups/file2
mv /mnt/hdd0/gh/Backups/file1 /mnt/hdd0/gh/Backups/file2
# Update metadata in /mnt/hdd[0-1]/gh/.gh_metastore/Backups/file2

Client: rm /mnt/Backups/file2

Server:

rm /mnt/hdd1/gh/Backups/file2
rm /mnt/hdd0/gh/Backups/file2
rm /mnt/hdd[0-1]/gh/.gh_metastore/Backups/file2
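
The same replication of renames and deletes across pool directories can be sketched in Python. The function names rename_in_pool and delete_from_pool are hypothetical, and the sketch only touches the file copies; metadata updates are omitted because their format is not described on this page.

```python
import os
import tempfile

def rename_in_pool(pool_dirs, share, old, new):
    """Rename every pool copy of share/old to share/new; return dirs touched."""
    touched = []
    for d in pool_dirs:
        src = os.path.join(d, share, old)
        if os.path.exists(src):  # a disk may not hold a copy of this file
            os.rename(src, os.path.join(d, share, new))
            touched.append(d)
    return touched

def delete_from_pool(pool_dirs, share, name):
    """Delete every pool copy of share/name; return dirs touched."""
    touched = []
    for d in pool_dirs:
        path = os.path.join(d, share, name)
        if os.path.exists(path):
            os.remove(path)
            touched.append(d)
    return touched

def demo():
    root = tempfile.mkdtemp()
    pool = [os.path.join(root, "hdd1", "gh"), os.path.join(root, "hdd0", "gh")]
    for d in pool:
        os.makedirs(os.path.join(d, "Backups"))
        with open(os.path.join(d, "Backups", "file1"), "w") as f:
            f.write("1\n")
    renamed = len(rename_in_pool(pool, "Backups", "file1", "file2"))
    deleted = len(delete_from_pool(pool, "Backups", "file2"))
    leftovers = [os.listdir(os.path.join(d, "Backups")) for d in pool]
    return renamed, deleted, leftovers
```

In demo(), both pool directories start with a copy of file1; the rename and the delete are each applied to both, leaving the two Backups directories empty.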