a full featured file system for online data storage

IOStack Changes

The IOStack EU project has modified S3QL to support dynamic filters on the client side.

S3QL expects a /tmp/filters.ini file containing the Redis host and port, followed by the Redis key that holds the filter configuration:

[General]
Param : 127.0.0.1 6379 client_filter_configuration

An example of this file is available at src/s3ql/filters.ini.
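As a rough illustration of how a client could read this file, the sketch below parses the INI content with Python's standard configparser module. The function name read_filter_settings is hypothetical and not part of S3QL, and the sketch assumes the Param value always contains exactly three whitespace-separated fields.

```python
import configparser


def read_filter_settings(text):
    """Parse filters.ini content into (host, port, config_key).

    Hypothetical helper: assumes the Param value holds exactly three
    whitespace-separated fields, as in the sample file.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    host, port, config_key = parser["General"]["Param"].split()
    return host, int(port), config_key


SAMPLE = """\
[General]
Param : 127.0.0.1 6379 client_filter_configuration
"""

settings = read_filter_settings(SAMPLE)
print(settings)
```

To read the real file, the same function could be given the contents of /tmp/filters.ini instead of the SAMPLE string.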

The expected Redis content is described below.

Configuration

The client_filter_configuration key should hold a value such as:

"Filterqsub:127.0.0.1 6379 client_filter_qsub_extension client_filter_qsub_process;Filtertag:127.0.0.1 6379 client_filter_tag_list;Filter1:127.0.0.1 6379 client_filter_compression;Filter2:22;ORDER:Filtertag,Filter2 /tmp/"

Each entry has the format "NameOfTheFilter:parameters", and entries are separated by a semicolon (;). The special key "ORDER" configures the filter stack and the path to which the filters will be downloaded: "ORDER: Filter1,Filter2,Filter3 <path>".
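The format can be made concrete with a small parser. This is only a sketch of how the string might be split, not the code S3QL actually uses. The function name parse_filter_configuration is hypothetical, and it assumes that the ORDER entry always ends with a download path.

```python
def parse_filter_configuration(value):
    """Split a configuration string into filters, stack order and path.

    Hypothetical helper: assumes every entry is "Name:parameters" and
    that the ORDER entry ends with a whitespace-separated path.
    """
    filters = {}
    order, download_path = [], None
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        name, _, params = entry.partition(":")
        name, params = name.strip(), params.strip()
        if name == "ORDER":
            # The last whitespace-separated token is the download path.
            names, download_path = params.rsplit(None, 1)
            order = [n.strip() for n in names.split(",") if n.strip()]
        else:
            filters[name] = params.split()
    return filters, order, download_path


SAMPLE = (
    "Filterqsub:127.0.0.1 6379 client_filter_qsub_extension client_filter_qsub_process;"
    "Filtertag:127.0.0.1 6379 client_filter_tag_list;"
    "Filter1:127.0.0.1 6379 client_filter_compression;"
    "Filter2:22;"
    "ORDER:Filtertag,Filter2 /tmp/"
)

filters, order, download_path = parse_filter_configuration(SAMPLE)
print(order, download_path)
```

With the sample value, only Filtertag and Filter2 appear in the stack, even though four filters are defined.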

Finally, the code of each filter is stored in Redis. Sample filters are available under src/s3ql/. To upload a filter to Redis, run:

redis-cli -x SET client_filter_Filtertag_code < src/s3ql/Filtertag.py
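The same upload could be scripted. The sketch below assumes that the key name follows the pattern client_filter_<FilterName>_code, which is inferred from the single example above and may not hold for every filter. The helpers code_key and upload_filter and the InMemoryStore stand-in are hypothetical; any client object with a set(key, value) method can be passed in.

```python
import tempfile
from pathlib import Path


def code_key(filter_name):
    # Assumed naming pattern, inferred from client_filter_Filtertag_code.
    return f"client_filter_{filter_name}_code"


def upload_filter(client, source_path):
    """Store the contents of a filter source file under its code key."""
    path = Path(source_path)
    key = code_key(path.stem)
    client.set(key, path.read_bytes())
    return key


class InMemoryStore:
    """Stand-in for a Redis client, used only to demonstrate the call."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "Filtertag.py"
    sample.write_bytes(b"# filter code would go here\n")
    store = InMemoryStore()
    uploaded_key = upload_filter(store, sample)

print(uploaded_key)
```

With the redis-py package installed, a redis.Redis client connected to the configured host and port could be passed in place of InMemoryStore, since it also provides a set method.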

Individual Filters

Some filters require additional configuration stored in Redis:

  • filterqsub

    client_filter_qsub_extension "txt c3d"
    client_filter_qsub_process "/tmp/process.sh"

  • filtertag

    client_filter_tag_list "file1.txt#SECRET file2#MORESECRET"

    Tags are stored as Meta-Tag<index> in the metadata of each object. Matching is partial, so a single file can receive multiple tags.

  • Filter1 or GZIP filter

    Set client_filter_compression to "on" or "off" to enable or disable compression.
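To illustrate the tag list format, the sketch below parses a client_filter_tag_list value and computes the metadata entries for an object name. It rests on two assumptions that the text above does not confirm: a "partial match" is treated as a plain substring test, and the Meta-Tag index starts at 0. The function names are hypothetical.

```python
def parse_tag_list(value):
    """Turn "pattern#TAG pattern#TAG" into a list of (pattern, tag) pairs."""
    rules = []
    for entry in value.split():
        pattern, _, tag = entry.partition("#")
        rules.append((pattern, tag))
    return rules


def tags_for(object_name, rules):
    """Return the Meta-Tag<index> entries for one object name.

    Assumption: a rule matches when its pattern is a substring of the
    object name, and indexes count matched rules starting at 0.
    """
    matched = [tag for pattern, tag in rules if pattern in object_name]
    return {f"Meta-Tag{index}": tag for index, tag in enumerate(matched)}


rules = parse_tag_list("file1.txt#SECRET file2#MORESECRET")
print(tags_for("file1.txt", rules))
print(tags_for("file2_file1.txt", rules))
```

Under these assumptions, an object named file2_file1.txt matches both rules and therefore receives two tags.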

Controlling native compression

  • Native compression is controlled by the Redis key client_native_compression, which accepts the values zlib, lzma, bzip2 or none. Compression can be switched on and off on a per-file basis. If the key is not set, the default (lzma) is used.
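The selection logic can be sketched with Python's standard compression modules. This is an illustration of the documented values and default only, and it does not reproduce S3QL's actual compression parameters or on-disk format. The function compress_block is hypothetical, and the setting argument stands for the value read from client_native_compression (None when the key is not set).

```python
import bz2
import lzma
import zlib

# Map each documented value to a standard-library compressor.
COMPRESSORS = {
    "zlib": zlib.compress,
    "lzma": lzma.compress,
    "bzip2": bz2.compress,
    "none": lambda data: data,
}
DEFAULT_COMPRESSION = "lzma"  # used when the key is not set


def compress_block(data, setting=None):
    """Compress data according to the client_native_compression value."""
    name = DEFAULT_COMPRESSION if setting is None else setting
    try:
        compressor = COMPRESSORS[name]
    except KeyError:
        raise ValueError(f"unsupported compression setting: {name!r}")
    return compressor(data)


payload = b"example payload " * 64
for choice in (None, "zlib", "bzip2", "none"):
    print(choice, len(compress_block(payload, choice)))
```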

About S3QL

S3QL is a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access.

S3QL is a standard conforming, full featured UNIX file system that is conceptually indistinguishable from any local file system. Furthermore, S3QL has additional features like compression, encryption, data de-duplication, immutable trees and snapshotting which make it especially suitable for online backup and archival.

S3QL is designed to favor simplicity and elegance over performance and feature-creep. Care has been taken to make the source code as readable and serviceable as possible. Solid error detection and error handling have been included from the very first line, and S3QL comes with extensive automated test cases for all its components.

Features

  • Transparency. Conceptually, S3QL is indistinguishable from a local file system. For example, it supports hardlinks, symlinks, standard unix permissions, extended attributes and file sizes up to 2 TB.

  • Dynamic Size. The size of an S3QL file system grows and shrinks dynamically as required.

  • Compression. Before storage, all data may be compressed with the LZMA, bzip2 or deflate (gzip) algorithm.

  • Encryption. After compression (but before upload), all data can be AES encrypted with a 256 bit key. An additional SHA256 HMAC checksum is used to protect the data against manipulation.

  • Data De-duplication. If several files have identical contents, the redundant data will be stored only once. This works across all files stored in the file system, and also if only some parts of the files are identical while other parts differ.

  • Immutable Trees. Directory trees can be made immutable, so that their contents can no longer be changed in any way whatsoever. This can be used to ensure that backups cannot be modified after they have been made.

  • Copy-on-Write/Snapshotting. S3QL can replicate entire directory trees without using any additional storage space. Additional space is consumed only when one of the copies is modified, and then only for the part of the data that has changed. This can be used to create intelligent snapshots that preserve the state of a directory at different points in time using a minimum amount of space.

  • High Performance independent of network latency. All operations that do not write or read file contents (like creating directories or moving, renaming, and changing permissions of files and directories) are very fast because they are carried out without any network transactions.

    S3QL achieves this by saving the entire file and directory structure in a database. This database is locally cached and the remote copy updated asynchronously.

  • Support for low bandwidth connections. S3QL splits file contents into smaller blocks and caches blocks locally. This minimizes both the number of network transactions required for reading and writing data, and the amount of data that has to be transferred when only parts of a file are read or written.
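The block splitting and de-duplication ideas from the list above can be illustrated with a toy in-memory store. This is a conceptual sketch only, not S3QL's implementation: the DedupStore class, the tiny block size and the use of SHA-256 digests as block identifiers are all assumptions chosen for readability.

```python
import hashlib


def split_blocks(data, block_size):
    """Cut data into consecutive blocks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class DedupStore:
    """Toy store that keeps each distinct block only once."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}  # digest -> block contents
        self.files = {}   # file name -> ordered list of digests

    def write(self, name, data):
        digests = []
        for block in split_blocks(data, self.block_size):
            digest = hashlib.sha256(block).hexdigest()
            # Identical blocks share one entry, whichever file they come from.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore(block_size=4)
store.write("a", b"AAAABBBBCCCC")
store.write("b", b"AAAAXXXXCCCC")
print(len(store.blocks))
```

The two files share their first and last blocks, so only four distinct blocks are kept for six blocks of file data.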

Development Status

S3QL is considered stable and suitable for production use. Starting with version 2.17.1, S3QL uses semantic versioning. This means that backwards-incompatible versions (e.g., versions that require an upgrade of the file system revision) will be reflected in an increase of the major version number.

Supported Platforms

S3QL is developed and tested under Linux. Users have also reported running S3QL successfully on OS-X, FreeBSD and NetBSD. We try to maintain compatibility with these systems, but (due to lack of pre-release testers) we cannot guarantee that every release will run on all non-Linux systems. Please report any bugs you find, and we will try to fix them.

Typical Usage

Before a file system can be mounted, the backend which will hold the data has to be initialized. This is done with the mkfs.s3ql command. Here we are using the Amazon S3 backend, and nikratio-s3ql-bucket is the S3 bucket in which the file system will be stored.

mkfs.s3ql s3://nikratio-s3ql-bucket

To mount the S3QL file system stored in the S3 bucket nikratio-s3ql-bucket in the directory /mnt/s3ql, enter:

mount.s3ql s3://nikratio-s3ql-bucket /mnt/s3ql

Now you can instruct your favorite backup program to run a backup into the directory /mnt/s3ql, and the data will be stored in Amazon S3. When you are done, the file system has to be unmounted with:

umount.s3ql /mnt/s3ql

Need Help?

The following resources are available:

Please report any bugs you may encounter in the Bitbucket Issue Tracker.

Contributing

The S3QL source code is available both on GitHub and Bitbucket.