leoustc/bucketfs

BucketFS

BucketFS is a transactional FUSE filesystem for object storage. It mounts one or more buckets defined in a config file. Each bucket is exposed as a read-only namespace backed by a local cache; the only writable area is a configured workdir, which is prefetched at mount time and bind-mounted into that namespace. Paths outside the workdir appear as zero-size placeholders, created when their directory is listed; the real object is downloaded on first open/read.

All modifications stay local until the running process receives a stop signal (for example, from systemctl stop). It then unmounts and uploads the regular files in the workdir back to S3. Local access stays fast, only the workdir and the objects you actually open are downloaded, and uploads happen only when you stop the service.

What BucketFS is (and is not)

What it is

  • Local POSIX filesystem backed by a cache directory
  • Lazy object download on first open/read (outside workdir)
  • Prefetched workdir with upload on shutdown
  • Deterministic overwrite semantics
  • Designed for workflows (Nextflow, HPC, batch jobs)

What it is not

  • A distributed POSIX filesystem (all POSIX ops are local to the cache)
  • A shared multi-writer filesystem
  • A live write-through S3 mount
  • A replacement for NFS

Architecture overview

Each bucket gets its own cache entries under cachepath, named after the bucket:

cachepath/
├── .<bucket>/            # Local object cache
├── .<bucket>.index       # Cache index
└── .<bucket>.pid         # Daemon PID

All POSIX operations act on the local cache. S3 is contacted only for the workdir prefetch at mount, lazy fetches outside the workdir, and the upload at shutdown.
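As a rough illustration of the layout above, the per-bucket paths can be derived from cachepath and the bucket name. cache_paths is a hypothetical helper written for this README, not part of BucketFS; it simply assumes the naming shown in the tree.

```python
from pathlib import Path


def cache_paths(cachepath: str, bucket: str) -> dict[str, Path]:
    """Hypothetical helper: per-bucket paths under cachepath, mirroring the tree above."""
    root = Path(cachepath)
    return {
        "cache": root / f".{bucket}",        # local object cache directory
        "index": root / f".{bucket}.index",  # cache index file
        "pid": root / f".{bucket}.pid",      # daemon PID file
    }


if __name__ == "__main__":
    # /tmp/bucketfs is the documented default for cachepath
    for name, path in cache_paths("/tmp/bucketfs", "nf-data").items():
        print(f"{name:5} {path}")
```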

Lifecycle

1) Start (mount)

sudo systemctl start bucketfs.service

The systemd unit reads /etc/bucketfs/bucketfs.conf and logs to /var/log/bucketfs.log.

What happens:

  • Local cache and mount path are prepared per bucket
  • Workdir (if configured) is prefetched per bucket
  • FUSE filesystem is mounted per bucket (one thread per bucket)
  • After mounts are ready, the workdir is bind-mounted from cache into the bucket namespace
  • Process runs until it receives a stop signal
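The start-up order above can be sketched as follows, assuming one FUSE thread per bucket that signals a threading.Event once its mount is live. The hooks prepare, prefetch, run_fuse and bind_workdir are hypothetical stand-ins for the real steps; a prefetch exception simply propagates here, so that bucket's FUSE thread is never started.

```python
import threading
from typing import Callable


def start_buckets(
    buckets: list[str],
    prepare: Callable[[str], None],
    prefetch: Callable[[str], None],
    run_fuse: Callable[[str, threading.Event], None],
    bind_workdir: Callable[[str], None],
) -> list[threading.Thread]:
    """Sketch of the start sequence; every callable is a hypothetical hook."""
    threads, pending = [], []
    for bucket in buckets:
        prepare(bucket)       # create local cache dir and mount path
        prefetch(bucket)      # download workdir; raising here aborts before FUSE starts
        mounted = threading.Event()
        # run_fuse is expected to set `mounted` once the mount is live, then serve requests
        t = threading.Thread(target=run_fuse, args=(bucket, mounted), daemon=True)
        t.start()             # one FUSE loop per bucket
        threads.append(t)
        pending.append((bucket, mounted))
    for bucket, mounted in pending:
        mounted.wait()        # wait for this bucket's FUSE mount
        bind_workdir(bucket)  # bind-mount workdir from cache into the namespace
    return threads
```

Waiting for every mount before any bind mount matches the "after mounts are ready" step; the main thread then only has to wait for the stop signal.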

2) Stop (unmount + upload)

sudo systemctl stop bucketfs.service

What happens:

  • SIGTERM triggers shutdown in the main process
  • Unmounts workdir bind mounts
  • Stops FUSE loops and unmounts FUSE mounts
  • Uploads regular files under the workdir cache back to S3 (skipped if no workdir)

No deletes are performed on S3. Non-regular files are skipped.
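A minimal sketch of the stop path, assuming a POSIX host. SIGTERM would normally kill a Python process outright, so the sketch blocks it before any threads start and waits for it with signal.sigwait, then runs the steps in the order listed above. unmount_binds, stop_fuse and upload are hypothetical hooks; regular_files shows one way to skip non-regular files (symlinks, FIFOs and the like).

```python
import os
import signal
from pathlib import Path
from typing import Callable, Optional


def block_sigterm() -> None:
    """Block SIGTERM; call before starting FUSE threads so they inherit the mask."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})


def wait_for_sigterm() -> None:
    """Return once SIGTERM (e.g. from systemctl stop) is pending."""
    signal.sigwait({signal.SIGTERM})


def regular_files(workdir_cache: Path) -> list[Path]:
    """Regular files under the cached workdir; symlinks and special files are skipped."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(workdir_cache):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and not path.is_symlink():
                found.append(path)
    return sorted(found)


def shutdown(
    workdir_cache: Optional[Path],
    unmount_binds: Callable[[], None],
    stop_fuse: Callable[[], None],
    upload: Callable[[Path], None],
) -> None:
    """Shutdown order from the list above; the three callables are hypothetical hooks."""
    unmount_binds()          # 1. remove the workdir bind mounts
    stop_fuse()              # 2. stop FUSE loops and unmount FUSE mounts
    if workdir_cache is None:
        return               # no workdir configured: nothing to upload
    for path in regular_files(workdir_cache):
        upload(path)         # 3. overwrite the S3 object; nothing is deleted
```

Usage would be block_sigterm(), start the mounts, wait_for_sigterm(), then shutdown(...).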

Configuration file

bucketfs.conf is an INI file with one section per bucket:

[nf-data]
bucket = nf-data
aws_access_key_id = AKIA...
aws_secret_access_key = ...
region = us-east-1
endpoint = https://s3.amazonaws.com
workdir = /nf-data/workdir
#lastuploadfile = .exitcode

[ngi-igenomes]
bucket = ngi-igenomes
aws_access_key_id = AKIA...
aws_secret_access_key = ...
region = us-east-1
endpoint = https://s3.amazonaws.com
#workdir = /ngi-igenomes/workdir

Notes:

  • If workdir is unset, the mount is read-only.
  • workdir must be inside /<bucket> or it will be ignored.
  • cachepath defaults to /tmp/bucketfs if unset.
  • lastuploadfile defaults to .exitcode if unset.
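To make the notes above concrete, here is a sketch that reads bucketfs.conf with Python's standard configparser and applies the documented defaults and workdir rules. load_buckets and valid_workdir are hypothetical names, and BucketFS's own parser may differ; in particular, this sketch accepts a workdir only if it lies strictly below /<bucket>.

```python
import configparser
from pathlib import PurePosixPath

# Documented defaults for optional keys
DEFAULTS = {"cachepath": "/tmp/bucketfs", "lastuploadfile": ".exitcode"}


def valid_workdir(workdir: str, bucket: str) -> bool:
    """workdir must not be / and must lie below /<bucket> (assumed: strictly below)."""
    path = PurePosixPath(workdir)
    root = PurePosixPath("/") / bucket
    return path != PurePosixPath("/") and root in path.parents


def load_buckets(text: str) -> dict[str, dict]:
    """Parse one INI section per bucket and apply defaults and workdir checks."""
    parser = configparser.ConfigParser(interpolation=None)  # secrets may contain '%'
    parser.read_string(text)
    buckets = {}
    for section in parser.sections():
        opts = {**DEFAULTS, **parser[section]}
        workdir = opts.get("workdir")  # commented-out "#workdir" lines are ignored
        if workdir is not None and not valid_workdir(workdir, opts["bucket"]):
            workdir = None  # invalid workdir is ignored
        opts["workdir"] = workdir
        opts["read_only"] = workdir is None  # no workdir: read-only mount
        buckets[section] = opts
    return buckets
```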

Runtime behavior

Reads:

  • Outside workdir: listing a directory (e.g. ls) creates zero-size placeholders; the first open/read downloads the object into the cache before serving it
  • Inside workdir: served from the local cache (prefetched at mount)
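The placeholder-then-fetch read path outside the workdir can be sketched like this. fetch is a hypothetical callable returning an object's bytes (for example, a wrapper around an S3 GetObject call), and the in-memory fetched set stands in for the on-disk cache index. Fetch state is tracked separately because a zero-byte placeholder cannot be told apart from a genuinely empty object by size alone.

```python
from pathlib import Path
from typing import Callable


class LazyCache:
    """Sketch of lazy reads outside the workdir; not BucketFS's actual implementation."""

    def __init__(self, root: Path, fetch: Callable[[str], bytes]):
        self.root = root
        self.fetch = fetch               # hypothetical object download function
        self.fetched: set[str] = set()   # stand-in for the cache index

    def list_dir(self, keys: list[str]) -> None:
        """Directory listing: create zero-size placeholders for keys not yet cached."""
        for key in keys:
            path = self.root / key
            path.parent.mkdir(parents=True, exist_ok=True)
            if not path.exists():
                path.touch()

    def read(self, key: str) -> bytes:
        """First read downloads over the placeholder; later reads come from the cache."""
        path = self.root / key
        if key not in self.fetched:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(self.fetch(key))  # S3 copy overwrites the placeholder
            self.fetched.add(key)
        return path.read_bytes()
```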

Writes (local overlay):

  • Allowed only inside workdir
  • Written only to local cache (S3 untouched until shutdown)
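A sketch of the write guard implied by these rules. Comparing normalized paths by their parents (rather than by string prefix) keeps /nf-data/workdir2 and /nf-data/workdir/../x outside /nf-data/workdir. Returning EROFS is an assumption made for this sketch; the errno BucketFS actually reports is not documented here.

```python
import errno
import os
from pathlib import PurePosixPath
from typing import Optional


def inside(path: str, workdir: str) -> bool:
    """True if path is the workdir or lies below it (lexical check after normalization)."""
    p = PurePosixPath(os.path.normpath(path))
    w = PurePosixPath(os.path.normpath(workdir))
    return p == w or w in p.parents


def check_writable(path: str, workdir: Optional[str]) -> None:
    """Reject writes outside the workdir; with no workdir the mount is read-only."""
    if workdir is None or not inside(path, workdir):
        raise OSError(errno.EROFS, os.strerror(errno.EROFS), path)
```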

Other POSIX operations:

  • mkdir, unlink, rmdir, rename, chmod, chown, utimens, statfs, readlink, symlink, link, truncate all operate on the local cache
  • open, write, link, and symlink will lazily download missing source objects before acting (outside workdir)

Overwrite and conflict rules

These rules are intentional and strict:

  • On read miss: the S3 object overwrites the local placeholder
  • On shutdown: local overwrites S3 for regular files under workdir
  • No merge
  • No conflict resolution
  • No remote deletes
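The shutdown rules can be made concrete with a small planning function over a snapshot of keys, where both sides are modeled as hypothetical key-to-bytes dicts. Local files always win, a differing remote copy is replaced without any merge, and remote-only keys are left alone.

```python
def plan_shutdown(local: dict[str, bytes], remote: dict[str, bytes]) -> dict[str, list[str]]:
    """Apply the shutdown rules to a snapshot; models outcomes only, performs no I/O."""
    upload = sorted(local)  # every local workdir file overwrites S3
    replaced = sorted(k for k in local if k in remote and remote[k] != local[k])  # no merge
    kept = sorted(k for k in remote if k not in local)  # remote-only keys: never deleted
    return {"upload": upload, "replaced": replaced, "kept": kept}
```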

Startup constraints

  • workdir must be inside /<bucket> (ignored otherwise)
  • workdir cannot be /
  • Prefetch failures abort the mount (no FUSE start)

Install from the prebuilt .deb

This repository includes a prebuilt Debian package for amd64:

sudo dpkg -i bucketfs_amd64.deb
sudo apt-get -f install

Then create your config (start from conf.sample) and place it at:

/etc/bucketfs/bucketfs.conf

Enable and start the service:

sudo systemctl enable --now bucketfs.service

Example workflow

sudo systemctl start bucketfs.service

# Run pipeline
nextflow run pipeline.nf --input /nf-data/data

# Unmount and upload workdir changes to S3
sudo systemctl stop bucketfs.service
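If you drive the workflow from a script, a try/finally keeps the stop (and therefore the upload) from being skipped when the pipeline fails. This is a minimal sketch; run_pipeline and the injectable run parameter are hypothetical names, and the default runner uses subprocess.run with check=True.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], None]


def default_runner(cmd: Sequence[str]) -> None:
    subprocess.run(list(cmd), check=True)  # raises CalledProcessError on non-zero exit


def run_pipeline(pipeline: Sequence[str], run: Runner = default_runner) -> None:
    """Start BucketFS, run the pipeline, and always stop (unmount + upload) afterwards."""
    run(["sudo", "systemctl", "start", "bucketfs.service"])
    try:
        run(pipeline)
    finally:
        run(["sudo", "systemctl", "stop", "bucketfs.service"])


if __name__ == "__main__":
    run_pipeline(["nextflow", "run", "pipeline.nf", "--input", "/nf-data/data"])
```

With this design the workdir is uploaded even after a failed run; drop the finally if partial results should stay local.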

Security notes

  • Credentials are stored in bucketfs.conf (AWS-style fields)
  • FUSE daemon does not need credentials after start
  • Shutdown happens in the running process when it receives SIGTERM

Limitations (by design)

  • Not safe for concurrent writers across machines
  • Writes are limited to the configured workdir
  • No remote deletes (uploads only)

Summary

BucketFS gives you:

  • POSIX where it matters
  • Explicit uploads
  • Deterministic behavior
  • Zero hidden magic

License

Apache-2.0

Optional next additions

  • Diagrams
  • Nextflow-specific section
  • Eviction policy docs
  • Security hardening notes