
ekkolon/minikv

minikv

A distributed key-value store written in Rust.

minikv stores keys and metadata in LevelDB and object bytes on nginx WebDAV volume servers. Each object is replicated across a configurable number of volumes. The server handles routing, replication, and metadata; nginx stores and serves the object bytes.

Architecture

Client
  |
  v
frontend nginx (port 8080)        <- X-Accel-Redirect proxy
  |  proxy_pass ->
  v
minikv server (port 3000)         <- metadata, routing, replication
  |  replicates to ->
  +-- volume1 nginx (port 8080)   <- nginx DAV object storage
  +-- volume2 nginx (port 8080)
  +-- volume3 nginx (port 8080)

GET/HEAD flow: The server looks up the key in LevelDB and probes the volume servers listed in its record to find a live replica. It then returns X-Accel-Redirect to the frontend nginx (or, without --accel-redirect, a 302 to the volume server). In X-Accel-Redirect mode nginx fetches the object body directly from the volume server and streams it to the client. Either way, the server is not in the data path for reads. Response headers (Content-Type, Content-Blake3, Key-Balance, Key-Volumes) are derived from the stored record.
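The replica probe can be sketched as follows. This is a minimal illustration, not minikv's code. `find_live_replica`, `Lookup`, and the `offset` parameter are hypothetical. The liveness check is passed in as a closure rather than a real HEAD request to a volume. The caller supplies the starting offset, for example from an RNG, which gives the random probe order mentioned under Consistency Model.

```rust
/// Result of resolving a GET/HEAD for one key (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Lookup {
    /// A live replica was found on this volume.
    Found(String),
    /// Every replica listed in the record failed its probe.
    NoLiveReplica,
}

/// Probe the record's volumes starting at `offset`, wrapping around, and return the
/// first one `is_live` accepts. `is_live` stands in for an HTTP probe of the volume.
pub fn find_live_replica<F>(volumes: &[String], offset: usize, is_live: F) -> Lookup
where
    F: Fn(&str) -> bool,
{
    let n = volumes.len();
    for i in 0..n {
        let vol = &volumes[(offset + i) % n];
        if is_live(vol.as_str()) {
            return Lookup::Found(vol.clone());
        }
    }
    Lookup::NoLiveReplica
}
```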

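PUT flow: The server first writes the key to LevelDB with the DELETED marker as an in-progress sentinel. It then replicates the object body to all replica volumes and computes a BLAKE3 checksum if --checksum is set. Finally, it rewrites the record without the marker so the key is fully present.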
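A minimal sketch of that PUT ordering follows. It assumes an in-memory `HashMap` in place of LevelDB and a closure in place of the WebDAV upload. `MetaStore` and `put_object` are hypothetical names. The checksum and TYPE fields are left out so the sentinel ordering stays visible.

```rust
use std::collections::HashMap;

/// In-memory stand-in for LevelDB: key -> encoded record (hypothetical).
pub struct MetaStore {
    pub records: HashMap<String, String>,
}

/// Sketch of the PUT ordering. `write_replica` stands in for the WebDAV PUT of the
/// body to one volume and returns false on failure.
pub fn put_object<F>(
    meta: &mut MetaStore,
    key: &str,
    volumes: &[String],
    mut write_replica: F,
) -> Result<(), String>
where
    F: FnMut(&str) -> bool,
{
    let vol_list = volumes.join(",");
    // 1. Sentinel first: the DELETED prefix keeps the key invisible while bytes are in flight.
    meta.records.insert(key.to_string(), format!("DELETED{vol_list}"));
    // 2. Replicate to every volume; stop at the first failure and leave the sentinel in place.
    for vol in volumes {
        if !write_replica(vol.as_str()) {
            return Err(format!("write to {vol} failed; {key} stays soft-deleted"));
        }
    }
    // 3. (A BLAKE3 checksum would be computed here when --checksum is set; omitted.)
    // 4. All replicas written: drop the DELETED prefix so the key becomes visible.
    meta.records.insert(key.to_string(), vol_list);
    Ok(())
}
```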

Hashing

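BLAKE3 is used to hash keys, both for the on-volume storage path (key_to_path) and for volume selection (key_to_volume). When --checksum is set, it also produces the content digest stored in the HASH field.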

The hash function determines the physical layout of all stored data. Changing it after data is written is a breaking change and requires a full rebalance.
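This section does not describe how key_to_volume ranks volumes. One common scheme that fits the idea of an "ideal volume set" is rendezvous (highest-random-weight) hashing, sketched below as an assumption rather than minikv's actual algorithm. To avoid crate dependencies, the sketch swaps BLAKE3 for std's `DefaultHasher`. The `/svNN` suffix derived from the same score is also a guess at how --subvolumes is applied. `key_to_volumes` and `score` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for the BLAKE3-based score. DefaultHasher keeps the sketch crate-free,
/// but its output is not guaranteed stable across Rust releases, so it must never
/// decide a real on-disk layout.
fn score(key: &str, volume: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    volume.hash(&mut h);
    h.finish()
}

/// Rendezvous selection (assumed): rank every volume by score(key, volume) and keep
/// the top `replicas`. The `/svNN` suffix reuses the score modulo `subvolumes`.
pub fn key_to_volumes(key: &str, volumes: &[&str], replicas: usize, subvolumes: u64) -> Vec<String> {
    assert!(subvolumes > 0, "subvolumes must be non-zero");
    let mut ranked: Vec<(u64, &str)> = volumes.iter().map(|v| (score(key, v), *v)).collect();
    // Highest score first; break ties by name so the order is fully deterministic.
    ranked.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.cmp(b.1)));
    ranked
        .into_iter()
        .take(replicas)
        .map(|(s, v)| format!("{}/sv{:02}", v, s % subvolumes))
        .collect()
}
```

Under this scheme, removing a volume that is not in a key's chosen set leaves that key's placement unchanged. Replacing the hash function, however, re-scores every key, which is why such a change forces a full rebalance.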

Record Wire Format

Each LevelDB value encodes object metadata as a compact byte string:

[DELETED][HASH<64hex>][TYPE<mimetype>|]<vol1>,<vol2>,...
  • DELETED - present if soft-deleted (UNLINK has been called) or while a PUT is still in progress
  • HASH<64hex> - BLAKE3-256 hex digest, present when --checksum is enabled
  • TYPE<mimetype>| - MIME type terminated by |, present when Content-Type was supplied on PUT
  • Remaining bytes - comma-separated volume addresses (host:port/svXX)
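A decoder and encoder for this layout might look like the sketch below, written against the format above. The `Record` struct and the function names are hypothetical. The sketch assumes the literal markers DELETED, HASH, and TYPE appear back to back with no separators, exactly as shown.

```rust
/// Decoded LevelDB value (hypothetical field names).
/// Layout: [DELETED][HASH<64hex>][TYPE<mimetype>|]<vol1>,<vol2>,...
#[derive(Debug, Clone, PartialEq, Default)]
pub struct Record {
    pub deleted: bool,
    pub hash: Option<String>,         // 64 hex chars of a BLAKE3-256 digest
    pub content_type: Option<String>, // stored without the '|' terminator
    pub volumes: Vec<String>,         // host:port/svXX entries
}

pub fn encode(r: &Record) -> String {
    let mut s = String::new();
    if r.deleted {
        s.push_str("DELETED");
    }
    if let Some(h) = &r.hash {
        s.push_str("HASH");
        s.push_str(h);
    }
    if let Some(t) = &r.content_type {
        s.push_str("TYPE");
        s.push_str(t);
        s.push('|');
    }
    s.push_str(&r.volumes.join(","));
    s
}

pub fn decode(mut s: &str) -> Result<Record, String> {
    let mut r = Record::default();
    if let Some(rest) = s.strip_prefix("DELETED") {
        r.deleted = true;
        s = rest;
    }
    if let Some(rest) = s.strip_prefix("HASH") {
        // Exactly 64 hex digits must follow the marker.
        let digest = rest
            .get(..64)
            .filter(|d| d.bytes().all(|b| b.is_ascii_hexdigit()))
            .ok_or("malformed HASH field")?;
        r.hash = Some(digest.to_string());
        s = &rest[64..];
    }
    if let Some(rest) = s.strip_prefix("TYPE") {
        // A MIME type that itself contains '|' cannot be represented in this format.
        let end = rest.find('|').ok_or("unterminated TYPE field")?;
        r.content_type = Some(rest[..end].to_string());
        s = &rest[end + 1..];
    }
    if !s.is_empty() {
        r.volumes = s.split(',').map(str::to_string).collect();
    }
    Ok(r)
}
```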

HTTP API

Method     Path                            Description
PUT        /<key>                          Store an object. Supply Content-Type for correct MIME metadata.
GET        /<key>                          Retrieve an object (via X-Accel-Redirect or 302).
HEAD       /<key>                          Returns metadata headers without body.
DELETE     /<key>                          Hard delete. Requires prior UNLINK when --protect is set.
UNLINK     /<key>                          Soft delete.
REBALANCE  /<key>                          Move a single key to its ideal volume set.
GET        /<prefix>?list                  List active keys under prefix. Accepts &start=X and &limit=N.
GET        /<prefix>?unlinked              List soft-deleted keys under prefix. Accepts &start=X and &limit=N.
GET        /<prefix>?list-type=2&prefix=X  S3-style XML key listing.
POST       /<key>?uploads                  Initiate multipart upload. Returns an upload ID.
POST       /<key>?uploadId=X               Complete multipart upload.
POST       /<key>?delete                   Batch delete (XML body).
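The `?list` and `?unlinked` pagination can be sketched over an ordered map standing in for LevelDB. The table does not say whether `start` is inclusive or how the next page is reported. This version assumes `start` is inclusive and simply returns up to `limit` keys. `list_keys` is a hypothetical name, and the response encoding is omitted.

```rust
use std::collections::BTreeMap;

/// Sketch of `GET /<prefix>?list` (unlinked = false) and `?unlinked` (unlinked = true).
/// Assumption: `start` is an inclusive lower bound; values beginning with "DELETED"
/// are the soft-deleted records.
pub fn list_keys(
    db: &BTreeMap<String, String>,
    prefix: &str,
    start: Option<&str>,
    limit: usize,
    unlinked: bool,
) -> Vec<String> {
    // Begin at whichever is later: the prefix itself or the caller's start key.
    let from = match start {
        Some(s) if s > prefix => s.to_string(),
        _ => prefix.to_string(),
    };
    db.range(from..)
        .take_while(|(k, _)| k.starts_with(prefix)) // keys are sorted, so stop at the first miss
        .filter(|(_, v)| v.starts_with("DELETED") == unlinked)
        .take(limit)
        .map(|(k, _)| k.clone())
        .collect()
}
```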

Response Headers

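Header          Present on  Description
Content-Type    GET, HEAD   MIME type from stored metadata (application/octet-stream if none was stored)
Content-Blake3  GET, HEAD   BLAKE3-256 hex digest of the object body (only when stored with --checksum)
Key-Balance     GET, HEAD   balanced if the replicas are on the key's ideal volume set, otherwise unbalanced
Key-Volumes     GET, HEAD   Comma-separated list of volume addresses holding replicas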
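These headers can be derived from decoded record fields roughly as follows. `metadata_headers` is hypothetical. The `ideal` argument is the volume set the key would receive from the current --volumes list. Treating "balanced" as an order-insensitive match between the stored and ideal sets is an assumption about what Key-Balance measures.

```rust
/// Build GET/HEAD metadata headers from already-decoded record fields (sketch).
pub fn metadata_headers(
    content_type: Option<&str>,
    hash: Option<&str>,
    stored: &[String],
    ideal: &[String],
) -> Vec<(&'static str, String)> {
    // Fall back to application/octet-stream when no type was stored on PUT.
    let mut headers = vec![(
        "Content-Type",
        content_type.unwrap_or("application/octet-stream").to_string(),
    )];
    // Content-Blake3 only exists when the object was written with --checksum.
    if let Some(h) = hash {
        headers.push(("Content-Blake3", h.to_string()));
    }
    // Assumed semantics: balanced when stored and ideal sets match, ignoring order.
    let mut a: Vec<&String> = stored.iter().collect();
    let mut b: Vec<&String> = ideal.iter().collect();
    a.sort();
    b.sort();
    let balance = if a == b { "balanced" } else { "unbalanced" };
    headers.push(("Key-Balance", balance.to_string()));
    headers.push(("Key-Volumes", stored.join(",")));
    headers
}
```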

Running with Docker Compose

docker compose up --build

Services start in dependency order: volume nodes -> server -> frontend nginx.

The only externally exposed port is 8080 (frontend nginx). Volume nodes and the server are internal to the Docker network.

PUT an object

curl -X PUT -H "Content-Type: image/png" \
     --data-binary @photo \
     http://localhost:8080/mybucket/photo

Supplying a Content-Type header on PUT is recommended. It is stored in LevelDB and returned on subsequent GET/HEAD requests, so clients and browsers can handle the response without guessing the format. Objects stored without one are served as application/octet-stream.

GET an object

curl http://localhost:8080/mybucket/photo -o photo

Inspect metadata

curl -I http://localhost:8080/mybucket/photo

Soft delete then hard delete

curl -X UNLINK http://localhost:8080/mybucket/photo
curl -X DELETE http://localhost:8080/mybucket/photo
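The --protect rule governing these two calls can be modelled as a small state check. The `Store`, `KeyState`, and `DeleteError` types are hypothetical. The real server keeps this state as the DELETED prefix on the LevelDB record. It also removes the bytes from each volume on hard delete, which the sketch only notes in a comment.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KeyState {
    Live,
    Unlinked,
}

#[derive(Debug, PartialEq)]
pub enum DeleteError {
    NotFound,
    MustUnlinkFirst,
}

/// Hypothetical model of the metadata side of UNLINK and DELETE.
pub struct Store {
    pub keys: HashMap<String, KeyState>,
    pub protect: bool, // mirrors --protect
}

impl Store {
    /// UNLINK: hide the key from clients; the bytes stay on the volume servers.
    pub fn unlink(&mut self, key: &str) -> Result<(), DeleteError> {
        let state = self.keys.get_mut(key).ok_or(DeleteError::NotFound)?;
        *state = KeyState::Unlinked;
        Ok(())
    }

    /// DELETE: with --protect, refuse keys that were never UNLINKed.
    pub fn delete(&mut self, key: &str) -> Result<(), DeleteError> {
        match self.keys.get(key).copied() {
            None => Err(DeleteError::NotFound),
            Some(KeyState::Live) if self.protect => Err(DeleteError::MustUnlinkFirst),
            Some(_) => {
                // A real server would also DELETE each replica on its volume here.
                self.keys.remove(key);
                Ok(())
            }
        }
    }
}
```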

CLI Reference

minikv <COMMAND>

Commands:
  server     Run the HTTP server
  rebuild    Reconstruct LevelDB from volume server autoindex
  rebalance  Move all keys to their ideal volume set

server

--db <PATH>                    LevelDB directory            [env: MINIKV_DB]
--volumes <host:port,...>      Volume server addresses      [env: MINIKV_VOLUMES]
--replicas <N>                 Replica count (default: 3)   [env: MINIKV_REPLICAS]
--subvolumes <N>               Shard count (default: 10)    [env: MINIKV_SUBVOLUMES]
--voltimeout <duration>        Volume probe timeout         [env: MINIKV_VOLTIMEOUT]
--port <N>                     Listen port (default: 3000)  [env: MINIKV_PORT]
--public-volumes <host:port,>  External volume addresses    [env: MINIKV_PUBLIC_VOLUMES]
--fallback <host:port>         Fallback for missing keys    [env: MINIKV_FALLBACK]
--protect                      Require UNLINK before DELETE [env: MINIKV_PROTECT]
--checksum                     Store BLAKE3 digest on PUT   [env: MINIKV_CHECKSUM]
--accel-redirect               Use X-Accel-Redirect mode    [env: MINIKV_ACCEL_REDIRECT]
-v, --verbose                  Structured debug logging     [env: MINIKV_VERBOSE]

All flags can also be set via the environment variables shown. Duration values accept forms such as 1s and 500ms.
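Here is a std-only parser for the two duration forms named above. `parse_duration` is hypothetical. The real CLI may rely on a crate that accepts more units, and this sketch does not attempt those.

```rust
use std::time::Duration;

/// Parse "<integer>ms" or "<integer>s" into a Duration (sketch; other units rejected).
pub fn parse_duration(s: &str) -> Result<Duration, String> {
    let s = s.trim();
    // Check "ms" before "s", because "500ms" also ends in 's'.
    let (num, millis) = if let Some(n) = s.strip_suffix("ms") {
        (n, true)
    } else if let Some(n) = s.strip_suffix('s') {
        (n, false)
    } else {
        return Err(format!("missing unit in {s:?} (expected ms or s)"));
    };
    let n: u64 = num.parse().map_err(|e| format!("bad number in {s:?}: {e}"))?;
    Ok(if millis {
        Duration::from_millis(n)
    } else {
        Duration::from_secs(n)
    })
}
```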

rebuild

Reconstructs LevelDB by scanning nginx autoindex JSON listings on all volume servers. Destructive: clears the existing DB before scanning. Use when LevelDB is lost but volume data is intact.

Content-Type metadata cannot be recovered during rebuild. It exists only in LevelDB, never on volume servers. Affected objects will be served as application/octet-stream until re-PUT with a Content-Type header.
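The merge step of rebuild can be sketched as grouping (volume, key) pairs into records. The sketch omits fetching the autoindex JSON and mapping file names back to keys, and assumes both have already been done. `rebuild_records` is hypothetical. Its output has no TYPE field, matching the limitation above, and it writes no HASH either. Volumes are sorted only to make the output deterministic, which may not match the server's own replica order.

```rust
use std::collections::BTreeMap;

/// Group replicas found during the volume scan into one record value per key.
/// Input pairs are (volume address, key), assumed already extracted from autoindex.
pub fn rebuild_records(found: &[(&str, &str)]) -> BTreeMap<String, String> {
    let mut by_key: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (volume, key) in found {
        by_key.entry(key.to_string()).or_default().push(volume.to_string());
    }
    by_key
        .into_iter()
        .map(|(key, mut vols)| {
            vols.sort();
            vols.dedup();
            // Only the volume list survives; TYPE and HASH are not reconstructed here.
            (key, vols.join(","))
        })
        .collect()
}
```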

rebalance

Moves all keys to their ideal volume set as computed by the current --volumes list. Run after adding or removing volume servers.
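For a single key, rebalance reduces to comparing where the key lives now with its ideal set. `Plan` and `plan_rebalance` are hypothetical. The ordering described in the comment (copy, then switch the record, then delete stale replicas) is an assumed safe sequence, not a description of minikv's implementation.

```rust
/// Per-key rebalance plan (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Plan {
    pub copy_to: Vec<String>,
    pub delete_from: Vec<String>,
}

/// Diff the current replica set against the ideal one. A safe order (assumed) is:
/// copy to `copy_to`, point the record at the ideal set, then delete `delete_from`,
/// so at least one readable copy exists at every step.
pub fn plan_rebalance(current: &[String], ideal: &[String]) -> Plan {
    Plan {
        copy_to: ideal.iter().filter(|v| !current.contains(v)).cloned().collect(),
        delete_from: current.iter().filter(|v| !ideal.contains(v)).cloned().collect(),
    }
}
```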

Consistency Model

  • PUT is atomic at the record level. The key is marked soft-deleted (in-progress sentinel) before any volume write, and marked fully present only after all replicas succeed. A crash mid-write leaves a soft-deleted key; it appears in ?unlinked listings and can be cleaned up manually with DELETE.
  • No read-after-write guarantee across replicas. GET probes volumes in random order and returns the first live replica.
  • Rebalance clears the stored hash for the moved object. The body is not re-verified during rebalance.
  • Soft delete (UNLINK) removes the key from client visibility immediately. The object bytes remain on volume servers until a hard DELETE is issued.

Content-Type and X-Accel-Redirect

When --accel-redirect is enabled the server returns X-Accel-Redirect instead of 302. The frontend nginx intercepts this, fetches the object body from the volume server internally, and sends it to the client. Because the body comes from nginx's internal subrequest rather than the server response, headers are carried across via nginx variable persistence:

  1. The server sets X-Content-Type: image/png on its response.
  2. nginx captures this as $upstream_http_x_content_type, a variable that persists across the internal redirect.
  3. The /accel/ location suppresses the volume's Content-Type and replaces it with $upstream_http_x_content_type.

In plain 302 mode, the server sets Content-Type on the redirect response, so the client receives the stored type on HEAD. A GET, however, follows the redirect to the volume server, which returns application/octet-stream regardless of stored metadata. This is a known limitation of redirect mode.
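The two modes differ only in the headers the server emits once it has chosen a replica, sketched below. `redirect_headers` is hypothetical. The internal URI layout under `/accel/` and the 200 status in accel mode are assumptions. The authoritative mapping is whatever the frontend nginx config defines.

```rust
/// Headers for a GET after a live replica is chosen (sketch).
/// Returns (status code, headers). `path` is the object's path on the volume.
pub fn redirect_headers(
    accel: bool,
    volume: &str,
    path: &str,
    content_type: Option<&str>,
) -> (u16, Vec<(&'static str, String)>) {
    let ct = content_type.unwrap_or("application/octet-stream").to_string();
    if accel {
        // nginx consumes X-Accel-Redirect; X-Content-Type survives the internal
        // redirect as $upstream_http_x_content_type. Status 200 is an assumption.
        (200, vec![
            ("X-Accel-Redirect", format!("/accel/{volume}{path}")),
            ("X-Content-Type", ct),
        ])
    } else {
        // Plain redirect: this Content-Type reaches the client only on HEAD;
        // the volume answering the followed GET sends its own Content-Type.
        (302, vec![
            ("Location", format!("http://{volume}{path}")),
            ("Content-Type", ct),
        ])
    }
}
```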

License

GNU General Public License v2 (GPLv2). See LICENSE.

About

A distributed key-value store written in Rust, with LevelDB-backed metadata and object storage replicated across nginx WebDAV volumes
