Jay

License: MIT

S3-compatible object storage server with a native binary protocol, written in Go.

Jay provides dual API access: an S3-compatible HTTP API and a high-performance native binary protocol for Go clients. It uses bbolt for metadata, atomic file writes with SHA-256 checksums, and includes background integrity scrubbing, garbage collection, and automated backups.

Features

  • S3-compatible HTTP API -- works with AWS CLI, SDKs, and any S3 client
  • Native binary protocol -- efficient Go client with connection pooling
  • Multipart uploads -- S3-compatible chunked uploads up to 10,000 parts
  • Presigned URLs -- time-limited delegated access via HMAC-SHA256
  • Range requests -- partial object reads (bytes=0-499, suffix, open-ended)
  • CopyObject -- server-side copy between buckets
  • Bucket policies -- prefix-based allow/deny rules with IP conditions
  • Token authentication -- scoped by actions, buckets, and key prefixes
  • AWS SigV4 -- simplified mode for AWS CLI compatibility
  • Integrity scrubbing -- periodic SHA-256 verification (by default, a 10% sample every 6 hours)
  • Quarantine -- automatic isolation of corrupted objects
  • Rate limiting -- per-token token bucket with configurable rate/burst
  • TLS -- optional HTTPS for S3 and admin APIs
  • Health checks -- liveness and readiness probes
  • Metrics -- JSON endpoint with operation counters and byte totals
  • Backups -- hourly metadata snapshots with automatic pruning

Quick Start

Docker Compose

export JAY_ADMIN_TOKEN=my-secret-admin-token
export JAY_SIGNING_SECRET=my-signing-secret
docker compose up -d

Build from Source

go build -o jay .
JAY_ADMIN_TOKEN=my-secret-admin-token ./jay

Jay listens on three ports:

  • :9000 -- S3-compatible API
  • :9001 -- Admin API + health checks
  • :4444 -- Native binary protocol

Configuration

Jay accepts configuration from environment variables, a YAML config file, or both. When both are provided, env vars win and every conflict is logged at WARN level.

Variable                    Default     Description
JAY_CONFIG_FILE             (optional)  Path to a YAML config file. Can also be set via the --config-file flag (the flag takes precedence)
JAY_DATA_DIR                ./data      Data directory for objects and metadata
JAY_LISTEN_ADDR             :9000       S3 API listen address
JAY_ADMIN_ADDR              :9001       Admin API listen address
JAY_NATIVE_ADDR             :4444       Native protocol listen address
JAY_ADMIN_TOKEN             (required)  Bearer token for admin API (≥ 32 chars)
JAY_SIGNING_SECRET          (required)  AES-GCM key for presigned URLs and token secrets (≥ 32 chars)
JAY_LOG_LEVEL               info        Log level: debug, info, warn, error
JAY_TLS_CERT                (optional)  Path to TLS certificate file
JAY_TLS_KEY                 (optional)  Path to TLS private key file
JAY_RATE_LIMIT              100         Requests/sec per token (0 = disabled)
JAY_RATE_BURST              200         Rate limit burst size
JAY_TRUST_PROXY_HEADERS     false       Trust X-Forwarded-For / X-Real-IP headers
JAY_SCRUB_INTERVAL_HOURS    6           Scrubber interval, in hours
JAY_SCRUB_SAMPLE_RATE       0.1         Fraction of objects verified per pass, in (0, 1]
JAY_SCRUB_BYTES_PER_SEC     52428800    Scrubber read throttle in bytes/sec (0 = unlimited)
JAY_SCRUB_MAX_PER_RUN       100         Max objects visited per bucket per pass
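
JAY_RATE_LIMIT and JAY_RATE_BURST read as the two parameters of a token bucket: tokens refill at the configured rate and accumulate up to the burst size. The sketch below illustrates that general technique with the default values. It is not Jay's actual limiter; the bucket type and its methods are hypothetical, and the clock is passed in explicitly so the example is deterministic.

```go
package main

import (
	"fmt"
	"time"
)

// bucket is a hypothetical per-token limiter; Jay's real implementation may differ.
type bucket struct {
	rate   float64   // tokens added per second (cf. JAY_RATE_LIMIT)
	burst  float64   // maximum tokens held (cf. JAY_RATE_BURST)
	tokens float64   // tokens currently available
	last   time.Time // time of the previous refill
}

// newBucket starts with a full bucket so an idle client can burst immediately.
func newBucket(rate, burst float64, now time.Time) *bucket {
	return &bucket{rate: rate, burst: burst, tokens: burst, last: now}
}

// allow refills for the elapsed time, caps at burst, then tries to spend one token.
func (b *bucket) allow(now time.Time) bool {
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * b.rate
		if b.tokens > b.burst {
			b.tokens = b.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

// burstDemo sends 250 requests at the same instant, then one more a second later.
func burstDemo() (allowedInBurst int, allowedLater bool) {
	start := time.Unix(0, 0)
	b := newBucket(100, 200, start)
	for i := 0; i < 250; i++ {
		if b.allow(start) {
			allowedInBurst++
		}
	}
	return allowedInBurst, b.allow(start.Add(time.Second))
}

func main() {
	n, later := burstDemo()
	fmt.Println("allowed in initial burst:", n)
	fmt.Println("allowed one second later:", later)
}
```

With rate 100 and burst 200, the first 200 simultaneous requests pass, the next 50 are rejected, and capacity returns as time elapses.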

YAML Configuration File

Point jay at a YAML file via --config-file path/to/config.yml or JAY_CONFIG_FILE=path/to/config.yml:

# config.yml
data_dir: ./data
listen_addr: ":9000"
admin_addr: ":9001"
native_addr: ":4444"

# Secrets can reference env vars via ${VAR} interpolation.
# This lets you commit config.yml to git while keeping secrets in .env.
admin_token: ${JAY_ADMIN_TOKEN}
signing_secret: ${JAY_SIGNING_SECRET}

log_level: info
rate_limit: 100
rate_burst: 200
trust_proxy_headers: false

# Optional ${VAR:-default} syntax provides a fallback.
tls_cert: ${JAY_TLS_CERT:-}
tls_key: ${JAY_TLS_KEY:-}

scrub:
  interval_hours: 6
  sample_rate: 0.1
  bytes_per_sec: 52428800
  max_per_run: 100

seed_token:
  account: ${JAY_SEED_TOKEN_ACCOUNT:-}
  id: ${JAY_SEED_TOKEN_ID:-}
  secret: ${JAY_SEED_TOKEN_SECRET:-}

Rules:

  • Precedence: env var > YAML > hardcoded default. A conflict (both set to different values) logs WARN at startup but doesn't fail.
  • Interpolation: ${VAR} and ${VAR:-default} are resolved against os.Getenv, on string values only. If the variable is unset and no default is given, the value resolves to an empty string; for admin_token or signing_secret this then triggers the usual secret-length check and Jay fails fast.
  • Mixing sources: perfectly fine to put non-sensitive config in YAML and keep secrets in env vars — interpolation is the bridge.
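
The interpolation rule can be sketched as a small Go function. This is an illustration of the ${VAR} and ${VAR:-default} behaviour described above, not Jay's actual code; the interpolate helper and the varRef pattern are hypothetical. The getenv parameter has the same shape as os.Getenv, so an unset variable and an empty one are treated alike.

```go
package main

import (
	"fmt"
	"regexp"
)

// varRef matches ${VAR} and ${VAR:-default}; group 1 is the name, group 2 the default.
var varRef = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}`)

// interpolate is a hypothetical helper. Each reference is replaced by the
// variable's value, else by its default, else by the empty string.
func interpolate(s string, getenv func(string) string) string {
	return varRef.ReplaceAllStringFunc(s, func(ref string) string {
		m := varRef.FindStringSubmatch(ref)
		if v := getenv(m[1]); v != "" {
			return v
		}
		return m[2] // empty when no default was given
	})
}

func main() {
	// A fixed map stands in for the process environment; pass os.Getenv in real use.
	env := map[string]string{"JAY_ADMIN_TOKEN": "example-admin-token"}
	getenv := func(k string) string { return env[k] }

	fmt.Printf("%q\n", interpolate("${JAY_ADMIN_TOKEN}", getenv))
	fmt.Printf("%q\n", interpolate("${JAY_TLS_CERT:-}", getenv))
	fmt.Printf("%q\n", interpolate("${JAY_LOG_LEVEL:-info}", getenv))
}
```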

jay-config CLI

Convert between YAML and .env or validate a YAML config:

go build -o jay-config ./cmd/jay-config

# YAML → .env (writes to stdout if --output omitted)
jay-config yaml-to-env --input config.yml --output .env

# .env → YAML
jay-config env-to-yaml --input .env --output config.yml

# Validate YAML config (checks required secrets, seed-token atomicity, value ranges)
jay-config validate --input config.yml

${VAR} interpolation is preserved literally during conversion — the tool never resolves env vars, only moves keys between formats.

Seed Token

Jay can create an account and token at startup from environment variables, so a fresh deployment doesn't require a manual admin API call before clients can authenticate.

Variables

Variable                  Purpose
JAY_SEED_TOKEN_ACCOUNT    Name of the account to create (e.g. falco)
JAY_SEED_TOKEN_ID         Deterministic token ID used by the client (e.g. falco-native)
JAY_SEED_TOKEN_SECRET     Plaintext secret; bcrypt-hashed before storage

Rules

  • All three set → Jay creates the account (idempotent by name) and the token (idempotent by ID) with wildcard actions ("*"). The same credentials work with both the S3 API and the native protocol.
  • All three empty → seed is skipped. Use this mode if you prefer to bootstrap tokens via the admin API.
  • One or two set → Jay refuses to start with an error. This prevents partially-configured deployments.
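
The all-or-nothing rule above can be sketched as a single validation step. The seedConfig type, the seedMode function, and the error text are hypothetical names chosen for illustration; Jay's own startup code and messages may look different.

```go
package main

import (
	"errors"
	"fmt"
)

// seedConfig mirrors the three seed variables; the type itself is hypothetical.
type seedConfig struct {
	Account, ID, Secret string
}

var errPartialSeed = errors.New("seed token: set all three JAY_SEED_TOKEN_* variables or none")

// seedMode reports whether seeding should run, and returns an error when
// only one or two of the three values are set.
func seedMode(c seedConfig) (enabled bool, err error) {
	set := 0
	for _, v := range []string{c.Account, c.ID, c.Secret} {
		if v != "" {
			set++
		}
	}
	switch set {
	case 0:
		return false, nil // seed skipped; bootstrap via the admin API instead
	case 3:
		return true, nil
	default:
		return false, errPartialSeed
	}
}

func main() {
	configs := []seedConfig{
		{},
		{Account: "myapp", ID: "myapp-primary", Secret: "example-secret"},
		{Account: "myapp"},
	}
	for _, c := range configs {
		enabled, err := seedMode(c)
		fmt.Printf("enabled=%v err=%v\n", enabled, err)
	}
}
```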

Idempotence and rotation

On every restart, Jay:

  • Looks up the account by name; if it exists, reuses its ID.
  • Looks up the token by ID. If the stored bcrypt hash matches JAY_SEED_TOKEN_SECRET, Jay logs seed: token already present, reusing and moves on.
  • If the hash does NOT match, Jay logs a WARN (seed: token exists but secret does not match env; refusing to overwrite) and keeps the old secret. Jay never silently overwrites a token.

To rotate the seed secret, do one of the following:

  1. Change JAY_SEED_TOKEN_ID to a new value. The old token stays active until you revoke it manually.
  2. Revoke the old token through the admin API (DELETE /_jay/tokens/{id}), then set the new JAY_SEED_TOKEN_SECRET and restart.

Example

export JAY_SEED_TOKEN_ACCOUNT=myapp
export JAY_SEED_TOKEN_ID=myapp-primary
export JAY_SEED_TOKEN_SECRET=$(openssl rand -base64 32)
./jay

On first boot you'll see:

seed: account created name=myapp account_id=...
seed: token created token_id=myapp-primary

On second boot:

seed: account exists, reusing
seed: token already present, reusing

Authentication

Create an Account and Token

# Create account
curl -X POST http://localhost:9001/_jay/accounts \
  -H "Authorization: Bearer $JAY_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "myapp"}'
# Returns: {"account_id": "...", "name": "myapp", ...}

# Create token
curl -X POST http://localhost:9001/_jay/tokens \
  -H "Authorization: Bearer $JAY_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"account_id": "ACCOUNT_ID", "name": "deploy-token"}'
# Returns: {"token_id": "...", "secret": "..."}

Using Tokens

Bearer token:

Authorization: Bearer <token_id>:<secret>
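
From a Go program that talks plain HTTP, the header above can be attached as follows. This is a minimal sketch: newS3Request is a hypothetical helper, the URL, bucket, and key are placeholders, and it assumes the bearer scheme is accepted on the S3 port.

```go
package main

import (
	"fmt"
	"net/http"
)

// newS3Request builds a request carrying "Bearer <token_id>:<secret>".
// The helper name is illustrative and not part of Jay.
func newS3Request(method, url, tokenID, secret string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+tokenID+":"+secret)
	return req, nil
}

func main() {
	req, err := newS3Request(http.MethodGet,
		"http://localhost:9000/mybucket/hello.txt", "TOKEN_ID", "SECRET")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Pass req to http.DefaultClient.Do to send it; here only the header is shown.
	fmt.Println(req.Header.Get("Authorization"))
}
```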

AWS SigV4 (simplified): Use token_id as the access key and any value as the secret key. Jay validates the token exists but does not verify the HMAC signature.

Token Scoping

Tokens can be scoped to specific actions, buckets, and key prefixes:

{
  "account_id": "...",
  "name": "readonly",
  "allowed_actions": ["object:get", "object:list"],
  "bucket_scope": ["public-assets"],
  "prefix_scope": ["images/"]
}

Available actions: bucket:list, bucket:read-meta, bucket:write-meta, object:get, object:put, object:delete, object:list, multipart:create, multipart:upload-part, multipart:complete, multipart:abort.
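
One way to picture how the three scope fields combine is the check below. It is a sketch under two assumptions that the text above does not confirm: a request must satisfy all three dimensions, and an empty list leaves that dimension unrestricted. The tokenScope type and permits method are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenScope mirrors the JSON fields shown above; the Go type is hypothetical.
type tokenScope struct {
	AllowedActions []string
	BucketScope    []string
	PrefixScope    []string
}

// matchAny reports whether any list entry satisfies ok.
// Assumption: an empty list means "no restriction".
func matchAny(list []string, ok func(string) bool) bool {
	if len(list) == 0 {
		return true
	}
	for _, item := range list {
		if ok(item) {
			return true
		}
	}
	return false
}

// permits requires the action, bucket, and key to each fall within scope.
func (s tokenScope) permits(action, bucket, key string) bool {
	return matchAny(s.AllowedActions, func(a string) bool { return a == "*" || a == action }) &&
		matchAny(s.BucketScope, func(b string) bool { return b == bucket }) &&
		matchAny(s.PrefixScope, func(p string) bool { return strings.HasPrefix(key, p) })
}

func main() {
	readonly := tokenScope{
		AllowedActions: []string{"object:get", "object:list"},
		BucketScope:    []string{"public-assets"},
		PrefixScope:    []string{"images/"},
	}
	fmt.Println(readonly.permits("object:get", "public-assets", "images/logo.png")) // true
	fmt.Println(readonly.permits("object:put", "public-assets", "images/logo.png")) // false
	fmt.Println(readonly.permits("object:get", "public-assets", "docs/readme.txt")) // false
}
```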

S3 API

Supported Operations

Operation                  Method    Path
ListBuckets                GET       /
CreateBucket               PUT       /<bucket>
HeadBucket                 HEAD      /<bucket>
DeleteBucket               DELETE    /<bucket>
ListObjectsV2              GET       /<bucket>?list-type=2
PutObject                  PUT       /<bucket>/<key>
GetObject                  GET       /<bucket>/<key>
HeadObject                 HEAD      /<bucket>/<key>
DeleteObject               DELETE    /<bucket>/<key>
CopyObject                 PUT       /<bucket>/<key> (with x-amz-copy-source header)
CreateMultipartUpload      POST      /<bucket>/<key>?uploads
UploadPart                 PUT       /<bucket>/<key>?uploadId=X&partNumber=N
CompleteMultipartUpload    POST      /<bucket>/<key>?uploadId=X
AbortMultipartUpload       DELETE    /<bucket>/<key>?uploadId=X
ListParts                  GET       /<bucket>/<key>?uploadId=X
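
GetObject accepts Range headers in the three forms listed under Features: closed (bytes=0-499), suffix (bytes=-500), and open-ended (bytes=500-). The sketch below shows how such a header maps to inclusive byte offsets for an object of known size, following standard HTTP range semantics. The resolveRange function is hypothetical, handles a single range only, and is not taken from Jay's source.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var errBadRange = errors.New("invalid or unsatisfiable range")

// resolveRange converts a single-range header into inclusive offsets
// within an object of the given size.
func resolveRange(header string, size int64) (start, end int64, err error) {
	if !strings.HasPrefix(header, "bytes=") || size <= 0 {
		return 0, 0, errBadRange
	}
	first, last, ok := strings.Cut(strings.TrimPrefix(header, "bytes="), "-")
	if !ok {
		return 0, 0, errBadRange
	}
	if first == "" { // suffix form: the last N bytes
		n, perr := strconv.ParseInt(last, 10, 64)
		if perr != nil || n <= 0 {
			return 0, 0, errBadRange
		}
		if n > size {
			n = size
		}
		return size - n, size - 1, nil
	}
	start, err = strconv.ParseInt(first, 10, 64)
	if err != nil || start < 0 || start >= size {
		return 0, 0, errBadRange
	}
	if last == "" { // open-ended form: from start to the end of the object
		return start, size - 1, nil
	}
	end, err = strconv.ParseInt(last, 10, 64)
	if err != nil || end < start {
		return 0, 0, errBadRange
	}
	if end >= size { // clamp to the last byte
		end = size - 1
	}
	return start, end, nil
}

func main() {
	for _, h := range []string{"bytes=0-499", "bytes=-200", "bytes=900-", "bytes=1000-"} {
		start, end, err := resolveRange(h, 1000)
		fmt.Println(h, start, end, err)
	}
}
```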

AWS CLI Usage

# Configure AWS CLI
aws configure set aws_access_key_id <token_id>
aws configure set aws_secret_access_key <any-value>
aws configure set default.region us-east-1

# Basic operations
aws --endpoint-url http://localhost:9000 s3 mb s3://mybucket
aws --endpoint-url http://localhost:9000 s3 cp file.txt s3://mybucket/
aws --endpoint-url http://localhost:9000 s3 ls s3://mybucket/
aws --endpoint-url http://localhost:9000 s3 cp s3://mybucket/file.txt ./downloaded.txt
aws --endpoint-url http://localhost:9000 s3 sync ./local-dir s3://mybucket/prefix/

Native Protocol

Jay's native binary protocol uses a compact frame format for high-throughput scenarios.

Go Client

import (
    "io"
    "log"
    "strings"

    "github.com/ivangsm/jay/proto/client"
)

// Connect
c, err := client.Dial("localhost:4444", tokenID, secret, 4)
if err != nil {
    log.Fatal(err)
}
defer c.Close()

// Create bucket
if _, err = c.CreateBucket("mybucket"); err != nil {
    log.Fatal(err)
}

// Upload object
result, err := c.PutObject("mybucket", "hello.txt",
    strings.NewReader("hello world"), 11, nil)
if err != nil {
    log.Fatal(err)
}

// Download object; close the body once it has been read
obj, err := c.GetObject("mybucket", "hello.txt")
if err != nil {
    log.Fatal(err)
}
data, err := io.ReadAll(obj.Body)
obj.Body.Close()
if err != nil {
    log.Fatal(err)
}

// Multipart upload: create, upload each part, then complete with the part ETags
uploadID, err := c.CreateMultipartUpload("mybucket", "large.bin", nil)
if err != nil {
    log.Fatal(err)
}
etag1, err := c.UploadPart("mybucket", "large.bin", uploadID, 1, part1Reader, part1Size)
if err != nil {
    log.Fatal(err)
}
etag2, err := c.UploadPart("mybucket", "large.bin", uploadID, 2, part2Reader, part2Size)
if err != nil {
    log.Fatal(err)
}
c.CompleteMultipartUpload("mybucket", "large.bin", uploadID, []client.CompletePart{
    {PartNumber: 1, ETag: etag1},
    {PartNumber: 2, ETag: etag2},
})

// List objects under a prefix
list, err := c.ListObjects("mybucket", &client.ListOptions{Prefix: "photos/"})
if err != nil {
    log.Fatal(err)
}

Admin API

All endpoints require Authorization: Bearer <JAY_ADMIN_TOKEN>.

Endpoint                       Method    Description
/_jay/accounts                 POST      Create account
/_jay/tokens                   POST      Create token
/_jay/tokens                   GET       List tokens
/_jay/tokens/{id}              DELETE    Revoke token
/_jay/metrics                  GET       Server metrics
/_jay/presign                  POST      Generate presigned URL
/_jay/quarantine               GET       List quarantined objects
/_jay/quarantine/revalidate    POST      Revalidate quarantined object
/_jay/quarantine               DELETE    Purge quarantined objects

CLI Admin Tool

go build -o jay-admin ./cmd/jay-admin

export JAY_ADMIN_TOKEN=my-secret-admin-token

jay-admin create-account -name myapp
jay-admin create-token -account ACCOUNT_ID -name deploy
jay-admin list-tokens
jay-admin revoke-token -id TOKEN_ID
jay-admin metrics
jay-admin presign -bucket mybucket -key file.txt -token-id TOKEN_ID
jay-admin quarantine-list
jay-admin quarantine-purge

Presigned URLs

Generate time-limited URLs via the admin API:

curl -X POST http://localhost:9001/_jay/presign \
  -H "Authorization: Bearer $JAY_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "token_id": "TOKEN_ID",
    "method": "GET",
    "bucket": "mybucket",
    "key": "secret-file.txt",
    "expires_seconds": 3600
  }'

The returned URL contains X-Jay-Token, X-Jay-Expires, and X-Jay-Signature query parameters and can be used without any authorization header.
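
To illustrate how an HMAC-SHA256 signature can bind those three parameters to a request, here is a minimal sketch. Only the parameter names come from the text above. The canonical string layout, the hex encoding, the treatment of X-Jay-Expires as a Unix timestamp, and the sign, presign, and verify functions are all assumptions made for the example; Jay's real format may differ.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
	"strconv"
)

// sign computes an HMAC-SHA256 over an assumed canonical string.
func sign(secret []byte, method, path, tokenID string, expires int64) string {
	mac := hmac.New(sha256.New, secret)
	fmt.Fprintf(mac, "%s\n%s\n%s\n%d", method, path, tokenID, expires)
	return hex.EncodeToString(mac.Sum(nil))
}

// presign appends the three query parameters to the object URL.
func presign(base string, secret []byte, method, bucket, key, tokenID string, expires int64) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/" + bucket + "/" + key
	q := url.Values{}
	q.Set("X-Jay-Token", tokenID)
	q.Set("X-Jay-Expires", strconv.FormatInt(expires, 10))
	q.Set("X-Jay-Signature", sign(secret, method, u.Path, tokenID, expires))
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// verify recomputes the signature and rejects expired or altered URLs.
func verify(rawURL string, secret []byte, method string, now int64) bool {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false
	}
	q := u.Query()
	expires, err := strconv.ParseInt(q.Get("X-Jay-Expires"), 10, 64)
	if err != nil || now > expires {
		return false
	}
	want := sign(secret, method, u.Path, q.Get("X-Jay-Token"), expires)
	// hmac.Equal compares in constant time.
	return hmac.Equal([]byte(want), []byte(q.Get("X-Jay-Signature")))
}

func main() {
	secret := []byte("example-signing-secret-32-chars!!")
	signed, err := presign("http://localhost:9000", secret, "GET",
		"mybucket", "secret-file.txt", "TOKEN_ID", 2000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(signed)
	fmt.Println("valid before expiry:", verify(signed, secret, "GET", 1000))
	fmt.Println("valid after expiry: ", verify(signed, secret, "GET", 3000))
}
```

Because the method and path are part of the signed string in this sketch, a URL issued for GET on one key cannot be replayed as a PUT or against another key.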

Bucket Policies

Set JSON policies on buckets to control access by token, prefix, and IP:

{
  "version": "2024-01-01",
  "statements": [
    {
      "effect": "allow",
      "actions": ["object:get", "object:list"],
      "prefixes": ["public/"],
      "subjects": ["*"],
      "conditions": {
        "ip_whitelist": ["10.0.0.0/8"]
      }
    },
    {
      "effect": "deny",
      "actions": ["*"],
      "prefixes": ["secret/"],
      "subjects": ["*"]
    }
  ]
}

Deny statements always take precedence over allow.
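
The deny-overrides rule can be sketched as a single pass over the statements. This is an illustration, not Jay's evaluator: the statement type, the evaluate function, and the three-valued result are hypothetical, subjects are assumed to be token IDs with "*" as a wildcard, and an empty prefix list or IP whitelist is assumed to match everything.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

type decision int

const (
	noMatch decision = iota // no statement applied to the request
	allowed
	denied
)

// statement mirrors the JSON policy fields; the Go type is hypothetical.
type statement struct {
	Effect      string
	Actions     []string
	Prefixes    []string
	Subjects    []string
	IPWhitelist []string // CIDR blocks from conditions.ip_whitelist
}

func inList(list []string, v string) bool {
	for _, item := range list {
		if item == "*" || item == v {
			return true
		}
	}
	return false
}

func underPrefix(prefixes []string, key string) bool {
	for _, p := range prefixes {
		if strings.HasPrefix(key, p) {
			return true
		}
	}
	return len(prefixes) == 0 // assumption: no prefixes means all keys
}

func ipAllowed(cidrs []string, ip netip.Addr) bool {
	for _, c := range cidrs {
		if p, err := netip.ParsePrefix(c); err == nil && p.Contains(ip) {
			return true
		}
	}
	return len(cidrs) == 0 // assumption: no whitelist means no IP condition
}

// evaluate returns denied as soon as any matching deny statement is found;
// otherwise allowed if at least one allow statement matched.
func evaluate(stmts []statement, action, key, subject, ip string) decision {
	addr, _ := netip.ParseAddr(ip) // an unparsable IP never matches a whitelist
	result := noMatch
	for _, s := range stmts {
		if !inList(s.Actions, action) || !inList(s.Subjects, subject) ||
			!underPrefix(s.Prefixes, key) || !ipAllowed(s.IPWhitelist, addr) {
			continue
		}
		if s.Effect == "deny" {
			return denied
		}
		if s.Effect == "allow" {
			result = allowed
		}
	}
	return result
}

func main() {
	policy := []statement{
		{Effect: "allow", Actions: []string{"object:get", "object:list"},
			Prefixes: []string{"public/"}, Subjects: []string{"*"},
			IPWhitelist: []string{"10.0.0.0/8"}},
		{Effect: "deny", Actions: []string{"*"},
			Prefixes: []string{"secret/"}, Subjects: []string{"*"}},
	}
	fmt.Println(evaluate(policy, "object:get", "public/a.png", "tok", "10.1.2.3") == allowed)
	fmt.Println(evaluate(policy, "object:get", "public/a.png", "tok", "192.168.1.1") == noMatch)
	fmt.Println(evaluate(policy, "object:get", "secret/key.pem", "tok", "10.1.2.3") == denied)
}
```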

Monitoring

Health checks (on admin port, no auth required):

  • GET /health/live -- liveness probe (always 200)
  • GET /health/ready -- readiness probe (200 after startup recovery)

Metrics:

curl http://localhost:9001/_jay/metrics \
  -H "Authorization: Bearer $JAY_ADMIN_TOKEN"

Returns JSON with counters for PutObject, GetObject, DeleteObject, HeadObject, ListObjects, CreateBucket, DeleteBucket, AuthFailures, ChecksumFailures, BytesUploaded, BytesDownloaded, ObjectsQuarantined, and UptimeSeconds.

Architecture

  • Metadata: bbolt embedded key-value store (single-file, ACID)
  • Object storage: Atomic writes (temp file, fsync, rename, fsync dir) with 2-level sharded directory layout
  • Checksums: SHA-256 computed on every write, verified probabilistically on reads (5% sample)
  • Scrubber: Background goroutine checks 10% of objects every 6 hours
  • GC: Cleans temp files and empty dirs every 15 minutes
  • Backup: Hourly metadata snapshots, keeps 24, prunes after 7 days
  • Recovery: On startup, reconciles metadata and physical files, quarantines inconsistencies
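
The atomic write sequence listed above (temp file, fsync, rename, fsync dir), combined with a SHA-256 computed during the write, can be sketched as follows. This is a generic illustration of the technique on a POSIX filesystem, not Jay's storage code; atomicWrite and writeDemo are hypothetical names, and the sharded directory layout is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// atomicWrite streams r into dir/name so that readers see either the old
// file or the complete new one. It returns the hex SHA-256 of the data.
func atomicWrite(dir, name string, r io.Reader) (sum string, err error) {
	tmp, err := os.CreateTemp(dir, ".tmp-*")
	if err != nil {
		return "", err
	}
	defer func() {
		if err != nil { // best-effort cleanup of the temp file on failure
			tmp.Close()
			os.Remove(tmp.Name())
		}
	}()

	h := sha256.New()
	// Hash the bytes while writing them, so the data is read only once.
	if _, err = io.Copy(io.MultiWriter(tmp, h), r); err != nil {
		return "", err
	}
	if err = tmp.Sync(); err != nil { // flush file contents to disk
		return "", err
	}
	if err = tmp.Close(); err != nil {
		return "", err
	}
	if err = os.Rename(tmp.Name(), filepath.Join(dir, name)); err != nil {
		return "", err
	}
	d, err := os.Open(dir)
	if err != nil {
		return "", err
	}
	defer d.Close()
	if err = d.Sync(); err != nil { // persist the rename itself
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// writeDemo writes "hello world" into a throwaway directory.
func writeDemo() (string, error) {
	dir, err := os.MkdirTemp("", "jay-sketch-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	return atomicWrite(dir, "hello.txt", strings.NewReader("hello world"))
}

func main() {
	sum, err := writeDemo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("sha256:", sum)
}
```

The temp file is created in the destination directory so that the rename stays on one filesystem, which is what makes it atomic.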
