Hazelcast-inspired distributed data grid playground in Rust.
Library-first project with a runnable example: book_search_engine.
The goal is to experiment with cluster membership, partitioned storage, replication, distributed task execution, and full-text search.
Experimental / for fun. APIs and internals may change quickly.
I wanted to build a small distributed systems playground in Rust:
- SWIM-like gossip membership over UDP
- partitioned in-memory maps with replication
- distributed task queue/executor
- simple book ingestion + search demo
If you're into distributed systems and Rust, this repo is for you.
- Cluster membership service (membership)
- Partition manager with replication factor (storage)
- Distributed key-value maps (storage)
- Distributed task queue + worker execution (executor)
- Gutenberg ingestion pipeline (ingestion)
- Token-based search with score ranking (search)
- Health and debug HTTP endpoints
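To make the "partition manager with replication factor" idea concrete, here is a minimal sketch of one common way to place keys: hash the key to a partition, then pick the owners by walking the member list. This is not taken from the olive_tree source. The names `PARTITION_COUNT`, `partition_for`, and `owners_for` are hypothetical, and the value 271 is simply borrowed from Hazelcast's default partition count.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical partition count (271 is Hazelcast's default; an assumption here).
const PARTITION_COUNT: u64 = 271;

/// Map a key to a partition by hashing it.
fn partition_for(key: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish() % PARTITION_COUNT
}

/// Pick the primary plus backups for a partition by walking the member list.
/// Never returns more owners than there are members.
fn owners_for<'a>(partition: u64, members: &[&'a str], replication_factor: usize) -> Vec<&'a str> {
    let copies = replication_factor.min(members.len());
    (0..copies)
        .map(|i| members[(partition as usize + i) % members.len()])
        .collect()
}

fn main() {
    let members = ["127.0.0.1:5000", "127.0.0.1:5001", "127.0.0.1:5002"];
    let partition = partition_for("book:1342");
    let owners = owners_for(partition, &members, 2);
    println!("partition {partition} -> owners {owners:?}");
}
```

With a replication factor of 2 and three members, each partition gets one primary and one backup, so losing a single node should not lose data in this simplified model.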
Core library modules:
- olive_tree::membership
- olive_tree::storage
- olive_tree::executor
- olive_tree::ingestion
- olive_tree::search
Runnable example:
examples/book_search_engine.rs
Networking model:
- --bind <ip:port> is the UDP gossip address
- HTTP API listens on bind_port + 1000
Example:
- gossip 127.0.0.1:5000 => HTTP 127.0.0.1:6000
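The port rule above can be sketched as a small helper. The function name `http_addr_for` and the constant `HTTP_PORT_OFFSET` are made up for illustration, and the overflow handling is an assumption: the README does not say what happens when the gossip port is above 64535.

```rust
use std::net::SocketAddr;

/// Offset between the gossip port and the HTTP port (bind_port + 1000).
const HTTP_PORT_OFFSET: u16 = 1000;

/// Derive the HTTP address from the gossip bind address.
/// Returns None if adding the offset would overflow the u16 port range.
fn http_addr_for(bind: SocketAddr) -> Option<SocketAddr> {
    let port = bind.port().checked_add(HTTP_PORT_OFFSET)?;
    Some(SocketAddr::new(bind.ip(), port))
}

fn main() {
    let bind: SocketAddr = "127.0.0.1:5000".parse().expect("valid ip:port");
    match http_addr_for(bind) {
        Some(http) => println!("gossip {bind} => HTTP {http}"),
        None => println!("gossip port {} is too high for the HTTP offset", bind.port()),
    }
}
```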
- Rust toolchain (stable)
- cargo
- curl
- optional: jq
Terminal 1:
cargo run --example book_search_engine -- --bind 127.0.0.1:5000
Terminal 2:
cargo run --example book_search_engine -- --bind 127.0.0.1:5001 --seed 127.0.0.1:5000
Terminal 3:
cargo run --example book_search_engine -- --bind 127.0.0.1:5002 --seed 127.0.0.1:5000

- Check routes:
curl -s http://127.0.0.1:6000/health/routes | jq .
- Ingest a Gutenberg book (example: 1342):
curl -s -X POST http://127.0.0.1:6000/ingest/1342 | jq .
- Check ingestion status:
curl -s http://127.0.0.1:6000/ingest/status/1342 | jq .
- Search from another node:
curl -s "http://127.0.0.1:6001/search?q=darcy&limit=5" | jq .
- Inspect node stats:
curl -s http://127.0.0.1:6002/health/stats | jq .

Note: indexing is async through the distributed queue, so search results can appear with a short delay.
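The delay between ingestion and searchability can be illustrated with a toy, single-process model: ingestion only enqueues a task, and a worker builds the index later. Everything here (`IndexTask`, `Node`, `run_pending`) is hypothetical and only stands in for the distributed queue; the real executor runs across nodes.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical indexing task: a book id plus its raw text.
struct IndexTask {
    book_id: u32,
    text: String,
}

/// Toy single-process stand-in for the distributed queue and the index.
#[derive(Default)]
struct Node {
    queue: VecDeque<IndexTask>,
    index: HashMap<String, HashSet<u32>>, // token -> book ids
}

impl Node {
    /// Ingestion only enqueues work; nothing is searchable yet.
    fn ingest(&mut self, book_id: u32, text: &str) {
        self.queue.push_back(IndexTask { book_id, text: text.to_string() });
    }

    /// A worker drains the queue later and builds the index.
    fn run_pending(&mut self) {
        while let Some(task) = self.queue.pop_front() {
            for token in task.text.split_whitespace() {
                self.index.entry(token.to_lowercase()).or_default().insert(task.book_id);
            }
        }
    }

    /// Return the sorted ids of books containing the token.
    fn search(&self, token: &str) -> Vec<u32> {
        let mut hits: Vec<u32> = self
            .index
            .get(&token.to_lowercase())
            .map(|ids| ids.iter().copied().collect())
            .unwrap_or_default();
        hits.sort_unstable();
        hits
    }
}

fn main() {
    let mut node = Node::default();
    node.ingest(1342, "Mr Darcy soon drew the attention of the room");
    println!("before worker runs: {:?}", node.search("darcy"));
    node.run_pending();
    println!("after worker runs:  {:?}", node.search("darcy"));
}
```

In this model a search issued right after the ingest call finds nothing until the worker has processed the task, which is the same effect as the short delay noted above.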
- GET /health/routes
- GET /health/stats
- POST /ingest/:book_id
- GET /ingest/status/:book_id
- GET /search?q=<query>&limit=<n>&offset=<n>
- POST /books
- Internal replication/debug endpoints under:
  - /books/*
  - /datalake/*
  - /index/*
  - /task/*
  - /internal/*
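As a rough illustration of how the q, limit, and offset parameters of GET /search could interact with token-based ranking, here is a minimal sketch. The scoring rule (count of matching tokens) and the function names are assumptions for illustration; the actual ranking in olive_tree::search may differ.

```rust
/// Split text into lowercase alphanumeric tokens.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Score = number of document tokens that match any query token
/// (an assumed, deliberately simple ranking).
fn score(query_tokens: &[String], doc: &str) -> usize {
    tokenize(doc)
        .iter()
        .filter(|t| query_tokens.contains(*t))
        .count()
}

/// Rank documents by score (descending), then apply offset and limit.
fn search(docs: &[(u32, &str)], q: &str, limit: usize, offset: usize) -> Vec<(u32, usize)> {
    let query_tokens = tokenize(q);
    let mut hits: Vec<(u32, usize)> = docs
        .iter()
        .map(|&(id, text)| (id, score(&query_tokens, text)))
        .filter(|(_, s)| *s > 0)
        .collect();
    // Highest score first; ties are broken by id for a stable order.
    hits.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(&b.0)));
    hits.into_iter().skip(offset).take(limit).collect()
}

fn main() {
    let docs = [
        (1, "Darcy met Darcy"),
        (2, "Elizabeth and Darcy"),
        (3, "No match here"),
    ];
    // Each result is (document id, score).
    println!("{:?}", search(&docs, "darcy", 5, 0));
}
```

Applying offset before limit means a client can page through results by keeping limit fixed and increasing offset by limit each time.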
Environment variables:
- REPLICATION_FACTOR (default: 2)
- MAX_BODY_BYTES (default: 20971520, 20 MB)
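Reading these two variables with their documented defaults might look like the sketch below. The `Config` struct and the `parse_or` helper are hypothetical, and falling back to the default on an unparsable value is an assumption; the example binary may instead reject invalid input.

```rust
use std::env;
use std::str::FromStr;

/// Parse an optional raw value, falling back to `default` if it is
/// missing or cannot be parsed (assumed behaviour).
fn parse_or<T: FromStr>(raw: Option<&str>, default: T) -> T {
    raw.and_then(|v| v.trim().parse().ok()).unwrap_or(default)
}

/// Hypothetical config holder for the two documented variables.
struct Config {
    replication_factor: usize,
    max_body_bytes: usize,
}

/// Load the config from the process environment, using the documented defaults.
fn load_config() -> Config {
    Config {
        replication_factor: parse_or(env::var("REPLICATION_FACTOR").ok().as_deref(), 2),
        // 20 * 1024 * 1024 = 20971520 bytes
        max_body_bytes: parse_or(env::var("MAX_BODY_BYTES").ok().as_deref(), 20 * 1024 * 1024),
    }
}

fn main() {
    let cfg = load_config();
    println!(
        "replication_factor={} max_body_bytes={}",
        cfg.replication_factor, cfg.max_body_bytes
    );
}
```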
Example:
REPLICATION_FACTOR=3 MAX_BODY_BYTES=52428800 cargo run --example book_search_engine -- --bind 127.0.0.1:5000

cargo fmt
cargo check
cargo test

- Persistent storage backend
- Better consistency and recovery semantics
- Query improvements (ranking, phrase search, filters)
- Better observability/metrics
- Docker Compose local cluster
- CI + benchmarks
PRs, issues, and architecture discussions are welcome. If you want to contribute, open an issue first for larger changes.
MIT. See LICENSE.