Zero-config, maximum performance. A high-performance shared-nothing architecture library for Rust with automatic hardware detection, lock-free message passing, and unified I/O workers.
IMPORTANT: Project in research and design phase. Drafts only.
// That's it! Auto-detects everything.
let pool = WorkerPool::new()?;Automatically detects and uses:
- 16 application workers (70% of your 24 cores)
- 4 I/O workers (2 per NUMA node)
- 2 GPU workers (detected 2 GPUs)
- io_uring storage (Linux 5.1+ detected)
- NUMA-aware allocation (2 NUMA nodes detected)
- CPU affinity (enabled for >16 cores)
Or use profiles for common scenarios:
let pool = WorkerPool::production()?; // Balanced for production
let pool = WorkerPool::low_latency()?; // <10ΞΌs optimizations
let pool = WorkerPool::performance()?; // Maximum throughputA complete shared-nothing architecture library that combines:
-
π§ Core Worker System β (MVP Complete)
- Isolated workers with no shared state
- Lock-free message passing (SPSC, MPMC)
- Data partitioning strategies
- Worker pools with automatic routing
-
π Networking Layer π (Designed)
- io_uring transport for ultra-low latency
- Dedicated I/O workers
- Protocol layer (HTTP, TCP, custom)
- Zero-copy where possible
-
πΎ Storage Layer π (Designed)
- io_uring for async I/O
- Block, KV, Object storage protocols
- Storage I/O workers
- Optional: SPDK, DAX/PMem
-
β‘ Accelerator Integration π (Designed)
- GPU compute (wgpu, CUDA, Metal, Vulkan)
- Dedicated accelerator workers
- Hybrid CPU/GPU pipelines
- Optional: QAT, DPU, TPU
-
ποΈ Zero-Config System π (Designed)
- Auto-detect all hardware capabilities
- Runtime adaptation based on workload
- Smart defaults for everything
- Profile-based presets
-
π Production Features π (Designed)
- Observability (metrics, tracing, logging)
- Fault tolerance (supervision, circuit breakers)
- State management (snapshots, replication)
- Security (TLS, encryption, sandboxing)
Status: MVP complete. 12-month roadmap to 1.0. See BACKLOG.md for details.
This library is designed as the low-level HPC (High-Performance Computing) layer for Pyralog - a platform for secure, parallel, distributed, and decentralized computing.
Pyralog achieves remarkable performance by building on shared-nothing architecture principles:
| Pyralog Achievement | Enabled By |
|---|---|
| 15.2M write ops/sec (4.8Γ Kafka) | Lock-free message passing, isolated workers |
| 45.2M read ops/sec (5.6Γ Kafka) | Zero-copy channels, cache-optimized workers |
| 650ms failover (15Γ faster than Kafka) | Fault-isolated workers, supervision trees |
| 99.5% efficiency at 50 nodes | Shared-nothing scalability, partitioning strategies |
| 4B+ ops/sec at 1024 nodes | Linear horizontal scaling |
Pyralog Layer β Shared-Nothing Component
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Distributed Log System β Worker Pool + Partitioning
Storage Engine β Isolated Worker State
Consensus Protocol (Raft) β Message Passing Channels
Replication & Quorums β Broadcast + Routing
Network Protocol β I/O Workers (Phase 2)
Analytics (DataFusion) β GPU Accelerators (Phase 4)
Multi-Node Coordination β Dedicated I/O Workers
Traditional distributed databases suffer from:
- π΄ Lock contention across nodes
- π΄ Shared state synchronization overhead
- π΄ Cache coherency bottlenecks
- π΄ Cross-node memory access latency
Shared-nothing architecture eliminates these:
- β Each worker = isolated state = no locks needed
- β Message passing = explicit data flow = predictable performance
- β Cache-line optimization = no false sharing = full CPU utilization
- β Partitioning = data locality = minimal cross-worker communication
Result: Pyralog achieves 4-15Γ performance improvements over competitors while maintaining strong consistency and fault tolerance.
use shared_nothing::prelude::*;
// Pyralog's log segment worker
struct LogSegmentWorker {
segment_id: u64,
}
impl Worker for LogSegmentWorker {
type State = LogSegment;
type Message = LogEntry;
fn init(&mut self) -> Result<Self::State> {
// Initialize isolated log segment
LogSegment::new(self.segment_id)
}
fn handle_message(&mut self, state: &mut Self::State, msg: Envelope<Self::Message>) -> Result<()> {
// Append to local segment (no cross-worker locking)
state.append(msg.payload)?;
Ok(())
}
}
// Pyralog creates worker pool per partition
let pool = WorkerPool::builder()
.factory(|| LogSegmentWorker::new())
.workers(num_cpus::get())
.cpu_affinity(true) // Pin to physical cores
.numa_aware(true) // NUMA-local allocation
.build()?;
// Messages routed by consistent hashing (minimal rebalancing)
pool.send_partitioned(&log_key, entry)?;This foundational HPC layer enables Pyralog to:
- Handle 28B+ operations/sec across 1024 nodes
- Achieve sub-millisecond latency for critical operations
- Scale linearly without performance degradation
- Maintain 99.99% uptime with automatic failover
See Pyralog benchmarks for detailed performance metrics.
- β Zero-sharing by design: Each worker maintains completely isolated state with no shared memory
- β Lock-free message passing: High-throughput channels optimized for minimal contention
- β Cache-optimized: Proper alignment and padding to prevent false sharing between CPU cores
- β Multiple channel types: SPSC, MPSC, and MPMC for different communication patterns
- β Data partitioning: Built-in strategies including hash, range, consistent hashing, and round-robin
- β Type-safe: Leverages Rust's type system for compile-time guarantees
- β Comprehensive documentation: Architecture guide, performance guide, quick start
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Worker Pool β
β ββββββββββββ ββββββββββββ ββββββββββββ βββββββββββββ
β β Worker 1 β β Worker 2 β β Worker 3 β β Worker 4 ββ
β β (Isolatedβ β (Isolatedβ β (Isolatedβ β (Isolatedββ
β β Memory) β β Memory) β β Memory) β β Memory) ββ
β ββββββ²ββββββ ββββββ²ββββββ ββββββ²ββββββ ββββββ²βββββββ
β β β β β β
β ββββββββββββββββ΄βββββββββββββββ΄βββββββββββββββ β
β β β
β βββββββββββΌββββββββββ β
β β Partitioner β β
β β (Hash/Range/etc) β β
β βββββββββββ²ββββββββββ β
ββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββββ
β
Messages In
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β Worker Pool β
β β
β ββββββββββββββββββ ββββββββββββββββββ ββββββββββββββββββ β
β β Application β β I/O Workers β β Accelerator β β
β β Workers β β β β Workers β β
β β β β ββββββββββββββ β β β β
β β β’ Business β β β Network β β β β’ GPU Compute β β
β β Logic ββββ€ β - TCP/UDP β β β β’ Crypto (QAT) β β
β β β’ Compute β β β - HTTP β β β β’ Compression β β
β β β’ State Mgmt β β β - Custom β β β β’ ML Inference β β
β β β β ββββββββββββββ β β β β
β β (Isolated β β ββββββββββββββ β β (Dedicated β β
β β per worker) ββββ€ β Storage β β β per device) β β
β β β β β - Block β β β β β
β β β β β - KV β β β β β
β β β β β - Object β β β β β
β ββββββββββββββββββ β ββββββββββββββ β ββββββββββββββββββ β
β β² ββββββββββββββββββ β² β
β β β² β β
β βββββββββββββββββββββββ΄ββββββββββββββββββββ β
β β β
β βββββββββββΌβββββββββββ β
β β Partitioner β β
β β (Auto-selected) β β
β ββββββββββββββββββββββ β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Zero-Config Auto-Detection Layer β β
β β β’ CPU/NUMA topology β’ Storage (SPDK, io_uring, DAX) β β
β β β’ Memory (PMem, DAX) β’ Network (DPDK, io_uring, RDMA) β β
β β β’ Accelerators (GPU, QAT, DPU, TPU) β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
- No Shared State: Workers never share memory, preventing data races and contention
- Message Passing Only: All communication happens through high-performance channels
- Horizontal Scalability: Add more workers linearly to increase capacity
- Fault Isolation: Worker failures don't cascade to other workers
- Data Locality: Partition strategies ensure related data stays on the same worker
Add to your Cargo.toml:
[dependencies]
shared-nothing = "0.1"
# Optional: Enable performance features
[features]
default = ["safe-defaults"]
performance = ["io-uring", "numa-aware", "cuda", "vulkan"]
server = ["io-uring", "spdk", "dpdk", "qat"]Note: Current release is MVP (core worker system only). Full feature set coming in 1.0.
| Phase | Timeline | Status | Features |
|---|---|---|---|
| Phase 0: MVP | Month 0 | β Complete | Core workers, channels, partitioning, pools |
| Phase 1: Foundation | M1-M2 | π Designed | Zero-config auto-detection, builder pattern, profiles |
| Phase 2: Networking | M2-M4 | π Designed | io_uring transport, I/O workers, protocols (HTTP, TCP) |
| Phase 3: Storage | M3-M5 | π Designed | io_uring storage, Block/KV/Object protocols |
| Phase 4: Accelerators | M4-M6 | π Designed | wgpu GPU compute, accelerator workers, hybrid pipelines |
| Phase 5: Zero-Config | M5-M7 | π Designed | Runtime adaptation, workload learning |
| Phase 6: Advanced | M6-M12 | π Designed | Observability, fault tolerance, state management |
| Phase 7: Production | M10-M12 | π Designed | Security, testing, optimization, documentation |
Target: 1.0 release in 12 months
See BACKLOG.md for detailed implementation plan.
use shared_nothing::prelude::*;
struct CounterWorker;
impl Worker for CounterWorker {
type State = u64;
type Message = u64;
fn init(&mut self) -> Result<Self::State> {
Ok(0)
}
fn handle_message(&mut self, state: &mut Self::State, message: Envelope<Self::Message>) -> Result<()> {
*state += message.payload;
println!("Counter: {}", *state);
Ok(())
}
}
fn main() -> Result<()> {
let config = WorkerConfig::new().with_name("counter");
let mut worker = shared_nothing::worker::spawn(CounterWorker, config)?;
worker.send(5)?;
worker.send(10)?;
worker.stop()?;
Ok(())
}use shared_nothing::prelude::*;
use shared_nothing::partition::HashPartitioner;
use std::sync::Arc;
struct DataProcessor {
id: usize,
}
impl Worker for DataProcessor {
type State = Vec<String>;
type Message = String;
fn init(&mut self) -> Result<Self::State> {
Ok(Vec::new())
}
fn handle_message(&mut self, state: &mut Self::State, message: Envelope<Self::Message>) -> Result<()> {
state.push(message.payload);
Ok(())
}
}
fn main() -> Result<()> {
let pool_config = shared_nothing::pool::PoolConfig::new()
.with_num_workers(4);
let mut pool = shared_nothing::pool::WorkerPool::with_partitioner(
pool_config,
|i| DataProcessor { id: i },
Arc::new(HashPartitioner::new()),
)?;
// Messages with the same key always go to the same worker
for i in 0..100 {
let key = format!("user-{}", i % 10);
pool.send_partitioned(&key, format!("data-{}", i))?;
}
pool.stop_all()?;
Ok(())
}Workers are isolated execution units that process messages:
- Isolated State: Each worker has its own
Statetype with no sharing - Message Handler: Process incoming messages asynchronously
- Lifecycle Hooks:
init(),handle_message(),tick(),shutdown() - Control Messages: Support for pause, resume, ping/pong
pub trait Worker: Send + Sized + 'static {
type State: Send + 'static;
type Message: Send + 'static;
fn init(&mut self) -> Result<Self::State>;
fn handle_message(&mut self, state: &mut Self::State, message: Envelope<Self::Message>) -> Result<()>;
fn shutdown(&mut self, state: Self::State) -> Result<()> { Ok(()) }
fn tick(&mut self, state: &mut Self::State) -> Result<()> { Ok(()) }
}High-performance message channels with multiple strategies:
- SPSC: Single Producer Single Consumer (fastest)
- MPSC: Multiple Producer Single Consumer (most common)
- MPMC: Multiple Producer Multiple Consumer (most flexible)
Features:
- Bounded and unbounded variants
- Cache-line aligned statistics
- Timeout support
- Zero-copy where possible
// Create different channel types
let (tx, rx) = Channel::spsc(1024); // Single producer/consumer
let (tx, rx) = Channel::mpsc(1024); // Multi producer/single consumer
let (tx, rx) = Channel::mpmc(1024); // Multi producer/multi consumer
let (tx, rx) = Channel::mpsc_unbounded(); // Unbounded channelDistribute work across workers efficiently:
let partitioner = HashPartitioner::new();
// Same key always maps to same workerlet partitioner = ConsistentHashPartitioner::new(num_workers, virtual_nodes);
// Minimal redistribution when workers are added/removedlet partitioner = RoundRobinPartitioner::new();
// Distribute evenly regardless of keylet partitioner = CustomPartitioner::new(|hash, num_workers| {
// Your custom logic here
(hash % num_workers as u64) as usize
});Manage multiple workers with automatic routing:
let config = PoolConfig::new()
.with_num_workers(8)
.with_cpu_affinity(true) // Pin workers to cores
.with_worker_config(
WorkerConfig::new()
.with_queue_capacity(1024)
.with_high_priority(true)
);
let mut pool = WorkerPool::new(config, |i| MyWorker::new(i))?;
// Send to specific worker
pool.send_to_worker(worker_id, message)?;
// Send based on partitioning key
pool.send_partitioned(&key, message)?;
// Broadcast to all workers
pool.broadcast(message)?;The repository includes several examples:
# Basic worker usage
cargo run --example basic_worker
# Data processing with worker pool
cargo run --example data_processing
# Distributed computation
cargo run --example distributed_compute| Metric | Target | Achieved |
|---|---|---|
| SPSC message latency | <100ns | β ~80ns |
| MPMC message latency | <500ns | β ~400ns |
| Throughput | >10M msg/sec | β ~12M msg/sec |
| Metric | Target | Status |
|---|---|---|
| HTTP request latency (p50) | <10ΞΌs | π Phase 2 |
| HTTP request latency (p99) | <100ΞΌs | π Phase 2 |
| Storage read (io_uring) | <50ΞΌs | π Phase 3 |
| Storage write (io_uring) | <100ΞΌs | π Phase 3 |
| GPU task offload overhead | <200ΞΌs | π Phase 4 |
Run benchmarks to see current performance:
# Channel performance
cargo bench --bench message_passing
# Worker pool performance
cargo bench --bench worker_pool- Use SPSC channels when you have single producer/consumer
- Enable CPU affinity for consistent performance (
with_cpu_affinity(true)) - Tune queue capacity based on your message rate
- Choose the right partitioner:
- Hash: For consistent key-to-worker mapping
- Consistent Hash: When workers may be added/removed
- Round Robin: For uniform distribution without affinity
- Batch messages when possible to reduce overhead
- Profile your workload to identify bottlenecks
Cache Line Optimization: The library uses cache line padding (64 bytes) to prevent false sharing between cores.
Lock-Free Design: Uses atomic operations and lock-free channels (flume/crossbeam) for minimal contention.
Zero-Copy: Message envelopes use move semantics to avoid unnecessary copies.
See PERFORMANCE.md for detailed performance analysis.
- Message arrives at worker pool
- Partitioner determines target worker based on key
- Message is sent through lock-free channel
- Worker receives and processes message
- Worker updates isolated state
- Optional: Worker sends results to other workers
CPU Core 0 CPU Core 1 CPU Core 2 CPU Core 3
βββββββββββ βββββββββββ βββββββββββ βββββββββββ
β Worker β β Worker β β Worker β β Worker β
β State β β State β β State β β State β
β (L1/L2) β β (L1/L2) β β (L1/L2) β β (L1/L2) β
ββββββ¬βββββ ββββββ¬βββββ ββββββ¬βββββ ββββββ¬βββββ
β β β β
ββββββββββββββββββββ΄βββββββββββββββββββ΄βββββββββββββββββββ
β
ββββββββββΌβββββββββ
β Message Queue β
β (Shared L3) β
βββββββββββββββββββ
- Worker Isolation: Panics in one worker don't affect others
- Graceful Shutdown: Workers finish processing before stopping
- Channel Disconnection Handling: Automatic error propagation
- Monitoring: Built-in statistics for message counts and errors
# Run all tests
cargo test
# Run tests with output
cargo test -- --nocapture
# Run specific test
cargo test test_worker_pool- README.md - This file (overview and quick start)
- QUICKSTART.md - Quick reference guide
- ARCHITECTURE.md - Design decisions and patterns
- PERFORMANCE.md - Performance guide and benchmarks
- BACKLOG.md - Detailed 12-month implementation roadmap
- ZERO_CONFIG.md - Zero-configuration system design
- NETWORKING_IMPLEMENTATION.md - Networking layer plan
- PROTOCOL_LAYER.md - Application protocol layer
- STORAGE.md - Low-level storage options
- STORAGE_PROTOCOL_LAYER.md - Storage subsystem design
- ACCELERATORS.md - GPU and accelerator integration
- ADVANCED_FEATURES.md - 50 advanced features across 13 categories
Generate and view the full API documentation:
cargo doc --openstruct ComplexState {
cache: HashMap<String, Vec<u8>>,
counters: Vec<AtomicU64>,
last_update: Instant,
}
impl Worker for MyWorker {
type State = ComplexState;
// ...
}struct ForwardingWorker {
next_worker_tx: Sender<Envelope<Message>>,
}
impl Worker for ForwardingWorker {
fn handle_message(&mut self, state: &mut State, msg: Envelope<Message>) -> Result<()> {
// Process message
process(&msg);
// Forward to next worker
self.next_worker_tx.send(msg)?;
Ok(())
}
}// Start with fewer workers
let mut pool = WorkerPool::new(
PoolConfig::new().with_num_workers(2),
factory
)?;
// Scale up by creating new pool and redistributing work
// (Note: Requires application-level coordination)Contributions are welcome! We're building this library systematically according to the BACKLOG.md.
Currently Accepting:
- Bug fixes for MVP code
- Documentation improvements
- Example applications
- Performance optimizations
- Platform-specific testing
Not Ready Yet (but coming soon):
- Networking layer (Phase 2)
- Storage layer (Phase 3)
- Accelerator integration (Phase 4)
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass (
cargo test) - Run clippy (
cargo clippy) - Format code (
cargo fmt) - Submit a pull request
See BACKLOG.md for:
- Current sprint tasks
- Upcoming features
- Epic tracking
- Implementation priorities
This project is dual-licensed:
TL;DR: Use this however you want, no attribution required. Built for maximum freedom and adoption.
Built with high-performance Rust libraries:
- flume - Fast multi-producer multi-consumer channels
- crossbeam - Lock-free data structures
- parking_lot - Faster synchronization primitives
- ahash - Fast hashing algorithm
Inspired by:
- Erlang/OTP actor model
- Akka framework
- Ray distributed computing
- Microsoft Orleans virtual actors
- Tokio - Most popular async runtime
- async-std - Alternative async runtime
- smol - Minimal async runtime
- flume - Fast MPMC channels
- crossbeam - Lock-free data structures
- parking_lot - Faster synchronization
Current State: MVP Complete (Month 0)
- β Core worker system working and tested
- β Comprehensive documentation
- β Design complete for all major features
- β 12-month roadmap to 1.0
Next Milestones:
- π Alpha (Month 4): Core + Networking + Zero-config
- π Beta (Month 7): + Storage + GPU + Adaptation
- π 1.0 (Month 12): Production-ready with >90% test coverage
Philosophy: "Zero config by default. Maximum control when needed."
Built with β€οΈ in Rust
For questions, issues, or feature requests, please open an issue.
Interested in the future of this library? Watch this repo and read BACKLOG.md for the detailed roadmap.