Feature/cluster #76

Merged
jamals86 merged 17 commits into main from feature/cluster
Jan 8, 2026

@jamals86 (Collaborator) commented Jan 8, 2026

No description provided.

Introduces kalamdb-raft crate and integrates Raft-based clustering configuration into ServerConfig. Adds a CommandExecutor abstraction to AppContext for supporting both standalone and Raft-based command execution. Updates dependencies and workspace members to include Raft and gRPC-related crates.
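
To illustrate the executor abstraction described in this commit, here is a minimal sketch of how a single trait could hide the difference between standalone and Raft-backed execution. Only the names `CommandExecutor` and `AppContext` come from the commit message; `Command`, `Outcome`, `ExecError`, `StandaloneExecutor`, `RaftExecutorSketch`, and every method signature are assumptions made for illustration. The real trait is likely async and richer than this.

```rust
/// Hypothetical command; the PR text does not show the real command types.
#[derive(Debug, Clone, PartialEq)]
pub enum Command {
    CreateNamespace(String),
}

/// Where a command ended up (illustrative only).
#[derive(Debug, PartialEq)]
pub enum Outcome {
    AppliedLocally,
    ProposedToRaft { log_index: u64 },
}

/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum ExecError {
    NotLeader { leader_id: Option<u64> },
}

/// One interface for both execution modes (signature is an assumption).
pub trait CommandExecutor {
    fn execute(&mut self, cmd: Command) -> Result<Outcome, ExecError>;
}

/// Standalone mode: apply the command directly on this node.
#[derive(Default)]
pub struct StandaloneExecutor {
    pub applied: Vec<Command>,
}

impl CommandExecutor for StandaloneExecutor {
    fn execute(&mut self, cmd: Command) -> Result<Outcome, ExecError> {
        self.applied.push(cmd);
        Ok(Outcome::AppliedLocally)
    }
}

/// Cluster mode stand-in: only a leader may append to the replicated log.
pub struct RaftExecutorSketch {
    pub is_leader: bool,
    pub leader_id: Option<u64>,
    pub log: Vec<Command>,
}

impl CommandExecutor for RaftExecutorSketch {
    fn execute(&mut self, cmd: Command) -> Result<Outcome, ExecError> {
        if !self.is_leader {
            return Err(ExecError::NotLeader { leader_id: self.leader_id });
        }
        self.log.push(cmd);
        Ok(Outcome::ProposedToRaft { log_index: self.log.len() as u64 })
    }
}

/// The context holds a trait object, so callers never branch on the mode.
pub struct AppContext {
    executor: Box<dyn CommandExecutor>,
}

impl AppContext {
    pub fn new(executor: Box<dyn CommandExecutor>) -> Self {
        Self { executor }
    }

    pub fn run(&mut self, cmd: Command) -> Result<Outcome, ExecError> {
        self.executor.execute(cmd)
    }
}
```

With this shape, SQL handlers would call `AppContext::run` and stay unaware of whether the command is applied locally or proposed to a Raft group.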
Configured static linking for libstdc++ and libgcc on Windows targets to avoid DLL dependencies, updating both Cross.toml and Dockerfile.cross-windows. Updated backend Dockerfile to use Debian bookworm for both build and runtime stages to ensure GLIBC compatibility.
Introduces Raft-based clustering to KalamDB, including new configuration options for cluster nodes, peer discovery, and sharding. Adds RaftManager and RaftExecutor for orchestrating Raft groups, implements provider-backed SystemApplier for metadata replication, and updates command execution to route through Raft in cluster mode. Also includes Docker Compose setup for multi-node clusters, new integration tests, and updates to system providers and configuration types to support cluster operation.
After initializing the cluster, the RaftManager now adds peer nodes defined in the configuration to the cluster and logs the process. Also updated the Raft replication spec tasks with new TODOs and observations regarding cluster behavior and error handling.
Refactors backend Dockerfile to optimize build caching by splitting dependency and source layers, adds dummy files for dependency resolution, and updates Dockerfile.prebuilt for local binary builds. Enhances cluster compose and scripts to use a shared storage volume for SHARED tables, improves cluster test automation with table/namespace/job wait helpers, and updates documentation to reflect these changes.
… info reporting

Test utilities now use environment variables for server URL, root password, and storage directory, enabling more flexible test configuration. Cluster info reporting in RaftExecutor is improved to use live OpenRaft metrics for accurate node roles and statuses. Test code and smoke tests are updated to use the new dynamic configuration, and flush manifest verification now supports both local and system.manifest-based checks. Minor fixes and improvements are made to test assertions, user creation, and file system checks.
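
As a rough sketch of the environment-driven test configuration mentioned above: the variable names (`KALAMDB_SERVER_URL`, `KALAMDB_ROOT_PASSWORD`, `KALAMDB_STORAGE_DIR`), the defaults, and the `TestConfig` type are all hypothetical. The commit message only says that the server URL, root password, and storage directory are read from environment variables.

```rust
use std::env;

/// Hypothetical test settings; field names mirror the three values
/// named in the commit message.
#[derive(Debug, Clone, PartialEq)]
pub struct TestConfig {
    pub server_url: String,
    pub root_password: String,
    pub storage_dir: String,
}

/// Use the provided value unless it is missing or blank.
pub fn value_or_default(value: Option<String>, default: &str) -> String {
    match value {
        Some(v) if !v.trim().is_empty() => v,
        _ => default.to_string(),
    }
}

impl TestConfig {
    /// Read settings from the environment, falling back to local defaults.
    /// Variable names and default values are assumptions for this sketch.
    pub fn from_env() -> Self {
        Self {
            server_url: value_or_default(
                env::var("KALAMDB_SERVER_URL").ok(),
                "http://127.0.0.1:8080",
            ),
            root_password: value_or_default(env::var("KALAMDB_ROOT_PASSWORD").ok(), ""),
            storage_dir: value_or_default(env::var("KALAMDB_STORAGE_DIR").ok(), "./data"),
        }
    }
}
```

Keeping the fallback logic in a pure helper makes it testable without mutating the process environment.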
Moved cluster configuration types to a new module and updated references across crates. Introduced UsersApplier trait and provider implementation for user metadata replication via Raft. Updated SQL handlers and executor logic to route user and table metadata changes through Raft in cluster mode. Added type-safe enums for cluster node roles and status, and improved command/response ergonomics for Raft groups.
NodeId is refactored to use u64 for compatibility with OpenRaft, replacing string-based IDs and related methods. SystemApplier and related interfaces now use strongly-typed IDs (NamespaceId, TableId, StorageId, UserId) and TableType enums. Cluster types now re-export OpenRaft's ServerState and provide conversion helpers. CLI and system providers updated to support cluster_id and improved prompt display. Tests and state machines updated for new ID types and table schemas.
Replaces string-based NodeId construction and usage with u64-based NodeId throughout the codebase. Updates tests, job management, app context, and initialization logic to use numeric node IDs for consistency and improved type safety.
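
A numeric `NodeId` of the kind these two commits describe could be sketched as a newtype over `u64`. Only the name `NodeId` and the `u64` representation come from the commit messages; the methods and trait implementations below are assumptions, not the crate's actual API.

```rust
use std::fmt;
use std::num::ParseIntError;
use std::str::FromStr;

/// Newtype wrapper so node IDs cannot be confused with other integers.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, PartialOrd, Ord)]
pub struct NodeId(u64);

impl NodeId {
    pub const fn new(id: u64) -> Self {
        NodeId(id)
    }

    /// Raw value, e.g. for handing to a Raft library that expects `u64`.
    pub const fn as_u64(self) -> u64 {
        self.0
    }
}

impl From<u64> for NodeId {
    fn from(id: u64) -> Self {
        NodeId(id)
    }
}

impl From<NodeId> for u64 {
    fn from(id: NodeId) -> Self {
        id.0
    }
}

impl fmt::Display for NodeId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

/// Parsing rejects legacy string IDs such as "node-1" instead of guessing.
impl FromStr for NodeId {
    type Err = ParseIntError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        s.trim().parse::<u64>().map(NodeId)
    }
}
```

Under this sketch, configuration values parse with `"7".parse::<NodeId>()`, and old string-style identifiers fail loudly at startup.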
Raised the default heartbeat interval to 250ms and election timeout to 500-1000ms in cluster and Raft manager configs. Updated RaftGroup to accept configuration parameters for timeouts, and modified RaftManager to pass the config to all group start calls. This allows for more flexible and consistent Raft timing configuration across the system.
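
The timing defaults above can be pictured as a small config struct. The values (250 ms heartbeat, 500 to 1000 ms election timeout) are from the commit message; the `RaftTimingConfig` type, its field names, and the `validate` check are assumptions for illustration. The check encodes the usual Raft expectation that the heartbeat interval stays below the minimum election timeout, so followers do not start elections while a healthy leader is still sending heartbeats.

```rust
use std::time::Duration;

/// Hypothetical timing settings passed to every Raft group.
#[derive(Debug, Clone, PartialEq)]
pub struct RaftTimingConfig {
    pub heartbeat_interval: Duration,
    pub election_timeout_min: Duration,
    pub election_timeout_max: Duration,
}

impl Default for RaftTimingConfig {
    /// Defaults taken from the commit message.
    fn default() -> Self {
        Self {
            heartbeat_interval: Duration::from_millis(250),
            election_timeout_min: Duration::from_millis(500),
            election_timeout_max: Duration::from_millis(1000),
        }
    }
}

impl RaftTimingConfig {
    /// Reject combinations that would cause spurious elections.
    pub fn validate(&self) -> Result<(), String> {
        if self.election_timeout_min > self.election_timeout_max {
            return Err("election_timeout_min exceeds election_timeout_max".to_string());
        }
        if self.heartbeat_interval >= self.election_timeout_min {
            return Err("heartbeat_interval must be below election_timeout_min".to_string());
        }
        Ok(())
    }
}
```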
Introduces cluster-wide live query notification via HTTP broadcast, including new API handler and payload types. Implements ProviderUserDataApplier and ProviderSharedDataApplier for Raft-based user/shared table data replication. Updates SQL handler to forward write operations to the cluster leader. Refactors AppContext and LiveQueryManager to support cluster notifier wiring. Adds/updates tests and scripts for cluster scenarios.
Eliminates the legacy HTTP live query notification system in favor of Raft-replicated data appliers for cluster-wide consistency. Removes related handlers, types, and code paths, and updates tests and comments to reflect the new architecture where each node notifies its own subscribers upon data application.
…ixes

This commit adds full state machine snapshot/restore support for Raft, including serializing and restoring state in raft_store.rs and system.rs. Cluster join now waits for learners to catch up and promotes them to voters only after replication, improving safety. DML handlers for DELETE and UPDATE now route all user and stream table operations through Raft in cluster mode, supporting multi-row operations and correct PK filtering. Tests and documentation are updated, and minor bug fixes and refactoring are included throughout the Raft manager, network, and applier layers.
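
The join flow in this commit (add as learner, wait for catch-up, then promote to voter) might be sketched as follows. The `Peer` and `Membership` types, the `max_lag` threshold, and both functions are hypothetical. In the real code, promotion presumably goes through the Raft library's membership-change API rather than a field assignment.

```rust
/// Hypothetical membership state for a peer in one Raft group.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Membership {
    Learner,
    Voter,
}

/// Hypothetical view of a peer as seen by the leader.
#[derive(Debug, Clone, PartialEq)]
pub struct Peer {
    pub node_id: u64,
    pub membership: Membership,
    /// Highest log index known to be replicated on this peer.
    pub matched_index: u64,
}

/// A learner counts as caught up when it trails the leader by at most `max_lag` entries.
pub fn is_caught_up(leader_last_index: u64, matched_index: u64, max_lag: u64) -> bool {
    leader_last_index.saturating_sub(matched_index) <= max_lag
}

/// Promote every caught-up learner and return the IDs that were promoted.
/// Learners that still lag stay non-voting, so they cannot affect quorum.
pub fn promote_caught_up(peers: &mut [Peer], leader_last_index: u64, max_lag: u64) -> Vec<u64> {
    let mut promoted = Vec::new();
    for peer in peers.iter_mut() {
        if peer.membership == Membership::Learner
            && is_caught_up(leader_last_index, peer.matched_index, max_lag)
        {
            peer.membership = Membership::Voter;
            promoted.push(peer.node_id);
        }
    }
    promoted
}
```

Promoting only after replication avoids a window in which a new voter with an empty log counts toward quorum.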
Introduce persistent Raft log and metadata storage via kalamdb-store's StorageBackend and RaftPartitionStore. Add new_persistent constructors to RaftGroup and RaftManager, and update KalamRaftStorage to support both in-memory and persistent modes. Enhance state machines to propagate storage errors as DataResponse::Error, and update tests to cover persistent storage scenarios. Also add raft_storage.rs to kalamdb-store and export Raft storage types.
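
One way to picture the error propagation described here: the state machine writes through a backend trait and turns any storage failure into an error response instead of panicking. `DataResponse::Error` is named in the commit message; the `LogBackend` trait, the two backends, the `Applied` variant, and `apply_entry` are hypothetical stand-ins, and the real `StorageBackend` in kalamdb-store may look quite different.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a storage backend.
pub trait LogBackend {
    fn put(&mut self, index: u64, payload: Vec<u8>) -> Result<(), String>;
}

/// In-memory mode: entries live in a map and are lost on restart.
#[derive(Default)]
pub struct MemoryBackend {
    pub entries: BTreeMap<u64, Vec<u8>>,
}

impl LogBackend for MemoryBackend {
    fn put(&mut self, index: u64, payload: Vec<u8>) -> Result<(), String> {
        self.entries.insert(index, payload);
        Ok(())
    }
}

/// Simulates a persistent backend whose write fails.
pub struct FailingBackend;

impl LogBackend for FailingBackend {
    fn put(&mut self, _index: u64, _payload: Vec<u8>) -> Result<(), String> {
        Err("disk full".to_string())
    }
}

/// Response returned by the state machine (the `Applied` variant is assumed).
#[derive(Debug, PartialEq)]
pub enum DataResponse {
    Applied { index: u64 },
    Error(String),
}

/// Apply one entry, surfacing storage failures to the caller as a response.
pub fn apply_entry(backend: &mut dyn LogBackend, index: u64, payload: &[u8]) -> DataResponse {
    match backend.put(index, payload.to_vec()) {
        Ok(()) => DataResponse::Applied { index },
        Err(e) => DataResponse::Error(format!("storage error at index {index}: {e}")),
    }
}
```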
Deleted deprecated cluster management scripts and test output files. Updated cluster-related test files and documentation to reflect the removal and current best practices.
@jamals86 merged commit 4808d41 into main on Jan 8, 2026
0 of 2 checks passed