Improved the cluster raft and tests covarage and performance now the code is more stable when running in a cluster#236
Merged
Conversation
Move JWT validation off the HTTP upgrade path and parse Authorization metadata during upgrade. Introduce a unified post-upgrade auth handler (handle_upgrade_auth) that rate-limits, validates auth, finalizes auth state, and tears down failing connections. Rename internal auth helper to authenticate_ws_request and parse upgrade auth with parse_upgrade_auth. In ConnectionsManager add atomic reservation/release of connection slots (try_reserve_connection_slot/release_connection_slot), prevent duplicate connection IDs, and fix connection counting updates. Reduce per-connection channel capacities (notification: 256→64, event: 8→1) and update related docs/architecture to reflect new limits and a shorter auth timeout. Misc protocol/handler adjustments and cleanup to support these changes.
Use committed consumer-group offsets when consuming: lookup group offsets, use committed offset as start, and reuse ConsumerGroupId for fetch/ack calls. Add extensive integration test coverage for SQL topic/group behavior (fixtures, offset assertions, cache handling, resume/isolation/latest/offset cases) and helper utilities (TopicPublisherCacheGuard, setup_topic_source_fixture, polling helpers). Introduce a realtime chat benchmark and runner script (benchv2): chat_realtime benchmark implementation, run-chat-realtime.sh, and related benchmark wiring changes.
…ions lifecycle Broad set of changes across the backend: refactors, bug fixes and feature updates touching auth, SQL execution/helpers, WebSocket handling, core/transactions and multiple supporting crates. Updates span kalamdb-api (auth endpoints, rate limiter, UI embedding, WS events/protocol), kalamdb-auth (providers, validators, helpers, services), kalamdb-commons (conversions, ids, models, serialization), kalamdb-core (applier, manifest flush, SQL executor, transactions, schema registry), dialect/filestore/handlers/jobs/live/observability/pg/publisher/raft and many tests. Also adds new admin cluster commands (cluster/join and cluster/rebalance) and updates Cargo.toml / Cargo.lock. Changes include improved SQL helper utilities, streaming and file utilities, websocket context/messages/events, password/login flows, and numerous model/ID/serialization updates. This commit consolidates many coordinated updates across crates to support these features and fixes.
Multiple coordinated changes: - nextest: add a "stateful-heavy" test-group and assign many noisy/integration tests to it to avoid contention and preserve concurrency budget. - SQL API: parse/split/classify statements earlier in execute_sql_v1 and forward prepared statements to follower-forwarding logic to avoid duplicate parsing and better reuse metadata. - Forwarding: refactor forward logic to operate on PreparedApiExecutionStatement (use table type/classified info) when deciding target groups. - Raft/network: introduce helper APIs in ClusterClient to centralize gRPC client creation and metadata, and use network_factory.send_client_proposal to simplify Raft group proposal forwarding; improve RaftManager logging and peer summary reporting; add CreateNamespaceIfNotExists meta command for compatibility. - Namespace DDL: handle IF NOT EXISTS via new raft command and recognize "already exists" errors to return success without failing; register schema consistently; add tests. - Topic publisher: make claim handling robust (next_available_from) to avoid regressions when claims expire out-of-order; ensure fetch does not hold claim state during storage scans; add concurrency and redelivery tests. - Jobs manager: avoid dropping awakened jobs when semaphore is saturated by queueing pending awakenings and adjust periodic tasks ordering and WAL cleanup to spawn blocking flush in a task. - Configs: add KALAMDB_TOPIC_VISIBILITY_TIMEOUT_SECS (with legacy alias) env override and tests. - Minor: whitespace/serde alias cleanup and added docs (topic-consumption). These changes improve reliability under test/load, reduce redundant work during SQL forwarding, harden topic delivery semantics, and centralize raft/gRPC networking code for clearer error handling.
Multiple changes to improve Raft networking, leader-routing and test stability: - HTTP SQL execution/forwarding: adjust leader checks and request-id generation; forward_sql_if_follower now accepts raw SQL/params/namespace and supports prepared-statement routing for SELECTs. - Raft networking: introduce RaftChannelPool (shared Arc<DashMap>) and new_with_channel_pool/new_channel_pool constructors so all Raft groups reuse a node-level tonic channel pool; RaftNetworkFactory and RaftGroup updated to accept/inject the shared pool. - Raft manager: create and inject a single channel pool into all RaftGroup instances (both ephemeral and persistent groups). - Raft executor: add peer_stats_cache_refreshed_at with a 5s min refresh interval to throttle frequent peer info polls. - Network internals: minor channel clearing and TLS material handling fixes when toggling TLS. - Tests/CLI: add a shared HTTP client and per-URL KalamLinkClient cache to reduce connection churn; improve cluster readiness checks (SQL/auth readiness), avoid per-call TCP probes for leader cache, and make test helpers more robust (capture flush job IDs from responses or query scoped jobs to avoid races). - nextest: mark many heavy cluster tests into a stateful-heavy group to avoid concurrency-related flakiness. - scripts/cluster.sh: tweak server config (workers, datafusion settings, worker_max_blocking_threads) and make healthcheck retry more tolerant. Overall goal: reduce per-call socket/channel churn, reuse transports across Raft groups, throttle expensive polls, and make cluster tests more deterministic and less resource-hungry.
Make CLUSTER LIST a CLI-only command and switch cluster admin handlers to return structured Arrow result rows. - Treat CLUSTER LIST/STATUS/LS as invalid over SQL (parser/classifier now reject it) and remove the server-side ClusterList handler. - Add cluster/result_rows module to build consistent RecordBatches for cluster admin actions (clear, join, purge, rebalance, snapshot, stepdown, transfer-leader, trigger-election). - Update cluster handlers to return ExecutionResult::Rows via the new helpers instead of plain text messages. - Expose prepared_statement_target_group (pub(crate)) and update HTTP SQL execute path to early-reject multipart uploads when the request targets a non-leader shard (leader hint returned), avoiding file staging/forwarding. - Update tests and CLI docs to expect the new behavior and result schemas, and add CLI session cluster command stubs. - Misc: small formatting/refactor fixes in publisher tests and other minor tweaks.
Multiple improvements across SQL execution, storage backends, topic publishing, and RPC JSON handling: - API: add leader-routing error classification and return 503/NotLeader for detected leader routing failures. - SQL executor: implement plan_dml_with_provider_reload and cache_and_bind_dml_plan to reload table providers on missing-table errors, improve parameterized DML caching/binding, surface NotLeader errors from cached plan execution, and minor API/import cleanup. - Transactions: use DashMap entry API to atomically check/insert active transactions and avoid unnecessary clones when staging mutations. - Transactions overlay/staged mutation: avoid cloning effective entries and add approximate size estimation for staged mutations based on ScalarValue shapes instead of serde_json payload size. - RocksDB backend: optimize batched writes with per-partition CF/prefix cache, use a prefixed physical key helper, tune block cache creation, and implement chunked delete loop for drop_partition to avoid materializing all keys. - Entity store: optimize reverse scans using backend.scan_reverse and limit-aware iteration to avoid heavy buffering. - PG RPC: support more flexible JSON row payloads (plain scalars, arrays, and typed objects), add parsing helpers and tests, and update Update RPC handling. - Publisher/topic service: compute available delivery window (cursor + max contiguous size), honor effective fetch limits, and add tests for claim recovery/redelivery semantics. - CLI tests: add topic_test_support, detect redacted cluster SQL errors, and improve wait-for-query retry diagnostics by recording last output/error. These changes improve robustness for DML planning/exec, reduce memory/cpu overhead in storage operations, and make RPC/test tooling more tolerant of different JSON payload shapes.
Introduce a dedicated FlushScopeWriter to centralize Parquet write + manifest update logic for user/shared scopes and eliminate duplicated code in user/shared flush jobs. Add DELETE_BATCH_SIZE config and perform chunked deletions from RocksDB to avoid oversized batch operations. Refactor SharedTableFlushJob and UserTableFlushJob to use the scope writer, streamline deduplication (user-scoped in-memory processing), make scan batch size configurable, and remove redundant code paths. Refactor FlushExecutor to resolve flush targets via schema providers, run flush jobs on blocking threads through a helper, and limit concurrent post-flush maintenance tasks with a semaphore. Minor related changes: add scope_writer module, update exports in kalamdb-dba and start-up stats snapshot handling, and adapt callers to new APIs. These changes improve code reuse, memory behavior for per-user flushes, and robustness of flush/cleanup operations.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request includes dependency updates, test configuration improvements, and a series of code cleanups and import reordering in the authentication HTTP handlers. The main changes are grouped below
Improved the cluster raft and tests covarage and performance now the code is more stable when running in a cluster