adds redis-compatible keyspace notifications via the notify-keyspace-events config parameter. when enabled, ember publishes to:
- __keyspace@0__:<key> with the event name as the message
- __keyevent@0__:<event> with the key name as the message

changes:
- ember-core: expire_sample now collects expired key names into a buffer; run_expiration_cycle returns Vec<String> of expired keys for callers
- engine: new expired_tx broadcast channel in EngineConfig; shards broadcast expired key names to server without touching the GET/SET hot path
- keyspace_notifications: flag parser (K/E/g/$/l/z/h/s/x/A), notify helper
- config: notify-keyspace-events added to MUTABLE_PARAMS and EmberConfig
- server: keyspace_event_flags AtomicU32 on ServerContext; background task subscribes to expired-key broadcast and fires __keyevent@0__:expired
- execute: notify_write helper (single atomic load guard; zero overhead when disabled) wired into SET, DEL, EXPIRE, HSET, LPUSH, RPUSH, ZADD, SADD
- config set: updates keyspace_event_flags atomically at runtime
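the flag parser and the atomic-load guard described above can be sketched roughly as follows. this is a minimal illustration, not ember's code: the bit layout, the names `parse_flags` and `notifications`, and the expansion of `A` to "every class listed in this commit" are all assumptions.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// hypothetical bit layout; ember's actual constants may differ
pub const KEYSPACE: u32 = 1 << 0; // K: publish to __keyspace@0__:<key>
pub const KEYEVENT: u32 = 1 << 1; // E: publish to __keyevent@0__:<event>
pub const GENERIC: u32 = 1 << 2; // g
pub const STRING: u32 = 1 << 3; // $
pub const LIST: u32 = 1 << 4; // l
pub const ZSET: u32 = 1 << 5; // z
pub const HASH: u32 = 1 << 6; // h
pub const SET: u32 = 1 << 7; // s
pub const EXPIRED: u32 = 1 << 8; // x
// assumption: A expands to every event class listed above
pub const ALL_CLASSES: u32 = GENERIC | STRING | LIST | ZSET | HASH | SET | EXPIRED;

/// parses a notify-keyspace-events string into a bitmask.
/// returns the first unrecognized character as the error.
pub fn parse_flags(spec: &str) -> Result<u32, char> {
    let mut flags = 0;
    for c in spec.chars() {
        flags |= match c {
            'K' => KEYSPACE,
            'E' => KEYEVENT,
            'g' => GENERIC,
            '$' => STRING,
            'l' => LIST,
            'z' => ZSET,
            'h' => HASH,
            's' => SET,
            'x' => EXPIRED,
            'A' => ALL_CLASSES,
            other => return Err(other),
        };
    }
    Ok(flags)
}

/// returns the (channel, message) pairs to publish for one event.
/// a single atomic load decides whether any work is done at all.
pub fn notifications(
    flags: &AtomicU32,
    class: u32,
    event: &str,
    key: &str,
) -> Vec<(String, String)> {
    let f = flags.load(Ordering::Relaxed);
    if (f & class) == 0 {
        return Vec::new(); // disabled, or this event class is not selected
    }
    let mut out = Vec::new();
    if (f & KEYSPACE) != 0 {
        out.push((format!("__keyspace@0__:{}", key), event.to_string()));
    }
    if (f & KEYEVENT) != 0 {
        out.push((format!("__keyevent@0__:{}", event), key.to_string()));
    }
    out
}
```

with flags parsed from `KE$`, a SET on `foo` would yield one message on `__keyspace@0__:foo` and one on `__keyevent@0__:set`, while an LPUSH yields nothing because the list class is not selected.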
…on lag
- replace gauge-based ember_keys_expired_total / ember_keys_evicted_total with monotonic counters (ember_expired_keys_total / ember_evicted_keys_total) by tracking deltas between polling intervals; semantics now match the prometheus _total convention
- add ember_replication_connected_replicas and ember_replication_max_lag_records gauges; the lag is computed from ReplicaTracker.replica_lags(), which returns write_offset - acked_offset per replica
- add ReplicaTracker::replica_lags() helper to expose per-replica offsets
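a rough sketch of the two ideas in this commit: turning a polled running total into counter increments, and reducing per-replica offsets to a single max-lag value. `DeltaTracker` and `max_lag_records` are hypothetical names for illustration, and treating a backwards-moving total as a reset that emits a delta of 0 is an assumption, not something the commit states.

```rust
/// remembers the last polled total and reports how much it grew.
pub struct DeltaTracker {
    last: u64,
}

impl DeltaTracker {
    pub fn new() -> Self {
        Self { last: 0 }
    }

    /// returns the amount to add to a monotonic counter for this poll.
    /// assumption: if the source total goes backwards (e.g. after a reset),
    /// emit 0 and re-baseline instead of underflowing.
    pub fn observe(&mut self, current: u64) -> u64 {
        let delta = current.saturating_sub(self.last);
        self.last = current;
        delta
    }
}

/// max of write_offset - acked_offset across replicas; 0 with no replicas.
pub fn max_lag_records(write_offset: u64, acked_offsets: &[u64]) -> u64 {
    acked_offsets
        .iter()
        .map(|acked| write_offset.saturating_sub(*acked))
        .max()
        .unwrap_or(0)
}
```

the counter only ever receives non-negative increments, which is what lets `rate()` work on it.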
persistence:
- sigkill_crash_recovery: writes 50 keys with appendfsync=always, kills the server immediately (no sleep, no graceful shutdown), restarts and verifies all 50 keys are intact; validates that fsync-per-write guarantees survive worst-case SIGKILL

tls:
- tls_basic_commands: generates a self-signed cert via rcgen at test time, starts the server with --tls-port / --tls-cert-file / --tls-key-file, connects with tokio-rustls using the cert as a pinned root, exercises PING / SET / GET over the TLS transport
- plain_tcp_rejected_on_tls_port: verifies that a plain TCP connection to the TLS port never receives a valid PONG (TLS handshake must succeed before RESP3 commands are processed)

helpers:
- add tls_cert_file / tls_key_file to ServerOptions; TestServer gains a tls_port field populated when TLS args are present
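the "kill immediately" step can be illustrated with the standard library alone. this is a sketch, not the test's actual helper: `spawn_and_kill` is a hypothetical name, and the real test launches the ember server rather than an arbitrary program.

```rust
use std::io;
use std::process::{Command, ExitStatus, Stdio};

/// spawns a process and kills it right away, with no grace period.
/// on unix, Child::kill() sends SIGKILL, so the process gets no chance
/// to flush buffers or run shutdown hooks.
pub fn spawn_and_kill(program: &str, args: &[&str]) -> io::Result<ExitStatus> {
    let mut child = Command::new(program)
        .args(args)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;
    child.kill()?;
    // reap the process so its exit status (and terminating signal) is known
    child.wait()
}
```

on unix the returned status can be checked with `std::os::unix::process::ExitStatusExt::signal()` to confirm the process died from a signal rather than exiting on its own.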
summary
two commits covering the remaining launch-readiness items:
prometheus metrics: the `ember_keys_expired_total` and `ember_keys_evicted_total` metrics were incorrectly published as gauges. they're now proper counters, renamed `ember_expired_keys_total` and `ember_evicted_keys_total`, that increment by delta on each polling interval, which matches the prometheus `_total` naming convention and enables rate queries like `rate(ember_expired_keys_total[5m])`. adds `ember_replication_connected_replicas` and `ember_replication_max_lag_records` gauges so operators can alert on replication health without poking `INFO replication`.

crash recovery test: `sigkill_crash_recovery` writes 50 keys with `appendfsync always`, drops the server immediately (no sleep, no graceful shutdown; `Child::kill()` is SIGKILL on unix), restarts with the same data directory, and asserts all 50 keys are present. validates the fundamental `appendfsync always` guarantee under worst-case conditions.

TLS integration tests: two new tests in `tls.rs`:
- `tls_basic_commands` starts a server with a self-signed cert generated by `rcgen`, connects via `tokio-rustls` using the cert as a pinned root, and verifies PING / SET / GET work correctly over TLS
- `plain_tcp_rejected_on_tls_port` verifies that a plain TCP connection to the TLS port never receives a valid PONG

what was tested
- `cargo test -p ember-server`: 143 unit tests, all pass
- `cargo check -p ember-integration-tests`: clean
- `cargo build -p ember-server`: clean build

design considerations
for the replication lag metric, a `_seconds` gauge would require recording wall-clock timestamps per write, which adds significant complexity. `ember_replication_max_lag_records` (record count behind) is an honest and actionable metric: if it's non-zero and not converging, the replica is falling behind. operators can correlate with throughput to estimate time-to-catch-up.
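as a back-of-the-envelope illustration of that last point, the catch-up estimate is the lag divided by the net rate at which the replica closes the gap. the function name and the two rate inputs are hypothetical; ember does not export them as such, and an operator would derive them from throughput metrics.

```rust
/// estimates seconds until a replica catches up, given how many records it
/// is behind, how fast it applies records, and how fast the primary writes.
/// returns None when the replica is not converging (apply rate <= write rate).
pub fn estimate_catch_up_secs(
    lag_records: u64,
    apply_rate_per_sec: f64,
    write_rate_per_sec: f64,
) -> Option<f64> {
    if lag_records == 0 {
        return Some(0.0); // already caught up
    }
    let net = apply_rate_per_sec - write_rate_per_sec;
    // also rejects NaN, since NaN > 0.0 is false
    if !(net > 0.0) {
        return None;
    }
    Some(lag_records as f64 / net)
}
```

for example, a replica 1000 records behind that applies 600 records/s while the primary writes 100 records/s closes the gap at 500 records/s, so roughly 2 seconds.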