Add version byte to ClusterConfig and ReplicationHistory serialization#1778
Merged
Add version byte to ClusterConfig and ReplicationHistory serialization#1778
Conversation
Add ClusterConfigVersion constant (version 1) to ClusterConfig that is serialized as the first byte of the config payload. On deserialization, the version is validated and an InvalidDataException is thrown on mismatch. All gossip paths now peek the version byte before full deserialization to reject incompatible payloads early with a warning: - NetworkClusterGossip (receive path) - GarnetServerNode.GossipAsync (response path) - TryMeetAsync caller (meet response path) - ReplicaFailoverSession (failover gossip response path) Also fix SerializeSlotMap to use relative stream position instead of hardcoded position 0 for the segment count header. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add ReplicationHistoryVersion constant (version 1) to ReplicationHistory that is serialized as the first byte of the payload. On deserialization, the version is validated and an InvalidDataException is thrown on mismatch. RecoverReplicationHistory handles version mismatches gracefully by catching the exception, logging a warning, and reinitializing fresh state. This is safe because replication history is re-negotiated on the next primary/replica sync. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor
There was a problem hiding this comment.
Pull request overview
This PR introduces explicit serialization version markers for ClusterConfig and ReplicationHistory to detect incompatible payloads during cluster gossip and replication-history disk recovery, and updates gossip/meet/failover paths to validate versions before deserializing.
Changes:
- Add a leading version byte to
ClusterConfigandReplicationHistorybinary formats, with validation on deserialization and aTryPeekVersionfast-path forClusterConfig. - Add version checks to gossip receive/response paths (including MEET and failover gossip responses) to reject incompatible payloads early with warnings.
- Fix
ClusterConfig.SerializeSlotMapto write the segment-count header at the correct reserved stream position (relative to current stream offset, not hardcoded to 0). - Add new cluster tests covering round-trip, version mismatch, and empty payload for
TryPeekVersion.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| test/Garnet.test.cluster/ClusterConfigTests.cs | Adds tests for versioned ClusterConfig/ReplicationHistory serialization behavior. |
| libs/cluster/Session/RespClusterBasicCommands.cs | Validates ClusterConfig version before gossip deserialization on the network receive path. |
| libs/cluster/Server/Replication/ReplicationHistoryManager.cs | Adds versioning to replication history serialization and attempts graceful recovery on version mismatch. |
| libs/cluster/Server/Gossip/Gossip.cs | Validates MEET response version before deserializing/merging cluster config. |
| libs/cluster/Server/Gossip/GarnetServerNode.cs | Validates gossip response version before deserializing/merging. |
| libs/cluster/Server/Failover/ReplicaFailoverSession.cs | Validates failover gossip response version before deserializing/merging. |
| libs/cluster/Server/ClusterConfigSerializer.cs | Implements TryPeekVersion, writes/validates version byte, and fixes slot-map segment-count header position. |
| libs/cluster/Server/ClusterConfig.cs | Defines ClusterConfigVersion constant. |
Comments suppressed due to low confidence (1)
libs/cluster/Server/Gossip/GarnetServerNode.cs:205
- This deserializes the same returnedConfigArray twice (once into 'other' and again inside TryMerge). Reuse the already-deserialized 'other' when merging to avoid extra allocations and parsing on the gossip hot path.
var other = ClusterConfig.FromByteArray(returnedConfigArray);
var current = clusterProvider.clusterManager.CurrentConfig;
// Check if gossip is from a node that is known and trusted before merging
if (current.IsKnown(other.LocalNodeId))
clusterProvider.clusterManager.TryMerge(ClusterConfig.FromByteArray(returnedConfigArray));
else
Replace the single leading version byte with a 2-byte magic prefix followed by the version byte. This eliminates ambiguity with legacy (pre-version) payloads: - ClusterConfig uses 'GC' (0x47 0x43) magic. As a little-endian UInt16 this equals 18243, which exceeds the maximum possible legacy segmentCount (16384), so it can never collide with an old payload. - ReplicationHistory uses 'GR' (0x47 0x52) magic, distinguishable from the legacy format which starts with a 7-bit encoded string length. TryPeekVersion now validates both magic bytes before extracting the version, ensuring legacy payloads are reliably rejected rather than silently misinterpreted. Add ClusterConfigLegacyPayloadRejectedTest to verify old-format payloads are properly rejected by both TryPeekVersion and FromByteArray. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When a MEET response has an incompatible config version, dispose the newly-created GarnetServerNode to avoid leaking sockets/resources, and count the meet request as failed in gossip stats. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Catch EndOfStreamException and IOException in addition to InvalidDataException during replication history recovery. This handles truncated or corrupted on-disk payloads gracefully by reinitializing fresh state instead of crashing the server. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pass the already-deserialized ClusterConfig object directly to TryMerge instead of calling FromByteArray a second time. Fixes both ReplicaFailoverSession and GarnetServerNode.GossipAsync. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace explicit Encoding.ASCII with the default (UTF-8) in ClusterConfigSerializer and ReplicationHistory BinaryWriters. The encoding parameter only affects string serialization (ReadString/ Write(string)), not integer types. ASCII silently replaces non-ASCII characters with '?' causing data corruption. The BinaryReader in ClusterConfigSerializer was already using the default UTF-8, creating a writer/reader encoding asymmetry. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Expand XML doc comments on ClusterConfigMagic and
ReplicationHistoryMagic to explain:
- The human-readable prefix values ('GC' = Garnet Config,
'GR' = Garnet Replication)
- Why a magic prefix is needed for backwards compatibility
with the legacy pre-versioned format
- How each magic value is guaranteed to never collide with
valid legacy payload headers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace static readonly byte[] with static ReadOnlySpan<byte>
properties backed by UTF-8 string literals ('GC'u8 / 'GR'u8).
This avoids heap allocation — the data is embedded directly in
the assembly as a compile-time constant.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update serialization version fields from byte (1 byte) to int (4 bytes) for both ClusterConfig and ReplicationHistory. This changes the binary layout to: 2-byte magic word + 4-byte int version. - ClusterConfigVersion: byte -> int - ReplicationHistoryVersion: byte -> int - TryPeekVersion: reads 4 bytes via BitConverter.ToInt32 - FromByteArray: uses ReadInt32() with updated length checks - Tests: updated to use BitConverter for version corruption/validation Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…zation
Drop the 2-byte magic prefix ("GC"/"GR") from both formats since legacy
(pre-versioned) format support is not needed. The binary layout is now
simply: version int (4 bytes) + payload. This reduces gossip message
overhead by 2 bytes per exchange.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…onHistory Revert version type from int (4 bytes) to byte (1 byte) for both ClusterConfig and ReplicationHistory serialization. The binary layout is now: version byte (1 byte) + payload, minimizing wire overhead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
badrishc
approved these changes
May 8, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add serialization versioning to
ClusterConfigandReplicationHistorybinary formats using a version byte header to enable safe detection of incompatible payloads during gossip and disk recovery.Motivation
Without version markers, nodes running different Garnet versions could exchange incompatible gossip payloads leading to deserialization failures or silent data corruption. The version byte enables:
Changes
1️⃣ ClusterConfig Versioning
libs/cluster/Server/ClusterConfig.csClusterConfigVersionconstant (version 1, byte)libs/cluster/Server/ClusterConfigSerializer.csToByteArray/FromByteArray; addedTryPeekVersionhelper; fixedSerializeSlotMapto use relative stream positionlibs/cluster/Session/RespClusterBasicCommands.csNetworkClusterGossip(receive path)libs/cluster/Server/Gossip/GarnetServerNode.csGossipAsync(response path)libs/cluster/Server/Gossip/Gossip.csTryMeetAsync(meet response path)libs/cluster/Server/Failover/ReplicaFailoverSession.cs2️⃣ ReplicationHistory Versioning
libs/cluster/Server/Replication/ReplicationHistoryManager.csReplicationHistoryVersionconstant (version 1, byte); version byte inToByteArray/FromByteArray; graceful recovery on mismatch (reinitializes fresh state)3️⃣ Tests
ClusterConfigVersionRoundTripTestClusterConfigVersionMismatchThrowsTestInvalidDataExceptionClusterConfigTryPeekVersionEmptyDataTestTryPeekVersionreturns falseReplicationHistoryVersionRoundTripTestReplicationHistoryVersionMismatchThrowsTestInvalidDataExceptionAll deserialization paths covered
ClusterManager.cs(disk recovery)FromByteArraythrowsInvalidDataExceptionNetworkClusterGossip(gossip receive)TryPeekVersionpre-checkGarnetServerNode.GossipAsync(gossip response)TryPeekVersionpre-checkGossip.TryMeetAsync(meet response)TryPeekVersionpre-checkReplicaFailoverSession(failover gossip)TryPeekVersionpre-checkRecoverReplicationHistory(disk recovery)