-
Notifications
You must be signed in to change notification settings - Fork 631
Description
Describe the bug
Randomly getting an 'An error occurred at ReplicationManager.RetrieveStoreCheckpoint' error message when replicating the Garnet instance.
When replicating a Garnet instance using Redis-cli or programmatically, I receive an 'OK' output, but replication does not occur. In the error log, I get the message: 'An error occurred at ReplicationManager.RetrieveStoreCheckpoint.'
Commands:
redis-cli -c -h 127.0.0.1 -p 6379 Cluster Replicate ff8f09782078336d22b570988dfaca602d097c6f
Ans: OK
redis-cli -h 127.0.0.1 -p 6380 cluster nodes
Ans: ff8f09782078336d22b570988dfaca602d097c6f 127.0.0.1:6380@16380,DUC-L155.ducontindia.local myself,master - 0 0 53 connected 0-16383
9dd6ee8dfc4817de91375c19522f682b2a77e1be 127.0.0.1:6379@16379,DUC-L155.ducontindia.local master - 638794782854197274 638794782854196345 90 connected
Error Log:
[000.2025-04-05 19:18:12.6720] (Trace) CLUSTER REPLICATE ff8f09782078336d22b570988dfaca602d097c6f
[000.2025-04-05 19:18:12.6721] (Trace) Success recover lock [ClusterReplicate]
[000.2025-04-05 19:18:12.6726] (Information) Initiating background checkpoint retrieval
[000.2025-04-05 19:18:12.6939] (Debug) [127.0.0.1:61344] [00981EB9] Disposing RespServerSession Id=20
[000.2025-04-05 19:18:12.6941] (Debug) [] [0040850F] Disposing RespServerSession Id=0
[000.2025-04-05 19:18:12.7099] (Information) <TsavoriteKV [main]> Index Checkpoint: b9b2d6f2-294c-4eff-bb9c-d00070ee2687
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> ******** Index Checkpoint Info for b9b2d6f2-294c-4eff-bb9c-d00070ee2687 ********
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Table Size: 2097152
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Main Table Size (in GB): 0.13421772799999998
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Overflow Table Size (in GB): 1.0239999999999999E-06
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Num Buckets: 16
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Start Logical Address: 64
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Final Logical Address: 64
[000.2025-04-05 19:18:12.7215] (Information) <TsavoriteKV [obj]> Index Checkpoint: b9b2d6f2-294c-4eff-bb9c-d00070ee2687
[000.2025-04-05 19:18:12.7215] (Information) <TsavoriteKV [obj]> ******** Index Checkpoint Info for b9b2d6f2-294c-4eff-bb9c-d00070ee2687 ********
[000.2025-04-05 19:18:12.7215] (Information) <TsavoriteKV [obj]> Table Size: 262144
[000.2025-04-05 19:18:12.7216] (Information) <TsavoriteKV [obj]> Main Table Size (in GB): 0.016777215999999998
[000.2025-04-05 19:18:12.7216] (Information) <TsavoriteKV [obj]> Overflow Table Size (in GB): 1.0239999999999999E-06
[000.2025-04-05 19:18:12.7216] (Information) <TsavoriteKV [obj]> Num Buckets: 16
[000.2025-04-05 19:18:12.7216] (Information) <TsavoriteKV [obj]> Start Logical Address: 72
[000.2025-04-05 19:18:12.7216] (Information) <TsavoriteKV [obj]> Final Logical Address: 72
[000.2025-04-05 19:18:12.7752] (Information) <TsavoriteLog [aof]> ******* Recovered HybridLog Stats *******
[000.2025-04-05 19:18:12.7753] (Information) <TsavoriteLog [aof]> Head Address: 64
[000.2025-04-05 19:18:12.7753] (Information) <TsavoriteLog [aof]> Safe Head Address: 64
[000.2025-04-05 19:18:12.7753] (Information) <TsavoriteLog [aof]> ReadOnly Address: 3864
[000.2025-04-05 19:18:12.7753] (Information) <TsavoriteLog [aof]> Safe ReadOnly Address: 3864
[000.2025-04-05 19:18:12.7753] (Information) <TsavoriteLog [aof]> Tail Address: 3864
[000.2025-04-05 19:18:12.7753] (Information) Recovered AOF: begin address = 64, tail address = 3864
[000.2025-04-05 19:18:12.7753] (Information) InitiateReplicaSync: AOF BeginAddress:64 AOF TailAddress:3864
[000.2025-04-05 19:18:12.7771] (Debug) Accepted TCP connection from 127.0.0.1:61346
[000.2025-04-05 19:18:12.8055] (Error) An error occurred at ReplicationManager.RetrieveStoreCheckpoint
System.Exception: ReplicaSyncSession replicaAofTail 3864 < canServeFromAofAddress 10536
at Garnet.cluster.ReplicationManager.g__ReplicaSyncAttachTask|20_0() in //libs/cluster/Server/Replication/ReplicaOps/ReplicaReceiveCheckpoint.cs:line 143
[000.2025-04-05 19:18:12.8059] (Trace) EndRecovery [ClusterReplicate,NoRecovery]_
Garnet.conf
Steps to reproduce the bug
When replicating a Garnet instance using Redis-cli or programmatically, I receive an 'OK' output, but replication does not occur. In the error log, I get the message: 'An error occurred at ReplicationManager.RetrieveStoreCheckpoint.'
Commands:
redis-cli -c -h 127.0.0.1 -p 6379 Cluster Replicate ff8f09782078336d22b570988dfaca602d097c6f
Ans: OK
redis-cli -h 127.0.0.1 -p 6380 cluster nodes
Ans: ff8f09782078336d22b570988dfaca602d097c6f 127.0.0.1:6380@16380,DUC-L155.ducontindia.local myself,master - 0 0 53 connected 0-16383
9dd6ee8dfc4817de91375c19522f682b2a77e1be 127.0.0.1:6379@16379,DUC-L155.ducontindia.local master - 638794782854197274 638794782854196345 90 connected
Error Log:
[000.2025-04-05 19:18:12.6720] (Trace) CLUSTER REPLICATE ff8f09782078336d22b570988dfaca602d097c6f
[000.2025-04-05 19:18:12.6721] (Trace) Success recover lock [ClusterReplicate]
[000.2025-04-05 19:18:12.6726] (Information) Initiating background checkpoint retrieval
[000.2025-04-05 19:18:12.6939] (Debug) [127.0.0.1:61344] [00981EB9] Disposing RespServerSession Id=20
[000.2025-04-05 19:18:12.6941] (Debug) [] [0040850F] Disposing RespServerSession Id=0
[000.2025-04-05 19:18:12.7099] (Information) <TsavoriteKV [main]> Index Checkpoint: b9b2d6f2-294c-4eff-bb9c-d00070ee2687
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> ******** Index Checkpoint Info for b9b2d6f2-294c-4eff-bb9c-d00070ee2687 ********
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Table Size: 2097152
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Main Table Size (in GB): 0.13421772799999998
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Overflow Table Size (in GB): 1.0239999999999999E-06
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Num Buckets: 16
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Start Logical Address: 64
[000.2025-04-05 19:18:12.7101] (Information) <TsavoriteKV [main]> Final Logical Address: 64
[000.2025-04-05 19:18:12.7215] (Information) <TsavoriteKV [obj]> Index Checkpoint: b9b2d6f2-294c-4eff-bb9c-d00070ee2687
[000.2025-04-05 19:18:12.7215] (Information) <TsavoriteKV [obj]> ******** Index Checkpoint Info for b9b2d6f2-294c-4eff-bb9c-d00070ee2687 ********
[000.2025-04-05 19:18:12.7215] (Information) <TsavoriteKV [obj]> Table Size: 262144
[000.2025-04-05 19:18:12.7216] (Information) <TsavoriteKV [obj]> Main Table Size (in GB): 0.016777215999999998
[000.2025-04-05 19:18:12.7216] (Information) <TsavoriteKV [obj]> Overflow Table Size (in GB): 1.0239999999999999E-06
[000.2025-04-05 19:18:12.7216] (Information) <TsavoriteKV [obj]> Num Buckets: 16
[000.2025-04-05 19:18:12.7216] (Information) <TsavoriteKV [obj]> Start Logical Address: 72
[000.2025-04-05 19:18:12.7216] (Information) <TsavoriteKV [obj]> Final Logical Address: 72
[000.2025-04-05 19:18:12.7752] (Information) <TsavoriteLog [aof]> ******* Recovered HybridLog Stats *******
[000.2025-04-05 19:18:12.7753] (Information) <TsavoriteLog [aof]> Head Address: 64
[000.2025-04-05 19:18:12.7753] (Information) <TsavoriteLog [aof]> Safe Head Address: 64
[000.2025-04-05 19:18:12.7753] (Information) <TsavoriteLog [aof]> ReadOnly Address: 3864
[000.2025-04-05 19:18:12.7753] (Information) <TsavoriteLog [aof]> Safe ReadOnly Address: 3864
[000.2025-04-05 19:18:12.7753] (Information) <TsavoriteLog [aof]> Tail Address: 3864
[000.2025-04-05 19:18:12.7753] (Information) Recovered AOF: begin address = 64, tail address = 3864
[000.2025-04-05 19:18:12.7753] (Information) InitiateReplicaSync: AOF BeginAddress:64 AOF TailAddress:3864
[000.2025-04-05 19:18:12.7771] (Debug) Accepted TCP connection from 127.0.0.1:61346
[000.2025-04-05 19:18:12.8055] (Error) An error occurred at ReplicationManager.RetrieveStoreCheckpoint
System.Exception: ReplicaSyncSession replicaAofTail 3864 < canServeFromAofAddress 10536
at Garnet.cluster.ReplicationManager.g__ReplicaSyncAttachTask|20_0() in //libs/cluster/Server/Replication/ReplicaOps/ReplicaReceiveCheckpoint.cs:line 143
[000.2025-04-05 19:18:12.8059] (Trace) EndRecovery [ClusterReplicate,NoRecovery]_
Garnet.conf
Expected behavior
No response
Screenshots
No response
Release version
No response
IDE
No response
OS version
No response
Additional context
No response