
[BUG] Continuous "I/O error reading bulk count from MASTER: No error information" Failing the Readiness Probe #11414

Open
seanocca opened this issue Oct 21, 2022 · 2 comments


seanocca commented Oct 21, 2022

Describe the bug

We are using this cache in conjunction with our Grafana Loki deployment, which handles roughly 100-200GB of uncompressed logs every day. Under this load the cache runs into problems reading from the master. The Redis cluster handles the caching of the compressed logs and should cope with up to 100GB of throughput, well over what the daily uncompressed volume compresses down to.

We then hit a failed readiness probe without any helpful error information.

The error is as follows:

I/O error reading bulk count from MASTER: No error information
RDB: 50 MB of memory used by copy-on-write
Reconnecting to MASTER xxx.xxx.xxx.xxx:6379 after failure
MASTER <-> REPLICA sync started
Non blocking connect for SYNC fired the event.
Master replied to PING, replication can continue...
Partial resynchronization not possible (no cached master)

The main issue for us is the "No error information" part.

There is no way to debug this issue from a response message like that.

To reproduce

We use Kubernetes pods with the spotahome/redis-operator

The failover has some CustomConfig that overrides the default values set by the operator (see the CustomConfig under Additional information below).
We run 4 instances with 9 pods across them, requesting 3 cores and 35GB of memory per pod; a rough sketch of one of the manifests is below.
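
Roughly, each failover manifest looks like the sketch below. The name and replica counts are placeholders rather than our exact values, and the apiVersion/field names follow the spotahome/redis-operator RedisFailover CRD as we understand it:

apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: loki-cache              # hypothetical name
spec:
  sentinel:
    replicas: 3                 # illustrative replica count
  redis:
    replicas: 3                 # illustrative; see the pod counts above
    resources:
      requests:
        cpu: "3"                # 3 cores per pod
        memory: 35Gi            # 35GB of memory per pod
    # customConfig overrides are listed under Additional information below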

Expected behavior

We expect one of two scenarios to occur.

  1. The pod fails with an error message that can help us change the config to improve performance
  2. The pod either does not fail the readiness probe, or is restarted when this error message occurs (you might not be able to help with this one, as we use the redis-operator; see the sketch after this list)
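
To illustrate scenario 2, a liveness check along these lines is roughly what we have in mind. This is purely a sketch (auth and ports omitted), and we realise the operator owns the probes today, so it is not something we can simply drop in:

livenessProbe:
  exec:
    command:
      - sh
      - -c
      # passes on the master (role:master) and on replicas whose link to the master is up
      - redis-cli info replication | grep -qE 'role:master|master_link_status:up'
  initialDelaySeconds: 30
  periodSeconds: 15
  failureThreshold: 5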

Additional information

We have the Persistent Volume Claim set to 256GB, which should be more than enough space to hold the searched data for any timeframe.
CustomConfig set in the Redis Failover:

"repl-timeout 610"
"save 60 5000"
"tcp-keepalive 610"
"maxclients 500000"
"oom-score-adj yes"
"oom-score-adj-values 0 200 800"
"dynamic-hz yes"
@vineelyalamarthy

Is this Redis Cluster or Sentinel?

@seanocca (Author)

Is this Redis Cluster or Sentinel?

This error comes up on the Redis cluster.
