[BUG] Redis sentinel Error: waitpid() returned a pid we can't find in our scripts execution queue! #12731

Open
koenigfa1 opened this issue Nov 6, 2023 · 15 comments

@koenigfa1

koenigfa1 commented Nov 6, 2023

Describe the bug

I want to set up Redis Sentinel in my AKS cluster based on the Bitnami Helm chart (https://artifacthub.io/packages/helm/bitnami/redis/18.1.2). After deploying, the Sentinel container spams the message:

waitpid() returned a pid (996) we can't find in our scripts execution queue!

To reproduce
I am using an image built from the following Dockerfile:

FROM redis:7.0-alpine3.18

RUN apk add --no-cache bash
RUN apk add openssl

I am using the following YAML to deploy Redis Sentinel, with the image shown above, in the StatefulSet.

YAML.zip

Expected behavior

Redis and Sentinel run without the log messages shown under Additional information.

Additional information

To be clear, I am NOT using the official Bitnami images referenced in the Helm chart (docker.io/bitnami/redis:7.2.1-debian-11-r0 / docker.io/bitnami/redis-sentinel:7.2.1-debian-11-r0), because the critical CVEs in those images are not allowed in my environment.

Some Logs from the Pod

1.log
2.log
3.log

@koenigfa1
Author

koenigfa1 commented Nov 7, 2023

sentinel 1:X 07 Nov 2023 14:28:18.174 # waitpid() returned a pid (144) we can't find in our scripts execution queue!
sentinel 1:X 07 Nov 2023 14:28:22.757 * +sentinel sentinel 817333fc9cfeb5cf8d7595ba084eac43e846dd09 10.248.64.245 26379 @ mymaster xx-yy-redis-n
sentinel 1:X 07 Nov 2023 14:28:22.769 * Sentinel new configuration saved on disk
sentinel 1:X 07 Nov 2023 14:28:23.109 # waitpid() returned a pid (152) we can't find in our scripts execution queue!
sentinel 1:X 07 Nov 2023 14:28:28.123 # waitpid() returned a pid (169) we can't find in our scripts execution queue!

Sentinel seems to be doing its job correctly, doesn't it? Is this log message an error or just informational? If it is informational, what is its impact? For more Sentinel logs, see the beginning of the 1.log file.

@enjoy-binbin
Collaborator

enjoy-binbin commented Nov 7, 2023

Can you share the INFO SENTINEL output?
My guess is that Sentinel has entered TILT mode. (Could this fsync be slow?)

sentinel 1:X 07 Nov 2023 14:28:22.769 * Sentinel new configuration saved on disk

Also, can you check whether the logs contain the TILT keyword?
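One way to do that check without reading the logs by hand is a small script. The following is a minimal sketch, assuming the logs are available as plain text lines and that Sentinel reports TILT transitions with the event names `+tilt` and `-tilt` (the `+tilt` name appears in the source code quoted in this comment). The function name `find_tilt_events` is made up for illustration.

```python
import re

# Event names Sentinel logs when it enters ("+tilt") or leaves ("-tilt")
# TILT mode. The lookarounds require whitespace (or line start/end) on both
# sides, so the "#tilt mode entered" description text is not matched.
TILT_EVENT = re.compile(r"(?<!\S)([+-]tilt)(?!\S)")


def find_tilt_events(log_lines):
    """Return (line_number, event_name) pairs for every TILT event found."""
    events = []
    for number, line in enumerate(log_lines, start=1):
        match = TILT_EVENT.search(line)
        if match:
            events.append((number, match.group(1)))
    return events
```

An empty result for all log files would suggest that Sentinel never entered TILT mode during the logged period.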

TILT mode source code:

/* This function checks if we need to enter the TILT mode.
 *
 * The TILT mode is entered if we detect that between two invocations of the
 * timer interrupt, a negative amount of time, or too much time has passed.
 * Note that we expect that more or less just 100 milliseconds will pass
 * if everything is fine. However we'll see a negative number or a
 * difference bigger than SENTINEL_TILT_TRIGGER milliseconds if one of the
 * following conditions happen:
 *
 * 1) The Sentinel process for some time is blocked, for every kind of
 * random reason: the load is huge, the computer was frozen for some time
 * in I/O or alike, the process was stopped by a signal. Everything.
 * 2) The system clock was altered significantly.
 *
 * Under both this conditions we'll see everything as timed out and failing
 * without good reasons. Instead we enter the TILT mode and wait
 * for SENTINEL_TILT_PERIOD to elapse before starting to act again.
 *
 * During TILT time we still collect information, we just do not act. */
void sentinelCheckTiltCondition(void) {
    mstime_t now = mstime();
    mstime_t delta = now - sentinel.previous_time;


    if (delta < 0 || delta > sentinel_tilt_trigger) {
        sentinel.tilt = 1;
        sentinel.tilt_start_time = mstime();
        sentinelEvent(LL_WARNING,"+tilt",NULL,"#tilt mode entered");
    }
    sentinel.previous_time = mstime();
}
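To experiment with the condition outside of Redis, the check above can be restated as a small Python function. This is only a sketch: the function and parameter names are made up for illustration, and the 2000 ms default is an assumption about the default value of sentinel_tilt_trigger, so verify it against your Redis version.

```python
def tilt_triggered(previous_ms, now_ms, tilt_trigger_ms=2000):
    """Mirror the condition in sentinelCheckTiltCondition().

    previous_ms and now_ms are timestamps in milliseconds taken on two
    consecutive timer invocations. tilt_trigger_ms defaults to 2000, which
    is an assumed default, not a value taken from this thread.
    """
    delta = now_ms - previous_ms
    # A negative delta means the clock went backwards; a large delta means
    # the process was blocked or the clock jumped forward.
    return delta < 0 or delta > tilt_trigger_ms
```

With these assumptions, a normal 100 ms tick does not trigger TILT, while a clock that jumps backwards or a pause longer than the trigger does.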

@koenigfa1
Author

@enjoy-binbin thank you for your answer. How can I get the INFO SENTINEL output? I looked through all my logs, including the ones linked in my post, and never saw anything like TILT. That is why I am confused. How can we check for TILT mode? Do I need to configure something?

@enjoy-binbin
Collaborator

INFO is a command: https://redis.io/commands/info/
Alternatively, you can post the full INFO command output.

@enjoy-binbin
Collaborator

Can you issue the INFO command to the Sentinel node?

@koenigfa1
Author

I used kubectl exec to get into the sentinel container and authenticated with redis-cli, but I don't get any output:

    • k exec -it xy -c sentinel -- bash
    • redis-cli -a xyz
    • image

@enjoy-binbin
Collaborator

You should try the Sentinel port 26379, not 6379.

@koenigfa1
Author

koenigfa1 commented Nov 8, 2023

127.0.0.1:26379> info sentinel

# Sentinel

sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.248.64.250:6379,slaves=2,sentinels=3
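The fields relevant to this issue are sentinel_tilt, sentinel_running_scripts and sentinel_scripts_queue_length. As a rough sketch, the output can be parsed and checked with a few lines of Python. The function names `parse_info` and `sentinel_looks_idle` are hypothetical helpers, not part of any Redis client.

```python
def parse_info(text):
    """Parse 'key:value' lines of INFO output into a dict.

    Blank lines and section headers (with or without a leading '#')
    are skipped. Values are kept as strings.
    """
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values such as
        # "address=10.248.64.250:6379" stay intact.
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def sentinel_looks_idle(fields):
    """True when Sentinel is not in TILT mode and no scripts run or wait."""
    return (
        fields.get("sentinel_tilt") == "0"
        and fields.get("sentinel_running_scripts") == "0"
        and fields.get("sentinel_scripts_queue_length") == "0"
    )
```

For the output posted above, both the TILT flag and the script counters are zero, so the check passes.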

@enjoy-binbin
Collaborator

Sorry, I don't have any clues right now.
@hwware do you have any suggestions?

@koenigfa1
Author

So you think the Redis and Sentinel containers are running correctly and are production ready? I don't know how to interpret the log message. Is it an error or just a warning/info? Does it mean the containers are not running correctly, or might stop running correctly in the future?

@enjoy-binbin
Collaborator

It is just a warning.
As far as I know, it won't affect Redis or Sentinel, but it's better to test it (for example, try a failover in your environment).

@koenigfa1
Author

Alright. Can you give me a hint on how to test a failover scenario in my environment?

@enjoy-binbin
Collaborator

https://redis.io/docs/management/sentinel/
You can find more information there and learn how to use Sentinel and how it works.
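One possible way to test a failover is to ask Sentinel for the current master address, force a failover with SENTINEL FAILOVER, and ask again. The following is a minimal sketch that only wraps redis-cli, assuming redis-cli is on the PATH, Sentinel listens on port 26379 as mentioned earlier in this thread, and the master is named mymaster. The helper names are made up for illustration.

```python
import subprocess

SENTINEL_PORT = 26379  # Sentinel port used in this thread


def sentinel_cli_args(*command, host="127.0.0.1", port=SENTINEL_PORT, password=None):
    """Build the redis-cli argument list for a command sent to Sentinel."""
    args = ["redis-cli", "-h", host, "-p", str(port)]
    if password is not None:
        # Only needed when Sentinel itself requires authentication.
        args += ["-a", password]
    return args + list(command)


def run_sentinel_command(*command, **options):
    """Run the command through redis-cli and return its standard output."""
    result = subprocess.run(
        sentinel_cli_args(*command, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example session (not executed on import; needs a reachable Sentinel):
#   before = run_sentinel_command("SENTINEL", "GET-MASTER-ADDR-BY-NAME", "mymaster")
#   run_sentinel_command("SENTINEL", "FAILOVER", "mymaster")
#   after = run_sentinel_command("SENTINEL", "GET-MASTER-ADDR-BY-NAME", "mymaster")
```

If the address returned after the failover differs from the one returned before, the promotion went through. Inside the pod, the same commands can be typed directly into redis-cli.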

@koenigfa1
Author

Wow, nice, now I understand. Everything seems fine:
image
image

I see in the logs that the health check fails and the failover from 10.248.64.250 to 10.248.65.12 completes successfully, doesn't it?
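To confirm this from the logs rather than by eye, one option is to look for the +switch-master event that Sentinel logs once a new master has been promoted. This is a sketch under the assumption that the event has the form `+switch-master <master-name> <old-ip> <old-port> <new-ip> <new-port>`. The function name `find_master_switches` is hypothetical.

```python
import re

# Assumed event layout:
#   +switch-master <master-name> <old-ip> <old-port> <new-ip> <new-port>
SWITCH_MASTER = re.compile(r"\+switch-master (\S+) (\S+) (\d+) (\S+) (\d+)")


def find_master_switches(log_lines):
    """Return one dict per +switch-master event found in the log lines."""
    switches = []
    for line in log_lines:
        match = SWITCH_MASTER.search(line)
        if match:
            name, old_ip, old_port, new_ip, new_port = match.groups()
            switches.append(
                {
                    "master": name,
                    "old": (old_ip, int(old_port)),
                    "new": (new_ip, int(new_port)),
                }
            )
    return switches
```

A single entry whose old address is the previous master and whose new address is one of the former replicas indicates that the failover completed.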

@koenigfa1
Author

More logs from the failover
image

From my point of view, everything works fine!
