-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add liveness check for apiserver #57
Conversation
utils/install.go
Outdated
} | ||
|
||
vrrp_script chk_apiserver { | ||
script "/usr/bin/wget -q -O - https://127.0.0.1:6443/healthz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bind port of API server may not be 6443. Please reference config.MasterConfiguration.API.BindPort
utils/install.go
Outdated
|
||
vrrp_script chk_apiserver { | ||
script "/usr/bin/wget -q -O - https://127.0.0.1:6443/healthz" | ||
interval 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every second seems too frequent. How about 10 seconds, the default period for liveness probes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be a constant? Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've seen similar vrrp scripts with low interval values (usually 2 seconds)- do you think 10 seconds x8, will be too long on failure?
utils/install.go
Outdated
vrrp_script chk_apiserver { | ||
script "/usr/bin/wget -q -O - https://127.0.0.1:6443/healthz" | ||
interval 1 | ||
fall 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps too small? The kube-apiserver liveness probe defines a failureThreshold
of 8.
utils/install.go
Outdated
} | ||
|
||
vrrp_script chk_apiserver { | ||
script "/usr/bin/wget -q -O - https://127.0.0.1:6443/healthz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeout # script considered failed after 'timeout' seconds
What do you think about defining timeout
? Not sure if keepalived kills the script... Can you check? If it doesn't, it's a good idea to add a timeout to wget, to ensure it doesn't hang.
utils/install.go
Outdated
} | ||
|
||
vrrp_script chk_apiserver { | ||
script "/usr/bin/wget -q -O - https://127.0.0.1:6443/healthz" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the output get logged? Do we care about the output? If we only care about the exit code, I suggest redirecting stdout/err to /dev/null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to log failovers due to api-health problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the check fails, it is logged with the exit status. The output of wget is not logged, updated to > /dev/null
utils/install.go
Outdated
confFile := filepath.Join(constants.SystemdDir, "keepalived.conf") | ||
writeTemplateIntoFile(kaConfFileTemplate, "vipConfFileTemplate", confFile, configTemplateVals) | ||
|
||
scriptFile := filepath.Join(constants.SystemdDir, "chk_apiserver.sh") | ||
writeTemplateIntoFile(kaVRRPScript, "kaVRRPScriptTemplate", scriptFile, configTemplateVals) | ||
ChangePermissions(scriptFile, 775) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems too open, need input on how restrictive it should be. keepalived
process will execute this script.
constants/constants.go
Outdated
VRRPScriptInterval = 10 | ||
VRRPScriptRise = 2 | ||
VRRPScriptFall = 8 | ||
WgetTimeout = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be less than VRRPScriptInterval, since, by the time wget returns in case of timeout, another check for api_server would have been issued.
constants/constants.go
Outdated
const ( | ||
VRRPScriptInterval = 10 | ||
VRRPScriptRise = 2 | ||
VRRPScriptFall = 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious why this is set to 8, seems a little high? if I understand it correctly it would mean that VIP will fail over after 80 seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After talking with @sarun87, we dropped it down to 6. I think we could drop it down further in the future as I see other vrrp scripts perform a check at much lower intervals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@puneetguptanitj Please see #57 (comment)
utils/install.go
Outdated
} | ||
kaServiceData := KaServiceData{confFile, constants.KeepalivedImage} | ||
kaServiceData := KaServiceData{confFile, scriptFile, constants.KeepalivedImage} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: abbreviation ka
might be a bit too short, other places we have kept the full name like in KeepAlivedImg
might be ok to use full name ?
utils/install.go
Outdated
|
||
kaVRRPScript := `#!/bin/sh | ||
|
||
/usr/bin/wget -T {{.WgetTimeout}} -qO - https://127.0.0.1:{{.InitConfig.MasterConfiguration.API.BindPort}}/healthz > /dev/null 2>&1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just to understand, since we are using https here wouldn't we have to ignore certs verification or specify ca.crt as trusted for wget?
de9fa87
to
de9f2c0
Compare
Addresses #47