Get kubelet log from journald when kubelet is running as a systemd service in e2e tests #19306
Labelling this PR as size/M
887ed27 to 03fc3c2
GCE e2e test build/test passed for commit 887ed273227fdda07011af6ba26f8ebf765691c9.
@@ -36,6 +36,11 @@ NODE_IMAGE_PROJECT=${KUBE_GCE_NODE_PROJECT:-"${MASTER_IMAGE_PROJECT}"}
CONTAINER_RUNTIME=${KUBE_CONTAINER_RUNTIME:-docker}
RKT_VERSION=${KUBE_RKT_VERSION:-0.5.5}

# If the distro is CoreOS, kubelet will run as a systemd service.
if [[ "${OS_DISTRIBUTION}" == "coreos" ]]; then
Looking at https://github.com/kubernetes/kubernetes/blob/master/cluster/saltbase/pillar/systemd.sls, I worry this is a bit too fragile.
Can we instead somehow detect at run-time in the code whether the kubelet is running under systemd or not?
@ixdy ACK, I will give a try tomorrow, thanks!
@ixdy How about ssh'ing to all the hosts and checking the return code of 'sudo systemctl status kubelet.service'?
could you use go-systemd RunningFromSystemService?
oops, sorry, missed exactly where you're checking this from - I doubt you want to expose this from within the kubelet.
GCE e2e test build/test passed for commit 03fc3c27715e08083eacea5ac2371784f175341a.
03fc3c2 to ed5ff44
Pushed a new version, which checks the return code of |
GCE e2e build/test failed for commit ed5ff44bf8bb4264e9ba3d3c9508417ff7c20b5d.
@ixdy This hardcodes "kubelet.service" as the name of the kubelet systemd service. Might not be the best solution, but should be better than my original one. Feedback is welcome :)
@@ -97,3 +123,29 @@ func logCore(cmds []command, hosts []string, dir, provider string) {
	}
	wg.Wait()
}

func findKubeletSystemdService(provider string, hosts ...string) (bool, error) {
two minor nits:
- maybe "isUsingSystemdKubelet" is a better name for this function?
- you never use the error return value - should this return only bool?
@ixdy Didn't follow what you meant by "never use the error return value"? If there's an error, a message will be printed.
ed5ff44 to ddddb4a
GCE e2e test build/test passed for commit ddddb4aa7d6efbce7fa18cf85db5589475db47fc.
@@ -97,3 +123,29 @@ func logCore(cmds []command, hosts []string, dir, provider string) {
	}
	wg.Wait()
}

func isUsingSystemdKubelet(provider string, hosts ...string) (bool, error) {
there are only two return points from this function on lines 147 and 150, and neither one returns an error.
@ixdy Got you. Removed the returned error. Thanks!
ddddb4a to c3405f5
GCE e2e build/test failed for commit c3405f54ea6d7f563630569520fe78d4cdbf1b3a.
cmds := []command{{"cat /var/log/kube-proxy.log", "kube-proxy"}}

found := isUsingSystemdKubelet(provider, hosts...)
if found {
one more minor nit (sorry): rather than using the variable found, maybe just have
if isUsingSystemdKubelet(provider, hosts...) {
...
} else {
...
}
similarly for the master.
@ixdy done
c3405f5 to f01c621
GCE e2e build/test failed for commit f01c621888ab693cecc10b347d60bb7196fd995d.
@k8s-bot test this please
{"cat /var/log/supervisor/supervisord.log", "supervisord"},
cmds := []command{{"cat /var/log/kube-proxy.log", "kube-proxy"}}
if isUsingSystemdKubelet(provider, hosts...) {
	cmds = append(cmds, command{"sudo journalctl --output=cat -u kubelet.service", "kubelet"})
can we let the name of kubelet.service get passed down somehow rather than hard-code it?
we could factor out "kubelet.service" into a constant, sure. how likely is this to change, though?
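For illustration, a minimal sketch of what factoring the unit name into a constant could look like. The `kubeletService` constant, the `nodeLogCommands` helper, and the field names of `command` are hypothetical (the diff only shows two-string `command{...}` literals), and the non-systemd branch assumes the kubelet log lives at /var/log/kubelet.log as stated in the PR description; this is not the code that was merged.

```go
package main

import "fmt"

// kubeletService is a hypothetical constant holding the unit name that the
// PR currently hard-codes inline.
const kubeletService = "kubelet.service"

// command pairs a shell command with the component whose log it fetches.
// The field names are assumptions made for this sketch.
type command struct {
	cmd       string
	component string
}

// nodeLogCommands is a hypothetical helper that builds the log-fetching
// commands for a node, choosing journalctl when the kubelet runs under systemd.
func nodeLogCommands(systemd bool) []command {
	cmds := []command{{"cat /var/log/kube-proxy.log", "kube-proxy"}}
	if systemd {
		// Under systemd the kubelet log is read from journald.
		cmds = append(cmds, command{"sudo journalctl --output=cat -u " + kubeletService, "kubelet"})
	} else {
		// Otherwise fall back to the log file (path taken from the PR description).
		cmds = append(cmds, command{"cat /var/log/kubelet.log", "kubelet"})
	}
	return cmds
}

func main() {
	for _, c := range nodeLogCommands(true) {
		fmt.Printf("%s: %s\n", c.component, c.cmd)
	}
}
```

With a constant in place, passing the name down later would only need a parameter on the helper rather than edits at each call site.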
GCE e2e test build/test passed for commit f01c621888ab693cecc10b347d60bb7196fd995d.
@derekwaynecarr I feel it's better. Also, if we pass the name of the service, do we still need to check for the kubelet service at runtime?
I think this is probably fine as-is.
@ixdy Sorry, do you mean the current approach (which hardcodes the kubelet.service) is fine, or passing |
Approach as currently implemented (everything hard-coded) is fine with me.
if err != nil {
	fmt.Printf("Error running command: %v\n", err)
}
results[i] = result.Code
This is going to set to 0 if err != nil, is that right/expected?
@jonboulle So in this case we assume the kubelet is running as systemd service even when there are errors during SSH().
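To make the concern concrete, here is a small sketch of the pitfall, assuming the SSH helper returns a zero-valued result alongside the error. `sshResult` and `failingSSH` are hypothetical stand-ins, not the e2e framework's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

// sshResult carries only the field relevant here; the real result type is
// an assumption of this sketch.
type sshResult struct {
	Code int // exit code of the remote command
}

// failingSSH is a hypothetical stand-in for an SSH call to a node that has
// gone away: it returns the zero-valued result together with an error.
func failingSSH(cmd, host string) (sshResult, error) {
	return sshResult{}, errors.New("connection refused")
}

func main() {
	result, err := failingSSH("sudo systemctl status kubelet.service", "node-gone")
	// The error is set, yet Code is 0, the same value a successful
	// "systemctl status" on a running unit would report.
	fmt.Println(err != nil, result.Code) // prints "true 0"
}
```

Because Go zero-initialises the struct, an unchecked `result.Code` cannot distinguish "unit is active" from "could not connect".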
@jonboulle raises a good point. if a node has gone away (and thus this ssh call fails), we don't want that to influence the choice of logging location for the rest of the nodes.
I think we should probably be using only successful ssh connections to detect whether nodes are using systemd or not. Could probably even shortcut - return true/false for the first host that we successfully connect to, rather than checking all. I imagine that's the easiest fix.
@ixdy Actually, SSH failures don't affect this, as the code is still 0 when there is an error. But I made a change: instead of assuming "kubelet is running as systemd service", we now assume "kubelet is not running as systemd service" and look for the first successful detection of kubelet.service.
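A rough sketch of the "default to false, return on the first successful detection" behaviour discussed in this thread. It is sequential for brevity, whereas the PR's diff fills a `results` slice and waits on a WaitGroup; `sshFunc`, `sshResult`, and `fakeSSH` are hypothetical stand-ins for the e2e SSH helper, so treat this as an illustration under those assumptions rather than the merged code.

```go
package main

import (
	"errors"
	"fmt"
)

// sshResult carries only the exit code; the real result type is an assumption.
type sshResult struct {
	Code int
}

// sshFunc is a hypothetical stand-in for the e2e SSH helper.
type sshFunc func(cmd, host, provider string) (sshResult, error)

// isUsingSystemdKubelet assumes the kubelet is NOT a systemd service and
// returns true on the first host that answers the query with exit code 0.
// Hosts that cannot be reached are logged and skipped.
func isUsingSystemdKubelet(run sshFunc, provider string, hosts ...string) bool {
	const cmd = "sudo systemctl status kubelet.service"
	for _, host := range hosts {
		result, err := run(cmd, host, provider)
		if err != nil {
			fmt.Printf("Error running command on %s: %v\n", host, err)
			continue // a failed connection must not count as a detection
		}
		if result.Code == 0 {
			return true
		}
	}
	return false
}

// fakeSSH simulates three kinds of host for the usage example below.
func fakeSSH(cmd, host, provider string) (sshResult, error) {
	switch host {
	case "node-gone":
		return sshResult{}, errors.New("connection refused")
	case "node-systemd":
		return sshResult{Code: 0}, nil
	default:
		return sshResult{Code: 3}, nil // non-zero: unit not active or not found
	}
}

func main() {
	fmt.Println(isUsingSystemdKubelet(fakeSSH, "gce", "node-gone", "node-systemd"))
	fmt.Println(isUsingSystemdKubelet(fakeSSH, "gce", "node-gone", "node-other"))
}
```

Injecting the SSH function keeps the detection logic testable without a cluster; the unreachable host affects neither outcome.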
@ixdy So is this OK to merge, or do you have more concerns?
ping @ixdy ?
f01c621 to 3b66179
If there is any successful detection of kubelet.service, fetch the kubelet logs using journalctl.
@ixdy Updated.
3b66179 to fb5ee73
LGTM, thanks
GCE e2e test build/test passed for commit 3b66179039f36f10722c08dfb034db588f2ba603.
GCE e2e test build/test passed for commit fb5ee73.
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]
GCE e2e test build/test passed for commit fb5ee73.
Automatic merge from submit-queue
Auto commit by PR queue bot
When the kubelet is running as a systemd service, we need to get the kubelet log from journald instead of /var/log/kubelet.log.
I tested by running some Jenkins e2e jobs against my own coreos/e2e branch. This works. The kubelet logs can be found here (kube-proxy log is still empty, because I haven't run them as static pods on that branch yet).
cc @dchen1107 @jonboulle @ixdy @zmerlynn