
Mark /var/log/kops-controller.log as owned by kops-controller user #8454

Closed

Conversation

@justinsb (Member) commented Jan 31, 2020

Mark /var/log/kops-controller.log as owned by kops-controller user

This allows it to be written from the kops-controller pod, which runs as user 1001.

@k8s-ci-robot added the cncf-cla: yes and size/M labels Jan 31, 2020
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label Jan 31, 2020
@justinsb (Member, Author)

/hold

For discussion in office hours

@k8s-ci-robot added the do-not-merge/hold label Jan 31, 2020
@rifelpet (Member)

We're definitely seeing E2E failures now because of this issue: https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-kops-aws-1-18/1223377399593308163

I was under the impression that the periodic e2e jobs use the same kops-controller and dns-controller image tags as were built with the kops binary being tested. If that's not the case, it'd be great if we could achieve that.

@justinsb (Member, Author) commented Feb 1, 2020

@rifelpet thanks for confirming. The e2e jobs are supposed to use the local version, but I believe it's because I was using an old tag for kops-controller, and so they were pulling that old version from dockerhub instead of the version we were preloading.

I liked the suggestion of logging the version & SHA in kops-controller. We should probably also have an e2e test to stop me from making this mistake again.

As we discussed in office hours, I'm going to update kops-controller to use UID 1001, and then this file can be granted much tighter permissions.
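
For illustration, the arrangement described here might look roughly like the following pod spec fragment. This is a minimal sketch, not the PR's actual manifest; the image tag and the exact hostPath settings are placeholders.

```yaml
# Sketch: kops-controller runs as UID 1001 and writes to a hostPath logfile
# that nodeup is expected to have created, owned by the kops-controller user.
spec:
  containers:
    - name: kops-controller
      image: kope/kops-controller:placeholder   # placeholder image/tag
      securityContext:
        runAsUser: 1001   # the unprivileged kops-controller user
      volumeMounts:
        - name: logfile
          mountPath: /var/log/kops-controller.log
  volumes:
    - name: logfile
      hostPath:
        path: /var/log/kops-controller.log   # assumed to be pre-created by nodeup with tightened ownership
```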

@justinsb force-pushed the kops_controller_precreate_logfile branch from 6fb31c3 to 762c7b9 on February 1, 2020 13:01
@justinsb changed the title from "Mark /var/log/kops-controller.log as (world) writeable" to "Mark /var/log/kops-controller.log as owned by kops-controller user" Feb 1, 2020
@justinsb force-pushed the kops_controller_precreate_logfile branch 3 times, most recently from c08668c to 17837fc, on February 1, 2020 13:52
@k8s-ci-robot added the size/L label and removed the size/M label Feb 1, 2020
@justinsb force-pushed the kops_controller_precreate_logfile branch from 17837fc to ed4c980 on February 1, 2020 13:56
@rifelpet (Member) commented Feb 1, 2020

This doesn't apply to dns-controller because it doesn't use a hostPath for logging, but for kops-controller is there any concern about the upgrade process?

  1. The cluster is running an old version of kops-controller that still ran as the root user (using a kops alpha release).
  2. kops is upgraded; update cluster --yes causes the new kops-controller manifest to be applied. The new kops-controller pod runs on an old node that doesn't have the log file pre-created by nodeup on the host; the permissions issue causes the pod to crash-loop.
  3. A rolling-update is performed. New nodes aren't labeled correctly until the node that kops-controller was running on is drained and kops-controller is scheduled to a new node.

Is that a problem, or are we not worried about it because there hasn't been a stable release that includes kops-controller yet?

@rifelpet (Member) commented Feb 1, 2020

/test pull-kops-e2e-kubernetes-aws

@hakman (Member) commented Feb 1, 2020

/test pull-kops-e2e-kubernetes-aws

@justinsb (Member, Author) commented Feb 1, 2020

kops-controller is a daemonset, but I think you're right: once we run kops update, the manifest on S3 gets updated, the daemonset gets updated, and it can't start because of permissions. We should tackle that at some stage (we should not apply the update until we're ready for it), but that is the case today.

Ideas welcome :-)

return nil
}

// We run kops-controller under an unprivileged user (1001), and then grant specific permissions
A reviewer (Member) commented on the line above:


Suggested change
// We run kops-controller under an unprivileged user (1001), and then grant specific permissions
// We run kops-controller under an unprivileged user (10011), and then grant specific permissions

@rifelpet (Member) commented Feb 1, 2020

My only idea would be running kops-controller as static pods on the masters; that way the manifest is only updated at instance-launch time. I'm not sure if it's worth the effort though, or what other problems using static pods would solve or create.

@johngmyers (Member)

Didn't we want to set the DaemonSetUpdateStrategy to OnDelete?

Static pods would tie the version of kops-controller more strongly to the version of kops used to create the master. I think that would be a good thing.

Unfortunately, we don't get much testing of mixed-version clusters.
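
For reference, the OnDelete strategy mentioned above is a small change to the DaemonSet spec; a minimal sketch (selector and pod template omitted):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kops-controller
  namespace: kube-system
spec:
  updateStrategy:
    type: OnDelete   # pods pick up a new template only after the old pod is deleted
  # selector and pod template omitted for brevity
```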

@johngmyers (Member)

Could someone remind me why kops-controller needs a special log file?

@justinsb (Member, Author) commented Feb 2, 2020

Trying to catalog the possibilities here:

  • Move the DaemonSet to OnDelete / fix things so that updates aren't applied immediately: we should do this, but it won't help us here where we're upgrading from an older version - unless we're very careful.
  • Move to static pods: we probably could do this here, because it's not really separable. But again this is difficult from the point of view of upgrading, and we're also trying to make everything more composable and explicit.
  • Stop using a non-privileged user-id: but separate users are good for security, and it doesn't look like user namespaces are coming soon (Support User Namespaces in pods, enhancements#127).
  • Stop logging to the logfile: but this is a system component and we don't want to assume that logs we send to stdout in a pod are being captured.
  • Use an initContainer (running as root) to change the ownership of the logfile. I like this approach because there's less dependency on nodeup setting things up, so it feels scalable. It is a container running as root, but my intuition is that as long as it runs an explicit command / script with a well-known image, that would actually be better than relying on e.g. nodeup - the ownership permissions would be more explicit.

I'm going to try the initContainer, therefore!
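
For illustration, such an initContainer might look roughly like this fragment of the pod spec; the image and command are assumptions, not necessarily what this PR ends up using:

```yaml
# Sketch: a root initContainer that hands the hostPath logfile to the unprivileged UID.
initContainers:
  - name: init-logfile-owner
    image: busybox:1.31   # assumed; any small image providing chown would do
    command:
      - sh
      - -c
      - chown 1001 /var/log/kops-controller.log
    securityContext:
      runAsUser: 0   # root, only for this one-off chown
    volumeMounts:
      - name: logfile
        mountPath: /var/log/kops-controller.log
```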

This allows it to be written from the kops-controller pod, which runs as user 1001.
@justinsb justinsb force-pushed the kops_controller_precreate_logfile branch from ed4c980 to 929d3e0 Compare February 2, 2020 13:14
@rifelpet (Member) commented Feb 2, 2020

I like the idea of having more logic in manifests, because the differences during a kops upgrade can be previewed with “kops update cluster”, compared to nodeup changes, which can't be previewed.

It looks like this PR has the init container chowning the file but nodeup still creating the file with the same owner. Is that redundant or am I missing something?

@johngmyers (Member) commented Feb 2, 2020

I can see three upgrade problems with OnDelete:

  • If upgrading from a version that doesn't have the DaemonSet, pods will start up on the old nodes and presumably crash. But they might do something wrong.
  • If a pod gets deleted prior to the node, the replacement will be a new pod on an old node
  • If the upgrade fails and the user reverts to the older kops, the older kops will not downgrade the DaemonSet (this is not specific to OnDelete)

For static pods, the only upgrade problem I see is the difficulty in switching from DaemonSet to static pods and vice versa. If we start off with static pods then I don't see the problem.

@johngmyers (Member)

I'm still trying to figure out why the logging requirements are different for kops-controller versus, say, dns-controller. kops-controller presumably isn't going to be managing the container runtime, so if the container runtime isn't capturing pod logs it would seem prudent to address that first before worrying about kops-controller.

@olemarkus (Member)

I am also curious about the need to handle these logs specifically. Logging to the host is very useful for components whose failure prevents the cluster's K8s API from working; kube-apiserver, etcd-manager, etc. are certainly in that group. But it is not clear to me that kops-controller is. If kops-controller is a non-static pod, I assume the cluster must be working well enough for the API to be working (and thus logs are available through the API).

@justinsb (Member, Author) commented Feb 2, 2020

In terms of log capture, kops-controller is probably a middle-ground. The logs should also be available from kubectl logs, but we also likely want to retain them, which isn't normally the case with most pods.

Perhaps we should have made it a static pod, but that's a much bigger change right now. Realistically if we don't want to delay the release train I think we need to make it work with a DaemonSet.

On the question of whether the nodeup chown is now superfluous - yes. We probably should still create the user so we get a nice username. Pre-creating the logfile is no longer required - not sure if we should remove it or not. Input welcome!

@johngmyers (Member)

The thing about retaining logs on the host is that they are lost when the instance terminates. For log retention we just exfiltrate them to the centralized log retention and search subsystem, just like we do for ordinary pods.

If we are going to make it a static pod, it would be a lot easier to do so if we can do it before releasing a version with it as a DaemonSet.

@olemarkus (Member)

If kops-controller doesn't have to be static, I wouldn't make it static.
I am with @johngmyers on retaining logs. Seems simpler and more resilient to just handle logging with fluentd or similar like any other important pod. An exercise for the cluster admin.

@hakman (Member) commented Feb 3, 2020

Writing a log directly to disk may be problematic in the long run. Depending on size and lifecycle, it may need to be rotated, compressed, or deleted. Usually it's more trouble than it's worth. We also recommend that our customers ship their logs somewhere.

@rifelpet (Member) commented Feb 3, 2020

The reason I had added hostPath logging for kops-controller was to troubleshoot issues with the E2E jobs, and the e2e framework that dumps logs into job artifacts only supports log files on the host or journalctl commands. I figured we could follow suit with the other kube-system pods that kops sets up for clusters.

Thinking about it more though, since kops-controller only uses the k8s api there may not be any scenarios where a user would not be able to fetch logs through the k8s api yet the kops-controller logs would be relevant to the problem. If the API is inaccessible, kops-controller will not be the root cause (in its current state of purely labeling nodes at least. As we extend kops-controller functionality that may change).

FWIW I believe logrotate does handle the kops-controller.log automatically. If users would prefer not to have the host logs, we can certainly remove it and instead add pod log dumping support to the e2e framework.

We may want to consider what kops-controller will be responsible for in the future and if hostPath logs would be useful in future failure modes.

@rifelpet (Member) commented Feb 3, 2020

I'm also curious why the presubmit E2E jobs are only occasionally failing with this issue. Here's a failing job due to kops-controller not being Ready, but a later retry (check the PR history link) succeeds.

@justinsb (Member, Author) commented Feb 3, 2020

I do think there are two things here, and I think they are mostly independent decisions:

  • Should it be a DaemonSet or a static pod
  • Should we log to a file

I like the idea of looking at what functionality ends up in kops-controller. I imagine like kube-controller-manager it will end up as the "catch-all" place where miscellaneous kops server components live. I'd guess:

  • Node bootstrapping, both to reduce the requirement for nodes to access S3 and to implement secure kubelet bootstrap (as we have optionally on AWS but not elsewhere).
  • Running the controllers for addon operators (assuming we can figure out how to replace them with external controllers)
  • Probably some kops-specific controllers for the cluster-api

I would imagine that in general kops-controller can rely on kube-apiserver / kube-controller-manager / kube-scheduler being up (so logs should work), but it might be responsible for bringing up DNS, load balancers or networking (so kubectl logs might not work from outside the cluster).


Logging to a file doesn't cost us much, particularly as we've already implemented logrotate. We do have to deal with permissions, but I think that applies to static pods also, and we will want to lock down FS access permissions generally for anything we hostPath mount as we move to non-root containers.

While I agree it would be great if everybody ran some form of log aggregator, there isn't one that is widely used AFAIK, so (despite the limitations) I'm wary of just relying on kubectl logs.

That said, if others think hostPath isn't worth it, then we can certainly just stop doing hostPath logging here. I think that unblocks the releases. I think we can also choose to reintroduce it later.

I personally think the logfile is the less important decision.


On static pods vs daemonsets, this is much trickier. The big differences are:

  • A daemonset today is less tightly controlled so it rolls out immediately, whereas a static pod rolls out in the controlled fashion that we expect. (We could address this using nodeSelectors - see the sketch at the end of this comment.)
  • A static pod today is specified in nodeup code, so it's a little harder to update, though arguably this is what gives us the control. We could address this by putting the manifest into the state store or even the channel.
  • Daemonsets can use automatic service accounts / secrets / configmap mounts. This is convenient, but we can also just read configmaps / secrets directly.

The biggest reason to stick with the daemonset is that it's what we've got, and switching to a static pod would delay our releases (notably 1.16.0). But OTOH I agree that it's much easier to change now, before we've done a stable release.
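
As an illustration of the nodeSelector idea above, the DaemonSet could be gated on a label that nodes only carry once they are prepared for the new version. The label key below is purely hypothetical, and the image is a placeholder:

```yaml
# Sketch: schedule kops-controller only onto nodes that nodeup has prepared for it.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kops-controller
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kops-controller
  template:
    metadata:
      labels:
        k8s-app: kops-controller
    spec:
      nodeSelector:
        kops.k8s.io/kops-controller-ready: "true"   # hypothetical label applied by nodeup
      containers:
        - name: kops-controller
          image: kope/kops-controller:placeholder   # placeholder image/tag
```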

@justinsb (Member, Author) commented Feb 3, 2020

Ah - and I think the reason for the flaky behaviour is that the test sometimes catches the pod when it has just started, before it tries to write to the logfile. I think this is because we're missing readiness / liveness probes on the pod (though that depends on the details of the e2e test logic).

@hakman (Member) commented Feb 3, 2020

> Ah - and I think the reason for the flaky behaviour is that the test sometimes catches the pod when it has just started, before it tries to write to the logfile. I think this is because we're missing readiness / liveness probes on the pod (though that depends on the details of the e2e test logic).

I saw similar behaviour when DNS was failing for older versions. I think our main problem here is that we don't validate the cluster before starting the tests. The only validation is checking whether all nodes are in the Ready state.

https://github.com/kubernetes/test-infra/blob/16e19d8614072954e3760f0a4792868bc2b5af3f/kubetest/kops.go#L447

@johngmyers (Member) commented Feb 3, 2020

So a third issue: I very much dislike the security implications of running a grab-bag full of addon operators, possibly maintained by other teams, in the same address space as code that holds the keys to the kingdom. I would much prefer if such operators ran separately, with only the privileges they need.

@justinsb (Member, Author) commented Feb 3, 2020

On the third issue, because of the limitations of RBAC, addon-operators typically end up with very broad permissions. There is an intriguing possibility to implement something more precise than RBAC, in which case we could run that as a webhook authorizer in kops-controller and split the addon-operators into a less privileged binary (or binaries).

@johngmyers (Member)

I think I'm primarily concerned with the private keys that node-authorizer needs. I think it's fine to run node-controller in the same process since they both deal with node identity and there's little risk of expanding the blast radius, but I'm wary of adding more.

@johngmyers (Member)

If kops-controller has a root-owned in-container process for the purpose of keeping a node-level logfile, I would probably have to maintain a patch in our kops fork taking it out. Containers that can run root-owned processes are highly visible to admission controllers and I doubt I'd be able to justify its presence.

@justinsb (Member, Author) commented Feb 4, 2020

So I do want to keep the releases going; I'm a little unclear what people are proposing.

I think there are 3 options:

  • Move to a static pod (& keep logfile, keep non-root uid). The highest risk / longest delay option.
  • Stop logging to logfile, keep non-root uid, keep daemonset. Maybe we can tweak the e2e jobs to capture pod logs, if this is the main use-case today for the logfile. We can also try to make a recommendation on a good OSS logfile capturing process.
  • Stop using non-root uid, keep logfile, keep daemonset. I imagine we'd want to reintroduce the non-root uid, but we can probably do so when it doesn't add to our release backlog.

Personally I think compromising by turning off the logfile is the best option:

  • introducing the non-root uid doesn't become much easier in future
  • non-root uid + hostPath is hard (which pushes us away from static pods)
  • static pods also make it harder to force the leadership, which is something we likely want to do in future during rolling updates (though we don't have a natural mechanism for this with either static or non-static pods)
  • we should be able to get the logfile functionality by other means, using some form of log-capture solution

Separately, it sounds like we also want to create more controller processes rather than putting more functionality into kops-controller. In particular we should avoid giving access to the CA key, and we should avoid embedding third party addon operators that might be a vector for less vetted code. Those seem like reasonable goals and we can be mindful of them in the other PRs - we haven't merged anything too problematic yet (that I know of!)

@rifelpet (Member) commented Feb 4, 2020

I also think just turning off the log file is the best short term solution (keep the DaemonSet, remove the sidecar container). This would unblock the releases and the master branch and give us time to discuss potential alternative solutions, most of which I believe have reasonable migration paths. Since the current kops-controller relies on api-server being up, only having access to its logs via kubectl logs is acceptable for kops 1.16.0.

We may want to consider adding liveness and/or readiness probes to the DaemonSet though.

I can look into possible options for E2E log dumping as well as fixing the validation race condition.
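
A readiness / liveness probe along the lines suggested above might look like this container fragment. The health endpoint and port are assumptions for illustration, not kops-controller's actual values:

```yaml
containers:
  - name: kops-controller
    image: kope/kops-controller:placeholder   # placeholder image/tag
    livenessProbe:
      httpGet:
        path: /healthz   # assumed health endpoint
        port: 3988       # assumed port
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /healthz   # assumed health endpoint
        port: 3988       # assumed port
      periodSeconds: 5
```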

justinsb added a commit to justinsb/kops that referenced this pull request Feb 4, 2020
Writing to a hostPath from a non-root container requires file
ownership changes, which is difficult to roll out today.  See
discussion in kubernetes#8454

We were primarily using the logfile for e2e diagnostics, so we're
going to look into collecting the information via other means instead.
justinsb added a commit to justinsb/kops that referenced this pull request Feb 4, 2020
Writing to a hostPath from a non-root container requires file
ownership changes, which is difficult to roll out today.  See
discussion in kubernetes#8454

We were primarily using the logfile for e2e diagnostics, so we're
going to look into collecting the information via other means instead.

We also haven't yet shipped this logfile in a released version (though
we have shipped it in beta releases)
johngmyers pushed a commit to johngmyers/kops that referenced this pull request Feb 6, 2020
Writing to a hostPath from a non-root container requires file
ownership changes, which is difficult to roll out today.  See
discussion in kubernetes#8454

We were primarily using the logfile for e2e diagnostics, so we're
going to look into collecting the information via other means instead.

We also haven't yet shipped this logfile in a released version (though
we have shipped it in beta releases)
@k8s-ci-robot (Contributor)

@justinsb: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-rebase label Feb 15, 2020
@justinsb (Member, Author)

We did #8467 instead

/close

@k8s-ci-robot (Contributor)

@justinsb: Closed this PR.

In response to this:

We did #8467 instead

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
