
NodeProblem API #23028

Closed
dchen1107 opened this issue Mar 16, 2016 · 28 comments
@dchen1107
Member

We should introduce NodeProblem to the Kubernetes Node API, so that different daemons on the node can report machine health issues to the control plane through the Kubelet. The machine health issues include bad CPU, bad memory, kernel deadlocks, corrupted filesystems, etc. Once the control plane has visibility into those issues, we can discuss the remedy system.

@dchen1107 dchen1107 added priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node. team/api labels Mar 16, 2016
@dchen1107 dchen1107 added this to the next-candidate milestone Mar 16, 2016
@gmarek
Contributor

gmarek commented Mar 21, 2016

+1 - I'd like to be kept up to date with this effort.

@liggitt
Member

liggitt commented Mar 21, 2016

As we build integrations to the node API from other components, don't forget that things using the kubelet API may need to be able to provide auth (client cert, bearer token, etc.).
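
For illustration only, here is a minimal Go sketch of an out-of-tree component calling a kubelet HTTPS endpoint with a bearer token; the node address, port, token path, and the decision to skip TLS verification are all assumptions, not anything prescribed in this thread.

```go
// Sketch: calling a kubelet HTTPS endpoint with a bearer token.
// The node address, port (10250), and token path are assumptions for illustration.
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
	"strings"
)

func main() {
	token, err := os.ReadFile("/var/run/secrets/kubernetes.io/serviceaccount/token")
	if err != nil {
		panic(err)
	}

	client := &http.Client{Transport: &http.Transport{
		// A real integration should verify the kubelet's serving certificate instead.
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}}

	req, _ := http.NewRequest("GET", "https://10.0.0.1:10250/healthz", nil)
	req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(string(token)))

	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}
```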

@alex-mohr
Contributor

FWIW, it looks like on GCE nodes can report as healthy even if the route controller has not yet fully programmed their route into GCE's network. Unsure if that warrants a separate issue, or whether it should be solved on the control-plane side by unioning node-reported readiness with environment-of-node readiness?

@gmarek
Contributor

gmarek commented Mar 22, 2016

@alex-mohr - IIUC this is an umbrella issue for a v1.3 feature. The GCE route thingy is a bug that we need to fix sooner. I filed #23327

@mikedanese
Member

@klizhentas

@Random-Liu
Member

Random-Liu commented Apr 18, 2016

Ref #23028, #24295

@dchen1107 dchen1107 modified the milestones: v1.3, next-candidate Apr 28, 2016
@dchen1107
Member Author

We already have Event and NodeCondition today to report some issues on the node. A Taint API object has also been proposed and is under review. I don't want to introduce another API object blindly here. Instead, we should consolidate those objects or have a clear definition for each of them.

@Random-Liu and I had an API-related discussion with @bgrant0607 and @davidopp. We settled on:

  • No new API object is introduced for now.
  • NodeProblemDetector (the newly introduced daemon here) will use either Event or NodeCondition to carry a node problem. Most likely a transient problem will be reported as an event, and a permanent problem as a condition, like what we have today for out-of-resource (memory and disk); see the sketch after this list.
  • In the long term, the scheduler will check only taints, not conditions at all. A separate controller will be introduced to the ecosystem to convert problem-related events and conditions into taints. But this is out of scope for the 1.3 release.
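
To make the Event-vs-NodeCondition split above concrete, here is a minimal Go sketch using today's k8s.io/api/core/v1 types (which post-date this discussion); the KernelDeadlock condition type, the reasons, and the messages are illustrative assumptions rather than the agreed design.

```go
// Sketch only: how a detector might shape a transient problem as an Event
// and a permanent problem as a NodeCondition. Types are from k8s.io/api/core/v1;
// the "KernelDeadlock" condition type and the reasons are illustrative assumptions.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Transient problem: report as an Event attached to the Node object.
	evt := v1.Event{
		ObjectMeta:     metav1.ObjectMeta{GenerateName: "node-problem-", Namespace: "default"},
		InvolvedObject: v1.ObjectReference{Kind: "Node", Name: "node-1"},
		Type:           v1.EventTypeWarning,
		Reason:         "OOMKilling", // illustrative
		Message:        "Kill process 1234 (stress) score 999 or sacrifice child",
		Source:         v1.EventSource{Component: "node-problem-detector", Host: "node-1"},
		FirstTimestamp: metav1.Now(),
		LastTimestamp:  metav1.Now(),
		Count:          1,
	}

	// Permanent problem: report as a NodeCondition in NodeStatus.
	cond := v1.NodeCondition{
		Type:               v1.NodeConditionType("KernelDeadlock"), // illustrative
		Status:             v1.ConditionTrue,
		Reason:             "AUFSUmountHung", // illustrative
		Message:            "kernel: task umount.aufs:21568 blocked for more than 120 seconds.",
		LastHeartbeatTime:  metav1.Now(),
		LastTransitionTime: metav1.Now(),
	}

	fmt.Printf("event: %s: %s\ncondition: %s=%s\n", evt.Reason, evt.Message, cond.Type, cond.Status)
}
```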

@Random-Liu and I had several design discussions today. Here is the current plan for the 1.3 release:

  • We are going to provide a very simple and dumb NodeProblemDetector as a reference implementation.
  • The detector will run as a separate daemon on the node.
  • The daemon will only detect issues from the kernel log (a rough sketch of that kind of matching follows at the end of this comment). :-)
  • The daemon will use the watch API to propagate problem events or conditions to upstream layers. We initially had some concern about the API server's scalability, since this design introduces another watch channel per node. After talking to @lavalamp, we think it is acceptable for the 1.3 release.
  • In the long term, more problem daemons might be introduced to the node to detect different failures (for example, hardware failures), and the API server's scalability might then become a real problem. We will reconsider the architecture of the problem-reporting pipeline at that point, but that is an implementation detail from the users' perspective. We already have several alternative proposals to address the issue.

@Random-Liu is working on the detailed design for this. Thanks!
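
As a rough illustration of the kernel-log-only detection planned above (and not the actual node-problem-detector code), a detector could match kernel log lines against regex rules and classify each hit as a transient event or a permanent condition; the rules and sample log lines below are made up.

```go
// Rough sketch of regex-based kernel log matching; not the real
// node-problem-detector implementation. Rules and log lines are examples only.
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strings"
)

type problemKind int

const (
	transientEvent     problemKind = iota // would be reported as an Event
	permanentCondition                    // would be reported as a NodeCondition
)

type rule struct {
	kind    problemKind
	reason  string
	pattern *regexp.Regexp
}

var rules = []rule{
	{transientEvent, "OOMKilling", regexp.MustCompile(`Kill process \d+ \(.+\) score \d+ or sacrifice child`)},
	{permanentCondition, "KernelDeadlock", regexp.MustCompile(`task \S+ blocked for more than \d+ seconds`)},
}

func main() {
	kernelLog := `[  100.1] Kill process 1234 (stress) score 999 or sacrifice child
[  200.2] INFO: task umount.aufs:21568 blocked for more than 120 seconds.`

	scanner := bufio.NewScanner(strings.NewReader(kernelLog))
	for scanner.Scan() {
		line := scanner.Text()
		for _, r := range rules {
			if r.pattern.MatchString(line) {
				fmt.Printf("matched %q -> reason=%s kind=%d\n", line, r.reason, r.kind)
			}
		}
	}
}
```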

@dchen1107
Member Author

cc/ @jlowdermilk for our later discussion on GKE node health tracking.

@davidopp
Member

The daemon will use the watch API to propagate problem events or conditions to upstream layers.

What is the daemon watching? It sounds like the daemon is write-only.

BTW, an advantage of NodeCondition over Event is that NodeCondition can be piggybacked on existing NodeStatus updates (though of course it makes the NodeStatus object larger). But anyway, I agree events are reasonable if you are essentially just proxying kernel log messages.

@Random-Liu
Member

Random-Liu commented Apr 29, 2016

What is the daemon watching? It sounds like the daemon is write-only.

@davidopp The daemon is watching the current NodeConditions (part of NodeStatus), because to update a NodeCondition it needs to know the newest conditions.
However, a simple Get before Update, or a Patch, may also work; see the sketch below. :)
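
For reference, here is a minimal sketch of both approaches mentioned above (Get before UpdateStatus, and a Patch of the status subresource) using a modern client-go, whose API differs from the 2016-era client; the node name and the KernelDeadlock condition are illustrative assumptions.

```go
// Sketch of "Get before update" and "Patch" for a node condition, using a
// modern client-go. Node name and the KernelDeadlock condition are illustrative.
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	// Option 1: Get the latest Node, modify its conditions, then UpdateStatus.
	node, err := client.CoreV1().Nodes().Get(ctx, "node-1", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	node.Status.Conditions = append(node.Status.Conditions, v1.NodeCondition{
		Type:               v1.NodeConditionType("KernelDeadlock"),
		Status:             v1.ConditionTrue,
		Reason:             "AUFSUmountHung",
		LastHeartbeatTime:  metav1.Now(),
		LastTransitionTime: metav1.Now(),
	})
	if _, err := client.CoreV1().Nodes().UpdateStatus(ctx, node, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}

	// Option 2: Patch the status subresource directly (strategic merge patch).
	patch := []byte(`{"status":{"conditions":[{"type":"KernelDeadlock","status":"True","reason":"AUFSUmountHung"}]}}`)
	if _, err := client.CoreV1().Nodes().Patch(ctx, "node-1", types.StrategicMergePatchType, patch, metav1.PatchOptions{}, "status"); err != nil {
		panic(err)
	}
	fmt.Println("node condition updated")
}
```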

@davidopp
Member

I am not suggesting using NodeStatus here; I just wanted to mention it.

@Random-Liu
Member

@davidopp Updated the comment above to make it less ambiguous.

@gmarek
Contributor

gmarek commented Apr 29, 2016

Events as in api.Event? I thought those are for human consumption only. cc @bgrant0607

@zmerlynn
Member

Is this landing for v1.3? I was assuming it would be the solution for #24295, but if that's not the case, that bug will need a different mitigation.

@Random-Liu
Member

Random-Liu commented May 11, 2016

@zmerlynn The node-problem-detector will only monitor the kernel log for 1.3. We may support simple Docker problem detection in the future, and the node-problem-detector is easy to extend to do that.
Is there any symptom in the kernel log for #24295?
[ 6882.619004] aufs au_opts_verify:1570:docker[6990]: dirperm1 breaks the protection by the permission bits on the lower branch
seems quite noisy to me; it fills up my kernel log. :)

@zmerlynn
Member

Nope. And yeah, that log is all over the place for basically any running container.

@dchen1107
Member Author

Re #24295: I updated it at #24295 (comment). That issue shouldn't be handled by the node problem detector.

@Random-Liu
Member

FYI, the first version of the node problem detector is ready for review: kubernetes/node-problem-detector#1

@davidopp
Member

Since kubernetes/node-problem-detector#1 has been merged, can we remove the v1.3 milestone?

@davidopp
Member

davidopp commented Jun 3, 2016

ping

@Random-Liu
Member

Random-Liu commented Jun 3, 2016

@davidopp This is almost done; only a few things are left:

  1. Documentation for node problem detector: Write Readme.md for NodeProblemDetector node-problem-detector#4, Write Readme.md for KernelMonitor node-problem-detector#5

The PRs for 1) and 2) are under review, and 3) and 4) will be sent out soon. 4) should not be a blocker for this issue. :)

@dchen1107
Member Author

I removed the control-plane team label, since this issue only covers node-level changes to make problems visible to the upstream layers. There is no remedy controller built for the 1.3 release.

@Random-Liu
Member

@dchen1107 All things are done. Can we close this one?

@matchstick
Contributor

Closing as we scrub bugs for 1.3. If it needs to be re-opened please do so for next-candidate.

@pwittrock
Member

@Random-Liu
@dchen1107

Would you provide an update on the status of the documentation for this feature, and add any PRs as they are created?

Not Started / In Progress / In Review / Done

Expected Merge Time

Thanks
@pwittrock

@Random-Liu
Member

Random-Liu commented Jun 17, 2016

@pwittrock I have documentation for this feature, but I'm not sure whether or where we should put it in kubernetes.io.

@Random-Liu
Member

@pwittrock @dchen1107 The document is In Review now: kubernetes/website#696.

@Random-Liu
Member

@pwittrock Doc Status Upgrade: The doc is Done now. :)
