Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for running standalone #49

Merged

Conversation

andyxning
Copy link
Member

@andyxning andyxning commented Dec 14, 2016

Add support for running node-problem-detector standalone.

This PR adds two command line options

  • in-cluster-config configures whether we deploy NPD in cluster as a daemon set or on a bare metal standalone. Default to be true, i.e., running in cluster as a daemon set.
  • apiserver-addr is used to specify the apiserver address when in-cluster-config is set to false.

Finally, we can run NPD in standalone like this

node-problem-detector -in-cluster-config=false -apiserver-addr=APISERVER_ADDRESS -kernel-monitor=KERNEL_MONITOR_CONFIG_FILE

This change is Reviewable

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Dec 14, 2016
@andyxning
Copy link
Member Author

@Random-Liu PTAL.

@Random-Liu
Copy link
Member

@andyxning Thanks for the change! We also planned to provide a systemd service version of NPD, this would be quite useful!

Will review today.

@Random-Liu
Copy link
Member

Random-Liu commented Dec 15, 2016

@andyxning I've got another idea. Can we import and reuse the heapster code https://github.com/kubernetes/heapster/blob/96d9dbf9ee20c200bb7703a8dff0b5ea7c1d1ba1/common/kubernetes/configs.go?

In this way, we can avoid writing the code twice, and keep the ux the same with heapster. :)

And fix #27 and #21 at the same time.

@andyxning
Copy link
Member Author

andyxning commented Dec 16, 2016

@Random-Liu Yep. Indeed i just read the code in config.go and simply it. :)
I will refactor the code like to reuse package in heapster.

BTW, do you think that we need to port all the supporting parameters that heapster supports.

@Random-Liu
Copy link
Member

Random-Liu commented Dec 16, 2016

@andyxning Yeah, I think we have the same requirement here. https://github.com/kubernetes/heapster/blob/96d9dbf9ee20c200bb7703a8dff0b5ea7c1d1ba1/docs/source-configuration.md

We may want to add a flag like:

--apiserver=http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&useServiceAccount=true&auth=

@andyxning
Copy link
Member Author

@Random-Liu OK. Will refactor code ASAP.

@andyxning andyxning force-pushed the add_support_for_running_standalone branch 9 times, most recently from b7076d3 to 1848b91 Compare December 16, 2016 15:28
detecting node problems and reporting them to apiserver. Now it is running as
layers in cluster management stack.

`node-problem-detector` can either run as a [DaemonSet](http://kubernetes.io/docs/admin/daemons/) or run standalone on bare metals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/node-problem-detector/it
or
s/node-problem-detector/node-problem-detector

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


`node-problem-detector` can either run as a [DaemonSet](http://kubernetes.io/docs/admin/daemons/) or run standalone on bare metals.

`node-problem-detector` detects node problems and reporting them to apiserver.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this above, following "layers in cluster management stack".

"It is a daemon runs on each node, detects node problems and reports them to apiserver."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


`node-problem-detector` detects node problems and reporting them to apiserver.

Now it is running as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra new line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -68,6 +79,9 @@ spec:
spec:
containers:
- name: node-problem-detector
command:
- "/node-problem-detector"
- "-apiserver=http://APISERVER_IP:APISERVER_PORT?inClusterConfig=true&apiVersion=v1"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to use the default command.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

)

func validateCmdParams() {
if len(*apiServer) == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If apiserver-override is not set, we could just pass an empty url to heapster util, and it will return the default in cluster configuration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

cfg, err := restclient.InClusterConfig()

// error is ignored because we have checked it after command line is parsed.:)
uri, _ := url.Parse(apiServer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because the function is called NewClientOrDie, I'm fine to panic or fatal here. :)

Copy link
Member Author

@andyxning andyxning Dec 21, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Random-Liu Just as the comment said, the error is ignored. Because the apiServer has been validated after command line argument is parsed in node_problem_detector.go's validateCmdParams function. It will fatal in there if the format of apiServer URI is invalid.

This will it fail faster if some precondition is not meet.

)

var (
kernelMonitorConfigPath = flag.String("kernel-monitor", "/config/kernel_monitor.json", "The path to the kernel monitor config file")
apiServer = flag.String("apiserver", "", "URI used to connect to Kubernetes ApiServer")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to apiserver-override.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

a [Kubernetes Addon](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
enabled by default in the GCE cluster.

# Command Line Arguments
* `apiserver`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we rename this to -apiserver-override?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

a [Kubernetes Addon](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
enabled by default in the GCE cluster.

# Command Line Arguments
* `apiserver`
`apiserver` command line argument can customize how to generate a Kubernetes ApiServer client according to `inClusterConfig` URI parameter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should move this into Usage. Probably add a "Override Apiserver Client Configuration" section

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -92,6 +106,9 @@ spec:
* If needed, you can use [ConfigMap](http://kubernetes.io/docs/user-guide/configmap/)
to overwrite the `config/`.

## Start Standalone
`node-problem-detector -apiserver=http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&apiVersion=v1`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a bit more explanation here.

To run node problem detector in standalone with insecure connection to apiserver:

node-problem-detector -apiserver=http://APISERVER_IP:APISERVER_INSECURE_PORT?inClusterConfig=false

And it would be better to add another example, possibly using serviceAccount or kubeconfig.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.
@Random-Liu Examples about using serviceAccount and kubeconfig is out of mine knowledge now. :(.

Maybe others can do this. :)

@Random-Liu
Copy link
Member

@andyxning Some changes are needed.
And it seems that this PR has some problem with Godep. Could you fix the Godep and update Godep in a separate commit?

If you encountered any problem, feel free to tell me. :)

@andyxning
Copy link
Member Author

@Random-Liu I will make a separate PR for updating dependencies.

@andyxning andyxning force-pushed the add_support_for_running_standalone branch 9 times, most recently from 3079f6a to b2f6ce9 Compare December 18, 2016 01:50
)

func validateCmdParams() {
if _, err := url.Parse(*apiServerOverride); err != nil {
glog.Fatalf("apiserver-override %q is not a valid HTTP URI. %s", *apiServerOverride, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "apiserver-override %q is not a valid HTTP URI: %v".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -21,15 +21,26 @@ import (

"k8s.io/node-problem-detector/pkg/kernelmonitor"
"k8s.io/node-problem-detector/pkg/problemdetector"
"github.com/golang/glog"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

import (
 	"flag"
        "net/url"
  
  	"k8s.io/node-problem-detector/pkg/kernelmonitor"
  	"k8s.io/node-problem-detector/pkg/problemdetector"

  	"github.com/golang/glog"
)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. Lost of gofmt. :)

client "k8s.io/kubernetes/pkg/client/unversioned"
"k8s.io/kubernetes/pkg/types"
"k8s.io/kubernetes/pkg/util/clock"
"k8s.io/heapster/common/kubernetes"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. Group same type of packages together.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

It is a daemon runs on each node, detects node
problems and reports them to apiserver.
node-problem-detector can either run as a
[DaemonSet](http://kubernetes.io/docs/admin/daemons/) or run standalone on bare metals.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily on bare metals, right? We can still run it standalone on VM. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. Added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I mean removing "bare metals".

@@ -48,6 +52,14 @@ List of supported problem daemons:
| [KernelMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/kernelmonitor) | KernelDeadlock | A problem daemon monitors kernel log and reports problem according to predefined rules. |

# Usage
## Override Apiserver Client Configuration
* `-apiserver-override`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a colon here, and do not add new line.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -48,6 +52,14 @@ List of supported problem daemons:
| [KernelMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/kernelmonitor) | KernelDeadlock | A problem daemon monitors kernel log and reports problem according to predefined rules. |

# Usage
## Override Apiserver Client Configuration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the section name to Flags, in the future we may add more flags. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@@ -48,6 +52,14 @@ List of supported problem daemons:
| [KernelMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/kernelmonitor) | KernelDeadlock | A problem daemon monitors kernel log and reports problem according to predefined rules. |

# Usage
## Override Apiserver Client Configuration
* `-apiserver-override`
`apiserver-override` command line argument can customize how to generate a Kubernetes ApiServer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrase the description a bit.

An URI parameter used to customize how node-problem-detector connects the apiserver. The format is the same with the source flag of Heapster:

http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&userServiceAccount=false&auth=&insecure=

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I add a link to source and Heapster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@@ -92,6 +104,10 @@ spec:
* If needed, you can use [ConfigMap](http://kubernetes.io/docs/user-guide/configmap/)
to overwrite the `config/`.

## Start Standalone
`inClusterConfig` should be set to `false`. To run node-problem-detector standalone with an insecure apiserver connection:
`node-problem-detector -apiserver-override=http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/APISERVER_PORT/APISERVER_INSECURE_PORT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To run node-problem-detector standalone, you should set inClusterConfig to false and teach node-problem-detector how to access apiserver with apiserver-override.

To run node-problem-detector standalone with an insecure apiserver connection:

node-problem-detector -apiserver-override=http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false

For more scenarios, see here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@shyamjvs
Copy link
Member

shyamjvs commented Jan 5, 2017

@Random-Liu @andyxning I guess it would be helpful if we could also have the ServiceAccountFile overridable. As often in some images (like debian:jessie), we have the defaultServiceAccountFile on a read-only filesystem and hence overwriting the file is not possible when npd has to run in the stand-alone mode. One example where this would be useful is in kubemark where the node-problem-detector runs as a container inside the hollow-node pod. (Ref: https://github.com/kubernetes/kubernetes/blob/master/test/kubemark/resources/hollow-node_template.json#L147)

@Random-Liu
Copy link
Member

Offline discussed with @shyamjvs, we could use either a service account file or the kube config file to do the authentication.

@shyamjvs
Copy link
Member

shyamjvs commented Jan 6, 2017

Sorry about the confusion, my bad. Looks like what I suggested won't be needed as the npd runs in a separate container on which the path to the service account file is (of course) writable.
Was confusing this with the option of morphing the npd binary into the kubemark binary (just like the hollow-kubelet and hollow-proxy) instead of running separately.
Thanks for the clarification @Random-Liu :)

@gmarek
Copy link

gmarek commented Jan 6, 2017

@andyxning - do you think you can get it done soon-ish?

@Random-Liu Random-Liu mentioned this pull request Jan 7, 2017
11 tasks
@andyxning andyxning force-pushed the add_support_for_running_standalone branch from 7e0dc46 to 0413bb8 Compare January 7, 2017 02:15
@andyxning
Copy link
Member Author

andyxning commented Jan 7, 2017

@Random-Liu @gmarek All are done. PTAL.

Copy link
Member

@Random-Liu Random-Liu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with final nits.

It is a daemon runs on each node, detects node
problems and reports them to apiserver.
node-problem-detector can either run as a
[DaemonSet](http://kubernetes.io/docs/admin/daemons/) or run standalone on bare metals or VMs.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, I mean to remove "on bare metals or VMs".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Misunderstanding. :)

connects the apiserver. The format is the same with the
[`source`](https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md#kubernetes)
flag of [Heapster](https://github.com/kubernetes/heapster):
`http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&userServiceAccount=false&auth=&insecure=`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use "```" instead of "`".
See the difference:

http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&userServiceAccount=false&auth=&insecure=

and
http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&userServiceAccount=false&auth=&insecure=

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

teach node-problem-detector how to access apiserver with `apiserver-override`.

To run node-problem-detector standalone with an insecure apiserver connection:
`node-problem-detector -apiserver-override=http://APISERVER_IP:APISERVER_INSECURE_PORT?inClusterConfig=false`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@andyxning andyxning force-pushed the add_support_for_running_standalone branch from 0413bb8 to 7e55f8f Compare January 7, 2017 15:23
@andyxning
Copy link
Member Author

@Random-Liu PTAL.

@Random-Liu
Copy link
Member

@andyxning LGTM. I'll manually verify this PR before merging it. :)

@Random-Liu
Copy link
Member

@andyxning I've verified this PR. It works well. Thanks for the patch! :)

@Random-Liu Random-Liu merged commit 0cd7944 into kubernetes:master Jan 10, 2017
@shyamjvs
Copy link
Member

@Random-Liu Seems like you haven't uploaded the new image to gcr.io/google_containers/node-problem-detector:v0.2. Can we do it now, and do we plan to bump the version up to v0.3?

@gmarek
Copy link

gmarek commented Jan 10, 2017

Yup, we should do it

@shyamjvs
Copy link
Member

Done. Pushed to gcr.io/google_containers/node-problem-detector:v0.3.

However there are a couple of issues in the npd repo regarding this version update (I'll file an issue regarding this):

  1. Changing version tag inside the Makefile.
  2. Changing version tag inside the node-problem-detector.yaml.
  3. Updating the Readme to indicate that v0.3 is the latest version and suggested for use?

@Random-Liu I hope this push doesn't disturb any plans you might be having for npd release schedule. If yes, it's not too late or difficult to revert this change. We just did it for use in kubemark.

@Random-Liu Random-Liu added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 10, 2017
@andyxning andyxning deleted the add_support_for_running_standalone branch January 11, 2017 00:48
@andyxning
Copy link
Member Author

andyxning commented Jan 11, 2017

@gmarek @shyamjvs Maybe we should make a clear schedule about the release of node-problem-detector ASAP. Simply bump minor version for new binary maybe confusing for users. :)

This is on the planning of Kubernetes 1.6 release in #58 by @Random-Liu .

@shyamjvs
Copy link
Member

@andyxning That sounds reasonable. However, our work on kubemark was kind of blocked due to not having a stand-alone npd mode. So we just experimentally pushed the v0.3 image for use right now. However, since we would (hopefully) be the only ones using this image or anyone who'd want to use this image would know that it is not part of the release, it is safe to just rewrite v0.3 to the one you actually plan to release later on.
Does that seem ok to you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants