add support for running standalone #49
@Random-Liu PTAL.
@andyxning Thanks for the change! We also planned to provide a systemd service version of NPD, so this would be quite useful! Will review today.
@andyxning I've got another idea. Can we import and reuse the heapster code https://github.com/kubernetes/heapster/blob/96d9dbf9ee20c200bb7703a8dff0b5ea7c1d1ba1/common/kubernetes/configs.go? That way, we can avoid writing the code twice and keep the UX the same as heapster's. :)
@Random-Liu Yep. Indeed I just read the code in BTW, do you think that we need to port all the supporting parameters that heapster supports?
@andyxning Yeah, I think we have the same requirement here. https://github.com/kubernetes/heapster/blob/96d9dbf9ee20c200bb7703a8dff0b5ea7c1d1ba1/docs/source-configuration.md We may want to add a flag like:
@Random-Liu OK. Will refactor code ASAP.
b7076d3 to 1848b91
detecting node problems and reporting them to apiserver. Now it is running as
layers in cluster management stack.

`node-problem-detector` can either run as a [DaemonSet](http://kubernetes.io/docs/admin/daemons/) or run standalone on bare metals.
s/`node-problem-detector`/it or s/`node-problem-detector`/node-problem-detector
done.

`node-problem-detector` can either run as a [DaemonSet](http://kubernetes.io/docs/admin/daemons/) or run standalone on bare metals.

`node-problem-detector` detects node problems and reporting them to apiserver.
Move this above, following "layers in cluster management stack".
"It is a daemon runs on each node, detects node problems and reports them to apiserver."
done.

`node-problem-detector` detects node problems and reporting them to apiserver.

Now it is running as
Extra new line.
done.
@@ -68,6 +79,9 @@ spec:
spec:
  containers:
  - name: node-problem-detector
    command:
    - "/node-problem-detector"
    - "-apiserver=http://APISERVER_IP:APISERVER_PORT?inClusterConfig=true&apiVersion=v1"
We should be able to use the default command.
done.
)

func validateCmdParams() {
	if len(*apiServer) == 0 {
If `apiserver-override` is not set, we could just pass an empty URL to the heapster util, and it will return the default in-cluster configuration.
done.
	cfg, err := restclient.InClusterConfig()

	// error is ignored because we have checked it after command line is parsed.:)
	uri, _ := url.Parse(apiServer)
Because the function is called `NewClientOrDie`, I'm fine to panic or fatal here. :)
@Random-Liu Just as the comment says, the error is ignored because `apiServer` has already been validated, after the command line arguments are parsed, in `node_problem_detector.go`'s `validateCmdParams` function. It will fatal there if the format of the `apiServer` URI is invalid.
This makes it fail faster if some precondition is not met.
)

var (
	kernelMonitorConfigPath = flag.String("kernel-monitor", "/config/kernel_monitor.json", "The path to the kernel monitor config file")
	apiServer = flag.String("apiserver", "", "URI used to connect to Kubernetes ApiServer")
rename to `apiserver-override`.
done.
a [Kubernetes Addon](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
enabled by default in the GCE cluster.

# Command Line Arguments
* `apiserver`
How about we rename this to `-apiserver-override`?
done.
a [Kubernetes Addon](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
enabled by default in the GCE cluster.

# Command Line Arguments
* `apiserver`
`apiserver` command line argument can customize how to generate a Kubernetes ApiServer client according to `inClusterConfig` URI parameter.
We should move this into `Usage`. Probably add an "Override Apiserver Client Configuration" section.
done.
@@ -92,6 +106,9 @@ spec:
* If needed, you can use [ConfigMap](http://kubernetes.io/docs/user-guide/configmap/)
to overwrite the `config/`.

## Start Standalone
`node-problem-detector -apiserver=http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&apiVersion=v1`
Add a bit more explanation here.
To run node problem detector in standalone with insecure connection to apiserver:
node-problem-detector -apiserver=http://APISERVER_IP:APISERVER_INSECURE_PORT?inClusterConfig=false
And it would be better to add another example, possibly using `serviceAccount` or `kubeconfig`.
done.
@Random-Liu Examples using `serviceAccount` and `kubeconfig` are beyond my knowledge for now. :(
Maybe others can do this. :)
@andyxning Some changes are needed. If you encounter any problems, feel free to tell me. :)
@Random-Liu I will make a separate PR for updating dependencies.
3079f6a to b2f6ce9
)

func validateCmdParams() {
	if _, err := url.Parse(*apiServerOverride); err != nil {
		glog.Fatalf("apiserver-override %q is not a valid HTTP URI. %s", *apiServerOverride, err)
nit: "apiserver-override %q is not a valid HTTP URI: %v".
done.
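The fail-fast validation discussed in this thread can be sketched as a small standalone program. This is only a minimal sketch under stated assumptions, not the PR's actual code: it swaps `glog` for the standard `log` package so it runs without extra dependencies, and `validateOverride` is a hypothetical helper name. One detail worth noting is that `url.Parse` accepts an empty string, which is what allows an unset flag to fall through to the default in-cluster configuration.

```go
package main

import (
	"flag"
	"log"
	"net/url"
)

// apiServerOverride mirrors the flag discussed above; the help text is copied from the diff.
var apiServerOverride = flag.String("apiserver-override", "", "URI used to connect to Kubernetes ApiServer")

// validateOverride is a hypothetical helper: it only checks that the value
// parses as a URI. An empty value is accepted because url.Parse("") succeeds.
func validateOverride(raw string) error {
	_, err := url.Parse(raw)
	return err
}

func main() {
	flag.Parse()
	// Fail fast, right after flag parsing, so later code can assume a well-formed URI.
	if err := validateOverride(*apiServerOverride); err != nil {
		log.Fatalf("apiserver-override %q is not a valid HTTP URI: %v", *apiServerOverride, err)
	}
	log.Printf("apiserver-override %q accepted", *apiServerOverride)
}
```

Because `url.Parse` is lenient, this check mostly catches malformed input such as a missing scheme before a colon; it does not verify that the apiserver is reachable.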
@@ -21,15 +21,26 @@ import (

	"k8s.io/node-problem-detector/pkg/kernelmonitor"
	"k8s.io/node-problem-detector/pkg/problemdetector"
	"github.com/golang/glog"
nit:
import (
"flag"
"net/url"
"k8s.io/node-problem-detector/pkg/kernelmonitor"
"k8s.io/node-problem-detector/pkg/problemdetector"
"github.com/golang/glog"
)
done. Missed running `gofmt`. :)
	client "k8s.io/kubernetes/pkg/client/unversioned"
	"k8s.io/kubernetes/pkg/types"
	"k8s.io/kubernetes/pkg/util/clock"
	"k8s.io/heapster/common/kubernetes"
ditto. Group same type of packages together.
done.
It is a daemon runs on each node, detects node
problems and reports them to apiserver.
node-problem-detector can either run as a
[DaemonSet](http://kubernetes.io/docs/admin/daemons/) or run standalone on bare metals.
Not necessarily on bare metals, right? We can still run it standalone on a VM. :)
Yep. Added.
In fact, I mean removing "bare metals".
@@ -48,6 +52,14 @@ List of supported problem daemons:
| [KernelMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/kernelmonitor) | KernelDeadlock | A problem daemon monitors kernel log and reports problem according to predefined rules. |

# Usage
## Override Apiserver Client Configuration
* `-apiserver-override`
Add a colon here, and do not add new line.
done.
@@ -48,6 +52,14 @@ List of supported problem daemons:
| [KernelMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/kernelmonitor) | KernelDeadlock | A problem daemon monitors kernel log and reports problem according to predefined rules. |

# Usage
## Override Apiserver Client Configuration
Change the section name to `Flags`; in the future we may add more flags. :)
done.
@@ -48,6 +52,14 @@ List of supported problem daemons:
| [KernelMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/kernelmonitor) | KernelDeadlock | A problem daemon monitors kernel log and reports problem according to predefined rules. |

# Usage
## Override Apiserver Client Configuration
* `-apiserver-override`
`apiserver-override` command line argument can customize how to generate a Kubernetes ApiServer
Rephrase the description a bit.
An URI parameter used to customize how node-problem-detector connects the apiserver. The format is the same with the `source` flag of Heapster:
http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&userServiceAccount=false&auth=&insecure=
BTW, I added links to `source` and Heapster.
Thanks.
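To make the URI format above concrete, here is a minimal sketch of assembling such a value with the standard `net/url` package. `buildOverride` is a hypothetical helper, `127.0.0.1:8080` is a placeholder address, and only the `inClusterConfig` parameter from the format string is set; this is an illustration, not how node-problem-detector or Heapster builds the URI.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildOverride is a hypothetical helper that assembles an override URI of the
// form http://HOST?inClusterConfig=BOOL. Other query parameters are omitted.
func buildOverride(host string, inCluster bool) string {
	q := url.Values{}
	q.Set("inClusterConfig", strconv.FormatBool(inCluster))
	u := url.URL{Scheme: "http", Host: host, RawQuery: q.Encode()}
	return u.String()
}

func main() {
	// 127.0.0.1:8080 is a placeholder for APISERVER_IP:APISERVER_PORT.
	fmt.Println(buildOverride("127.0.0.1:8080", false))
}
```

Building the value through `url.Values` keeps the query string correctly escaped if more parameters are added later.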
@@ -92,6 +104,10 @@ spec:
* If needed, you can use [ConfigMap](http://kubernetes.io/docs/user-guide/configmap/)
to overwrite the `config/`.

## Start Standalone
`inClusterConfig` should be set to `false`. To run node-problem-detector standalone with an insecure apiserver connection:
`node-problem-detector -apiserver-override=http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false`
s/APISERVER_PORT/APISERVER_INSECURE_PORT
To run node-problem-detector standalone, you should set `inClusterConfig` to `false` and teach node-problem-detector how to access apiserver with `apiserver-override`.
To run node-problem-detector standalone with an insecure apiserver connection:
node-problem-detector -apiserver-override=http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false
For more scenarios, see here
done.
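How the `inClusterConfig` query parameter might be read from the override URI can be sketched as follows. This is a hedged illustration, not the Heapster utility the PR reuses: `overrideConfig` and `parseOverride` are hypothetical names, and the default of `true` when the parameter is absent is an assumption based on the thread (an empty URL yields the default in-cluster configuration).

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// overrideConfig is a hypothetical holder for the parts of the override URI
// that this sketch cares about.
type overrideConfig struct {
	Host            string
	InClusterConfig bool
}

// parseOverride reads the host and the inClusterConfig query parameter.
// Assumption: a missing parameter (including an empty URI) means in-cluster mode.
func parseOverride(raw string) (overrideConfig, error) {
	cfg := overrideConfig{InClusterConfig: true}
	u, err := url.Parse(raw)
	if err != nil {
		return cfg, fmt.Errorf("invalid override URI %q: %v", raw, err)
	}
	cfg.Host = u.Host
	if v := u.Query().Get("inClusterConfig"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return cfg, fmt.Errorf("invalid inClusterConfig value %q: %v", v, err)
		}
		cfg.InClusterConfig = b
	}
	return cfg, nil
}

func main() {
	// 127.0.0.1:8080 is a placeholder for APISERVER_IP:APISERVER_INSECURE_PORT.
	cfg, err := parseOverride("http://127.0.0.1:8080?inClusterConfig=false")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("host=%s inCluster=%t\n", cfg.Host, cfg.InClusterConfig)
}
```

In standalone mode the host portion is what tells the client where the apiserver lives, which is why `inClusterConfig=false` and an explicit address go together.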
@Random-Liu @andyxning I guess it would be helpful if we could also have the
Discussed offline with @shyamjvs: we could use either a service account file or the kubeconfig file for authentication.
Sorry about the confusion, my bad. Looks like what I suggested won't be needed, as the npd runs in a separate container on which the path to the service account file is (of course) writable.
@andyxning - do you think you can get it done soon-ish?
7e0dc46 to 0413bb8
@Random-Liu @gmarek All are done. PTAL.
LGTM with final nits.
It is a daemon runs on each node, detects node
problems and reports them to apiserver.
node-problem-detector can either run as a
[DaemonSet](http://kubernetes.io/docs/admin/daemons/) or run standalone on bare metals or VMs.
In fact, I mean to remove "on bare metals or VMs".
Misunderstanding. :)
connects the apiserver. The format is the same with the
[`source`](https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md#kubernetes)
flag of [Heapster](https://github.com/kubernetes/heapster):
`http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&userServiceAccount=false&auth=&insecure=`.
Use "```" instead of "`".
See the difference:
http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&userServiceAccount=false&auth=&insecure=
and
http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false&userServiceAccount=false&auth=&insecure=
done.
teach node-problem-detector how to access apiserver with `apiserver-override`.

To run node-problem-detector standalone with an insecure apiserver connection:
`node-problem-detector -apiserver-override=http://APISERVER_IP:APISERVER_INSECURE_PORT?inClusterConfig=false`
ditto.
done.
0413bb8 to 7e55f8f
@Random-Liu PTAL.
7e55f8f to 68b379c
@andyxning LGTM. I'll manually verify this PR before merging it. :)
@andyxning I've verified this PR. It works well. Thanks for the patch! :)
@Random-Liu Seems like you haven't uploaded the new image to gcr.io/google_containers/node-problem-detector:v0.2. Can we do it now, and do we plan to bump the version up to v0.3?
Yup, we should do it
Done. Pushed to gcr.io/google_containers/node-problem-detector:v0.3. However there are a couple of issues in the npd repo regarding this version update (I'll file an issue regarding this):
@Random-Liu I hope this push doesn't disturb any plans you might have for the npd release schedule. If it does, it's not too late or difficult to revert this change. We just did it for use in kubemark.
@gmarek @shyamjvs Maybe we should set a clear release schedule for node-problem-detector ASAP. Simply bumping the minor version for a new binary may be confusing for users. :) This is planned for the Kubernetes 1.6 release in #58 by @Random-Liu.
@andyxning That sounds reasonable. However, our work on kubemark was kind of blocked by the lack of a standalone npd mode, so we experimentally pushed the v0.3 image for use right now. Since we would (hopefully) be the only ones using this image, and anyone else who wanted to use it would know that it is not part of a release, it is safe to overwrite v0.3 with the one you actually plan to release later on.
Add support for running `node-problem-detector` standalone.
This PR adds two command line options:
* `in-cluster-config` configures whether we deploy `NPD` in cluster as a daemon set or on a bare metal standalone. Defaults to `true`, i.e., running in cluster as a daemon set.
* `apiserver-addr` is used to specify the apiserver address when `in-cluster-config` is set to `false`.
Finally, we can run `NPD` in standalone like this