Collecting and publishing cluster configuration #12323

jackgr · 2015-08-06T01:41:12Z

This issue captures the results of a conversation between @satnam6502 , @vishh and @jackgr regarding the use of cluster-insight to collect and publish cluster configuration for the graph tab developed for the Kubernetes UI. See also the related discussion in #8716.

cc: @vasbala, @EranGabber, @supriyagarg, @rimey, @bgrant0607

Overview

The graph tab provides a visualization of a running cluster based on the d3 force layout, showing all of the objects in the cluster and the relationships between them. It also provides two detailed views of the properties of a selected object: a summary view on the main page that displays a few properties, and a complete view on a child page that displays all of the properties.

The information presented includes:

objects and relationships exposed by the Kubernetes api server (e.g., nodes, pods, replication controllers, services, namespaces, etc.),
objects and relationships exposed by the Docker daemon on each cluster node (e.g., containers, processes, images), and
relationships among all of the objects that can be inferred (e.g., between a replication controller and its pods, between a service an its pods, between a container and its processes, etc.).

Rather than attempt to collect, analyze and organize all of this information on the client, we chose to write a Kubernetes service that does that work on the server side. Using a service has several advantages, such as:

simplifying the client by allowing it to focus exclusively on presentation,
hiding details from the client that might change over time by providing the cluster configuration information as an untyped JSON document,
reducing load on the cluster from multiple clients by caching the information and making it available on demand, and
separating the collection and publication of cluster configuration from the graph tab or any other client, so that it can be used for a variety of purposes.

Cluster Insight Design

Cluster insight is a set of python scripts that run inside a vanilla python container. (See the docker file for details.)

As revealed by the docker file, it runs in one of two modes:

in minion mode, it acts as a read only proxy for the local Docker daemon, providing read only access to the data it exposes, and
in master mode, it acts as an aggregator and cache, collecting information from the Kubernetes api server and from the minions, processing it, storing it, and returning it on demand in response to queries.

Two aspects of this design were cause for concern in the discussion on #8716:

In order to read from the Docker daemon, the python script must run as root, or as a member of the docker group, and
In order to retrieve information about Docker running on every node, the script must run in minion mode on every node.

Alternative Solutions

In practice, these issues may not be as severe as suggested by the discussion on #8716, since:

most containers commonly run their processes as root,
cluster-insight has a very small memory and cpu footprint in minion mode, and
cluster-insight does not provide write access to the Docker daemon.

Nevertheless, it was agreed that we should investigate the possibility of using cadvisor, which already runs on every node within the kubelet, to collect and publish configuration information, in addition to the information it already collects and publishes, which is primarily performance oriented, but does reveal some aspects of configuration.

Use Cases

In order to support focused discussion about alternative solutions, it was agreed that we should enumerate the key use cases for the configuration collection and publication service. The following use cases were identified and discussed:

Collecting the current actual configuration and making it available in response to queries, in order to support visualizations and other consumers.
Collecting the most recently supplied configuration, so that it can be compared with the current actual configuration to determine whether or not the latter contains values that deviate from the intended values.
Collecting the most recently supplied configuration, so that it can be used to provide defaults when updates do not specify values for all possible parameters, since it captures user preferences from previous configuration updates.
Collecting historical information about configuration changes, so that it can be correlated with events, logs and monitoring data to support anomaly detection and other forms of programmatic analysis.

Issues

The following issues were identified and examined during the discussion:

We should consider the general problem of collecting and publishing cluster configuration information in a holistic way, instead of focusing on the needs of a specific client or use case.
Configuration information collected by this service might overlap with similar information maintained by the api server, or as annotations on objects stored in etcd, and should perhaps be combined with it to reduce storage overhead.
It might be possible to collect the most recently supplied configuration directly from the deployment process by pushing new configuration to a configuration store, rather than to the api server, and then starting processes to make the actual configuration conform to it.
Different container engines may expose different configuration values, despite efforts to standardize their interfaces and configuration formats. We do not want to choose which container engines to support and which to ignore, or to keep up with changes in container engine APIs that have historically evolved rapidly and may continue to do so for some.
Some of the information already collected by cadvisor may overlap with some of the information currently collected by cluster-insight. We need a pass over the two information sets to reconcile them. In the process, we might also want to make some judgements regarding the value of specific configuration values, in order to reduce the number of values we collect, store and publish.

satnam6502 · 2015-08-06T04:13:32Z

/cc @satnam6502

pmorie · 2015-08-06T19:01:11Z

/cc @jwforres

bgrant0607 · 2015-08-06T19:35:58Z

cc @erictune

vishh · 2015-08-06T22:46:14Z

What information is required from the nodes?
The idea of separating out collection, storage and processing of config data seems very appealing to me.

emacias · 2015-08-10T14:16:49Z

Data regarding operation that is needed:

timestamp
username
id
operation type
status
any error messages
commandline used to run the operation (if applicable)
client (UI, commandline etc)
the new configuration

I expect not all the data is possible to get from node, but these are requirements from the UI point of view.

0xmichalis · 2017-06-09T17:10:00Z

@kubernetes/sig-cluster-lifecycle-misc

roberthbailey · 2017-06-12T06:32:44Z

kube-ui has been deprecated in favor of kube-dashboard and as this issue hasn't had any activity for almost 2 years I'm going to close it as obsolete. Please re-open if you want to keep driving this.

bgrant0607 added priority/backlog Higher priority than priority/awaiting-more-evidence. area/security area/introspection area/ui team/ux labels Aug 6, 2015

k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 31, 2017

k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Jun 9, 2017

k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 9, 2017

roberthbailey closed this as completed Jun 12, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Collecting and publishing cluster configuration #12323

Collecting and publishing cluster configuration #12323

jackgr commented Aug 6, 2015

satnam6502 commented Aug 6, 2015

pmorie commented Aug 6, 2015

bgrant0607 commented Aug 6, 2015

vishh commented Aug 6, 2015

emacias commented Aug 10, 2015

0xmichalis commented Jun 9, 2017

roberthbailey commented Jun 12, 2017

Collecting and publishing cluster configuration #12323

Collecting and publishing cluster configuration #12323

Comments

jackgr commented Aug 6, 2015

Overview

Cluster Insight Design

Alternative Solutions

Use Cases

Issues

satnam6502 commented Aug 6, 2015

pmorie commented Aug 6, 2015

bgrant0607 commented Aug 6, 2015

vishh commented Aug 6, 2015

emacias commented Aug 10, 2015

0xmichalis commented Jun 9, 2017

roberthbailey commented Jun 12, 2017