Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collecting and publishing cluster configuration #12323

Closed
jackgr opened this issue Aug 6, 2015 · 7 comments
Closed

Collecting and publishing cluster configuration #12323

jackgr opened this issue Aug 6, 2015 · 7 comments
Labels
area/introspection area/security area/ui priority/backlog Higher priority than priority/awaiting-more-evidence. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.

Comments

@jackgr
Copy link
Contributor

jackgr commented Aug 6, 2015

This issue captures the results of a conversation between @satnam6502 , @vishh and @jackgr regarding the use of cluster-insight to collect and publish cluster configuration for the graph tab developed for the Kubernetes UI. See also the related discussion in #8716.

cc: @vasbala, @EranGabber, @supriyagarg, @rimey, @bgrant0607

Overview

The graph tab provides a visualization of a running cluster based on the d3 force layout, showing all of the objects in the cluster and the relationships between them. It also provides two detailed views of the properties of a selected object: a summary view on the main page that displays a few properties, and a complete view on a child page that displays all of the properties.

The information presented includes:

  • objects and relationships exposed by the Kubernetes api server (e.g., nodes, pods, replication controllers, services, namespaces, etc.),
  • objects and relationships exposed by the Docker daemon on each cluster node (e.g., containers, processes, images), and
  • relationships among all of the objects that can be inferred (e.g., between a replication controller and its pods, between a service an its pods, between a container and its processes, etc.).

Rather than attempt to collect, analyze and organize all of this information on the client, we chose to write a Kubernetes service that does that work on the server side. Using a service has several advantages, such as:

  • simplifying the client by allowing it to focus exclusively on presentation,
  • hiding details from the client that might change over time by providing the cluster configuration information as an untyped JSON document,
  • reducing load on the cluster from multiple clients by caching the information and making it available on demand, and
  • separating the collection and publication of cluster configuration from the graph tab or any other client, so that it can be used for a variety of purposes.

Cluster Insight Design

Cluster insight is a set of python scripts that run inside a vanilla python container. (See the docker file for details.)

As revealed by the docker file, it runs in one of two modes:

  • in minion mode, it acts as a read only proxy for the local Docker daemon, providing read only access to the data it exposes, and
  • in master mode, it acts as an aggregator and cache, collecting information from the Kubernetes api server and from the minions, processing it, storing it, and returning it on demand in response to queries.

Two aspects of this design were cause for concern in the discussion on #8716:

  • In order to read from the Docker daemon, the python script must run as root, or as a member of the docker group, and
  • In order to retrieve information about Docker running on every node, the script must run in minion mode on every node.

Alternative Solutions

In practice, these issues may not be as severe as suggested by the discussion on #8716, since:

  • most containers commonly run their processes as root,
  • cluster-insight has a very small memory and cpu footprint in minion mode, and
  • cluster-insight does not provide write access to the Docker daemon.

Nevertheless, it was agreed that we should investigate the possibility of using cadvisor, which already runs on every node within the kubelet, to collect and publish configuration information, in addition to the information it already collects and publishes, which is primarily performance oriented, but does reveal some aspects of configuration.

Use Cases

In order to support focused discussion about alternative solutions, it was agreed that we should enumerate the key use cases for the configuration collection and publication service. The following use cases were identified and discussed:

  • Collecting the current actual configuration and making it available in response to queries, in order to support visualizations and other consumers.
  • Collecting the most recently supplied configuration, so that it can be compared with the current actual configuration to determine whether or not the latter contains values that deviate from the intended values.
  • Collecting the most recently supplied configuration, so that it can be used to provide defaults when updates do not specify values for all possible parameters, since it captures user preferences from previous configuration updates.
  • Collecting historical information about configuration changes, so that it can be correlated with events, logs and monitoring data to support anomaly detection and other forms of programmatic analysis.

Issues

The following issues were identified and examined during the discussion:

  • We should consider the general problem of collecting and publishing cluster configuration information in a holistic way, instead of focusing on the needs of a specific client or use case.
  • Configuration information collected by this service might overlap with similar information maintained by the api server, or as annotations on objects stored in etcd, and should perhaps be combined with it to reduce storage overhead.
  • It might be possible to collect the most recently supplied configuration directly from the deployment process by pushing new configuration to a configuration store, rather than to the api server, and then starting processes to make the actual configuration conform to it.
  • Different container engines may expose different configuration values, despite efforts to standardize their interfaces and configuration formats. We do not want to choose which container engines to support and which to ignore, or to keep up with changes in container engine APIs that have historically evolved rapidly and may continue to do so for some.
  • Some of the information already collected by cadvisor may overlap with some of the information currently collected by cluster-insight. We need a pass over the two information sets to reconcile them. In the process, we might also want to make some judgements regarding the value of specific configuration values, in order to reduce the number of values we collect, store and publish.
@satnam6502
Copy link
Contributor

/cc @satnam6502

@pmorie
Copy link
Member

pmorie commented Aug 6, 2015

/cc @jwforres

@bgrant0607
Copy link
Member

cc @erictune

@bgrant0607 bgrant0607 added priority/backlog Higher priority than priority/awaiting-more-evidence. area/security area/introspection area/ui team/ux labels Aug 6, 2015
@vishh
Copy link
Contributor

vishh commented Aug 6, 2015

What information is required from the nodes?
The idea of separating out collection, storage and processing of config data seems very appealing to me.

@emacias
Copy link

emacias commented Aug 10, 2015

Data regarding operation that is needed:

  • timestamp
  • username
  • id
  • operation type
  • status
  • any error messages
  • commandline used to run the operation (if applicable)
  • client (UI, commandline etc)
  • the new configuration

I expect not all the data is possible to get from node, but these are requirements from the UI point of view.

@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 31, 2017
@0xmichalis
Copy link
Contributor

@kubernetes/sig-cluster-lifecycle-misc

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Jun 9, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jun 9, 2017
@roberthbailey
Copy link
Contributor

kube-ui has been deprecated in favor of kube-dashboard and as this issue hasn't had any activity for almost 2 years I'm going to close it as obsolete. Please re-open if you want to keep driving this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/introspection area/security area/ui priority/backlog Higher priority than priority/awaiting-more-evidence. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle.
Projects
None yet
Development

No branches or pull requests

10 participants