
[Infra UI] Kubernetes cluster nodes and capacity incorrect #29497

Closed
sergeydeg opened this issue Jan 29, 2019 · 13 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Metrics UI Metrics UI feature Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services

Comments

@sergeydeg

Kibana version: 6.5.4

Elasticsearch version: 6.5.4

Server OS version: CentOS 7

Browser version: Google Chrome 71.0.3578.98

Browser OS version: Windows 10

Original install method (e.g. download page, yum, from source, etc.): YUM

Describe the bug:
We have an on-premises Kubernetes cluster (v1.12.3) instantiated with kubeadm, with Metricbeat 6.5.3 installed.

  1. The Infra UI doesn't show master nodes on the "Hosts" screen, only worker nodes.
  2. Looking at the Kubernetes metrics of each node, CPU/Memory/Pod capacity is shown only for the node where Metricbeat is installed; other nodes show zero capacity. Disk capacity is shown correctly on all nodes.

Steps to reproduce:

  1. Install Kubernetes Kubeadm cluster v1.12.3
  2. Install Metricbeat Kubernetes yaml v.6.5.3
  3. Open Infra UI
  4. Open Metrics of each Host

Expected behavior:
Master nodes should also be visible.
Kubernetes metrics should show correct numbers.

Screenshots (if relevant):
[Screenshots attached: hosts, metricbeatdata, rightnode, wrongnode]

@elasticmachine
Contributor

Pinging @elastic/infrastructure-ui

@lukasolson lukasolson added the Feature:Metrics UI Metrics UI feature label Feb 20, 2019
@weltenwort weltenwort added bug Fixes for quality problems that affect the customer experience Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services labels Feb 21, 2019
@skh
Contributor

skh commented Mar 11, 2019

From https://www.elastic.co/guide/en/beats/metricbeat/6.5/running-on-kubernetes.html:

You deploy Metricbeat in two different ways at the same time:

  • As a DaemonSet to ensure that there’s a running instance on each node of the cluster. These instances are used to retrieve most metrics from the host, such as system metrics, Docker stats, and metrics from all the services running on top of Kubernetes.
  • As a single Metricbeat instance created using a Deployment. This instance is used to retrieve metrics that are unique for the whole cluster, such as Kubernetes events or kube-state-metrics.
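As a sketch of that dual setup (names, namespaces, and selectors here are illustrative, not the official Elastic manifests):

```yaml
# Illustrative sketch only -- names and namespaces are hypothetical.
# One DaemonSet for per-node metrics (system, Docker, kubelet):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metricbeat
  namespace: kube-system
spec:
  selector:
    matchLabels: { app: metricbeat }
  template:
    metadata:
      labels: { app: metricbeat }
    spec:
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:6.5.3
---
# Plus a single-replica Deployment for cluster-wide metrics
# (kube-state-metrics, Kubernetes events):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metricbeat-state
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels: { app: metricbeat-state }
  template:
    metadata:
      labels: { app: metricbeat-state }
    spec:
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:6.5.3
```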

Does your setup follow this documentation?

Can you add your configuration to this issue? Please take care to remove all sensitive information like passwords or internal IPs.

@sergeydeg
Author

A week ago I upgraded all ELK components to version 6.6.1. The problem stays the same.
I deployed Metricbeat using the Elastic-provided Kubernetes manifest with two changes.

  1. I use a Logstash output instead of Elasticsearch, and Logstash sends Metricbeat events to Elasticsearch without transformation:
    setup.template.enabled: false
    output.logstash:
      hosts: ['${LOGSTASH_HOST:elasticsearch}:${LOGSTASH_PORT:9200}']
  2. I changed the DaemonSet config to work with the on-premises kubeadm-created cluster:
      hosts: ["https://${HOSTNAME}:10250"]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      ssl.verification_mode: "none"
      ssl.certificate_authorities:
        - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

@exekias
Contributor

exekias commented Apr 8, 2019

Hi @sergeydeg,

DaemonSets don't run on master nodes by default. If you want to change that behavior, you can add this to the Metricbeat Pod spec in the DaemonSet:

      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule

You have a full example here: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#create-a-daemonset
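For context, a hypothetical excerpt showing where the tolerations block sits in the DaemonSet manifest (under spec.template.spec, alongside the containers; other values are placeholders):

```yaml
# Hypothetical excerpt -- the field placement is the point.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metricbeat
spec:
  template:
    spec:
      # Allow scheduling onto master nodes despite their NoSchedule taint
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:6.5.3
```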

Please report back; I'm considering adding a comment about this to the default manifests we ship.

@simioa

simioa commented Apr 10, 2019

We have the same issue. "Node Disk Capacity" seems to be calculated correctly, while "Node CPU Capacity", "Node Memory Capacity", and "Node Pod Capacity" are not.
We are also using version 6.6.1 on all components.

I also checked whether the fields used in the calculation are present and contain valid values:

[screenshot attached]

They are present and contain valid values, but capacity still remains at 0 in the visualizations and 0% in the summary overview.

@sergeydeg
Author

Hi @exekias,
Metricbeat already contains all the required data, including for master nodes.

I can't find the source code for the Infra UI; is it publicly available?

@exekias
Contributor

exekias commented Apr 12, 2019

Node capacity is calculated based on information coming from the node kubelet. Are you seeing any error in Metricbeat logs?

Source for Infrastructure UI is located here: https://github.com/elastic/kibana/tree/master/x-pack/plugins/infra

@simioa

simioa commented Apr 12, 2019

I may have found the problem.
I enabled Elasticsearch query logging in Kibana and found that the particular query that fetches capacities matches the worker hostname against the host.name field:

{
    "type": "log",
    "@timestamp": "2019-04-12T10:39:50Z",
    "tags": [
        "elasticsearch",
        "query",
        "debug",
        "data"
    ],
    "pid": 28073,
    "message": "200\nPOST /_msearch?rest_total_hits_as_int=true&ignore_throttled=true\n{\"index\":[\"*-metrics-*\",\"*-logs-*\"],\"ignoreUnavailable\":true}\n{\"size\":0,\"query\":{\"bool\":{\"must\":[{\"range\":{\"@timestamp\":{\"gte\":1555061944736,\"lte\":1555065544736,\"format\":\"epoch_millis\"}}},{\"match\":{\"host.name\":\"redacted\"}}]}},\"aggs\":{\"capacity\":{\"filter\":{\"match_all\":{}},\"aggs\":{\"timeseries\":{\"date_histogram\":{\"field\":\"@timestamp\",\"interval\":\"1m\",\"min_doc_count\":0,\"extended_bounds\":{\"min\":1555061944736,\"max\":1555065544736}},\"aggs\":{\"max-cpu-cap\":{\"max\":{\"field\":\"kubernetes.node.cpu.allocatable.cores\"}},\"calc-nanocores\":{\"bucket_script\":{\"buckets_path\":{\"cores\":\"max-cpu-cap\"},\"script\":{\"source\":\"params.cores * 1000000000\",\"lang\":\"painless\",\"params\":{\"_interval\":60000}},\"gap_policy\":\"skip\"}}}}},\"meta\":{\"timeField\":\"@timestamp\",\"intervalString\":\"1m\",\"bucketSize\":60}}},\"timeout\":\"90s\"}\n{\"index\":[\"*-metrics-*\",\"*-logs-*\"],\"ignoreUnavailable\":true}\n{\"size\":0,\"query\":{\"bool\":{\"must\":[{\"range\":{\"@timestamp\":{\"gte\":1555061944736,\"lte\":1555065544736,\"format\":\"epoch_millis\"}}},{\"match\":{\"host.name\":\"redacted\"}}]}},\"aggs\":{\"used\":{\"filter\":{\"match_all\":{}},\"aggs\":{\"timeseries\":{\"date_histogram\":{\"field\":\"@timestamp\",\"interval\":\"1m\",\"min_doc_count\":0,\"extended_bounds\":{\"min\":1555061944736,\"max\":1555065544736}},\"aggs\":{\"avg-cpu-usage\":{\"avg\":{\"field\":\"kubernetes.node.cpu.usage.nanocores\"}}}}},\"meta\":{\"timeField\":\"@timestamp\",\"intervalString\":\"1m\",\"bucketSize\":60}}},\"timeout\":\"90s\"}"
}

The relevant part: "{"gte":1555061944736,"lte":1555065544736,"format":"epoch_millis"}}},{"match":{"host.name":"redacted"}}]}}"

But host.name always has the value of the one worker running Metricbeat with the state_* Kubernetes metricsets, which run on only a single worker.

I verified this by checking the metrics of that specific worker running the state_* metricsets in the Infra UI, and could see that its metrics are calculated and represented correctly.

Shouldn't the query match against the kubernetes.node.name field to get correct values?
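For illustration, the proposed fix would change the filter clause to something like the following (the value "redacted" stands in for the actual node name, as in the logged query above):

```json
{
  "query": {
    "bool": {
      "must": [
        { "range": { "@timestamp": { "gte": 1555061944736, "lte": 1555065544736, "format": "epoch_millis" } } },
        { "match": { "kubernetes.node.name": "redacted" } }
      ]
    }
  }
}
```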

@exekias
Contributor

exekias commented Apr 12, 2019

@simianhacker could you chime in here please?

@simianhacker
Member

For the CPU Capacity chart, we are querying the metricbeat-* index pattern with kubernetes.node.name as the filter set to the value of host.name (for the host you're looking at). The capacity value is a max of kubernetes.node.cpu.allocatable.cores and the used is an average of kubernetes.node.cpu.usage.nanocores.

We are doing the same thing for the Memory Capacity chart, except the capacity is the max of kubernetes.node.memory.allocatable.bytes and used is the average of kubernetes.node.memory.usage.bytes.

@exekias Let me know if these are the wrong fields and we can file an issue to update to something more appropriate.
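A minimal sketch of that calculation as a standalone query (the index pattern matches the comment above; the node name "my-node" is illustrative):

```json
POST metricbeat-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "kubernetes.node.name": "my-node" } }
      ]
    }
  },
  "aggs": {
    "cpu-capacity": { "max": { "field": "kubernetes.node.cpu.allocatable.cores" } },
    "cpu-used":     { "avg": { "field": "kubernetes.node.cpu.usage.nanocores" } },
    "mem-capacity": { "max": { "field": "kubernetes.node.memory.allocatable.bytes" } },
    "mem-used":     { "avg": { "field": "kubernetes.node.memory.usage.bytes" } }
  }
}
```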

@exekias
Contributor

exekias commented Jul 11, 2019

State metrics are reported from a single host, as they are global to the cluster. This means that host.name will contain the same host for all documents, while kubernetes.node.name will contain the node name, different in each document.

If you are filtering by kubernetes.node.name that should be ok. What about that match on host.name that @simioa found in the query?

@simianhacker
Member

@exekias @simioa The query was wrong; I filed a PR this weekend that fixes the query to use the correct field, kubernetes.node.name, on the host page.

@simianhacker
Member

I'm going to close this since it was fixed by the PR mentioned above. If you are still having issues, please re-open.
