
[Infra UI] Kubernetes cluster nodes and capacity incorrect #29497

Closed
sergeydeg opened this issue Jan 29, 2019 · 13 comments
Labels
bug Fixes for quality problems that affect the customer experience Feature:Metrics UI Metrics UI feature Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services

Comments

@sergeydeg

Kibana version: 6.5.4

Elasticsearch version: 6.5.4

Server OS version: CentOS 7

Browser version: Google Chrome 71.0.3578.98

Browser OS version: Windows 10

Original install method (e.g. download page, yum, from source, etc.): YUM

Describe the bug:
We have an on-premises Kubernetes cluster (v1.12.3) instantiated with kubeadm, with Metricbeat 6.5.3 installed.

  1. The Infra UI doesn't show master nodes on the "Hosts" screen, only worker nodes.
  2. Looking at the Kubernetes metrics of each node, CPU/Memory/Pod capacity is shown only for the node where Metricbeat is installed; other nodes show zero capacity. Disk capacity is shown correctly on all nodes.

Steps to reproduce:

  1. Install Kubernetes Kubeadm cluster v1.12.3
  2. Install Metricbeat Kubernetes yaml v.6.5.3
  3. Open Infra UI
  4. Open Metrics of each Host

Expected behavior:
Master nodes should also be visible.
Kubernetes metrics should show correct numbers.

Screenshots (if relevant):
[Screenshots attached: hosts, metricbeatdata, rightnode, wrongnode]

@elasticmachine
Contributor

Pinging @elastic/infrastructure-ui

@lukasolson lukasolson added the Feature:Metrics UI Metrics UI feature label Feb 20, 2019
@weltenwort weltenwort added bug Fixes for quality problems that affect the customer experience Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services labels Feb 21, 2019
@skh
Contributor

skh commented Mar 11, 2019

From https://www.elastic.co/guide/en/beats/metricbeat/6.5/running-on-kubernetes.html:

You deploy Metricbeat in two different ways at the same time:

  • As a DaemonSet to ensure that there’s a running instance on each node of the cluster. These instances are used to retrieve most metrics from the host, such as system metrics, Docker stats, and metrics from all the services running on top of Kubernetes.
  • As a single Metricbeat instance created using a Deployment. This instance is used to retrieve metrics that are unique for the whole cluster, such as Kubernetes events or kube-state-metrics.
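As a sketch of that dual setup (names, namespaces, and selectors here are illustrative, not the official Elastic manifests):

```yaml
# Illustrative sketch only -- names and namespaces are hypothetical.
# One DaemonSet for per-node metrics (system, Docker, kubelet):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metricbeat
  namespace: kube-system
spec:
  selector:
    matchLabels: { app: metricbeat }
  template:
    metadata:
      labels: { app: metricbeat }
    spec:
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:6.5.3
---
# Plus a single-replica Deployment for cluster-wide metrics
# (kube-state-metrics, Kubernetes events):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metricbeat-state
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels: { app: metricbeat-state }
  template:
    metadata:
      labels: { app: metricbeat-state }
    spec:
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:6.5.3
```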

Does your setup follow this documentation?

Can you add your configuration to this issue? Please take care to remove all sensitive information like passwords or internal IPs.

@sergeydeg
Author

A week ago I upgraded all ELK components to version 6.6.1. The problem stays the same.
I deployed Metricbeat using the Elastic-provided Kubernetes manifest with two changes.

  1. I use a Logstash output instead of Elasticsearch, and Logstash sends Metricbeat events to Elasticsearch without transformation:
    setup.template.enabled: false
    output.logstash:
      hosts: ['${LOGSTASH_HOST:elasticsearch}:${LOGSTASH_PORT:9200}']
  2. I changed the DaemonSet config to work with the on-premises kubeadm-created cluster:
      hosts: ["https://${HOSTNAME}:10250"]
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      ssl.verification_mode: "none"
      ssl.certificate_authorities:
        - /var/run/secrets/kubernetes.io/serviceaccount/ca.crt

@exekias
Contributor

exekias commented Apr 8, 2019

Hi @sergeydeg,

DaemonSets don't run on master nodes by default. If you want to change that behavior, you can add this to the Metricbeat Pod spec in the DaemonSet:

      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule

You have a full example here: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/#create-a-daemonset
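For context, a hypothetical excerpt showing where the tolerations block sits in the DaemonSet manifest (under spec.template.spec, alongside the containers; other values are placeholders):

```yaml
# Hypothetical excerpt -- the field placement is the point.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: metricbeat
spec:
  template:
    spec:
      # Allow scheduling onto master nodes despite their NoSchedule taint
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: metricbeat
        image: docker.elastic.co/beats/metricbeat:6.5.3
```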

Please report back; I'm considering adding a comment about this to the default manifests we ship.

@simioa

simioa commented Apr 10, 2019

We have the same issue. "Node Disk Capacity" seems to be calculated correctly, while "Node CPU Capacity", "Node Memory Capacity", and "Node Pod Capacity" are not.
We are also using version 6.6.1 on all components.

I also checked whether the fields used in the calculation are present and contain valid values:

[screenshot attached]

They are present and contain valid values, but capacity still remains at 0 in the visualizations and 0% in the summary overview.

@sergeydeg
Author

Hi @exekias,
Metricbeat already contains all the required data, including for master nodes.

I can't find the source code for the Infra UI; is it publicly available?

@exekias
Contributor

exekias commented Apr 12, 2019

Node capacity is calculated based on information coming from the node kubelet. Are you seeing any error in Metricbeat logs?

Source for Infrastructure UI is located here: https://github.com/elastic/kibana/tree/master/x-pack/plugins/infra

@simioa

simioa commented Apr 12, 2019

I may have found the problem.
I enabled Elasticsearch query logging in Kibana and found that the particular query that fetches capacities matches the worker hostname against the host.name field:

{
    "type": "log",
    "@timestamp": "2019-04-12T10:39:50Z",
    "tags": [
        "elasticsearch",
        "query",
        "debug",
        "data"
    ],
    "pid": 28073,
    "message": "200\nPOST /_msearch?rest_total_hits_as_int=true&ignore_throttled=true\n{\"index\":[\"*-metrics-*\",\"*-logs-*\"],\"ignoreUnavailable\":true}\n{\"size\":0,\"query\":{\"bool\":{\"must\":[{\"range\":{\"@timestamp\":{\"gte\":1555061944736,\"lte\":1555065544736,\"format\":\"epoch_millis\"}}},{\"match\":{\"host.name\":\"redacted\"}}]}},\"aggs\":{\"capacity\":{\"filter\":{\"match_all\":{}},\"aggs\":{\"timeseries\":{\"date_histogram\":{\"field\":\"@timestamp\",\"interval\":\"1m\",\"min_doc_count\":0,\"extended_bounds\":{\"min\":1555061944736,\"max\":1555065544736}},\"aggs\":{\"max-cpu-cap\":{\"max\":{\"field\":\"kubernetes.node.cpu.allocatable.cores\"}},\"calc-nanocores\":{\"bucket_script\":{\"buckets_path\":{\"cores\":\"max-cpu-cap\"},\"script\":{\"source\":\"params.cores * 1000000000\",\"lang\":\"painless\",\"params\":{\"_interval\":60000}},\"gap_policy\":\"skip\"}}}}},\"meta\":{\"timeField\":\"@timestamp\",\"intervalString\":\"1m\",\"bucketSize\":60}}},\"timeout\":\"90s\"}\n{\"index\":[\"*-metrics-*\",\"*-logs-*\"],\"ignoreUnavailable\":true}\n{\"size\":0,\"query\":{\"bool\":{\"must\":[{\"range\":{\"@timestamp\":{\"gte\":1555061944736,\"lte\":1555065544736,\"format\":\"epoch_millis\"}}},{\"match\":{\"host.name\":\"redacted\"}}]}},\"aggs\":{\"used\":{\"filter\":{\"match_all\":{}},\"aggs\":{\"timeseries\":{\"date_histogram\":{\"field\":\"@timestamp\",\"interval\":\"1m\",\"min_doc_count\":0,\"extended_bounds\":{\"min\":1555061944736,\"max\":1555065544736}},\"aggs\":{\"avg-cpu-usage\":{\"avg\":{\"field\":\"kubernetes.node.cpu.usage.nanocores\"}}}}},\"meta\":{\"timeField\":\"@timestamp\",\"intervalString\":\"1m\",\"bucketSize\":60}}},\"timeout\":\"90s\"}"
}

The relevant part: "{"gte":1555061944736,"lte":1555065544736,"format":"epoch_millis"}}},{"match":{"host.name":"redacted"}}]}}"

But host.name always has the value of the one worker running Metricbeat with the state_* Kubernetes metricsets, which run on only a single worker.

I verified this by checking the metrics of that specific worker running the state_* metricsets in the Infra UI, and could see that its metrics are calculated and represented correctly.

Shouldn't the query match against the kubernetes.node.name field to get correct values?
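For illustration, the proposed fix would change the filter clause to something like the following (the value "redacted" stands in for the actual node name, as in the logged query above):

```json
{
  "query": {
    "bool": {
      "must": [
        { "range": { "@timestamp": { "gte": 1555061944736, "lte": 1555065544736, "format": "epoch_millis" } } },
        { "match": { "kubernetes.node.name": "redacted" } }
      ]
    }
  }
}
```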

@exekias
Contributor

exekias commented Apr 12, 2019

@simianhacker could you chime in here please?

@simianhacker
Member

For the CPU Capacity chart, we are querying the metricbeat-* index pattern with kubernetes.node.name as the filter set to the value of host.name (for the host you're looking at). The capacity value is a max of kubernetes.node.cpu.allocatable.cores and the used is an average of kubernetes.node.cpu.usage.nanocores.

We are doing the same thing for the Memory Capacity chart, except the capacity is the max of kubernetes.node.memory.allocatable.bytes and used is the average of kubernetes.node.memory.usage.bytes.

@exekias Let me know if these are the wrong fields and we can file an issue to update to something more appropriate.
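A minimal sketch of that calculation as a standalone query (the index pattern matches the comment above; the node name "my-node" is illustrative):

```json
POST metricbeat-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "kubernetes.node.name": "my-node" } }
      ]
    }
  },
  "aggs": {
    "cpu-capacity": { "max": { "field": "kubernetes.node.cpu.allocatable.cores" } },
    "cpu-used":     { "avg": { "field": "kubernetes.node.cpu.usage.nanocores" } },
    "mem-capacity": { "max": { "field": "kubernetes.node.memory.allocatable.bytes" } },
    "mem-used":     { "avg": { "field": "kubernetes.node.memory.usage.bytes" } }
  }
}
```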

@exekias
Contributor

exekias commented Jul 11, 2019

State metrics are reported from a single host, as they are global to the cluster. This means that host.name will contain the same host for all documents, while kubernetes.node.name will contain the node name, different in each document.

If you are filtering by kubernetes.node.name that should be ok. What about that match on host.name that @simioa found in the query?

@simianhacker
Member

@exekias @simioa The query was wrong; I filed a PR this weekend that fixes the query to use the correct field, kubernetes.node.name, on the host page.

@simianhacker
Member

I'm going to close this since it was fixed by the PR mentioned above. If you are still having issues, please re-open.
