
[etcd] Use prometheus metricset for etcd v3 metrics #8438

Merged: @gpop63 merged 28 commits into elastic:main (Nov 30, 2023)


@gpop63 (Contributor) commented Nov 8, 2023

Overview

Changed the metrics data stream (etcd v3 metrics) to use the Prometheus metricset, so that the data stream can be updated and managed independently of etcd Beats module releases.

Other changes:

  • Added ECS fields
  • Added metric types and dimensions
  • Added a pipeline to rename fields
  • Added etcd labels and a labels fingerprint (etcd.labels.fingerprint) to use as a TSDB dimension
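To illustrate what a field-renaming pipeline step can look like, here is a minimal sketch of an Elasticsearch ingest pipeline in the YAML form integration packages use for pipelines. It is not the PR's actual pipeline. The source field name prometheus.metrics.etcd_server_has_leader is an assumption about where the Prometheus metricset stores the etcd_server_has_leader metric; the target name comes from the TSDB field list below.

```yaml
# Hypothetical sketch of a data stream ingest pipeline (not the PR's actual file).
# ASSUMPTION: the Prometheus metricset stores the etcd_server_has_leader metric
# under prometheus.metrics.etcd_server_has_leader.
description: Pipeline for renaming etcd Prometheus metrics
processors:
  - rename:
      field: prometheus.metrics.etcd_server_has_leader
      target_field: etcd.server.has_leader.count
      # Do not fail documents that do not carry this metric.
      ignore_missing: true
```

One rename step per metric keeps the mapping from Prometheus names to the etcd.* field names explicit and easy to review.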
TSDB test

Testing data stream metrics-etcd.metrics-default.
Index being used for the documents is .ds-metrics-etcd.metrics-default-2023.11.08-000001.
Index being used for the settings and mappings is .ds-metrics-etcd.metrics-default-2023.11.08-000001.

The time series fields for the TSDB index are: 
        - dimension (10 fields):
                - agent.id
                - cloud.account.id
                - cloud.availability_zone
                - cloud.instance.id
                - cloud.provider
                - cloud.region
                - container.id
                - etcd.labels.fingerprint
                - host.name
                - service.address
        - counter (7 fields):
                - etcd.memory.go_memstats_alloc.total.bytes
                - etcd.network.client_grpc_received.bytes
                - etcd.network.client_grpc_sent.bytes
                - etcd.server.grpc_handled.count
                - etcd.server.grpc_started.count
                - etcd.server.leader_changes.count
                - etcd.server.proposals_failed.count
        - gauge (5 fields):
                - etcd.disk.mvcc_db_total_size.bytes
                - etcd.memory.go_memstats_alloc.bytes
                - etcd.server.has_leader.count
                - etcd.server.proposals_committed.count
                - etcd.server.proposals_pending.count
        - routing_path (10 fields):
                - agent.id
                - cloud.account.id
                - cloud.availability_zone
                - cloud.instance.id
                - cloud.provider
                - cloud.region
                - container.id
                - etcd.labels.fingerprint
                - host.name
                - service.address

Index tsdb-index-enabled successfully created.

Copying documents from .ds-metrics-etcd.metrics-default-2023.11.08-000001 to tsdb-index-enabled...
All 23350 documents taken from index .ds-metrics-etcd.metrics-default-2023.11.08-000001 were successfully placed to index tsdb-index-enabled.
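The etcd.labels.fingerprint dimension reduces the whole set of etcd labels to a single value that can be declared as a TSDB dimension, so series that differ only in their labels stay separate. One way to compute such a value, sketched here as an assumption rather than as this PR's implementation, is the Elasticsearch fingerprint ingest processor, which hashes the keys and values of the listed fields (object fields included). The location of the labels under etcd.labels is also an assumption.

```yaml
# Hypothetical pipeline step (not the PR's actual code): hash all etcd labels
# into one value that can be mapped as a TSDB dimension.
processors:
  - fingerprint:
      # ASSUMPTION: the labels are stored in the etcd.labels object.
      fields:
        - etcd.labels
      target_field: etcd.labels.fingerprint
      # Leave documents without labels untouched instead of failing them.
      ignore_missing: true
```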

Checklist

  • I have reviewed tips for building integrations and this pull request is aligned with them.
  • I have verified that all data streams collect metrics or logs.
  • I have added an entry to my package's changelog.yml file.
  • I have verified that Kibana version constraints are current according to guidelines.


How to test this PR locally

elastic-package stack up -d -v --version 8.12.0-SNAPSHOT

docker-compose.yml

version: '3'
services:
  etcd: 
    image: bitnami/etcd:3.5.1
    container_name: production_etcd
    environment:
      - ALLOW_NONE_AUTHENTICATION=yes
    ports:
      - 4001:4001
      - 2380:2380
      - 2379:2379
  elastic-agent:
    image: docker.elastic.co/beats/elastic-agent:8.12.0-SNAPSHOT
    container_name: elastic-agent
    restart: always
    user: root
    environment:
      # Replace TOKEN with an enrollment token from Fleet.
      - FLEET_ENROLLMENT_TOKEN=TOKEN
      - FLEET_ENROLL=1
      - FLEET_INSECURE=1
      - FLEET_URL=https://fleet-server:8220
networks:
  default:
    external: true
    name: elastic-package-stack_default

Related issues

Screenshots

[Screenshot: scrnli_11_8_2023_11-08-43 PM]

@elasticmachine commented Nov 8, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-11-30T12:45:58.443+0000

  • Duration: 17 min 56 sec

Test stats 🧪

Test Results
Failed 0
Passed 12
Skipped 0
Total 12


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

@elasticmachine commented Nov 8, 2023

🌐 Coverage report

| Name | Metrics % (covered/total) | Diff |
|---|---|---|
| Packages | 100.0% (0/0) 💚 | |
| Files | 100.0% (0/0) 💚 | 5.556 |
| Classes | 100.0% (0/0) 💚 | 5.556 |
| Methods | 60.0% (9/15) 👎 | -29.701 |
| Lines | 100.0% (0/0) 💚 | 13.917 |
| Conditionals | 100.0% (0/0) 💚 | |

@gpop63 requested a review from @agithomas on November 8, 2023 22:16
@agithomas (Contributor) commented:

Can you update the Compatibility section to highlight metrics compatibility? It may be best to state that if the etcd version is 2, the metrics data stream is not supported.

@lalit-satapathy / @SubhrataK,

We have three data streams based on the V2 API: leader, self and store. The metrics data stream is based on the V3 API.

From etcd version 3 onwards, the V3 API is supported by default, but the V2 API can also be enabled (by setting environment variables). So both the V2 and V3 APIs are supported in etcd version 3 and above. I have tested this with etcd 3.5.

The issue is that the V2 APIs provide more metrics than the V3 APIs. So, by not supporting V2 API based metrics, customers may lose important insights. @gpop63 please correct me if I am wrong.

Considering the above points: even though etcd 2.x is no longer actively maintained, the V2 API is still supported in etcd version 3 and above. Should we continue to support the leader, self and store data streams in the GA version of the etcd integration? I think we should. I'd like to hear your input as well.
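To exercise the V2-based data streams against etcd 3.5, the V2 API has to be enabled explicitly. The following is a minimal sketch of the etcd service from the docker-compose.yml used for local testing, assuming that etcd reads ETCD_-prefixed environment variables as command-line flags (ETCD_ENABLE_V2 standing in for --enable-v2) and that the bitnami/etcd image passes them through unchanged.

```yaml
# Sketch: etcd service with the V2 API enabled (local testing only).
# ASSUMPTION: etcd maps ETCD_ENABLE_V2 to its --enable-v2 flag.
services:
  etcd:
    image: bitnami/etcd:3.5.1
    environment:
      - ALLOW_NONE_AUTHENTICATION=yes
      # Accept V2 client requests in addition to the default V3 API.
      - ETCD_ENABLE_V2=true
```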

@agithomas (Contributor) commented:

Based on the above comment, I think we should enable TSDB for all data streams.

@agithomas (Contributor) left a comment:

Added review comments

@gpop63 (Contributor, author) commented Nov 9, 2023

@agithomas The use of etcd v2 is highly discouraged; I don't think it's worth spending effort on it.

To my knowledge, v2 APIs can be enabled in etcd v3.5, but they are completely removed from v3.6 onwards.

Are you suggesting that this data stream should also capture v2-specific metrics from etcd v3 instances where the v2 APIs are enabled?

@agithomas (Contributor) commented:

> To my knowledge, v2 APIs can be enabled in etcd v3.5, but they are completely removed from v3.6 onwards.

I see that 3.6 is still in draft and 3.5 is the latest stable release. Can you share a link to the source that says the V2 APIs will be deprecated in 3.6? This will help speed up the decision-making process.

@agithomas (Contributor) commented:

> Are you suggesting that this data stream should also capture v2-specific metrics from etcd v3 instances where the v2 APIs are enabled?

I am evaluating the benefit of retaining the existing leader, self and store data streams instead of deprecating or removing them as part of the GA release.

@gpop63 (Contributor, author) commented Nov 9, 2023

> Can you share a link to the source that says the V2 APIs will be deprecated in 3.6? This will help speed up the decision-making process.

etcd-io/etcd#12913

description: Collecting etcd metrics
title: Collect metrics from etcd v2 instances
description: Collecting metrics etcd v2 metrics
- type: prometheus/metrics
Contributor commented:

Is this really necessary, given that the v2 and v3 endpoints share the same prefix?

@gpop63 (Contributor, author) replied:

Should we have one global hosts variable instead?

Contributor replied:

Yes, that will serve the purpose, as both endpoints share the same prefix.
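A single hosts variable could be declared once at the package level and shared by the v2 and v3 data streams. The following is a hypothetical sketch of a package-level vars entry in the package manifest, not the PR's actual change; the default http://localhost:2379 is an assumption based on the etcd client port used in the local test setup.

```yaml
# Hypothetical manifest.yml excerpt: one package-level hosts variable shared
# by the etcd v2 (leader, self, store) and v3 (metrics) data streams.
vars:
  - name: hosts
    type: text
    title: Hosts
    multi: true
    required: true
    show_user: true
    default:
      - http://localhost:2379
```

Each data stream's agent stream template could then iterate over the same variable (for example with a Handlebars {{#each hosts}} loop), so users enter the etcd address only once.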

description: |
Memory allocated bytes as of MemStats Go
- name: go_memstats_alloc.bytes
type: long
Contributor commented:

Don't you want to add unit: byte (and similar) to the field mappings?
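Applied to the go_memstats_alloc.bytes field under review, the suggestion could look roughly like the following fields definition. This is a sketch, not the merged file: metric_type: gauge matches the gauge listed for etcd.memory.go_memstats_alloc.bytes in the TSDB test output, and unit: byte is the reviewer's suggestion.

```yaml
# Sketch of a fields.yml entry carrying metric type and unit metadata.
- name: go_memstats_alloc.bytes
  type: long
  # Lets Kibana format the value as a byte size.
  unit: byte
  # Matches the gauge classification in the TSDB test output.
  metric_type: gauge
  description: |
    Memory allocated bytes as of MemStats Go
```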

@gpop63 (Contributor, author) commented Nov 20, 2023

@agithomas, of the newly added metrics, which ones do you think we should add to the dashboard?

@gpop63 (Contributor, author) commented Nov 20, 2023

/test

@agithomas (Contributor) commented:

@ritalwar, can you please take a look at the PR and review it? The Kibana visualisation changes I suggested are still pending, so please review everything except those. Thanks in advance.

packages/etcd/changelog.yml (outdated, resolved)
packages/etcd/data_stream/metrics/sample_event.json (outdated, resolved)
@gpop63 (Contributor, author) commented Nov 27, 2023

  • Added data stream filter
  • Improved Lens titles and metric value formats (e.g. bytes).
  • Added Storage Metrics and Proposals Applied lenses

Before:
[Screenshot: etcd_before]

After:
[Screenshot: etcd_after]

gpop63 and others added 4 commits November 27, 2023 23:47
@agithomas (Contributor) commented:

@gpop63, can you organise the panels so that panels 1, 2 and 3 are in one row, 10 and 11 in another, and 20 and 21 in another? Please refer to the screenshot below.

Also, please update the package screenshot image as well.

[Screenshot]

@gpop63 (Contributor, author) commented Nov 29, 2023

Updated:

[Screenshot]

@agithomas (Contributor) left a comment:

LGTM!

@agithomas (Contributor) commented:

@ritalwar, can you please take a final look at the PR?

@ritalwar (Contributor) left a comment:

LGTM!

@gpop63 merged commit 131e7e7 into elastic:main on Nov 30, 2023 (4 checks passed)
@elasticmachine commented:
Package etcd - 0.7.0 containing this change is available at https://epr.elastic.co/search?package=etcd
