
[9.4](backport #50674) [Metricbeat] Add elasticsearch/security_stats metricset #50883

Merged
pickypg merged 1 commit into 9.4 from mergify/bp/9.4/pr-50674 on May 22, 2026

@mergify mergify Bot commented May 22, 2026

Type of change: Enhancement

Proposed commit message

[Metricbeat] Add elasticsearch/security_stats metricset

WHAT:

  • Adds a new security_stats metricset to the Elasticsearch module
    that scrapes the per-node GET /_security/stats endpoint
    introduced in Elasticsearch 9.2.
  • The first metric exposed is the Document Level Security (DLS)
    cache: entries, memory, hits, misses, evictions, hit/miss latency.
  • Each event is enriched with node.{name,roles,version} via a
    single filter-path-scoped /_nodes call per scrape, exposed as a
    reusable NodeEnrichment helper on the module's MetricSet so
    future per-node metricsets can adopt it.
  • Wires the metricset into xpackEnabledMetricSets so monitoring
    deployments using xpack.enabled: true route events to
    .monitoring-* indices.
  • Gates on ES 9.2.0 (in-memory version compare) and on
    xpack.Features.Security.Enabled (proactive GET /_xpack probe),
    mirroring the conditional-availability pattern in ccr and
    ml_job. Aggregates /_nodes enrichment failures via
    errors.Join rather than logging Debug, so they surface in
    self-monitoring (matches node_stats).
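
The error aggregation described in the last bullet could look roughly like the sketch below. Only `errors.Join` is a real standard-library call; `nodeInfo`, `enrichAll`, and the map layout are hypothetical names chosen for illustration and are not taken from the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// nodeInfo is a hypothetical stand-in for the per-node metadata
// (name, version) that enrichment attaches to each event.
type nodeInfo struct {
	Name    string
	Version string
}

// enrichAll looks up every node ID in the enrichment map. A missing
// entry does not abort the loop: the failure is collected and the
// combined error is returned next to the nodes that were found.
// Whether the real metricset still emits an unenriched event is not
// stated in the PR; this sketch simply skips the node.
func enrichAll(nodeIDs []string, info map[string]nodeInfo) ([]nodeInfo, error) {
	var errs []error
	var out []nodeInfo
	for _, id := range nodeIDs {
		n, ok := info[id]
		if !ok {
			errs = append(errs, fmt.Errorf("no /_nodes enrichment for node %q", id))
			continue
		}
		out = append(out, n)
	}
	// errors.Join returns nil when no errors were collected.
	return out, errors.Join(errs...)
}

func main() {
	info := map[string]nodeInfo{"a": {Name: "node-a", Version: "9.2.0"}}
	enriched, err := enrichAll([]string{"a", "b"}, info)
	fmt.Println(len(enriched), err)
}
```

Returning the joined error instead of logging it at debug level is what lets a caller report the failure through its normal error path.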

WHY:

  • Gives Stack Monitoring fleet-wide visibility into DLS cache
    health (cache thrash, oversized working sets, unhealthy hit/miss
    ratios) — currently invisible across the fleet.
  • Lays the foundation for additional security subsystems the
    endpoint may expose in future ES releases (additive sibling
    paths under roles.*).

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works.
  • I have added an entry in ./changelog/fragments using the changelog tool.

Disruptive User Impact

None for end users — the metricset is opt-in via configuration.

Shared test infrastructure (metricbeat/docker-compose.yml and
x-pack/metricbeat/docker-compose.yml) is unchanged from main: the
elasticsearch test service still runs with xpack.security.enabled=false,
so /_security/stats is not registered in CI and TestFetch/security_stats
is skipped unconditionally with a TODO pointing at a follow-up PR that
migrates the metricbeat compose stack to an x-pack-security-enabled posture
(file-realm users, Kibana credentials, test fixture auth). At that point
the skip becomes vacuous and the metricset is exercised against a real
/_security/stats response.

In production, the metricset checks GET /_xpack on each scrape — the same
proactive feature-availability pattern used by ccr and ml_job — and
short-circuits with a throttled debug log when features.security.enabled
is false, so clusters running without security get no events and no error
spam.
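
The PR does not show how the debug log is throttled. As a rough illustration of the behavior described above, the sketch below lets one log line through per interval and returns no events while security is off. `logThrottle`, `fetch`, and `runScrapes` are hypothetical names, and the helper is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"time"
)

// logThrottle is a hypothetical helper that lets a message through at
// most once per interval.
type logThrottle struct {
	interval time.Duration
	last     time.Time
}

// allow reports whether a log line may be emitted at time now.
func (t *logThrottle) allow(now time.Time) bool {
	if !t.last.IsZero() && now.Sub(t.last) < t.interval {
		return false
	}
	t.last = now
	return true
}

// fetch mimics one scrape. With security off it returns zero events
// and no error, logging only when the throttle permits.
func fetch(securityEnabled bool, t *logThrottle, now time.Time) (events int, logged bool) {
	if !securityEnabled {
		if t.allow(now) {
			fmt.Println("debug: security is not enabled; skipping security_stats")
			return 0, true
		}
		return 0, false
	}
	return 1, false // placeholder for real event collection
}

// runScrapes simulates scrapes against a security-disabled cluster at
// the given second offsets and reports which scrapes logged.
func runScrapes(intervalSec int, offsetsSec ...int) []bool {
	th := &logThrottle{interval: time.Duration(intervalSec) * time.Second}
	start := time.Unix(0, 0)
	logged := make([]bool, 0, len(offsetsSec))
	for _, off := range offsetsSec {
		_, l := fetch(false, th, start.Add(time.Duration(off)*time.Second))
		logged = append(logged, l)
	}
	return logged
}

func main() {
	fmt.Println(runScrapes(60, 0, 10, 61))
}
```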

How to test this PR locally

  1. From metricbeat/, run the unit tests:

    go test -v -race -count=1 ./module/elasticsearch/security_stats/...
    

    Expect TestMapper, TestEmpty, TestSkipsNodesWithoutDLSStats,
    TestRejectsMalformedResponse, and TestEnrichmentMissingForNode
    to pass.

  2. Spin up ES 9.2+ via the module's docker-compose and confirm the
    integration suite still passes — TestFetch/security_stats skips
    under the unmodified compose stack, all other elasticsearch
    metricsets pass:

    cd metricbeat
    mage docker:composeUp
    go test -v -tags=integration -run 'TestFetch$' ./module/elasticsearch/...
    

    Expect every TestFetch/* PASS, with TestFetch/security_stats
    SKIP and the message
    /_security/stats requires xpack.security.enabled=true on the test cluster (deferred to a follow-up compose change).

    To exercise the metricset end-to-end against a real
    /_security/stats payload, point a metricbeat instance at any
    local ES 9.2+ cluster that has xpack.security.enabled: true and
    configure it with metricsets: [security_stats]. The next
    iteration of this work (the compose-migration follow-up) will
    fold this into the standard integration suite.

  3. Confirm version gating: point the metricset at an ES < 9.2 host
    and verify Fetch returns nil with a (throttled) debug log.
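
The version gate checked in step 3 amounts to a numeric comparison against 9.2.0. The sketch below is a standalone approximation using only the standard library; `parseVersion` and `atLeast` are hypothetical helpers, not the version utilities the module actually uses, and a pre-release suffix such as `-SNAPSHOT` is simply dropped, which is a simplification.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "9.2.0" (or "9.2.0-SNAPSHOT") into
// [major, minor, patch].
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	core, _, _ := strings.Cut(v, "-") // drop any pre-release suffix
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("unexpected version %q: %w", v, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether v >= minimum, comparing each field as a
// number so that "9.10.0" sorts after "9.2.0".
func atLeast(v, minimum string) (bool, error) {
	a, err := parseVersion(v)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(minimum)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil
}

func main() {
	for _, v := range []string{"9.1.4", "9.2.0", "9.10.0"} {
		ok, err := atLeast(v, "9.2.0")
		fmt.Println(v, ok, err)
	}
}
```

Because the comparison runs on a version string the module already holds, it costs no extra HTTP request, which is why it can run before any network probe.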

Related issues


This is an automatic backport of pull request #50674 done by [Mergify](https://mergify.com).

Adds a per-node security_stats metricset that scrapes the new
GET /_security/stats endpoint introduced in Elasticsearch 9.2.
The first metric exposed is the Document Level Security cache
(entries, memory, hits, misses, evictions, hit/miss latency),
giving Stack Monitoring fleet-wide visibility into DLS cache
health for spotting cache thrash, oversized working sets, and
unhealthy hit/miss ratios.

Each event is enriched with node name, roles, and stack version
via a single filter-path-scoped /_nodes call per scrape, shared
across all per-node events emitted in that scrape. This logic
lives on the module's MetricSet as the new NodeEnrichment helper
so future per-node metricsets can reuse it. node.version is also
declared at the module level alongside id, name, roles, master,
and mlockall.
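
To make the shared lookup concrete, the sketch below decodes a `/_nodes` response that has been narrowed to name, roles, and version, then reuses the resulting map for every node ID in a scrape. The `nodes.<id>.{name,roles,version}` layout matches the general shape of the Elasticsearch nodes info API, but the exact filter path the PR uses is not shown, and `nodeMeta`, `nodesResponse`, and `parseNodes` are hypothetical names rather than the `NodeEnrichment` helper itself.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeMeta holds the three fields the enrichment needs.
type nodeMeta struct {
	Name    string   `json:"name"`
	Roles   []string `json:"roles"`
	Version string   `json:"version"`
}

// nodesResponse mirrors the assumed top-level shape of the filtered
// /_nodes response: a map keyed by node ID.
type nodesResponse struct {
	Nodes map[string]nodeMeta `json:"nodes"`
}

// parseNodes decodes the response body once and returns a lookup
// table that can be shared by all per-node events of a scrape.
func parseNodes(body []byte) (map[string]nodeMeta, error) {
	var resp nodesResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decoding /_nodes response: %w", err)
	}
	return resp.Nodes, nil
}

func main() {
	body := []byte(`{"nodes":{"abc123":{"name":"es-1","version":"9.2.0","roles":["data","master"]}}}`)
	lookup, err := parseNodes(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// One decoded map serves every node ID seen in the stats payload.
	for _, id := range []string{"abc123", "zzz999"} {
		meta, ok := lookup[id]
		fmt.Println(id, ok, meta.Name, meta.Version)
	}
}
```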

The shared metricbeat/docker-compose.yml elasticsearch service
now runs with xpack.security.enabled=true plus an anonymous
superuser, since /_security/stats is only registered when
security is enabled. Anonymous superuser keeps the rest of the
elasticsearch integration test suite working without threading
credentials through every metricset's setup.

* docs: register security_stats metricset page in toc.yml

mage update regenerates per-metricset markdown but doesn't touch
the navigation toc.yml. Add the missing entry so docs-build can
locate the security_stats page in the Elasticsearch module section.

* docs: replace "e.g." with "for example" per Vale style guide

Elastic.Latinisms forbids Latin abbreviations in docs. Replace
the lone "e.g." in the new node.version field description and
regenerate the affected files.

* metricbeat/elasticsearch: clean up pre-existing lint issues

Two pre-existing lint findings in elasticsearch_integration_test.go
became blocking once this branch touched the file (golangci-lint
runs with --whole-files). Both fixes are mechanical:

- Replace math/rand with math/rand/v2 in randString and drop the
  redundant per-call seeded local Rand.
- Add the comma-ok form to the version.number type assertion in
  getElasticsearchVersion so errcheck (with check-type-assertions)
  is satisfied.
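
Both lint fixes follow common Go patterns. The sketch below shows the general shape, assuming Go 1.22 or later for `math/rand/v2`; the function bodies are illustrative guesses and `versionNumber` is a hypothetical name, not the code in elasticsearch_integration_test.go.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand/v2"
)

const letters = "abcdefghijklmnopqrstuvwxyz"

// randString returns n random lowercase letters. The top-level
// functions in math/rand/v2 are randomly seeded, so no per-call
// seeded local Rand is required.
func randString(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[rand.IntN(len(letters))]
	}
	return string(b)
}

// versionNumber reads version.number from a decoded JSON document.
// The comma-ok form turns an unexpected type into an error instead
// of a panic.
func versionNumber(doc map[string]any) (string, error) {
	version, ok := doc["version"].(map[string]any)
	if !ok {
		return "", errors.New("version is missing or not an object")
	}
	number, ok := version["number"].(string)
	if !ok {
		return "", errors.New("version.number is missing or not a string")
	}
	return number, nil
}

func main() {
	fmt.Println(len(randString(8)))
	doc := map[string]any{"version": map[string]any{"number": "9.2.0"}}
	fmt.Println(versionNumber(doc))
}
```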

* metricbeat/elasticsearch: dedupe node.version field declaration

The new module-level node.version added for security_stats collided
with a pre-existing node.version in the node metricset's local
fields.yml, breaking `metricbeat export index-pattern` with
"field <elasticsearch.node.version> is duplicated".

Drop the metricset-local declaration in favor of the shared
module-level one, which carries a richer description and is the
right scope for a field emitted by multiple per-node metricsets.

* metricbeat: provision file-realm users for secured ES test stack

Enabling xpack.security on the shared elasticsearch service for
security_stats coverage broke Kibana boot: Kibana 9.x's interactive
setup plugin holds preboot when ES has security on without
ELASTICSEARCH_USERNAME, and the existing Kibana healthchecks
(curl -u beats:testing, curl -u myelastic:changeme) started actually
validating against ES instead of being silently ignored.

Provision the named users that the existing healthchecks expect via
elasticsearch-users useradd in the startup command, and give Kibana
real ES credentials. Anonymous=superuser is preserved so the
integration tests' credential-less HTTP probes keep working without
threading credentials through every metricset's setup.

* x-pack/metricbeat: give kibana credentials to secured ES

The previous commit enabled xpack.security on the shared Elasticsearch
service and gave the OSS metricbeat kibana service real credentials,
but x-pack/metricbeat hand-copies its kibana stanza (depends_on can't
be extended) so the env didn't propagate. With no
ELASTICSEARCH_USERNAME, Kibana entered interactive setup, the
Dockerfile healthcheck (curl -u myelastic:changeme /api/stats) never
reached green, and the proxy_dep busybox blocked all integration
tests from starting.

Mirror the env vars into the x-pack kibana stanza and note the
duplication contract in a comment so future secured-ES changes are
applied in both places.

* metricbeat/elasticsearch: gate security_stats on xpack feature flag

CI exposed that the previous PR commits enabled xpack.security on the
shared metricbeat docker-compose stack to exercise /_security/stats,
but that change rippled wider than fits in this PR: Kibana boot,
healthcheck users, OTel test framework default credentials, and the
Python `get_version` helper all assume an open ES. Revert both
metricbeat and x-pack/metricbeat docker-compose.yml to their
upstream/main shape and address the underlying problem in the
metricset itself.

`security_stats.checkAvailability` now mirrors the pattern used by
ccr and ml_job: a free in-memory version comparison short-circuits
old clusters first, then a proactive `GET /_xpack` probe checks
`features.security.enabled` so we can emit a specific operator-facing
log message and avoid hitting an endpoint we know would return 400.
A new `Security` field is added to the shared `elasticsearch.XPack`
struct to support the check.
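
The ordering described above (cheap version check first, network probe second) can be sketched as follows. The `features.security.enabled` path comes from the commit message; `xpackInfo`, `checkAvailabilitySketch`, and the injected `probe` function are hypothetical and do not reproduce the real `elasticsearch.XPack` struct or `checkAvailability` signature.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// xpackInfo decodes only the part of a GET /_xpack response that
// this check needs.
type xpackInfo struct {
	Features struct {
		Security struct {
			Enabled bool `json:"enabled"`
		} `json:"security"`
	} `json:"features"`
}

// checkAvailabilitySketch returns whether the endpoint should be
// scraped and, if not, a reason suitable for an operator-facing log.
// probe stands in for the HTTP call to /_xpack and is only invoked
// when the version check has already passed.
func checkAvailabilitySketch(versionOK bool, probe func() ([]byte, error)) (bool, string, error) {
	if !versionOK {
		return false, "cluster is older than 9.2.0", nil
	}
	body, err := probe()
	if err != nil {
		return false, "", fmt.Errorf("probing /_xpack: %w", err)
	}
	var info xpackInfo
	if err := json.Unmarshal(body, &info); err != nil {
		return false, "", fmt.Errorf("decoding /_xpack response: %w", err)
	}
	if !info.Features.Security.Enabled {
		return false, "security is not enabled on this cluster", nil
	}
	return true, "", nil
}

func main() {
	probe := func() ([]byte, error) {
		return []byte(`{"features":{"security":{"available":true,"enabled":false}}}`), nil
	}
	ok, reason, err := checkAvailabilitySketch(true, probe)
	fmt.Println(ok, reason, err)
}
```

Passing the probe as a function keeps the sketch testable without a cluster and makes it easy to confirm that old clusters never trigger the request.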

The elasticsearch_integration_test.go suite skips security_stats
unconditionally for now, with a TODO pointing at a focused follow-up
PR that migrates the metricbeat compose stack to an x-pack-security-
enabled posture (file-realm users, Kibana credentials, test fixture
auth). At that point the skip becomes vacuous and the metricset is
exercised against a real /_security/stats response.

---------

Co-authored-by: Visha Angelova <91186315+vishaangelova@users.noreply.github.com>
(cherry picked from commit 7119b64)
@mergify mergify Bot requested review from a team as code owners May 22, 2026 18:00
@mergify mergify Bot added the backport label May 22, 2026
@mergify mergify Bot requested review from AndersonQ and VihasMakwana and removed request for a team May 22, 2026 18:00
@botelastic botelastic Bot added the needs_team label May 22, 2026
@github-actions

🤖 GitHub comments

Just comment with:

  • run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)
  • /test : Run the Buildkite pipeline.

@botelastic botelastic Bot removed the needs_team label May 22, 2026
@pickypg pickypg enabled auto-merge (squash) May 22, 2026 18:02
@github-actions

Vale Linting Results

Summary: 1 warning, 2 suggestions found

⚠️ Warnings (1)

| File | Line | Rule | Message |
|---|---|---|---|
| docs/reference/metricbeat/metricbeat-metricset-elasticsearch-security_stats.md | 2 | Elastic.MappedPages | mapped_pages should only be added or updated in rare scenarios. Talk with your local technical writer before pushing changes to this key. |

💡 Suggestions (2)

| File | Line | Rule | Message |
|---|---|---|---|
| docs/reference/metricbeat/metricbeat-metricset-elasticsearch-security_stats.md | 13 | Elastic.WordChoice | Consider using 'select, press, visits' instead of 'hit', unless the term is in the UI. |
| docs/reference/metricbeat/metricbeat-metricset-elasticsearch-security_stats.md | 17 | Elastic.WordChoice | Consider using 'deactivated, deselected, hidden, turned off, unavailable' instead of 'disabled', unless the term is in the UI. |

The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

@pickypg pickypg merged commit 95f416c into 9.4 May 22, 2026
49 checks passed
@pickypg pickypg deleted the mergify/bp/9.4/pr-50674 branch May 22, 2026 19:09