[Security Solution] Detection engine metrics logging improvements #166186

Draft
wants to merge 3 commits into base: main


@dplumlee (Contributor) commented Sep 11, 2023

Summary

WIP

Adds telemetry and metrics logs for the Detection engine to help debug and visualize internal rule execution performance.

In this PR:

  • Adds "documents of interest" debug logging. The trackTotalHits ES param is passed programmatically on each rule execution, so the logs show how many documents the rule searches over to create candidate alerts. Comparing the count from the initial rule run with later runs that use the same params also helps detect late-arriving documents. Implemented for all rule types. Example: Recent terms search totalHits: 40
  • Adds "unique terms count" logging for the New Terms rule type to show how many unique terms the rule processes in both its Stage 1 and Stage 2 search after steps
  • Creates a valueListFilteringDuration metric that logs the aggregate time the rule spends filtering against the value lists we define as "large"
  • Modifies detection engine metrics to track rule metrics in a unified object, so that the individual phases of a rule run can be tracked rather than only total durations. This gives more detailed tracking for rule types that have multiple search phases (new terms, threat match, etc.)
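
As a rough illustration of the "documents of interest" logging described above, the sketch below normalizes a search response's total hit count and formats a log line like the one in the example. It is not the code from this PR: the type and function names (`SearchResponseLike`, `getTotalHitsValue`, `formatTotalHitsLog`, `lateArrivingCount`) are hypothetical, and the response shape is a simplified stand-in for the real Elasticsearch client types. The one assumption taken from Elasticsearch itself is that `hits.total` may arrive either as a bare number or as an object with `value` and `relation`.

```typescript
// Hypothetical, simplified shapes; the real Elasticsearch client types are richer.
type TotalHits = number | { value: number; relation: 'eq' | 'gte' };

interface SearchResponseLike {
  hits: { total?: TotalHits };
}

// Normalizes hits.total, which may be a bare number or a { value, relation } object.
// Returns -1 when the total is absent (for example, hit tracking was disabled).
function getTotalHitsValue(total: TotalHits | undefined): number {
  if (total === undefined) return -1;
  return typeof total === 'number' ? total : total.value;
}

// Builds a debug line such as "Recent terms search totalHits: 40".
function formatTotalHitsLog(searchName: string, response: SearchResponseLike): string {
  return `${searchName} totalHits: ${getTotalHitsValue(response.hits.total)}`;
}

// Compares an initial run with a later run over the same params; a positive
// result suggests documents arrived after the initial run.
function lateArrivingCount(initial: SearchResponseLike, rerun: SearchResponseLike): number {
  const before = getTotalHitsValue(initial.hits.total);
  const after = getTotalHitsValue(rerun.hits.total);
  if (before < 0 || after < 0) return 0; // cannot compare without both totals
  return Math.max(0, after - before);
}
```

In a real rule executor the formatted string would be handed to the rule's debug logger; the comparison helper only makes sense when both runs cover the same time range and query.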

Example

[{
  phaseName: 'alert_enrichment',
  duration: '12345'
},
{
  phaseName: 'composite_agg',
  duration: '67890'
}]
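
One way such a unified metrics object could be assembled is sketched below. This is only an assumption about the shape of the idea, not the implementation in this PR: the `RulePhaseMetrics` class and its methods are hypothetical, and the duration unit is assumed to be milliseconds since the description does not state one. Durations are emitted as strings to match the example above.

```typescript
interface PhaseMetric {
  phaseName: string;
  duration: string; // assumed milliseconds, stringified to match the example above
}

// Hypothetical accumulator for per-phase durations within one rule run.
class RulePhaseMetrics {
  private readonly totals = new Map<string, number>();

  // Adds elapsed time to a phase; repeated calls for the same phase accumulate.
  record(phaseName: string, durationMs: number): void {
    this.totals.set(phaseName, (this.totals.get(phaseName) ?? 0) + durationMs);
  }

  // Runs an async step and records its elapsed time, even if the step throws.
  async time<T>(
    phaseName: string,
    fn: () => Promise<T>,
    now: () => number = Date.now
  ): Promise<T> {
    const start = now();
    try {
      return await fn();
    } finally {
      this.record(phaseName, now() - start);
    }
  }

  // Emits phases in the order they were first recorded.
  toArray(): PhaseMetric[] {
    return Array.from(this.totals, ([phaseName, ms]) => ({
      phaseName,
      duration: String(ms),
    }));
  }
}
```

Because `record` accumulates, an aggregated metric such as valueListFilteringDuration could be modeled as one phase that is recorded once per filtering pass, for example `metrics.record('value_list_filtering', elapsedMs)`, where the phase name is again only illustrative.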

Checklist

Delete any items that are not applicable to this PR.

For maintainers

@dplumlee added the labels technical debt, release_note:skip, Team:Detections and Resp, Team: SecuritySolution, and Team:Detection Rule Management on Sep 11, 2023
@dplumlee self-assigned this Sep 11, 2023
@kibana-ci (Collaborator) commented Sep 20, 2023

💔 Build Failed

Failed CI Steps

Test Failures

  • [job] [logs] Jest Integration Tests #1 / checking migration metadata changes on all registered SO types detecting migration related changes in registered types
  • [job] [logs] Jest Integration Tests #1 / cli invalid config support exits with statusCode 64 and logs an error when config is invalid
  • [job] [logs] Jest Integration Tests #1 / cli serverless project type Kibana does not crash when running project type es
  • [job] [logs] Jest Integration Tests #1 / cli serverless project type Kibana does not crash when running project type oblt
  • [job] [logs] Jest Integration Tests #1 / cli serverless project type Kibana does not crash when running project type security
  • [job] [logs] Jest Integration Tests #6 / incompatible_cluster_routing_allocation retries the INIT action with a descriptive message when cluster settings are incompatible
  • [job] [logs] Jest Integration Tests #6 / migrating from 7.3.0-xpack which used v1 migrations copies all the document of the previous index to the new one
  • [job] [logs] Jest Integration Tests #6 / migrating from 7.3.0-xpack which used v1 migrations creates the new index and the correct aliases
  • [job] [logs] Jest Integration Tests #6 / migrating from 7.3.0-xpack which used v1 migrations migrates the documents to the highest version
  • [job] [logs] Jest Integration Tests #6 / migration v2 - read batch size does not reduce the read batchSize in half if no batches exceeded maxReadBatchSizeBytes
  • [job] [logs] Jest Integration Tests #6 / migration v2 - read batch size reduces the read batchSize in half if a batch exceeds maxReadBatchSizeBytes
  • [job] [logs] Jest Integration Tests #1 / migration v2 fails with a descriptive message when maxBatchSizeBytes exceeds ES http.max_content_length
  • [job] [logs] Jest Integration Tests #6 / migration v2 migrates saved objects normally with multiple ES nodes
  • [job] [logs] Jest Integration Tests #1 / migration v2 with corrupt saved object documents collects corrupt saved object documents across batches
  • [job] [logs] Jest Integration Tests #6 / SO default search fields make sure management types have the correct mappings for default search fields
  • [job] [logs] Jest Integration Tests #6 / SO type registrations does not remove types from registrations without updating excludeOnUpgradeQuery
  • [job] [logs] Jest Tests #7 / utils createSearchAfterReturnType createSearchAfterReturnType can override all values
  • [job] [logs] Jest Tests #7 / utils createSearchAfterReturnType createSearchAfterReturnType can override select values
  • [job] [logs] Jest Tests #7 / utils createSearchAfterReturnType createSearchAfterReturnType will return full object when nothing is passed
  • [job] [logs] Jest Tests #7 / utils createSearchAfterReturnTypeFromResponse empty results will return successful type
  • [job] [logs] Jest Tests #7 / utils createSearchAfterReturnTypeFromResponse multiple results will return successful type with expected success
  • [job] [logs] Jest Tests #7 / utils mergeReturns it merges a default "prev" and "next" correctly
  • [job] [logs] Jest Tests #7 / utils mergeReturns it merges search where values from "next" and "prev" are computed together
  • [job] [logs] Jest Integration Tests #6 / when splitting .kibana into multiple indices and one clone fails after resolving the problem and retrying the migration completes successfully

Metrics [docs]

‼️ ERROR: metrics for b3ec948 were not reported

History

  • 💔 Build #158193 failed eb746a61c320713e86173d72b81924ac37db92cf
  • 💔 Build #158056 failed e917e7ada676cceea147d378d7ed13d2f2959175

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @dplumlee
