
[Security Solution] Adds event log telemetry specific for security solution rules #128216

Merged
FrankHassanabad merged 24 commits into elastic:main from add-event-log on Mar 29, 2022

Conversation

@FrankHassanabad (Contributor) commented Mar 21, 2022

Summary

Adds event log telemetry specific to security solution rules.

This adds a new section to the telemetry, under detectionMetrics, called detection_rule_status.

A full JSON sample document:

{
  "detection_rule_status": {
    "all_rules": {
      "eql": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "indicator": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "mlRule": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "query": {
        "failed": 4,
        "top_failed": {
          "1": {
            "message": "an hour were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name * id rule id execution id space ID default",
            "count": 4
          },
          "2": {
            "message": "an hour were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name Endpoint Security id rule id execution id space ID default",
            "count": 2
          },
          "3": {
            "message": "an hour were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name Telnet Port Activity id rule id execution id space ID default",
            "count": 2
          },
          "4": {
            "message": "hours were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name * id rule id execution id space ID default",
            "count": 2
          },
          "5": {
            "message": "hours were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name Endpoint Security id rule id execution id space ID default",
            "count": 1
          },
          "6": {
            "message": "hours were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name Telnet Port Activity id rule id execution id space ID default",
            "count": 1
          }
        },
        "partial_failure": 2,
        "top_partial_failure": {
          "1": {
            "message": "This rule is attempting to query data from Elasticsearch indices listed in the Index pattern section of the rule definition however no index matching blah frank was found This warning will continue to appear until matching index is created or this rule is disabled",
            "count": 189
          },
          "2": {
            "message": "This rule is attempting to query data from Elasticsearch indices listed in the Index pattern section of the rule definition however no index matching logs-endpoint.alerts was found This warning will continue to appear until matching index is created or this rule is disabled If you have recently enrolled agents enabled with Endpoint Security through Fleet this warning should stop once an alert is sent from an agent",
            "count": 187
          }
        },
        "succeeded": 2,
        "index_duration": {
          "max": 228568,
          "avg": 2292.8852459016393,
          "min": 0
        },
        "search_duration": {
          "max": 324,
          "avg": 21.661202185792348,
          "min": 1
        },
        "gap_duration": {
          "max": 5651,
          "avg": 3929.4166666666665,
          "min": 2811
        },
        "gap_count": 10
      },
      "savedQuery": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "threshold": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "total": {
        "failed": 4,
        "partial_failure": 2,
        "succeeded": 2
      }
    },
    "elastic_rules": {
      "eql": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "indicator": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "mlRule": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "query": {
        "failed": 2,
        "top_failed": {
          "1": {
            "message": "an hour were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name Endpoint Security id rule id execution id space ID default",
            "count": 2
          },
          "2": {
            "message": "an hour were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name Telnet Port Activity id rule id execution id space ID default",
            "count": 2
          },
          "3": {
            "message": "hours were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name Endpoint Security id rule id execution id space ID default",
            "count": 1
          },
          "4": {
            "message": "hours were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name Telnet Port Activity id rule id execution id space ID default",
            "count": 1
          }
        },
        "partial_failure": 1,
        "top_partial_failure": {
          "1": {
            "message": "This rule is attempting to query data from Elasticsearch indices listed in the Index pattern section of the rule definition however no index matching logs-endpoint.alerts was found This warning will continue to appear until matching index is created or this rule is disabled If you have recently enrolled agents enabled with Endpoint Security through Fleet this warning should stop once an alert is sent from an agent",
            "count": 187
          }
        },
        "succeeded": 1,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 278,
          "avg": 8.165745856353592,
          "min": 1
        },
        "gap_duration": {
          "max": 5474,
          "avg": 3831,
          "min": 2811
        },
        "gap_count": 6
      },
      "savedQuery": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "threshold": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "total": {
        "failed": 2,
        "partial_failure": 1,
        "succeeded": 1
      }
    },
    "custom_rules": {
      "eql": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "indicator": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "mlRule": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "query": {
        "failed": 2,
        "top_failed": {
          "1": {
            "message": "an hour were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name * id rule id execution id space ID default",
            "count": 4
          },
          "2": {
            "message": "hours were not queried between this rule execution and the last execution so signals may have been missed Consider increasing your look behind time or adding more Kibana instances name * id rule id execution id space ID default",
            "count": 2
          }
        },
        "partial_failure": 1,
        "top_partial_failure": {
          "1": {
            "message": "This rule is attempting to query data from Elasticsearch indices listed in the Index pattern section of the rule definition however no index matching blah frank was found This warning will continue to appear until matching index is created or this rule is disabled",
            "count": 189
          }
        },
        "succeeded": 1,
        "index_duration": {
          "max": 228568,
          "avg": 4536.1945945945945,
          "min": 0
        },
        "search_duration": {
          "max": 324,
          "avg": 34.86486486486486,
          "min": 8
        },
        "gap_duration": {
          "max": 5651,
          "avg": 4027.8333333333335,
          "min": 3051
        },
        "gap_count": 4
      },
      "savedQuery": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "threshold": {
        "failed": 0,
        "top_failed": {},
        "partial_failure": 0,
        "top_partial_failure": {},
        "succeeded": 0,
        "index_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "search_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_duration": {
          "max": 0,
          "avg": 0,
          "min": 0
        },
        "gap_count": 0
      },
      "total": {
        "failed": 2,
        "partial_failure": 1,
        "succeeded": 1
      }
    }
  }
}

Manual testing

Add some rules and force partial failures by giving them index patterns that do not exist, then force failures and gaps by shutting Kibana down for a while and starting it back up so that some rule executions are missed. Then go to Advanced Settings, scroll to the bottom, and click "cluster data". After scrolling down for a while you should see data like this:

(Screenshot: "Screen Shot 2022-03-21 at 4.18.58 PM" — cluster data showing the new detection_rule_status telemetry)

Checklist

Delete any items that are not applicable to this PR.

@FrankHassanabad FrankHassanabad self-assigned this Mar 22, 2022
@FrankHassanabad FrankHassanabad changed the title Add event log [Security Solution] Adds event log telemetry and metrics Mar 22, 2022
@FrankHassanabad FrankHassanabad added v8.2.0 backport:skip This commit does not require backporting release_note:enhancement labels Mar 22, 2022
@FrankHassanabad FrankHassanabad changed the title [Security Solution] Adds event log telemetry and metrics [Security Solution] Adds event log telemetry specific for security solution rules Mar 22, 2022
@FrankHassanabad FrankHassanabad added Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. telemetry Issues related to the addition of telemetry to a feature labels Mar 22, 2022
@FrankHassanabad FrankHassanabad marked this pull request as ready for review March 22, 2022 18:00
@FrankHassanabad FrankHassanabad requested review from a team as code owners March 22, 2022 18:00
@elasticmachine (Contributor) commented:

Pinging @elastic/security-solution (Team: SecuritySolution)

@pgayvallet (Contributor) left a comment:

That's a 6k lines addition to the x-pack/plugins/telemetry_collection_xpack/schema/xpack_plugins.json file.

cc @afharo and @Bamieh

{
  range: {
    '@timestamp': {
      gte: 'now-24h',
Contributor commented:
No action is needed here, but I'm starting to wonder if we need to start finding the previous usage collector run timestamp and use that. I have noticed unusual collection cadences within some of the security clusters.

Member replied:
@pjhampton after #121656 clusters should report only once every 24h. Just FYI :)

Member commented:
Had the same question above 😅 https://github.com/elastic/kibana/pull/128216/files#r837923596

If consistency can't be guaranteed (i.e., the collector doesn't run at the same time each day, so drift creeps in), it would be nice to have a service that pushes the right day-window datetimes down for these queries to consume, as sketched below.
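A minimal sketch of what such a helper could look like (nothing like this exists in the PR; getPreviousRunTimestamp is a hypothetical function that would read the last collector run time from wherever it is persisted):

// Hypothetical helper: compute the collection window from the previous run's
// timestamp instead of a hard-coded 'now-24h'.
interface CollectionWindow {
  gte: string; // start of the window (previous run, or 24h ago as a fallback)
  lte: string; // end of the window (this run)
}

const getCollectionWindow = async (
  // Assumed to read the last successful collector run time from wherever it is persisted.
  getPreviousRunTimestamp: () => Promise<string | undefined>
): Promise<CollectionWindow> => {
  const now = new Date();
  const previousRun = await getPreviousRunTimestamp();
  const fallback = new Date(now.getTime() - 24 * 60 * 60 * 1000).toISOString();
  return {
    gte: previousRun ?? fallback,
    lte: now.toISOString(),
  };
};

// The window could then replace the fixed range above:
// { range: { '@timestamp': { gte: window.gte, lte: window.lte } } }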

@pjhampton (Contributor) left a comment:

🌔 🚀 ✨ LGTM ✨ 🚀 🌔

Code looks solid as always, but this amount of new data in the usage collector makes me a little nervous for a new release. I'm starting to wonder if we can add this as a task instead. Regardless, as long as Kbn core is good with it, so am I!

@afharo (Member) left a comment:

It looks good to me. I don't think this increase will make us hit the current payload limit.

However, I'd like to challenge the decision in the schema. Is there a particular reason why we want to report it as 1, 2, 3, ... vs. an array?

IMO, using an array may be more useful because an error message may be top 1 for some clusters and top 10 for some others. However, in the big picture, we would be more interested in tackling the ones that occur in most clusters, wouldn't we?

It will also simplify the schema by removing a lot of fields 😇

What do you think?

Comment on lines 319 to 325
top_failed: {
  '1': {
    message: {
      type: 'keyword',
      _meta: { description: 'Top 1 failed rule message' },
    },
  },
Member commented:
Q: Would it make sense to declare these as arrays? This way we can query detection_rule_status.all_rules.top_failed: "error message" and it will show up regardless if it's the top 1 or top N.

Suggested change

From:

top_failed: {
  '1': {
    message: {
      type: 'keyword',
      _meta: { description: 'Top 1 failed rule message' },
    },
  },

To:

top_failed: {
  type: 'array',
  items: {
    message: {
      type: 'keyword',
      _meta: { description: 'One of top 10 failed rule messages' },
    },
  },

@FrankHassanabad (Contributor, Author) replied:

@afharo

I will update the schema, as I made a mistake; what I left out was the count. It really should be this:

"1": {
  "properties": {
    "message": {
      "type": "keyword",
      "_meta": {
        "description": "Top 1 partial failed rule message"
      }
    },
    "count": {
      "type": "long",
      "_meta": {
        "description": "Top 1 partial failed rule message"
      }
    }
  }
}

I haven't done a visualization with arrays that contain objects before. Is that well supported when the array contains objects? I know that nested fields are still lacking support, which is being tracked here:
#1084

Are nested types supported in this schema? I didn't see an example of it.

If I changed this to an array with objects inside it, I don't know how easy that would be to visualize compared to just numbering them and stopping at 10.

Member replied:
Are nested types supported in this schema? I didn't see an example of it.

The schema is not a direct ES mapping (even though it looks pretty similar 😅). You can specify the array of objects like this:

Suggested change

From:

top_failed: {
  '1': {
    message: {
      type: 'keyword',
      _meta: { description: 'Top 1 failed rule message' },
    },
  },

To:

top_failed: {
  type: 'array',
  items: {
    message: {
      type: 'keyword',
      _meta: { description: 'One of top 10 failed rule messages' },
    },
    count: {
      type: 'long',
      _meta: { description: 'Number of times the message occurred' }
    }
  },

Then on the receiving end, we can work out different ways of storing it: for instance, I'd say that for easier consumption, we could loop through the array and index one document per object in the array with some common fields for correlation.

This way you get a "simple" event-like index to query and aggregate without depending on tree structures or nested types.
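As a rough illustration of that idea (not code from this PR; the correlation field names are made up for the example), flattening on the receiving side could look something like this:

// Illustrative sketch: turn one telemetry payload's top_failed array into
// one event-like document per entry, carrying common fields for correlation.
interface TopFailedEntry {
  message: string;
  count: number;
}

interface FlattenedFailureDoc {
  cluster_uuid: string; // hypothetical correlation field
  rule_type: string; // hypothetical correlation field
  rank: number; // position within the reported top-N list
  message: string;
  count: number;
}

const flattenTopFailed = (
  clusterUuid: string,
  ruleType: string,
  topFailed: TopFailedEntry[]
): FlattenedFailureDoc[] =>
  topFailed.map((entry, index) => ({
    cluster_uuid: clusterUuid,
    rule_type: ruleType,
    rank: index + 1,
    message: entry.message,
    count: entry.count,
  }));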


As a separate comment, I was wondering whether it makes sense to send this type of data as events with our Event-Based Telemetry client coming up in 8.3 (meta issue here).

@FrankHassanabad @pjhampton @peluja1012 wdyt? Is it worth the wait for one extra minor?

@FrankHassanabad (Contributor, Author) replied:

Ok, I will change this to being an array with objects and update the code and tests with the array. Sounds good to me 👍 .

Contributor commented:

@afharo Thank you for reviewing this PR and for your suggestion of taking advantage of the new Telemetry client. While I think it might make sense to use this new client in the future, we have been waiting for the Telemetry data this PR provides for a couple of releases. It's important that we start collecting it as soon as possible so we can get ahead of customer issues related to rule health and performance. I suggest we go with the approach presented in this PR in 8.2 and then migrate over to the Event-Based system in a future release after it's ready for consumption.

Member replied:

@peluja1012 sounds like a good compromise to me. All clear! Thank you!


@afharo (Member) left a comment:

LGTM! Thank you @FrankHassanabad for the latest changes to simplify the schema

Comment on lines 312 to 318
detection_rule_status: {
  all_rules: {
    eql: {
      failed: {
        type: 'long',
        _meta: { description: 'The number of failed rules' },
      },
Member commented:
Real bummer this can't be broken up into smaller files and destructured all nicely so it's more easily consumed. Sounds like there's some telemetry tooling that is preventing this? We should open an issue to fix that tooling so these schemas can be better digested.

Do you have more details on this @afharo?

Member commented:
I'm also curious whether the existing tooling does anything to 'lock' the file. @FrankHassanabad mentioned in another comment below how important it is not to change telemetry keys, so I'm wondering if there is any tooling in place for this. Once broken out into separate files, we could always do a server-side hook to lock them in read-only mode, perhaps? Same goes for any downstream files generated from these schemas.

  type: 'long',
  _meta: { description: 'The number of partial failure rules' },
},
top_partial_failure: {
Member commented:
nit: maybe make this plural? Is this the first top partial failure, or a list (it seems to be an array) of top partials? If the latter, it would be more descriptive to make it plural for folks building dashboards.

@FrankHassanabad (Contributor, Author) replied:
Sounds good. Will make this plural. Also will change top_failed

So you will have the changes:

top_partial_failure -> top_partial_failures
top_failed -> top_failures
failed -> failures

@spong (Member) commented Mar 29, 2022

Wasn't able to verify this data on any of these three servers:

Using this query:

GET cluster_telemetry*/_search
{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "stack_stats-kibana-plugins.security_solution.detectionMetrics.detection_rule_status"
          }
        }
      ]
    }
  }
}

Would be nice if there was a 'flush' feature so I could run this locally and not have to wait 24hrs. Does this exist? Also not really clear where this data would go if I ran this locally anyway... 🤔 Trying to use the Rosetta Stone doc to understand this all, but very hard to grok with how complex telemetry is.

Comment on lines +783 to +786
index_duration: {
  max: {
    type: 'float',
    _meta: { description: 'The max duration' },
Member commented:
ux nit: Should these _meta:description's include the metric name in it too? Not sure how it shows up in the UI/dashboards, but this wouldn't be helpful if you're hovering on a pie chart or something and it says The max duration instead of something like The max indexing duration.

@FrankHassanabad (Contributor, Author) replied:
It doesn't show up. I think these are not used anywhere other than here in the code. As far as I know, this doesn't get converted into the EDN files or into mappings either.

Member replied:
Ahh, well if it doesn't make it to the cluster schema, I question its value here, no? 🤔

@FrankHassanabad (Contributor, Author) replied:
I would talk to someone from Kibana core about its usefulness and usage. I'm not sure where it's used or whether it's used within tooling somewhere else.

Comment on lines +8859 to +8862
"detection_rule_status": {
"properties": {
"all_rules": {
"properties": {
@spong (Member) commented Mar 29, 2022:

Oh my gosh, another one of these!? 😅 I hope this one is generated at least...?

@FrankHassanabad (Contributor, Author) replied:

It is generated with:

 node scripts/telemetry_check.js --fix

@spong (Member) replied Mar 29, 2022:

Ahh, I see there's a README.md in the root folder (telemetry_collection_xpack/schema/README.md). That appears to be out of date though (it says 2 files when there are 3?), and it doesn't include the script you just mentioned to generate it... it just says:

The X-Pack related schema for the content that will be nested in stack_stats.kibana.plugins.
It is automatically generated by @kbn/telemetry-tools based on the schema property provided by all the registered Usage Collectors via the usageCollection.makeUsageCollector API.

Which I guess does link off to another README from there (src/plugins/usage_collection/README.mdx) and it appears to be mentioned in that one, so all good. Just coming in at a different angle so a little confusing here, all good now! 👍

@FrankHassanabad (Contributor, Author) replied:

Those are Kibana core's files.

Comment on lines +265 to +270
maxTotalSearchDuration: {
  value: null,
},
gapCount: {
  value: 0,
},
Member commented:
Note: the difference here is that the cardinality agg will return 0 when nothing matches, but the max/min/avg aggs will return null, since providing `missing` to them would mess with the actual value.
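For illustration, the contrast described above looks roughly like this in an aggregation request; this is a sketch only, and the field names are assumptions rather than the PR's exact fields:

// Sketch: cardinality naturally reports 0 when nothing matches, while max/min/avg
// report null in that case. Passing `missing` to max/min/avg would pull the real
// statistic toward 0, so it is deliberately left out.
const aggs = {
  gapCount: {
    cardinality: {
      field: 'kibana.alert.rule.execution.metrics.gap_duration_s', // assumed field name
    },
  },
  maxTotalSearchDuration: {
    max: {
      field: 'kibana.alert.rule.execution.metrics.total_search_duration_ms', // assumed field name
      // no `missing` here: with no matching docs this comes back as { value: null }
    },
  },
};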

@kibana-ci (Collaborator) commented:

💚 Build Succeeded

Metrics [docs]

Unknown metric groups

ESLint disabled in files

id before after diff
apm 15 14 -1
osquery 5 4 -1
securitySolution 69 68 -1
uptime 7 6 -1
total -4

ESLint disabled line counts

id before after diff
apm 88 85 -3
enterpriseSearch 9 7 -2
fleet 47 46 -1
osquery 122 119 -3
uptime 49 43 -6
total -15

References to deprecated APIs

id before after diff
canvas 70 64 -6
dashboard 78 72 -6
data 475 465 -10
dataEnhanced 55 49 -6
discover 26 20 -6
fleet 20 19 -1
lens 18 14 -4
management 2 1 -1
maps 456 330 -126
monitoring 40 28 -12
upgradeAssistant 12 7 -5
visDefaultEditor 205 155 -50
visTypeVega 4 3 -1
visualizations 17 13 -4
total -238

Total ESLint disabled count

id before after diff
apm 103 99 -4
enterpriseSearch 9 7 -2
fleet 55 54 -1
osquery 127 123 -4
securitySolution 509 508 -1
uptime 56 49 -7
total -19

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @FrankHassanabad

},
},
},
elastic_rules: {
Member commented:
Another note, there is no 'space-awareness' in this telemetry, so if a user installs the prebuilt rules in 10 spaces, this telemetry will say there's some 6k elastic rules installed (6xx prebuilt rules shipped x 10).

Member commented:
Another note to future self: rules are queried directly to determine elastic vs custom rules, and so are capped to 10k across all spaces:

const query: SavedObjectsCreatePointInTimeFinderOptions = {
  type: 'alert',
  perPage: maxPerPage,
  namespaces: ['*'],
  filter,
};

So if a user has >10k rules across spaces, things could start getting wonky -- we should test this, verify functionality, and consider an implementation that takes space-awareness into account. It would also be good to test these queries where the terms list is going to be a full 10k and see what that does to perf... 😮
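For reference, a sketch of how the point-in-time finder can be iterated page by page so collection is not limited to a single capped response; the types here are simplified stand-ins, not this PR's actual code:

// Sketch only: iterate the saved-objects point-in-time finder page by page.
// The interfaces below are simplified stand-ins for Kibana's real contracts.
interface PointInTimeFinderLike {
  find(): AsyncIterableIterator<{ saved_objects: Array<{ id: string }> }>;
  close(): Promise<void>;
}

interface SavedObjectsClientLike {
  createPointInTimeFinder(options: {
    type: string;
    perPage: number;
    namespaces: string[];
    filter?: string;
  }): PointInTimeFinderLike;
}

const collectAllRuleIds = async (
  savedObjectsClient: SavedObjectsClientLike,
  filter: string,
  maxPerPage: number
): Promise<string[]> => {
  const finder = savedObjectsClient.createPointInTimeFinder({
    type: 'alert',
    perPage: maxPerPage,
    namespaces: ['*'],
    filter,
  });

  const ruleIds: string[] = [];
  for await (const page of finder.find()) {
    for (const savedObject of page.saved_objects) {
      ruleIds.push(savedObject.id);
    }
  }
  await finder.close();
  return ruleIds;
};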

@@ -108,6 +122,7 @@ export const getRuleMetrics = async ({
     return {
       detection_rule_detail: elasticRuleObjects,
       detection_rule_usage: rulesUsage,
+      detection_rule_status: eventLogMetricsTypeStatus,
     };
   } catch (e) {
     // ignore failure, usage will be zeroed. We use debug mode to not unnecessarily worry users as this will not affect them.
Member commented:
How do you know that a failure occurred in the actual telemetry? On failures we're returning empty values?

* transformed. If it malfunctions or times out then it will not malfunction other
* parts of telemetry.
* NOTE: This takes in "ruleResults" to filter against elastic rules and custom rules.
* If the event log recorded information about which rules were elastic vs. custom this
Member commented:
Yeah, would be nice if we logged more metadata about the rule itself so there's not so much overhead in processing like this.

@spong (Member) left a comment:

A bit to grok at first (especially being new to telemetry), but I zoomed with @FrankHassanabad and we pair-reviewed the whooole thing. Thanks for implementing some of the nits from my feedback! And note, I was not able to verify that any of this telemetry made it to any of the staging servers, but I'm sure we'll be performing some additional validation testing once we get the first builds with this code in, and we can always tweak before the final BC. @FrankHassanabad did verify independently during development that the data made it, so we should be all good here.

All that said, thanks for taking the time to review this all with me @FrankHassanabad, and LGTM from the Security Solution side! Appreciate all the tests, and how modular and re-usable you structured this code (++ all the tests too!). 👍 🙌 🚀

@FrankHassanabad FrankHassanabad merged commit de894d1 into elastic:main Mar 29, 2022
@FrankHassanabad FrankHassanabad deleted the add-event-log branch March 29, 2022 22:32
Labels
backport:skip This commit does not require backporting release_note:enhancement Team: SecuritySolution Security Solutions Team working on SIEM, Endpoint, Timeline, Resolver, etc. telemetry Issues related to the addition of telemetry to a feature v8.2.0

8 participants