Strip field from events for event search #149113

nkhristinin · 2023-01-18T12:30:05Z

Strip source, and keep only fields used for mappings for events search of IM rule.

Tests on cloud env:

Threat Indicators - 1.500.000 documents
Source - 1.000.000 documents.

No changes:

Strip fields:

…-ref HEAD~1..HEAD --fix'

nkhristinin · 2023-01-19T15:37:48Z

@elasticmachine merge upstream

dhurley14

Looks great!!! One minor typo fix in the test and another suggestion to add a comment explaining what / why we added the new property for the search after query. Thanks for taking the time to investigate the potential (and now proven!) benefits of ignoring the source of the body from the query response. LGTM!

dhurley14 · 2023-01-19T16:15:04Z

x-pack/plugins/security_solution/server/lib/detection_engine/signals/build_events_query.test.ts

@@ -555,6 +555,61 @@ describe('create_signals', () => {
    });
  });

+  test('it respects overriderBody params', () => {


Suggested change

test('it respects overriderBody params', () => {

test('it respects overrideBody params', () => {

small typo

dhurley14 · 2023-01-19T16:22:41Z

x-pack/plugins/security_solution/server/lib/detection_engine/signals/build_events_query.ts

@@ -27,6 +28,7 @@ interface BuildEventsSearchQuery {
  secondaryTimestamp: TimestampOverride | undefined;
  trackTotalHits?: boolean;
  additionalFilters?: estypes.QueryDslQueryContainer[];
+  overrideBody?: OverrideBodyQuery;


@nkhristinin could we add some js docs comments explaining what this property is used for? Something like this suggestion?

Suggested change

overrideBody?: OverrideBodyQuery;

/**

* overrideBody allows the search after to ignore the _source property of the result, thus reducing the size of the response and increasing the performance of the query.

*/

overrideBody?: OverrideBodyQuery;

Hey, thanks! Added comment to a single search function.

e40pud

LGTM

nkhristinin · 2023-01-23T13:13:16Z

@elasticmachine merge upstream

vitaliidm · 2023-01-24T12:32:26Z

x-pack/plugins/security_solution/server/lib/detection_engine/signals/types.ts

+
+export interface OverrideBodyQuery {
+  _source: estypes.SearchSourceConfig;
+  fields: estypes.Fields;


search query body has some other fields as well, as: sort, query etc.
Defining only 2 of these might be unnecessary restrictive, given its name suggests the whole body override.

On second note - would it make sense to define them as non required fields in this interface? It would make this parameter more flexible, if you would want to override _source field only, for example

Good suggestion!

I changed those fields to optional.

I was trying to change OverrideBodyQuery to estypes.SearchRequest['body'], but it wasn't working, because looks like
buildEventsSearchQuery return compatible but not the exact type.

Later in a single search after we cast this query to estypes.SearchRequest

So if somebody need to add new fields, it shouldn't be a problem to add it into OverrideBodyQuery

thanks for looking into it.
I think it's good to go.
With added comment it should not be an issue to add new fields if needed

nkhristinin · 2023-01-25T10:16:37Z

@elasticmachine merge upstream

…-ref HEAD~1..HEAD --fix'

nkhristinin · 2023-01-26T08:13:28Z

@elasticmachine merge upstream

nkhristinin · 2023-01-26T13:37:55Z

@elasticmachine merge upstream

kibana-ci · 2023-01-26T14:04:28Z

💛 Build succeeded, but was flaky

Failed CI Steps

Security Solution Tests #3

Test Failures

[job] [logs] Security Solution Tests #3 / Timelines Creates a timeline by clicking untitled timeline from bottom bar can be added notes

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @nkhristinin

## Terms query for Indicator Match rule TODO: [] need more unit/integrations tests, but ready for review The indicator match rule will use terms query when it is possible to search for matches for threat-first-search and for events-first-search. ## How the match query worked: Example for threat-first-search. If we have matching conditions like: `host.ip ==== indicator.host.ip` or (`source.name === indicator.source.name` AND `host.name === indicator.host.name`) It will generate queries like: ``` match: {host.ip: "1"}, or match: {host.ip: "2"} or match: {host.ip: "3"} or (match: {source.name: "1"} and match: {host.name: "1"}) or (match: {source.name: "2"} and match: {host.name: "2"}) or (match: {source.name: "3"} and match: {host.name: "3"}) ``` Each match will also have `_name` fields like: `${threatId}_${threatIndex}_${threatFields}_${sourceField}` So and because it's 1:1 relation between match and response, later at enrichment stage will be clear which threat matches which event. ## Terms query. We do fetch info about mapping for fields which use for match conditions of the IM rule. Terms query doesn't support all field types, this is why there is some allowed list which field types. Terms query not applied for AND conditions. For example: Fields types host.ip - `ip` user.name - `keyword` user.description - `text` indicator.host.ip_range - `ip_range` `host.ip === indicator.host.ip` or `host.ip_range === indicator.host.ip` or (`source.name === indicator.source.name` AND `host.name === indicator.host.name`) It will generate queries like: ``` terms: {host.ip: ["1","2","3"]}, or match: {host.ip_range: "1"} // terms query support range fields, but it will be difficult later to understand which threat match which event, because we can have more than 1 response for this condition or match: {host.ip_range: "2"} or (match: {source.name: "1"} and match: {host.name: "1"}) or (match: {source.name: "2"} and match: {host.name: "2"}) or (match: {source.name: "3"} and match: {host.name: "3"}) ``` For terms query, we don't know which response matches with events, this is why we do match it back in the code. ## Other changes Threat-first-search - will do one extra request to have all matched threats. For example: The threat index has 1.000.000 documents. IM rule gets the first batch of 9.000 threats and builds a query to the events index. It returns 100 events (max_signal = 100). Then it tries to enrich those 100 events with threat info. The problem is that the original implementation will enrich with the only threats from this 9.000 batch. And it will ignore other matches in 1.000.000 threats. This way we do one extra request in the end from potential alerts to threat index. # Tests performance In the best case, it can improve performance by around 3x times. [Base](#149113) Threat Indicators - 1.500.000 documents Source - 1.000.000 documents. 1 field for match condition <img width="557" alt="213484531-3ab68c61-c3f5-4e28-b2c4-c1e90a5b1775" src="https://user-images.githubusercontent.com/7609147/215526984-ff027ba1-2f64-49fe-8fe8-a23ff4eda4dc.png"> This PR: <img width="537" alt="Screenshot 2023-01-30 at 20 20 32" src="https://user-images.githubusercontent.com/7609147/215575128-730514ac-a186-4ab8-87fd-af2ea8f79cec.png"> --------- Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

## Terms query for Indicator Match rule TODO: [] need more unit/integrations tests, but ready for review The indicator match rule will use terms query when it is possible to search for matches for threat-first-search and for events-first-search. ## How the match query worked: Example for threat-first-search. If we have matching conditions like: `host.ip ==== indicator.host.ip` or (`source.name === indicator.source.name` AND `host.name === indicator.host.name`) It will generate queries like: ``` match: {host.ip: "1"}, or match: {host.ip: "2"} or match: {host.ip: "3"} or (match: {source.name: "1"} and match: {host.name: "1"}) or (match: {source.name: "2"} and match: {host.name: "2"}) or (match: {source.name: "3"} and match: {host.name: "3"}) ``` Each match will also have `_name` fields like: `${threatId}_${threatIndex}_${threatFields}_${sourceField}` So and because it's 1:1 relation between match and response, later at enrichment stage will be clear which threat matches which event. ## Terms query. We do fetch info about mapping for fields which use for match conditions of the IM rule. Terms query doesn't support all field types, this is why there is some allowed list which field types. Terms query not applied for AND conditions. For example: Fields types host.ip - `ip` user.name - `keyword` user.description - `text` indicator.host.ip_range - `ip_range` `host.ip === indicator.host.ip` or `host.ip_range === indicator.host.ip` or (`source.name === indicator.source.name` AND `host.name === indicator.host.name`) It will generate queries like: ``` terms: {host.ip: ["1","2","3"]}, or match: {host.ip_range: "1"} // terms query support range fields, but it will be difficult later to understand which threat match which event, because we can have more than 1 response for this condition or match: {host.ip_range: "2"} or (match: {source.name: "1"} and match: {host.name: "1"}) or (match: {source.name: "2"} and match: {host.name: "2"}) or (match: {source.name: "3"} and match: {host.name: "3"}) ``` For terms query, we don't know which response matches with events, this is why we do match it back in the code. ## Other changes Threat-first-search - will do one extra request to have all matched threats. For example: The threat index has 1.000.000 documents. IM rule gets the first batch of 9.000 threats and builds a query to the events index. It returns 100 events (max_signal = 100). Then it tries to enrich those 100 events with threat info. The problem is that the original implementation will enrich with the only threats from this 9.000 batch. And it will ignore other matches in 1.000.000 threats. This way we do one extra request in the end from potential alerts to threat index. # Tests performance In the best case, it can improve performance by around 3x times. [Base](elastic#149113) Threat Indicators - 1.500.000 documents Source - 1.000.000 documents. 1 field for match condition <img width="557" alt="213484531-3ab68c61-c3f5-4e28-b2c4-c1e90a5b1775" src="https://user-images.githubusercontent.com/7609147/215526984-ff027ba1-2f64-49fe-8fe8-a23ff4eda4dc.png"> This PR: <img width="537" alt="Screenshot 2023-01-30 at 20 20 32" src="https://user-images.githubusercontent.com/7609147/215575128-730514ac-a186-4ab8-87fd-af2ea8f79cec.png"> --------- Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Strip source, and keep only fields used for mappings for events search of IM rule. Tests on cloud env: <img width="1182" alt="Screenshot 2023-01-19 at 16 42 19" src="https://user-images.githubusercontent.com/7609147/213486861-1a0e5a20-b575-4ed6-ac9b-cf9fb8a8c14f.png"> Threat Indicators - 1.500.000 documents Source - 1.000.000 documents. No changes: <img width="564" alt="Screenshot 2023-01-19 at 16 30 19" src="https://user-images.githubusercontent.com/7609147/213484465-ad20d5b2-c7a7-42d3-9d92-478c5d046636.png"> Strip fields: <img width="557" alt="Screenshot 2023-01-19 at 16 30 27" src="https://user-images.githubusercontent.com/7609147/213484531-3ab68c61-c3f5-4e28-b2c4-c1e90a5b1775.png"> Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

## Terms query for Indicator Match rule TODO: [] need more unit/integrations tests, but ready for review The indicator match rule will use terms query when it is possible to search for matches for threat-first-search and for events-first-search. ## How the match query worked: Example for threat-first-search. If we have matching conditions like: `host.ip ==== indicator.host.ip` or (`source.name === indicator.source.name` AND `host.name === indicator.host.name`) It will generate queries like: ``` match: {host.ip: "1"}, or match: {host.ip: "2"} or match: {host.ip: "3"} or (match: {source.name: "1"} and match: {host.name: "1"}) or (match: {source.name: "2"} and match: {host.name: "2"}) or (match: {source.name: "3"} and match: {host.name: "3"}) ``` Each match will also have `_name` fields like: `${threatId}_${threatIndex}_${threatFields}_${sourceField}` So and because it's 1:1 relation between match and response, later at enrichment stage will be clear which threat matches which event. ## Terms query. We do fetch info about mapping for fields which use for match conditions of the IM rule. Terms query doesn't support all field types, this is why there is some allowed list which field types. Terms query not applied for AND conditions. For example: Fields types host.ip - `ip` user.name - `keyword` user.description - `text` indicator.host.ip_range - `ip_range` `host.ip === indicator.host.ip` or `host.ip_range === indicator.host.ip` or (`source.name === indicator.source.name` AND `host.name === indicator.host.name`) It will generate queries like: ``` terms: {host.ip: ["1","2","3"]}, or match: {host.ip_range: "1"} // terms query support range fields, but it will be difficult later to understand which threat match which event, because we can have more than 1 response for this condition or match: {host.ip_range: "2"} or (match: {source.name: "1"} and match: {host.name: "1"}) or (match: {source.name: "2"} and match: {host.name: "2"}) or (match: {source.name: "3"} and match: {host.name: "3"}) ``` For terms query, we don't know which response matches with events, this is why we do match it back in the code. ## Other changes Threat-first-search - will do one extra request to have all matched threats. For example: The threat index has 1.000.000 documents. IM rule gets the first batch of 9.000 threats and builds a query to the events index. It returns 100 events (max_signal = 100). Then it tries to enrich those 100 events with threat info. The problem is that the original implementation will enrich with the only threats from this 9.000 batch. And it will ignore other matches in 1.000.000 threats. This way we do one extra request in the end from potential alerts to threat index. # Tests performance In the best case, it can improve performance by around 3x times. [Base](elastic#149113) Threat Indicators - 1.500.000 documents Source - 1.000.000 documents. 1 field for match condition <img width="557" alt="213484531-3ab68c61-c3f5-4e28-b2c4-c1e90a5b1775" src="https://user-images.githubusercontent.com/7609147/215526984-ff027ba1-2f64-49fe-8fe8-a23ff4eda4dc.png"> This PR: <img width="537" alt="Screenshot 2023-01-30 at 20 20 32" src="https://user-images.githubusercontent.com/7609147/215575128-730514ac-a186-4ab8-87fd-af2ea8f79cec.png"> --------- Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Strip field for event search

f8f7ccd

nkhristinin added Team:Security Solution Platform Security Solution Platform Team ci:cloud-deploy Create or update a Cloud deployment release_note:fix labels Jan 18, 2023

nkhristinin self-assigned this Jan 18, 2023

[CI] Auto-commit changed files from 'node scripts/precommit_hook.js -…

01732ca

…-ref HEAD~1..HEAD --fix'

nkhristinin added release_note:enhancement and removed release_note:fix labels Jan 18, 2023

nkhristinin added 2 commits January 19, 2023 11:00

add tests

a2714aa

Fix tests

dc5ae64

Merge branch 'main' into im-events-fields

6f9ccc8

nkhristinin marked this pull request as ready for review January 19, 2023 15:38

nkhristinin requested a review from a team as a code owner January 19, 2023 15:38

dhurley14 approved these changes Jan 19, 2023

View reviewed changes

marshallmain requested a review from vitaliidm January 19, 2023 18:20

e40pud approved these changes Jan 23, 2023

View reviewed changes

Merge branch 'main' into im-events-fields

f1465ab

vitaliidm reviewed Jan 24, 2023

View reviewed changes

kibanamachine and others added 3 commits January 25, 2023 05:16

Merge branch 'main' into im-events-fields

71b840d

Fix tests, comments and types

27bdd4e

[CI] Auto-commit changed files from 'node scripts/precommit_hook.js -…

f476fd4

…-ref HEAD~1..HEAD --fix'

nkhristinin requested a review from vitaliidm January 25, 2023 16:44

Merge branch 'main' into im-events-fields

07e933a

vitaliidm approved these changes Jan 26, 2023

View reviewed changes

Merge branch 'main' into im-events-fields

3102896

nkhristinin merged commit 9e96440 into elastic:main Jan 27, 2023

kibanamachine added v8.7.0 backport:skip This commit does not require backporting labels Jan 27, 2023

nkhristinin mentioned this pull request Jan 30, 2023

Terms query for Indicator Match rule #144511

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Strip field from events for event search #149113

Strip field from events for event search #149113

nkhristinin commented Jan 18, 2023 •

edited

nkhristinin commented Jan 19, 2023

dhurley14 left a comment

dhurley14 Jan 19, 2023

dhurley14 Jan 19, 2023

nkhristinin Jan 25, 2023

e40pud left a comment

nkhristinin commented Jan 23, 2023

vitaliidm Jan 24, 2023

nkhristinin Jan 25, 2023

vitaliidm Jan 26, 2023

nkhristinin commented Jan 25, 2023

nkhristinin commented Jan 26, 2023

nkhristinin commented Jan 26, 2023

kibana-ci commented Jan 26, 2023 •

edited

	test('it respects overriderBody params', () => {
	test('it respects overrideBody params', () => {

Strip field from events for event search #149113

Strip field from events for event search #149113

Conversation

nkhristinin commented Jan 18, 2023 • edited

nkhristinin commented Jan 19, 2023

dhurley14 left a comment

Choose a reason for hiding this comment

dhurley14 Jan 19, 2023

Choose a reason for hiding this comment

dhurley14 Jan 19, 2023

Choose a reason for hiding this comment

nkhristinin Jan 25, 2023

Choose a reason for hiding this comment

e40pud left a comment

Choose a reason for hiding this comment

nkhristinin commented Jan 23, 2023

vitaliidm Jan 24, 2023

Choose a reason for hiding this comment

nkhristinin Jan 25, 2023

Choose a reason for hiding this comment

vitaliidm Jan 26, 2023

Choose a reason for hiding this comment

nkhristinin commented Jan 25, 2023

nkhristinin commented Jan 26, 2023

nkhristinin commented Jan 26, 2023

kibana-ci commented Jan 26, 2023 • edited

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

Metrics [docs]

History

nkhristinin commented Jan 18, 2023 •

edited

kibana-ci commented Jan 26, 2023 •

edited