
Make alert params searchable #50213

Open
mikecote opened this issue Nov 11, 2019 · 54 comments
Labels
discuss enhancement New value added to drive a business result Feature:Alerting impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. R&D Research and development ticket (not meant to produce code, but to make a decision) research Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@mikecote
Contributor

mikecote commented Nov 11, 2019

From #50222:

Use Case:
As a rule user, I need to sort, filter, and sometimes aggregate on rule types. For example, I need to sort my rule types on severity, or I need to filter them by severity.

Technical solution:
The current alerting/actions implementation does not allow mapping down to the level of alerting/actions parameters. Therefore we cannot use the saved objects API of KQL mixed with "order by". If that were changed and we were allowed mapping abilities for the alerting/actions parameters, that would solve this. Either that or a plain API (even if slower, like a table scan) to abstract us away so we can [query] the actions/alerting objects natively would make it so we don't have [to] write our own hand-rolled solutions.

The main focus of this issue is making alert params (more than action config) searchable, sortable and filterable. If there's extra work necessary to support this in actions, we can create a follow-up issue.

@elasticmachine
Contributor

Pinging @elastic/kibana-stack-services (Team:Stack Services)

@mikecote mikecote moved this from Backlog to Long Term in Make it Action Dec 6, 2019
@bmcconaghy bmcconaghy added Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) and removed Team:Stack Services labels Dec 12, 2019
@mikecote mikecote moved this from Long Term to To-Do 7.8 (Ordered by priority) in Make it Action Mar 25, 2020
@gmmorris gmmorris moved this from To-Do 7.8 (Ordered by priority) to In Progress in Make it Action May 6, 2020
@gmmorris gmmorris self-assigned this May 6, 2020
@mikecote
Contributor Author

mikecote commented May 6, 2020

Relates to "Make rules sortable, filterable, and aggregatable" in #50222.

@m-adams

m-adams commented May 7, 2020

This would be good. I've been discussing with users how to report on their rules (e.g. in a dashboard, Canvas, etc.), which would need to aggregate based on alert parameters. You can just about do something with scripted fields at the moment, but it's not ideal.

@pmuellr
Member

pmuellr commented May 8, 2020

Also see this issue: Request for alerting internal tags structure. The gist is to add a new "tags" property to alerts that would be separate from the existing tags property, which is used by customers. Let's call this new tags property internalTags. It's meant to be used by alert implementors to add their own tags, for whatever purpose they want, without conflicting with the tags customers use or having customers see them in the UI.

Would having this be good enough to solve the requirements? I think it depends on how/what you want to search on.

I don't really want to go down the path of adding mappings for alertType-specific data; it seems like the migration problem would be ... messy. And I'm not sure what the other options are.

@FrankHassanabad
Contributor

For some other efforts outside of alerting, where we use both external and internal tags within a saved object structure, we decided on using a leading underscore to designate that the internal tags should remain internal:

tags
_tags

fwiw, that Saved Object has nothing to do with alerting, but I figured I should mention it.

@gmmorris
Contributor

gmmorris commented May 11, 2020

I've only just begun looking into this, but I want to summarise what we currently know, as it doesn't look like this will be easy to support and I want to make sure we have a good understanding of the context.

Current State

Before we talk about the need, let's just describe what we currently have.
Backing each alert, we create a Saved Object with the following shape:

interface RawAlert extends SavedObjectAttributes {
  enabled: boolean;
  name: string;
  tags: string[];
  alertTypeId: string;
  consumer: string;
  schedule: SavedObjectAttributes;
  actions: RawAlertAction[];
  params: SavedObjectAttributes;
  scheduledTaskId?: string;
  createdBy: string | null;
  updatedBy: string | null;
  createdAt: string;
  apiKey: string | null;
  apiKeyOwner: string | null;
  throttle: string | null;
  muteAll: boolean;
  mutedInstanceIds: string[];
}

The fields we want to make searchable as part of this issue are params and actions, both of which implement the SavedObjectAttributes type, meaning they are string-based key-value records that can be deeply nested.

On the face of it, querying by these internal objects is straightforward, but as this Saved Object needs to support all types of alerts and actions, we have a challenge: each alert type can have a different shape for these fields. To support these multiple shapes we tell Elasticsearch not to create a mapping for these objects, which means that querying by their shape isn't actually possible.
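For illustration, the disabled mapping described above might look roughly like this (a sketch, not the exact Kibana mappings file):

```json
{
  "alert": {
    "properties": {
      "enabled": { "type": "boolean" },
      "name": { "type": "text" },
      "alertTypeId": { "type": "keyword" },
      "params": { "enabled": false },
      "actions": {
        "properties": {
          "actionTypeId": { "type": "keyword" },
          "params": { "enabled": false }
        }
      }
    }
  }
}
```

With `enabled: false`, Elasticsearch stores the objects in `_source` but builds no index structures for them, which is why querying, sorting, and aggregating on their contents is impossible.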

The Need

That said, we'd like to be able to query for specific Alerts based on values stored in these fields, so that we can:

  1. Sort by them
  2. Filter by them
  3. Aggregate over alerts using these fields in some manner

Possible Solutions

So, the reason we can't currently support querying against these fields is clear, but there are a few approaches we could take to meet these requirements. None of them is straightforward, so some discussion is needed to understand the cost-value ratio.

One assumption that I'm making for all of these is that we want to rely on ES for this and not do any of these operations in memory as that would make it inefficient and hard to support pagination.

Create a Saved Object type for each AlertType

This approach would mean that whenever a new AlertType is created we generate a brand new type of SavedObject with its own mapping.

There are a few challenges with this approach:

  1. We would have to handle an unknown number of SavedObject types, all of which need to be administered via the Alerting Framework.
  2. We will likely have to make changes to the SavedObjects Client APIs, as they currently assume that you're only ever operating at the level of a single SO type, not multiple types.

There's also a clear limitation to this approach:
This will still not allow us to query based on the actions field, as these would still have different shapes across the different SavedObject types and would have to remain with a disabled mapping.

@mikecote has already told me that there is a danger here of a mapping explosion which I need to investigate further.

Enable dynamic mapping + create a deep object

Instead of splitting the SavedObject types between the AlertTypes, we can take an approach similar to what SavedObjects itself does.
While we would have one single SavedObject type of alert (as we do now) we can store each alert type's params and each action type's config under a corresponding key based on their unique ID and keep the mapping set to dynamic so that we can query against the internal shapes.

For example, this would mean that given an alert of type "example.always-firing" with an action of type ".index" you would store the data like so:

{
  "alert": {
    "consumer": "alerting",
    "alertTypeId": "example.always-firing",
    "params": {
      "example.always-firing": {
        "numOfInstances": 5
      }
    },
    "actions": [
      {
        "actionTypeId": ".index",
        "params": {
          ".index": {
            "documents": [
              {
                "val": "{{alertInstanceId}}"
              }
            ]
          }
        },
        "actionRef": "action_0",
        "group": "default"
      }
    ]
    // ... other fields
  },
  "type": "alert",
  "references": [
    {
      "name": "action_0",
      "id": "9aab14cd-87e1-43ca-98f3-1caa9723ce98",
      "type": "action"
    }
  ],
  "updated_at": "2020-05-11T14:48:15.017Z"
}

Instead of what we currently do which is this:

{
  "alert": {
    "consumer": "alerting",
    "alertTypeId": "example.always-firing",
    "params": {
      "numOfInstances": 5
    },
    "actions": [
      {
        "actionTypeId": ".index",
        "params": {
          "documents": [
            {
              "val": "{{alertInstanceId}}"
            }
          ]
        }
        // ... other fields
      }
    ]
    // ... other fields
  },
  "type": "alert",
  "references": [
    {
      "name": "action_0",
      "id": "9aab14cd-87e1-43ca-98f3-1caa9723ce98",
      "type": "action"
    }
  ],
  "updated_at": "2020-05-11T14:48:15.017Z"
}

You may note the addition of the "example.always-firing" and ".index" keys under the params fields, which would allow us to enable Dynamic Mapping on these fields as their shape will no longer clash across types.

But this too has challenges:

  1. Dynamic mapping uses the first type it encounters, so supporting fields that might change type between instances will be very tricky, and might require AlertTypes to preemptively provide us with mappings of their own at setup time, which we would then need to manually merge.
  2. It's not yet clear how migrations might work in such a model and needs further investigation.
  3. Each AlertType will likely have to provide us with some mapping constraints of their own, as they might have fields in their params that they do not want mappings created for because they change often (such as the documents field in the .index action, which will be different on every single instance!)
  4. Here too there's a danger of a mappings explosion that needs to be investigated.

Static mapping + flattened object

Another option is to standardise the shape of params across all AlertTypes such that each AlertType will specify the exact shape and types of their params and we'll merge these shapes together into one static shape which will be used to define the mappings of these params.

This is the simplest solution in terms of the mapping, but introduces a whole set of challenges in the framework:

  1. What do we do when two AlertTypes use the same name for a field?
  2. How do we handle migrations within a specific AlertType? And what about across all types?
  3. What happens if the shape is wrong? Do we validate ourselves? Rely on plugins?
  4. If this means dynamically merging mappings and types on the "way into the framework" and then exposing it as "portions" of this type on the "way out of the framework" back to the solution - are we introducing a lot of complexity that will be hard to maintain in the future?

Challenges across the board

All of the above options will require changes in the SavedObjectsClient, as you can currently only sort/filter by a root field; supporting "deep" fields inside of "objects" isn't currently supported, and doing so will require some further research.

Next steps

As you can see, none of these options is straightforward and there isn't a clear winner.

This all requires a lot more investigation. Having played around with the code locally, I think the second option (Enable dynamic mapping + create a deep object) is the most maintainable, but it has potential issues that still need investigation, which is what I'll likely be looking into next.

If anyone has thoughts or concerns on these options (or perhaps a 4th option we can investigate) I'm all ears. :)

@pmuellr
Member

pmuellr commented May 13, 2020

I'm worried about any solution that introduces new mappings for alertType-specific data, due to all the challenges pointed out ^^^.

It's not completely clear that we need an ES solution here; from the SIEM issue #50222:

Either that or a plain API (even if slower, like a table scan) to abstract us away so we can [query] the actions/alerting objects natively would make it so we don't have [to] write our own hand-rolled solutions.

I read "table scan" as "do as much of a query as you can with ES, then do the remaining filtering/mapping/aggs on the results in JS". That will work if the number of alerts is "reasonable", which I don't know if it is.
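That hybrid reading can be sketched in code. The sketch below is illustrative only (the `MiniFindClient` interface is a hypothetical, simplified stand-in for the real SavedObjectsClient): Elasticsearch does the coarse work it can with the existing mappings, and the remaining param filtering happens in JS.

```typescript
// A sketch of the "table scan" fallback: let ES narrow the set where mappings
// exist (type filter, paging), then finish filtering on unmapped params in JS.
interface AlertSO {
  attributes: { alertTypeId: string; params: Record<string, unknown> };
}
interface MiniFindClient {
  find(opts: { type: string; page: number; perPage: number }): Promise<{
    saved_objects: AlertSO[];
    total: number;
  }>;
}

async function filterAlertsByParams(
  client: MiniFindClient,
  predicate: (params: Record<string, unknown>) => boolean
): Promise<AlertSO[]> {
  const matches: AlertSO[] = [];
  const perPage = 100;
  let page = 1;
  for (;;) {
    // ES does what it can: restrict to the alert type and page the results.
    const res = await client.find({ type: 'alert', page, perPage });
    // JS does the fine-grained param filtering ES can't, given the disabled mapping.
    matches.push(...res.saved_objects.filter((so) => predicate(so.attributes.params)));
    if (page * perPage >= res.total) break;
    page += 1;
  }
  return matches;
}
```

This is exactly the pattern whose costs (boilerplate, bandwidth, memory) are debated below; it only stays viable while the total number of alerts is "reasonable".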

I think we should also find out if having "internal tags" via #58417 would be good enough for now. This would presumably parallel the current tags structure, but not be editable (or probably viewable) in the UI, only programmatically. Not nearly the same as ES fields, but it may be useful enough for common needs.

There was also mention of scripted fields in the discussion above, but I'm not sure how we might use them.

@gmmorris
Contributor

@FrankHassanabad Would being able to query/filter/sort by a set of internal tags be enough for you?
It looks like making the params/config actually searchable would be a significant piece of work, so we'd need to be cautious before picking it up.

@m-adams

m-adams commented May 13, 2020 via email

@gmmorris
Contributor

gmmorris commented May 13, 2020

FYI, scripted fields seems to work and maybe at least better than doing it in js.

Sorry but could you be more specific?
Given the limitations we have on making these fields searchable, how would you go about using scripted fields to achieve what SIEM are looking for?

@gmmorris gmmorris moved this from In Progress to Discuss in Make it Action May 13, 2020
@FrankHassanabad
Contributor

table scan

I lifted that term from relational DBs, where the definition is:

(also known as a sequential scan) is a scan made on a database where each row of the table is read in a sequential (serial) order and the columns encountered are checked for the validity of a condition. Full table scans are usually the slowest method of scanning a table due to the heavy amount of I/O reads required from the disk which consists of multiple seeks as well as costly disk to memory transfers.[1]

[1] Ref: https://en.wikipedia.org/wiki/Full_table_scan

Don't know if there's an ES-adapted term, but basically any time I have to read all of the Saved Objects into memory, either through buffering or streaming, I count that as a table scan, and since it's done through network calls from ES -> Kibana it adds to the expense.

"do as much of a query as you can with ES, then do the remaining filtering/mapping/aggs on the results in JS"

Yeah that's our ultimate goal. We are hoping to avoid any operation that causes us to iterate over all the alerts in memory from ES -> Kibana due to:

  • Ugly boilerplate code we have to write
  • Increased odds of race conditions
  • More network bandwidth issues and more network error conditions to manage
  • Performance issues
  • Increased memory usage within Kibana

Would being able to query/filter/sort by a set of internal tags be enough for you?

Can we get aggs as well with arrays in Elasticsearch? The use case is that sometimes we need to do unique counts of items and display them. If this PR is merged, we might have it:
#64002

If we were able to query/filter/sort/agg, I think that covers all of our use cases for avoiding table-like scans.

As an aside:

If you're still going to allow mapping, I would suggest combining two parts of your approaches above.

This part:

While we would have one single SavedObject type of alert (as we do now) we can store each alert type's params and each action type's config under a corresponding key based on their unique ID and keep the mapping set to dynamic so that we can query against the internal shapes.

and this part:

Another option is to standardise the shape of params across all AlertTypes such that each AlertType will specify the exact shape and types of their params and we'll merge these shapes together into one static shape which will be used to define the mappings of these params.

So that it becomes this:

While we would have one single SavedObject type of alert (as we do now) we can store each alert type's params and each action type's config under a corresponding key based on their unique ID. Each AlertType will specify the exact shape and types of their params and we'll merge these shapes together into one static shape which will be used to define the mappings of these params.

Then you have exact mappings, avoid mapping explosions and conflicts, and the Saved Objects migration system takes over when we update our mappings. We are then responsible as a team for migrating our mappings altogether.

For what it's worth ... on the community forums and community Slack, users are opening up their .kibana saved objects by granting permissions to the index and then creating dashboards over rules, which includes the params:
https://elasticstack.slack.com/archives/CNRTGB9A4/p1589221644238900

I know that might not be exactly what we want this quickly, but it is something we have to keep in mind: users are granting each other privileges to their saved objects index so they can write their own dashboards against static SIEM rules.

Since things are "beta" I think they would be OK with updates, but might become frustrated if their dashboards are no longer possible. On the flip side, if this brings more features they couldn't have before, such as sorting/filtering/querying/aggs, then they will be very delighted even if we have to advise them on how to update their existing dashboards.

Opening up saved object indexes like this might be unsupported or discouraged, but I want to point it out as it's already happening.

@m-adams

m-adams commented May 13, 2020

FYI, scripted fields seems to work and maybe at least better than doing it in js.

Sorry but could you be more specific?
Given the limitations we have on making these fields searchable, how would you go about using scripted fields to achieve what SIEM are looking for?

If, for example, you create a filtered alias to just pick the rules from .kibana, then add an index pattern in Kibana with this scripted field using Painless:

def mitre = [];
for (value in params['_source']['alert']['params']['threat']) {
  mitre.add(value['tactic']['id']);
}
return mitre;

You can then aggregate on the tactic ID for reporting.
Matthew

@kobelb
Contributor

kobelb commented May 19, 2020

Are sorting and aggregations absolutely necessary? If not, elastic/elasticsearch#33003 is a potential solution worth evaluating.

@mikecote
Contributor Author

mikecote commented May 25, 2020

New possible solution mentioned in option 4 of #67290. That issue explores options to solve Elasticsearch merging objects on update when the mapping has enabled: false.

This may not be a good approach, but it's worth mentioning. It probably doesn't solve sorting or aggregations, which would make elastic/elasticsearch#33003 more worth evaluating, as @kobelb mentioned.

Change object structure to array

I noticed the alert's actions[x].params attribute doesn't have this problem. There is a possibility that storing alert params and action configs in an array structure would solve this problem and possibly also make them searchable.

The params value structure could be something like the following:

[
  {
    "name": "index",
    "value": "foo"
  },
  {
    "name": "timeField",
    "value": "@timestamp"
  }
]

This would allow a consistent mapping of name and value where value can be enabled: false but won't be impacted by this issue (due to being within an array). This would also require a saved object migration.

The further step required to make the values searchable would be to split the values into different mapped fields. This would require orchestration between the field name and the value field.

[Screenshot: Screen Shot 2020-05-25 at 1 47 15 PM]

To query this data, it would look something like this:
[Screenshot: Screen Shot 2020-05-25 at 2 00 19 PM]
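The screenshots are not reproduced here; the following is only an illustration of the technique described above (splitting values into typed mapped fields; field names like value_keyword are assumptions, not the POC's actual names):

```json
{
  "params": {
    "type": "nested",
    "properties": {
      "name": { "type": "keyword" },
      "value_keyword": { "type": "keyword" },
      "value_number": { "type": "long" },
      "value": { "type": "object", "enabled": false }
    }
  }
}
```

A query would then pair the name with the appropriately typed value field inside a nested query, e.g. matching name: "index" together with value_keyword: "foo".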

@mikecote
Contributor Author

mikecote commented Feb 1, 2021

The approach in Mike's POC ^^^ seems workable. Is making this opt-in for alert types something we want to keep long-term, or will we make it mandatory at some point? It would be nice not to have two paths, at least long-term.

I think if we go opt-in for now, making it mandatory long-term would be great, similar to validation, which is opt-in right now. Maybe 8.0 if we go down this path?

@mikecote
Contributor Author

mikecote commented Feb 3, 2021

I've also worked on a POC that splits the alert saved object type by alert type. This is the same as the "Create a Saved Object type for each AlertType" approach that @gmmorris mentioned. This approach allows having full parameter mappings for each alert type.

Here are the good things I discovered during the POC:

  • Migrations are defined by alert type by design. We can create wrappers if we wish to migrate something high level across all alerts.
  • KQL filtering works naturally by doing something like rule_type_index_threshold.attributes.params.index:foo
  • We can use the saved objects searchFields and search from the find API to do the searching.
  • We may already be planning to change the saved object type from alert to rule, so splitting by alert type could be done at the same time.

Here are the bad things I discovered during the POC:

  • Sorting is pretty limited. Every field in alert or params has to be sorted via script if you're searching across different alert types (due to null values). This script also needs to know what type it's dealing with, which I think is limited to string or number.
  • We would need to know the alertTypeId in every API call. Knowing some new APIs are coming (rename alert -> rule), we have the opportunity to require this alertTypeId to be part of the URL as well.
  • The mappings explode quite easily by repeating the common fields.
  • All our find calls that do filtering (KQL) now need to repeat the filter for every alert type.

I want to explore creating a separate saved object for the params next and see how that goes. There are still other options outside of these, but it is good to know the pros/cons of each to make sure we're making the right investment for the long term.

@mikecote
Contributor Author

mikecote commented Feb 3, 2021

I want to explore creating a separate saved object for the params next and see how that goes.

I took a quick look at this approach. It would rely on using the join field type, and has_child queries. Since the requirements are to use the same field across both saved object types, it doesn't seem like a good approach since saved object attributes are stored in different fields in Elasticsearch by design. There may be alternatives in this approach, but they don't seem worth investigating to me at this time.

@mikecote
Contributor Author

mikecote commented Feb 3, 2021

After looking at all the options that are on the table, I want to propose going with the "deep object with static mappings" approach mentioned here.

Usage

This approach checks the boxes for filtering, searching, and sorting by doing something like the following on my POC:

  • ✅ Filtering:

GET https://elastic:changeme@localhost:5601/pjf/api/alerts/_find?filter=alert.attributes.params.__index-threshold.aggType:count

  • ✅ Searching:

GET https://elastic:changeme@localhost:5601/oix/api/alerts/_find?search=dog&search_fields=["params.__index-threshold.description"]

  • ✅ Sorting:

GET https://elastic:changeme@localhost:5601/oix/api/alerts/_find?sort_field=params.__index-threshold.timeWindowSize&sort_order=asc

Addressing the concerns

I have raised a few downsides, but most of them are not an issue if I think long-term. Below is how I see each being resolved:

  1. This approach would require pre-save / post-read logic in the alerts client to nest/unnest the parameters.

This would be a good time to split the data access logic out of the alerts client into a separate file. This is where the pre-save and post-read logic can live, and it would also keep the alerts client smaller as more code moves over.

  2. Using the alert type id as the sub-property name, it can't have . in it. We would have to convert . to something else (e.g. siem.signals to siem_signals), and add protection to prevent overlapping names.

However we decide to map these parameters, we will have to distinguish them by alert type id. Hence, long term, we may want to prevent using . in the ids; in the meantime, we can come up with a common function that converts an alertTypeId to something consumable by Elasticsearch.

  3. We may have to bump up the field mappings limit in the .kibana index. I spoke with @rudolf, and it seems OK to bump the limit for now.

This approach is not as bad as the saved-object-type-per-alert-type approach, as only the parameters are added to the mappings, and only for the alert types that opt in for now. In the future, perhaps as we rename the alert saved object type to rule, we can move them into their own .kibana_rules index.

  4. A migration would be required to opt an alert type in to having its parameters searchable.

This problem is short-term, until we make all alert types require mappings for their parameters. I wouldn't make a design decision based on this. If we allow alert types to define migrations (#50216), a simple no-op migration would move their params from the default enabled: false spot to their mapped version.
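As a sketch (the version number and registry shape here are hypothetical, not the actual alerting migration API), such a no-op migration could be as small as:

```typescript
// A hypothetical no-op migration: returning the document unchanged still
// causes it to be re-indexed during the Kibana migration, moving params from
// the `enabled: false` blob into the newly mapped fields.
type AlertDoc = { attributes: Record<string, unknown> };

const migrations: Record<string, (doc: AlertDoc) => AlertDoc> = {
  '7.12.0': (doc) => doc, // no-op; version number is illustrative
};
```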

  5. This could create migration errors at the mappings level. Alert types would have to make sure their mappings work for what they allow storing in parameters.

This problem will exist for any approach that requires mapping. There is a high likelihood that errors here will happen, and we should work with the core team to see what we can do to ensure Kibana can still start when this migration fails. Alert types that use config-schema are already one step ahead in making sure the data matches the mappings, and we can make validation mandatory for all alert types first.

  6. Alert types that don't opt in to custom mappings get their parameters stored in a generic variable mapped enabled: false.

This problem is short-term as well. I think long term, we should force all alert types to define their mappings. Here as well, I wouldn't make a design decision based on this.

  7. Searching, sorting and filtering these parameters would require API users to understand the underlying structure in which these parameters are stored. Changing this down the line could break their requests. There may be a way to create some limitations here and handle the underlying structure ourselves.

Here we could do something like the saved objects API does, where it requires attributes in the filter and then replaces it with the saved object type. We could do something similar and accept params.value instead of params.{alertTypeId}.value. This may prevent filtering across different alert types, but we could also expand it into a series of OR statements.
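One possible shape of that expansion (the helper name, the dot-escaping rule, and the field paths are all hypothetical illustrations of the idea, not the POC's implementation):

```typescript
// Expand a user-facing filter like "params.aggType:count" into per-alert-type
// clauses, since params would actually be nested under the (escaped) alert
// type id in this proposal.
function expandParamsFilter(filter: string, alertTypeIds: string[]): string {
  const clauses = alertTypeIds.map((id) => {
    // e.g. "siem.signals" -> "siem_signals", per the dot-conversion idea above
    const escaped = id.replace(/\./g, '_');
    return filter.replace(/\bparams\./g, `alert.attributes.params.${escaped}.`);
  });
  return clauses.length > 1 ? `(${clauses.join(' or ')})` : clauses[0];
}
```

A single-type call rewrites the filter in place; a multi-type call produces the OR chain mentioned above.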

  8. The saved objects service would need to support dynamic mappings (see: Saved Objects registerType support for dynamic mappings #88988). In my POC, I have an approach the core team is happy with.

I have something worked out with the core team that would solve this problem and seems in line with what other teams are asking for. This approach will also work for supporting migrations per alert type.

Next steps

I will set up a design meeting to ensure the @elastic/kibana-alerting-services stands by this proposal before starting implementation. After, I will circle back with the Security and Observability teams who requested this to confirm the approach solves what they're asking for.

@mikecote
Contributor Author

mikecote commented Feb 4, 2021

I spoke with @romseygeek from the ES Search team and came to the same conclusion as my proposal: having a type associated with each param so we can build mappings is the best way forward for us. 👍

@mikecote
Contributor Author

After a design discussion and further iterations on a POC, @ymao1 and I have something ready for approval. The underlying fundamentals have changed. If you're interested, feel free to read on. Otherwise, @spong @sqren, we'll be in touch soon to get your 👍 before making a PR to master.

@elastic/kibana-alerting-services it was hard to find time to do a follow-up session, so I'm opting for async to keep momentum on this issue. The details are below.

TL;DR

  • Alert types can provide param mappings and are limited in how many fields are searchable (say 5-10)
  • Mappings are merged flat across different alert types
  • An error is thrown if there is a mapping conflict, i.e. two alert types try to set different mappings on the same field
  • The merged mapping is set under params, which now uses dynamic: false so that Elasticsearch still stores (but doesn't index) the remaining, unmapped fields
  • The framework injects ignore_malformed: true into all param mappings to prevent write failures if there are mismatches from other alert types that don't use searchable params:
"params": {
  "dynamic": false,
  "type": "object",
  "properties": {
    // From .index-threshold
    "aggType": {
      "type": "keyword"
    },
    // From siem.rules
    "riskScore": {
      "type": "number",
      "ignore_malformed": true
    }
  }
},

Approach

The approach we have settled on is Static mapping + flattened object from @gmmorris's proposal here: #50213 (comment). We've done many iterations to make alert parameters searchable, sortable and filterable across different alert types. We felt tying our underlying data structure to the user's KQL filters didn't seem right, and it also didn't work well for sorting/searching/filtering across alert types.

There are some decent upsides to going with this approach:

  • We are not tying KQL to our underlying structure too much. It remains params.{something}
  • Migrations are not necessary to make a field searchable. It just works after the next Kibana migration
  • It works for searching/sorting/filtering across alert types
  • No pre-save / post-read logic necessary
  • No issues if alert types have . in their id
  • No migration errors, though it would mean bad data silently becomes unsearchable (this should be caught in the params validation phase)

Relatively, there are not many downsides to going with this approach:

  • We set a limit of searchable fields per alert type to prevent further mapping explosion in .kibana
  • We merge the mappings at a flat level and throw mapping conflicts where it exists
  • We may still have to bump the field limit of the .kibana index until we move rules out into their own index
  • We need to inject ignore_malformed everywhere in the mappings to prevent failures on save if there is ever a mismatch with any alert

Answering the earlier concerns

What do we do when two AlertTypes use the same name for a field?
We will throw an error if the mappings differ. If they are the same, it would be ok. This allows us to move forward while having this limitation for now.
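A minimal sketch of that flat merge with conflict detection (the helper name, the field limit, and the list of ignore_malformed-capable types are assumptions, not the final implementation):

```typescript
type FieldMapping = { type: string; ignore_malformed?: boolean };
type ParamMappings = Record<string, FieldMapping>;

const MAX_SEARCHABLE_FIELDS = 10; // assumed per-type limit ("say 5-10")

// Elasticsearch only accepts ignore_malformed on numeric/date/ip-like fields,
// so the framework would inject it only where it applies.
const IGNORE_MALFORMED_TYPES = new Set(['long', 'integer', 'double', 'float', 'date', 'ip']);

// Merge per-alert-type param mappings into one flat mapping, throwing when
// two alert types map the same field name to different types.
function mergeParamMappings(perType: Record<string, ParamMappings>): ParamMappings {
  const merged: ParamMappings = {};
  for (const [alertTypeId, fields] of Object.entries(perType)) {
    const names = Object.keys(fields);
    if (names.length > MAX_SEARCHABLE_FIELDS) {
      throw new Error(`${alertTypeId}: too many searchable param fields`);
    }
    for (const name of names) {
      const incoming: FieldMapping = { ...fields[name] };
      if (IGNORE_MALFORMED_TYPES.has(incoming.type)) {
        incoming.ignore_malformed = true; // injected by the framework
      }
      const existing = merged[name];
      if (existing && existing.type !== incoming.type) {
        throw new Error(`Mapping conflict on "${name}": ${existing.type} vs ${incoming.type}`);
      }
      merged[name] = incoming;
    }
  }
  return merged;
}
```

Identical mappings from two types merge silently; only genuinely conflicting types fail fast at setup time, which matches the behaviour described above.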

How do we handle migrations within a specific AlertType? And what about across all types?
Migrations will behave the same as they do now since the data structure didn't change. Migration by alert type is coming soon.

What happens if the shape is wrong? Do we validate ourselves? Rely on plugins?
The usage of ignore_malformed: true and dynamic: false will protect us in this case. We will also limit the number of fields each alert type can map so we're not merging full schemas.

If this means dynamically merging mappings and types on the "way into the framework" and then exposing it as "portions" of this type on the "way out of the framework" back to the solution - are we introducing a lot of complexity that will be hard to maintain in the future?
Since the data structure didn't change, things going in and out will remain the same, though only a portion of the fields will be mapped and searchable.

Future thinking

In the future, it seems best to create an Elasticsearch index per alert type. This would allow proper cross-index querying, sorting, and filtering, letting Elasticsearch reconcile differing mappings between indices at search time. The queries would look the same as in the approach above, and an index per alert type is also where we want to be for writing alerts as data, so this direction addresses both problems at once.
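The paragraph above can be sketched as a single search over a per-type index pattern. The index naming and field paths here are hypothetical assumptions, not an existing Kibana convention:

```typescript
// Hypothetical: with one index per alert type (e.g. ".alerts-<type>"), a
// single query over the pattern lets Elasticsearch reconcile differing
// mappings at search time. unmapped_type keeps the sort from failing on
// indices whose alert type never mapped the field.
const crossTypeSearch = {
  index: '.alerts-*', // assumed index-per-alert-type naming
  body: {
    query: {
      bool: {
        filter: [{ range: { 'params.riskScore': { gte: 50 } } }],
      },
    },
    sort: [{ 'params.riskScore': { order: 'desc', unmapped_type: 'float' } }],
  },
};
```

Conflicting mappings between two indices would then only matter for queries that actually touch the conflicting field, rather than blocking a single merged mapping up front.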

@mikecote
Contributor Author

I had a quick chat with @kobelb about the approach. One protection worth investigating: an alert type that doesn't use searchable params could end up with searchable params anyway, because another alert type with similar fields defined the mappings. It could be worth adding some protections to the _find API.

@sorenlouv
Member

sorenlouv commented Feb 18, 2021

What do we do when two AlertTypes use the same name for a field?
We will throw an error if the mappings differ. If they are the same, it would be ok. This allows us to move forward while having this limitation for now.

Are we catching this during a build step (CI) or is there a risk that these issues will go undiscovered until end users notify us?

In the future, it seems best to create an Elasticsearch index per alert type

+1

@mikecote
Contributor Author

@sqren it would be done during a build step to notify ASAP when something is wrong.

@mikecote
Contributor Author

We have discovered a blocker with the approach stated above: ignore_malformed doesn't work as expected. When an object is given for a number field, indexing will still fail. Not good (see: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-malformed.html#json-object-limits).

We have made many attempts at solving this problem. None of the options seemed like a good idea for the team to implement and support in the long run. We have decided to shelve this issue and revisit it if saved object types can ever be created in their own Elasticsearch index (see #70471 (comment)).

In the meantime, we will try to make the alert parameters filterable by using Elasticsearch's "flattened" type (see #92010). We opened an issue (see #92011) to explore supporting free-text searching on alert information (metadata); we will prioritize it once we have some requests. For numbers, we can't do anything at this time, though elastic/elasticsearch#61550 could potentially solve the problem.
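A minimal sketch of what the flattened-type fallback might look like (an assumed mapping shape, not the final #92010 implementation): leaf values under a flattened field are indexed as keywords, so exact-match filtering works, but numeric range semantics do not.

```typescript
// Hypothetical mapping: a single "flattened" field covers all params,
// so no per-alert-type mapping merge is needed at all.
const rulesMappings = {
  params: { type: 'flattened', ignore_above: 256 },
};

// Exact-match filtering works, keyword-style:
const exactMatch = { term: { 'params.aggType': 'count' } };

// But a range over params.riskScore would compare values lexicographically
// ("9" > "10"), which is why numbers remain unsolved here and why
// elastic/elasticsearch#61550 is relevant.
```

This trades expressiveness (no numeric ranges, no free-text relevance) for robustness: a flattened field can absorb arbitrary param shapes without mapping conflicts or explosion.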

Solutions that cannot wait will have to create their own sidecar objects alongside alerts and do filtering, sorting, and searching within those instead. The lessons learned here apply to the upcoming alerts-as-data story: denormalize alert parameters and index them appropriately in Elasticsearch.

@sorenlouv
Member

Fixed in #92036

@gmmorris
Contributor

I've decided to reopen this issue, as I know this is still a high-priority request.
We chose to close this in February because it was clear it would necessitate completely rethinking our entire approach to how we store rules, and there were serious questions about the ROI of such a change.

I feel like we should still keep this issue open as a sort of open and unsolved problem statement.

@gmmorris gmmorris reopened this Oct 27, 2021
@gmmorris gmmorris added the impact:high Addressing this issue will have a high level of impact on the quality/strength of our product. label Oct 27, 2021
@gmmorris gmmorris moved this from Done (Ordered by most recent) to Backlog in Kibana Alerting Oct 27, 2021
@banderror
Contributor

Yes, we absolutely need this based on what I know about our product backlog. Great news!

@XavierM XavierM removed this from Backlog in Kibana Alerting Jan 6, 2022
@kobelb kobelb added the needs-team Issues missing a team label label Jan 31, 2022
@botelastic botelastic bot removed the needs-team Issues missing a team label label Jan 31, 2022