[Logs Explorer] Global search across integrations/datasets on DataSourceSelector #177731

tonyghiani · 2024-02-23T15:26:48Z

📓 Summary

🛑 Blocked by #177696

Searching datasets on the selector is a critical user journey to identify the logs users need as fast as possible.
To improve it, we want to switch to a more complete search that handles more flexible input and searches across both integrations and datasets.

🎨 Design

The design is almost unchanged from the current one; we'll just remove the sorting options, as sorting will be delegated to column headers. The design proposal is on this Figma page and will be tested before it is ready for implementation.

✔️ Acceptance Criteria

Searching should match any integration by its name, any dataset by its name/title, and uncategorized datasets as well.
Searches should be case-insensitive to broaden matching.
A "No results" prompt should be displayed when nothing matches the search.
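
A minimal sketch of the matching these criteria describe, assuming hypothetical `Integration` and `Dataset` shapes (not the actual Logs Explorer types) and a plain substring match. It only illustrates case-insensitive matching across integration names, dataset names/titles and uncategorized datasets, plus the empty result that would trigger the "No results" prompt.

```typescript
// Hypothetical shapes for illustration only; the real Logs Explorer types differ.
interface Dataset {
  name: string;
  title?: string;
}

interface Integration {
  name: string;
  datasets: Dataset[];
}

interface SearchResult {
  integrations: Integration[];
  uncategorized: Dataset[];
}

// Case-insensitive substring match; `term` is expected to be lowercased already.
const matches = (value: string | undefined, term: string): boolean =>
  value !== undefined && value.toLowerCase().includes(term);

const matchesDataset = (dataset: Dataset, term: string): boolean =>
  matches(dataset.name, term) || matches(dataset.title, term);

function searchDataSources(
  integrations: Integration[],
  uncategorized: Dataset[],
  search: string
): SearchResult {
  const term = search.trim().toLowerCase();

  const matchedIntegrations = integrations
    .map((integration) => {
      // An integration matched by name keeps all of its datasets.
      if (matches(integration.name, term)) return integration;
      // Otherwise keep only the datasets that match.
      return {
        ...integration,
        datasets: integration.datasets.filter((d) => matchesDataset(d, term)),
      };
    })
    .filter(
      (integration) =>
        integration.datasets.length > 0 || matches(integration.name, term)
    );

  return {
    integrations: matchedIntegrations,
    uncategorized: uncategorized.filter((d) => matchesDataset(d, term)),
  };
}

// True when the selector should show the "No results" prompt.
const isEmptyResult = (result: SearchResult): boolean =>
  result.integrations.length === 0 && result.uncategorized.length === 0;
```

Keeping every dataset of an integration that matches by name is a design choice of this sketch, not something the criteria prescribe.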

💡 Implementation hints

Here are the codebase parts that will probably be affected by these changes:

For this work, it will be necessary to create a new branch starting from the open PR #179413 to build on top of the latest design changes.

elasticmachine · 2024-02-23T15:26:49Z

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

flash1293 · 2024-04-08T07:59:45Z

@tonyghiani I looked into this and the following complication exists:

The es_index_patterns property in the epm-packages saved object is mapped as an object with dynamic: false. This makes it not searchable.

Some options:

Using a runtime field during the search - this is not supported by the saved objects client at the moment - would need to check with the core team ([saved objects] add support for runtime fields in Saved Object find() requests #113152)
Changing the mapping. A flattened field type would probably work, but it's not possible to directly update the mapping, which means the shape of the saved object needs to be changed via a migration - would need to check with the Fleet team
Moving this logic to the client - don't push this kind of search down to Elasticsearch, but execute it on the client. Load the whole list of installed integrations and search them client side. As the number of integrations will probably never exceed thousands, this seems viable, but that's definitely a gotcha
Using the installed_es property, which contains something that's almost the same thing:

{
  "id": "logs-1password.audit_events",
  "type": "index_template"
}

As these objects are already mapped as nested, a filter could be set on the id of the index_template entries to try to match the suffix. This last option works roughly as expected with only a change to the search method on the server, but it has downsides: the suffix of the id does match the pattern, but I'm not sure whether it has to. Either way, it seems like we would be abusing this field - it's really not meant for full-text search.
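
To illustrate the suffix relationship this option relies on, here is a minimal sketch. It assumes index template ids have the form `<type>-<dataset>`, as in the example above; as noted, that is not guaranteed. The `InstalledEsAsset` shape and both function names are hypothetical, and the matching runs in memory rather than as a nested filter in Elasticsearch.

```typescript
// Hypothetical shape mirroring the installed_es example above.
interface InstalledEsAsset {
  id: string;
  type: string;
}

// Assumption: ids look like "<type>-<dataset>", e.g. "logs-1password.audit_events".
// Returns undefined when the id does not follow that form.
function datasetFromIndexTemplateId(id: string): string | undefined {
  const separator = id.indexOf("-");
  if (separator === -1 || separator === id.length - 1) return undefined;
  return id.slice(separator + 1);
}

// Case-insensitive match of the search term against the derived dataset names.
function findMatchingDatasets(assets: InstalledEsAsset[], search: string): string[] {
  const term = search.trim().toLowerCase();
  return assets
    .filter((asset) => asset.type === "index_template")
    .map((asset) => datasetFromIndexTemplateId(asset.id))
    .filter(
      (dataset): dataset is string =>
        dataset !== undefined && dataset.toLowerCase().includes(term)
    );
}
```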

None of these options seem great to me - out of all of them, the mapping change with migration seems like the best option - we could introduce a new property datastreams that's mapped as keyword and contains the keys of es_index_patterns, then search on that in a proper way.

Downside of that is that it would either introduce some redundancy in the saved object attributes or be a bigger refactoring in case we would change es_index_patterns completely (e.g. making it look like installed_es, as an array of nested objects)

flash1293 · 2024-04-08T08:17:41Z

@kpollich what do you think about the above? More concretely, what I'm seeing is the following:

Add a new keyword field to the epm-packages saved object called datastreams
Add a migration that adds all keys from es_index_patterns as values
When adding new epm-package saved objects, make sure datastreams is populated properly in the same way

This will allow us to properly search on these datastream names, along with the name of the integration.
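
A minimal sketch of the backfill step, assuming es_index_patterns is currently an object keyed by data stream name. `EpmPackageAttributes` is a simplified, hypothetical type and `backfillDatastreams` is a hypothetical helper; the wiring into an actual saved object migration is left out.

```typescript
// Hypothetical, simplified attributes of an epm-packages saved object.
interface EpmPackageAttributes {
  name: string;
  // Assumed current shape: data stream name -> index pattern.
  es_index_patterns?: Record<string, string>;
  // Proposed new keyword field.
  datastreams?: string[];
}

// Copies the keys of es_index_patterns into datastreams, leaving the input untouched.
function backfillDatastreams(attributes: EpmPackageAttributes): EpmPackageAttributes {
  return {
    ...attributes,
    datastreams: Object.keys(attributes.es_index_patterns ?? {}),
  };
}
```

The same helper could be reused when creating new epm-packages saved objects so that both properties stay aligned.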

kpollich · 2024-04-08T12:58:27Z

Yes, I think your approach is sound. If we have the data we want in a property today, but it's not indexed, the best path forward to me would be to move it into an index mapping and backfill that new property with a migration. I'm not sure if there's a formal way to deprecate the es_index_patterns mapping, or how much changing that property will cascade out, but worst case we can continue writing to both properties and follow up on a deprecation separately.

tonyghiani · 2024-04-08T14:29:19Z

Thanks for the deep analysis @flash1293, I agree on the scarcity of great options; all of them seem to add a layer of complexity, and none is a quick win.

Regarding the proposed option of adding a new field to epm-packages that is derived from the es_index_patterns, what challenges does it introduce to keep them aligned?
The case I have in mind is custom packages, where, for example, we might want to update them in the future to include more data streams as per user configuration (this comes from the original idea of "Managing datasets" and creating custom integrations). Do you see any drawback in keeping those fields aligned?

flash1293 · 2024-04-08T14:53:18Z

Unfortunately there is no central place where all updates/create calls go through, which makes it brittle to maintain fields with redundant information. OTOH, there seems to be just a handful of places where es_index_patterns are used today. It will touch a bunch of places though:

I can take a stab at this - I would change it to:

es_index_patterns: [
  { name: "test_logs", pattern: "logs-all_assets.test_logs-*" }
]

mapped as:

"es_index_patterns": {
  "type": "object",
  "properties": {
     "name": { "type": "keyword" },
     "pattern": { "type": "keyword" }
  }
}

For the existing usage, the current mapping logic to work with the data would only change in a trivial way, while we could do regular search on epm-packages.es_index_patterns.name.
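
A minimal sketch of why existing usage would only change trivially, assuming the current shape is an object keyed by data stream name. The type and function names are hypothetical.

```typescript
// Assumed current shape: data stream name -> index pattern.
type LegacyIndexPatterns = Record<string, string>;

// Proposed shape: one entry per data stream.
interface IndexPatternEntry {
  name: string;
  pattern: string;
}

// Converts the assumed legacy object into the proposed array of entries.
function toEntries(legacy: LegacyIndexPatterns): IndexPatternEntry[] {
  return Object.keys(legacy).map((name) => ({ name, pattern: legacy[name] }));
}

// Converts back, for callers that still expect the legacy object.
function toLegacy(entries: IndexPatternEntry[]): LegacyIndexPatterns {
  const legacy: LegacyIndexPatterns = {};
  for (const { name, pattern } of entries) {
    legacy[name] = pattern;
  }
  return legacy;
}
```

For example, `toEntries({ test_logs: "logs-all_assets.test_logs-*" })` yields the array shown above, and `toLegacy` restores the original object.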

Wdyt @tonyghiani @kpollich ?

flash1293 · 2024-04-11T12:57:31Z

To keep this up to date - the approach chosen above is not as simple as I thought because this kind of migration isn't really supported for serverless. Still seems like the best option to me, just more work than expected.

Moving the search logic to the client seems like the easy way out, but I'm not sure about scalability. Historically this hasn't really been a problem (with data views there was a similar case and the 10k limit has worked well so far for all use cases), so I would be OK to go in this direction if necessary.

tonyghiani added Team:obs-ux-logs Observability Logs User Experience Team Feature:LogsExplorer Logs Explorer feature labels Feb 23, 2024

tonyghiani mentioned this issue Mar 25, 2024

[Logs Explorer] Display integrations & datasets count on integrations tab #179318

Open

tonyghiani self-assigned this Mar 28, 2024

tonyghiani mentioned this issue Mar 28, 2024

[Logs Explorer] Unify integrations & uncategorized tabs #179413

Open

flash1293 self-assigned this Apr 8, 2024

This was referenced Apr 11, 2024

[Fleet] Refactor es_index_patterns for searchability #180553

Closed

[Logs Explorer] Use human readable dataset names from the manifest files #180450

Closed

[Logs Explorer] Search across datasets integrated with search #180563

Draft

This was referenced Apr 11, 2024

[Fleet] Add dataset human readable names to epm-packages SO #180618

Closed

[Fleet] Introduce searchable integation name and data streams (Phase I) #180684

Open
