
[Logs Explorer] Global search across integrations/datasets on DataSourceSelector #177731

Open
tonyghiani opened this issue Feb 23, 2024 · 7 comments
Assignees
Labels
Feature:LogsExplorer Logs Explorer feature Team:obs-ux-logs Observability Logs User Experience Team

Comments

@tonyghiani
Contributor

tonyghiani commented Feb 23, 2024

📓 Summary

🛑 Blocked by #177696

Searching datasets in the selector is a critical user journey: it lets users identify the logs they need as fast as possible.
To improve it, we want to switch to a more complete search that handles more flexible input and searches across both integrations and datasets.

🎨 Design

The design is almost unchanged from the current one; we'll just remove the sorting options, as they will be delegated to column headers. The design proposal is on this Figma page and is going to be tested before being ready for implementation.

[Image: design screenshot]

✔️ Acceptance Criteria

  • Searching should match integrations by name, datasets by name/title, and uncategorized datasets as well.
  • Searches should be case-insensitive for broader matching.
  • A "No results" prompt should be displayed when nothing matches the search.
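
The three criteria above could be sketched roughly as follows. This is only an illustration: the `Integration` and `Dataset` shapes, the `searchSelector` function and the `isEmpty` flag are hypothetical names, not the actual Logs Explorer types.

```typescript
// Hypothetical shapes; the real Logs Explorer types may differ.
interface Dataset {
  name: string;
  title?: string;
}

interface Integration {
  name: string;
  datasets: Dataset[];
}

interface SearchResult {
  integrations: Integration[];
  uncategorized: Dataset[];
  isEmpty: boolean; // true means the "No results" prompt should be shown
}

// Case-insensitive substring match; `query` must already be lower-cased.
const matches = (value: string | undefined, query: string): boolean =>
  value !== undefined && value.toLowerCase().includes(query);

const datasetMatches = (dataset: Dataset, query: string): boolean =>
  matches(dataset.name, query) || matches(dataset.title, query);

function searchSelector(
  integrations: Integration[],
  uncategorized: Dataset[],
  rawQuery: string
): SearchResult {
  const query = rawQuery.trim().toLowerCase();

  const matchedIntegrations = integrations
    .map((integration) =>
      // A match on the integration name keeps all of its datasets;
      // otherwise only the matching datasets are kept.
      matches(integration.name, query)
        ? integration
        : {
            ...integration,
            datasets: integration.datasets.filter((d) => datasetMatches(d, query)),
          }
    )
    .filter(
      (integration) =>
        matches(integration.name, query) || integration.datasets.length > 0
    );

  const matchedUncategorized = uncategorized.filter((d) => datasetMatches(d, query));

  return {
    integrations: matchedIntegrations,
    uncategorized: matchedUncategorized,
    isEmpty: matchedIntegrations.length === 0 && matchedUncategorized.length === 0,
  };
}
```

With this sketch, an empty query matches everything, and a query such as "NGINX" would match an integration named "Nginx" regardless of casing.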

💡 Implementation hints

Here are the codebase parts that will probably be affected by these changes:

For this work, it will be necessary to create a new branch starting from the open PR #179413, to build on top of the latest design changes.

@tonyghiani tonyghiani added Team:obs-ux-logs Observability Logs User Experience Team Feature:LogsExplorer Logs Explorer feature labels Feb 23, 2024
@elasticmachine
Contributor

Pinging @elastic/obs-ux-logs-team (Team:obs-ux-logs)

@flash1293
Contributor

flash1293 commented Apr 8, 2024

@tonyghiani I looked into this and the following complication exists:

The es_index_patterns property in the epm-packages saved object is mapped as an object with dynamic: false. This makes it not searchable.

Some options:

  • Using a runtime field during the search - this is not supported by the saved objects client at the moment - would need to check with the core team ([saved objects] add support for runtime fields in Saved Object find() requests #113152)
  • Changing the mapping. A flattened field type would probably work, but it's not possible to directly update the mapping, which means the shape of the saved object needs to be changed via a migration - would need to check with the Fleet team
  • Move this logic to the client - don't push this kind of search down to Elasticsearch, but execute it on the client. Load the whole list of installed integrations and search them client side. As the number of integrations will probably never exceed thousands, this seems viable, but that's definitely a gotcha
  • The installed_es property contains something that's almost the same thing:
{
  "id": "logs-1password.audit_events",
  "type": "index_template"
}

As these objects are already mapped as nested, a filter could be set on the id of the index_template entries to try to match the suffix. This last option kind of works as we expected with only a change to the search method on the server, but it has downsides: the suffix of the id does match the pattern, but I'm not sure whether it has to. In any case, it seems like we are abusing this field - it's really not meant for full-text search.
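
To make the suffix-matching idea in the last option concrete, here is a minimal in-memory sketch. It assumes that index template ids follow a `<type>-<dataset>` shape like the example above, which is exactly the assumption that may not hold. The names `InstalledEsEntry`, `datasetFromTemplateId` and `findMatchingDatasets` are hypothetical; a real implementation would express this as a filter in the server-side saved objects search rather than in memory.

```typescript
// Hypothetical shape of one installed_es entry, based on the example above.
interface InstalledEsEntry {
  id: string;
  type: string;
}

// Assumption: template ids look like "<type>-<dataset>",
// e.g. "logs-1password.audit_events" -> "1password.audit_events".
function datasetFromTemplateId(id: string): string | undefined {
  const separator = id.indexOf("-");
  return separator === -1 ? undefined : id.slice(separator + 1);
}

// Returns the dataset names (id suffixes) of index templates matching the query.
function findMatchingDatasets(
  installedEs: InstalledEsEntry[],
  rawQuery: string
): string[] {
  const query = rawQuery.trim().toLowerCase();
  return installedEs
    .filter((entry) => entry.type === "index_template")
    .map((entry) => datasetFromTemplateId(entry.id))
    .filter(
      (dataset): dataset is string =>
        dataset !== undefined && dataset.toLowerCase().includes(query)
    );
}
```

Entries of other types, and ids without the assumed separator, are simply ignored, which shows how fragile the approach is if the id format ever changes.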

None of these options seem great to me - out of all of them, the mapping change with migration seems like the best option - we could introduce a new property datastreams that's mapped as keyword and contains the keys of es_index_patterns, then search on that in a proper way.

Downside of that is that it would either introduce some redundancy in the saved object attributes or be a bigger refactoring in case we would change es_index_patterns completely (e.g. making it look like installed_es, as an array of nested objects)

@flash1293
Contributor

@kpollich what do you think about the above? More concretely, what I'm seeing is the following:

  • Add a new keyword field to the epm-packages saved object called datastreams
  • Add a migration that adds all keys from es_index_patterns as values
  • When adding new epm-package saved objects, make sure datastreams is populated properly in the same way

This will allow us to properly search on these datastream names, along with the name of the integration.
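
A rough sketch of the backfill step, assuming es_index_patterns is an object keyed by data stream name (as implied by "the keys of es_index_patterns"). The `EpmPackageAttributes` interface and the `withDatastreams` helper are hypothetical and only cover the fields relevant here; wiring this into an actual saved object migration is not shown.

```typescript
// Hypothetical subset of the epm-packages attributes relevant to this proposal.
interface EpmPackageAttributes {
  name: string;
  // Assumed shape: data stream name -> index pattern.
  es_index_patterns?: Record<string, string>;
  // Proposed new keyword field.
  datastreams?: string[];
}

// Derives `datastreams` from the keys of `es_index_patterns`.
// The same helper could be reused by the migration and by the code paths
// that create new epm-packages saved objects, to keep both fields aligned.
function withDatastreams(attributes: EpmPackageAttributes): EpmPackageAttributes {
  return {
    ...attributes,
    datastreams: Object.keys(attributes.es_index_patterns ?? {}),
  };
}
```

Using a single helper for both the backfill and the create path is one way to limit the risk of the two properties drifting apart.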

@kpollich
Member

kpollich commented Apr 8, 2024

Yes I think your approach is sound. If we have the data we want in a property today, but it's not indexed, the best path forward to me would be to move it into an index mapping and backfill that new property with a migration. I'm not sure if there's a formal way to deprecate the es_index_patterns mapping, or how much changing that property will cascade out, but worst case we can continue writing to both properties and follow-up on a deprecation separately.

@tonyghiani
Contributor Author

Thanks for the deep analysis @flash1293. I agree on the scarcity of great options: all of them seem to add a layer of complexity that is not a quick win.

Regarding the proposed option of adding a new field to epm-packages that is derived from the es_index_patterns, what challenges does it introduce to keep them aligned?
The case I have in mind is custom packages, where for example we might want to update them in the future to include more data streams as per user configuration (this comes from the original idea of "Managing datasets" and creating custom integrations). Do you see any drawback in keeping those fields aligned?

@flash1293
Contributor

Unfortunately there is no central place where all update/create calls go through, which makes it brittle to maintain fields with redundant information. OTOH, there seem to be just a handful of places where es_index_patterns is used today. It will touch a bunch of places though:
Screenshot 2024-04-08 at 16 50 24

I can take a stab at this - I would change it to:

es_index_patterns: [
  { name: "test_logs", pattern: "logs-all_assets.test_logs-*" }
]

mapped as:

"es_index_patterns": {
  "type": "object",
  "properties": {
     "name": { "type": "keyword" },
     "pattern": { "type": "keyword" }
  }
}

For the existing usages, the current mapping logic that works with the data would only change in a trivial way, while we could do a regular search on epm-packages.es_index_patterns.name.
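
To illustrate why the change to existing usages should be trivial, here is a small conversion sketch between the two shapes. It assumes the current es_index_patterns is an object keyed by data stream name; the type and function names (`LegacyIndexPatterns`, `IndexPatternEntry`, `toEntries`, `toRecord`) are hypothetical.

```typescript
// Assumed current shape: { test_logs: "logs-all_assets.test_logs-*" }
type LegacyIndexPatterns = Record<string, string>;

// Proposed shape: [{ name: "test_logs", pattern: "logs-all_assets.test_logs-*" }]
interface IndexPatternEntry {
  name: string;
  pattern: string;
}

// Converts the keyed object into the proposed array of { name, pattern } entries.
function toEntries(legacy: LegacyIndexPatterns): IndexPatternEntry[] {
  return Object.keys(legacy).map((name) => ({ name, pattern: legacy[name] }));
}

// Converts the array back into a keyed object, for existing code that
// still expects the old shape.
function toRecord(entries: IndexPatternEntry[]): LegacyIndexPatterns {
  const record: LegacyIndexPatterns = {};
  for (const { name, pattern } of entries) {
    record[name] = pattern;
  }
  return record;
}
```

Existing call sites could call `toRecord` at the boundary and otherwise stay unchanged, while search would target the `name` keyword field.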

Wdyt @tonyghiani @kpollich ?

@flash1293
Contributor

flash1293 commented Apr 11, 2024

To keep this up to date - the approach chosen above is not as simple as I thought because this kind of migration isn't really supported for serverless. Still seems like the best option to me, just more work than expected.

Moving the search logic to the client seems like the easy way out, but I'm not sure about scalability. Historically this hasn't really been a problem (with data views there was a similar case and the 10k limit has worked well so far for all use cases), so I would be OK to go in this direction if necessary.


4 participants