Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Transforms] Graceful handling for _preview when source indices are missing #87074

Open
dolaru opened this issue May 24, 2022 · 8 comments
Open
Labels
>enhancement :ml/Transform Transform Team:ML Meta label for the ML team

Comments

@dolaru
Copy link
Member

dolaru commented May 24, 2022

Context

As an example, the Endpoint Security integration creates a couple of transforms when it's set up, but the source indices for these transforms are not available until the integration starts collecting data from agents.

Until the indices are created and data starts coming in, the users will see this in the Preview tab:
image

These errors come from the _preview endpoint.

Enhancement

The _preview endpoint should either return 0 documents or return an error message that describes the situation in a way that's more helpful to the user.

@dolaru dolaru added >enhancement :ml/Transform Transform needs:triage Requires assignment of a team area label labels May 24, 2022
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label May 24, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (Team:ML)

@benwtrent
Copy link
Member

Honestly, I am not 100% sure about this. We should totally indicate VERY clearly that there is no data in the _preview.

@przemekwitek
Copy link
Contributor

or return an error message that describes the situation in a way that's more helpful to the user.

Honestly, I am not 100% sure about this. We should totally indicate VERY clearly that there is no data in the _preview.

IIUC these 2 do not contradict. I agree we should return an error and make the error message understandable.

@sophiec20
Copy link
Contributor

One approach would be to match what _search does. For example, if a wildcard is used in the source index then zero docs are returned without errors. If a concrete index name is used, and it does not exist then an error is shown. Whilst it might not be the most intuitive in all cases, at least following the behaviour of _search introduces a consistency within the stack.

@benwtrent
Copy link
Member

One approach would be to match what _search does.

The problem I have with this, is that it gives no indication as to why no docs were matched. In one case it could be that there are no indices (e.g. the wrong pattern could have been chosen, or the user knows about this and its fine), or the user who is creating the transform has no permissions to read the data. One of those the user should be able to act on. If we do not distinguish the "why" behind no docs, we make the experience worse for the user.

@sophiec20
Copy link
Contributor

I agree and perhaps that should be fixed in _search.

It would be good to also consider the consistency between the behaviour of _preview for datafeeds and transforms as the same team supports both.

@droberts195
Copy link
Contributor

or the user who is creating the transform has no permissions to read the data

One of the principles behind the way index security works is that it shouldn't be possible for a user who doesn't have permission to read the cluster state to determine which indices exist by getting "permission denied" errors.

So for example if an index top-secret-index exists and a user who doesn't have permission to read top-secret-index searches it then they should get the same response as if top-secret-index didn't exist. Similarly if they search top-secret*.

It would be good to also consider the consistency between the behaviour of _preview for datafeeds and transforms as the same team supports both.

In both transforms and datafeeds we're trying to check up-front what a later search will be able to do. The only tool we have for this is the _has_privileges API. But actually this behaves in a subtly different way to the actual _search API.

If you ask _has_privileges whether a user can search .fleet-agents* then it will respond with false unless the user has sufficiently wildcarded permissions to search any possible future index that might match the pattern .fleet-agents*. If they have permission to search .fleet-agents-2022* then it will respond with false because they can't search .fleet-agents-does-not-exist-and-never-will. In contrast the _search API will happily return results from .fleet-agents-2022* and silently ignore .fleet-agents-2021* that the user doesn't have permissions for.

This discrepancy between _search and _has_privileges is where we're getting ourselves in a mess.

At the moment our rule is that if you want to configure a transform or datafeed to search .fleet-agents* then you have to have permissions over that whole wildcarded namespace. Maybe that rule is no longer fit for purpose. But then, given the way _has_privileges works, all we could do is let the transform or datafeed start and retrieve data from the subset of that namespace that the user is actually allowed to search (which could be anywhere from the entire subset that actually exists to nothing).

@hendrikmuhs
Copy link
Contributor

Regarding the "no feedback" issue:

Search provides hits and it returns _shards in the result object and therefore gives an indication if e.g. no indices could be resolved due to a misspelling in the config and if you made a mistake in the query, you can see 0 hits but e.g. that 1 shard has matched.

We lack those information in _preview, so one way to think about this is exposing more data from the search in _preview, we have it, we just ignore it right now. Indicating the "size of a transform" seems like a good addition, too.

(However _shards seems a bit too technical (is it legacy?), as a user I rather prefer to get the number of resolved indices, if we get that information)

Regarding privileges: I agree we can't disclose the existence of an index, therefore it seems to me, we can't do anything if wildcards are involved. All the indices we talk about seem to be data streams, it would be better if they switch to data streams instead of using the old school syntax.

LBNL the error is about a system transform, longer term we should hide that transform in the default view. It should at least require ticking a checkbox to see it.

@stu-elastic stu-elastic removed the needs:triage Requires assignment of a team area label label May 25, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :ml/Transform Transform Team:ML Meta label for the ML team
Projects
None yet
Development

No branches or pull requests

8 participants