[APM] Allow kibana to collect APM telemetry in background task #52917

Merged

Conversation

ogupte
Contributor

@ogupte ogupte commented Feb 27, 2020

Required for elastic/kibana#50757. Allows the Kibana user to collect APM telemetry in a background task by giving the kibana_system reserved role the read, read_cross_cluster, and view_index_metadata privileges on apm-*.
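For context, here is a minimal sketch of how the resulting privileges could be verified against a cluster once the change is in place. It uses TypeScript with the @elastic/elasticsearch client; the connection details and credentials are placeholders and are not part of this PR.

```ts
import { Client } from '@elastic/elasticsearch';

// Placeholder connection details for a local test cluster.
const client = new Client({
  node: 'http://localhost:9200',
  auth: { username: 'elastic', password: 'changeme' },
});

async function showKibanaSystemApmPrivileges() {
  const { body } = await client.security.getRole({ name: 'kibana_system' });
  // After this change, the reserved role should list an entry for apm-* with
  // the read, read_cross_cluster and view_index_metadata privileges.
  const apmEntries = body.kibana_system.indices.filter(
    (p: { names: string[] }) => p.names.includes('apm-*')
  );
  console.log(apmEntries);
}
```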

@ogupte ogupte added the v7.7.0 label Feb 27, 2020
@ogupte ogupte force-pushed the apm-50757-kibana-background-task-telemetry branch from 2e3e83f to a959378 on February 28, 2020 07:54
@tvernum tvernum added v8.0.0 :Security/Authorization Roles, Privileges, DLS/FLS, RBAC/ABAC >enhancement labels Mar 3, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-security (:Security/Authorization)

@tvernum tvernum self-requested a review March 3, 2020 00:03
@dgieselaar
Member

@ogupte For posterity, I'm doing the following calls:

apm-*: _search, _stats
.ml-anomalies-*: _search

The latter is to check for the presence of APM ML jobs. That could also be done via a Kibana API IIRC but I guess it would face the same restrictions.
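For reference, a minimal sketch of what those calls could look like from a Kibana background task, assuming the @elastic/elasticsearch client; the aggregation and connection details are illustrative, not the actual telemetry implementation:

```ts
import { Client } from '@elastic/elasticsearch';

// Assumed connection details; the real task would use Kibana's internal client.
const client = new Client({ node: 'http://localhost:9200' });

async function collectApmTelemetrySample() {
  // apm-*: _search (an illustrative aggregation over APM documents)
  const { body: search } = await client.search({
    index: 'apm-*',
    body: {
      size: 0,
      aggs: { services: { cardinality: { field: 'service.name' } } },
    },
  });

  // apm-*: _stats (index-level statistics such as document counts and size)
  const { body: stats } = await client.indices.stats({ index: 'apm-*' });

  // .ml-anomalies-*: _search (check for the presence of APM ML results)
  const { body: mlResults } = await client.search({
    index: '.ml-anomalies-*',
    body: { size: 0, query: { match_all: {} } },
  });

  return { search, stats, mlResults };
}
```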

@legrego
Member

legrego commented Mar 4, 2020

For posterity, I'm doing the following calls:
apm-*: _search, _stats
.ml-anomalies-*: _search

Does this PR need to add privileges to .ml-anomalies-* as well then?

@ogupte ogupte force-pushed the apm-50757-kibana-background-task-telemetry branch from a959378 to c8f2608 on March 10, 2020 05:07
@ogupte
Contributor Author

ogupte commented Mar 10, 2020

@elasticmachine test this please

@ogupte
Contributor Author

ogupte commented Mar 10, 2020

For posterity, I'm doing the following calls:
apm-*: _search, _stats
.ml-anomalies-*: _search

Does this PR need to add privileges to .ml-anomalies-* as well then?

You're right @legrego, I amended this change by adding those privileges as well.

Member

@ywangd ywangd left a comment

I left a question about the index names. Otherwise, LGTM

RoleDescriptor.IndicesPrivileges.builder()
.indices(".apm-custom-link").privileges("all").build(),
// APM telemetry queries APM & ML anomalies indices in kibana task runner
RoleDescriptor.IndicesPrivileges.builder()
.indices("apm-*")
Member

Most of the indices referenced in this file are prefixed with a dot, so I just want to make sure this one is intentionally without it.

Contributor Author

Yes, it's intentionally referring to the APM data indices.

Contributor

@ogupte In a previous discussion, of which you were part (see #50051 (comment)), Brandon laid out the design of security privileges.

According to it, system roles such as kibana_system, which are used by Kibana system tasks/jobs for internal housekeeping, must not be granted privileges against end-user data. The administrator must at all times be in control of which users and services have access to the end user's data.

If a system task needs to deal with user data, it must get consent from the user and afterwards obtain API keys on the user's behalf (see #52886). The end user should be able to revoke the system user's access to their data at any time.

IIUC this PR grants the kibana_system role privileges on user data, going against the restrictions described above.

Contributor

@albertzaharovits albertzaharovits left a comment

I believe this PR grants system users access to user data, which goes against best practices.

RoleDescriptor.IndicesPrivileges.builder()
.indices(".apm-custom-link").privileges("all").build(),
// APM telemetry queries APM & ML anomalies indices in kibana task runner
RoleDescriptor.IndicesPrivileges.builder()
.indices("apm-*")
.privileges("read", "read_cross_cluster", "view_index_metadata").build(),
RoleDescriptor.IndicesPrivileges.builder()
.indices(".ml-anomalies-*")
.privileges("read", "read_cross_cluster", "view_index_metadata").build(),
Contributor

I also don't think the kibana_system user should access ML data directly, but I need more context.

Instead I propose, similarly to the above, that the APM Kibana system user calls the ML APIs using the credentials of the user (via an API key). That's because ML users do get access to the .ml-anomalies* indices, which indicates that those indices contain user data.

Maybe @droberts195 has a better suggestion of how APM should access ML data?

Member

@albertzaharovits As far as I understand @kobelb was OK with this exception, see e.g. elastic/kibana#50757 (comment). I can't find anything more formal or explicit than this, so maybe Brandon can confirm here just to be sure.

Contributor

After quite a few discussions regarding this topic, I came to the conclusion that allowing the kibana_system role to read from the apm-* indices only for collecting telemetry was tolerable from the security perspective. The telemetry data is aggregate in nature, and unauthorized end-users will be unable to see these aggregates. Since apm-* is a "data index" which users have complete control over, there's the risk that these indices contain documents that aren't created by the APM server for use within the APM application in Kibana, which could potentially lead to other bugs. I can only speculate on the likelihood and impact of these other bugs.

I don't believe I was part of a prior discussion regarding .ml-anomalies*. From the security perspective, as long as we're only using this for telemetry it also seems tolerable for the aforementioned reasons. I'll defer to @droberts195 on the risk of other bugs which might occur from Kibana reading from these indices directly.

Contributor

From elastic/kibana#50757 (comment) it looks like the requirement is to find out:

  • whether the cluster has APM-specific ML indices

But actually given the way the check is being done - see https://github.com/elastic/kibana/pull/51612/files#diff-876010cd4a64003f0f8c1ded433c1c72R371 - it is finding out whether indices whose names match .ml-anomalies-*-high_mean_response_time contain at least one document.

The output in the telemetry is this:

    integrations: {
      ml: {
        has_anomalies_indices: boolean
      }
    }

Is the real requirement to find out if the ML jobs created by APM have done any work? It seems wrong to do this via an internal implementation detail that could change (and which, with the current implementation, could return an incorrect answer if somebody has created some other job, not related to APM, whose job ID ends with -high_mean_response_time).

Since the APM telemetry code seems to know the job IDs of the ML jobs APM might have created, it would be better to just use the get job stats API. That would mean giving the kibana_system role the manage_ml cluster privilege, but that's what we plan to do anyway for the "ML in Spaces" project, so it wouldn't hurt if you did it now. You could look for data_counts.processed_record_count > 0 in the job stats for the APM jobs to see if they've been used.

Even if you really do want to report on ML indices (the internal implementation detail) rather than ML jobs (the public interface), I am not convinced it's appropriate to grant the read_cross_cluster privilege. Would APM ever really need to access ML indices in a different cluster?
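A minimal sketch of the job-stats approach suggested above, assuming the @elastic/elasticsearch client; the job ID pattern is illustrative only (the actual APM job IDs are defined on the Kibana side):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Returns true if any matching ML job has processed at least one record,
// i.e. the APM jobs have actually done some work.
async function apmMlJobsHaveProcessedData(): Promise<boolean> {
  // GET _ml/anomaly_detectors/<pattern>/_stats; the pattern is a placeholder.
  const { body } = await client.ml.getJobStats({
    job_id: '*-high_mean_response_time',
  });
  return body.jobs.some(
    (job: { data_counts: { processed_record_count: number } }) =>
      job.data_counts.processed_record_count > 0
  );
}
```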

Contributor

I also added elastic/kibana#50757 (comment) on the Kibana issue.

Member

@albertzaharovits I think we want to avoid requiring our users to initiate background tasks like this (IIUC this is necessary even with the linked PR) because we want a solution that works out of the box. We previously discussed API keys but decided against it for this reason.

I do agree it should be a temporary solution for the reasons you've mentioned, but I'm not sure which solution that would be. For now we would accept data loss if the user has reconfigured their indices.

Contributor

@albertzaharovits I think we want to avoid requiring our users to initiate background tasks like this (IIUC this is necessary even with the linked PR) because we want a solution that works out of the box. We previously discussed API keys but decided against it for this reason.

@dgieselaar This is not entirely accurate. For example, using secondary authentication (see #52093 - I now realize I made a mistake in the above "#52886 and #52886") the query against the apm-* indices can be made by the end user currently interacting with the APM UI. Also, if changing the apm-* index pattern is at all possible, it should also be feasible to obtain an ES API key for the configured index pattern for the user doing the configuration (although I don't have any idea about the multi-tenancy aspect of the data in the apm indices).

Another point is that there is an apm_user system user already, and it's not clear to me why the same privileges have to be assigned to the kibana system user as well.

Ultimately it seems I lack the context, and the PR description does not provide it, so I feel I'm not the person to review the changes. Although I'm open to learning about it and contributing options, that's not necessary, and we can rely on the reviews from @kobelb and @legrego since Kibana is doing most of the authorization around APM anyway.

Contributor

@kobelb kobelb Mar 18, 2020

I think further thought needs to be given to how we enable solutions like APM to report their telemetry data. Per elastic/kibana#50757 (comment), @TinaHeiligers and the Pulse team will be taking this into consideration when fleshing out the improvements which they're going to be making to our existing telemetry infrastructure.

For the time being, using Kibana to access the hard-coded apm-* indices is tolerable, but far from ideal.

Contributor

indeed we only want to know whether any APM jobs have been created

If you don't care if they've been used or not, just literally if they've been created, then the most appropriate API would be get jobs. You can choose one of these two ways to use it:

  1. Call it with your wildcard ID pattern and check whether you get an empty array back. Empty array => no APM jobs, non-empty array => APM jobs exist.
  2. Call it with your wildcard and ?allow_no_match=false. Then 404 status => no APM jobs and 200 status => APM jobs exist.
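For illustration, a sketch of option 1 above using the @elastic/elasticsearch client; the job ID pattern is a placeholder, not the actual APM wildcard:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Option 1: call get jobs with the wildcard pattern and treat a non-empty
// result as "APM jobs exist".
async function apmMlJobsExist(): Promise<boolean> {
  const { body } = await client.ml.getJobs({ job_id: '*-high_mean_response_time' });
  return body.count > 0;
}
```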

Member

I've addressed your suggestions in the PR @droberts195, thanks!

@dgieselaar
Member

@ogupte @legrego @ywangd @droberts195 I'm now using GET /_ml/anomaly_detectors/<job_id> instead of querying the ML indices directly. To test the changes, I created a new role on our development environment, which I've named kibana_system_extended_permissions, and a user called dummy_kibana_system that has both the kibana_system role and the kibana_system_extended_permissions role assigned.

Without any index privileges, that user is not able to read anything from the apm-* indices. It is, however, able to get index statistics (indices.stats) for the apm-* indices, and /_ml/anomaly_detectors/<job_id> returns the correct count as well.

If I add a read privilege for apm-* to the kibana_system_extended_permissions role, reading from the index works. So it looks like that (and probably read_cross_cluster) is the only privilege we need. Does that make sense, or did I mess up somewhere in my testing approach?
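A sketch of the test setup described above, assuming the Elasticsearch security APIs via the @elastic/elasticsearch client; the connection details and password are placeholders:

```ts
import { Client } from '@elastic/elasticsearch';

// Placeholder admin credentials for the development cluster.
const admin = new Client({
  node: 'http://localhost:9200',
  auth: { username: 'elastic', password: 'changeme' },
});

async function createTestIdentity() {
  // Extra role granting only read access on the APM data indices.
  await admin.security.putRole({
    name: 'kibana_system_extended_permissions',
    body: {
      indices: [{ names: ['apm-*'], privileges: ['read', 'read_cross_cluster'] }],
    },
  });

  // Test user combining the reserved kibana_system role with the extra role.
  await admin.security.putUser({
    username: 'dummy_kibana_system',
    body: {
      password: 'placeholder-password',
      roles: ['kibana_system', 'kibana_system_extended_permissions'],
    },
  });
}
```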

Allows the kibana user to collect APM telemetry in a background task.
@ogupte ogupte force-pushed the apm-50757-kibana-background-task-telemetry branch from c8f2608 to efa0963 on March 19, 2020 06:53
@droberts195
Contributor

and /_ml/anomaly_detectors/<job_id> returns the correct count as well.

Oh, of course, the kibana_system role already has the monitor privilege which allows this. So it makes sense that you already have permission to call this endpoint and don't need to add manage_ml as I erroneously recommended before. Thanks for testing this out.

@ogupte ogupte merged commit 41ba993 into elastic:master Mar 24, 2020
ogupte added a commit that referenced this pull request Mar 25, 2020
… (#54106)

* Required for elastic/kibana#50757.
Allows the kibana user to collect APM telemetry in a background task.

* removed unnecessary privileges on `.ml-anomalies-*` for the `kibana_system` reserved role
yyff pushed a commit to yyff/elasticsearch that referenced this pull request Apr 17, 2020
…ic#52917)

* Required for elastic/kibana#50757.
Allows the kibana user to collect APM telemetry in a background task.

* removed unnecessary privileges on `.ml-anomalies-*` for the `kibana_system` reserved role