
[Usage Collection] Extend the APIs provided to the fetch method #75875

Closed
afharo opened this issue Aug 25, 2020 · 15 comments · Fixed by #83413
Labels: Feature:Telemetry, Team:Core (Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc.)

Comments

afharo commented Aug 25, 2020

Describe the feature:

As originally mentioned in #74840, it might be interesting to provide the new ES client and a scoped Saved Objects client in addition to the current callCluster we provide to the fetch method.

@chrisronline recently suggested it would be very useful for them to receive the req: KibanaRequest when their collector is triggered, so they can properly scope other APIs like the Alerting one.
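For discussion, a rough sketch of what the extended fetch context could look like (the property names below are illustrative, not a committed API):

// Illustrative sketch only: property names are assumptions, not the final
// usage collection API. Types are Kibana core's (src/core/server).
import type {
  ElasticsearchClient,
  KibanaRequest,
  SavedObjectsClientContract,
} from 'src/core/server';

interface CollectorFetchContext {
  // what collectors receive today (legacy caller, signature simplified here)
  callCluster: (endpoint: string, params?: Record<string, unknown>) => Promise<unknown>;
  // proposed additions
  esClient: ElasticsearchClient; // new ES client, properly scoped
  soClient: SavedObjectsClientContract; // scoped Saved Objects client
  kibanaRequest?: KibanaRequest; // only present when the collection is request-driven
}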

elasticmachine (Contributor) commented:

Pinging @elastic/kibana-telemetry (Team:KibanaTelemetry)


Bamieh commented Aug 25, 2020

+1 for adding the ES client and SO client helpers to the collector.

We trigger the collectors from the server side, so in most cases we are not bound to a requesting user.

Generally I'd say it's not a good design to send user-scoped data back to telemetry. I'd love to hear more about the use case you have in mind for this, @chrisronline.

chrisronline (Contributor) commented:

We'd love to collect telemetry on what monitoring-specific alerts exist for users. We are interested in an overall count, but in order to get this data from the alerting APIs, we need a valid user-scoped request object. While I understand that this type of data is user-specific (and could be different from user to user), we still think it's valuable.

Right now, the alerting team collects some telemetry data that looks like:

"alerts" : {
  "count_total" : 7,
  "count_by_type" : {
    "monitoring_alert_cluster_health" : 1,
    "monitoring_alert_logstash_version_mismatch" : 1,
    "monitoring_alert_license_expiration" : 1,
    "monitoring_alert_kibana_version_mismatch" : 1,
    "monitoring_alert_nodes_changed" : 1,
    "monitoring_alert_elasticsearch_version_mismatch" : 1,
    "monitoring_alert_cpu_usage" : 1
  },
  "schedule_time" : {
    "avg" : "60s",
    "min" : "60s",
    "max" : "60s"
  },
  "connectors_per_alert" : {
    "min" : 0,
    "avg" : 1,
    "max" : 1
  },
  "count_active_total" : 7,
  "count_active_by_type" : {
    "monitoring_alert_cluster_health" : 1,
    "monitoring_alert_logstash_version_mismatch" : 1,
    "monitoring_alert_kibana_version_mismatch" : 1,
    "monitoring_alert_license_expiration" : 1,
    "monitoring_alert_nodes_changed" : 1,
    "monitoring_alert_elasticsearch_version_mismatch" : 1,
    "monitoring_alert_cpu_usage" : 1
  },
  "throttle_time" : {
    "min" : "86400s",
    "avg" : "86400s",
    "max" : "86400s"
  },
  "runs" : 8,
  "count_disabled_total" : 0
}

It does offer a total count, but as you can see, the monitoring alerts span multiple types, and we'd need to use a scripted field in the telemetry Kibana to properly sum those up. However, that scripted field would also need to know the names of all the monitoring alert types, a list that will keep growing over time.

We currently maintain the list of alerts explicitly, so for us it'd be nice to be able to call the alerting APIs from our usage collector, provide the array of alert types, and get back a count of which ones are enabled.
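As a rough sketch of that shape (everything below is hypothetical: the client interface is trimmed down, and how the collector would obtain an alerts client is exactly what this issue is about):

// Hypothetical sketch, not the Alerting plugin's real client surface.
const MONITORING_ALERT_TYPES = [
  'monitoring_alert_cluster_health',
  'monitoring_alert_cpu_usage',
  // ...the rest of the list Stack Monitoring already maintains
];

interface MinimalAlertsClient {
  find(params: { options: { filter?: string; perPage?: number } }): Promise<{ total: number }>;
}

async function fetchMonitoringAlertUsage(alertsClient: MinimalAlertsClient) {
  // Counting the Stack Monitoring alert types on the Kibana side means the
  // telemetry cluster no longer needs a scripted field that knows every type name.
  const filter = MONITORING_ALERT_TYPES.map(
    (type) => `alert.attributes.alertTypeId: ${type}`
  ).join(' OR ');
  const { total } = await alertsClient.find({ options: { filter, perPage: 1 } });
  return { monitoring_alerts_count: total };
}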

In the future, we'd also like to get configuration details from each one (like if the user changed the default threshold to something else).


afharo commented Aug 25, 2020

@Bamieh from the technical POV, even on the server side, the Alerting plugin requires the request in the API it exposes on its Start contract:

export interface PluginStartContract {
  listTypes: AlertTypeRegistry['list'];
  getAlertsClientWithRequest(request: KibanaRequest): PublicMethodsOf<AlertsClient>;
}

I wonder if we can create a fake KibanaRequest when the collection is server-driven (maybe the @elastic/kibana-platform team can drive us in the right direction here?)
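For reference, the kind of "fake request" workaround used elsewhere in Kibana for background tasks looks roughly like this (a sketch only; which fields are actually required depends on what the consuming APIs read off the request):

import type { KibanaRequest } from 'src/core/server';

// Sketch of a fake request for server-driven collection. Whether this is an
// acceptable approach here is the open question.
const fakeRequest = ({
  headers: {},
  getBasePath: () => '',
  path: '/',
  route: { settings: {} },
  url: { href: '/' },
  raw: { req: { url: '/' } },
} as unknown) as KibanaRequest;

// e.g. alertingStart.getAlertsClientWithRequest(fakeRequest)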

chrisronline (Contributor) commented:

FWIW, the alerting team themselves have a need for something like:

I wonder if we can create a fake KibanaRequest when the collection is server-driven

in #59813, so it might be worth collaborating on a common solution.


Bamieh commented Aug 25, 2020

Thanks for explaining. I agree; I think we'd need to collaborate with @elastic/kibana-alerting-services on a solution.


mikecote commented Sep 9, 2020

I added a note here: #59813 (comment) so we consider or revisit this issue at the same time.

TinaHeiligers self-assigned this Sep 30, 2020

TinaHeiligers (Contributor) commented:

Related issue: #74840


afharo commented Nov 9, 2020

In #82638 (comment) we've identified another possible use case that needs the KibanaRequest. In this case, ideally, it would be provided only when it's available (i.e. when there's an actual unencrypted API request triggering the collection).

This doesn't change the situation with the Alerting APIs (since they still require it for every call and a KibanaRequest does not always exist).

pgayvallet (Contributor) commented:

I wonder if we can create a fake KibanaRequest when the collection is server-driven (maybe the @elastic/kibana-platform team can drive us in the right direction here?)

That's part of #39430. It has been open for quite a long time now, but we are supposed to tackle this during 7.12.


legrego commented Nov 9, 2020

I'm not a fan of exposing the KibanaRequest to the usage collectors, but it's possible that I'm lacking some context here.

Do we really need to execute these requests with the credentials of the KibanaRequest, or do we merely need the request in order to properly create Kibana's various service clients?

If the former, then I need some help understanding the benefits. It seems like you'd end up with wildly inconsistent usage data if it's all scoped to the current user's privileges, and so I'm having a hard time seeing the benefit to capturing that at all. Is bad data better than no data?

If the latter (we want service clients created), then my preference is to wait for #39430, where we can solve this properly.

Inconsistent data/data quality issues aside, my other primary concern is that consumers will not consider the end user's credentials when building their usage collectors. It's very possible that the user "chosen" to transmit telemetry won't have access to much of anything themselves. As an example, you can create a user who only has read access to Advanced Settings, and nothing else. If you accidentally or intentionally used their KibanaRequest to calculate usage data, then you might very well end up with authorization errors, and/or an empty data set.


afharo commented Nov 10, 2020

Thank you @legrego for your comments. I feel like we are missing a bit of context in this issue to fully describe the current behaviour and the reason why we are considering exposing the KibanaRequest:

At the moment, the fetch method in each collector receives some common clients: the ES client (both the new one and the deprecated legacy one) and the SO client. All of them are properly scoped.
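For readers less familiar with usage collection, a minimal sketch of what that looks like today (the collector name, saved object type, and schema are made up for the example):

// Minimal sketch: fetch receives pre-scoped clients from the usage collection
// service; the collector never builds its own.
interface MyPluginUsage {
  saved_object_count: number;
}

usageCollection.makeUsageCollector<MyPluginUsage>({
  type: 'my_plugin',
  isReady: () => true,
  schema: {
    saved_object_count: { type: 'long' },
  },
  fetch: async ({ soClient }) => {
    // soClient is already scoped: to the requesting user for the unencrypted
    // sample, to the internal user for the payload that is actually sent.
    const { total } = await soClient.find({ type: 'my-plugin-type', perPage: 1 });
    return { saved_object_count: total };
  },
});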

Why do we scope the clients?

In our efforts to be transparent about the data that is sent to our remote telemetry service, we offer the users a way to retrieve the telemetry sample that we would send via the Advanced Settings page:
[screenshot: example telemetry payload shown in the Advanced Settings page]

There is a risk that users with limited access might see information they shouldn't (e.g., if a user doesn't have access to Stack Monitoring, they shouldn't be able to see any Monitoring-related stats). For that reason, when requesting the sample in this way, we scope the response to what that user can actually see.

However, when requesting the telemetry payload that will be sent to the remote telemetry service, the clients are scoped to the Kibana internal user. This payload is encrypted on the server side and decrypted by the remote telemetry service, so the data stays masked from users with limited access even when the browser-based sending mechanism is used.
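Concretely, the scoping decision described above boils down to something like this (a simplified sketch using core's ES client factory):

import type { CoreStart, KibanaRequest } from 'src/core/server';

// Sketch of the scoping rule described above: scope to the requesting user
// for the unencrypted sample, fall back to the internal user for the
// encrypted payload sent to the remote telemetry service.
function getEsClientForCollection(core: CoreStart, request?: KibanaRequest) {
  return request
    ? core.elasticsearch.client.asScoped(request).asCurrentUser
    : core.elasticsearch.client.asInternalUser;
}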

Why do we need to provide KibanaRequest?

You might be wondering: if fetch is already receiving the scoped clients, why should we provide KibanaRequest as well?

The answer to that: we've noticed a couple of situations where collectors need custom clients (e.g., using the alerting APIs or, as recently discovered, some Monitoring collectors using their own custom ES client with access to the .monitoring index).

my other primary concern is that consumers will not consider the end user's credentials when building their usage collectors.

That statement totally resonates with us, and we are considering the best way to discourage its use so that it's only used when "the dev knows what they're doing".

Either via documentation (README and TSDocs), or by extending the API to require an additional config parameter in order to include it in the context (while still documenting why devs should only use it with caution).
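To make the second option concrete, an explicit opt-in could look roughly like this (the extendFetchContext name and shape are illustrative, not a committed API):

// Illustrative only: a collector must explicitly declare that it wants the
// request, and even then it may be undefined (server-driven collection).
usageCollection.makeUsageCollector<{ collected_with_request: boolean }>({
  type: 'request_aware_example',
  isReady: () => true,
  schema: {
    collected_with_request: { type: 'boolean' },
  },
  extendFetchContext: { kibanaRequest: true }, // hypothetical opt-in flag
  fetch: async ({ kibanaRequest }) => ({
    collected_with_request: kibanaRequest !== undefined,
  }),
});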

I hope this additional context helps to understand the whys. The question still remains, though: should we take that risk? Or are we OK with exposing that data to users who might not be allowed to see it?


legrego commented Nov 10, 2020

@afharo I appreciate the additional context, thanks for taking the time to explain this to me.

There is a risk that users with limited access might see information they shouldn't (e.g., if a user doesn't have access to Stack Monitoring, they shouldn't be able to see any Monitoring-related stats). For that reason, when requesting the sample in this way, we scope the response to what that user can actually see.

That makes sense, and I'm happy to see that we're taking this into consideration. Based on my limited exposure to the usage collectors that plugins register, I assume that this is handled transparently by the usage collection service? In other words, the individual collectors don't need to know when they're retrieving data for the unencrypted sample payload vs the actual encrypted payload?

That statement totally resonates with us, and we are considering the best way to discourage its use so that it's only used when "the dev knows what they're doing".

Either via documentation (README and TSDocs), or by extending the API to require an additional config parameter in order to include it in the context (while still documenting why devs should only use it with caution).

❤️ Happy to hear this. I'm a fan of forcing consumers to explicitly request the KibanaRequest via an additional config parameter (and documenting when it's safe to use). Docs are immensely helpful if you seek them out, but they don't provide any safeguards.

The question still remains, though: should we take that risk? Or are we OK with exposing that data to users who might not be allowed to see it?

My understanding is that this wouldn't expose data to unauthorized users. If a usage collector creates a scoped client using the KibanaRequest, then it shouldn't be possible to return data that this user can't see. I think the larger fear is that an engineer will "test" this by looking at the unencrypted payload, and assume that it'll also work for the encrypted payload. To be fair, we run that risk today, but the risk is contained to a single location since you're providing the scoped clients to all consumers, instead of asking the clients to create their own.

If my understanding is correct (that we don't expose data to unauthorized users), then I don't have a strong objection from a security standpoint.

I'm certainly not one to dictate the design, and you may have already considered this, but this sounds like a problem that the ContextContainer was designed to solve. (cc @elastic/kibana-platform). This would allow plugins like Alerting to provide scoped clients to all usage collectors, so they wouldn't have to create their own.

chrisronline (Contributor) commented:

For Stack Monitoring, I think the root cause of our issue is that the data we want to report on isn't really user-specific, but it lives in a system that is currently only accessible in a user-scoped way. I know the alerting team is working on making alerts that exist outside the scope of a user, so once that is better supported, we might have a better way of getting this data without needing a request object.

afharo linked a pull request Nov 16, 2020 that will close this issue
lukeelmers added the Team:Core label Oct 1, 2021
elasticmachine (Contributor) commented:

Pinging @elastic/kibana-core (Team:Core)
