
[Telemetry] Move Monitoring collection strategy to a collector #82638

Merged: 5 commits merged on Nov 18, 2020

Conversation
Conversation

@afharo (Member) commented Nov 4, 2020:

Summary

As agreed in previous discussions, this PR is migrating the telemetry collection strategy registered by Monitoring to a StatsCollector.

This change lets the Telemetry and Monitoring plugins evolve without affecting each other as much as they currently do. It also improves the amount and quality of the data we collect, fixing cases where telemetry about production monitoring clusters was skipped entirely, or where fields reported only by local collectors were missing.

For full details about the discussions, refer to the RFC.
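For illustration only, here is a minimal sketch of what registering such a stats collector roughly looks like with the usageCollection plugin. The collector type, the getAllStats helper, and the exact fetch shape are assumptions for this sketch, not necessarily the names used in this PR.

import type { UsageCollectionSetup } from 'src/plugins/usage_collection/server';

// Hypothetical callCluster-style function bound to the monitoring data.
type CallCluster = (endpoint: string, params?: Record<string, unknown>) => Promise<unknown>;

// Placeholder for Monitoring's existing stats-gathering logic over the .monitoring-* indices.
declare function getAllStats(callCluster: CallCluster): Promise<unknown[]>;

export function registerMonitoringTelemetryCollection(
  usageCollection: UsageCollectionSetup,
  callCluster: CallCluster
) {
  // Register a stats collector; its output ends up under
  // stack_stats.kibana.plugins.monitoringTelemetry in the aggregated payload,
  // where the X-Pack collection strategy later picks it up (see the reduce further down).
  const collector = usageCollection.makeStatsCollector({
    type: 'monitoringTelemetry',
    isReady: () => true,
    fetch: async () => ({ stats: await getAllStats(callCluster) }),
  });
  usageCollection.registerCollector(collector);
}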

Planned follow-up PRs

As stated in my initial self-review, for the sake of easier review, I think that we need to implement a few follow-up PRs (already added to the teams' roadmap):


@afharo added the release_note:skip (Skip the PR/issue when compiling release notes) and v7.11.0 labels on Nov 4, 2020
@afharo force-pushed the telemetry/remove-monitoring-strategy branch 3 times, most recently from b2c9661 to 2eea853, on November 5, 2020 18:09
@afharo force-pushed the telemetry/remove-monitoring-strategy branch from 2eea853 to a448d23 on November 6, 2020 09:52
@afharo (Member, Author) left a comment:

Self-review explaining a few of my thoughts (and planned follow-up PRs) about some of these changes.

@@ -13,7 +13,6 @@
],
"optionalPlugins": [
"infra",
"telemetryCollectionManager",
@afharo (Member, Author):

The monitoring plugin no longer depends on the telemetryCollectionManager 🎉

Comment on lines -257 to -264
// TODO: For the telemetry plugin to work, we need to provide the new ES client.
// The new client should be initialized with a similar config to `this.cluster` but, since we're not using
// the new client in Monitoring Telemetry collection yet, setting the local client allows progress for now.
// The usage collector `fetch` method has been refactored to accept a `collectorFetchContext` object,
// exposing both es clients and the saved objects client.
// We will update the client in a follow up PR.
this.telemetryElasticsearchClient = elasticsearch.client;
this.telemetrySavedObjectsService = savedObjects;
@afharo (Member, Author):

These clients can be removed now that the monitoring plugin does not need to catch up with the changes in the telemetryCollectionManager 🎉

@@ -174,6 +157,11 @@ export class Plugin {
});

registerCollectors(plugins.usageCollection, config, cluster.callAsInternalUser);
registerMonitoringTelemetryCollection(
plugins.usageCollection,
cluster.callAsInternalUser,
@afharo (Member, Author):

Ideally, if we added the KibanaRequest to the fetch context (only when available), the collector itself could decide how to scope its requests as needed (this would also apply to the other collectors registered in registerCollectors) and avoid exposing data to users who shouldn't see it. We can revisit this in a follow-up PR.

@TinaHeiligers (Contributor):

@afharo we had decided to hold off on adding the KibanaRequest to the fetch context until there was a need for it. Is there a new need for that now and, if so, how urgently do we need to address that?

@afharo (Member, Author):

@TinaHeiligers, thank you for confirming that. I'm aware we're holding off on that until a need is identified; that's also why I preferred to deal with this issue in a separate PR, so we can discuss it properly.
The need I see in this situation (in both lines 159 and 162) is that those collectors always use the internal user, so if a user with fewer permissions requests the telemetry sample flyout, we'll return data about the monitored clusters, potentially exposing data that user shouldn't see.
In this monitoring use case, the collectors can't use the esClient/callCluster provided in the fetch context because they need the cluster client instead (pointing to the actual monitoring data). If we decide to provide the KibanaRequest (only when available), these collectors (the new one I added and the existing ones) can properly scope their requests and avoid any unwanted exposure of data.
Regarding the urgency, I guess it's up to us to decide. Personally, I'd vote for having it sorted before 7.11's feature freeze because of the potential leak of information. Once this PR is merged, I can build a quick POC to identify the implications, if you think that's OK 🙂
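To make the concern concrete, here is a rough sketch of the kind of scoping being discussed. The optional kibanaRequest in the fetch context did not exist yet at this point in the thread (that is what the follow-up PR referenced below addresses), so the context shape and the function name here are assumptions, not the PR's actual code.

import type { IClusterClient, KibanaRequest } from 'kibana/server';

// Assumed fetch-context shape: the usual clients plus an optional KibanaRequest.
interface AssumedFetchContext {
  kibanaRequest?: KibanaRequest;
}

export function makeMonitoringFetch(monitoringClusterClient: IClusterClient) {
  return async ({ kibanaRequest }: AssumedFetchContext) => {
    // When a request is available (e.g. the telemetry sample flyout), scope the
    // monitoring cluster client to the requesting user instead of the internal user,
    // so we never return monitoring data that the user isn't allowed to see.
    const esClient = kibanaRequest
      ? monitoringClusterClient.asScoped(kibanaRequest).asCurrentUser
      : monitoringClusterClient.asInternalUser;

    const { body } = await esClient.search({ index: '.monitoring-es-*', size: 0 });
    return { totalDocs: body.hits.total };
  };
}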

@afharo (Member, Author):

Adding a reference to the existing issue: #75875 🙂

@afharo (Member, Author) commented Nov 16, 2020:

I've created the PR #83413 to fix this. Once it's merged, I'll rebase this one and update accordingly.

@afharo (Member, Author):

That PR has been merged and I've updated this one to use it.

delete stats.stack_stats.kibana!.plugins.monitoringTelemetry;
}
return [...acc, stats, ...(monitoringTelemetry || [])];
}, [] as TelemetryAggregatedStats[]);
@afharo (Member, Author):

It may feel like we are mixing concepts here, since this telemetry collection strategy needs to know about other existing collectors in order to reshape the format that is ultimately sent to the telemetry service. But, as far as I know, that's one of the duties of a telemetry collection strategy (the OSS strategy also extracts some collectors to shape them into something else).

On an unrelated note, I'm planning a follow-up PR that will change the behaviour of the telemetryCollectionManager: it will use this X-Pack strategy as a replacement for the OSS strategy (instead of the current priority-based loop that tries the next lower-priority strategy when the previous one fails).
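For context, here is a slightly expanded, self-contained sketch of what the reduce shown a few lines above does. TelemetryAggregatedStats is simplified, the function name is illustrative, and the extraction of monitoringTelemetry (the .stats nesting) is not visible in the snippet, so that part is an assumption.

// Simplified shape for illustration; the real interface carries many more fields.
interface TelemetryAggregatedStats {
  collectionSource?: string;
  stack_stats: { kibana?: { plugins: Record<string, unknown> } };
  [key: string]: unknown;
}

export function spreadMonitoringTelemetry(
  allStats: TelemetryAggregatedStats[]
): TelemetryAggregatedStats[] {
  return allStats.reduce((acc, stats) => {
    // The monitoring stats collector reports an array of per-cluster payloads under
    // stack_stats.kibana.plugins.monitoringTelemetry.stats (assumed nesting). Pull them
    // out, drop the nested copy, and emit them as top-level documents alongside the
    // local payload.
    const monitoringTelemetry = (
      stats.stack_stats.kibana?.plugins.monitoringTelemetry as
        | { stats?: TelemetryAggregatedStats[] }
        | undefined
    )?.stats;
    if (monitoringTelemetry) {
      delete stats.stack_stats.kibana!.plugins.monitoringTelemetry;
    }
    return [...acc, stats, ...(monitoringTelemetry || [])];
  }, [] as TelemetryAggregatedStats[]);
}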

const { body } = await supertest
.post('/api/telemetry/v2/clusters/_stats')
.set('kbn-xsrf', 'xxx')
.send({ timestamp, unencrypted: true })
@afharo (Member, Author):

The timestamp parameter was only used by the monitoring strategy to retrieve the right time frame of telemetry data. Now it's unused and confusing, so I'll create a follow-up PR to remove it.
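Once that follow-up lands, the request in this test could presumably be reduced to something like the following sketch:

const { body } = await supertest
  .post('/api/telemetry/v2/clusters/_stats')
  .set('kbn-xsrf', 'xxx')
  .send({ unencrypted: true })
  .expect(200);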

@afharo added the v8.0.0 label on Nov 6, 2020
@afharo marked this pull request as ready for review on November 6, 2020 11:40
@afharo requested a review from a team as a code owner on November 6, 2020 11:40
@afharo requested a review from a team on November 6, 2020 11:40
@afharo linked an issue on Nov 6, 2020 that may be closed by this pull request
@afharo (Member, Author) commented Nov 9, 2020:

@elasticmachine merge upstream

@chrisronline (Contributor) left a comment:

I'm wondering if we can de-couple even more than this. What concerns me is that there is still a good chunk of telemetry code within the monitoring plugin.

What does the telemetry code need to function outside of the monitoring plugin? I'm guessing at a minimum, it needs access to the monitoring cluster, or at least an exposed callCluster that is connected to it.

Does anyone else feel that this level of separation will be ideal and attainable?

@TinaHeiligers (Contributor) commented Nov 10, 2020:

Does anyone else feel that this level of separation will be ideal and attainable?

@chrisronline ideally, Monitoring will just be another usage collector and provide the data needed for telemetry purposes. I'll leave it up to @afharo to discuss the details, but in short, there are a number of cases that we can't really get around easily right now.

@chrisronline (Contributor) commented:
I just worry that there is code inside of the monitoring plugin that my team doesn't really own. I feel like the telemetry team owns this code and it should live in a plugin that they also own. If we agree this is desirable for both teams, maybe we can make it work by exposing various helper functions from the monitoring plugin?

I know we previously agreed upon this solution, but after seeing it in action, I wonder if we can get to a better endgame.

@afharo (Member, Author) commented Nov 16, 2020:

I'm wondering if we can de-couple even more than this. What concerns me is that there is still a good chunk of telemetry code within the monitoring plugin.

@chrisronline I'm sure we can. For starters, bulk_uploader can stop requesting the bulkFetch for the UsageCollectors (I added that to the list of follow-up PRs).

The only reason I didn't implement it here was to keep the PRs as concise and easy to review as possible; we've seen in the past that overly long PRs make it easier for bugs to slip in. However, I'm happy to push all the changes to this PR if you think it'll be clearer to see the final state 🙂

I just worry that there is code inside of the monitoring plugin that my team doesn't really own. I feel like the telemetry team owns this code and it should live in a plugin that they also own.

I understand your concerns, and I agree we need to be cautious so we don't break things downstream. On the flip side, I don't think that code can be fully owned by the telemetry team: we don't fully understand the content of the .monitoring indices. And, once usage collectors' data is no longer reported to monitoring, we can't maintain what's collected about other products like logstash, beats, app_search, ...

That's part of the reason we now report collectionSource: 'monitoring'. The telemetry cluster will receive data from both collection sources and can prioritise the data and reported properties from one or the other. This new collector effectively reports "this is the telemetry I've got from the monitoring point of view".
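As a rough illustration of how the two sources would be told apart downstream (the field values and the 'local' label for the non-monitoring source are illustrative, not taken from this PR):

// Payload reported by Kibana's own local collection.
const localPayload = {
  collectionSource: 'local', // illustrative label for the non-monitoring source
  cluster_uuid: 'abc-123',
  stack_stats: { /* gathered by this Kibana's usage collectors */ },
};

// Payload reported by the new monitoring stats collector.
const monitoringPayload = {
  collectionSource: 'monitoring',
  cluster_uuid: 'def-456',
  stack_stats: { /* rebuilt from the .monitoring-* indices */ },
};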

I also agree, though, that it shouldn't be fully owned by the monitoring team. It feels to me more like a joint effort.

If we agree this is desirable for both teams, maybe we can make it work by exposing various helper functions from the monitoring plugin?

I'm not a big fan of this solution for this specific purpose: other plugins could exploit it to gain access to .monitoring data, so I'd be cautious about following that approach.

@chrisronline (Contributor) commented:
@afharo Thanks for the thoughts, much appreciated. Let's move forward with this PR as-is!

@kibanamachine (Contributor) commented:
💚 Build Succeeded

Metrics: ✅ unchanged

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@chrisronline (Contributor) left a comment:
LGTM!


@afharo merged commit 37636f3 into elastic:master on Nov 18, 2020
@afharo deleted the telemetry/remove-monitoring-strategy branch on November 18, 2020 18:16
afharo added a commit that referenced this pull request on Nov 18, 2020
…82638) (#83686)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
phillipb added a commit to phillipb/kibana that referenced this pull request Nov 19, 2020
… into add-logs-to-node-details

chrisronline pushed a commit to chrisronline/kibana that referenced this pull request Nov 19, 2020
…ic#82638)

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Labels: release_note:skip (Skip the PR/issue when compiling release notes), v7.11.0, v8.0.0

Successfully merging this pull request may close these issues:

Telemetry & Monitoring: Kibana Monitoring & BulkUploader

5 participants