Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Monitoring] Add cluster metadata to cluster_stats docs (#33860) #34023

Conversation

ycombinator
Copy link
Contributor

@ycombinator ycombinator commented Sep 25, 2018

Backport of #33860 and #34040.

This PR teaches Monitoring to collect cluster metadata, if any is set, and index it into cluster_stats docs in .monitoring-es-*.

After this PR, cluster_stats docs in .monitoring-es-* will contain an additional top-level cluster_settings field like so:

{
   ...
   "cluster_settings": {
     "cluster": {
       "metadata": {
         ...
       }
     }
   }
}

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@tlrx
Copy link
Member

tlrx commented Sep 26, 2018

Do we really want to index all user metadata cluster settings? I mean, it may contain sensitive data?

@ycombinator
Copy link
Contributor Author

ycombinator commented Sep 28, 2018

Do we really want to index all user metadata cluster settings? I mean, it may contain sensitive data?

If a user sets sensitive data in metadata cluster settings, wouldn't that also be visible via the GET _cluster/settings API? If so, then I don't know if it is less secure for monitoring to collect this data.

[EDIT] Also, it looks like when any cluster setting is set, we emit an INFO level log line about it in the ES log, for example:

[2018-09-28T07:54:26,984][INFO ][o.e.c.s.ClusterSettings  ] [Shaunaks-MBP-2] updating [cluster.metadata.my_bank_password] from [] to [supersecret123abc]

@pickypg
Copy link
Member

pickypg commented Sep 28, 2018

Perhaps we should expose a flag that allows this to be disabled. Local access could be blocked for unauthorized users, but then anyone with access to the monitoring data still would. We can warn that it means the display name cannot be changed but it’s a trade off they can be aware of.

@gwbrown
Copy link
Contributor

gwbrown commented Sep 28, 2018

Maybe we should add a note in the docs that the cluster metadata is not a place to store sensitive information? It really wasn't intended for anything that needs to be secure.

@ycombinator
Copy link
Contributor Author

Personally, I'm in favor of the documentation approach. If the intent of cluster metadata was never to allow sensitive information to be stored, then we should make that clear in our docs anyway, IMO. Especially since, as I mentioned in my edited comment above, this problem isn't specific to monitoring; a user could also get the same information by calling the Cluster Settings API.

@pickypg Could you clarify what you meant by "expose a flag that allows this to be disabled"? Specifically, could you expand on what you meant by "this" in that line?

The reason I ask is that I'm also thinking ahead to when Metricbeat will need to poll the Cluster Settings API to retrieve cluster metadata. So would the flag you are suggesting affect this API or just the cluster_stats collector within ES monitoring?

@tlrx
Copy link
Member

tlrx commented Oct 1, 2018

+1 for adding a warning in documentation to not store any sensitive data in the cluster metadata.

The reason I ask is that I'm also thinking ahead to when Metricbeat will need to poll the Cluster Settings API to retrieve cluster metadata.

For my education, what kind of cluster metadata is retrieved by Metricbeat? Would it make sense to only collect and index metadata related to metric beat? Like display_name and filters out everything else?

@ycombinator
Copy link
Contributor Author

ycombinator commented Oct 1, 2018

what kind of cluster metadata is retrieved by Metricbeat? Would it make sense to only collect and index metadata related to metric beat? Like display_name and filters out everything else?

Yeah, we could do this, both in the internal ES collector code (in this PR + against master) and in the corresponding metricbeat code as well (elastic/beats#8445). I'm okay with this for now; we can always relax the restriction to include other specific cluster metadata keys or even all cluster metadata keys in the future. @pickypg WDYT?

@ycombinator
Copy link
Contributor Author

ycombinator commented Oct 10, 2018

Hey @tlrx, @pickypg actually originally suggested only collecting this specific metadata key (display_name):

The easiest approach would be to explicitly grab that key and include it in the cluster_stats, or possibly include all of the cluster.metadata key/value pairs and simply have the UI pluck the one that is interesting.

To which, @jasontedor responded that he would prefer to collect all metadata keys, so ES had no special knowledge of that specific key:

I think we should pull all the metadata keys, rather than a specific one that doesn't have meaning to Elasticsearch.

@jasontedor Please see the concerns in this PR that @tlrx has raised about the possibility of collecting sensitive information stored by users in metadata keys. In light of these concerns, are you okay with just collecting the display_name metadata key, at least for now?

@rjernst rjernst removed the review label Oct 10, 2018
@pickypg
Copy link
Member

pickypg commented Oct 23, 2018

@pickypg WDYT?

Both work for me. There's two sides:

On one hand, we risk pulling out some secrets that someone unfortunately put in a place that shouldn't have secrets. We should document not to do that in the first place. On the other hand, we can avoid that by explicitly grabbing the display_name, which forces some knowledge onto ES of what using it.

Of those two, I think the second one is less wrong because the ES collector's already know how the other side is using them because otherwise it wouldn't work whenever we add semi-arbitrary fields that ES/Metricbeat has the ability to more easily fetch than the Monitoring UI, which either has to do something expensive in some cases, or literally has no chance to reach the monitored cluster.

@ycombinator
Copy link
Contributor Author

In the interest of moving forward on this PR, unless I hear differently in the next day or so, I'm going to go with @pickypg's suggestion:

Of those two, I think the second one is less wrong because the ES collector's already know how the other side is using them because otherwise it wouldn't work whenever we add semi-arbitrary fields that ES/Metricbeat has the ability to more easily fetch than the Monitoring UI, which either has to do something expensive in some cases, or literally has no chance to reach the monitored cluster.

Meaning, I will update this PR (and the corresponding code in master via a separate PR later) so that the ClusterStatsCollector only indexes the display_name property, for now.

* WIP

* Adding cluster metadata to cluster stats monitoring doc

* Fixing checkstyle errors

* Adding missing license header

* Updating tests

* Getting cluster settings from cluster state

* Removing more unnecessary changes

* Adding cluster metadata settings to cluster_stats docs

* Updating test to include cluster metadata

* Fixing checkstyle

* Guarding against NPE

* Updating test fixture
@ycombinator ycombinator force-pushed the x-pack/monitoring/collect-cluster-metadata/6.x branch from a2095a9 to e4689df Compare November 1, 2018 23:28
Copy link
Member

@pickypg pickypg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@ycombinator
Copy link
Contributor Author

jenkins, test this

@ycombinator ycombinator merged commit ee5aacb into elastic:6.x Nov 5, 2018
@ycombinator ycombinator deleted the x-pack/monitoring/collect-cluster-metadata/6.x branch November 5, 2018 18:36
ycombinator added a commit that referenced this pull request Nov 6, 2018
This is a forward port of some of the changes made in #8445, specifically the change mentioned in #34023 (comment).

Currently, in master, the `cluster_stats` collector collects _all_ cluster metadata and indexes it into `.monitoring-es-*`. However, per the discussion linked to above, we decided to collect _only_ the `display_name` cluster metadata setting for now. This PR makes this change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants