
Kubernetes SD: Optionally attach namespace and node metadata to targets #9510

Open
rfratto opened this issue Oct 14, 2021 · 12 comments

@rfratto
Member

rfratto commented Oct 14, 2021

Proposal

Kubernetes SD should be extended so that the service, pod, endpoints, and ingress roles can optionally attach extra meta labels based on the namespace and node objects that targets are discovered in (where appropriate; the ingress role wouldn't support node meta labels, for example).

This could be used within Prometheus, but also within Grafana Agent (for traces) and Promtail (for logs), which could apply the new meta information in similar ways.

Use case. Why is this important?
There are a few scenarios where this would be useful. One has been documented in grafana/agent#980.

Another use case is filtering the set of scraped targets down to those that exist within specific namespaces. In this scenario, managed namespaces are dynamically created on behalf of users, who have full control over the resources deployed in the environment. The namespace, which users do not have permission to modify, would have labels or annotations that determine whether scraping is enabled. This makes checking labels/annotations on the pods inappropriate, as users have full control over their contents.
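For illustration, assuming the proposed namespace meta labels existed (e.g. __meta_kubernetes_namespace_label_<labelname>), the filtering would be an ordinary relabel rule; the scraping_enabled label name here is hypothetical:

scrape_configs:
  - job_name: managed-namespaces
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only targets whose namespace (not pod) carries the opt-in label,
      # since users cannot modify namespace metadata.
      - source_labels: [__meta_kubernetes_namespace_label_scraping_enabled]
        regex: "true"
        action: keep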

@rfratto
Member Author

rfratto commented Oct 14, 2021

The obvious workaround for my second use case is to design some kind of admission controller that forces a specific set of labels/annotations onto pods based on their namespace. I think it's a reasonable workaround, but I believe that extra meta labels based on an object's location can be used in enough interesting ways to make this feature justifiable.

@gouthamve
Member

An example of the labels exposed: __meta_kubernetes_namespace_<labelname>, __meta_kubernetes_pod_node_<labelname>, etc.

@rfratto
Member Author

rfratto commented Oct 14, 2021

An example of the labels exposed: __meta_kubernetes_namespace_<labelname>, __meta_kubernetes_pod_node_<labelname>, etc.

Thanks, good idea to include examples. There are probably a lot of ways this could be configured, but I imagine we would add something similar to the following to kubernetes_sd_config:

attach_metadata:
  # Attaches node metadata to discovered targets. Only valid for role: pod, endpoints. 
  # When set to true, Prometheus must have permissions to get Nodes. 
  [ node: <boolean> | default = false ]
  # Attaches namespace metadata to discovered targets. Invalid for role: node.
  # When set to true, Prometheus must have permissions to get Namespaces.
  [ namespace: <boolean> | default = false ]
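
As a sketch of how this might sit in a scrape config, assuming the shape above were adopted (these flags are part of the proposal, not current Prometheus):

kubernetes_sd_configs:
  - role: pod
    # Hypothetical flags from the proposal above.
    attach_metadata:
      node: true
      namespace: true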

@brancz
Member

brancz commented Oct 14, 2021

I think this makes sense. The only thing that gives me pause, and that we should think through, is: could we expose information that users could exploit? It has happened in the past, in kube-state-metrics for example, that we accidentally exposed all annotations of Secrets by default, which leaked secret content via the kubectl last-applied annotation. I just want to make sure we think this through; otherwise, I can think of lots of useful use cases for this.

@rfratto
Member Author

rfratto commented Oct 14, 2021

Security considerations are a great point. Off the top of my head, I don't know if there's any sensitive information that lives inside the metadata for nodes and namespaces, but I agree with being cautious before accepting the proposal.

Generally speaking, what is Prometheus' approach to security concerns when adding new features? Does the potential to exploit something (even if that something is disabled by default) generally lead to not adding the new functionality?

@roidelapluie
Member

We already have node labels because we have a node role

Available meta labels:

__meta_kubernetes_node_name: The name of the node object.
__meta_kubernetes_node_label_<labelname>: Each label from the node object.
__meta_kubernetes_node_labelpresent_<labelname>: true for each label from the node object.
__meta_kubernetes_node_annotation_<annotationname>: Each annotation from the node object.
__meta_kubernetes_node_annotationpresent_<annotationname>: true for each annotation from the node object.
__meta_kubernetes_node_address_<address_type>: The first address for each node address type, if it exists.
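
These can already be promoted to target labels under role: node; a minimal sketch (the zone rule assumes nodes carry the standard topology.kubernetes.io/zone label, which Prometheus sanitizes to underscores in meta label names):

scrape_configs:
  - job_name: kubernetes-nodes
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      # Promote the node name and the (assumed) zone label onto every target.
      - source_labels: [__meta_kubernetes_node_name]
        target_label: node
      - source_labels: [__meta_kubernetes_node_label_topology_kubernetes_io_zone]
        target_label: zone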

@roidelapluie
Member

Generally speaking, what is Prometheus' approach to security concerns when adding new features? Does the potential to exploit something (even if that something is disabled by default) generally lead to not adding the new functionality?

I have made a PuppetDB service discovery and I have added:

# Whether to include the parameters as meta labels.
# Due to the differences between parameter types and Prometheus labels,
# some parameters might not be rendered. The format of the parameters might
# also change in future releases.
#
# Note: Enabling this exposes parameters in the Prometheus UI and API. Make sure
# that you don't have secrets exposed as parameters if you enable this.
[ include_parameters: <boolean> | default = false ]
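
For context, a minimal puppetdb_sd_configs usage might look like the following; the endpoint and PQL query are hypothetical:

scrape_configs:
  - job_name: puppetdb
    puppetdb_sd_configs:
      - url: https://puppetdb.example.com  # hypothetical PuppetDB endpoint
        query: 'resources { type = "Package" and title = "httpd" }'
        # Deliberately opt-in: parameters may contain secrets.
        include_parameters: true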

@thejosephstevens

Security considerations are a great point. Off the top of my head, I don't know if there's any sensitive information that lives inside the metadata for nodes and namespaces, but I agree with being cautious before accepting the proposal.

Just looking at my nodes (I checked one in each of GKE, EKS, and AKS), the labels and annotations are all non-sensitive. The closest things I could construe as sensitive are network metadata (CIDR ranges) and internal names (resource names can include things like company and customer references), but those all live within the standard realm of metadata you would associate with environment telemetry for use in queries. I'm interested in what happened with kube-state-metrics here; it seems like labels and annotations are strongly considered by Kubernetes to be non-sensitive identifying information.

For a bit of context on my use case (from the grafana-agent issue), we have clusters split up into pools, and we're interested in running queries on resource utilization metrics grouped by pool (currently not possible for any metrics gathered by something like kubernetes_sd pod, svc, or endpoint role). I think the proposal here makes a lot of sense as a generic approach so users can quickly enrich their targets and handle the relabeling to get what they actually want as normal. It would definitely fulfill the needs of my use case.
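
As an illustration of that enrichment, assuming node metadata were attached to pod targets and nodes carried a hypothetical pool label, the relabeling would be a one-liner:

relabel_configs:
  # Surface the (hypothetical) pool node label onto each pod target so
  # resource metrics can be grouped by pool.
  - source_labels: [__meta_kubernetes_node_label_pool]
    target_label: pool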

@rfratto
Member Author

rfratto commented Oct 15, 2021

We already have node labels because we have a node role

Available meta labels:

__meta_kubernetes_node_name: The name of the node object.
__meta_kubernetes_node_label_<labelname>: Each label from the node object.
__meta_kubernetes_node_labelpresent_<labelname>: true for each label from the node object.
__meta_kubernetes_node_annotation_<annotationname>: Each annotation from the node object.
__meta_kubernetes_node_annotationpresent_<annotationname>: true for each annotation from the node object.
__meta_kubernetes_node_address_<address_type>: The first address for each node address type, if it exists.

I imagine we'd want something similar for namespaces, though there's a small wart here:

__meta_kubernetes_namespace_name: The name of the Namespace object.
__meta_kubernetes_namespace_label_<labelname>: Each label from the namespace object.
__meta_kubernetes_namespace_labelpresent_<labelname>: true for each label from the namespace object.
__meta_kubernetes_namespace_annotation_<annotationname>: Each annotation from the namespace object.
__meta_kubernetes_namespace_annotationpresent_<annotationname>: true for each annotation from the namespace object.

__meta_kubernetes_namespace_name might feel a little awkward since __meta_kubernetes_namespace already exists for many targets.

@krya-kryak
Contributor

krya-kryak commented Oct 18, 2021

We already have node labels because we have a node role

Available meta labels:

__meta_kubernetes_node_name: The name of the node object.
__meta_kubernetes_node_label_<labelname>: Each label from the node object.
__meta_kubernetes_node_labelpresent_<labelname>: true for each label from the node object.
__meta_kubernetes_node_annotation_<annotationname>: Each annotation from the node object.
__meta_kubernetes_node_annotationpresent_<annotationname>: true for each annotation from the node object.
__meta_kubernetes_node_address_<address_type>: The first address for each node address type, if it exists.

I honestly think those are nice for a cluster admin, but they don't help the developer of a k8s-hosted service, whose pods and services are usually discovered with endpoint discovery. Another use case we have:
Say you have a cross-AZ k8s cluster with some pods of SUPER_APP running across some of the nodes. When SUPER_APP suddenly gets a surge of erroneous responses, the person responsible for it should have a simple way to find out whether it's a global issue or local to some AZ / network pod / particular rack / particular node / etc. Since the app in a k8s pod doesn't have a sane way of knowing where it runs (and why should it?), it's unable to report any of those things. That info is contained in node labels, and there is no way to propagate them to the pod: kubernetes/kubernetes#40610
Therefore, I think Prometheus should be able to add those labels to the discovered endpoints, enabling the user to simply query for rate(errors_total{zone="us-east-1", rack="123"}[1m]).
That is not limited to topology, of course; lots of useful things find their way into node labels: hardware configurations, some sysctl settings, soft maintenance flags, etc.
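
A minimal sketch of what that could enable, assuming node metadata could be attached to endpoints targets (hypothetical config) and nodes carried zone and rack labels:

scrape_configs:
  - job_name: super-app
    kubernetes_sd_configs:
      - role: endpoints
        # Hypothetical: attach node metadata to endpoints targets.
        attach_metadata:
          node: true
    relabel_configs:
      # Map the (assumed) node labels onto the scraped series' target labels.
      - source_labels: [__meta_kubernetes_node_label_topology_kubernetes_io_zone]
        target_label: zone
      - source_labels: [__meta_kubernetes_node_label_rack]
        target_label: rack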

@Whyeasy

Whyeasy commented Apr 13, 2022

We already have node labels because we have a node role
Available meta labels:
__meta_kubernetes_node_name: The name of the node object.
__meta_kubernetes_node_label_<labelname>: Each label from the node object.
__meta_kubernetes_node_labelpresent_<labelname>: true for each label from the node object.
__meta_kubernetes_node_annotation_<annotationname>: Each annotation from the node object.
__meta_kubernetes_node_annotationpresent_<annotationname>: true for each annotation from the node object.
__meta_kubernetes_node_address_<address_type>: The first address for each node address type, if it exists.

I imagine we'd want something similar for namespaces, though there's a small wart here:

__meta_kubernetes_namespace_name: The name of the Namespace object.
__meta_kubernetes_namespace_label_<labelname>: Each label from the namespace object.
__meta_kubernetes_namespace_labelpresent_<labelname>: true for each label from the namespace object.
__meta_kubernetes_namespace_annotation_<annotationname>: Each annotation from the namespace object.
__meta_kubernetes_namespace_annotationpresent_<annotationname>: true for each annotation from the namespace object.

__meta_kubernetes_namespace_name might feel a little awkward since __meta_kubernetes_namespace already exists for many targets.

Having this capability for namespaces would be a great help for me.

valyala added a commit to VictoriaMetrics/VictoriaMetrics that referenced this issue Apr 22, 2022
…s and annotations to discovered pod targets in the same way as Prometheus 2.35 does

See prometheus/prometheus#9510
and prometheus/prometheus#10080
@rongshengfang

rongshengfang commented May 3, 2022

It looks like PR #10080 only applies to targets discovered through the pod role. Is there any plan to expand this feature to the endpoints role? For example, we'd like to have node metadata attached to node-level metrics such as node_memory_MemAvailable_bytes.
