Support for multiple ports in annotations #3756

Closed
vaijira opened this issue Jan 29, 2018 · 45 comments

Comments

@vaijira

vaijira commented Jan 29, 2018

The use case is when you have, for example, several containers in a pod: currently, with the port annotation, it seems you can only specify one port. The only way I see to scrape several ports is to remove the port annotation, but then the problem is that ports I don't want scraped get scraped.

@brian-brazil
Contributor

This is really a feature request for Kubernetes to have per-port labels/annotations; we can only work off the metadata that Kubernetes provides to us.

@vaijira
Author

vaijira commented Jan 29, 2018

Thanks for the fast answer. Perhaps I'm talking nonsense, but wouldn't it be possible to have a plural annotation like prometheus.io/ports: "9101,9102"?

@brian-brazil
Contributor

Of course you can have such an annotation, Kubernetes permits it.
That's not going to be much use in Prometheus though if you're using it for what I think you're trying to use it for, which is attempting a workaround for there not being per-port annotations in Kubernetes.

@matthiasr
Contributor

@brian-brazil you're not being helpful here. Per-port annotations are one way to make this possible; a more expressive relabelling language would be another. You don't want the latter, so you're just pushing for the former, which is not going to happen either, because fundamentally ports are not annotatable objects and never will be.

@vaijira There is a way to do this already, but it comes with some caveats.

Because of the deliberately limited relabelling language, you cannot support an arbitrary number of ports, but you can support ports up to some limit of your choice (up to 3 ports, up to 10 ports, any number you choose). The way to do this is to have one job definition for each possible position of a port in the list, and then, for example for the 2nd port, match something like [0-9]+,([0-9]+)(?:,[0-9]+)*.
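
A minimal sketch of the "2nd port" job, assuming a hypothetical comma-separated prometheus.io/ports annotation (the first-port job is the same with a simpler regex; names are illustrative):

- job_name: kubernetes-pods-port-2
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Only keep pods whose annotation lists at least two ports.
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_ports]
      regex: '[0-9]+,[0-9]+(?:,[0-9]+)*'
      action: keep
    # Rewrite the address to <pod IP>:<second port in the list>.
    - source_labels: [__meta_kubernetes_pod_ip, __meta_kubernetes_pod_annotation_prometheus_io_ports]
      regex: '(.+);[0-9]+,([0-9]+)(?:,[0-9]+)*'
      replacement: $1:$2
      target_label: __address__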

The other consequence is higher load on the Kubernetes API, because Prometheus will open a separate connection and watch for each job description (there is an issue for avoiding that somewhere). For this reason, we default to supporting 1 port only, and raise this limit only for the few apps that actually need more than one.

@brian-brazil
Contributor

there is an issue for avoiding that somewhere

#2191

@tomwilkie
Member

One other alternative is to key SD off of port names - for instance, we only scrape pod ports whose name ends with -metrics. This allows you to scrape multiple ports per pod quite easily. Here's our Prometheus config:

https://github.com/kausalco/public/blob/master/prometheus-ksonnet/lib/prometheus-config.libsonnet#L72
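
A rough sketch of that pattern (not the exact config behind the link; assumes metrics ports are named with a -metrics suffix): pod discovery emits one target per declared container port, and only matching ports are kept, so a pod with two such ports yields two targets.

- job_name: kubernetes-pods
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    # Keep only container ports whose declared name ends in -metrics.
    - source_labels: [__meta_kubernetes_pod_container_port_name]
      regex: .*-metrics
      action: keep
    # Carry the container name along so series from different containers stay distinguishable.
    - source_labels: [__meta_kubernetes_pod_container_name]
      target_label: container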

@josdotso

@tomwilkie This approach is brilliant. Thanks!

Defined as a second pod job alongside what is at kubernetes/charts, with some tweaks, I was able to achieve multi-port scrapes without annotations. The port name suffix I'm using is xp, because Go text/template is sensitive to hyphens, which might become an issue in some of the stock charts.

@bwplotka
Member

bwplotka commented May 3, 2018

Yeah, we use port naming as an indicator of which port should be scraped for a pod, but it is limited:

  • a port name has a limit of 15 characters
  • you need to have the same scheme and __metrics_path__ for all containers within a pod

It's fine for us for now, but it can cause some trouble if you don't have any way to change the metrics path on which metrics are exposed for both containers within a pod.

I started some discussion here: https://groups.google.com/forum/#!topic/prometheus-users/ihMUWtX477Q Sorry I missed this issue; my topic is kind of a duplicate.

@bwplotka
Member

bwplotka commented May 3, 2018

There is another, more verbose explanation of the solution mentioned at the beginning (from the Google group's topic):

My post actually made me think about a more explicit but flexible solution - we can just do an annotation like in the example, but with a key like this:
prometheus.io/0/{port, scrape, scheme, path}, and add an additional scrape config for each potential sidecar. So if you have 3 containers that you want to scrape, you can put the appropriate annotations on the pod:
prometheus.io/0/{port, scrape, scheme, path}, prometheus.io/1/{port, scrape, scheme, path} and prometheus.io/2/{port, scrape, scheme, path}, and have 3 separate scrape jobs (pods k8s SD) that scrape the first, second and third container. This will not automatically work for 4-container pods, but most users never have more than 3 they want to monitor.

@bwplotka
Member

bwplotka commented Mar 5, 2019

How crazy would it be to extend relabelling to add some kind of conditional logic? For example:

    - source_labels: [a, b, c, d]
      regex:         "^(.*);(x.*);(something.*);.*"
      target_label:  new_label
      replacement:  "$1-$3"
      # New relabelling field for comparison. Rule is executed only if those N args string matches.
      if:  ["eq", "$1", "$2"]

That would solve this issue immediately and would allow more flexibility for different providers.

Any thoughts?

So then you can have annotations like:

example.com/9090metric_path="/_metrics"
example.com/9093metric_path="/metrics"
example.com/9099metric_path="/status?format=prometheus"

@discordianfish
Member

Not tested, but I think another alternative is using a Service for each port and putting the annotations there. But I'd also be happy if we had a one-to-n mapping in relabeling.
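
A sketch of that alternative, assuming the conventional prometheus.io annotations are honoured by a Service-based scrape job (Service names, selector and ports below are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: myapp-metrics-a
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9101"
spec:
  selector:
    app: myapp
  ports:
    - name: metrics-a
      port: 9101
---
apiVersion: v1
kind: Service
metadata:
  name: myapp-metrics-b
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9102"
spec:
  selector:
    app: myapp
  ports:
    - name: metrics-b
      port: 9102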

@bwplotka
Member

bwplotka commented Apr 5, 2019

So... again, there is no good solution here.

Another option, and it seems to be the recommended and ultimate way, is to do CRD-based scrape configuration, which is quite neat and insane at the same time. This is exactly what the Prometheus Operator does; however, I don't necessarily like sticking to Endpoints.

I wonder if the better way would be to allow pointing to pods directly. I might be missing some important discussion that led to the decision for Endpoints. If I am not wrong, Endpoints are created only by a Service that has selectors; otherwise you need to create them manually. Selecting pods would probably need some even more magic scrape configuration.

Anyway, all of this unfortunately involves using some kind of operator, like the Prometheus Operator itself. Since there is no custom resource definition API in the Kubernetes scrape discovery (why?), the only solution is generating Prometheus configuration on the fly, based on CRDs from the Kubernetes API, and reloading it at runtime in the given Prometheus. It's quite insane! Also, kudos to the Prometheus Operator maintainers for implementing this and delivering it so well.

None of this would be needed if the scrape configuration were improved in Prometheus itself ):

@discordianfish
Member

discordianfish commented Apr 6, 2019

@bwplotka Prometheus discovers pods, but what I and @vaijira want is to find multiple targets per single pod. That's what isn't possible, because there is only one target in SD (the pod) and no way to create two targets from one target with relabeling.

@bwplotka
Member

bwplotka commented Apr 7, 2019

@discordianfish You are missing an important thing here. Prometheus discovers a Pod's ports, not just pods. However, because of:

  • the pattern of reusing Pod annotations and the relabel limitation I described 4 comments above
  • the lack of port annotations in Kubernetes

... Prometheus discovery gives the false impression of just one target/port per pod, as the annotation happened to (accidentally) become a popular pattern.

There has been ongoing discussion for years about how to resolve this in the easiest way, and I think we should start maintaining a table of potential solutions and their pros/cons ;p

@bwplotka
Member

bwplotka commented Apr 7, 2019

Added a table of existing solutions as a starting point: https://docs.google.com/document/d/1S6O1czHtjR2DGfK2zeZDLr88wxyLRXB2JpH5g7YM1J8/edit?usp=sharing

@pbenefice

pbenefice commented Apr 16, 2019

Hi,

I'm also facing the need to scrape two containers within a single pod and gave a shot at the "Multiple Pod annotations" solution from the Google doc.

I thought I'd share my issues:

  • This is minor but I think the prometheus.io/0/{port, scrape, scheme, path} won't work as is (see here).

    Valid annotation keys have two segments: an optional prefix and name, separated by a slash
    Instead, maybe we could go for:

    0.prometheus.io/{port, scrape, scheme, path}
    1.prometheus.io/{port, scrape, scheme, path}
    
  • Unless I'm wrong, this solution implies relying on Prometheus deduplication of exactly identical targets. When I use, for example:

    #Annotations in kubernetes (within a StatefulSet exposing two containers, one port per container):
    0.prometheus.io/scrape: "true"
    0.prometheus.io/port: "9090"
    0.prometheus.io/path: "/metrics"
    1.prometheus.io/scrape: "true"
    1.prometheus.io/port: "8080"
    1.prometheus.io/path: "/metrics"
    
    #jobs definitions:
    - job_name: "kubernetes-containers-0"
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_0_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_0_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__

    - job_name: "kubernetes-containers-1"
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_1_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_1_prometheus_io_port]
          action: replace
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
          target_label: __address__
    

    Unless I'm mistaken, Prometheus discovers two targets for each job, but the two targets are exactly the same.
    Therefore Prometheus keeps only one target for each job and I end up "well".
    But, except for the job name, the labels will be identical for the two series. I would have wanted to add __meta_kubernetes_pod_container_name with relabelling,
    but if I do so I lose the uniqueness of labels and end up with duplicated targets: Prometheus won't deduplicate those anymore.

    One more downside of this solution is that you can't have labels that differ (other than the job name) from one container to another, in my case the container name.

@pbenefice

Hi @bwplotka, (related to my comment just above)

I don't know if this idea is similar to the `if` you already mentioned, but how crazy would it be to be able to use labels within the regex?
In my case I guess it would solve the problem:

#Annotations in kubernetes (within a StatefulSet exposing two containers, one port per container):
0.prometheus.io/port: "9090"
0.prometheus.io/path: "/metrics"
1.prometheus.io/port: "8080"
1.prometheus.io/path: "/metrics"

#job definition:
- job_name: "kubernetes-containers-0"
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_container_port_number]
      action: keep
      regex: __meta_kubernetes_pod_annotation_0_prometheus_io_port

- job_name: "kubernetes-containers-1"
  kubernetes_sd_configs:
    - role: pod
  relabel_configs:
    - source_labels: [__meta_kubernetes_pod_container_port_number]
      action: keep
      regex: __meta_kubernetes_pod_annotation_1_prometheus_io_port

The idea here would be: "if the port defined within the annotations matches the port of the discovered pod, keep the target, otherwise drop it".

@bwplotka
Member

Unless I'm mistaken, Prometheus discovers two targets for each job, but the two targets are exactly the same.

Why? If you target the address, you will get two targets per job (if you have a single Pod with only 2 containers). Since the address is exactly the same, just one will be used.

I would have wanted to add __meta_kubernetes_pod_container_name with relabelling,
but if I do so I lose the uniqueness of labels and end up with duplicated targets: Prometheus won't deduplicate those anymore.

Yes! You have a couple of options; I will add them to the doc (see the sketch after this list). Also, please comment there.

  • Add an instance label with ip:port
  • (Better): map job to be app+container_name
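
A minimal sketch of those two options as relabel rules appended to the per-index jobs above (the app label and the separator are illustrative assumptions):

    # Option 1: make the instance label explicit and unique per scraped address.
    - source_labels: [__address__]
      target_label: instance
    # Option 2 (better): fold the container name into the job label, so each
    # container of the pod ends up in its own logical job.
    - source_labels: [__meta_kubernetes_pod_label_app, __meta_kubernetes_pod_container_name]
      separator: "-"
      target_label: job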

@bwplotka
Member

Your second idea is kind of an `if` -> relabelling does not allow using a regex built from the values of other labels. The regexp is built at config reload time.

@pbenefice

Your second idea is kind of an `if` -> relabelling does not allow using a regex built from the values of other labels. The regexp is built at config reload time.

I didn't expect that quick of an answer, but yeah, I know it's not possible. I was just saying this feature would solve my case, as port annotations from Kubernetes would, but I understand it's not available. (for now? 😜 )

@ardenxu

ardenxu commented Jul 5, 2019

We ran into the same issue when monitoring both kube-dns + dnsmasq, which were in the same Pod. Any further update on this thread?

@oppai

oppai commented Jul 5, 2019

I resolved the multi-port scraping problem by using an exporter that merges the metrics from several ports:
https://github.com/rebuy-de/exporter-merger

@nikicat

nikicat commented Oct 2, 2019

👍 for kube-dns + dnsmasq monitoring

@thernstig

thernstig commented Nov 14, 2022

I believe this has now been solved as part of my proposal in #11556, which ended up in the solution at #11564. (The solution is slightly simplified compared to the proposal, but I believe it solves the problem stated in this issue.)

The solution adds these two new relabel actions:

keepequal: Drop targets for which the concatenated source_labels do not match target_label.
dropequal: Drop targets for which the concatenated source_labels do match target_label.

Here are two ways to solve this now (caveat 1: I wrote this quickly, so apologies for any incorrectness; caveat 2: the setting of speed/interval might not work as I do here, and if so, separate jobs are needed "per speed"):

Solution 1

# Pod annotations
metadata:
 annotations:            # containername:scheme:port:path:speed
   prometheus.io/scrape_1: "server:https:8585:/metrics:normal"
# Relabel config for Prometheus
- job_name: k8s-pod-1
  scheme: https
  tls_config:
    ca_file: /run/secrets/cacert/cacertbundle.pem
    cert_file: /run/secrets/clicert/clicert.pem
    key_file: /run/secrets/clicert/cliprivkey.pem
    server_name: certified-scrape-target
    insecure_skip_verify: false
  kubernetes_sd_configs:
    - role: pod
      namespaces:
        names:
          - {{ .Release.Namespace }}
  relabel_configs:
    # Extract the wanted target parameters from the annotation
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_1
      action: replace
      target_label: __tmp_containername
      regex: "([^:]+):[^:]+:[^:]+:[^:]+:[^:]+"
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_1
      action: replace
      target_label: __scheme__
      regex: "[^:]+:([^:]+):[^:]+:[^:]+:[^:]+"
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_1
      action: replace
      target_label: __tmp_port
      regex: "[^:]+:[^:]+:([^:]+):[^:]+:[^:]+"
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_1
      action: replace
      target_label: __metrics_path__
      regex: "[^:]+:[^:]+:[^:]+:([^:]+):[^:]+"
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_1
      action: replace
      target_label: __tmp_speed
      regex: "[^:]+:[^:]+:[^:]+:[^:]+:([^:]+)"
    # Keep only targets that match the annotation's 'containername' and 'port'
    - source_labels:
        - __tmp_containername
      target_label: __meta_kubernetes_pod_container_name
      action: keepequal
    - source_labels:
        - __tmp_port
      target_label: __meta_kubernetes_pod_container_port_number
      action: keepequal
    # Set some extra config
    - source_labels:
        - __address__
        - __meta_kubernetes_pod_container_port_number
      action: replace
      regex: ((?:\[.+\])|(?:.+))(?::\d+);(\d+)
      replacement: $1:$2
      target_label: __address__
    # This is EXPERIMENTAL, see docs for __scrape_interval__  and __scrape_timeout__  
    - source_labels:
        - __tmp_speed
      action: replace
      regex: normal
      replacement: "15s"
      target_label: __scrape_interval__
    - source_labels:
        - __tmp_speed
      action: replace
      regex: normal
      replacement: "15s"
      target_label: __scrape_timeout__

Solution 2

# Pod annotations
metadata:
 annotations:
   prometheus.io/scrape_1: "true"
   prometheus.io/container_1: "server"
   prometheus.io/scheme_1: "https"
   prometheus.io/port_1: "8585"
   prometheus.io/path_1: "/metrics"
   prometheus.io/speed_1: "normal"
- job_name: k8s-pod-1
  scheme: https
  tls_config:
    ca_file: /run/secrets/cacert/cacertbundle.pem
    cert_file: /run/secrets/clicert/clicert.pem
    key_file: /run/secrets/clicert/cliprivkey.pem
    server_name: certified-scrape-target
    insecure_skip_verify: false
  kubernetes_sd_configs:
    - role: pod
      namespaces:
        names:
          - {{ .Release.Namespace }}
  relabel_configs:
    # Keep only targets that match the annotations needed to keep a target
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_1
      action: keep
      regex: "true"
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_container_1
      target_label: __meta_kubernetes_pod_container_name
      action: keepequal
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_port_1
      target_label: __meta_kubernetes_pod_container_port_number
      action: keepequal
    # Set some extra config
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme_1
      action: replace
      target_label: __scheme__
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path_1
      action: replace
      target_label: __metrics_path__
    - source_labels:
        - __address__
        - __meta_kubernetes_pod_container_port_number
      action: replace
      regex: ((?:\[.+\])|(?:.+))(?::\d+);(\d+)
      replacement: $1:$2
      target_label: __address__
    # This is EXPERIMENTAL, see docs for __scrape_interval__  and __scrape_timeout__  
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_speed_1
      action: replace
      regex: normal
      replacement: "15s"
      target_label: __scrape_interval__
    - source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_speed_1
      action: replace
      regex: normal
      replacement: "15s"
      target_label: __scrape_timeout__

@thernstig

@bwplotka do you believe the above solves the situation described in https://docs.google.com/document/d/1S6O1czHtjR2DGfK2zeZDLr88wxyLRXB2JpH5g7YM1J8/edit#heading=h.7gs8bmrdm10n ?

valyala added a commit to VictoriaMetrics/VictoriaMetrics that referenced this issue Dec 22, 2022
…ng actions

These actions are supported by Prometheus starting from v2.41.0

See prometheus/prometheus#11564 ,
prometheus/prometheus#11556
and prometheus/prometheus#3756

Side note:

It's a pity that the Prometheus developers decided to invent the `keepequal` and `dropequal`
relabeling actions instead of adding support for the `keep_if_equal` and `drop_if_equal` relabeling
actions supported by VictoriaMetrics since June 2020 - see 2a39ba6 .
@RamakrishnaHande

Thanks for the fast answer. Perhaps I'm talking nonsense, but wouldn't it be possible to have a plural annotation like prometheus.io/ports: "9101,9102"?

Hi, is that a question? I am looking for a similar trick for one of my projects. Did it work for you that way?

@thernstig

@roidelapluie @juliusv I believe this can be closed now as it is fixed in #11564

@RamakrishnaHande

@roidelapluie @juliusv I believe this can be closed now as it is fixed in #11564

Please let me know what the solution is, @thernstig

@thernstig

@roidelapluie @juliusv I believe this can be closed now as it is fixed in #11564

Please let me know what the solution is, @thernstig

#3756 (comment)

@RamakrishnaHande

@thernstig

- job_name: k8s-pod-1

I am sorry if my question is too basic, but which file should the above entry belong to? Also, what should be the path of this file?

Also

metadata:
  annotations:
    prometheus.io/scrape_1: "true"
    prometheus.io/container_1: "server"
    prometheus.io/scheme_1: "https"
    prometheus.io/port_1: "8585"
    prometheus.io/path_1: "/metrics"
    prometheus.io/speed_1: "normal"

These should be part of the Kubernetes deployment.yaml, right?

@thernstig

thernstig commented Jan 12, 2023

@RamakrishnaHande this is unfortunately not a general support forum, so you would have to turn somewhere else to learn how to set up Prometheus in general.

@RamakrishnaHande

Thanks for the fast answer. Perhaps I'm talking nonsense, but wouldn't it be possible to have a plural annotation like prometheus.io/ports: "9101,9102"?

Hello, did you manage to solve this problem? Could you please share one such working example?

@putrasattvika

putrasattvika commented Jul 20, 2023

Please correct me if I'm wrong, but I think having some way to compare the value of two (or more) labels should solve this problem.

The approach posted here does seem to work, but it still requires multiple scrape configs (the number of which depends on the maximum number of ports we want to scrape from the same pod). Personally I think this is not ideal, as (1) config duplication does not seem like a good practice, and (2) the Kubernetes Pod service discovery already generates one target per container port.

What I'm thinking is that we can set up a Pod annotation like this:

metadata:
  annotations:
    playground.prometheus.io/endpoints: ":8001/-/metrics,:8002/actuator/prometheus"

And then for each target (i.e. container port) discovered by the Pod SD, the relabel configs will have to check whether its port number (from the __meta_kubernetes_pod_container_port_number meta label) is present in the playground.prometheus.io/endpoints annotation.

AFAIK Prometheus does not have any mechanism to compare the values of two different labels, so I'm wondering whether supporting template rendering of label values in a regex_template field would be a good alternative. The text/template template in regex_template would be rendered first before being compiled to a regexp.Regexp.

relabel_configs:
  ...

  # Keep targets where the value of label1 matches the concatenation of label2 and label3
  - action: keep

    source_labels:
      - label1

    regex_template: "{{ .label2 }}{{ .label3 }}"

  # The relabel step above can also be written like this:
  - action: keep

    source_labels:
      - label2
      - label3

    separator: ""

    regex_template: "{{ .label1 }}"

The whole scraping job can be implemented like this:

job_name: "kubernetes"

kubernetes_sd_configs:
  - role: pod

relabel_configs:
  # Keep targets which port number is present on the playground.prometheus.io/endpoints annotation
  - action: keep

    source_labels:
      - __meta_kubernetes_pod_annotation_playground_prometheus_io_endpoints

    regex_template: ".*(?::{{ .__meta_kubernetes_pod_container_port_number }}(\\/[^,]+)).*"

  # Rewrite __address__ with the pod's IP and container port
  - action: replace

    source_labels:
      - __meta_kubernetes_pod_ip
      - __meta_kubernetes_pod_container_port_number

    separator: ":"
    regex: "(.+)"

    replacement: $1
    target_label: __address__

  # Rewrite __metrics_path__ with the path specified in the playground.prometheus.io/endpoints
  # annotation
  - action: replace

    source_labels:
      - __meta_kubernetes_pod_annotation_playground_prometheus_io_endpoints

    regex_template: ".*(?::{{ .__meta_kubernetes_pod_container_port_number }}(\\/[^,]+)).*"

    replacement: $1
    target_label: __metrics_path__

  # Rewrite instance with the pod's name and the container port we're scraping
  - action: replace

    source_labels:
      - __meta_kubernetes_pod_name
      - __meta_kubernetes_pod_container_port_number

    separator: ":"
    regex: "(.+)"

    replacement: $1
    target_label: instance

I have a very hacky PoC implemented in cermati@508db2d, and with the following Deployment I managed to scrape some of the Pod's exposed ports without using multiple scrape configs:

(screenshot omitted)

Deployment YAML
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: testing-app-2
  name: testing-app-2

spec:
  replicas: 2

  selector:
    matchLabels:
      app: testing-app-2

  template:
    metadata:
      annotations:
        playground.prometheus.io/endpoints: ":8001/-/metrics,:8002/actuator/prometheus"

      labels:
        app: testing-app-2

    spec:
      terminationGracePeriodSeconds: 2

      containers:
        - name: testing-container
          image: giantswarm/tiny-tools:3.10

          command:
            - sh
            - -c
            - tail -f /dev/null

          ports:
            - containerPort: 80
              name: http
              protocol: TCP

            - containerPort: 443
              name: https
              protocol: TCP

            - containerPort: 9145
              name: http-metrics
              protocol: TCP

        - name: testing-sidecar-1
          image: giantswarm/tiny-tools:3.10

          command:
            - sh
            - -c
            - tail -f /dev/null

          ports:
            - containerPort: 8001
              name: http-metrics-2
              protocol: TCP

        - name: testing-sidecar-2
          image: giantswarm/tiny-tools:3.10

          command:
            - sh
            - -c
            - tail -f /dev/null

          ports:
            - containerPort: 8002
              name: http-metrics-3
              protocol: TCP

It does look hacky though, esp. if the regex pattern is very complex. But, if the maintainers are fine with this, I'm willing to submit a proper PR for this addition.


Update: I just realized that this approach is very similar to #11556, just with a slightly different syntax that makes it easier to implement.

@beorn7
Member

beorn7 commented Feb 27, 2024

Hello from the bug scrub.

This has been simmering for a while. We would like to bring it to some conclusion, but it's hard to make a call. We'll bring this up at the dev summit.

@thernstig

@beorn7 does #3756 (comment) not solve it?

@beorn7
Member

beorn7 commented Feb 28, 2024

@thernstig I think you are right. In the bug scrub, we didn't take the time to read carefully through all the comments. Now that I look through the whole thing, it indeed looks as if we could have closed this a while ago. Thank you very much. I'll close this now and remove the dev-summit agenda item, but if anyone believes there is still something to do here, feel free to update this issue.

beorn7 closed this as completed Feb 28, 2024
@dudicoco

dudicoco commented Apr 2, 2024

Hi @thernstig,

There are a few issues with your proposed solutions (your second solution was already proposed years ago in the comments here, and your first solution is basically the same, just merging the annotations into one annotation):

  1. Both solutions still require multiple scrape jobs, one per port, as mentioned by @putrasattvika. This means that if I create two jobs and in one week someone needs to scrape 3 ports, I will need to duplicate the job again.
  2. Each job creates another watch on the k8s API server, which could add higher load on it, as mentioned by @matthiasr.
  3. How do you deal with duplicate metric names between two ports? This was also mentioned in the comments of this issue.

@beorn7
Member

beorn7 commented Apr 3, 2024

@dudicoco thanks for your points.

My current understanding of the K8s SD isn't deep enough to make a call here. We could bring this to the dev-summit after all. But maybe @brancz, as the K8s SD maintainer, could you vet @dudicoco's points and advise about reasonable next steps?

I could see various outcomes:

  1. Those points aren't that relevant in practice and the situation is good enough.
  2. There are some feasible work-arounds for those points.
  3. We should actually do something about it. But what?

@tejaswiniVadlamudi

tejaswiniVadlamudi commented Apr 7, 2024

I appreciate the goal of reducing the number of scrape jobs and the load on the Prometheus server.

I have one use case and a few ideas for a standard k8s-based scraping use case. It is tricky to limit the number of scrape jobs if each metrics web server (or Prometheus target) has its own design choices.

  • Not all users use the Prometheus Operator, which enables the creation of microservice-specific scrape job(s) automatically.
  • Using a Service Mesh (Istio/Linkerd) integration can enable aggregated metrics collection, but not all users rely on a Service Mesh.
  • Having common scrape jobs that scrape all the targets in a k8s namespace or across the cluster is still usable for standard users:
    - More than one metrics port in a Pod (1 or more containers) is becoming more common in the real world.
    - Expectations on metrics web servers in a Pod vary based on the service design & usage scenarios:
      * A web server from container-1 in a Pod is expected to be scraped every 15s over cleartext with a scrape timeout of 10s on the /v1/metrics endpoint of port 8089.
      * A web server from container-2 in a Pod is expected to be scraped every 1m over TLS with a scrape timeout of 15s on the /metrics endpoint of port 9099.
      * For each distinct set of port, path, TLS choice, and scrape interval settings, a separate scrape job is needed today.

My idea is to use a common scrape job for each metrics web server in the Pod and let the microservice declare its scrape choice using annotations.

For example, a service can be declared below:

prometheus.io/scrape_interval_1: '15s'
prometheus.io/scrape_timeout_1: '10s'
prometheus.io/scheme_1: 'http'
prometheus.io/port_1: 8089
prometheus.io/path_1: "/v1/metrics"
prometheus.io/scrape_interval_2: '1m'
prometheus.io/scrape_timeout_2: '15s'
prometheus.io/scheme_2: 'https'
prometheus.io/port_2: 8089
prometheus.io/path_2: "/metrics"

The list of annotations would expand linearly based on the number of metrics web servers in the Pod. The above example needs two different scrape jobs.

If scrape_interval can be configured using annotations at the relabeling phase, it can be seen as a great benefit while reducing the number of scrape jobs.
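
As a sketch of that last point: the experimental __scrape_interval__ and __scrape_timeout__ labels can already be set during relabeling, so with annotations like the ones above (names are illustrative), a per-pod interval for the _1 web server could look like this:

    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_interval_1]
      regex: (.+)
      target_label: __scrape_interval__
    - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape_timeout_1]
      regex: (.+)
      target_label: __scrape_timeout__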

For the port naming topic, based on the k8s port naming convention it can be <scheme>-metrics, e.g. http-metrics or https-metrics. Port names have a limited number of characters in k8s.

In the case of a Pod with 3 metrics web servers:
containers:
  - name: container-1
    ports:
      - name: http-metrics1
        containerPort: 7777
        protocol: TCP
    ...
  - name: container-2
    ports:
      - name: http-metrics2
        containerPort: 8989
        protocol: TCP
    ...
  - name: container-3
    ports:
      - name: http-metrics3
        containerPort: 8976
        protocol: TCP

In the above case, the maximum number of metrics web servers in a Pod could be 9 (based on the maximum port name length allowed by k8s). Does anyone have a way to scale this better?

Please let me know if there is a possibility to discuss this idea in the dev summit or the normal sync call.
