Scrape external service with FQDN #3204

Closed
shay-berman opened this issue May 8, 2020 · 32 comments
@shay-berman commented May 8, 2020

What happened?
I cannot scrape a service by its FQDN when the service is outside of the k8s cluster. It only works if you set the service IP, but I prefer not to use IP(s), which may change.

See the Prometheus UI, which shows just the ServiceMonitor job but nothing inside the endpoints list:
[screenshot: Prometheus targets page showing the ServiceMonitor job with an empty endpoints list]

Here is the YAML that defines the Prometheus CR + ServiceMonitor + ExternalName Service with the SERVICE-FQDN. When you open Prometheus, it does not scrape the external service.

apiVersion: v1
kind: Service
metadata:
  name: rs1
  labels:
    app: prometheus1 # ServiceMonitor match this label

spec:
  externalName: <SERVICE-FQDN>
  type: ExternalName
  ports:
  - name: https
    protocol: TCP
    port: 8070
    targetPort: 8070

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus1
  labels:
    app: prometheus1 # This is what the prometheus1 CR looking for match
spec:
  endpoints:
  - path: /
    scheme: https
    port: https
    tlsConfig:
      insecureSkipVerify: true 
  jobLabel: jobName
  selector:
    matchLabels:
      app: prometheus1


---

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app: prometheus-operator-prometheus1
  name: prometheus-operator-prometheus1
spec:
  replicas: 1
  serviceAccountName: chart1-prometheus-operator-prometheus
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: chart1-prometheus-operator-alertmanager
      namespace: default
      pathPrefix: /
      port: web
  image: quay.io/prometheus/prometheus
  version: v2.15.2
  logLevel: debug
  portName: web
  routePrefix: /
  retention: 10d
  serviceMonitorSelector:
    matchLabels:
      app: prometheus1
  ruleNamespaceSelector: {}
  thanos:
    baseImage: thanosio/thanos
    version: v0.12.0

The only way to scrape the SERVICE-FQDN is to also add an Endpoints object that points to the SERVICE-FQDN's specific IP(s). Only then do you see the target working in Prometheus. But the whole point is to use only the SERVICE-FQDN and not specific IPs.

Did you expect to see something different?
I would expect to have an option to scrape by SERVICE-FQDN as well, not only by IPs.
Here are some blogs that explain how to scrape an external service, but again only with Endpoints that use specific IPs:

But again, none of them uses the FQDN, and I would expect there to be such a way.

How to reproduce it (as minimally and precisely as possible):
Just use the YAML above and you will see that the service (ExternalName) is not visible as a target in Prometheus.

Environment

  • Prometheus Operator version:
    quay.io/coreos/prometheus-operator:v0.37.0


  • kubectl describe deployment chart1-prometheus-operator-operator
    Name: chart1-prometheus-operator-operator
    Namespace: default
    CreationTimestamp: Wed, 06 May 2020 21:31:34 +0300
    Labels: app=prometheus-operator-operator
    app.kubernetes.io/managed-by=Helm
    chart=prometheus-operator-8.12.12
    heritage=Helm
    release=chart1
    Annotations: deployment.kubernetes.io/revision: 2
    meta.helm.sh/release-name: chart1
    meta.helm.sh/release-namespace: default
    Selector: app=prometheus-operator-operator,release=chart1
    Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
    StrategyType: RollingUpdate
    MinReadySeconds: 0
    RollingUpdateStrategy: 25% max unavailable, 25% max surge
    Pod Template:
    Labels: app=prometheus-operator-operator
    chart=prometheus-operator-8.12.12
    heritage=Helm
    release=chart1
    Service Account: chart1-prometheus-operator-operator
    Containers:
    prometheus-operator:
    Image: quay.io/coreos/prometheus-operator:v0.37.0
    Port: 8080/TCP
    Host Port: 0/TCP
    Args:
    --manage-crds=true
    --kubelet-service=kube-system/chart1-prometheus-operator-kubelet
    --logtostderr=true
    --localhost=127.0.0.1
    --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.37.0
    --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
    --config-reloader-cpu=100m
    --config-reloader-memory=25Mi
    --log-level=debug
    Environment:
    Mounts:
    tls-proxy:
    Image: squareup/ghostunnel:v1.5.2
    Port: 8443/TCP
    Host Port: 0/TCP
    Args:
    server
    --listen=:8443
    --target=127.0.0.1:8080
    --key=cert/key
    --cert=cert/cert
    --disable-authentication
    Environment:
    Mounts:
    /cert from tls-proxy-secret (ro)
    Volumes:
    tls-proxy-secret:
    Type: Secret (a volume populated by a Secret)
    SecretName: chart1-prometheus-operator-admission
    Optional: false
    Conditions:
    Type Status Reason
    Available True MinimumReplicasAvailable
    Progressing True NewReplicaSetAvailable
    OldReplicaSets:
    NewReplicaSet: chart1-prometheus-operator-operator-746d86bbb7 (1/1 replicas created)
    Events:


  • Kubernetes version information:

    kubectl version
    Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.0", GitCommit:"70132b0f130acc0bed193d9ba59dd186f0e634cf", GitTreeState:"clean", BuildDate:"2019-12-07T21:20:10Z", GoVersion:"go1.13.4", Compiler:"gc", Platform:"linux/amd64"}
    Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.10-gke.27", GitCommit:"145f9e21a4515947d6fb10819e5a336aff1b6959", GitTreeState:"clean", BuildDate:"2020-02-21T18:01:40Z", GoVersion:"go1.12.12b4", Compiler:"gc", Platform:"linux/amd64"}

  • Kubernetes cluster kind:

GKE

  • Manifests:
    See the manifests above.
  • Prometheus Operator Logs:
  • Unfortunately, I don't see many related log entries. It's hard to debug why it's not working.

Anything else we need to know?:

- job_name: job
  scrape_interval: 30s
  scrape_timeout: 30s
  metrics_path: /
  scheme: https
  tls_config:
    insecure_skip_verify: true
  static_configs:
    - targets:
      - <SERVICE-FQDN>:port

This will work, but again I want to do it the k8s way, by setting an ExternalName Service with its FQDN.

@sebarys commented May 9, 2020

I think ExternalName services are not supported; there are a few issues about it, e.g. #218.

@shay-berman (Author)

Thanks @sebarys for directing me to a similar issue, #218.

But it looks like that is a very long thread without a solution:
Issue #218 was closed and redirected to #372, which was closed and then redirected to prometheus/prometheus#2791, which was closed without any formal solution for ExternalName services in k8s.

The latest summaries of that thread are #218 (comment) and prometheus/prometheus#2791 (comment), but again there is no formal solution.

If I understood correctly, the way to scrape an external service is to use Endpoint IPs as mentioned in #834 (comment) (or in the blogs), but that does not help if you need to use the service FQDN.

Another way to solve it is to use the old way (not the k8s way) and define additional scrape configs with regular static_configs. But again, this way you don't use the k8s concept of an ExternalName Service:

  static_configs:
    - targets:
      - SERVICE-FQDN

@gouthamve \ @brancz \ @sebarys
So can you please provide details on the best practice for scraping an external service FQDN (without using Endpoint IPs, just the FQDN of the service)? I think it should be officially documented.

@shay-berman (Author)

@gouthamve \ @brancz \ @sebarys - can anyone help with this, please?

@sebarys commented May 13, 2020

In our project we've added this using static_configs, as it looks like there is no plan for now to have this feature in prometheus-operator.

@brancz (Contributor) commented May 13, 2020

You cannot in a meaningful way monitor external services as prometheus needs to scrape each instance/process individually. That’s why you need to use a separate discovery mechanism that actually does discover all processes.

@shay-berman (Author)

OK, so based on your feedback it looks like there is no plan to support scraping the ExternalName k8s Service type, and your recommendation is to use static_configs to define the external service.

Should I close this ticket?

@brancz (Contributor) commented May 18, 2020

As the issue is framed, it won't happen. That said, we have thought about making more generic scrape configs available through some new CRD in the prometheus-operator. That could be something that could be used for this. As far as I know, no one is working on this currently, though.

@elsbrock

You cannot in a meaningful way monitor external services as prometheus needs to scrape each instance/process individually. That’s why you need to use a separate discovery mechanism that actually does discover all processes.

One use case I see is federation, where I want to configure another Prometheus instance as a target to be scraped. It'd be great if that were possible by means of a ServiceMonitor.

@brancz (Contributor) commented Jun 23, 2020

@elsbrock for a Prometheus in the same cluster this is perfectly possible. The federation endpoint on Prometheus is no different from any other /metrics endpoint, so all you need to do is change the path to scrape from /metrics to /federate, which you can specify using the path field in the ServiceMonitor endpoint definition.
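
For illustration, a minimal ServiceMonitor sketch for scraping an in-cluster Prometheus federation endpoint (the names, labels, port and match[] selector below are assumptions, not values from this thread):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: federate-source-prometheus    # hypothetical name
  namespace: monitoring               # hypothetical namespace
spec:
  endpoints:
  - port: web                         # assumed name of the source Prometheus service port
    path: /federate                   # scrape the federation endpoint instead of /metrics
    params:
      'match[]':
      - '{job!=""}'                   # example selector: federate every series that has a job label
    honorLabels: true                 # keep the job/instance labels of the federated series
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus   # assumed label on the source Prometheus Service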

@elsbrock

Right, but in our case the Prometheus instance is running in an entirely different network segment, so we either need to use the global config (which I don't find nice from a dependency point of view; a ServiceMonitor seems much better) or set up a reverse proxy pod that I can then scrape using a ServiceMonitor.

@brancz (Contributor) commented Jun 24, 2020

Yes, for those cases an additionalScrapeConfigs entry is best.
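
For illustration, a minimal sketch of that wiring (the Secret name, key, and target FQDN are placeholders, not values from this thread): the raw scrape config lives in a Secret and the Prometheus CR points at it via additionalScrapeConfigs.

apiVersion: v1
kind: Secret
metadata:
  name: additional-scrape-configs     # hypothetical name
  namespace: monitoring               # hypothetical namespace
stringData:
  prometheus-additional.yaml: |
    - job_name: external-service
      scheme: https
      tls_config:
        insecure_skip_verify: true
      static_configs:
      - targets:
        - external.example.com:8070   # placeholder FQDN:port
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus-operator-prometheus1
spec:
  # reference the Secret holding the extra scrape configs
  additionalScrapeConfigs:
    name: additional-scrape-configs
    key: prometheus-additional.yaml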

@mrueg (Contributor) commented Jul 3, 2020

@brancz I wonder if there's a way to provide additionalScrapeConfigs as a Kubernetes CRD object?
Often multiple teams share a single prometheus(-operator) deployment, and that would enable self-service for scrape configs.

@brancz (Contributor) commented Jul 4, 2020

This is not possible today, but I would like to get there one day. I would like to essentially introduce a lower-level CRD, "ScrapeConfig", which all the other config-generation CRs are ultimately converted to. The difficulty is maintaining the types for such a CR; this would need to be automated by inspecting the types from Prometheus and converting them. All of this is not impossible, but it will need a non-trivial amount of work, which I currently don't have time for. If anyone from the community would like to invest time into this, though, I'd be happy to discuss possible designs and caveats that I can think of.

@mircohacker

@brancz I would be interested in looking into implementing this CRD. How should we proceed?

@brancz (Contributor) commented Jul 17, 2020

I think a design doc would be in order, as what I'm imagining would involve synchronizing types from the prometheus repo.

@angeloskaltsikis

Any news on this feature? (or the design doc?)

@jasonstitt commented Dec 9, 2020

Running into this trying to scrape AWS MSK (see: https://docs.aws.amazon.com/msk/latest/developerguide/open-monitoring.html). MSK provides a FQDN for each broker, and we also have them aliased to consistent in-cluster names using ExternalName services. The underlying IP addresses might be stable, but I don't see sufficient documentation to rely on that, and in any case they would have to be hardcoded per cluster.

So now the options are (a) bypass the CRD setup and use config files (aka additionalScrapeConfigs) or (b) set up reverse proxies just to scrape existing endpoints that are available to scrape.

You cannot in a meaningful way monitor external services as prometheus needs to scrape each instance/process individually. That’s why you need to use a separate discovery mechanism that actually does discover all processes.

This is an example of a case in which you can (as there is an FQDN provided per instance).

@i9 commented Jan 21, 2021

Running into this trying to scrape AWS MSK (see: https://docs.aws.amazon.com/msk/latest/developerguide/open-monitoring.html). MSK provides a FQDN for each broker, and we also have them aliased to consistent in-cluster names using ExternalName services. The underlying IP addresses might be stable, but I don't see sufficient documentation to rely on that, and in any case they would have to be hardcoded per cluster.

So now the options are (a) bypass the CRD setup and use config files (aka additionalScrapeConfigs) or (b) set up reverse proxies just to scrape existing endpoints that are available to scrape.

You cannot in a meaningful way monitor external services as prometheus needs to scrape each instance/process individually. That’s why you need to use a separate discovery mechanism that actually does discover all processes.

This is an example of a case in which you can (as there is an FQDN provided per instance).

@jasonstitt have you tried

static_configs:
  - targets: ['msk-alias-1.namespace:11001']

It worked for us.

@lilic (Contributor) commented Jan 21, 2021

If anyone is willing to do a design doc for this they are more than welcome to create a PR for this! 🎉

@alexisph

Hi. I came across this issue in our OpenShift clusters. Here's how I solved it:

  1. Create a Service with externalName set to the public URL
  2. Create an Endpoints object with the service IP address and the same name as the Service above
  3. Create a ServiceMonitor for this new Service that replaces the __address__ label:
...
spec:
  endpoints:
  - path: /metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
    relabelings:
      - sourceLabels: [__address__]
        targetLabel: __address__
        regex: (.*)
        replacement: "$FQDN:$PORT"
        action: replace
...

I was then able to scrape the FQDN!

@miguel-callejas-coderoad-com

@alexisph that's great. Can you share a little more of your configuration? I'm trying to configure a Service with the ExternalName property to reach an FQDN outside Kubernetes following your recommendations, but have had no luck.

kind: "Service"
apiVersion: "v1"
metadata:
  namespace: workload
  name: nfs-centralus-001
  labels:
    workload.stateful: nfs-centralus-001
spec:
  type: ExternalName
  externalName: nfs-centralus-001.c.saas-workload-io.internal
  selector:
    workload.stateful: nfs-centralus-001

and the ServiceMonitor looks like:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    ops.workload.io/component: nfs-centralus-001
    ops.workload.io/category: infrastructure
  name: nfs-centralus-001
  namespace: workload
spec:
  endpoints:
  - path: /metrics
    interval: 15s
    targetPort: 9100
    scheme: http
    relabelings:
      - sourceLabels: [__address__]
        targetLabel: __address__
        regex: (.*)
        replacement: "$FQDN:$PORT"
        action: replace
  jobLabel: ops.workload.io/nfs-centralus-001
  namespaceSelector:
    matchNames:
    - workload
  selector:
    matchExpressions:
      - key: workload.stateful
        operator: In
        values: ["nfs-centralus-001"]

@alexisph

@miguel-callejas-coderoad-com, you're missing the Endpoints resource. Based on your example:

apiVersion: v1
kind: Endpoints
metadata:
  name: nfs-centralus-001
  namespace: workload
  labels:
    workload.stateful: nfs-centralus-001
subsets:
- addresses:
  - ip: 1.2.3.4
  - ip: 1.2.3.5
  ports:
  - name: metrics
    port: 9100
    protocol: TCP

@marratj commented Aug 2, 2021

A quick note on this, as I was struggling to find the same config:

You don't even need to specify the real IP of your destination FQDN in the Endpoints object; it can be any IP, because by relabeling __address__ you're instructing Prometheus to scrape whatever is specified in the __address__ label.

This label would usually be populated with the IP address defined in the Endpoints object, but we're replacing it with a completely different address here, so the actual IP in the Endpoints object no longer matters to Prometheus itself.
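
For illustration, a minimal sketch of this pattern (hypothetical names and namespace; 192.0.2.1 is a documentation placeholder IP, and a Service with the same name and label is assumed to exist):

apiVersion: v1
kind: Endpoints
metadata:
  name: external-example              # must match the Service name
  namespace: monitoring
subsets:
- addresses:
  - ip: 192.0.2.1                     # placeholder IP; never actually contacted after relabeling
  ports:
  - name: metrics
    port: 443
    protocol: TCP
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: external-example
  namespace: monitoring
spec:
  endpoints:
  - port: metrics
    scheme: https
    relabelings:
    - sourceLabels: [__address__]
      targetLabel: __address__
      regex: (.*)
      replacement: "external.example.com:443"   # the FQDN you actually want to scrape
      action: replace
  selector:
    matchLabels:
      app: external-example           # assumed label on the matching Service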

@elsbrock

Can you give an example?

@hryamzik

@alexisph @miguel-callejas-coderoad-com do you literally use "$FQDN:$PORT" as the replacement?

It works for me if I put real values there (i.e. "myhost.example.com:8080"), but "$FQDN:$PORT" produces instance=":".

@alexisph

@hryamzik, just use the FQDN and port of the service you want to scrape, like in your example.

@hryamzik

That's what I already do; I hoped for a more elegant solution since externalName already contains it. Got it, ty!

@r0bj (Contributor) commented Oct 2, 2021

This issue became more important after k8s 1.22, in which write access to Endpoints was disabled by default in the admin roles due to CVE-2021-25740:
kubernetes/kubernetes#103675
https://kubernetes.io/docs/reference/access-authn-authz/rbac/#write-access-for-endpoints

@paulfantom (Member)

The main solution for this would be implementing the generic ScrapeConfig CRD described in #2787. Contributions welcome.

@cuchac commented Nov 20, 2021

@hryamzik I found a better solution that does not require duplicating domains.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
...
spec:
  endpoints:
  - interval: 30s
    path: /_prometheus/metrics/
    port: web
    relabelings:
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_endpoint_node_name
      targetLabel: __address__
  selector:
     ...
---
apiVersion: v1
kind: Endpoints
metadata:
  ...
subsets:
- addresses:
  - ip: 1.2.3.5
    nodeName: www.example.com
  ports:
...

Or you can use __meta_kubernetes_service_name or __meta_kubernetes_endpoints_name in sourceLabels to get the hostname from the Service or Endpoints name. I use __meta_kubernetes_endpoint_node_name so that I can have more domains inside one Endpoints object.

@ig-matsz commented Feb 8, 2022

Hi.
Based on the previous posts we created a similar workaround. Let me share:

apiVersion: v1
kind: Service
metadata:
  name: external-dev-prometheus
  namespace: monitoring
  labels:
    k8s-app: external-dev-prometheus
spec:
  type: ExternalName
  externalName: FQDN
  ports:
  - name: metrics
    port: 443
    protocol: TCP
    targetPort: 443

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: external-dev-prometheus
  namespace: monitoring
  labels:
    k8s-app: external-dev-prometheus
spec:
  endpoints:
  - port: metrics
    interval: 30s
    honorLabels: true
    scheme: https
    path: /metrics
    tlsConfig:
      insecureSkipVerify: true
    relabelings:
      - sourceLabels: [__address__]
        targetLabel: __address__
        regex: (.*)
        replacement: "FQDN:443"
        action: replace
  selector:
    matchLabels:
      k8s-app: external-dev-prometheus
  namespaceSelector:
    matchNames:
    - monitoring

apiVersion: v1
kind: Endpoints
metadata:
  name: external-dev-prometheus
  namespace: monitoring
  labels:
    k8s-app: external-dev-prometheus
subsets:
- addresses:
  - ip: 1.2.3.4
  ports:
  - name: metrics
    port: 443
    protocol: TCP


apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app: kube-prometheus-stack
    release: kube-prometheus-stack
  name: external-dev-prometheus-endpoint
  namespace: monitoring
spec:
  groups:
    - name: critical-external
      rules:
        - alert: PrometheusTargetMissing
          expr: up {endpoint="metrics",  namespace="monitoring", service="external-dev-prometheus"} == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            message: Prometheus target missing (instance {{ $labels.instance }})
            description: "A Prometheus target has disappeared. An exporter might be crashed.\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

This solved the issue and everything was perfect. Unfortunately, we noticed that after a short (a few hours) but random time the targets disappear, effectively disabling this monitoring setup.

Here's an illustration of the event:

[screenshot: graph of the target count over time, showing all custom external targets disappearing at the same moment]

We have 10 endpoints from the prometheus-operator stack, such as node exporters, Alertmanager and kube-prometheus-stack itself. We are adding 8 custom endpoints as described above. You can see that all our custom external endpoints are gone at the same time.

Our setup:
kube-prometheus-stack-13.10.0
But we also verified this on kube-prometheus-stack-31.0.1 with the same result.

What we verified:

  • all objects that I pasted are still present in the cluster
  • we weren't able to find any meaningful log messages indicating problems, for example with WALs
  • the file /etc/prometheus/config_out/prometheus.env.yaml of the prometheus pod has this endpoint config
  • reapplying those objects brings back the target, but then it is gone again after some random time

Has anyone seen anything similar? Can anyone give some hints on what else we can check? Thanks, I appreciate any feedback.

@simonpasquier (Contributor) commented Feb 15, 2022

Closing this issue in favor of #2787 (generic scrape config CRD), which should eventually resolve the original request.
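
For later readers, a rough sketch of how the original request could be expressed with the generic scrape config CRD tracked in #2787, assuming the v1alpha1 ScrapeConfig resource with static targets (names, labels, and the FQDN:port are placeholders):

apiVersion: monitoring.coreos.com/v1alpha1
kind: ScrapeConfig
metadata:
  name: external-service              # hypothetical name
  namespace: monitoring               # hypothetical namespace
  labels:
    app: prometheus1                  # whatever labels the Prometheus CR's scrapeConfigSelector matches
spec:
  staticConfigs:
  - targets:
    - external.example.com:8070       # placeholder FQDN:port

The Prometheus CR would also need a scrapeConfigSelector matching the labels above.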
