
Added support for configuring CA, cert, and key via secret or configmap. #3249

Merged
1 commit merged on Jul 15, 2020

Conversation

@ybettan (Contributor) commented May 27, 2020

Added support for configuring RemoteWrite TLS via Secret or Configmap.

Now we can configure the operator to use mTLS RemoteWrite by referencing
the CA, cert and key directly from k8s Secrets/ConfigMaps.

If the key and the cert are both Secrets, they can live in a single
Secret that contains both 'cert.pem' and 'key.pem'; otherwise they can
exist as two different Secrets (or a Secret for the key and a ConfigMap
for the cert).

Signed-off-by: Yoni Bettan <ybettan@redhat.com>

Issue: #3118
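
For context, a sketch of the usage this enables. All resource names, the receiver URL, and key names below are hypothetical, and the exact tlsConfig field layout should be checked against the merged API:

apiVersion: v1
kind: Secret
metadata:
  name: key-cert              # hypothetical Secret holding both the client cert and key
stringData:
  cert.pem: <client certificate PEM>
  key.pem: <client key PEM>
---
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: test
spec:
  remoteWrite:
  - url: https://receiver.example.com/api/v1/write   # hypothetical mTLS remote-write receiver
    tlsConfig:
      ca:
        configMap:             # the CA can come from a ConfigMap instead of a Secret
          name: ca
          key: ca.pem
      cert:
        secret:
          name: key-cert
          key: cert.pem
      keySecret:
        name: key-cert
        key: key.pem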

@brancz (Contributor) commented May 29, 2020

Looks like an awesome start! Let us know if you need any input, otherwise looking forward to reviewing the full PR! :)

@ybettan ybettan force-pushed the devel branch 4 times, most recently from 4b7a627 to d1ec23e on June 8, 2020 at 11:15
//"PromGetAuthSecret": testPromGetAuthSecret,
//"PromArbitraryFSAcc": testPromArbitraryFSAcc,
//"PromTLSConfigViaSecret": testPromTLSConfigViaSecret,
"PromRemoteWriteWithTLS": testPromRemoteWriteWithTLS,
Contributor:

FYI these are all subtests, so you can run them without uncommenting the rest, with:

make test-e2e TEST_RUN_ARGS="-run TestAllNS/y/PromRemoteWriteWithTLS"

@ybettan (Author):

Thanks, I didn't know that.

@lilic (Contributor) commented Jun 11, 2020

As discussed on Slack: I would start by fixing the unit tests; they might lead you to what is wrong with the e2e tests. https://travis-ci.org/github/coreos/prometheus-operator/jobs/695974726

As for running locally, make sure to run the same version as we use in Travis; it seems our docs are a bit out of date: https://github.com/coreos/prometheus-operator#running-end-to-end-tests-on-local-minikube-cluster (feel free to fix that :) ). We use 1.18.2: https://github.com/coreos/prometheus-operator/blob/master/scripts/create-minikube.sh#L12

@ybettan (Author) commented Jun 11, 2020

Thanks for your reply, @lilic.

I am not talking about the full e2e test suite, only about the single e2e test that runs in this PR.
It used to work locally until I ran the full e2e tests, which failed and left my cluster corrupted, so I reinstalled my cluster (minikube), and since then testPromRemoteWriteWithTLS isn't working anymore.

The Prometheus pods run correctly, and Prometheus scraping works as well (checked with Grafana), but I no longer get the logs describing a successful send or a "bad request" error, according to the test variants.

The only thing that looks buggy, and what makes me start there in order to understand the problem, is the output of the command kubectl get pod/prometheus-test-0 -n allns-y-promremotewritewithtls-qbr9j2-0 -o yaml, which outputs:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2020-06-11T09:40:24Z"
  generateName: prometheus-test-
  labels:
    app: prometheus
    controller-revision-hash: prometheus-test-6cc8745cd7
    prometheus: test
    statefulset.kubernetes.io/pod-name: prometheus-test-0
  name: prometheus-test-0
  namespace: allns-y-promremotewritewithtls-qbr9j2-0
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: StatefulSet
    name: prometheus-test
    uid: c3504b70-ac2f-4c40-860c-4cd7f3d3f665
  resourceVersion: "16259"
  selfLink: /api/v1/namespaces/allns-y-promremotewritewithtls-qbr9j2-0/pods/prometheus-test-0
  uid: b7720bd6-4a50-4003-923e-796ad08149c4
spec:
  containers:
  - args:
    - --web.console.templates=/etc/prometheus/consoles
    - --web.console.libraries=/etc/prometheus/console_libraries
    - --config.file=/etc/prometheus/config_out/prometheus.env.yaml
    - --storage.tsdb.path=/prometheus
    - --storage.tsdb.retention.time=24h
    - --web.enable-lifecycle
    - --storage.tsdb.no-lockfile
    - --web.route-prefix=/
    image: quay.io/prometheus/prometheus:v2.16.0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 6
      httpGet:
        path: /-/healthy
        port: web
        scheme: HTTP
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3
    name: prometheus
    ports:
    - containerPort: 9090
      name: web
      protocol: TCP
    readinessProbe:
      failureThreshold: 120
      httpGet:
        path: /-/ready
        port: web
        scheme: HTTP
      periodSeconds: 5
      successThreshold: 1
      timeoutSeconds: 3
    resources:
      requests:
        memory: 400Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/prometheus/config_out
      name: config-out
      readOnly: true
    - mountPath: /prometheus
      name: prometheus-test-db
    - mountPath: /etc/prometheus/rules/prometheus-test-rulefiles-0
      name: prometheus-test-rulefiles-0
    - mountPath: /etc/prometheus/secrets/key-cert-ca
      name: secret-key-cert-ca
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: prometheus-token-x4n9c
      readOnly: true
  - args:
    - --log-format=logfmt
    - --reload-url=http://localhost:9090/-/reload
    - --config-file=/etc/prometheus/config/prometheus.yaml.gz
    - --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
    command:
    - /bin/prometheus-config-reloader
    env:
    - name: POD_NAME
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.name
    image: quay.io/coreos/prometheus-config-reloader:50ff6810
    imagePullPolicy: IfNotPresent
    name: prometheus-config-reloader
    resources:
      limits:
        cpu: 100m
        memory: 25Mi
      requests:
        cpu: 100m
        memory: 25Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/prometheus/config
      name: config
    - mountPath: /etc/prometheus/config_out
      name: config-out
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: prometheus-token-x4n9c
      readOnly: true
  - args:
    - --webhook-url=http://localhost:9090/-/reload
    - --volume-dir=/etc/prometheus/rules/prometheus-test-rulefiles-0
    image: jimmidyson/configmap-reload:v0.3.0
    imagePullPolicy: IfNotPresent
    name: rules-configmap-reloader
    resources:
      limits:
        cpu: 100m
        memory: 25Mi
      requests:
        cpu: 100m
        memory: 25Mi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: FallbackToLogsOnError
    volumeMounts:
    - mountPath: /etc/prometheus/rules/prometheus-test-rulefiles-0
      name: prometheus-test-rulefiles-0
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: prometheus-token-x4n9c
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: prometheus-test-0
  nodeName: minikube
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: prometheus
  serviceAccountName: prometheus
  subdomain: prometheus-operated
  terminationGracePeriodSeconds: 600
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: config
    secret:
      defaultMode: 420
      secretName: prometheus-test
  - emptyDir: {}
    name: config-out
  - configMap:
      defaultMode: 420
      name: prometheus-test-rulefiles-0
    name: prometheus-test-rulefiles-0
  - name: secret-key-cert-ca
    secret:
      defaultMode: 420
      secretName: key-cert-ca
  - emptyDir: {}
    name: prometheus-test-db
  - name: prometheus-token-x4n9c
    secret:
      defaultMode: 420
      secretName: prometheus-token-x4n9c
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-06-11T09:40:24Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-06-11T09:40:53Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-06-11T09:40:53Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-06-11T09:40:24Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://a4fd69b3bef546c0aa1a303432e2dead1045bd512ec75da4d56bd4b52ab030e7
    image: quay.io/prometheus/prometheus:v2.16.0
    imageID: docker-pullable://quay.io/prometheus/prometheus@sha256:e4ca62c0d62f3e886e684806dfe9d4e0cda60d54986898173c1083856cfda0f4
    lastState:
      terminated:
        containerID: docker://d8268131f3e6a1933fdbe673ff7759d6fdecffc2c76a8a827cb58e2c49f79211
        exitCode: 1
        finishedAt: "2020-06-11T09:40:39Z"
        message: |2
           caller=main.go:661 msg="Starting TSDB ..."
          level=info ts=2020-06-11T09:40:39.604Z caller=web.go:508 component=web msg="Start listening for connections" address=0.0.0.0:9090
          level=info ts=2020-06-11T09:40:39.620Z caller=head.go:577 component=tsdb msg="replaying WAL, this may take awhile"
          level=info ts=2020-06-11T09:40:39.621Z caller=head.go:625 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
          level=info ts=2020-06-11T09:40:39.622Z caller=main.go:676 fs_type=EXT4_SUPER_MAGIC
          level=info ts=2020-06-11T09:40:39.622Z caller=main.go:677 msg="TSDB started"
          level=info ts=2020-06-11T09:40:39.622Z caller=main.go:747 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
          level=info ts=2020-06-11T09:40:39.622Z caller=main.go:530 msg="Stopping scrape discovery manager..."
          level=info ts=2020-06-11T09:40:39.622Z caller=main.go:544 msg="Stopping notify discovery manager..."
          level=info ts=2020-06-11T09:40:39.622Z caller=main.go:566 msg="Stopping scrape manager..."
          level=info ts=2020-06-11T09:40:39.622Z caller=main.go:540 msg="Notify discovery manager stopped"
          level=info ts=2020-06-11T09:40:39.622Z caller=main.go:526 msg="Scrape discovery manager stopped"
          level=info ts=2020-06-11T09:40:39.622Z caller=manager.go:845 component="rule manager" msg="Stopping rule manager..."
          level=info ts=2020-06-11T09:40:39.623Z caller=manager.go:851 component="rule manager" msg="Rule manager stopped"
          level=info ts=2020-06-11T09:40:39.623Z caller=main.go:560 msg="Scrape manager stopped"
          level=info ts=2020-06-11T09:40:39.688Z caller=notifier.go:598 component=notifier msg="Stopping notification manager..."
          level=info ts=2020-06-11T09:40:39.688Z caller=main.go:731 msg="Notifier manager stopped"
          level=error ts=2020-06-11T09:40:39.690Z caller=main.go:740 err="error loading config from \"/etc/prometheus/config_out/prometheus.env.yaml\": couldn't load configuration (--config.file=\"/etc/prometheus/config_out/prometheus.env.yaml\"): open /etc/prometheus/config_out/prometheus.env.yaml: no such file or directory"
        reason: Error
        startedAt: "2020-06-11T09:40:39Z"
    name: prometheus
    ready: true
    restartCount: 1
    started: true
    state:
      running:
        startedAt: "2020-06-11T09:40:47Z"
  - containerID: docker://a8511bd2df1dd0463be72578e5b40b8e78fcf037a9ed1c6d5ee83a0982962970
    image: quay.io/coreos/prometheus-config-reloader:50ff6810
    imageID: docker://sha256:acb88791248593818338b5b709e74c241db81b3383480e5e925dbbd5f39a89d9
    lastState: {}
    name: prometheus-config-reloader
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-06-11T09:40:40Z"
  - containerID: docker://92fbd45a2320b5c892c8e978d951c9486b728af95ae4076507a028118f0e7fc3
    image: jimmidyson/configmap-reload:v0.3.0
    imageID: docker-pullable://jimmidyson/configmap-reload@sha256:d107c7a235c266273b1c3502a391fec374430e5625539403d0de797fa9c556a2
    lastState: {}
    name: rules-configmap-reloader
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2020-06-11T09:40:45Z"
  hostIP: 192.168.39.116
  phase: Running
  podIP: 172.17.0.9
  podIPs:
  - ip: 172.17.0.9
  qosClass: Burstable
  startTime: "2020-06-11T09:40:24Z"

In order to reproduce the error on this PR:

  1. eval $(minikube docker-env) && make image
  2. make test-e2e

BTW, the unit tests on this PR are working correctly (at least the same way as on master).

@lilic (Contributor) commented Jun 11, 2020

The unit tests are failing, it says in Travis 🤔 or maybe you have not pushed something? Can you just push what you have locally and restore the commented-out tests, like Frederic suggested? Some tests may rely on others or need to run in order, so it's easier to see if you push your work to the PR. Thanks! :)

@ybettan (Author) commented Jun 11, 2020

Done.

@ybettan ybettan force-pushed the devel branch 5 times, most recently from b97112b to f17d5f1 on June 30, 2020 at 13:11
@ybettan ybettan closed this Jun 30, 2020
@ybettan ybettan reopened this Jun 30, 2020
@ybettan ybettan force-pushed the devel branch 8 times, most recently from f5584f5 to eb88372 on July 7, 2020 at 12:26
@ybettan ybettan marked this pull request as ready for review July 7, 2020 13:26
@ybettan ybettan changed the title from "[WIP] Added support for configuring CA, cert, and key via secret or configmap." to "Added support for configuring CA, cert, and key via secret or configmap." on Jul 7, 2020
  tlsConfig := yaml.MapSlice{
  	{Key: "insecure_skip_verify", Value: tls.InsecureSkipVerify},
  }
  if tls.CAFile != "" {
  	tlsConfig = append(tlsConfig, yaml.MapItem{Key: "ca_file", Value: tls.CAFile})
  }
  if tls.CA.Secret != nil {
- 	tlsConfig = append(tlsConfig, yaml.MapItem{Key: "ca_file", Value: pathPrefix + "_" + tls.CA.Secret.Name + "_" + tls.CA.Secret.Key})
+ 	tlsConfig = append(tlsConfig, yaml.MapItem{Key: "ca_file", Value: path.Join(secretsDir, tls.CA.Secret.Name, tls.CA.Secret.Key)})
Contributor:

pathPrefix previously had the namespace in there, which is important. This secretsDir is flat, so secrets with the same name in different namespaces would collide and cause errors.

@ybettan (Author) commented Jul 8, 2020

I don't mind adding the namespace as a prefix to secret names.

In this implementation secretsDir isn't flat: my-secret-a and my-secret-b in the same namespace will be mounted at etc/prometheus/my-secret-a/<secret-a key-values> and etc/prometheus/my-secret-b/<secret-b key-values>.

In addition, 2 different secrets with the same name but different namespaces will fail anyway, since only a secret within the same namespace as the Prometheus pod can be mounted, as implemented here: https://github.com/coreos/prometheus-operator/blob/f811728eefec5504dd189cbc9534647a858ac0cd/pkg/prometheus/statefulset.go#L534-L542

Am I missing something?

@brancz (Contributor) commented Jul 8, 2020

The same function is used for ServiceMonitors, which can be in different namespaces than the Prometheus object. This means it will now collide if there are two ServiceMonitors in different namespaces, each having a secret with the same name in its own namespace and referencing it.
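
For illustration, a minimal sketch of one collision-free layout, assuming hypothetical names (secretsDir as the mount root, plus the originating namespace in the path); this is one possible fix, not the merged implementation:

package main

import (
	"fmt"
	"path"
)

// secretTLSPath namespaces the on-disk location of a TLS asset so that
// same-named secrets from different namespaces cannot collide, similar to
// what the old pathPrefix-based naming guaranteed.
func secretTLSPath(secretsDir, namespace, secretName, key string) string {
	return path.Join(secretsDir, namespace, secretName, key)
}

func main() {
	// Two ServiceMonitors in different namespaces can each reference a
	// secret called "tls" without their ca_file paths colliding:
	fmt.Println(secretTLSPath("/etc/prometheus/secrets", "team-a", "tls", "ca.pem"))
	fmt.Println(secretTLSPath("/etc/prometheus/secrets", "team-b", "tls", "ca.pem"))
}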

caResourceName = tls.CA.Secret.LocalObjectReference.Name
caPrefixedResourceName = "secret-" + caResourceName
if caResourceName != keySecretName && caPrefixedResourceName != certPrefixedResourceName {
promSpec.Secrets = append(promSpec.Secrets, caResourceName)
Contributor:

If this is the strategy we want to go with, then this function should rather return volumes and volumeMounts; modifying the Prometheus object with extra information inferred from some other settings is not the right thing to do here. We kind of have a precedent for this type of thing, which is the Alertmanager endpoint TLS configuration being in the same namespace as the Prometheus; I think that strategy is what we should do for remote write as well.

@ybettan (Author) commented Jul 8, 2020

I am not sure I completely understand what you mean here. I thought the point of prometheus.Spec.Secrets/prometheus.Spec.ConfigMaps is to describe the names of the secrets/configmaps to be mounted automatically at the StatefulSet stage.

This function doesn't actually do the mounting; it just records the names of the resources to be mounted later in pkg/prometheus/statefulset.go. Which volumes/volumeMounts would you expect to be returned?

Can you please elaborate a bit more on your preferred solution, the "alertmanager endpoint TLS configuration"?

Contributor:

Sorry, it appears the Alertmanager functionality doesn't actually exist. What I was trying to say is that prometheus.Spec.ConfigMaps is only ever supposed to be used by users; the operator should not set it in order to force some behavior, as that would be a cyclic dependency in some way. We violated this a bit in the early days of the Prometheus Operator, but we should aim to never modify the objects coming in to be processed; that only sets us up for confusion about what modifies the object where.
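
For illustration, a minimal sketch of the shape being suggested, with assumed names (remoteWriteTLSVolumes and its input are hypothetical, not the merged code): the config generator computes the volumes and mounts and returns them for the StatefulSet builder to add, leaving the incoming Prometheus object untouched:

package prometheus

import v1 "k8s.io/api/core/v1"

// remoteWriteTLSVolumes returns the volumes and mounts needed for the
// referenced TLS secrets instead of appending to promSpec.Secrets, so the
// incoming Prometheus object is never mutated by the operator.
func remoteWriteTLSVolumes(secretNames []string) ([]v1.Volume, []v1.VolumeMount) {
	var volumes []v1.Volume
	var mounts []v1.VolumeMount
	for _, name := range secretNames {
		volumes = append(volumes, v1.Volume{
			Name: "secret-" + name,
			VolumeSource: v1.VolumeSource{
				Secret: &v1.SecretVolumeSource{SecretName: name},
			},
		})
		mounts = append(mounts, v1.VolumeMount{
			Name:      "secret-" + name,
			ReadOnly:  true,
			MountPath: "/etc/prometheus/secrets/" + name,
		})
	}
	return volumes, mounts
}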

@brancz (Contributor) commented Jul 15, 2020

lgtm 👍 🎉
