This repository has been archived by the owner on Feb 22, 2022. It is now read-only.

[stable/concourse]: Add ServiceMonitor #11289

Closed
wants to merge 2 commits into from

Conversation

@ghost ghost commented Feb 8, 2019

What this PR does / why we need it:

This commit:

  • Adds documentation for Prometheus values.
  • Adds Prometheus Operator ServiceMonitor

Special notes for your reviewer:

Checklist

[Place an '[x]' (no spaces) in all applicable fields. Please remove unrelated fields.]

  • DCO signed
  • Chart Version bumped
  • Variables are documented in the README.md

Signed-off-by: Kamil Zabielski <xul.sitatirev@gmail.com>
@helm-bot helm-bot added the Contribution Allowed If the contributor has signed the DCO or the CNCF CLA (prior to the move to a DCO). label Feb 8, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: xulsitatirev
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: william-tran

If they are not already assigned, you can assign the PR to them by writing /assign @william-tran in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 8, 2019
@k8s-ci-robot
Contributor

Hi @xulsitatirev. Thanks for your PR.

I'm waiting for a helm member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Collaborator

@cirocosta cirocosta left a comment

Thanks for the PR!

Do you think these changes could come together with #10881?

thx!

@@ -81,6 +81,8 @@ The following table lists the configurable parameters of the Concourse chart and
| `web.ingress.enabled` | Enable Concourse Web Ingress | `false` |
| `web.ingress.hosts` | Concourse Web Ingress Hostnames | `[]` |
| `web.ingress.tls` | Concourse Web Ingress TLS configuration | `[]` |
| `web.prometheus.enabled` | Enable Prometheus metrics endpoint | `false` |
Collaborator

Hey, the prometheus.enabled thing under web is actually under concourse.web.prometheus.enabled, right? It belongs to the set of variables that map more directly to concourse web commands (which we end up not documenting in the README.md).

Author

@cirocosta I noticed that these variables are not documented.
I think we could improve my pull request by documenting them.

Collaborator

Yeah, not having those indeed breaks the established pattern of documenting all the values under values.yaml in README.md.

I'd say we can have a separate PR for that though - the change would involve a bunch of additions to the README.md. Wdyt?

Author

Definitely. I think this should be a separate pull-request.
I will be happy to help with that.

Collaborator

👍 I just opened #11300 to keep track of it

@@ -1,5 +1,5 @@
 name: concourse
-version: 3.7.2
+version: 3.7.3
Collaborator

Given that this ends up adding new functionality, would it make more sense to have it as a minor bump instead of a patch?

MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards-compatible manner, and
PATCH version when you make backwards-compatible bug fixes.
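
To make the rule concrete, here is a sketch of what the Chart.yaml would look like after a MINOR bump from 3.7.2. The resulting number is only what the semver rules above imply, not necessarily what ended up in the PR.

```yaml
# Chart.yaml after a MINOR bump from 3.7.2 (new, backwards-compatible feature).
name: concourse
version: 3.8.0   # MINOR incremented, PATCH reset to 0
```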

Author

@cirocosta Regarding semantic versioning, I fully agree.
I went with a patch bump because I didn't break the contract at any point and improved the documentation, but of course I also added a feature.
I will bump the version.

Author

Fixed.

@cirocosta
Collaborator

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 8, 2019
@ghost
Author

ghost commented Feb 8, 2019

> Thanks for the PR!
>
> Do you think these changes could come together with #10881?
>
> thx!

I fully agree that we should separate:

  • The Service.
  • The web endpoint.
  • The metrics endpoint.

I would also think about refactoring the variable convention.
The placement of the variables wasn't very clear to me.
I would remove the hard-coded annotations for Prometheus.
What do you think, @cirocosta?

@cirocosta
Collaborator

Regarding refactoring the variables and removing the hard-coded annotations, I think that's great!

At the same time, it'd be nice to stay as backward compatible as possible, although I think being able to support more workflows (not assuming stable#prometheus-operator or stable#prometheus - by not having too many opinions on the default variables) would end up requiring those changes anyway.

It'd also be nice to see if we can follow a pattern that might be already established out there in other charts around this 🤔

Maybe it's a matter of giving the refactor a try and collecting some thoughts from the other folks too.

Wdyt?

thx!

cc @william-tran

@ghost
Author

ghost commented Feb 9, 2019

@cirocosta

  • Regarding other charts: they mostly call it ServiceMonitor.
    I do not like this name, but I am fully open to using it.

serviceMonitor:

{{- if and (.Values.metrics.enabled) (.Values.metrics.serviceMonitor.enabled) }}

  • Regarding the hard-coded Prometheus annotations:
    this is where it gets inconsistent. Some Helm charts hard-code them as well, but as Service annotations. Some expose them as variables. Some simply hard-code them.

{{- if .Values.server.prometheus.scrape }}

prometheus.io/scrape: "true"

prometheus.io/scrape: "true"

  • I am all for the refactoring and will be happy to help.

@helm-bot helm-bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed Contribution Allowed If the contributor has signed the DCO or the CNCF CLA (prior to the move to a DCO). labels Feb 9, 2019
Signed-off-by: Kamil Zabielski <xul.sitatirev@gmail.com>
@helm-bot helm-bot added Contribution Allowed If the contributor has signed the DCO or the CNCF CLA (prior to the move to a DCO). size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 9, 2019
@cirocosta
Collaborator

cirocosta commented Feb 9, 2019

Thanks for putting the time into it!

From what you saw in the charts, does the following plan look like something that makes sense?

  • Prometheus Service segregated from web-svc (web-prometheus-svc.yaml ?)
  • ServiceMonitor as an object on its own, conditionally enabled by prometheus.service.serviceMonitor.enabled
    • web-servicemonitor.yaml conditionally created
    • selector matching the Prometheus service (created by web-prometheus.yaml or something similar)
  • Prometheus Service annotations with a default in values.yaml set to the standard prometheus.io/scrape
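
For illustration, the defaults implied by this plan could look roughly like the values.yaml sketch below. The key names follow the samples in this comment; the default values, the port, and the comments are assumptions, not something the chart is known to ship.

```yaml
# Hypothetical values.yaml defaults for the plan above (not a merged chart).
concourse:
  web:
    prometheus:
      enabled: false       # turns on the Concourse metrics endpoint
      bindPort: 1337       # illustrative port, same as in the samples in this comment

prometheus:
  service:
    enabled: false         # separate metrics Service, apart from web-svc
    annotations:           # assumed default: the standard scrape annotation
      prometheus.io/scrape: "true"
    serviceMonitor:
      enabled: false       # only meaningful when service.enabled is true
```

With everything off by default, existing installs would render the same objects as before, which fits the backward-compatibility concern raised earlier in the thread.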

With those, here are some sample configurations that I think we'd enable:

  1. Manually enabling helm/charts#stable/prometheus to scrape the pods directly.
concourse:
  web:
    prometheus:
      enabled: true
      bindPort: 1337

web:
  annotations: # overriding default annotations ({})
    prometheus.io/scrape: "true"
    prometheus.io/port: "1337"

This would generate just the regular Web service and put those annotations under the web deployment, thus, making the pods scrapable.

  2. Enabling a Prometheus service without creating ServiceMonitor
concourse:
  web:
    prometheus:
      enabled: true
      bindPort: 1337

prometheus:
  service:
    enabled: true
    annotations:  # overriding default annotations ({prometheus.io ...})
      something.else.io/scrape: "true"
      something.else.io/port: "1337"
    serviceMonitor:
      enabled: false

This would generate an extra service from web-prometheus.yaml, but not a ServiceMonitor, so that anyone who wants to create one themselves can still do so without necessarily relying on prometheus-operator.

  3. Enabling Prometheus service + automatic serviceMonitor creation
concourse:
  web:
    prometheus:
      enabled: true

prometheus:
  service:
    enabled: true
    serviceMonitor:
      enabled: true

This would generate both the extra prometheus service, as well as the ServiceMonitor object.


Wdyt?

ps.: the rationale for making prometheus.service.serviceMonitor chained in that way is that it seems like having a Service but not a ServiceMonitor object makes sense, but the opposite doesn't.
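
A minimal sketch of how that chaining could be expressed in the template, assuming the value names from the samples above. The file name, the object names, the labels, and the port name are hypothetical and would need to match whatever the metrics Service actually uses.

```yaml
# templates/web-servicemonitor.yaml (hypothetical file name from the plan above)
# Rendered only when both the metrics Service and the ServiceMonitor are enabled.
{{- if and .Values.prometheus.service.enabled .Values.prometheus.service.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ .Release.Name }}-web-prometheus   # illustrative naming
  labels:
    app: {{ .Release.Name }}-web
spec:
  selector:
    matchLabels:
      # Must match the labels on the separate Prometheus Service (assumed label).
      app: {{ .Release.Name }}-web-prometheus
  endpoints:
    - port: prometheus     # name of the Service port (assumed)
      interval: 30s
{{- end }}
```

Because the guard requires both flags, enabling the ServiceMonitor without the Service renders nothing. Note that a Prometheus instance managed by prometheus-operator only picks up ServiceMonitors whose labels match its serviceMonitorSelector, so the labels here would likely need to be configurable.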

Thx!

@ghost
Author

ghost commented Feb 9, 2019

  • I would simplify it.

(a) If someone wants to add prometheus.io annotations, they can handle that on their own with additional annotations for the Service. Nothing more is needed.

(b) If someone wants to add a ServiceMonitor, in my opinion we should deliver a template gated by a boolean in the values. The selector on the ServiceMonitor is created automatically.

(c) I would always create a Service for metrics (ultimately with a boolean values parameter to control it).

(d) I am all for separating the Service (endpoints) for the application itself from the one for monitoring. This is useful for NetworkPolicy.
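
As a rough illustration of point (d), a NetworkPolicy along these lines could restrict the metrics port to a monitoring namespace while leaving the web port open. Every label, name, and port below is an assumption made for the example. NetworkPolicy selects pods and ports rather than Services, so what it really relies on is the metrics endpoint listening on its own port.

```yaml
# Hypothetical NetworkPolicy; all names, labels and ports are assumptions.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: concourse-web-metrics
spec:
  podSelector:
    matchLabels:
      app: concourse-web          # assumed label on the web pods
  policyTypes:
    - Ingress
  ingress:
    # Metrics port: reachable only from namespaces labelled name=monitoring.
    - from:
        - namespaceSelector:
            matchLabels:
              name: monitoring
      ports:
        - protocol: TCP
          port: 1337              # illustrative metrics bindPort
    # Web port: left open to all sources, since selecting the pods for
    # Ingress denies any traffic that no rule explicitly allows.
    - ports:
        - protocol: TCP
          port: 8080              # assumed web port
```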

What do you think?
If you agree, you can create a ticket. I will gladly help you or review your code.

@ghost
Author

ghost commented Feb 14, 2019

@cirocosta Do you want me to resolve the conflict?

@cirocosta
Collaborator

Hi @xulsitatirev , yeah, please!

The plan you outlined sounds good to me. How do you feel about going for it in this very same PR?

Thx!

@ghost
Author

ghost commented Feb 15, 2019

@cirocosta I will handle it.

@stale

stale bot commented Mar 17, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Any further update will cause the issue/pull request to no longer be considered stale. Thank you for your contributions.

@stale stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 17, 2019
@stale

stale bot commented Mar 31, 2019

This issue is being automatically closed due to inactivity.

@stale stale bot closed this Mar 31, 2019
Labels
Contribution Allowed If the contributor has signed the DCO or the CNCF CLA (prior to the move to a DCO). lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. ok-to-test size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

3 participants