Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metrics collection profiles #1298

Open
wants to merge 26 commits into
base: master
Choose a base branch
from

Conversation

JoaoBraveCoding
Copy link
Contributor

@JoaoBraveCoding JoaoBraveCoding commented Dec 7, 2022

This PR proposes the addition of the metrics collection profiles feature for the cluster-monitoring-operator

Issue https://issues.redhat.com/browse/MON-3043

@JoaoBraveCoding
Copy link
Contributor Author

JoaoBraveCoding commented Dec 7, 2022

/assign @simonpasquier @jan--f

Some sections still need to be finished and I still want to polish the text in some parts but consider it a v1 ;)

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 7, 2022
@JoaoBraveCoding JoaoBraveCoding changed the title Scrape Profiles WIP: Scrape Profiles Dec 7, 2022
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Dec 7, 2022

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

Copy link
Contributor

@jan--f jan--f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, I left a few comments to start us off.

enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
Copy link
Contributor

@simonpasquier simonpasquier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice start!

enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
Copy link
Contributor

@jan--f jan--f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more comments, but I think this looks pretty good. I think the draft status can be removed soon so we can gather feedback from others.

ServiceMonitor examples

Signed-off-by: JoaoBraveCoding <jmarcal@redhat.com>
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
enhancements/monitoring/scrape-profiles.md Outdated Show resolved Hide resolved
JoaoBraveCoding and others added 3 commits January 20, 2023 12:58
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: JoaoBraveCoding <jmarcal@redhat.com>
@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 11, 2023
@JoaoBraveCoding
Copy link
Contributor Author

/remove-lifecycle stale

@openshift-ci openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 13, 2023
@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 11, 2023
@openshift-bot
Copy link

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 16, 2024
@rexagod
Copy link
Member

rexagod commented Jan 17, 2024

/remove-lifecycle rotten

Some thoughts on merging/layering the Collection Profiles featureset with the EP in question.

The primary setback I can observe is the fact that the two differ w.r.t. the degree of dynamism they offer. Users are allowed to switch back and forth between profiles, but w.r.t. the EP, they will need to "commit" to one of the full or minimal profiles for the cluster lifecycle because (2) we plan to support turning a component on, but not turning it off. This is in line with other superseding use-cases to ours, such as (5) eliminating or replacing cluster profiles which has been categorized as a non-goal.

Furthermore, since we cannot embed logic into the installer itself ((6) encoding logic in the installer itself about which components should be disabled under specific circumstances ), at the point where it's initialized, there would be little to no components that actually define profiles to validate in the first place. The installer will ideally need to check for the profiles defined by these OLM components' exporters down the line, and validate them every now and then, which seems fall in the EP non-goals ((4) allowing components to be disabled post-install).

Finally, every dependent resource that's a part the Collection Profiles featureset, such as rules, alerts, SMs, etc., that components deploy may or may not have the capability annotation with it, but may define a collection profile no less, which can cause them to be left out, and may not be the robust assumption we'd like to have (since we'd be subject to assuming that every component respects the capability annotation).

Consequently, there are constraints on embedding logic to handle specific cases such as updating service monitors with the appropriate regex for the currently set capability.openshift.io/name=monitoring_collection_profiles_{full,minimal} annotation. If dynamism is something we can tolerate to skip, the EP would still delegate such operations back to CMO, since it has no intention of doing that itself (no embedded logic), while including this EP as a dependency into the currently established workflow, which we'll have to build on top of.

All that being said, one way partially to go about this (while respecting the terms imposed) is by defining a subset capability (minimal), while the superset (full) can be optionally enabled down the line, if need be, since ((2) admins can enable a previously excluded optional component, at runtime), although it's easy to see this will not be of much use owing to the constraints that shifts control from components to the central stack. Additionally, it's worth explicitly mentioning here that this procedure goes forward, is irreversible, and can only happen once.

This is what leads me to believe it's better to keep the two decoupled to harness the most out of each featureset.

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 17, 2024
@dhellmann
Copy link
Contributor

#1555 is changing the enhancement template in a way that will cause the header check in the linter job to fail for existing PRs. If this PR is merged within the development period for 4.16 you may override the linter if the only failures are caused by issues with the headers (please make sure the markdown formatting is correct). If this PR is not merged before 4.16 development closes, please update the enhancement to conform to the new template.

@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 13, 2024
@openshift-bot
Copy link

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 20, 2024
@openshift-bot
Copy link

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Mar 28, 2024
Copy link
Contributor

openshift-ci bot commented Mar 28, 2024

@openshift-bot: Closed this PR.

In response to this:

Rotten enhancement proposals close after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Reopen the proposal by commenting /reopen.
Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Exclude this proposal from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dhellmann
Copy link
Contributor

(automated message) This pull request is closed with lifecycle/rotten. The associated Jira ticket, MON-2483, has status "Closed, Done". Should the PR be reopened, updated, and merged? If not, removing the lifecycle/rotten label will tell this bot to ignore it in the future.

@rexagod
Copy link
Member

rexagod commented Mar 30, 2024

/reopen
/remove-lifecycle rotten
/lifecycle frozen

@openshift-ci openshift-ci bot reopened this Mar 30, 2024
Copy link
Contributor

openshift-ci bot commented Mar 30, 2024

@rexagod: Reopened this PR.

In response to this:

/reopen
/remove-lifecycle rotten
/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 30, 2024
Copy link
Contributor

openshift-ci bot commented Mar 30, 2024

@rexagod: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/reopen
/remove-lifecycle rotten
/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Contributor

openshift-ci bot commented Mar 30, 2024

@JoaoBraveCoding: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/markdownlint 48a5c98 link true /test markdownlint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link

Inactive enhancement proposals go stale after 28d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle stale.
Stale proposals rot after an additional 7d of inactivity and eventually close.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 28, 2024
@openshift-bot
Copy link

Stale enhancement proposals rot after 7d of inactivity.

See https://github.com/openshift/enhancements#life-cycle for details.

Mark the proposal as fresh by commenting /remove-lifecycle rotten.
Rotten proposals close after an additional 7d of inactivity.
Exclude this proposal from closing by commenting /lifecycle frozen.

If this proposal is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels May 5, 2024
rexagod added a commit to rexagod/origin that referenced this pull request May 5, 2024
The tests cover the following cases:
* Apply every collection profile, and check if appropriate metrics are
  available (or unavailable).
* CMO configuration reconcilation after a collection profile is applied
  is checked within the CMO tests itself.
* Prometheus `*monitor` selectors' reconciliation after a collection
  profile is applied is checked within the CMO tests itself.

Refer: openshift/enhancements#1298

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
rexagod added a commit to rexagod/origin that referenced this pull request May 5, 2024
The tests cover the following cases:
* Apply every collection profile, and check if the expected
  `ServiceMonitor`s are available.
* CMO configuration reconciliation after a collection profile is applied
  is checked within the CMO tests itself.
* Prometheus `*monitor` selectors' reconciliation after a collection
  profile is applied is checked within the CMO tests itself.

Refer: openshift/enhancements#1298

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
rexagod added a commit to rexagod/origin that referenced this pull request May 7, 2024
The tests cover the following cases:
* Apply every collection profile, and check if the expected
  `ServiceMonitor`s are available.
* CMO configuration reconciliation after a collection profile is applied
  is checked within the CMO tests itself.
* Prometheus `*monitor` selectors' reconciliation after a collection
  profile is applied is checked within the CMO tests itself.

Refer: openshift/enhancements#1298

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
rexagod added a commit to rexagod/origin that referenced this pull request May 7, 2024
The tests cover the following cases:
* Apply every collection profile, and check if the expected
  `ServiceMonitor`s are available.
* CMO configuration reconciliation after a collection profile is applied
  is checked within the CMO tests itself.
* Prometheus `*monitor` selectors' reconciliation after a collection
  profile is applied is checked within the CMO tests itself.

Refer: openshift/enhancements#1298

Signed-off-by: Pranshu Srivastava <rexagod@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants