[GEP-19] Monitoring Stack - Migrating to the prometheus-operator
#6151
Skipping CI for Draft Pull Request.
@wyb1: The label(s) In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
prometheus-operator
/assign
minor wording issues
Co-authored-by: Wesley Bermbach <wesley.bermbach@sap.com> Co-authored-by: Istvan Zoltan Ballok <istvan.zoltan.ballok@sap.com>
Thanks for the reviews from everyone so far. Question: Should I create new commits instead of force pushing to make the changes easier to track?
Thanks a lot for this proposal, really looking forward to it!
I'm not through with my review, but I already left some questions and thoughts :)
Made it through the document and added some more comments.
Overall, it looks pretty good and I'm excited for this :)
In general, it would be good to precisely define all relevant contracts already in this GEP. This will make it easier to agree on something before jumping into the implementation.
Also, please open an umbrella issue with the concrete steps for implementing this proposal once this PR gets merged :)
Thanks for addressing my comments/questions. On a high level, I don't have additional comments, lgtm.
We can use gardener/monitoring#14 as an umbrella issue. I will add items there.
…6293)
* `garden` namespace deployment is only needed for second kind cluster
  In the first kind cluster, the `garden` namespace already exists because it runs the Gardener control plane. Without this, the second client-side apply removes the project labels from the `Namespace`.
* Enable `ShootVPAEnabledByDefault` admission plugin in local setup
* Simplify gardenlet bootstrap kubeconfig
  This makes it usable for locally created `ManagedSeed`s and follows the same pattern as in `example/gardener-local/gardenlet/values-kind2.yaml`.
* Add documentation for `ManagedSeed`s
* Add example Kubernetes resources for local `ManagedSeed`s
* Register `node` webhook for shoot clusters
* Introduce seed-local container image registry
  We will use this registry as a mirror for shoots such that the images built by Skaffold are accessible from the shoot cluster in case it gets registered as `ManagedSeed`. This is needed because we don't want to push the images built by Skaffold to any official, publicly available registry. In the future, we might even be able to reuse this such that we can speed up the image pull processing times.
* Allow machine pods to talk to the registry
* Mutate `containerd` config to import additional configuration files
  This only applies to newly created nodes.
* Bump `machine-controller-manager-provider-local` image
  This includes gardener-attic/machine-controller-manager-provider-local@f2c9319 which allows machine pods to talk to the seed API server. In the local setup, the seed API server is also the garden API server and the gardenlet needs to talk to it to register the `Seed`.
* Mount `backup-path` volume with `DirectoryOrCreate` mode in `provider-local`
  This will create the directory if it does not exist, which is the case for shoot clusters registered as `ManagedSeed`s.
* Explicitly delete `Ingress`es in seed deletion
  This effectively fixes #6062. It's actually a workaround since not all seed system components (like the monitoring stack) are deployed via `ManagedResource`s yet. Hence, `gardener-resource-manager` does not clean this up for us and we have to delete the resources manually. We only do it for `Ingress`es now to fix the above-mentioned bug since the deployment of the monitoring stack is planned to be refactored anyway with [GEP-19](#6151).
* Only delete seed system resources after all other resources are deleted
  Otherwise, we might delete important system resources like `PriorityClass`es, which can cause extensions to not come up anymore (e.g. after being scaled by VPA). This can result in a deadlock during seed deletion.
* Make `defaultShoot` function reusable in `e2e` test packages
* Add e2e test for `ManagedSeed`s
* Drop testmachinery-based `ManagedSeed` test
  This is no longer valuable now that we have an e2e test which can run on each PR and periodically on `master` branch.
* Address PR review feedback
* Address PR review feedback
Co-authored-by: Tim Ebert <tim.ebert@sap.com>
/lgtm
/hold cancel
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: timebertt. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold
/lgtm
/override pull-gardener-e2e-kind
@timebertt: Overrode contexts on behalf of timebertt: pull-gardener-e2e-kind In response to this:
* 'master' of github.com:gardener/gardener: (51 commits)
  Switch extension controller to `logr` and streamline/cleanup logs (gardener#6332)
  Switch `./test/...` packages to `logr` and drop `github.com/sirupsen/logrus` dependency (gardener#6316)
  Only check shoot conditions during hibernation integration test (gardener#6325)
  Add dashboard for monitoring conntrack race failures. (gardener#6329)
  Reconcile quota before rbac (gardener#6326)
  Update istio to v1.14.1 (gardener#6271)
  Update gardenlet's base image to alpine:3.16.0 (gardener#6321)
  Update envoy proxy to v1.21.4 (gardener#6320)
  Deploy the metrics server to the kind cluster (gardener#6301)
  Fix tools download for aarch64 (arm64) (gardener#6314)
  update with latest CA releases (gardener#6295)
  Add missing unit tests for the predicates provided by the extensions library (gardener#6249)
  [GEP-19] Monitoring Stack - Migrating to the `prometheus-operator` (gardener#6151)
  Revert "Recreate DWD deployment if needed" (gardener#6307)
  Update to golang 1.18.4 (gardener#6300)
  Cleaned up imports in vpn-seed-server (gardener#6315)
  Prepare next Dev Cycle v1.52.0-dev
  Release v1.51.0
  Add pre/post reconciliation/deletion hooks for the Worker resource (gardener#6290)
  Update the supported values in the usage text of the `--leader-election-resource-lock` flag (gardener#6304)
  ...
How to categorize this PR?
/area monitoring
/area documentation
/kind discussion
What this PR does / why we need it:
A proposal on how Gardener can migrate to the prometheus operator.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Release note: