
Integrate enhanced meltdown handling for dependency-watchdog probe #5497

Merged
merged 4 commits into gardener:master from dwd-meldown-handling on Mar 8, 2022

Conversation

Contributor

@ashwani2k ashwani2k commented Mar 1, 2022

How to categorize this PR?

/area disaster-recovery
/area control-plane
/kind enhancement
/squash

What this PR does / why we need it:
This PR integrates the changes brought in with release v0.7.0 of dependency-watchdog.

The release brings in two prominent changes.

  1. The changes introduced with DWD PR#39 are:

    1. The dependency-watchdog probe config YAML is enhanced with new fields to enable scale[Up|Down] delays.
    2. Each scale resource can now declare its dependencies, so that a resource is only scaled [Up|Down] once its dependents are already in the desired state for the given scale operation.

    As a result of these two changes, the new config file is adapted to bring down kube-controller-manager along with machine-controller-manager and cluster-autoscaler.
    While scaling up, we give a 120-second delay to let the kubelets update the node status before scaling up kube-controller-manager, and another 60-second delay to machine-controller-manager after kube-controller-manager is up, to ensure that the node status is updated correctly by KCM.
    cluster-autoscaler comes up immediately once machine-controller-manager is up.
    This ensures that we don't accidentally let MCM and cluster-autoscaler replace machines, and that they wait for api-server availability before taking any action.
    These changes are reflected in the new config file depicted below and are adapted accordingly in the component files for dependency-watchdog.

     probes:
     - dependantScales:
       - replicas: null
         scaleRef:
           apiVersion: apps/v1
           kind: Deployment
           name: kube-controller-manager
         scaleUpDelaySeconds: 120
       - replicas: null
         scaleUpDelaySeconds: 60
         scaleRef:
           apiVersion: apps/v1
           kind: Deployment
           name: machine-controller-manager
         scaleRefDependsOn:
         - apiVersion: apps/v1
           kind: Deployment
           name: kube-controller-manager
       - replicas: null
         scaleRef:
           apiVersion: apps/v1
           kind: Deployment
           name: cluster-autoscaler
         scaleRefDependsOn:
         - apiVersion: apps/v1
           kind: Deployment
           name: machine-controller-manager
    
  2. The second set of changes comes with the switch of leader election to endpoint leases, introduced with PR#37 by @ary1992.
    This change requires the existing cluster role for dependency-watchdog-probe to be enhanced with the following:

      - apiGroups:
        - coordination.k8s.io
        resources:
        - leases
        verbs:
        - create
      - apiGroups:
        - coordination.k8s.io
        resourceNames:
        - dependency-watchdog-probe
        - dependency-watchdog-endpoint
        resources:
        - leases
        verbs:
        - get
        - watch
        - update
    

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

  • Kindly review whether the delays introduced for KCM and MCM are sufficient to avoid a meltdown.
  • The changes are tested with a local setup of Gardener, applying them on a shoot.
  • However, one round of scaffolding tests (upgrading an existing Gardener cluster to the image of this PR and a newer version of DWD) still needs to be verified; this shall be done while the review of the PR is in progress.
  • We also noticed that the DWD release is not updated to the latest Go release and still builds with Go 1.13. This was an oversight, as there has been no DWD release for some time, but we wish to address it in the imminent patch release, since we still have some pending tasks to follow up on:
    • Update the README.md with the changes introduced.
    • Update the test cases for DWD to work with the config enhancements.
    • Work on the DWD issue of reading the kubeconfigs when loading the rotated secrets (PR#36).
      Given these imminent changes, we can upgrade the Go version to 1.17.7 with the next patch release of DWD, which will be integrated with g/g.

Release note:

Operators can now provide `scaleUpDelaySeconds` and/or `scaleDownDelaySeconds` for individual dependent resources, which the dependency-watchdog probe considers while scaling.
In addition to the delay, for each resource managed by the dependency-watchdog probe one can also specify additional dependent resources via a new field `scaleRefDependsOn`. This ensures that the dependency-watchdog probe applies a scaling operation to a resource only if the dependents defined under `scaleRefDependsOn` are in the desired state for the applicable scaling operation.
Switch the default leader election resource lock for `dependency-watchdog` from `endpoints` to `endpointsleases`.
Enhance the package structure to isolate APIs.
Export types in `pkg/restarter` and `pkg/scaler` to make them reusable for other packages.
Fix panic during shoot spec and status check.
License and copyright information is now specified in REUSE format.

@ashwani2k ashwani2k requested a review from a team as a code owner March 1, 2022 11:37
@gardener-robot gardener-robot added needs/review area/control-plane Control plane related area/disaster-recovery Disaster recovery related kind/enhancement Enhancement, improvement, extension size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 1, 2022
@ashwani2k
Contributor Author

Currently the e2e tests are failing because I've imported github.com/gardener/gardener/extensions/pkg/controller/worker/genericactuator to get the MCMDeploymentName here, since there was no constant available within g/g to capture this.
If someone has a better idea of how this can be referenced, I can adapt that as well.

@rfranzke
Member

rfranzke commented Mar 1, 2022

/assign

@rfranzke
Member

rfranzke commented Mar 1, 2022

Currently the e2e tests are failing as I've imported github.com/gardener/gardener/extensions/pkg/controller/worker/genericactuator to get the MCMDeploymentName here as there was no constant available within g/g to capture this. If someone has a better idea of ensuring this can be traced to then I can adapt that as well.

Actually, MCM is an implementation detail of the Worker controller, but I think as part of this PR it is fair to let g/g populate it. If an extension does not use MCM and rather manages the machines on its own or via different means, then DWD will simply have nothing that needs to be done/scaled down.
Potentially we can move the constant to the v1beta1constants package (where most other constants are) to resolve the import restriction issue.
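
For illustration, a minimal sketch of what that could look like; the constant name DeploymentNameMachineControllerManager and its exact location are assumptions here, mirroring the existing DeploymentNameClusterAutoscaler pattern:

// Sketch only: the constant name and its location are assumptions, following
// the existing v1beta1constants.DeploymentName* pattern, so that the
// dependency-watchdog probe config no longer needs to import the extensions
// genericactuator package.
package example

import (
    appsv1 "k8s.io/api/apps/v1"
    autoscalingv1 "k8s.io/api/autoscaling/v1"
)

// DeploymentNameMachineControllerManager would live next to the other
// DeploymentName* constants in pkg/apis/core/v1beta1/constants.
const DeploymentNameMachineControllerManager = "machine-controller-manager"

// mcmScaleRef shows how the probe config could reference the MCM Deployment
// via the shared constant instead of the extensions library import.
var mcmScaleRef = autoscalingv1.CrossVersionObjectReference{
    APIVersion: appsv1.SchemeGroupVersion.String(),
    Kind:       "Deployment",
    Name:       DeploymentNameMachineControllerManager,
}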

Member

@rfranzke rfranzke left a comment

Please also bump the image in charts/images.yaml and incorporate the release notes from #5494 so that we have everything related to the change in one PR.

@acumino
Member

acumino commented Mar 1, 2022

@rfranzke Should the k8s dependencies and c-r also be bumped to latest? I checked that DWD is using v0.17 of the k8s dependencies and v0.5.5 of c-r.

@rfranzke
Member

rfranzke commented Mar 1, 2022

@acumino I think this is not related to this PR but rather to the maintenance of https://github.com/gardener/dependency-watchdog/, is it?

@acumino
Member

acumino commented Mar 1, 2022

@acumino I think this is not related to this PR but rather to the maintenance of https://github.com/gardener/dependency-watchdog/, is it?

Yes, but @ashwani2k mentioned he will cut the patch release with the latest Go version; I guess the k8s dependencies should be updated alongside. WDYT?

@rfranzke
Member

rfranzke commented Mar 1, 2022

I don't know whether this is strictly required, though it's certainly a reasonable thing to do. This must be decided by the maintainers of https://github.com/gardener/dependency-watchdog/.

@@ -15,6 +15,7 @@
package kubeapiserver

import (
worker "github.com/gardener/gardener/extensions/pkg/controller/worker/genericactuator"
Member

Why are we importing a pkg from the extension library just to use a constant? I guess this is also why make verify fails (import-boss does not allow importing the extension library from ./pkg).

Contributor Author

@ashwani2k ashwani2k Mar 1, 2022

I couldn't find any constant for this in g/g's v1beta1constants, so I used the one from the extension, as I didn't want to introduce a new one that might go untracked.
However, @rfranzke suggested in #5497 (comment) to add it to v1beta1constants.
So this is fixed with the commit.

Member

With GEP-01 (extensibility), cloud-provider-specific details are extracted to extensions. gardenlet does not need to know anything about MCM and cannot make any assumption that MCM is used. Actually, from gardenlet's point of view there is only the Worker resource, and that is the contract. The fact that provider extensions choose to deploy MCM as part of the Worker reconciliation is not something gardenlet can assume. IMO this PR violates GEP-01, as it makes gardenlet configure dependency-watchdog assuming that MCM is used. In theory, a provider extension can implement the contract without using MCM.


Contributor Author

I agree with your concern, @ialidzhikov.
Do you suggest an alternate approach here, or are you fine with adding the MCM deployment name to the constants?

Member

It looks like it is hard to come up with an alternative for the new scaleRefDependsOn approach. Generally, having the old config in mind, I was thinking of a well-known label configured in dependency-watchdog-probe: it would scale down all Deployments that match the well-known label. This allows extensions using MCM to add the well-known label to the MCM Deployment and in this way to "request" that MCM be scaled down.
Do we actually need scaleRefDependsOn? Can't we simply scale down all components when the probe fails?
Assuming that the scaleRefDependsOn handling is needed, I am "fine" with the current approach because I cannot think of a good alternative.

Contributor Author

@ashwani2k ashwani2k Mar 2, 2022

We can think about it. For now I didn't want to introduce new semantics for identifying a scale resource, as the implementation already requires us to provide MCM as part of scaleRef, as shown below:

scaleRef:
  apiVersion: apps/v1
  kind: Deployment
  name: machine-controller-manager

So scaleRefDependsOn is not the issue here; if we want to do what you suggest, we would also need to change the design of scaleRef itself to use a new approach for selecting the deployment.

Do we actually need scaleRefDependsOn? Can't we simply scale down all components when the probe fails?

The logic currently works seamlessly for both scale-up and scale-down.
As you mentioned, scaling down is done all at once. But we need to consider the semantics of scale-up here. This is where we have a problem today: we scale up everything at once, which becomes detrimental once MCM is additionally managed along with KCM. Once MCM comes up, it will mark the nodes as Unknown before giving KCM any chance to update the node status, and will start removing them. To avoid this we wish to delay it using scaleUpDelaySeconds.
However, even then we run the risk of KCM not being available; just starting MCM with some delay, without checking whether KCM is up, would lead us to the same issue. So we introduced the scaleRefDependsOn semantics to avoid running MCM on a state of the system that has not yet been updated by KCM.
To not reinvent the wheel, scaleRefDependsOn is just an array of the same type as scaleRef.

We will explore if there is a better way to do it without breaking the extension contract for Gardener.
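
For illustration only, a rough sketch of the scale-up ordering that scaleUpDelaySeconds and scaleRefDependsOn encode; this is not the actual dependency-watchdog code, and the scaleTarget type as well as the isAvailable/scaleUp callbacks are hypothetical stand-ins:

package example

import (
    "context"
    "fmt"
    "time"
)

// scaleTarget mirrors one entry of the probe config: which Deployment to
// scale, the delay before scaling it up, and which Deployments must already
// be back up first (scaleRefDependsOn).
type scaleTarget struct {
    name         string
    scaleUpDelay time.Duration
    dependsOn    []string
}

// scaleUpInOrder waits until all dependencies of a target are available,
// then waits the configured delay, and only then scales the target up.
func scaleUpInOrder(ctx context.Context, targets []scaleTarget,
    isAvailable func(ctx context.Context, name string) (bool, error),
    scaleUp func(ctx context.Context, name string) error,
) error {
    for _, t := range targets {
        for _, dep := range t.dependsOn {
            for {
                ok, err := isAvailable(ctx, dep)
                if err != nil {
                    return fmt.Errorf("checking dependency %s of %s: %w", dep, t.name, err)
                }
                if ok {
                    break
                }
                select {
                case <-ctx.Done():
                    return ctx.Err()
                case <-time.After(5 * time.Second): // poll until e.g. KCM is up again
                }
            }
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(t.scaleUpDelay): // e.g. 60s for MCM after KCM is up
        }
        if err := scaleUp(ctx, t.name); err != nil {
            return err
        }
    }
    return nil
}

With the config above, kube-controller-manager (120s delay, no dependencies) is handled first, then machine-controller-manager (60s delay, depends on kube-controller-manager), and finally cluster-autoscaler (no delay, depends on machine-controller-manager).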

Member

I again suggest being pragmatic here and rather focusing on getting this change in, so that we can move on to the other (blocking) issues like gardener/dependency-watchdog#36. IMO it's not problematic to specify MCM here, as explained above.

Member

We will explore if there is a better way to do it without breaking the extension contract for Gardener.

However, can you create an issue on dependency-watchdog side to make sure that we don't forget about this and this item is considered/worked on/brainstormed on when there is capacity? Thanks in advance!

Contributor Author

@ashwani2k ashwani2k Mar 8, 2022

@ialidzhikov As suggested, I created an issue on DWD to track this.


ScaleRef: autoscalingv1.CrossVersionObjectReference{
    APIVersion: appsv1.SchemeGroupVersion.String(),
    Kind:       "Deployment",
    Name:       v1beta1constants.DeploymentNameClusterAutoscaler,
Member

What happens if the Shoot does not enable autoscaling -> the cluster-autoscaler deployment is not present?

E0304 07:15:46.904892       1 prober.go:405] Scaling up dependents of shoot-kube-apiserver/shoot--foo--bar: apps/v1.Deployment/cluster-autoscaler: replicas=1: failed
E0304 07:15:46.906923       1 prober.go:452] Scaling up dependents of shoot-kube-apiserver/shoot--foo--bar: apps/v1.Deployment/cluster-autoscaler: error getting deployments.apps: deployments/scale.apps "cluster-autoscaler" not found
E0304 07:15:46.906938       1 prober.go:500] Scaling up dependents of shoot-kube-apiserver/shoot--foo--bar: apps/v1.Deployment/cluster-autoscaler: Could not get target reference: deployments/scale.apps "cluster-autoscaler" not found
E0304 07:15:46.906943       1 prober.go:501] Scaling up dependents of shoot-kube-apiserver/shoot--foo--bar: apps/v1.Deployment/cluster-autoscaler: replicas=1: failed

One drawback I see is that in this case the logs are "polluted" with such error logs. And this is only 1 Shoot; imagine 50 Shoots on a Seed with cluster-autoscaler disabled. dependency-watchdog should rather know that this component is optional and log at info level something like "the Deployment is not present, hence skipping it".
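
A minimal sketch (assuming standard client-go and klog; not the actual dependency-watchdog code) of how an optional target like cluster-autoscaler could be skipped with an info-level log instead of repeated errors:

package example

import (
    "context"

    apierrors "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/klog/v2"
)

// scaleDeploymentIfPresent treats a missing target Deployment (e.g.
// cluster-autoscaler on a Shoot without autoscaling) as "nothing to do"
// instead of reporting an error on every probe run.
func scaleDeploymentIfPresent(ctx context.Context, c kubernetes.Interface, namespace, name string, replicas int32) error {
    deploy, err := c.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
    if apierrors.IsNotFound(err) {
        klog.V(2).Infof("%s/%s: deployment not present, skipping scaling", namespace, name)
        return nil
    }
    if err != nil {
        return err
    }
    deploy.Spec.Replicas = &replicas
    _, err = c.AppsV1().Deployments(namespace).Update(ctx, deploy, metav1.UpdateOptions{})
    return err
}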

Member

If this does not cause issues for dependency-watchdog's operation, then I am also fine with creating an issue on the dependency-watchdog side about the verbose error logging, in the hope that this can be fixed with an upcoming version of the component.

Contributor Author

@ashwani2k ashwani2k Mar 7, 2022

A totally valid concern. I've filed an issue with the DWD repo. I also filed a PR for the fix, which also vendors Go 1.17, to be part of the patch release v0.7.1.

The new logs when the cluster-autoscaler is available will be:

I0307 15:11:17.440689       1 prober.go:368] shoot-kube-apiserver/shoot--<project>--<shootname>/external: probe result: &scaler.probeResult{lastError:(*url.Error)(0xc000523200), resultRun:4}
I0307 15:11:17.440816       1 prober.go:414] Scaling down dependents of shoot-kube-apiserver/shoot--<project>--<shootname>: apps/v1.Deployment/kube-controller-manager: skipped because desired=0 and current=0
I0307 15:11:17.441299       1 prober.go:414] Scaling down dependents of shoot-kube-apiserver/shoot--<project>--<shootname>: apps/v1.Deployment/machine-controller-manager: skipped because desired=0 and current=0
I0307 15:11:17.441319       1 prober.go:414] Scaling down dependents of shoot-kube-apiserver/shoot--<project>--<shootname>: apps/v1.Deployment/cluster-autoscaler: skipped because desired=0 and current=0
I0307 15:11:37.582672       1 reflector.go:268] github.com/gardener/gardener/pkg/client/extensions/informers/externalversions/factory.go:117: forcing resync

When the cluster-autoscaler deployment is not present, they will be:

I0307 15:11:37.582763       1 scaler.go:67] Update event on cluster: shoot--<project>--<shootname>
I0307 15:11:38.692669       1 reflector.go:268] k8s.io/client-go/informers/factory.go:135: forcing resync
I0307 15:11:51.013434       1 prober.go:353] shoot-kube-apiserver/shoot--<project>--<shootname>/internal: probe succeeded
I0307 15:11:51.013449       1 prober.go:368] shoot-kube-apiserver/shoot--<project>--<shootname>/internal: probe result: &scaler.probeResult{lastError:error(nil), resultRun:4}
I0307 15:11:51.024917       1 prober.go:356] shoot-kube-apiserver/shoot--<project>--<shootname>/external: probe failed with error: Get "https://api.<shootname>.<cluster-address>.com/version?timeout=10s": dial tcp: lookup api.<shootname>.<cluster-address>.com on 100.64.0.10:53: no such host. Will retry...
I0307 15:11:51.024987       1 prober.go:368] shoot-kube-apiserver/shoot--<project>--<shootname>/external: probe result: &scaler.probeResult{lastError:(*url.Error)(0xc0005a1230), resultRun:4}
I0307 15:11:51.025030       1 prober.go:414] Scaling down dependents of shoot-kube-apiserver/shoot--<project>--<shootname>: apps/v1.Deployment/kube-controller-manager: skipped because desired=0 and current=0
I0307 15:11:51.025280       1 prober.go:414] Scaling down dependents of shoot-kube-apiserver/shoot--<project>--<shootname>: apps/v1.Deployment/machine-controller-manager: skipped because desired=0 and current=0
E0307 15:11:51.025351       1 prober.go:405] Scaling down dependents of shoot-kube-apiserver/shoot--<project>--<shootname>: apps/v1.Deployment/cluster-autoscaler: Skipped as target reference: deployment.apps "cluster-autoscaler" not found
I0307 15:12:07.583315       1 reflector.go:268] github.com/gardener/gardener/pkg/client/extensions/informers/externalversions/factory.go:117: forcing resync
I0307 15:12:07.583545       1 scaler.go:67] Update event on cluster: shoot--<project>--<shootname>
Once it's merged I can also cut a patch release and vendor it here, or we vendor it later if it takes a lot more time to merge.

Member

Thanks for the follow-up. Should we wait for dependency-watchdog@v0.7.1 as part of this PR or should we proceed with dependency-watchdog@v0.7.0?

Contributor Author

I'll check with @shreyas-s-rao if we can merge today and release a patch. In that case we can go with v0.7.1.
If we cannot release it today, then you can go ahead and we will vendor it along with the bug fix for the secret rotation issue, which will require a patch release.

Contributor Author

@ashwani2k ashwani2k Mar 8, 2022

@ialidzhikov I checked, but we won't be able to cut the release today, as Shreyas won't be able to complete the review today due to other things at hand. So either we wait until tomorrow, as we are confident of releasing it then, or we vendor it with the next set of changes for DWD.

Member

@rfranzke rfranzke left a comment

/lgtm

Member

@ialidzhikov ialidzhikov left a comment

/lgtm

/hold
for a while because of #5497 (comment)

@ialidzhikov
Member

/unhold

@rfranzke
Member

rfranzke commented Mar 8, 2022

@kris94 Can you please add the release milestone and merge this one?

@krgostev krgostev added this to the v1.42 milestone Mar 8, 2022
@krgostev
Contributor

krgostev commented Mar 8, 2022

/reviewed/ok-to-test

@rfranzke rfranzke merged commit f5b355e into gardener:master Mar 8, 2022
@ashwani2k ashwani2k deleted the dwd-meldown-handling branch March 9, 2022 03:26
krgostev pushed a commit to krgostev/gardener that referenced this pull request Apr 21, 2022
…ardener#5497)

* Vendored dependency-watchdog 0.7.0

* Adapted dependency-watchdog component to work with DWD v0.7.0

* Updated charts for DWD and adapted for MCM deployment under v1beta1 constants

* Adapted RBAC for dependency-watchdog-endpoint as well
krgostev pushed a commit to krgostev/gardener that referenced this pull request Jul 5, 2022
…ardener#5497)

* Vendored dependency-watchdog 0.7.0

* Adapted dependency-watchdog component to work with DWD v0.7.0

* Updated charts for DWD and adapted for MCM deployment under v1beta1 constants

* Adapted RBAC for dependency-watchdog-endpoint as well