Merge pull request #250 from joelsmith/rebase

OCPCLOUD-1851: Upstream rebase to CA 1.26.1 and VPA 0.13

openshift-merge-robot committed Feb 23, 2023
2 parents 9a0de22 + f166a59 commit c58c53b
Showing 2,656 changed files with 315,249 additions and 167,332 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -17,7 +17,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v2
with:
go-version: 1.18.1
go-version: 1.19

- uses: actions/checkout@v2
with:
182 changes: 182 additions & 0 deletions balancer/proposals/balancer.md
@@ -0,0 +1,182 @@

# KEP - Balancer

## Introduction

One of the problems users face when running Kubernetes deployments is how to spread pods
across several domains while keeping them balanced and autoscaled at the same time.
These domains may include:

* Cloud provider zones inside a single region, to ensure that the application is still up and running, even if one of the zones has issues.
* Different types of Kubernetes nodes. These may involve nodes that are spot/preemptible, or of different machine families.

A single Kubernetes deployment may either leave the placement entirely up to the scheduler
(most likely leading to an undesired result, such as all pods landing in a single domain) or
focus on a single domain (thus not achieving the goal of being in two or more domains).

PodTopologySpread constraints solve the problem partially, but not completely. They allow only
even spreading, and once the deployment gets skewed they do nothing to rebalance it. Pod topology
spreading (with maxSkew and/or the ScheduleAnyway flag) is also just a hint: if a skewed placement
is available and allowed, Cluster Autoscaler is not triggered and the user ends up with a skewed
deployment. A user could specify strict pod topology spreading, but then, in case of problems, the
deployment would not move its pods to the domains that are still available. Growth of the
deployment could also be completely blocked, as the available domains would be too skewed.
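
For reference, this is roughly what such a constraint looks like on a pod template; the zone key,
labels and skew values below are illustrative and not part of this proposal:

```yaml
# Illustrative pod topology spread constraint (not part of the Balancer proposal).
# With whenUnsatisfiable: ScheduleAnyway this is only a soft hint, so a skewed
# placement can still happen and nothing rebalances it afterwards;
# DoNotSchedule is the strict variant that can block growth instead.
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: web
```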

Thus, if full flexibility is needed, the only option is to have multiple deployments, each
targeting a different domain. This setup, however, creates one big problem: how to consistently
autoscale multiple deployments? The simplest idea, one HPA per deployment, is not stable: due to
differing loads, race conditions and the like, some domains may grow while others are shrunk. As
the HPAs and deployments are not connected to each other in any way, a skewed setup will not fix
itself automatically. It may eventually reach a semi-balanced state, but that is not guaranteed.


Thus, there is a need for a component that will:

* Keep multiple deployments aligned. For example, it may keep an equal ratio between the number of
pods in one deployment and another, or fill the first one and overflow into the second, and so on.
* React to problems in individual deployments, be it a zone outage or a lack of spot/preemptible VMs.
* Actively try to rebalance and reach the desired layout.
* Allow all deployments to be autoscaled through a single target, while maintaining the placement policy.

## Balancer

Balancer is a stand-alone controller, living in user space (or in the control plane, if needed),
that exposes a CRD API object, also called Balancer. Each Balancer object points to multiple
deployments or other pod-controlling objects that expose the Scale subresource. Balancer
periodically checks the number of running and problematic pods inside each of its targets,
compares it with the desired number of replicas, constraints and policies, and adjusts the number
of replicas on the targets should some of them run too many or too few. To allow it to be used as
an HPA target, Balancer itself exposes the Scale subresource, as sketched below.
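
A minimal sketch of that single-target setup, assuming a Balancer API group/version that this
proposal does not pin down:

```yaml
# Hypothetical HPA targeting a Balancer through its Scale subresource.
# The Balancer apiVersion/kind below are assumptions for illustration only.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: balancer.x-k8s.io/v1alpha1  # assumed group/version
    kind: Balancer
    name: web-balancer
  minReplicas: 4
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
```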

## Balancer API

```go
// Balancer is an object used to automatically keep the desired number of
// replicas (pods) distributed among the specified set of targets (deployments
// or other objects that expose the Scale subresource).
type Balancer struct {
    metav1.TypeMeta
    // Standard object metadata. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#metadata
    // +optional
    metav1.ObjectMeta
    // Specification of the Balancer behavior.
    // More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#spec-and-status.
    Spec BalancerSpec
    // Current information about the Balancer.
    // +optional
    Status BalancerStatus
}

// BalancerSpec is the specification of the Balancer behavior.
type BalancerSpec struct {
    // Targets is a list of targets between which Balancer tries to distribute
    // replicas.
    Targets []BalancerTarget
    // Replicas is the number of pods that should be distributed among the
    // declared targets according to the specified policy.
    Replicas int32
    // Selector that groups the pods from all targets together (and only those).
    // Ideally it should match the selector used by the Service built on top of the
    // Balancer. All pods selectable by the targets' selectors must match this
    // selector; however, a target's selector doesn't have to be a superset of this
    // one (although it is recommended).
    Selector metav1.LabelSelector
    // Policy defines how the balancer should distribute replicas among targets.
    Policy BalancerPolicy
}

// BalancerTarget is the declaration of one of the targets between which the balancer
// tries to distribute replicas.
type BalancerTarget struct {
    // Name of the target. The name can later be used to specify
    // additional balancer details for this target.
    Name string
    // ScaleTargetRef is a reference that points to a target resource to balance.
    // The target needs to expose the Scale subresource.
    ScaleTargetRef hpa.CrossVersionObjectReference
    // MinReplicas is the minimum number of replicas inside of this target.
    // Balancer will set at least this amount on the target, even if the total
    // desired number of replicas for the Balancer is lower.
    // +optional
    MinReplicas *int32
    // MaxReplicas is the maximum number of replicas inside of this target.
    // Balancer will set at most this amount on the target, even if the total
    // desired number of replicas for the Balancer is higher.
    // +optional
    MaxReplicas *int32
}

// BalancerPolicyName is the name of the balancer Policy.
type BalancerPolicyName string

const (
    PriorityPolicyName     BalancerPolicyName = "priority"
    ProportionalPolicyName BalancerPolicyName = "proportional"
)

// BalancerPolicy defines the Balancer policy for replica distribution.
type BalancerPolicy struct {
    // PolicyName decides how to balance replicas across the targets.
    // Depending on the name, one of the fields Priorities or Proportions must be set.
    PolicyName BalancerPolicyName
    // Priorities contains the detailed specification of how to balance when the
    // balancer policy name is set to Priority.
    // +optional
    Priorities *PriorityPolicy
    // Proportions contains the detailed specification of how to balance when the
    // balancer policy name is set to Proportional.
    // +optional
    Proportions *ProportionalPolicy
    // Fallback contains the specification of how to recognize and what to do if some
    // replicas fail to start in one or more targets. No fallback happens if not set.
    // +optional
    Fallback *Fallback
}

// PriorityPolicy contains details for the priority-based policy for Balancer.
type PriorityPolicy struct {
    // TargetOrder is the priority-ordered list of Balancer target names. The first target
    // on the list gets the replicas until its maxReplicas is reached (or replicas
    // fail to start). Then the replicas go to the second target, and so on. MinReplicas
    // is guaranteed to be fulfilled, irrespective of the order, presence on the
    // list, and/or the total Balancer replica count.
    TargetOrder []string
}

// ProportionalPolicy contains details for the proportion-based policy for Balancer.
type ProportionalPolicy struct {
    // TargetProportions is a map from Balancer target names to rates. Replicas are
    // distributed so that the maximum difference between the current replica share
    // and the desired replica share is minimized. Once a target reaches maxReplicas,
    // it is removed from the calculations and replicas are distributed with
    // the updated proportions. MinReplicas is guaranteed for a target, irrespective
    // of the total Balancer replica count, the proportions, or its presence in the map.
    TargetProportions map[string]int32
}

// Fallback contains information on how to recognize and handle replicas
// that failed to start within the specified time period.
type Fallback struct {
    // StartupTimeout defines how long the Balancer will wait before considering
    // a pending/not-started pod as blocked and starting another replica in some other
    // target. Once the replica is finally started, replicas in other targets
    // may be stopped.
    StartupTimeout metav1.Duration
}

// BalancerStatus describes the Balancer runtime state.
type BalancerStatus struct {
    // Replicas is the actual number of observed pods matching the Balancer selector.
    Replicas int32
    // Selector is a query over pods that should match the replicas count. This is the
    // same as the label selector but in string format, to avoid introspection
    // by clients. The string will be in the same format as the query-param syntax.
    // More info about label selectors: http://kubernetes.io/docs/user-guide/labels#label-selectors
    Selector string
    // Conditions is the set of conditions required for this Balancer to work properly,
    // and indicates whether or not those conditions are met.
    // +optional
    // +patchMergeKey=type
    // +patchStrategy=merge
    Conditions []metav1.Condition
}
```
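
For illustration, a Balancer object built from these fields might look like the sketch below. The
apiVersion and the lowerCamelCase serialized field names are assumptions, since the proposal only
defines the Go types:

```yaml
# Hypothetical Balancer spreading 10 replicas 50/50 between two deployments,
# with a fallback if replicas in a target stay pending for too long.
# The apiVersion group/version is assumed for illustration.
apiVersion: balancer.x-k8s.io/v1alpha1
kind: Balancer
metadata:
  name: web-balancer
spec:
  replicas: 10
  selector:
    matchLabels:
      app: web
  targets:
    - name: web-zone-a
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-zone-a
      minReplicas: 2
    - name: web-zone-b
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-zone-b
      minReplicas: 2
  policy:
    policyName: proportional
    proportions:
      targetProportions:
        web-zone-a: 50
        web-zone-b: 50
    fallback:
      startupTimeout: 5m
```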
7 changes: 3 additions & 4 deletions builder/OWNERS
@@ -1,10 +1,9 @@
approvers:
- aleksandra-malinowska
- losipiuk
- maciekpytel
- mwielgus
reviewers:
- aleksandra-malinowska
- losipiuk
- maciekpytel
- mwielgus
emeritus_approvers:
- aleksandra-malinowska # 2022-09-30
- losipiuk # 2022-09-30
2 changes: 1 addition & 1 deletion charts/cluster-autoscaler/Chart.yaml
@@ -11,4 +11,4 @@ name: cluster-autoscaler
sources:
- https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler
type: application
version: 9.20.1
version: 9.21.1
1 change: 1 addition & 0 deletions charts/cluster-autoscaler/README.md
@@ -367,6 +367,7 @@ Though enough for the majority of installations, the default PodSecurityPolicy _
| serviceMonitor.annotations | object | `{}` | Annotations to add to service monitor |
| serviceMonitor.enabled | bool | `false` | If true, creates a Prometheus Operator ServiceMonitor. |
| serviceMonitor.interval | string | `"10s"` | Interval that Prometheus scrapes Cluster Autoscaler metrics. |
| serviceMonitor.metricRelabelings | object | `{}` | MetricRelabelConfigs to apply to samples before ingestion. |
| serviceMonitor.namespace | string | `"monitoring"` | Namespace which Prometheus is running in. |
| serviceMonitor.path | string | `"/metrics"` | The path to scrape for metrics; autoscaler exposes `/metrics` (this is standard) |
| serviceMonitor.selector | object | `{"release":"prometheus-operator"}` | Default to kube-prometheus install (CoreOS recommended), but should be set according to Prometheus install. |
3 changes: 3 additions & 0 deletions charts/cluster-autoscaler/templates/_helpers.tpl
@@ -70,10 +70,13 @@ Return the appropriate apiVersion for podsecuritypolicy.
{{- $kubeTargetVersion := default .Capabilities.KubeVersion.GitVersion .Values.kubeTargetVersionOverride }}
{{- if semverCompare "<1.10-0" $kubeTargetVersion -}}
{{- print "extensions/v1beta1" -}}
{{- if semverCompare ">1.21-0" $kubeTargetVersion -}}
{{- print "policy/v1" -}}
{{- else -}}
{{- print "policy/v1beta1" -}}
{{- end -}}
{{- end -}}
{{- end -}}

{{/*
Return the appropriate apiVersion for podDisruptionBudget.
5 changes: 5 additions & 0 deletions charts/cluster-autoscaler/templates/deployment.yaml
@@ -59,6 +59,11 @@ spec:
- --nodes={{ .minSize }}:{{ .maxSize }}:{{ .name }}
{{- end }}
{{- end }}
{{- if eq .Values.cloudProvider "rancher" }}
{{- if .Values.cloudConfigPath }}
- --cloud-config={{ .Values.cloudConfigPath }}
{{- end }}
{{- end }}
{{- if eq .Values.cloudProvider "aws" }}
{{- if .Values.autoDiscovery.clusterName }}
- --node-group-auto-discovery=asg:tag={{ tpl (join "," .Values.autoDiscovery.tags) . }}
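
For context, the new block above is only rendered when the chart user sets something along these
lines in their values (a sketch; the actual rancher cloud config contents are out of scope here):

```yaml
# Hypothetical values.yaml excerpt for the rancher provider.
cloudProvider: rancher
# Path inside the container that the new --cloud-config flag will point at.
cloudConfigPath: /config/cloud-config
```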
4 changes: 4 additions & 0 deletions charts/cluster-autoscaler/templates/servicemonitor.yaml
@@ -20,6 +20,10 @@ spec:
- port: {{ .Values.service.portName }}
interval: {{ .Values.serviceMonitor.interval }}
path: {{ .Values.serviceMonitor.path }}
{{- if .Values.serviceMonitor.metricRelabelings }}
metricRelabelings:
{{ tpl (toYaml .Values.serviceMonitor.metricRelabelings | indent 6) . }}
{{- end }}
namespaceSelector:
matchNames:
- {{.Release.Namespace}}
3 changes: 3 additions & 0 deletions charts/cluster-autoscaler/values.yaml
@@ -344,6 +344,9 @@ serviceMonitor:
path: /metrics
# serviceMonitor.annotations -- Annotations to add to service monitor
annotations: {}
## [RelabelConfig](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/api.md#monitoring.coreos.com/v1.RelabelConfig)
# serviceMonitor.metricRelabelings -- MetricRelabelConfigs to apply to samples before ingestion.
metricRelabelings: {}

## Custom PrometheusRule to be defined
## The value is evaluated as a template, so, for example, the value can depend on .Release or .Chart
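
As a usage sketch, the new value accepts standard prometheus-operator relabel configs; the metric
name pattern below is illustrative:

```yaml
# Hypothetical values.yaml excerpt: drop Go runtime series before ingestion.
serviceMonitor:
  enabled: true
  metricRelabelings:
    - sourceLabels: [__name__]
      regex: "go_.*"
      action: drop
```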
