Unable to configure disruption controls for karpenter #16311

Open
clayrisser opened this issue Feb 2, 2024 · 8 comments · May be fixed by #16327
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@clayrisser

I can't figure out how to set a disruption consolidationPolicy and expireAfter on my Karpenter node pools in kOps. Where do I configure this?

The Karpenter docs describe these settings here:

https://karpenter.sh/v0.32/concepts/nodepools/#specdisruption

I can't even see a CRD for Karpenter NodePools, so I'm guessing kOps has another way of managing the disruption controls?

  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h # 30 * 24h = 720h
@moshevayner
Member

From what I can tell, kOps installs Karpenter v0.31.3 by default. According to the docs (I hope I'm not reading them wrong), that release predates the NodePool API, which first appears in v0.32. Ref:

if c.Image == "" {
	c.Image = "public.ecr.aws/karpenter/controller:v0.31.3"
}
This leads me to believe that it isn't supported in kOps right now, so we would need to put in some work to add it.
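A minimal sketch of that version reasoning, assuming the controller image keeps a vMAJOR.MINOR.PATCH (or MAJOR.MINOR.PATCH) tag and treating v0.32 as the first release with NodePools. The supportsNodePools helper is hypothetical and not part of kOps; it also does not handle images pinned by digest.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsNodePools reports whether a Karpenter controller image tag is at
// least v0.32. Hypothetical helper; digests (@sha256:...) are not handled.
func supportsNodePools(image string) (bool, error) {
	i := strings.LastIndex(image, ":")
	if i < 0 {
		return false, fmt.Errorf("image %q has no tag", image)
	}
	tag := strings.TrimPrefix(image[i+1:], "v")
	parts := strings.SplitN(tag, ".", 3)
	if len(parts) < 2 {
		return false, fmt.Errorf("unexpected tag %q", tag)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false, err
	}
	// Any 1.x+ release, or a 0.x release from 0.32 onward.
	return major >= 1 || minor >= 32, nil
}

func main() {
	for _, img := range []string{
		"public.ecr.aws/karpenter/controller:v0.31.3", // current kOps default
		"public.ecr.aws/karpenter/controller:v0.32.0",
	} {
		ok, err := supportsNodePools(img)
		fmt.Println(img, ok, err)
	}
}
```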

I don't mind taking a stab at this one, wdyt @hakman @rifelpet @olemarkus?

@hakman
Member

hakman commented Feb 6, 2024

I don't mind taking a stab at this one, wdyt @hakman @rifelpet @olemarkus?

My impression is that, if we want to move Karpenter support to a newer version, we would need to move from providing the LaunchTemplates to doing everything via Karpenter objects.

{{ range $name, $spec := GetNodeInstanceGroups }}
{{ if eq $spec.Manager "Karpenter" }}
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: {{ $name }}
spec:
  subnetSelector:
    kops.k8s.io/instance-group/{{ $name }}: "*"
    kubernetes.io/cluster/{{ ClusterName }}: "*"
  launchTemplate: {{ $name }}.{{ ClusterName }}
---
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: {{ $name }}
spec:
  consolidation:
    enabled: true
{{ with $spec.Kubelet }}
{{ if or .MaxPods .SystemReserved .KubeReserved }}
  kubeletConfiguration:
{{ if .MaxPods }}
    maxPods: {{ .MaxPods }}
{{ end }}
{{ if .SystemReserved }}
    systemReserved:
{{ range $key, $val := .SystemReserved}}
      {{ $key }}: "{{ $val }}"
{{ end }}
{{ end }}
{{ if .KubeReserved }}
    kubeReserved:
{{ range $key, $val := .KubeReserved}}
      {{ $key }}: "{{ $val }}"
{{ end }}
{{ end }}
{{ end }}
{{ end }}
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]
    - key: kubernetes.io/arch
      operator: In
      values: ["{{ ArchitectureOfAMI $spec.Image }}"]
    - key: "node.kubernetes.io/instance-type"
      operator: In
      values:
{{ range $type := KarpenterInstanceTypes $spec }}
        - {{ $type }}
{{ end }}
{{ with $spec.Taints }}
  taints:
{{ range $taintString := $spec.Taints }}
{{ $taint := ParseTaint $taintString }}
    - key: {{ $taint.key }}
      effect: {{ $taint.effect }}
{{ if $taint.value }}
      value: "{{ $taint.value }}"
{{ end }}
{{ end }}
{{ end }}
{{ if $.ExternalCloudControllerManager }}
  startupTaints:
    - key: node.cloudprovider.kubernetes.io/uninitialized
      effect: NoSchedule
{{ end }}
{{ with $spec.NodeLabels }}
  labels:
{{ range $key, $value := . }}
    {{ $key }}: "{{ $value }}"
{{ end }}
{{ end }}
  providerRef:
    name: {{ $name }}
{{ end }}
{{ end }}
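For comparison, the Provisioner above only sets consolidation.enabled: true. Under the v1beta1 API the node-removal settings move into spec.disruption on a NodePool: consolidation.enabled corresponds to consolidationPolicy: WhenUnderutilized, ttlSecondsAfterEmpty to WhenEmpty plus consolidateAfter, and ttlSecondsUntilExpired to expireAfter. The sketch below encodes that mapping with hypothetical structs (not Karpenter's Go types). Two details are assumptions worth checking against the Karpenter v1beta1 migration guide: that "never remove empty nodes" is expressed as consolidateAfter: Never, and that expireAfter should be set to Never explicitly to keep the old no-expiry behavior, because v1beta1 defaults it to 720h.

```go
package main

import (
	"errors"
	"fmt"
)

// ProvisionerDisruption holds node-removal settings from a v1alpha5
// Provisioner. Hypothetical mirror, not Karpenter's Go type.
type ProvisionerDisruption struct {
	ConsolidationEnabled   bool   // spec.consolidation.enabled
	TTLSecondsAfterEmpty   *int64 // spec.ttlSecondsAfterEmpty
	TTLSecondsUntilExpired *int64 // spec.ttlSecondsUntilExpired
}

// NodePoolDisruption holds spec.disruption of a v1beta1 NodePool.
// Also a hypothetical mirror.
type NodePoolDisruption struct {
	ConsolidationPolicy string // WhenUnderutilized or WhenEmpty
	ConsolidateAfter    string // only used together with WhenEmpty
	ExpireAfter         string // duration or "Never"
}

// convert maps the v1alpha5 fields onto their v1beta1 counterparts.
func convert(p ProvisionerDisruption) (NodePoolDisruption, error) {
	var out NodePoolDisruption
	switch {
	case p.ConsolidationEnabled && p.TTLSecondsAfterEmpty != nil:
		return out, errors.New("consolidation.enabled and ttlSecondsAfterEmpty are mutually exclusive")
	case p.ConsolidationEnabled:
		out.ConsolidationPolicy = "WhenUnderutilized"
	case p.TTLSecondsAfterEmpty != nil:
		out.ConsolidationPolicy = "WhenEmpty"
		out.ConsolidateAfter = fmt.Sprintf("%ds", *p.TTLSecondsAfterEmpty)
	default:
		// Assumption: "never remove empty nodes" becomes WhenEmpty + Never.
		out.ConsolidationPolicy = "WhenEmpty"
		out.ConsolidateAfter = "Never"
	}
	// v1alpha5 nodes only expired when a TTL was set; v1beta1 defaults
	// expireAfter to 720h, so the old behavior is made explicit here.
	if p.TTLSecondsUntilExpired != nil {
		out.ExpireAfter = fmt.Sprintf("%ds", *p.TTLSecondsUntilExpired)
	} else {
		out.ExpireAfter = "Never"
	}
	return out, nil
}

func main() {
	// The Provisioner rendered by the template above sets only consolidation.enabled.
	d, err := convert(ProvisionerDisruption{ConsolidationEnabled: true})
	fmt.Printf("%+v %v\n", d, err)
}
```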

@moshevayner
Member

My impression is that, if we want to move Karpenter support to a newer version, we would need to move from providing the LaunchTemplates to doing everything via Karpenter objects.

Yeah, that makes sense to me.
So would that (theoretically) be a process similar to other cloudup add-ons such as aws-cni, where we update the template (and supporting pieces such as template functions) to match the vendor chart?

@hakman
Member

hakman commented Feb 6, 2024

Yes. The good part is that we have a Karpenter e2e test, so it should be easy to test via a WIP PR.

@moshevayner
Member

Sounds good! I'll give that a try. Thanks!

/assign

@teocns

teocns commented Mar 16, 2024

From my understanding it's probably not possible, but it doesn't hurt to ask: is there any workaround for getting upstream Karpenter to manage the InstanceGroups of the current kOps release?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 14, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 14, 2024
6 participants