
Allow setting an environment variable to some fraction or multiple of a resource field #91514

Open
dobesv opened this issue May 28, 2020 · 38 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@dobesv
Contributor

dobesv commented May 28, 2020

What would you like to be added:

It would be useful to pass a process a value that is a bit lower than the container memory limit so that its garbage collector or memory manager can stay within that limit.

For example:

  • setting GOMEMLIMIT for Go, or a placeholder like NODEMEMLIMIT
  • setting NODE_OPTIONS=--max-old-space-size=$(NODEMEMLIMIT) for Node.js
  • setting JAVA_MEM_LIMIT and passing -Xmx$(JAVA_MEM_LIMIT)M for the JVM

Currently you can use valueFrom.resourceFieldRef to "copy" a resource request or limit into an environment variable, but the only scaling available is dividing by divisor, which must be a power of 1000 or 1024.
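
For reference, this is roughly what the existing API already allows (the variable name is just an example):

- name: MEMORY_LIMIT_MI
  valueFrom:
    resourceFieldRef:
      resource: limits.memory
      divisor: "1Mi"   # exposes the limit in MiB; only fixed power-of-1000/1024 divisors are accepted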

Why is this needed:

This would make it possible to pass the process inside the container a memory limit that is derived from the container resource request or limit, but somewhat lower.

For example, when specifying node's --max-old-space-size or Java's -Xmx parameter, setting it equal to the memory limit of the container is a problem, because the process's total memory usage is higher than the heap size you pass, so it can still exceed the container limit.

Potential Solutions

  • If we could specify a divisor like 1.5Mi then it would result in a number of megabytes equal to roughly two-thirds of the available memory, leaving some headroom for everything outside the heap.
  • Add another field like scale or multiplier to valueFrom.resourceFieldRef which is multiplied by the input value to get the env var value (see the sketch after this list)
  • Allow arithmetic in the existing env var expansion system: where currently only $(SOME_VAR) substitution is supported in the command line, additionally support something like $(SOME_VAR - 100)
  • Allow using a go template or some simple templating/expression language to actually fill out the env var with prefix and suffix, like --max-old-space-size={{ value * 800 }}M
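
As a purely hypothetical illustration of the scale/multiplier option (no such field exists in the current API):

- name: JAVA_MEM_LIMIT
  valueFrom:
    resourceFieldRef:
      resource: limits.memory
      divisor: "1Mi"
      scale: "0.75"   # hypothetical field: multiply the divided value by 0.75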

Workarounds

  • Create an admission webhook that does one of the above without changing the kubernetes core features
  • If the container uses bash, you could run a command like bash -c 'NODE_OPTIONS=--max-old-space-size=$(($MEMORY_LIMIT-500)) ...', or create your own entrypoint wrapper (maybe a shell script) that calculates the env vars before running the actual command (see the sketch below)
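
A rough sketch of that bash workaround as part of a container spec (the 100 MiB headroom and the node entrypoint are illustrative):

env:
  - name: MEMORY_LIMIT_MI
    valueFrom:
      resourceFieldRef:
        resource: limits.memory
        divisor: "1Mi"
command: ["bash", "-c"]
args:
  - |
    # leave ~100 MiB of headroom for non-heap memory, then start the app
    export NODE_OPTIONS="--max-old-space-size=$((MEMORY_LIMIT_MI - 100))"
    exec node server.js
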
@dobesv dobesv added the kind/feature Categorizes issue or PR as related to a new feature. label May 28, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label May 28, 2020
@dobesv
Contributor Author

dobesv commented May 28, 2020

/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels May 28, 2020
@kow3ns
Member

kow3ns commented Jun 1, 2020

/remove-sig apps

@k8s-ci-robot k8s-ci-robot added needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. and removed sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Jun 1, 2020
@kow3ns
Member

kow3ns commented Jun 1, 2020

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jun 1, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 30, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 29, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JohnRusk

/reopen

@k8s-ci-robot
Contributor

@JohnRusk: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JohnRusk

JohnRusk commented Apr 28, 2023

@dobesv I think there's an even stronger reason to support this change, now that GOMEMLIMIT is common (i.e. lots of Go apps are now built on Go 1.19 or later). I left a comment about it here: https://kubernetes.slack.com/archives/C0EG7JC6T/p1682654871448859

Mods: For more info on GOMEMLIMIT and why this change is needed, see the end of this page: https://billglover.me/2022/09/14/use-the-kubernetes-downwards-api-to-set-gomemlimit/

@lavalamp
Member

/remove-sig api-machinery
/sig apps

@k8s-ci-robot k8s-ci-robot added sig/apps Categorizes an issue or PR as relevant to SIG Apps. and removed sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Apr 28, 2023
@JohnRusk

Hi SIG Apps. Lavalamp made a good point: changing the semantics of an existing field may be undesirable, e.g. it could make version rollbacks problematic. Any thoughts on whether this should be solved by adding a new field, maybe scale, which would be a floating-point number?

@dobesv
Contributor Author

dobesv commented May 1, 2023

/reopen

@k8s-ci-robot k8s-ci-robot reopened this May 1, 2023
@k8s-ci-robot
Contributor

@dobesv: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label May 1, 2023
@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (Won't fix, can't repro, duplicate, stale) Jul 2, 2023
@sylr
Contributor

sylr commented Dec 15, 2023

/reopen

@k8s-ci-robot
Contributor

@sylr: You can't reopen an issue/PR unless you authored it or you are a collaborator.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sylr
Contributor

sylr commented Dec 15, 2023

> @dobesv I think there's an even stronger reason to support this change, now that GOMEMLIMIT is common (i.e. lots of Go apps are now built on Go 1.19 or later). I left a comment about it here: https://kubernetes.slack.com/archives/C0EG7JC6T/p1682654871448859
>
> Mods: For more info on GOMEMLIMIT and why this change is needed, see the end of this page: https://billglover.me/2022/09/14/use-the-kubernetes-downwards-api-to-set-gomemlimit/

Exactly this.

We need to be able to do something like this:

- name: GOMEMLIMIT
  valueFrom:
    resourceFieldRef:
      resource: limits.memory
      divisor: "1100m"
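      # a divisor of 1100m divides the limit by 1.1, yielding roughly 90% of it (in bytes)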

@sftim
Contributor

sftim commented Dec 15, 2023

/reopen
/priority backlog
/sig node
/remove-lifecycle rotten
/lifecycle stale

How about using CEL to perform the transformation? Or a limited subset of CEL.
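
Purely as a hypothetical sketch (no such field exists today), a CEL-based variant might look like:

- name: GOMEMLIMIT
  valueFrom:
    resourceFieldRef:
      resource: limits.memory
      # hypothetical field: a CEL expression evaluated against the resolved quantity
      expression: "int(double(value) * 0.9)"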

@k8s-ci-robot k8s-ci-robot added priority/backlog Higher priority than priority/awaiting-more-evidence. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Dec 15, 2023
@k8s-ci-robot
Contributor

@sftim: Reopened this issue.

In response to this:

/reopen
/priority backlog
/sig node
/remove-lifecycle rotten
/lifecycle stale

How about using CEL to perform the transformation? Or a limited subset of CEL.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot reopened this Dec 15, 2023
@k8s-ci-robot k8s-ci-robot added lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Dec 15, 2023
@sftim
Contributor

sftim commented Dec 15, 2023

/remove-sig apps

@k8s-ci-robot k8s-ci-robot removed the sig/apps Categorizes an issue or PR as relevant to SIG Apps. label Dec 15, 2023
@2rs2ts
Contributor

2rs2ts commented Jan 8, 2024

I see most of the conversation here has been about memory, but I would like to ask that all resource fields be available for this kind of tweaking (in case y'all were not already thinking that). For example, some language runtimes don't really see the container's CPU limit, and/or just behave poorly when limits are set, leading to CPU throttling. Being able to set GOMAXPROCS without having to rely on external templating logic, or on an entrypoint script to compute what the value should be, would be a huge boon for my company and probably a lot of others.

Another use case I can see (although I don't think people should write cloud-native code this way) is apps that log to multiple files inside the container's ephemeral storage; being able to set app log rotation parameters based on available disk without an entrypoint script or external templating logic would probably be greatly appreciated by people working with code that does that.
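
For the GOMAXPROCS case, the existing downward API already gets close when the CPU limit is a whole number of cores (fractional limits should be rounded up when the divisor is 1):

- name: GOMAXPROCS
  valueFrom:
    resourceFieldRef:
      resource: limits.cpu
      divisor: "1"   # the cpu limit expressed in whole cores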

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 7, 2024
@2rs2ts
Contributor

2rs2ts commented Feb 20, 2024

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 20, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 19, 2024
@frittentheke

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 27, 2024
@dataviruset

If this helps anyone: this is what I did in Helm to work around the missing native functionality in Kubernetes and use 90% of the memory limit configured in values.yaml as the GOMEMLIMIT. Just change mulf $valueMi 0.9 to something other than 0.9 if you want a different percentage. Please note that it only supports Gi and Mi units.

{{- define "goMemLimitEnv" -}}
{{- $memory := .Values.resources.limits.memory -}}
{{- if $memory -}}
- name: GOMEMLIMIT
  {{- $value := regexFind "^\\d*\\.?\\d+" $memory | float64 -}}
  {{- $unit := regexFind "[A-Za-z]+" $memory -}}
  {{- $valueMi := 0.0 -}}
  {{- if eq $unit "Gi" -}}
    {{- $valueMi = mulf $value 1024 -}}
  {{- else if eq $unit "Mi" -}}
    {{- $valueMi = $value -}}
  {{- end -}}
  {{- $percentageValue := int (mulf $valueMi 0.9) }}
  value: {{ printf "%dMiB" $percentageValue -}}
{{- end -}}
{{- end -}}
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example
spec:
  serviceName: example
  template:
    spec:
      containers:
        - name: example
          image: alpine:latest
          imagePullPolicy: IfNotPresent
          resources:
            requests:
              cpu: 100m
              memory: "200Mi"
            limits:
              cpu: 1
              memory: "200Mi"
          env:
            {{- include "goMemLimitEnv" . | nindent 12 }}
            # Will be set to "180MiB"

@frittentheke

frittentheke commented Jul 5, 2024

> I see most of the conversation here has been about memory, but I would like to ask that all resource fields be available for this kind of tweaking (in case y'all were not already thinking that). For example, some language runtimes don't really see the container's CPU limit, and/or just behave poorly when limits are set, leading to CPU throttling. Being able to set GOMAXPROCS without having to rely on external templating logic, or on an entrypoint script to compute what the value should be, would be a huge boon for my company and probably a lot of others.
>
> Another use case I can see (although I don't think people should write cloud-native code this way) is apps that log to multiple files inside the container's ephemeral storage; being able to set app log rotation parameters based on available disk without an entrypoint script or external templating logic would probably be greatly appreciated by people working with code that does that.

I second the call for generalizing this feature. Isn't there enough templating capability (gotemplate FTW?) that could be leveraged to allow templating some env variables? This would also align nicely with Helm and how people already handle resources there.
