
docs: add documentation about resource requests #2226

Merged: 7 commits into jupyterhub:main on May 28, 2021

Conversation

@consideRatio (Member) commented May 24, 2021

This documents the current situation on how to set resource requests and closes #2085.

  • Is the location of this documentation section okay?
  • Is there something missing from the documentation that is relevant to convey?
  • Is the language okay?

An action point list from #2085, made by @minrk and self-checked by me.

  • explain a bit of background about resource reservations and when they are useful
  • identify all components that might want resource requests
  • identify reasonable values for key pods (mainly hub and proxy) and ranges based on load (e.g. using mybinder.org as a reference point for high load)
    • I've provided reasonable recommendations for requests for a reliable configuration.
    • I've opted not to try to provide a recommendation for smaller deployments, because it is a sliding scale and I lack information about it.

@manics (Member) left a comment

Looking good! I made a first pass over it.

doc/source/administrator/optimization.md (resolved review comments)
Comment on lines 416 to 417
1. A container running out of memory is typically terminated and restarted as a
process is killed by a _Out Of Memory Killer_ (OOMKiller). When this happens
Member

Doesn't the restart of the pod only apply if it's part of a daemonset/deployment?

Member Author

Hmmmm, not sure if I understand your question, but...

  1. A pod managed by a replicaset or statefulset could be recreated, for example if it is evicted from a node that is about to undergo maintenance, or if another pod with higher priority evicts it in order to be scheduled on the node.
  2. A container can restart independently of the entire pod being recreated; in other words, this can also happen for a standalone pod. This is what I'm referring to here.

Member Author

A pod can have a restartPolicy, which describes how its containers should behave: should they restart on failure, on completion, etc.?

https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy
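
As a minimal sketch of what that looks like for a standalone pod (names and values are illustrative, not from the PR):

apiVersion: v1
kind: Pod
metadata:
  name: restart-example
spec:
  # Applies to all containers in the pod: Always (the default), OnFailure, or Never
  restartPolicy: OnFailure
  containers:
    - name: container
      image: python:3.9-alpine
      # Exits with a failure code, so the kubelet restarts this container in place
      command: ["python3", "-c", "raise SystemExit(1)"]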

doc/source/administrator/optimization.md (resolved review comments)
@consideRatio mentioned this pull request May 25, 2021
@consideRatio added this to the 1.0.0 milestone May 25, 2021
@consideRatio (Member Author) commented May 25, 2021

@manics thanks for your feedback, I've tried addressing it now! What do you think?

Note that the unrelated CI failure is fixed in #2230.

@meeseeksmachine

This pull request has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/suggested-cpu-memory-resource-requests-limits/9322/1

@minrk (Member) left a comment

Agreed with @manics that this need not be a blocker for 1.0, but this is already great info as it is, and I'm happy for it to be merged right now and refined.

Tweaking individual numbers for best practices / reasonable limits is something that is always going to evolve over time, and what's here seems like a great start.

resource _limits_ close to or equal to the resource requests can be useful to
make the container operational performance more consistent.

1. Requested memory is reserved and unavailable for other containers on a node
Member

Where did you find this info about memory requests being unavailable? I thought if you had two pods that each request 50% and limit 100% of a node's memory, they would each be able to allocate 100% of the node memory as long as it's not at the same time. But if this is accurate, if requests add up to 100%, no pod will be able to exceed its request at all.

My understanding has been that 'request' is really only an input to the scheduler, while 'limit' is the only input to the runtime (OOMKiller).
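
A minimal sketch of that distinction, with the standard Kubernetes fields annotated (names and values are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: requests-vs-limits
spec:
  containers:
    - name: container
      image: python:3.9-alpine
      command:
        - sh
        - -c
        - while true; do sleep 30; done
      resources:
        requests:
          memory: 512Mi  # counted against the node's allocatable memory when the scheduler places the pod
        limits:
          memory: 1Gi    # enforced at runtime: exceeding this is what triggers the OOMKiller for the container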

Member Author

Oh oh hmmmmm, you are probably correct in your understanding.

I haven't questioned this understanding much; I think I formulated it myself based on how I figured it was unreasonable that a pod that has requested a certain amount of memory would have to wait for memory to be freed up while it is being used by another container. At the same time, that process could be very quick, and I bet the OOMKiller can take action before all memory is depleted as well.

Member

Yeah, I think requests are a lot more basic than folks often expect - I think it's purely scheduler bookkeeping, and has no actual effect once pods are scheduled (except possibly in OOMKiller priority when it's time for something to be killed).

Maybe worth just deleting this point then. Then 👍 to merge.

@minrk (Member) commented May 27, 2021

I tested with k3d:

pods.yaml
apiVersion: v1
kind: Pod
metadata:
  name: memory-eater
spec:
  containers:
    - name: container
      image: python:3.9-alpine
      env:
        - name: PYTHONUNBUFFERED
          value: "1"
      command:
        - python3
        - -c
        - |
          import os
          import time
          MB = 1024 * 1024
          chunk_size = 100 * MB  # 100 MB
          chunks = []

          while True:
            chunk = bytearray(chunk_size)
            chunks.append(chunk)
            print(f"{len(chunks) * chunk_size // MB }MB")
            time.sleep(5)
      resources:
        requests:
          memory: 900Mi
        limits:
          memory: 1800Mi

---
apiVersion: v1
kind: Pod
metadata:
  name: memory-squatter
spec:
  containers:
    - name: container
      image: python:3.9-alpine
      env:
        - name: PYTHONUNBUFFERED
          value: "1"
      command:
        - sh
        - -c
        - while true; do sleep 30; done
      resources:
        requests:
          memory: 900Mi
        limits:
          memory: 1800Mi

on a node with 2Gi, and the memory eater is allowed to comfortably allocate almost all of the node's memory before being killed. The presence or absence of the memory squatter's request for 900M doesn't have any effect on how much the eater is able to allocate before being killed.

If there is a point to be made here, I think it's that oversubscribing CPU results in things being slower than they could be, but usually not catastrophic, while oversubscribing memory results in hangs and crashes. The same actually goes for under-provisioning: low limits on CPU mean things might be slow, while low limits on memory mean things will keep getting killed.
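
As a companion sketch for the CPU side of that summary (names and values are illustrative; a container exceeding its CPU limit is throttled rather than killed):

apiVersion: v1
kind: Pod
metadata:
  name: cpu-eater
spec:
  containers:
    - name: container
      image: python:3.9-alpine
      command:
        - sh
        - -c
        # busy loop that would gladly consume a full core
        - while true; do true; done
      resources:
        requests:
          cpu: 100m  # counted by the scheduler when placing the pod
        limits:
          cpu: 500m  # enforced at runtime: usage above this is throttled, the container keeps running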

Member Author

@minrk thank you!!! This is an excellent summary!

Member Author

I relied on this insight and made another pass.

doc/source/administrator/optimization.md (resolved review comments)
Co-authored-by: Min RK <benjaminrk@gmail.com>
@manics (Member) commented May 26, 2021

How about taking out any text we're uncertain about, since it's better to say nothing than to give incorrect information, then merging? And refining later?

@consideRatio (Member Author) commented May 28, 2021

I made another pass. Let's make refinements in separate PRs and get this going!

In the pass I made:

  • Updated some descriptions to reflect new insights, correcting incorrect statements.
  • Updated resource request suggestions to be sensible assuming you want to fit the core pods onto a 2 CPU node, which I think is a reasonable aim for a z2jh deployment (a config sketch follows this list).
  • Added min/max ranges for resource usage in various Berkeley deployments across 6 months.
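
A minimal sketch of where such requests are set in a z2jh config.yaml, assuming the chart's hub.resources and proxy.chp.resources configuration keys (the numbers are illustrative placeholders, not the PR's recommendations):

hub:
  resources:
    requests:
      cpu: 500m      # illustrative only
      memory: 512Mi
proxy:
  chp:
    resources:
      requests:
        cpu: 500m    # illustrative only
        memory: 512Mi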

@consideRatio merged commit e9bbbe7 into jupyterhub:main May 28, 2021

Successfully merging this pull request may close these issues.

Document resource requests for core components
4 participants