Documentation on kubernetes resources and functions for each provider to calculate resources #792
Comments
Also saw that …

I've tested it by comparing the output from … For GCP, allocatable resources are calculated in the following way.

For memory resources, GKE reserves:
- 255 MiB of memory for machines with less than 1 GiB of memory
- 25% of the first 4 GiB of memory
- 20% of the next 4 GiB of memory (up to 8 GiB)
- 10% of the next 8 GiB of memory (up to 16 GiB)
- 6% of the next 112 GiB of memory (up to 128 GiB)
- 2% of any memory above 128 GiB

For CPU resources, GKE reserves:
- 6% of the first core
- 1% of the next core (up to 2 cores)
- 0.5% of the next 2 cores (up to 4 cores)
- 0.25% of any cores above 4 cores

For instance, for the QHub general node (on …), where by the GCP-provided calculation, based on …

*The kubelet reserves an extra 100m of CPU and 100 MiB of memory for the operating system, plus 100 MiB for the eviction threshold. I also started a doc on resource allocation for the different providers; I will link a PR for it soon.
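The tiered memory reservation above can be checked by hand; a minimal sketch of just the reservation side of the formula (the function name is mine, not from any provider SDK):

```python
GI = 2**30
MI = 2**20


def gke_reserved_memory(total: int) -> float:
    """Memory (bytes) the GKE formula reserves on a node with `total` bytes."""
    if total < GI:
        return 255 * MI
    reserved = min(total, 4 * GI) * 0.25
    if total > 4 * GI:
        reserved += min(total - 4 * GI, 4 * GI) * 0.20
    if total > 8 * GI:
        reserved += min(total - 8 * GI, 8 * GI) * 0.10
    if total > 16 * GI:
        reserved += min(total - 16 * GI, 112 * GI) * 0.06
    if total > 128 * GI:
        reserved += (total - 128 * GI) * 0.02
    return reserved


# 16 GiB node: 25% of 4 GiB + 20% of 4 GiB + 10% of 8 GiB ≈ 2.6 GiB reserved
print(gke_reserved_memory(16 * GI) / GI)
```

Note the tiers apply to slices of memory (like tax brackets), not to the whole amount at one rate.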
This is still quite a bit. Do you have a document that you are compiling that shows what you have measured?
Note that Google may be using mebibytes: https://simple.wikipedia.org/wiki/Mebibyte It seems they are based on the kubernetes docs: https://cloud.google.com/marketplace/docs/partners/kubernetes/select-pricing
Allocatable local ephemeral storage resources: https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#local_ephemeral_storage
Eviction threshold for GCP: …
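Since this thread keeps switching between decimal (M) and binary (Mi) units, a tiny parser for Kubernetes quantity suffixes makes the comparisons explicit. A minimal sketch, not a library function: it skips exponent notation and milli-units, and note Kubernetes proper uses lowercase `k` for kilo:

```python
# suffix factors from the Kubernetes resource-quantity convention;
# two-character binary suffixes must be checked before one-character ones
SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def parse_quantity(q: str) -> int:
    """Convert a Kubernetes quantity string like '16399092Ki' to bytes."""
    for suffix, factor in SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[: -len(suffix)]) * factor
    return int(q)


print(parse_quantity("16399092Ki"))                       # node memory as reported
print(parse_quantity("100Mi") - parse_quantity("100M"))   # the Mi-vs-M gap
```

The second print shows the roughly 4.9 MB difference between 100Mi and 100M, which is exactly the kind of ambiguity being discussed above.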
Looks like the formulas check out and come out nearly exactly right.

```python
# kubernetes units
# K, M, G, T, P, E = 10^3, 10^6, ... bytes
# Ki, Mi, Gi, Ti, Pi, Ei = 2^10, 2^20, ... bytes
# https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#node_allocatable


def available_cpu(cpu: int) -> float:
    # GKE reserves 6% of the first core
    reserved = 0.06
    if cpu > 1:
        reserved += 0.01
    if cpu > 2:
        reserved += 0.005
    if cpu > 3:
        reserved += 0.005
    if cpu > 4:
        reserved += 0.0025 * (cpu - 4)
    return cpu - reserved


def available_memory(memory: int) -> float:
    # memory in bytes
    Gi = 2**30
    Mi = 2**20
    eviction_threshold = 100 * Mi
    if memory < Gi:
        return memory - 255 * Mi
    reserved = min(memory, 4 * Gi) * 0.25
    if memory > 4 * Gi:
        reserved += min(memory - 4 * Gi, 4 * Gi) * 0.20
    if memory > 8 * Gi:
        reserved += min(memory - 8 * Gi, 8 * Gi) * 0.10
    if memory > 16 * Gi:
        reserved += min(memory - 16 * Gi, 112 * Gi) * 0.06
    if memory > 128 * Gi:
        reserved += (memory - 128 * Gi) * 0.02
    return memory - reserved - eviction_threshold


# node memory sizes below are in Ki, as observed on real nodes
print(available_cpu(4))
print(available_memory(16399092 * 1024) / 1024)
print(available_cpu(8))
print(available_memory(32888740 * 1024) / 1024)
```
Incoming: a long description of how we can estimate the allocatable memory and CPU for a given instance size. There are two parts to this problem: …
When I did a linear fit I got `kernel_visible_memory ≈ 0.9833 * total_memory - 1.205e8` (in bytes; the same coefficients used in the code below).
With this approach we should reliably get the available memory to within 150Mi, with a few outliers. The OS/distro does seem to have a significant impact on the available memory; this was especially clear on AWS Ubuntu.
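The linear-fit step can be reproduced with plain ordinary least squares. The (total, visible) pairs below are synthetic, generated from the quoted coefficients themselves, purely to illustrate the procedure; real inputs would be measured on each node size (e.g. with `free -b`):

```python
GI = 2**30
totals = [4 * GI, 8 * GI, 16 * GI, 32 * GI, 64 * GI]
visible = [0.9833 * t - 1.205e8 for t in totals]  # stand-in measurements

# closed-form least squares for slope and intercept
n = len(totals)
sx, sy = sum(totals), sum(visible)
sxx = sum(t * t for t in totals)
sxy = sum(t * v for t, v in zip(totals, visible))

slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n
print(slope, intercept)  # recovers roughly 0.9833 and -1.205e8
```

With real measurements the residuals would also show how much of the remaining error is OS/distro dependent.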
Without any tweaking, this got the available kubernetes memory within 20Mi, so I think a 200Mi fudge factor would be more than enough.

```python
# kubernetes units
# K, M, G, T, P, E = 10^3, 10^6, ... bytes
# Ki, Mi, Gi, Ti, Pi, Ei = 2^10, 2^20, ... bytes
# https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#node_allocatable


def available_cpu(cpu: int) -> float:
    # GKE reserves 6% of the first core
    reserved = 0.06
    if cpu > 1:
        reserved += 0.01
    if cpu > 2:
        reserved += 0.005
    if cpu > 3:
        reserved += 0.005
    if cpu > 4:
        reserved += 0.0025 * (cpu - 4)
    return cpu - reserved


def available_memory(memory: int) -> float:
    # memory in bytes
    Gi = 2**30
    Mi = 2**20
    # linear fit for the memory the linux kernel leaves visible
    memory = memory * 0.9833 - 1.205e8
    eviction_threshold = 100 * Mi
    if memory < Gi:
        return memory - 255 * Mi
    reserved = min(memory, 4 * Gi) * 0.25
    if memory > 4 * Gi:
        reserved += min(memory - 4 * Gi, 4 * Gi) * 0.20
    if memory > 8 * Gi:
        reserved += min(memory - 8 * Gi, 8 * Gi) * 0.10
    if memory > 16 * Gi:
        reserved += min(memory - 16 * Gi, 112 * Gi) * 0.06
    if memory > 128 * Gi:
        reserved += (memory - 128 * Gi) * 0.02
    return memory - reserved - eviction_threshold


print(available_cpu(4))
print(16 * 2**30 / 1024 - 16399092)
print(available_memory(16 * 2**30) / 1024 - 13610740)
print(available_cpu(8))
print(32 * 2**30 / 1024 - 32888740)
print(available_memory(32 * 2**30) / 1024 - 29093796)
# so within 20Mi of the measured allocatable values
```
Amazing! Thank you so much for this data as well as the Python template. Have you tested a QHub deployment with those resource specs? May I implement this, as well as the YAML file modifications? I would suggest changing the jupyterlab profiles into something like this:

```yaml
- display_name: Small Instance
  description: Stable environment with 1 cpu / 4 GB ram
  machine_type: e2-medium
  kubespawner_override:
    image: quansight/qhub-jupyterlab:v0.3.12
```

Then, during the rendering step, we could include those limits in the qhub-config file.
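One possible shape for that rendering step: look up the raw machine size from the profile's machine type and fill in kubespawner guarantees from the allocatable-resource estimate. This is a hypothetical sketch, not existing qhub code; `MACHINE_SIZES`, `allocatable`, and `render_profile` are all illustrative names, and the machine-size table is hand-written here rather than queried from the provider:

```python
GI = 2**30
MI = 2**20

# illustrative raw sizes (vCPUs, memory bytes); a real implementation
# would look these up from the cloud provider's machine-type catalog
MACHINE_SIZES = {"e2-medium": (2, 4 * GI)}


def allocatable(cpu: int, memory: int):
    """GKE-style estimate from earlier in this thread (tiers above 16 GiB omitted)."""
    reserved_cpu = 0.06
    if cpu > 1:
        reserved_cpu += 0.01
    if cpu > 2:
        reserved_cpu += 0.005
    if cpu > 3:
        reserved_cpu += 0.005
    if cpu > 4:
        reserved_cpu += 0.0025 * (cpu - 4)
    reserved_mem = min(memory, 4 * GI) * 0.25
    if memory > 4 * GI:
        reserved_mem += min(memory - 4 * GI, 4 * GI) * 0.20
    if memory > 8 * GI:
        reserved_mem += min(memory - 8 * GI, 8 * GI) * 0.10
    return cpu - reserved_cpu, memory - reserved_mem - 100 * MI


def render_profile(profile: dict) -> dict:
    """Fill in kubespawner_override guarantees from the profile's machine type."""
    cpu, mem = MACHINE_SIZES[profile["machine_type"]]
    a_cpu, a_mem = allocatable(cpu, mem)
    override = dict(profile.get("kubespawner_override", {}))
    override.setdefault("cpu_guarantee", round(a_cpu, 2))
    override.setdefault("mem_guarantee", f"{int(a_mem) // MI}Mi")
    return {**profile, "kubespawner_override": override}


profile = {
    "display_name": "Small Instance",
    "machine_type": "e2-medium",
    "kubespawner_override": {"image": "quansight/qhub-jupyterlab:v0.3.12"},
}
print(render_profile(profile)["kubespawner_override"])
```

Using `setdefault` means a user-supplied guarantee in the config would win over the computed one. `cpu_guarantee` and `mem_guarantee` are real KubeSpawner options, so the override dict can be passed through unchanged.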
This issue has been automatically marked as stale because there was no recent activity in 60 days. Remove the stale label or add a comment; otherwise, this issue will automatically be closed in 7 days if no further activity occurs.
This issue was closed because it has been stalled for 7 days with no activity. |
@costrouc @viniciusdc can you update this issue and include additional action items? Also, should we close #796 in favor of this issue? |
Hi Kim, some context here: some clients and users reported that it would be beneficial for qhub to: …

So @costrouc and I worked on an automatic tool to generate the qhub profiles, based on the machine type the user requests, with the most efficient resource utilization. Chris came up with the prototype in the comment above to address #795 (which was closed by the stale bot), and some docs should be added to inform the user about this. We have the prototype, but now: …
Closing in favor of #35 |
Description
A common problem we have had with QHub deployments is getting the memory and CPU right for the JupyterHub and Dask Gateway profiles. In an effort to further automate this process, we would like qhub to be aware of the cpu/memory usage of nodes in a given kubernetes deployment. Example: …
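As a point of comparison for any estimate, the values Kubernetes itself reports can be read straight off the node objects (assuming `kubectl` access to the running cluster):

```shell
# compare raw node capacity with what the kubelet reports as allocatable
kubectl get nodes -o custom-columns=\
NAME:.metadata.name,\
CPU:.status.capacity.cpu,\
CPU_ALLOC:.status.allocatable.cpu,\
MEM:.status.capacity.memory,\
MEM_ALLOC:.status.allocatable.memory
```

The gap between the two memory columns is the reservation plus eviction threshold discussed in the comments above.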
Success Criteria