
Documentation on kubernetes resources and functions for each provider to calculate resources #792

Closed · 2 tasks done
costrouc opened this issue Aug 25, 2021 · 15 comments

Labels: area: documentation 📖 (Improvements or additions to documentation), needs: discussion 💬 (Needs discussion with the rest of the team), status: stale 🥖 (Not up to date with the default branch - needs update), type: enhancement 💅🏼 (New feature or request)

Comments

@costrouc (Member) commented Aug 25, 2021

Description

A common problem that we have had with QHub deployments is getting the memory and CPU right for the JupyterHub and Dask Gateway profiles. In an effort to further automate this process, we would like QHub to be aware of the CPU/memory available on the nodes in a given Kubernetes deployment. Example:

Success Criteria

def allocatable_memory(total_memory):
    return 0.75 * total_memory

def allocatable_cpu(total_cpu):
    return 0.9 * total_cpu
@costrouc added the type: enhancement 💅🏼 and area: documentation 📖 labels Aug 25, 2021
@tylerpotts (Contributor)

@costrouc (Member, Author)

Also, I saw that kubectl describe node <nodename> | grep Allocatable -B 7 -A 6 gives some useful output for checking.
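
For reference, the same capacity/allocatable numbers can also be pulled programmatically; a minimal sketch with the official kubernetes Python client (assumes the kubernetes package is installed and a working kubeconfig for the cluster):

# Sketch: read per-node capacity and allocatable resources via the API,
# equivalent to the kubectl describe node output above.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    print(node.metadata.name)
    print("  capacity:   ", node.status.capacity)     # e.g. {'cpu': '4', 'memory': '16399088Ki', ...}
    print("  allocatable:", node.status.allocatable)  # e.g. {'cpu': '3920m', 'memory': '13610736Ki', ...}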

@viniciusdc (Contributor) commented Aug 31, 2021

I've tested it by comparing the output from the kubectl command shown above with the results of the GCP allocation formula. The difference is not too large (0.6 vCPU, or 200-800 MB of memory). I will compare and test using one GCP deployment to retrieve more info and maybe adapt the formula provided by Google.

For GCP, allocatable resources are calculated in the following way:

ALLOCATABLE = CAPACITY - RESERVED - EVICTION-THRESHOLD

For memory resources, GKE reserves the following:

  • 255 MiB of memory for machines with less than 1 GiB of memory
  • 25% of the first 4 GiB of memory
  • 20% of the next 4 GiB of memory (up to 8 GiB)
  • 10% of the next 8 GiB of memory (up to 16 GiB)
  • 6% of the next 112 GiB of memory (up to 128 GiB)
  • 2% of any memory above 128 GiB

For CPU resources, GKE reserves the following:

  • 6% of the first core
  • 1% of the next core (up to 2 cores)
  • 0.5% of the next 2 cores (up to 4 cores)
  • 0.25% of any cores above 4 cores

For instance, for the QHub general node group (on e2-standard-4) the above command gives:

Capacity:
  attachable-volumes-gce-pd:  127
  cpu:                        4
  ephemeral-storage:          98868448Ki
  hugepages-1Gi:              0
  hugepages-2Mi:              0
  memory:                     16399088Ki
  pods:                       110
Allocatable:
  attachable-volumes-gce-pd:  127
  cpu:                        3920m
  ephemeral-storage:          47093746742
  hugepages-1Gi:              0
  hugepages-2Mi:              0
  memory:                     13610736Ki

whereas, by the GCP-provided calculation based on e2-standard-4 (4 vCPUs | 16 GiB):

  • Allocatable memory = 16 - 2.6 - 0.2* = 13.2 GiB
  • Allocatable CPU = 4 - 0.08 - 0.1* = 3.92

*The kubelet reserves an extra 100m of CPU and 100 MB of memory for the operating system, plus 100 MB for the eviction threshold.

I also started a doc on resource allocation for the different providers; I will link a PR for that soon.

@costrouc (Member, Author)

The difference is not too large (0.6 vCPU, or 200-800 MB of memory)

This is still quite a bit. Do you have a document that you are compiling that shows what you have measured?

@tylerpotts (Contributor) commented Aug 31, 2021

Note that Google may be using Mebibytes: https://simple.wikipedia.org/wiki/Mebibyte

It seems that they are, per these docs: https://cloud.google.com/marketplace/docs/partners/kubernetes/select-pricing
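
Related: when comparing values like 16399088Ki against GB/GiB figures, a small conversion helper avoids unit mix-ups; a minimal sketch that only handles the binary suffixes that appear in this thread:

# Sketch: convert Kubernetes binary-suffix quantities (Ki/Mi/Gi) to bytes so
# kubectl output can be compared directly against GiB figures.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def quantity_to_bytes(quantity: str) -> int:
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes

print(quantity_to_bytes("16399088Ki") / 2**30)  # ~15.64 GiB of the nominal 16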

@viniciusdc (Contributor) commented Aug 31, 2021

@costrouc (Member, Author) commented Sep 7, 2021

Looks like the formulas check out and come out almost exactly right.

# kubernetes units
# K, M, G, T, P, E       = 10^3, 10^6, 10^9, ... bytes
# Ki, Mi, Gi, Ti, Pi, Ei = 2^10, 2^20, 2^30, ... bytes

# https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#node_allocatable


def available_cpu(cpu: int):
    # GKE reserves 6% of the first core, 1% of the second,
    # 0.5% of cores 3-4, and 0.25% of any core above 4
    reserved = 0.06
    if cpu > 1:
        reserved = reserved + 0.01
    if cpu > 2:
        reserved = reserved + 0.005
    if cpu > 3:
        reserved = reserved + 0.005
    if cpu > 4:
        reserved = reserved + 0.0025 * (cpu - 4)
    return cpu - reserved


def available_memory(memory: int):
    # memory in bytes
    Gi = 2**30
    Mi = 2**20

    eviction_threshold = 100 * Mi

    if memory < Gi:
        return memory - (255 * Mi)

    reserved = min(memory, 4 * Gi) * 0.25
    if memory > 4 * Gi:
        reserved = reserved + min(memory - 4 * Gi, 4 * Gi) * 0.20
    if memory > 8 * Gi:
        reserved = reserved + min(memory - 8 * Gi, 8 * Gi) * 0.10
    if memory > 16 * Gi:
        reserved = reserved + min(memory - 16 * Gi, 112 * Gi) * 0.06
    if memory > 128 * Gi:
        reserved = reserved + (memory - 128 * Gi) * 0.02

    return memory - reserved - eviction_threshold


print(available_cpu(4))
print(available_memory(16399092 * 1024) / 1024)

print(available_cpu(8))
print(available_memory(32888740 * 1024) / 1024)
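
For reference, running this against the e2-standard-4 numbers above prints ~3.92 for the CPU and roughly 13,608,207 Ki for the memory, within a few MiB of the 13,610,736 Ki that kubectl reports as allocatable.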

@costrouc (Member, Author) commented Sep 8, 2021

Incoming: a long description of how we can estimate the allocatable memory and CPU for a given instance size. There are two parts to this problem.

  1. When you provision, say, a 2 GiB x 1 CPU instance, only a fraction of the 2 GiB is actually usable, since the Linux kernel consumes some memory, along with the attached devices. This is a non-negligible amount; here is a table with my findings. This problem will exist on every cloud provider. It was checked by provisioning VMs of different sizes on each cloud provider and running cat /proc/meminfo | grep MemTotal.
| MemTotal KiB | Physical (GiB)   | Place |
|--------------+------------------+-------|
|     32898016 |               32 | pc    |
|     32823188 |               32 | pc    |
|    263856156 |              256 | pc    |
|     65480752 |               64 | pc    |
|    131716428 |              128 | pc    |
|    131947532 |              128 | pc    |
|    131838128 |              128 | pc    |
|     16269920 |               16 | pc    |
|     16296824 |               16 | pc    |
|     16254080 |               16 | pc    |
|     16399092 |               16 | gcp   |
|     32888740 |               32 | gcp   |
|      1004852 |                1 | do    |
|      2035508 |                2 | do    |
|      4030664 |                4 | do    |
|      8152848 |                8 | do    |
|    263945852 |              256 | do    |
|      4040796 |                4 | gcp   |
|     65969820 |               64 | gcp   |
|      1009616 |                1 | gcp   |
|      2041808 |                2 | gcp   |
|      8152988 |                8 | azure |
|     32887408 |               32 | azure |
|      8166288 |                8 | aws   |
|     61811360 |               60 | aws   |
|    193680648 |              192 | aws   |
|     15807620 |               16 | aws   |
|     32525428 |               32 | aws   |
|     32938904 |               32 | aws   |

When I did a linear fit I got 0.9833x - 1.205e8, where x is the physical memory of the machine in bytes and the result is the available memory in bytes (a sketch reproducing this fit is included below).

  2. Next, cloud providers' Kubernetes implementations reserve a set amount of RAM and CPU. Many examples are shown in this issue. The formula for the GCP calculation is above and is accurate to within 10Mi.

With this approach we should reliably get the available memory to within 150Mi. There are a few outliers: the OS/distro does seem to have a significant impact on the available memory, which was especially clear with Ubuntu on AWS.
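
The fit can be reproduced from the table; a rough sketch with numpy, using a subset of the rows above (the exact coefficients will depend on which rows are included):

# Sketch: regress measured MemTotal (bytes) against nominal physical memory
# (bytes) for a subset of the table above.
import numpy as np

Ki, Gi = 2**10, 2**30
data = [  # (MemTotal in KiB, nominal memory in GiB)
    (1009616, 1), (2041808, 2), (4040796, 4), (8166288, 8),
    (16399092, 16), (32888740, 32), (65969820, 64),
    (131716428, 128), (263856156, 256),
]
x = np.array([gib * Gi for _, gib in data], dtype=float)
y = np.array([kib * Ki for kib, _ in data], dtype=float)
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # slope comes out around 0.983, in line with the fit above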

@costrouc (Member, Author) commented Sep 8, 2021

Without any tweaking, this got the available Kubernetes memory to within 20Mi, so I think a 200Mi fudge factor would be more than enough.

# kubernetes units
# K, M, G, T, P, E       = 10^3, 10^6, 10^9, ... bytes
# Ki, Mi, Gi, Ti, Pi, Ei = 2^10, 2^20, 2^30, ... bytes

# https://cloud.google.com/kubernetes-engine/docs/concepts/cluster-architecture#node_allocatable


def available_cpu(cpu: int):
    # GKE reserves 6% of the first core, 1% of the second,
    # 0.5% of cores 3-4, and 0.25% of any core above 4
    reserved = 0.06
    if cpu > 1:
        reserved = reserved + 0.01
    if cpu > 2:
        reserved = reserved + 0.005
    if cpu > 3:
        reserved = reserved + 0.005
    if cpu > 4:
        reserved = reserved + 0.0025 * (cpu - 4)
    return cpu - reserved


def available_memory(memory: int):
    # memory in bytes
    Gi = 2**30
    Mi = 2**20

    # account for memory consumed by the linux kernel (linear fit above)
    memory = memory * 0.9833 - 1.205e8

    eviction_threshold = 100 * Mi

    if memory < Gi:
        return memory - (255 * Mi)

    reserved = min(memory, 4 * Gi) * 0.25
    if memory > 4 * Gi:
        reserved = reserved + min(memory - 4 * Gi, 4 * Gi) * 0.20
    if memory > 8 * Gi:
        reserved = reserved + min(memory - 8 * Gi, 8 * Gi) * 0.10
    if memory > 16 * Gi:
        reserved = reserved + min(memory - 16 * Gi, 112 * Gi) * 0.06
    if memory > 128 * Gi:
        reserved = reserved + (memory - 128 * Gi) * 0.02

    return memory - reserved - eviction_threshold


print(available_cpu(4))
print(16 * 2**30 / 1024 - 16399092)                     # kernel overhead: nominal 16 GiB vs measured MemTotal (KiB)
print(available_memory(16 * 2**30) / 1024 - 13610740)   # estimate vs reported allocatable (KiB)

print(available_cpu(8))
print(32 * 2**30 / 1024 - 32888740)
print(available_memory(32 * 2**30) / 1024 - 29093796)

# so within 20Mi for the estimate

@viniciusdc (Contributor) commented Sep 8, 2021

Without any tweaking, this got the available Kubernetes memory to within 20Mi, so I think a 200Mi fudge factor would be more than enough.


Amazing! Thank you so much for this data as well as the Python template. Have you tested a QHub deployment with those resource specs? May I implement this, as well as the YAML file modifications? I would suggest changing the jupyterlab profiles into something like this:

- display_name: Small Instance
  description: Stable environment with 1 cpu / 4 GB ram
  machine_type: e2-medium
  kubespawner_override:
    image: quansight/qhub-jupyterlab:v0.3.12

Then, during the rendering step, we could include those limits in the qhub-config file.
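
To make that concrete, here is a sketch of what the rendering step could emit for kubespawner_override, feeding in the allocatable estimates from the comments above (render_profile_overrides is a hypothetical name, not existing qhub code; cpu_limit/mem_limit/cpu_guarantee/mem_guarantee are standard KubeSpawner settings):

# Sketch: turn estimated allocatable resources into kubespawner_override
# limits/guarantees during config rendering.
Gi = 2**30

def render_profile_overrides(allocatable_cpu: float, allocatable_memory: float) -> dict:
    return {
        "cpu_limit": round(allocatable_cpu, 2),
        "cpu_guarantee": round(allocatable_cpu, 2),
        "mem_limit": f"{allocatable_memory / Gi:.2f}G",      # KubeSpawner reads G as 1024**3
        "mem_guarantee": f"{allocatable_memory / Gi:.2f}G",
    }

# e.g. with the e2-standard-4 estimates from above:
print(render_profile_overrides(3.92, 13610736 * 2**10))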

@github-actions (bot) commented Nov 8, 2021

This issue has been automatically marked as stale because there was no recent activity in 60 days. Remove the stale label or add a comment, otherwise, this issue will automatically be closed in 7 days if no further activity occurs.

@github-actions bot added the status: stale 🥖 label Nov 8, 2021
@github-actions (bot)

This issue was closed because it has been stalled for 7 days with no activity.

@dharhas reopened this Nov 16, 2021
@iameskild removed the status: stale 🥖 label Nov 17, 2021
@iameskild added the status: stale 🥖 and needs: discussion 💬 labels Apr 14, 2022
@kcpevey (Contributor) commented Jun 7, 2022

@costrouc @viniciusdc can you update this issue and include additional action items?

Also, should we close #796 in favor of this issue?

@viniciusdc (Contributor)

@costrouc @viniciusdc can you update this issue and include additional action items?

Also, should we close #796 in favor of this issue?

Hi Kim, some context here: some clients and users mentioned that it would be beneficial for QHub to:

  • have a tool to identify the system memory/CPU usage needed to instantiate a given profile. Right now we assume some values that are "good enough" to create each user/worker pod, but they don't reflect the total capabilities of the machine type and cluster node assigned to the profile.

So @costrouc and I worked on an automatic tool to generate the QHub profiles, based on the machine type the user requests, with the most efficient resource utilization. Chris came up with the prototype in the comment above to address #795 (which was closed by the stale bot), and some docs should then be added to inform the user about this.

We have the prototype, but now:

  • we need to implement this taking into consideration the new structure of qhub (post 0.4.0) (a possible shape is sketched after this list)
  • test and debug, as each cloud provider has a different way to query machine type data (that's why only a few of the cloud providers would get this initially)
  • then test with an end-to-end deployment
  • this comment summarizes the most important information regarding this issue.
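
A possible shape for the per-provider part, purely illustrative (none of these names exist in qhub; only the GCP formula from this thread is filled in):

# Illustrative sketch: dispatch resource-estimation formulas per cloud provider.
from typing import Callable, Dict

def gcp_allocatable_cpu(cpu: int) -> float:
    # GKE tiers from the comments above: 6% / 1% / 0.5% / 0.25%
    reserved = 0.06
    if cpu > 1:
        reserved += 0.01
    if cpu > 2:
        reserved += 0.005
    if cpu > 3:
        reserved += 0.005
    if cpu > 4:
        reserved += 0.0025 * (cpu - 4)
    return cpu - reserved

ALLOCATABLE_CPU: Dict[str, Callable[[int], float]] = {
    "gcp": gcp_allocatable_cpu,
    # "aws", "azure", "do": to be added once their reservation rules are confirmed
}

def estimate_allocatable_cpu(provider: str, cpu: int) -> float:
    try:
        return ALLOCATABLE_CPU[provider](cpu)
    except KeyError:
        raise NotImplementedError(f"no CPU formula for provider {provider!r}")

print(estimate_allocatable_cpu("gcp", 4))  # ~3.92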

@kcpevey (Contributor) commented Jun 7, 2022

Closing in favor of #35

@kcpevey closed this as completed Jun 7, 2022