In [21]:
import math

# Jupyterhub Costs Calculation

This notebook calculates the costs of running Jupyterhub in several configurations.

Modify the values below to see how they change the final costs.



In [54]:
# How many students predicted per class
students_per_class = 200

# Cost in dollars per student that we'd like to stay below
target_cost_per_student = 50

# Months in a school term
term_months = 3

# Resource allocation multiplier per student
# 1 means a guaranteed 1 CPU / 4 GB of RAM per student
res_alloc_mult = 1

# Azure VM monthly prices in dollars, per the Azure pricing calculator
azure_compute = {
    "vm_b4ms":   { "price": 121.18, "cpus": 4 },
    "vm_d4v4":   { "price": 140.16, "cpus": 4 },
    "aks_d2sv5": { "price": 70.08,  "cpus": 2 },
    "aks_b16ms": { "price": 486.18, "cpus": 16 },
}

# Azure misc components such as public IP, container registry, etc
azure_misc_monthly_cost = 100

# Calculate the number of cpus needed 
# to support the requested resource allocation
required_cpus = math.ceil(students_per_class * res_alloc_mult)
print(f"Required number of CPUs: {required_cpus}")

Required number of CPUs: 200
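As a rough illustration of how the resource multiplier drives the CPU requirement, the sketch below recomputes the figure for a few values. The helper name `required_cpus_for` and the multipliers other than 1 are assumptions made for this example, not values from our planning.

```python
import math

def required_cpus_for(students, res_alloc_mult):
    """Hypothetical helper: CPUs needed for a class at a given multiplier."""
    return math.ceil(students * res_alloc_mult)

# 1 is the notebook default; 0.5 and 2 are illustrative only
for mult in (0.5, 1, 2):
    print(f"multiplier {mult}: {required_cpus_for(200, mult)} CPUs")
```

Halving the multiplier halves the CPU requirement, and with it the node counts in every scenario below.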


## Jupyterhub Cluster - Azure

This scenario is similar to how our Jupyterhub cluster is currently set up on-prem.

This configuration uses standard virtual machines that are always on and is *not scalable*. We can use either B-series systems (cheaper, less performant) or D4-series systems (enterprise performance). I've chosen the cheaper B-series, as these machines will not see sustained heavy use.

There are also additional Azure costs for things like the public IP and container registry, which are included in the calculation below.

In [55]:
# Select the performance size of each node
vm_size = "vm_b4ms"

# Grab the details for that size
vm_price = azure_compute[vm_size]["price"]
vm_cpus = azure_compute[vm_size]["cpus"]

# Calculate how many nodes we'll need to support the required cpus
num_nodes = math.ceil(required_cpus / vm_cpus)

# Then we simply calculate the monthly cost of all the nodes
azure_vm_cluster_monthly_cost = \
    (num_nodes * vm_price) + azure_misc_monthly_cost
print(f"Total Monthly Cost: ${azure_vm_cluster_monthly_cost:,.2f}")

# And finally the cost per student per term, including storage
# (~$4 per student per month)
storage_monthly_cost = 4
azure_vm_cost_per_student_per_term = \
    (azure_vm_cluster_monthly_cost / students_per_class + storage_monthly_cost) * term_months
print(f"Cost per student per term: ${azure_vm_cost_per_student_per_term:,.2f}")

Total Monthly Cost: $6,159.00
Cost per student per term: $104.39
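To see what the choice between B-series and D4-series means for the per-student figure, the sketch below repeats the calculation above for both VM sizes. The helper `vm_cost_per_student_per_term` is a hypothetical wrapper around the same arithmetic, with the notebook's default inputs as its default arguments.

```python
import math

# Monthly prices and CPU counts copied from azure_compute above
vm_options = {
    "vm_b4ms": {"price": 121.18, "cpus": 4},
    "vm_d4v4": {"price": 140.16, "cpus": 4},
}

def vm_cost_per_student_per_term(vm, required_cpus=200, students=200,
                                 misc_monthly=100, storage_monthly=4,
                                 term_months=3):
    """Hypothetical helper mirroring the cell above for one VM size."""
    num_nodes = math.ceil(required_cpus / vm["cpus"])
    cluster_monthly = num_nodes * vm["price"] + misc_monthly
    return (cluster_monthly / students + storage_monthly) * term_months

for name, vm in vm_options.items():
    cost = vm_cost_per_student_per_term(vm)
    print(f"{name}: ${cost:,.2f} per student per term")
```

With the default inputs, both sizes need the same number of nodes, so the difference comes entirely from the per-node price.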


## Jupyterhub Cluster - On-Prem

This is our current environment. The setup is the same as above, but we estimate our on-prem costs at roughly one-third of the Azure costs (modeled as a 0.3 multiplier), and we don't charge for student storage.

In [56]:
# Approximate a third of the Azure price with a 0.3 multiplier
on_prem_monthly_cost = azure_vm_cluster_monthly_cost * 0.3
print(f"Total Monthly Cost: ${on_prem_monthly_cost:,.2f}")

on_prem_cost_per_student_per_term = on_prem_monthly_cost / students_per_class * term_months
print(f"Cost per student per term: ${on_prem_cost_per_student_per_term:,.2f}")

Total Monthly Cost: $1,847.70
Cost per student per term: $27.72
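The `target_cost_per_student` value set at the top of the notebook gives a simple way to judge these results. A minimal sketch, assuming the per-term figures are copied by hand from the two outputs above; `within_target` is a hypothetical helper name:

```python
target_cost_per_student = 50

# Per-term figures copied from the cell outputs above
scenario_costs = {
    "Azure VM cluster": 104.39,
    "On-prem cluster": 27.72,
}

def within_target(cost, target):
    """Hypothetical helper: True if the cost stays at or below the target."""
    return cost <= target

for name, cost in scenario_costs.items():
    status = "within" if within_target(cost, target_cost_per_student) else "over"
    print(f"{name}: ${cost:,.2f} ({status} the ${target_cost_per_student} target)")
```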


## Jupyterhub Kubernetes - Azure AKS

This is a much more complicated environment, but may lead to the most cost savings if on-prem is to be avoided.

This cluster is configured using the [Zero to Jupyterhub](https://zero-to-jupyterhub.readthedocs.io/en/latest/) guide for hosting Jupyterhub on Kubernetes, specifically Azure Kubernetes Service (AKS).

Some benefits include:

- Much smaller footprint when not in use
- Autoscaling pods to reduce cluster size after class
- Creating a new cluster is much faster

Possible concerns:

- New-ish technology with no official support
- Managed very differently from our traditional toolset
- Staff proficiency in container management platforms


## Configuration

The AKS cluster is configured with three static systems that serve as the controller nodes, and a user node pool that scales with demand.

### Node sizing

- Controller nodes: D2s-v5
    - static cost, running 24x7, no scaling

- User pool: B16ms
    - dynamic cost, calculated below

We use large B16ms systems (16 CPUs, 64 GB RAM) to minimize the number of scale events. Several scale events in series could otherwise leave students waiting more than 10 minutes.
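The effect of node size on scaling can be sketched by counting how many nodes each size needs to reach full capacity. This assumes, as a simplification, that every added node is a separate scale event; the helper `nodes_needed` is a hypothetical name, and the CPU counts are taken from `azure_compute`.

```python
import math

required_cpus = 200  # from the first cell, with the default inputs

# CPUs per node, taken from azure_compute
node_cpus = {"aks_d2sv5": 2, "vm_d4v4": 4, "aks_b16ms": 16}

def nodes_needed(required_cpus, cpus_per_node):
    """Hypothetical helper: nodes required to supply the requested CPUs."""
    return math.ceil(required_cpus / cpus_per_node)

for size, cpus in node_cpus.items():
    count = nodes_needed(required_cpus, cpus)
    print(f"{size}: {count} nodes at full capacity")
```

Under that assumption, the 16-CPU nodes reach full capacity in far fewer steps than the smaller sizes.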

In [57]:
# Controller nodes
aks_controller_size = "aks_d2sv5"

aks_controller_price = azure_compute[aks_controller_size]["price"]
num_controllers = 3

# Monthly cost for the controller nodes
aks_controller_monthly_cost = aks_controller_price * num_controllers

### Availability

Because we are using Kubernetes instead of fixed virtual machines, costs are calculated slightly differently: we will be scaling the user pods up and down as needed. The calculations below are a best-effort estimate based on the following schedule:

Full capacity: 

- Tue: 1.5hr lecture
- Thu: 1.5hr lecture
- Fri: 12hr labs

The cluster runs at half capacity the rest of the time, with room to scale down further as we see fit.
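The weekly split between full and half capacity follows directly from this schedule. A minimal sketch, assuming the 1.5-hour manual pre-scaling window per session used in the calculation that follows; `weekly_capacity_hours` and `pre_scale_buffer_hours` are hypothetical names:

```python
# Full-capacity sessions from the schedule above (hours per session)
schedule = {"Tue": 1.5, "Thu": 1.5, "Fri": 12}

# Assumed manual pre-scaling window before each session, in hours
pre_scale_buffer_hours = 1.5

def weekly_capacity_hours(schedule, buffer_hours, hours_in_week=168):
    """Hypothetical helper: (full, half) capacity hours in one week."""
    full = sum(schedule.values()) + buffer_hours * len(schedule)
    return full, hours_in_week - full

full_hours, half_hours = weekly_capacity_hours(schedule, pre_scale_buffer_hours)
print(f"Full capacity: {full_hours} h/week ({full_hours / 168:.1%} of the week)")
print(f"Half capacity: {half_hours} h/week")
```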

In [58]:
# Details about class hours
hours_in_a_week = 168
class_sessions_per_week = 3
# Tue,Thu,Friday total hours
class_hours = 15

# For full capacity hours, we need to add an hour and a half
# to each day, because we plan to manually scale the cluster 
# prior to class to facilitate the load without triggering scale events
full_capacity_hours_per_week = class_hours + 1.5 * class_sessions_per_week
half_capacity_hours_per_week = hours_in_a_week - full_capacity_hours_per_week

### Calculating Costs

In [60]:
# First let's establish our base cost
# We already calculated the controller cost above, so we
# just add the cost for the other azure misc components
aks_base_monthly_cost = aks_controller_monthly_cost + azure_misc_monthly_cost

# Now let's calculate the hourly cost for the user pod systems
aks_userpod_size = "aks_b16ms"
aks_userpod_hourly_cost = azure_compute[aks_userpod_size]["price"] / 30 / 24

# Next we determine how many userpods we need at half and full capacity
aks_userpod_cpus = azure_compute[aks_userpod_size]["cpus"]
aks_userpod_fc_count = math.ceil(required_cpus / aks_userpod_cpus)
aks_userpod_hc_count = math.ceil(aks_userpod_fc_count / 2)

# Then we can estimate the cost for those systems.
# The capacity hours are weekly, so scale them to a 30-day month
# (the same month length used for the hourly price above)
weeks_per_month = 30 / 7
aks_userpod_weekly_cost = \
    full_capacity_hours_per_week * aks_userpod_fc_count * aks_userpod_hourly_cost + \
    half_capacity_hours_per_week * aks_userpod_hc_count * aks_userpod_hourly_cost
aks_userpod_monthly_cost = aks_userpod_weekly_cost * weeks_per_month

# Add that to the base price
aks_total_monthly_cost = aks_base_monthly_cost + aks_userpod_monthly_cost
print(f"Total monthly cost: ${aks_total_monthly_cost:,.2f}")

aks_cost_per_student_per_term = \
    aks_total_monthly_cost / students_per_class * term_months
print(f"Cost per student per term: ${aks_cost_per_student_per_term:,.2f}")

Total monthly cost: $4,052.09
Cost per student per term: $60.78
