
# Fair Scheduler — Simple Stakeholder Demo
This notebook shows a **simple, visual** demo of a **fair scheduler** using the idea of **DRF (Dominant Resource Fairness)**.

**What it does (in plain words):**
- We have a pool with a limited amount of **CPU** and **Memory**.
- We have a few **teams** using some of that CPU/Memory already.
- We allocate the **next N executors** one at a time, each to the team that currently has the **lowest dominant share** (the larger of its CPU share and its Memory share of the pool).
- You can change the numbers and re-run to see how fairness behaves.
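
As a quick illustration of the dominant-share idea described above, here is a minimal sketch. The pool size and the two teams (`X` and `Y`) are hypothetical numbers chosen only to keep the arithmetic easy; they are not the scenario used later in this notebook.

```python
def dominant_share(cpu, mem_gb, total_cpu, total_mem_gb):
    """Return the larger of the CPU share and the memory share of the pool."""
    cpu_share = cpu / total_cpu
    mem_share = mem_gb / total_mem_gb
    return max(cpu_share, mem_share)

# Hypothetical pool and teams (illustration only)
TOTAL_CPU, TOTAL_MEM_GB = 100, 400

# Team X holds 30% of the CPU but only 10% of the memory, so CPU dominates
share_x = dominant_share(cpu=30, mem_gb=40, total_cpu=TOTAL_CPU, total_mem_gb=TOTAL_MEM_GB)
# Team Y holds 10% of the CPU but 50% of the memory, so memory dominates
share_y = dominant_share(cpu=10, mem_gb=200, total_cpu=TOTAL_CPU, total_mem_gb=TOTAL_MEM_GB)

print(f"Team X dominant share: {share_x:.2f}")
print(f"Team Y dominant share: {share_y:.2f}")
```

In this sketch Team X has the lower dominant share, so it would be first in line for the next executor.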

**What you’ll see:**
1) A **table** of who got each executor (step-by-step).
2) A **table** of the **final allocation** per team.
3) A **bar chart** of each team’s **dominant share** (a smaller gap between teams is fairer; the lowest shares rise toward the rest).
4) A **line chart** showing how each team’s **dominant share evolves** as we allocate more executors.



> **Notes:**  
> - Uses standard Python, `pandas`, and `matplotlib` only.  
> - No special setup required.  
> - One chart per figure (no subplots), with default colors.


In [None]:

from dataclasses import dataclass, field
from typing import List, Dict, Tuple
import pandas as pd
import matplotlib.pyplot as plt

@dataclass
class Pool:
    name: str
    total_cpu: float          # total vCPU in the pool
    total_mem_gb: float       # total GB RAM in the pool
    exec_cpu: float           # vCPU consumed per executor
    exec_mem_gb: float        # GB Memory consumed per executor

@dataclass
class UserState:
    name: str
    cpu: float                # current allocated CPU
    mem_gb: float             # current allocated Memory
    max_executors: int        # ceiling on number of additional executors
    executors: int = field(default=0)  # how many we add in this simulation

    def can_receive(self) -> bool:
        return self.executors < self.max_executors

    def allocate_one(self, pool: Pool):
        self.cpu += pool.exec_cpu
        self.mem_gb += pool.exec_mem_gb
        self.executors += 1

def dominant_share(user: UserState, pool: Pool) -> float:
    cpu_share = user.cpu / pool.total_cpu if pool.total_cpu else 0.0
    mem_share = user.mem_gb / pool.total_mem_gb if pool.total_mem_gb else 0.0
    return max(cpu_share, mem_share)

def drf_allocate(pool: Pool, users: List[UserState], extra_executors: int) -> Tuple[List[UserState], List[Dict], pd.DataFrame]:
    """Allocate 'extra_executors' using DRF (lowest dominant share wins each step)."""
    history = []
    # track dominant shares over time for the line chart
    timeline = {u.name: [dominant_share(u, pool)] for u in users}
    timeline_steps = [0]

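    for step in range(1, extra_executors+1):
        # stop once the pool cannot fit another executor (CPU or memory exhausted)
        used_cpu = sum(u.cpu for u in users)
        used_mem_gb = sum(u.mem_gb for u in users)
        if (used_cpu + pool.exec_cpu > pool.total_cpu
                or used_mem_gb + pool.exec_mem_gb > pool.total_mem_gb):
            break

        # teams that have not yet reached their executor ceiling
        eligible = [u for u in users if u.can_receive()]
        if not eligible:
            break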

        shares = [(u, dominant_share(u, pool)) for u in eligible]
        shares.sort(key=lambda x: (round(x[1], 12), x[0].name))  # lowest share wins; break ties by name
        chosen, share_before = shares[0]

        chosen.allocate_one(pool)
        share_after = dominant_share(chosen, pool)

        history.append({
            "step": step,
            "allocated_to": chosen.name,
            "dominant_share_before": round(share_before, 4),
            "dominant_share_after": round(share_after, 4),
            "total_execs_for_user": chosen.executors
        })

        # record timeline (dominant share snapshot after each step)
        for u in users:
            timeline[u.name].append(dominant_share(u, pool))
        timeline_steps.append(step)

    timeline_df = pd.DataFrame(timeline, index=timeline_steps).rename_axis("step").reset_index()
    return users, history, timeline_df

def final_allocation_df(users: List[UserState], pool: Pool) -> pd.DataFrame:
    rows = []
    for u in users:
        rows.append({
            "team": u.name,
            "cpu_alloc": u.cpu,
            "mem_alloc_gb": u.mem_gb,
            "dominant_share": round(dominant_share(u, pool), 4),
            "executors_added": u.executors,
            "ceiling_execs": u.max_executors
        })
    return pd.DataFrame(rows).sort_values(by=["dominant_share","team"]).reset_index(drop=True)



## 1) Configure a simple scenario (tweak and re-run)
Change the values below to show a different scenario to stakeholders.
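
Before changing the numbers, it can help to check how many executors the pool's free capacity can actually hold. The sketch below repeats the default values from the configuration cell by hand (it does not read the notebook's `pool` or `users` objects), so the figures would need updating if the scenario is edited. The variable names are illustrative.

```python
# Default scenario values, copied by hand from the configuration cell
total_cpu, total_mem_gb = 200, 800
exec_cpu, exec_mem_gb = 2, 16

# Starting usage summed over the three teams
used_cpu = 60 + 40 + 20
used_mem_gb = 120 + 400 + 40

free_cpu = total_cpu - used_cpu
free_mem_gb = total_mem_gb - used_mem_gb

# An executor needs both CPU and memory, so the scarcer resource sets the limit
fit_by_cpu = free_cpu // exec_cpu
fit_by_mem = free_mem_gb // exec_mem_gb
max_fit = min(fit_by_cpu, fit_by_mem)

print(f"Free capacity: {free_cpu} vCPU, {free_mem_gb} GB")
print(f"Executors that fit: {max_fit} (CPU allows {fit_by_cpu}, memory allows {fit_by_mem})")
```

With the default values, memory is the limiting resource, and the free capacity holds exactly the 15 executors requested.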


In [None]:

# Pool: total capacity and per-executor size
pool = Pool(
    name="Batch",
    total_cpu=200,       # vCPU in this pool
    total_mem_gb=800,    # GB RAM in this pool
    exec_cpu=2,          # vCPU used per executor
    exec_mem_gb=16       # GB used per executor
)

# Teams: starting allocations and ceilings
users = [
    UserState(name="TeamA_CPUheavy", cpu=60, mem_gb=120, max_executors=30),  # more CPU than MEM
    UserState(name="TeamB_MEMheavy", cpu=40, mem_gb=400, max_executors=30),  # more MEM than CPU
    UserState(name="TeamC_Light",    cpu=20, mem_gb= 40, max_executors=30),  # small/light
]

# How many new executors to distribute (one-by-one)
extra_executors = 15



## 2) Run the fair scheduler (DRF) and show results


In [None]:

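from copy import deepcopy

# drf_allocate mutates the UserState objects it receives; pass a copy so that
# re-running this cell does not stack allocations on top of a previous run
users_after, history, timeline_df = drf_allocate(pool, deepcopy(users), extra_executors=extra_executors)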
history_df = pd.DataFrame(history)
final_df = final_allocation_df(users_after, pool)

print("=== Who got each executor (step-by-step) ===")
display(history_df)

print("\n=== Final allocation per team ===")
display(final_df)

print("\n=== Dominant share over time (first few rows) ===")
display(timeline_df.head())



## 3) Visual — Final dominant share per team (bar chart)
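A smaller gap between the bars is fairer. DRF always serves the team with the lowest dominant share, so the lowest shares climb toward the highest as we allocate more executors; existing allocations are never reduced.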
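
One simple way to put a number on how close the teams are to equal is the gap between the highest and lowest dominant share. The helper below, `share_spread`, is a hypothetical name introduced for this sketch; the input uses the starting dominant shares implied by the default configuration (60/200, 400/800 and 20/200).

```python
def share_spread(shares):
    """Gap between the highest and lowest dominant share (0.0 means perfectly equal)."""
    values = list(shares.values())
    return max(values) - min(values)

# Starting dominant shares for the default scenario, computed by hand
start_shares = {
    "TeamA_CPUheavy": 0.30,  # CPU-bound: 60 of 200 vCPU
    "TeamB_MEMheavy": 0.50,  # memory-bound: 400 of 800 GB
    "TeamC_Light": 0.10,     # CPU-bound: 20 of 200 vCPU
}

start_spread = share_spread(start_shares)
print(f"Starting spread: {start_spread:.2f}")
```

After a run, the same function could be applied to `dict(zip(final_df["team"], final_df["dominant_share"]))` to see how much the gap has narrowed.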


In [None]:

plt.figure(figsize=(6,4))
plt.bar(final_df["team"], final_df["dominant_share"])
plt.title("Final Dominant Share per Team")
plt.xlabel("Team")
plt.ylabel("Dominant Share (0.0–1.0)")
plt.xticks(rotation=20)
plt.tight_layout()
plt.show()



## 4) Visual — Dominant share over allocation steps (line chart)
This shows how DRF gives the next executor to whoever currently has the lowest dominant share: at each step only the chosen team's line rises, while the others stay flat.
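
The step-by-step selection can be traced in a few lines. This is a minimal sketch with two hypothetical teams and a hypothetical pool and executor size, not the notebook's scenario; ties are broken alphabetically by name, which mirrors the tie-break used in `drf_allocate`.

```python
def dominant_share(cpu, mem_gb, total_cpu, total_mem_gb):
    """Return the larger of the CPU share and the memory share of the pool."""
    return max(cpu / total_cpu, mem_gb / total_mem_gb)

# Hypothetical pool, executor size, and starting usage (illustration only)
TOTAL_CPU, TOTAL_MEM_GB = 100, 400
EXEC_CPU, EXEC_MEM_GB = 5, 20
usage = {"X": [10, 40], "Y": [20, 40]}  # team -> [cpu, mem_gb]

winners = []
for _ in range(4):
    shares = {
        name: dominant_share(cpu, mem, TOTAL_CPU, TOTAL_MEM_GB)
        for name, (cpu, mem) in usage.items()
    }
    # Lowest dominant share wins; iterating names in sorted order makes
    # min() return the alphabetically first team when shares are tied
    winner = min(sorted(shares), key=lambda name: round(shares[name], 12))
    usage[winner][0] += EXEC_CPU
    usage[winner][1] += EXEC_MEM_GB
    winners.append(winner)

print(winners)
```

Team X starts lower and keeps winning until it overtakes Team Y, at which point Team Y receives the next executor.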


In [None]:

plt.figure(figsize=(6,4))
for col in [c for c in timeline_df.columns if c != "step"]:
    plt.plot(timeline_df["step"], timeline_df[col], label=col)
plt.title("Dominant Share Over Time")
plt.xlabel("Allocation Step")
plt.ylabel("Dominant Share (0.0–1.0)")
plt.legend()
plt.tight_layout()
plt.show()
