# Simple Solution Architecture
**Goal:** Make DSE predictable, fair, and transparent — without late‑night babysitting.

**Scope:** This notebook explains the **Decision Layer** and two fairness strategies. It is designed to be readable by stakeholders and immediately useful for engineers.

**Decision Layer (ingress & decisions):**
1. **Submit Gateway** → standardise job requests into a JobSpec.
2. **Pre‑Flight (LITE)** → quick sample/size, sanity checks, coarse ETA bucket (no deep join/UDF inspection).
3. **Policy Engine** → QoS routing, fairness limits, scheduling rules; returns lane & timing.

**Fairness strategies (execution & isolation):**
- **Target solution (recommended):** Anti‑monopoly controls — executor ceilings, DRF fairness, per‑user concurrency caps, polite preemption, reservation windows & deadlines.
- **Static fairness fallback:** Two fixed resource pools (L/H) if we need a simpler interim.


## 1) Architecture at a glance
```
+-------------------+     +----------------------+     +--------------------+     +-------------------+
|   Submit Gateway  | --> |  Pre-Flight (LITE)   | --> |   Policy Engine    | --> |   Scheduler/Queues|
+-------------------+     +----------------------+     +--------------------+     +-------------------+
                                              |                                        |
                                              v                                        v
                                     +------------------+                      +--------------------+
                                     |  Observability   | <------------------- |  Execution (Spark) |
                                     +------------------+                      +--------------------+
```
- **Submit Gateway:** standardises a job into a JobSpec and validates permissions.
- **Pre‑Flight (LITE):** quick sample read, size estimate, coarse ETA bucket, basic checks.
- **Policy Engine:** applies QoS routing, fairness limits, and scheduling rules; exposes auto‑queue.
- **Scheduler:** enqueues runs (now or at a scheduled time), protecting interactive responsiveness.
- **Execution:** Spark with Adaptive Query Execution (AQE), checkpointing, and automatic retry.
- **Observability:** live queues, resource usage, per‑job timeline, alerts.


## 2) Submit Gateway
**Purpose:** One intake for all runs; validate and normalise into a JobSpec. Enables **Run now** or **Schedule**.

**What users see (UI):**
- A single form (job type, inputs, date range, output path, priority, schedule option).
- Immediate receipt: *"Received. Tracking ID **JP-87213**. Running pre‑flight..."*

**JobSpec (YAML example):**

```python
jobspec_yaml = """
jobType: model_plan                # model_plan | evaluation | feature_build | dataset_extract
submittedBy: "user_id"
team: "Fraud DS"
inputs:
  - uri: "delta://fdr/payments/2023-01..2025-08"
    format: delta
output:
  uri: "delta://sandbox/features_v4"
params:
  planId: "fe8_v4"
  startDate: "2023-11-01"
  endDate: "2025-08-31"
priorityHint: P1                   # P0 | P1 | P2
qosHint: batch                     # interactive | batch | critical
deadline: "2025-09-15T17:00:00Z"   # optional finish-by
schedule:                           # run now or later
  mode: run_now                    # run_now | start_at
  startAt: null
codeRef:
  repo: "git@corp/dse-plans.git"
  commit: "9f2c1b8"
"""
print(jobspec_yaml)
```
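
A minimal sketch of the Submit Gateway's validation step, assuming the YAML above has already been parsed into a plain `dict` (for example with a YAML library). The function `validate_jobspec` and its rule set are hypothetical; the allowed values are copied from the comments in the JobSpec example.

```python
from datetime import date

# Allowed values, copied from the comments in the JobSpec example.
JOB_TYPES = {"model_plan", "evaluation", "feature_build", "dataset_extract"}
PRIORITIES = {"P0", "P1", "P2"}
QOS_CLASSES = {"interactive", "batch", "critical"}
SCHEDULE_MODES = {"run_now", "start_at"}


def validate_jobspec(spec: dict) -> list:
    """Return a list of human-readable errors; an empty list means the spec is accepted.

    Hypothetical rule set for illustration, not the production gateway.
    """
    errors = []
    if spec.get("jobType") not in JOB_TYPES:
        errors.append(f"unknown jobType: {spec.get('jobType')!r}")
    if spec.get("priorityHint", "P2") not in PRIORITIES:
        errors.append(f"unknown priorityHint: {spec.get('priorityHint')!r}")
    if spec.get("qosHint", "batch") not in QOS_CLASSES:
        errors.append(f"unknown qosHint: {spec.get('qosHint')!r}")
    if not spec.get("inputs"):
        errors.append("at least one input is required")
    if not spec.get("output", {}).get("uri"):
        errors.append("output.uri is required")

    # The date range must parse as ISO dates and be ordered.
    params = spec.get("params", {})
    try:
        if date.fromisoformat(params["startDate"]) > date.fromisoformat(params["endDate"]):
            errors.append("startDate is after endDate")
    except (KeyError, TypeError, ValueError):
        errors.append("params.startDate/endDate missing or not ISO dates")

    # A deferred start needs a timestamp.
    schedule = spec.get("schedule", {"mode": "run_now"})
    mode = schedule.get("mode")
    if mode not in SCHEDULE_MODES:
        errors.append(f"unknown schedule.mode: {mode!r}")
    elif mode == "start_at" and not schedule.get("startAt"):
        errors.append("schedule.startAt is required when mode is start_at")
    return errors


# Demo: the JobSpec example above, as a dict.
spec = {
    "jobType": "model_plan", "submittedBy": "user_id", "team": "Fraud DS",
    "inputs": [{"uri": "delta://fdr/payments/2023-01..2025-08", "format": "delta"}],
    "output": {"uri": "delta://sandbox/features_v4"},
    "params": {"planId": "fe8_v4", "startDate": "2023-11-01", "endDate": "2025-08-31"},
    "priorityHint": "P1", "qosHint": "batch",
    "schedule": {"mode": "run_now", "startAt": None},
}
print(validate_jobspec(spec))  # → []
```

Collecting every error rather than raising on the first one lets the UI report all problems in a single receipt.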

## 3) Pre‑Flight Analyzer (LITE)
**Purpose:** Quick **size/ETA bucket** and **sanity checks** — *no deep plan inspection*.

**What it does:**
- Reads a **1–2% sample** to estimate input volume (rows/GB) and partition spread.
- Runs **basic checks** only (no joins/UDF parsing): path exists, schema matches, date range valid, warn if no partition filter.
- Derives **coarse ETA bucket** by job type + input size + historical class medians (e.g., `<1h`, `1–4h`, `>4h`).
- Recommends **pool** (Interactive vs Batch) and a **resource hint** (small/medium/large).
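
The size estimate and ETA bucket can be sketched as below, assuming the sampler reports how many bytes it read and what fraction of the input it sampled. The per-job-type throughput medians, the bucket boundaries, and the small/medium/large thresholds are hypothetical placeholders; in practice they would come from historical runs of each job class.

```python
# Hypothetical historical medians (GB processed per hour) by job type;
# real values would come from past runs of each job class.
MEDIAN_GB_PER_HOUR = {
    "model_plan": 100.0,
    "feature_build": 150.0,
    "evaluation": 200.0,
    "dataset_extract": 400.0,
}


def estimate_input_gb(sampled_bytes: int, sample_fraction: float) -> float:
    """Linearly scale the bytes read from a sample up to the full input (decimal GB)."""
    if not 0 < sample_fraction <= 1:
        raise ValueError("sample_fraction must be in (0, 1]")
    return sampled_bytes / sample_fraction / 1e9


def eta_bucket(job_type: str, input_gb: float) -> str:
    """Map an estimated duration onto the coarse buckets shown to users."""
    hours = input_gb / MEDIAN_GB_PER_HOUR.get(job_type, 100.0)
    if hours < 1:
        return "<1h"
    if hours <= 4:
        return "1–4h"
    return ">4h"


def resource_hint(input_gb: float) -> str:
    """Coarse sizing hint; the thresholds are illustrative only."""
    if input_gb < 10:
        return "small"
    if input_gb < 100:
        return "medium"
    return "large"


# Demo: a 2% sample that read 4.4 GB.
size_gb = estimate_input_gb(4_400_000_000, 0.02)
print(round(size_gb), eta_bucket("model_plan", size_gb), resource_hint(size_gb))
# → 220 1–4h large
```

Linear scaling assumes the sample is representative; heavily skewed partitions can push the real size either way, which is one reason to show users a bucket rather than a precise ETA.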

**What users see (UI):**
- **ETA bucket:** “Medium (≈ 1–4h)”
- **Input size:** “~220 GB across 680 partitions”
- **Checks:** “Schema OK; date range OK; *note: no partition filter detected*”
- **Buttons:** **Run now** (shows queue) • **Auto‑start 19:00**
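
The **Auto‑start 19:00** button has to resolve to a concrete timestamp. A minimal sketch, assuming the cutoff is a fixed local wall-clock time and ignoring daylight-saving edge cases; `next_start_at` is a hypothetical helper name.

```python
from datetime import datetime, time, timedelta


def next_start_at(now: datetime, start: time = time(19, 0)) -> datetime:
    """Next occurrence of the wall-clock time `start` at or after `now`.

    Hypothetical helper; keeps the tzinfo of `now` and ignores DST edge cases.
    """
    candidate = now.replace(hour=start.hour, minute=start.minute, second=0, microsecond=0)
    if candidate < now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


print(next_start_at(datetime(2025, 9, 12, 14, 30)))  # → 2025-09-12 19:00:00
print(next_start_at(datetime(2025, 9, 12, 20, 5)))   # → 2025-09-13 19:00:00
```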

**Pre‑Flight Report (JSON example):**

```python
import json  # needed for json.dumps below

preflight_report = {
  "trackingId": "JP-87213",
  "etaBucket": "1–4h",
  "inputSizeGB": 220,
  "partitions": 680,
  "recommendedPool": "batch",
  "resourceHint": "large",
  "basicChecks": [
    {"check": "path_exists", "status": "ok"},
    {"check": "schema", "status": "ok"},
    {"check": "date_range", "status": "ok"},
    {"check": "partition_filter_missing", "status": "warn"}
  ]
}
# ensure_ascii=False prints "1–4h" as-is instead of a \u escape
print(json.dumps(preflight_report, indent=2, ensure_ascii=False))
```
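
Each basic check can be a small function that yields an `ok`/`warn`/`fail` record, which is the shape of the `basicChecks` list above. A stdlib-only sketch under stated assumptions: `path_exists` is injected as a callable so the real storage lookup stays out of scope, and `partitionFilter` is a hypothetical JobSpec parameter that does not appear in the example above.

```python
from datetime import date
from typing import Callable


def run_basic_checks(spec: dict, actual_columns: set, expected_columns: set,
                     path_exists: Callable[[str], bool]) -> list:
    """Cheap pre-flight checks; returns records shaped like `basicChecks` above."""
    checks = []

    def record(name: str, passed: bool, severity: str = "fail") -> None:
        checks.append({"check": name, "status": "ok" if passed else severity})

    uris = [i["uri"] for i in spec.get("inputs", [])]
    record("path_exists", bool(uris) and all(path_exists(u) for u in uris))
    record("schema", expected_columns <= actual_columns)  # all expected columns present

    params = spec.get("params", {})
    try:
        valid = date.fromisoformat(params["startDate"]) <= date.fromisoformat(params["endDate"])
    except (KeyError, TypeError, ValueError):
        valid = False
    record("date_range", valid)

    # Hypothetical `partitionFilter` param: warn (not fail) if absent, since the
    # job may then scan every partition.
    record("partition_filter_missing", bool(params.get("partitionFilter")), severity="warn")
    return checks


spec = {"inputs": [{"uri": "delta://fdr/payments/2023-01..2025-08"}],
        "params": {"startDate": "2023-11-01", "endDate": "2025-08-31"}}
report = run_basic_checks(spec, actual_columns={"txn_id", "amount", "ts"},
                          expected_columns={"txn_id", "amount"},
                          path_exists=lambda uri: True)  # stub for the storage lookup
print([c["status"] for c in report])  # → ['ok', 'ok', 'ok', 'warn']
```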

## 4) Policy Engine — QoS & Scheduling Rules
**Purpose:** Decide *where* and *when* to run; enforce fairness.

**What it does:**
- **QoS classification (rules):**
  - `model_plan | feature_build` → **Batch**
  - `notebook_cell | small_extract` → **Interactive**
- **Admission control (static):** if `resourceHint == large` or user marks **Batch**, route to Batch.
- **Fairness & anti‑monopoly:**
  - **Executor ceilings per job:** Interactive ≤ 10 executors; Batch ≤ 120 executors.
  - **Per‑user concurrency caps:** at most **1** heavy job per user; per‑team caps are configurable.
  - **DRF fairness** across CPU & memory.
- **Scheduling:**
  - **Interactive floor:** a guaranteed share of cores for Interactive work during office hours (e.g., 20–30% of cores, 09:00–18:00).
  - **Auto‑queue / calendar start:** “Start at **19:00** automatically”.
  - **Reservation windows** for known spikes (e.g., Fri 15:00–18:00 evals).
  - **Deadlines:** earliest‑deadline‑first; reject a job if its deadline cannot be met.
- **Polite preemption (elastic pools):** if interactive p95 latency stays above 5 s for 2 minutes, checkpoint and pause the lowest‑priority Batch job, then resume it later.
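
The preemption trigger is a sustained-threshold rule: it fires only after interactive p95 latency has stayed above 5 s for a full 120 s, so one slow query does not pause Batch work. A minimal sketch, assuming p95 latency is sampled periodically (every 10 s in the demo). The class `SustainedBreach`, the helper `pick_victim`, and its tie-break (most recently started job first) are hypothetical choices, not part of the policy above.

```python
class SustainedBreach:
    """Fires once a metric has stayed above a threshold for at least `hold_s` seconds."""

    def __init__(self, threshold_s: float = 5.0, hold_s: float = 120.0):
        self.threshold_s = threshold_s
        self.hold_s = hold_s
        self.breach_since = None  # timestamp of the first sample in the current breach

    def observe(self, ts: float, p95_latency_s: float) -> bool:
        if p95_latency_s <= self.threshold_s:
            self.breach_since = None  # latency recovered: reset the clock
            return False
        if self.breach_since is None:
            self.breach_since = ts
        return ts - self.breach_since >= self.hold_s


def pick_victim(batch_jobs: list) -> dict:
    """Lowest priority first (P2 before P1 before P0); ties go to the most recently started job."""
    return max(batch_jobs, key=lambda j: (int(j["priority"][1:]), j["started_at"]))


trigger = SustainedBreach()
fired = [trigger.observe(t, 6.0) for t in range(0, 130, 10)]  # p95 = 6 s, sampled every 10 s
print(fired.index(True))  # → 12, i.e. fires at t = 120 s

jobs = [{"id": "JP-1", "priority": "P1", "started_at": 100},
        {"id": "JP-2", "priority": "P2", "started_at": 50},
        {"id": "JP-3", "priority": "P2", "started_at": 80}]
print(pick_victim(jobs)["id"])  # → JP-3
```

Tracking when the breach started, instead of keeping a window of samples, makes the rule independent of the sampling interval.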

**Readable policy (YAML):**

```python
policy_yaml = """
routing:
  - if: "jobType in ['model_plan','feature_build'] or resourceHint == 'large'"
    action: "route_to: batch"
  - if: "jobType in ['notebook_cell','small_extract']"
    action: "route_to: interactive"

fairness:
  executorCeilingPerJob: {interactive: 10, batch: 120}
  concurrencyCaps:
    perUser: {heavyJobs: 1}
    perTeam: {heavyJobs: 4}
  algorithm: DRF

scheduling:
  interactiveFloor: {percent: 30, hours: "09:00-18:00"}
  reservationWindows:
    - {name: "Friday_evals", when: "FRI 15:00-18:00", pool: batch, capacityPercent: 25}
  deadlines: {mode: earliest-deadline-first, rejectIfImpossible: true}

preemption:
  trigger: "interactive.p95Latency > 5s for 120s"
  action: "checkpoint_and_pause_lowest_priority_batch"
"""
print(policy_yaml)
```
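
DRF (Dominant Resource Fairness) equalises each user's *dominant share*: the largest fraction of any single resource (here CPU or memory) that the user holds. A minimal progressive-filling sketch, assuming every user has a fixed per-executor demand and ignoring the ceilings, caps, and floors configured above; `drf_allocate` is a hypothetical helper, not a Spark or YARN API.

```python
def drf_allocate(capacity: dict, demands: dict) -> dict:
    """Dominant Resource Fairness via progressive filling.

    capacity: total per resource, e.g. {"cpu": 9, "mem_gb": 18}
    demands:  per-user demand of ONE task/executor, e.g. {"A": {"cpu": 1, "mem_gb": 4}}
    Returns the number of tasks granted to each user.
    """
    for user, need in demands.items():
        if not any(need.get(r, 0) > 0 for r in capacity):
            raise ValueError(f"user {user} must demand some resource")

    used = {r: 0.0 for r in capacity}
    tasks = {u: 0 for u in demands}
    active = set(demands)  # users whose next task might still fit

    def dominant_share(user: str) -> float:
        # Largest fraction of any single resource this user currently holds.
        return max(tasks[user] * demands[user].get(r, 0) / capacity[r] for r in capacity)

    while active:
        user = min(sorted(active), key=dominant_share)  # ties broken by name
        need = demands[user]
        if all(used[r] + need.get(r, 0) <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need.get(r, 0)
            tasks[user] += 1
        else:
            active.discard(user)  # usage only grows, so this task will never fit
    return tasks


# Classic two-user example: 9 CPUs / 18 GB; A's tasks are memory-heavy, B's CPU-heavy.
print(drf_allocate({"cpu": 9, "mem_gb": 18},
                   {"A": {"cpu": 1, "mem_gb": 4}, "B": {"cpu": 3, "mem_gb": 1}}))
# → {'A': 3, 'B': 2}
```

A's dominant resource is memory (12 of 18 GB) and B's is CPU (6 of 9 cores); both stop at a dominant share of 2/3, so neither can grow further without taking capacity from the other.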

### Example: Decision flow demo (toy)
Given a `JobSpec` and `Pre‑Flight` output, choose lane & start‑time options. This is illustrative — not production logic.


```python
import json  # needed for json.dumps below (the unused datetime import was dropped)

def decide_lane(jobspec, preflight):
    """Toy lane decision following the routing rules in the policy YAML above."""
    # Lane choice: heavy job types or a 'large' resource hint go to Batch
    if jobspec.get('jobType') in ['model_plan', 'feature_build'] or preflight.get('resourceHint') == 'large':
        lane = 'batch'
    else:
        lane = 'interactive'

    # Start options offered to the user
    options = [
        {'mode': 'run_now', 'note': f'queue varies ({lane} lane)'},
        {'mode': 'start_at', 'startAt': '19:00', 'note': 'off-peak auto-start'},
    ]

    # Simple class-level ETA note
    eta_note = 'Typical Batch duration: 2–6h' if lane == 'batch' else 'Typical Interactive duration: < 15m'

    return {
        'lane': lane,
        'etaNote': eta_note,
        'options': options,
    }

# Demo with the examples above (preflight_report is defined in the Pre-Flight cell)
jobspec = {
    'jobType': 'model_plan', 'submittedBy': 'user_id',
}
preflight = preflight_report
print(json.dumps(decide_lane(jobspec, preflight), indent=2, ensure_ascii=False))
```
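
After a lane is chosen, the fairness block of the policy still decides whether the job starts now or waits. A minimal sketch of that admission step, assuming the scheduler can list the heavy jobs currently running with their user and team. The function `admit`, the user names, the return shape, and treating every Batch job as "heavy" are hypothetical; the limits are copied from the policy YAML.

```python
# Mirrors the fairness block of the policy YAML (values copied from it).
FAIRNESS = {
    "executorCeilingPerJob": {"interactive": 10, "batch": 120},
    "concurrencyCaps": {"perUser": {"heavyJobs": 1}, "perTeam": {"heavyJobs": 4}},
}


def admit(lane: str, requested_executors: int, user: str, team: str,
          running_heavy: list, policy: dict = FAIRNESS) -> dict:
    """Clamp executors to the lane ceiling and enforce per-user/per-team heavy-job caps."""
    granted = min(requested_executors, policy["executorCeilingPerJob"][lane])
    if lane != "batch":  # assumption: only Batch jobs count as "heavy"
        return {"decision": "admit", "executors": granted}

    caps = policy["concurrencyCaps"]
    user_running = sum(1 for j in running_heavy if j["user"] == user)
    team_running = sum(1 for j in running_heavy if j["team"] == team)
    if user_running >= caps["perUser"]["heavyJobs"]:
        return {"decision": "queue", "reason": "per-user heavy-job cap reached"}
    if team_running >= caps["perTeam"]["heavyJobs"]:
        return {"decision": "queue", "reason": "per-team heavy-job cap reached"}
    return {"decision": "admit", "executors": granted}


running = [{"user": "alice", "team": "Fraud DS"}]  # hypothetical running heavy job
print(admit("batch", 200, "bob", "Fraud DS", running))
# → {'decision': 'admit', 'executors': 120}
print(admit("batch", 50, "alice", "Fraud DS", running))
# → {'decision': 'queue', 'reason': 'per-user heavy-job cap reached'}
```

Queuing rather than rejecting keeps a capped job in line for the next free slot, which matches the auto‑queue behaviour described for the Policy Engine.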

## 5) Fairness Strategies

### 5.1 Target Solution — Anti‑monopoly Controls (recommended)
- **Executor ceilings per job** (e.g., Interactive ≤ 10, Batch ≤ 120) — caps depth.
- **DRF fairness** (CPU & memory) — balances dominant resources.
- **Per‑user concurrency caps** — caps breadth (max 1 heavy job per user).
- **Polite preemption with checkpoints** — return borrowed capacity to Interactive without wasting Batch progress.
- **Reservation windows & deadlines** — predictable slots and finish‑by guarantees.
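
Earliest‑deadline‑first with up-front rejection can be sketched for a single Batch lane: sort by deadline, run jobs back to back, and reject any job whose projected finish misses its deadline. This assumes serial execution and a per-job duration estimate (for example the upper end of its ETA bucket); real clusters run jobs in parallel, so treat it as a conservative feasibility check. `Job`, `edf_plan`, and the job IDs are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    duration_h: float   # estimated run time, e.g. upper end of the ETA bucket
    deadline_h: float   # hours from now by which the job must finish


def edf_plan(jobs: list, now_h: float = 0.0):
    """Return (schedule, rejected): (id, start, finish) for accepted jobs, ids of rejected ones."""
    schedule, rejected = [], []
    t = now_h
    for job in sorted(jobs, key=lambda j: j.deadline_h):  # earliest deadline first
        finish = t + job.duration_h
        if finish > job.deadline_h:
            rejected.append(job.job_id)  # cannot meet its deadline: tell the user now
            continue
        schedule.append((job.job_id, t, finish))
        t = finish
    return schedule, rejected


jobs = [Job("JP-1", 4, 10), Job("JP-2", 2, 3), Job("JP-3", 6, 8)]
print(edf_plan(jobs))
# → ([('JP-2', 0.0, 2.0), ('JP-3', 2.0, 8.0)], ['JP-1'])
```

The greedy pass rejects whichever job first misses its deadline; it does not search for the subset that maximises the number of accepted jobs.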

**Stakeholder outcome:** responsive notebooks; predictable finishes; no night babysitting.

### 5.2 Static Fairness Fallback — Two Pools (L/H)
- **Interactive (L) Pool:** fixed floor (e.g., 30%) for notebooks, small PySpark, quick reads.
- **Heavy (H) Pool:** for model plans/evals.
- **No elastic borrow/preemption:** simpler, but capacity cannot be lent between pools, so the Interactive pool may sit idle off‑hours while Heavy jobs queue.

**Stakeholder outcome:** simple to reason about; fewer moving parts; acceptable baseline if target solution is deferred.
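
The cost of the static model shows up in simple arithmetic. A sketch assuming a hypothetical 400-core cluster and the 30% Interactive floor: Heavy jobs are capped at 280 cores even when Interactive is nearly idle, whereas an elastic setup could lend the idle cores to Batch. Both helper functions are hypothetical.

```python
def static_pools(total_cores: int, interactive_pct: int) -> dict:
    """Split a cluster into fixed Interactive (L) and Heavy (H) pools."""
    interactive = total_cores * interactive_pct // 100
    return {"interactive": interactive, "heavy": total_cores - interactive}


def heavy_capacity(pools: dict, interactive_in_use: int, elastic: bool) -> int:
    """Cores available to Heavy jobs; only an elastic setup may borrow idle Interactive cores."""
    idle_interactive = max(pools["interactive"] - interactive_in_use, 0)
    return pools["heavy"] + (idle_interactive if elastic else 0)


pools = static_pools(400, 30)
print(pools)                                                         # → {'interactive': 120, 'heavy': 280}
print(heavy_capacity(pools, interactive_in_use=10, elastic=False))   # → 280
print(heavy_capacity(pools, interactive_in_use=10, elastic=True))    # → 390
```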


## 6) Observability (User Ops Pane)
- **Live cluster view:** pool utilisation, queue depth, active jobs by user/team, fair‑share usage.
- **Per‑job timeline:** queued → running → checkpoint → completed, plus class‑level ETA.
- **Alerts:** actionable causes (OOM, permissions, timeouts) and runbook links.
- **Post‑run report:** wall time, shuffle bytes, hotspots, practical hints (e.g., “add date filter”).
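
The per-job timeline implies a small state machine. A sketch assuming the states listed above plus two hypothetical additions: `paused` (for polite preemption) and `failed` (for alerts). The transition table and the `JobTimeline` class are illustrative, not an existing API.

```python
from datetime import datetime, timezone

# Allowed transitions (hypothetical); checkpoint/paused support polite preemption.
TRANSITIONS = {
    "queued": {"running"},
    "running": {"checkpoint", "completed", "failed"},
    "checkpoint": {"running", "paused", "completed"},
    "paused": {"running"},
    "completed": set(),
    "failed": set(),
}


class JobTimeline:
    """Records timestamped state changes and rejects illegal transitions."""

    def __init__(self, tracking_id: str):
        self.tracking_id = tracking_id
        self.events = [("queued", datetime.now(timezone.utc))]

    @property
    def state(self) -> str:
        return self.events[-1][0]

    def advance(self, new_state: str) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.tracking_id}: illegal transition {self.state} -> {new_state}")
        self.events.append((new_state, datetime.now(timezone.utc)))


tl = JobTimeline("JP-87213")
for s in ["running", "checkpoint", "paused", "running", "completed"]:
    tl.advance(s)
print([s for s, _ in tl.events])
# → ['queued', 'running', 'checkpoint', 'paused', 'running', 'completed']
```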

## 7) Plain‑English Summary (for stakeholders)
- **Today’s pain:** jobs take hours→days, users block each other, and progress is opaque.
- **Decision Layer:** a light brain that standardises jobs, sanity‑checks size, and applies fairness.
- **Outcome:** predictable starts/finishes, protected interactive work, and the option to auto‑queue for evenings.
- **If we can’t build the full target:** run the static two‑pool model as a simple, safe baseline.