Architecture
Isi Roca edited this page Jun 6, 2026 · 2 revisions
PUMA is built as a six-layer modular system designed for reproducibility, swappability between models and strategies, and clear separation of concerns between the inference path, the metrics path, and the storage path.
- Preflight: hardware capability detection (CPU cores, RAM, GPU make and VRAM), automatic selection of one of fifteen hardware profiles (five baselines, `cpu-lite` through `gpu-high`, plus ten Apple-Silicon variants), and pre-flight validation that the chosen model fits the chosen profile.
- Runtime: the Ollama HTTP client with retry, timeout, and structured logging; a response cache keyed on (model, prompt, options) so repeated evaluations of the same instance never touch the GPU twice.
- Datasets: readers for the Jira Social Repository balanced 200-issue triage set, the TAWOS multi-project story-point estimation set, and the prioritization pairwise dataset. Each reader is deterministic and seeded.
- Scenarios: the abstract `Scenario` class plus three concrete implementations (`TriageJiraScenario`, `EstimationTawosScenario`, and `PrioritizationJiraScenario`). A scenario owns its dataset, parser, and ground-truth label.
- Adaptation: the prompting strategies `zero_shot`, `few_shot_3`, `few_shot_5`, `few_shot_8`, `chain_of_thought`, `rcoif` (Role/Context/Objective/Instructions/Format), and `contextual_anchoring`. New strategies plug in via a registry.
- Metrics + Sustainability: seven metric families (Accuracy, Calibration, Efficiency, Stability, Robustness, Fairness, Sustainability) computed from the predictions table, plus a CodeCarbon emissions wrapper that records energy and CO₂ for every run.
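The Runtime layer's response cache can be illustrated with a minimal sketch. It assumes the cache key is a SHA-256 digest of the canonical JSON form of (model, prompt, options); the names `cache_key` and `ResponseCache` are hypothetical and are not taken from the PUMA source, which may key and store responses differently.

```python
import hashlib
import json


def cache_key(model, prompt, options):
    """Return a stable hex digest for one (model, prompt, options) request."""
    # sort_keys makes the digest independent of dict insertion order,
    # so {"seed": 42, "temperature": 0} and the reversed dict collide on purpose.
    payload = json.dumps(
        {"model": model, "prompt": prompt, "options": options},
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory stand-in for the runtime cache (hypothetical, for illustration)."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    def get_or_compute(self, model, prompt, options, compute):
        """Return the cached response, calling `compute` only on a miss."""
        key = cache_key(model, prompt, options)
        if key not in self._store:
            self.misses += 1
            self._store[key] = compute(model, prompt, options)
        return self._store[key]
```

In this sketch `compute` would be the function that actually calls the model, so a second evaluation of the same instance with the same options returns from the dictionary without invoking it.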

Reproducibility is enforced at several points:

- Default `--seed 42` and `--temperature 0`. Running the same `puma run` invocation twice on the same hardware produces byte-identical predictions.
- Deterministic Ollama invocations (`options.seed` and `options.temperature` are set on every request).
- Bi-temporal SQLite storage: every row records both the wall-clock time and the logical run version, so historical comparisons stay consistent even when the dataset is updated.
- The full run specification (scenario, model, strategy, instances, seed, hardware profile, PUMA version) is stored alongside the metrics so any result can be regenerated bit-for-bit.
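As an illustration of what setting the seed and temperature on every request might look like, the sketch below builds the JSON body for Ollama's generate endpoint with both values pinned under `options`. The helper name `build_generate_request` and the model name in the usage line are assumptions for the example; the HTTP call itself, with retry and timeout, is left out.

```python
import json


def build_generate_request(model, prompt, seed=42, temperature=0.0):
    """Build an Ollama generate request body with sampling pinned."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one complete response instead of chunks
        "options": {
            "seed": seed,                # fixed seed for the sampler
            "temperature": temperature,  # 0 makes decoding greedy
        },
    }


# Hypothetical usage: the model name is only an example.
body = build_generate_request("llama3", "Classify this issue: ...")
serialized = json.dumps(body, sort_keys=True)
```

Because the body is built from explicit arguments only, two calls with the same inputs serialize to the same string, which is also what makes a request usable as a cache key.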
Preflight ─► Runtime ─► Scenario ─► Adaptation ─► Metrics
(profile)    (Ollama)   (dataset +  (prompt       (7 families
                         parser)     template)     + CodeCarbon)
                                                     │
                                                     ▼
                                                  Storage
                                                  (SQLite)
                                                     │
                                                     ▼
                                                 Dashboard
                                                (Streamlit)
Each layer is exercised end to end by the integration test suite, and each boundary is documented with explicit data contracts in the source.
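The bi-temporal idea behind the SQLite storage layer can be sketched as follows. The table and column names (`predictions`, `run_version`, `recorded_at`) and the helpers `record` and `as_of` are hypothetical, chosen only to show one row carrying both a wall-clock timestamp and a logical run version; the actual PUMA schema may differ.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per (instance, logical run version).
SCHEMA = """
CREATE TABLE predictions (
    instance_id TEXT    NOT NULL,
    run_version INTEGER NOT NULL,  -- logical time: which run produced the row
    recorded_at TEXT    NOT NULL,  -- wall-clock time, ISO 8601 in UTC
    prediction  TEXT    NOT NULL,
    PRIMARY KEY (instance_id, run_version)
);
"""


def record(conn, instance_id, run_version, prediction):
    """Insert one prediction, stamping it with the current wall-clock time."""
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?, ?)",
        (instance_id, run_version, now, prediction),
    )


def as_of(conn, run_version):
    """Return {instance_id: prediction} as seen at the given run version."""
    rows = conn.execute(
        """
        SELECT p.instance_id, p.prediction
        FROM predictions AS p
        WHERE p.run_version = (
            SELECT MAX(q.run_version)
            FROM predictions AS q
            WHERE q.instance_id = p.instance_id
              AND q.run_version <= ?
        )
        ORDER BY p.instance_id
        """,
        (run_version,),
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
record(conn, "JIRA-1", 1, "bug")
record(conn, "JIRA-2", 1, "task")
record(conn, "JIRA-1", 2, "feature")  # a later run revises JIRA-1
```

Querying `as_of(conn, 1)` still returns the first run's view after the second run has been written, which is the property that keeps historical comparisons consistent when data changes.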