Ask your AWS bill questions. In English.
A natural-language agent for cloud cost attribution, backed by a Graph Neural Network
that infers resource ownership from CloudTrail, IAM, and cost behaviour.
▶ Live demo · 87% on real AWS · The audit story · GitHub
- **13 / 15**: real-AWS attribution accuracy (87%)
- **+53%**: lift over best baseline (k-fold CV)
- **2.6M**: VMs in the audited Microsoft dataset
- **3 clouds**: AWS · Azure · GCP collectors
| Cloud | Live scan | Methodology validated | Install |
|---|---|---|---|
| AWS | ✅ production-tested (13/15 = 87% on real AWS) | ✅ | pip install costdna |
| Azure | ⚠ implemented per Azure SDK patterns, untested against live subscription | ✅ via Microsoft Public Dataset audit (2.6M VMs) | pip install 'costdna[azure]' |
| GCP | ⚠ implemented per Google Cloud SDK patterns, untested against live project | — | pip install 'costdna[gcp]' |
The model + features + agent are cloud-agnostic — only the collector layer is provider-specific. AWS calls `cloudtrail:LookupEvents`; Azure calls `monitor.activity_logs.list`; GCP calls `cloud_logging.list_entries`. All three return identical-shape DataFrames downstream, so the rest of the pipeline doesn't know which cloud the data came from.
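The "identical shape downstream" contract can be sketched as a tiny normalizer: each provider's collector maps its raw rows onto one shared schema before anything downstream sees them. This is an illustrative stand-in, using plain dicts instead of the actual DataFrames; the field names and mapping are assumptions, not CostDNA's real schema:

```python
from datetime import datetime, timezone

# Hypothetical shared event schema -- every provider collector normalizes
# its raw rows into this shape, so feature extraction never branches on cloud.
EVENT_FIELDS = ("resource_id", "principal", "event_name", "timestamp")

def normalize(raw: list[dict], field_map: dict[str, str]) -> list[dict]:
    """Rename provider-specific fields onto the shared schema, sorted by time."""
    rows = [{shared: r[provider] for shared, provider in field_map.items()}
            for r in raw]
    return sorted(rows, key=lambda r: r["timestamp"])

# A CloudTrail-style row; the field_map is what an AWS collector might pass.
aws_row = {"ResourceId": "i-0c4f3230", "UserName": "mlops-sagemaker-training",
           "EventName": "RunInstances",
           "EventTime": datetime(2024, 5, 7, 14, 18, tzinfo=timezone.utc)}
events = normalize([aws_row], {"resource_id": "ResourceId",
                               "principal": "UserName",
                               "event_name": "EventName",
                               "timestamp": "EventTime"})
print(tuple(events[0]))  # ('resource_id', 'principal', 'event_name', 'timestamp')
```

An Azure or GCP collector would pass a different `field_map` but return the same shape, which is the whole point of the design.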
```shell
costdna scan --cloud aws --aws-profile prod            # production-tested path
costdna scan --cloud azure --region <subscription_id>  # az login first; untested live
costdna scan --cloud gcp --region <project_id>         # gcloud auth ADC; untested live
```

The Azure / GCP collectors live at `src/costdna/collectors/azure_live.py` and `src/costdna/collectors/gcp.py`. They follow each cloud's official SDK patterns with documented type signatures and required IAM scopes — but I haven't validated either against a live account. Anyone with an Azure subscription or a GCP project can flip the ⚠ to ✅ in an afternoon — the code is in place, it just needs a real run.
$ costdna ask "why did our bill spike Tuesday?" --from-dir runs/today
? why did our bill spike Tuesday?
╭─── CostDNA ────────────────────────────────────────────────────────────╮
│ Resource `i-0c4f3230` (predicted team: ml, conf 0.92) had a $7.30 │
│ cost spike at Tue 16:00 UTC. Team ml's deploy at Tue 14:18 (commit │
│ a4f2c91, repo ml-training-pipeline) is the most likely cause │
│ (Granger p=0.000). Two other ml-team RDS instances spiked at the │
│ same time, suggesting the deploy fanned out across the cluster. │
╰────────────────────────────────────────────────────────────────────────╯

The agent has 10 tools available — the LLM (GPT-4o, function-calling; LLM backend is pluggable) chains them to answer questions like:

- "Which 5 resources are racking up the most spend?" → `top_spenders`
- "What does `i-9f8e7d` belong to?" → `attribute_resource`
- "Find the largest cost spikes and what caused them" → `find_cost_spikes`
- "Which resources don't fit any team?" → `find_anomalies`
- "Show me everything that hasn't been used in days" → `find_idle`
- "What was active a month ago but went silent?" → `find_abandoned`
- "Compare the ml team and the data team" → `compare_teams`
- "What did `prod-rds-985438` do recently?" → `signal_history`
- "Find anything with 'warehouse' in the name" → `search_resources`
- "Just summarize the account" → `summarize_account`
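Mechanically, function calling reduces to a name-to-callable dispatch table: the model picks a tool and JSON-encodes its arguments, and the agent loop runs it. A minimal sketch — the tool name matches the list above, but the implementation is a stand-in, not CostDNA's actual code:

```python
import json

# Stand-in tool: the real top_spenders would read scan outputs from disk.
def top_spenders(n: int = 5) -> list[dict]:
    data = [{"resource_id": "i-0c4f3230", "cost_usd": 7.30},
            {"resource_id": "prod-rds-985438", "cost_usd": 4.10}]
    return data[:n]

TOOLS = {"top_spenders": top_spenders}  # ...plus the other nine tools

def dispatch(tool_call: dict):
    """Run the tool the model requested, with its JSON-encoded arguments."""
    return TOOLS[tool_call["name"]](**json.loads(tool_call["arguments"]))

print(dispatch({"name": "top_spenders", "arguments": '{"n": 1}'}))
# [{'resource_id': 'i-0c4f3230', 'cost_usd': 7.3}]
```

The LLM's role is only to choose the name and fill the arguments; everything it reports back to the user comes from the tool's return value, which keeps the numbers grounded in the scan data.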
Three ways to use the agent:
```shell
# 1. One-shot question (CLI)
costdna ask "why did our bill spike Tuesday?" --from-dir runs/today

# 2. Multi-turn chat (CLI REPL)
costdna chat --from-dir runs/today

# 3. Web chat UI (Streamlit)
costdna serve   # then open the "💬 Chat with the agent" tab
```

Setup: `pip install 'costdna[agent]'` + `export OPENAI_API_KEY=...`.
cost-dna.vercel.app — full landing page with the live agent, methodology, charts, and audit narrative.
GraphSAGE embedding space on the synthetic env: 4 teams form clean clusters; the tan "unowned" cluster (vendor / legacy / orphan / shadow resources) sits visibly apart and is caught automatically by the anomaly detector.
Tag-based cost attribution fails on 40–60% of real AWS resources. CostDNA infers ownership from behavioral fingerprints (IAM access, VPC traffic, deploy timing, cost time-series shape) using a Graph Neural Network, and writes the inferred tags back to AWS so existing FinOps tooling works on previously-unattributable spend.
Methodological finding (the most defensible thing in the repo): I tested CostDNA on two production-scale public cloud datasets (Microsoft's 2.6M-VM Azure trace and Microsoft Philly's 117K-DL-job trace) and audited my own results. Both first-cut high-accuracy numbers were tautologies — deployment_id is 100% deterministic of subscription_id on Azure; user_id is 85% deterministic of vc on Philly. With those leaks removed, behavioral attribution alone is modest. The audit pattern itself is the contribution: production cloud attribution is mostly a metadata-lookup problem, and behavioral fingerprinting matters specifically when metadata is missing or unreliable — exactly the gap CostDNA's synthetic env reproduces (where GraphSAGE hits 95%+ while feature-only baselines fail catastrophically).
$ costdna scan --aws-profile prod
┏━━━━━━━━━━━━━━━━━━━━━━ CostDNA — Executive summary ━━━━━━━━━━━━━━━━━━━━━━┓
┃ You have $9,570.32 in untagged spend across 60 resources. ┃
┃ ┃
┃ ✓ Ready to tag: 58 resources, $9,186.31 (96%) at ≥70% confidence ┃
┃ ⚠ Need review: 2 resources, $384.01 (4%) below 70% confidence ┃
┃ ┃
┃ Recommended actions: ┃
┃ • Tag 17 resources as ml → moves $4,412.54 out of 'untagged'. ┃
┃ • Tag 14 resources as data → moves $2,142.65 out of 'untagged'. ┃
┃ • Tag 16 resources as backend → moves $1,829.61 out of 'untagged'. ┃
┃ • Tag 12 resources as platform → moves $801.51 out of 'untagged'. ┃
┃ • Review 2 low-confidence resources before tagging — needs human eye. ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
$ costdna apply --predictions runs/today/predictions.csv --apply
58 tags written. Drop the 2 low-confidence ones into Slack for review.

┌─ costdna doctor ─ pre-flight your AWS account
├─ costdna discover ─ find candidate teams from IAM patterns
├─ costdna scan ─ predict ownership + dollars + anomalies
├─ costdna learn ─ confirm low-confidence guesses (active learning)
├─ costdna apply ─ write tags back to AWS
└─ costdna diff ─ weekly drift check (cron)
flowchart LR
subgraph AWS["Your AWS account (read-only)"]
CT[CloudTrail<br/>events]
IAM[IAM roles<br/>+ users]
VPC[VPC flow logs]
CE[Cost Explorer<br/>aggregates]
META[Resource<br/>metadata]
end
subgraph collect["Collectors (boto3, hardened)"]
SCAN[costdna scan]
end
subgraph features["Feature extraction"]
BEHAV[Behavioral features<br/>peak_hour, weekend_ratio,<br/>cost_slope, unique_users…]
SEMANTIC[LLM-derived semantic features<br/>sentence-transformers MiniLM<br/>over IAM role names + IDs]
GRAPH[Graph: VPC + IAM + flow edges]
end
subgraph model["GraphSAGE classifier"]
GNN[4-layer residual GraphSAGE<br/>+ supervised contrastive head]
PRED[predictions.csv<br/>resource_id → team + confidence]
end
subgraph agent["LLM agent (10 callable tools)"]
A[summarize_account<br/>top_spenders<br/>find_cost_spikes<br/>find_anomalies<br/>attribute_resource<br/>...]
end
subgraph downstream["Downstream"]
TAGS[AWS tags<br/>via costdna apply]
DASH[Existing FinOps dashboard<br/>CloudHealth / Vantage / etc.]
CHAT[Natural-language chat<br/>cost-dna.vercel.app]
end
CT --> SCAN
IAM --> SCAN
VPC --> SCAN
CE --> SCAN
META --> SCAN
SCAN --> BEHAV
SCAN --> SEMANTIC
SCAN --> GRAPH
BEHAV --> GNN
SEMANTIC --> GNN
GRAPH --> GNN
GNN --> PRED
PRED --> A
PRED --> TAGS
TAGS --> DASH
A --> CHAT
GraphSAGE learns a 2D-projected representation where same-team resources cluster together and unowned resources sit visibly separate.
Synthetic (4 teams + unowned mess): clean per-team clusters; the tan "unowned" cluster (vendor / legacy / orphan / shadow) sits visibly apart from the team clusters. The anomaly detector catches them automatically.
Real Azure (10 subscriptions × 200 VMs): clusters are looser because the per-VM features (summary CPU stats) are weaker than the synthetic case. Same-color points still group, but with overlap.
Every team leaves the same fingerprint on every resource it owns:
| Feature | What it captures |
|---|---|
| `event_count`, `unique_users`, `unique_roles` | Activity volume + team breadth |
| `peak_hour`, `weekend_ratio` | When work happens (afternoon = backend, off-hours = data, late-night = ml) |
| `cross_account` | Shared services that span accounts |
| `cost_slope`, `cost_variance`, `cost_autocorr` | Cost shape: spiky training vs. flat services vs. periodic batch |
These become node features in a graph where edges come from VPC flows, shared IAM roles, and shared VPCs. A two-layer GraphSAGE classifier learns from a small labeled seed and propagates ownership.
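Two of the fingerprint features above can be sketched in a few lines from raw event timestamps. This is stdlib-only and illustrative — not the repo's `features.py`:

```python
from collections import Counter
from datetime import datetime, timezone

def behavioral_features(timestamps: list[datetime]) -> dict:
    """Compute peak_hour and weekend_ratio from a resource's event times."""
    hours = Counter(t.hour for t in timestamps)
    weekend = sum(1 for t in timestamps if t.weekday() >= 5)  # Sat/Sun
    return {
        "event_count": len(timestamps),
        "peak_hour": hours.most_common(1)[0][0],
        "weekend_ratio": weekend / len(timestamps),
    }

# Late-night, weekend-heavy activity -- the "ml team" shape described above.
ts = [datetime(2024, 5, 4, 2, tzinfo=timezone.utc),   # Saturday 02:00
      datetime(2024, 5, 5, 3, tzinfo=timezone.utc),   # Sunday 03:00
      datetime(2024, 5, 6, 2, tzinfo=timezone.utc)]   # Monday 02:00
print(behavioral_features(ts))  # peak_hour=2, weekend_ratio≈0.67
```

A vector like this per resource, concatenated with the cost-shape and semantic features, is what feeds the GraphSAGE node embedding.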
We tested CostDNA on two production-scale public datasets and audited each one for label leakage. The same pattern emerged both times: structural metadata dominates real-world cloud attribution.
| Dataset | Resources | Teams | First-cut accuracy | Audited "shortcut" | Honest behavioral accuracy |
|---|---|---|---|---|---|
| Microsoft Azure | 2.6M VMs / 100 subs | 100 | LabelProp 97% | `deployment_id` → subscription (100% deterministic) | GraphSAGE 6.9% (7× random) |
| Microsoft Philly | 117K DL jobs / 15 VCs | 15 | LabelProp 89% | user → vc (85% deterministic) | GraphSAGE 14% (2× random) |
The methodological finding: in real cloud data, the dominant attribution signal is almost always structural metadata — deployment IDs, IAM principals, machine assignments — not behavioral time-series. CostDNA's first-cut numbers on Azure (97%) and Philly (89%) looked great until we audited and discovered the labels were essentially encoded in the graph already.
This negative-result-as-positive-finding is the most defensible thing in the project: production cost attribution is mostly a metadata-lookup problem; behavioral fingerprinting matters specifically when metadata is missing or unreliable, which is exactly what the synthetic env's hard-case kinds reproduce.
117K real DL training jobs at Microsoft Research's Philly cluster, attributed to 15 virtual clusters (research teams). 99.8% of machines are shared across multiple VCs, so machine co-location isn't a tautology.
But 85% of users belong to exactly one VC. So user_id is a near-tautological signal of team membership. This is the kind of finding that looks like a result if you don't audit and a methodology critique if you do:
| Edges enabled | LabelProp | GraphSAGE |
|---|---|---|
| All (machine + user) | 89.5% | 71.5% |
| Without user edges | 19.9% | 15.1% |
| Without machine edges | 89.9% | 71.9% |
| No graph at all | 10.0% | 13.1% |
The user-IAM edge is doing essentially all the work. In a production system this is exactly the realistic case: most cloud users belong to one team, and "who called this API" is the strongest team signal available. The methodology validates: graph-aware attribution exploits this signal effectively.
But for a fair test of behavioral attribution independent of IAM-style metadata, the second row is the one that matters: GraphSAGE drops to 15.1% once user edges are removed, and to 13.1% with no graph at all.
We validated the pipeline (collectors, scaling, schema mapping) on Microsoft's published Azure trace. Reading this section in full matters — there's an audit story buried in it.
First-cut result (misleading): running with all features and graph edges, LabelProp scored 97% across 5–100 teams. That looked great. So we audited it.
The audit: in Azure, every deployment belongs to exactly one subscription. Verified across all 33,205 deployments in the 2.6M-VM dataset — 100% map 1:1 to subscriptions. The deployment_id graph edge is a perfect lookup of subscription_id. LabelProp's "97%" was a graph database join, not learning. We caught it; we're documenting it; nothing in the README claims that result anymore.
The honest result, deployment_id edges removed so the model has to attribute from behavior alone:
| N teams | GraphSAGE | LogReg | k-NN | LabelProp | Random |
|---|---|---|---|---|---|
| 5 | 34.6% ± 1.6% | 31.3% ± 0.8% | 28.6% ± 3.2% | 20.0% ± 2.0% | 20.0% |
| 10 | 22.4% ± 1.6% | 18.3% ± 0.3% | 17.3% ± 0.1% | 10.0% ± 1.9% | 10.0% |
| 25 | 10.6% ± 0.0% | 9.2% ± 0.8% | 10.0% ± 0.3% | 4.0% ± 0.2% | 4.0% |
| 100 | 6.9% ± 0.5% | 3.4% ± 0.1% | 3.8% ± 0.2% | 1.0% ± 0.0% | 1.0% |
GraphSAGE consistently wins, but the absolute numbers are modest — 7× random at 100 classes, not 90×. Why so low? The Azure trace only ships summary CPU stats (max/avg/p95) per VM, not the hourly time-series (the time-series files total 140GB). With those summary stats alone, behavioral fingerprinting just doesn't have enough to work with. With true hourly traces (or full CloudTrail-like event logs), the GNN's lift would be much larger — that's what the synthetic results below demonstrate, where we control the feature richness.
What this Azure run actually validates:
- The pipeline works at production scale — load, sample, build graphs, and train across 20,000 real VMs.
- GraphSAGE consistently outperforms feature-only baselines even on this thin data — not by a huge margin, but consistently across 5–100 classes.
- Where deterministic structural metadata exists, use it directly — don't reach for ML. Caught this honestly during audit.
The strong test of the methodology is on the synthetic AWS environment below, where we deliberately construct hard cases (shared services, cross-team resources, reassigned ownership) that break the structural-lookup shortcut and where the per-resource feature density matches what real CloudTrail provides.
Provisioned the labeled environment in a real AWS account (terraform/), ran the per-team simulators (simulation/) for 3 days on a 24/7 t3.micro EC2 (see terraform/simulator.tf) to generate authentic CloudTrail signal, then ran costdna scan against the live account. Repro: scripts/real-aws-test.sh → wait → scripts/real-aws-finish.sh. Total incremental spend: $0 (covered by AWS Free Tier + $100 credit).
| Metric | Value |
|---|---|
| Resources discovered | 25 |
| Labeled (synthetic env Terraform-provisioned) | 15 |
| Per-resource accuracy vs ground truth | 13 / 15 = 87% |
| High-confidence (≥ 0.79) accuracy | 13 / 13 = 100% |
| 5-fold CV accuracy | 80% ± 27% (2.4× random; +47% lift over best baseline) |
| CloudTrail events processed | 13,402 |
| Anomalies surfaced (low-conf flagged for review) | 5 — both wrong predictions are in this set |
The honest test of CostDNA's value prop: does it attribute spend correctly when the data is real but we know the answer? Yes — every high-confidence prediction (13 of 13) was correct. The 2 wrong predictions came back with confidence below 0.7 and were correctly surfaced by find_anomalies for human review. That's exactly the active-learning workflow the system is designed for.
The wide ±27% on k-fold reflects the small label set (15 labels split into 5 folds = 3 samples per fold, so each fold's accuracy is one of 0/3, 1/3, 2/3, 3/3). Methodology validates with tighter error bars on the synthetic env (95.7% ± 2.5% across 5 seeds, see below) where label count and feature density are controllable.
This run also exposed a real engineering finding: the original 4-layer / hidden_dim=16 GraphSAGE config tuned for the synthetic env's 50+ labeled nodes overfits hard on small real-AWS label sets — train accuracy 100% / test 0% by epoch 20. Auto-shrinking to 2 layers / hidden=8 / dropout=0.4 + early stopping + stratified split + class-weighted loss took the same data from 53% → 87% accuracy. See commits 93c0dee through ffec566 for the architecture changes.
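The auto-shrink behavior described above amounts to picking a smaller hypothesis class when the labeled set is tiny. A hypothetical sketch of that heuristic — the threshold and exact config values here are illustrative, not the logic in those commits:

```python
# Assumed heuristic: below some label count, a 4-layer/hidden=16 GraphSAGE
# memorizes the training split (100% train / 0% test), so shrink capacity
# and regularize harder. The threshold of 30 is an illustrative guess.
def model_config(n_labels: int) -> dict:
    if n_labels < 30:  # tiny real-AWS label sets
        return {"layers": 2, "hidden": 8, "dropout": 0.4,
                "early_stopping": True, "class_weighted_loss": True}
    return {"layers": 4, "hidden": 16, "dropout": 0.2,
            "early_stopping": False, "class_weighted_loss": False}

print(model_config(15))  # the 15-label real-AWS run lands in the small regime
```

The general point stands regardless of exact thresholds: model capacity should scale with label count, not be fixed by whatever the synthetic env tolerated.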
Reproducibility note:
`real-aws-finish.sh` runs `terraform destroy` on completion. The scan outputs (predictions.csv, executive summary panel, metadata, explanations) are committed under `docs/real-aws-evidence/` for verification — the labeled test account is now torn down.
$ costdna benchmark --synthetic --seeds 5
Model comparison — accuracy ± 1σ across 5 seeds
╭───────────┬──────────────┬──────────┬───────────┬──────────┬──────────┬──────────╮
│ Model │ Overall │ clean │ cross_t. │ reassg. │ sh.svc. │ sparse │
├───────────┼──────────────┼──────────┼───────────┼──────────┼──────────┼──────────┤
│ Majority │ 26.3% ±6.7% │ 23% ±9% │ 20%±40% │ 60%±49% │ 60%±49% │ 20%±40% │
│ LogReg │ 89.5% ±4.7% │ 99% ±3% │ 0% ±0% │ 60%±49% │ 60%±49% │ 100% ±0% │
│ k-NN(k=5) │ 76.8% ±4.2% │ 87% ±6% │ 0% ±0% │ 60%±49% │ 20%±40% │ 80%±40% │
│ LabelProp │ 96.8% ±2.6% │100% ±0% │ 40%±49% │ 100% ±0% │ 100% ±0% │ 100% ±0% │
│ GraphSAGE │ 94.7% ±4.7% │ 97% ±3% │ 40%±49% │ 100% ±0% │ 100% ±0% │ 100% ±0% │
╰───────────┴──────────────┴──────────┴───────────┴──────────┴──────────┴──────────╯
LogReg looks fine at 90% overall — but 0% on cross-team across all 5 seeds and 60% ±49% on shared-services. The graph-aware methods solve those.
24 resources, 15 labels, 5-fold CV: 13K real CloudTrail events captured but k-fold accuracy stays at random (~25-40% with high variance) due to insufficient labels. The collectors work end-to-end against real AWS — this is a validation of the engineering, not the model.
$ costdna learn --budget 14 --strategy least_confidence
Labels Test acc Overall Curve
4 72.2% 75.0% ██████████████████████░░░░░░░░
6 88.9% 90.0% ███████████████████████████░░░
10 94.4% 96.7% ████████████████████████████░░
12 100.0% 100.0% ██████████████████████████████
Real environments have some tags + tribal knowledge. The active-learning loop surfaces the lowest-confidence resources to a human ("which team owns i-0a1b2c…?"), retrains, and converges fast. This is the realistic bootstrap path.
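The `least_confidence` strategy shown above is simple to state: rank unlabeled resources by the model's top-class probability and ask the human about the least certain ones first. A stdlib sketch with illustrative data:

```python
def least_confidence(predictions: dict[str, float], budget: int) -> list[str]:
    """predictions maps resource_id -> max class probability.
    Returns the `budget` resources the model is least sure about."""
    return sorted(predictions, key=predictions.get)[:budget]

preds = {"i-0a1b2c": 0.54, "prod-rds-985438": 0.92, "ml-etl-007": 0.61}
print(least_confidence(preds, budget=2))  # ['i-0a1b2c', 'ml-etl-007']
```

Each answered question becomes a new label, the model retrains, and the next batch is drawn from the updated confidences — which is why the curve above converges in ~12 labels rather than needing all of them.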
$ costdna scan --show-kind
Top anomalies (don't fit any team)
data-ec2-cross_team-002 data conf=0.54 3.5σ from data centroid
ml-rds-reassigned-000 ml conf=1.00 3.0σ from ml centroid
backend-ec2-cross_team-001 backend conf=1.00 1.8σ from backend centroid
The model surfaces the resources that don't match any team well — exactly the synthetic hard cases (cross_team, reassigned), automatically discovered without being told they're hard. In production these are the resources you want a human to look at: vendor infra, leaked-credential workloads, new teams forming.
When a deploy precedes a cost spike with statistical significance (Granger causality, p < 0.05):
Resource `mlops-rds-002` had a $9.43 cost spike at Wed 01:00. Team ml's deploy at Tue 23:28 (commit `ae5a13c`, repo `ml-svc`) is the most likely cause (p=0.000).
Lets you tell a CFO not just "the bill went up" but "this commit made it go up."
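Before any statistics run, the explainer has to pair each spike with deploys that could plausibly have caused it. This sketch shows only that temporal-precedence filter — the actual significance call is the Granger test (p < 0.05) described above, and the 4-hour window here is an assumption:

```python
from datetime import datetime, timedelta, timezone

def candidate_deploys(spike_at: datetime, deploys: list[dict],
                      window_hours: int = 4) -> list[dict]:
    """Keep deploys that strictly precede the spike within the window.
    Survivors would then be scored with a Granger-causality test."""
    earliest = spike_at - timedelta(hours=window_hours)
    return [d for d in deploys if earliest <= d["at"] < spike_at]

spike = datetime(2024, 5, 8, 1, 0, tzinfo=timezone.utc)     # Wed 01:00
deploys = [
    {"commit": "ae5a13c", "at": datetime(2024, 5, 7, 23, 28, tzinfo=timezone.utc)},
    {"commit": "b1c2d3e", "at": datetime(2024, 5, 6, 9, 0, tzinfo=timezone.utc)},
]
print([d["commit"] for d in candidate_deploys(spike, deploys)])  # ['ae5a13c']
```

Precedence filtering keeps the Granger test honest: a deploy that happened after the spike, or days before it, never gets blamed no matter how correlated the series look.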
$ costdna calibrate
Confidence calibration — ECE = 0.001 (0 = perfectly calibrated)
When the model says 0.7, it's right 70% of the time. That makes the confidence column actionable — the active-learning loop and the apply threshold both rely on it being honest.
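ECE itself is a short computation: bin predictions by confidence and take the size-weighted average gap between each bin's accuracy and its mean confidence. A stdlib sketch with toy data (not the repo's `calibrate.py`):

```python
def ece(confidences: list[float], correct: list[bool], n_bins: int = 10) -> float:
    """Expected Calibration Error: 0 means confidence == empirical accuracy."""
    bins: list[list[tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    total, err = len(confidences), 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            err += len(b) / total * abs(accuracy - avg_conf)
    return err

# Toy example: high-confidence predictions right, low-confidence one wrong.
print(round(ece([0.9, 0.9, 0.1], [True, True, False]), 3))  # 0.1
```

A model that says 0.9 but is only right half the time accumulates a large bin gap; a calibrated one drives every bin's gap toward zero, which is what ECE = 0.001 is claiming.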
| Tool | Attribution mechanism | Scope (typical AWS account) | Untagged-resource handling |
|---|---|---|---|
| AWS Cost Allocation Tags | Reads existing tags | Tagged resources only — 40-60% of spend on most accounts | Nothing. Resources without tags are aggregated under "untagged". |
| AWS Cost Categories | Rules you write manually (regex on resource name / arn) | Whatever your rules cover | Manual: you write a rule per-pattern, per-team. Doesn't infer. |
| Kubecost | k8s pod / namespace metadata | Containerized workloads only — Lambda, RDS, S3, plain EC2 invisible | Out of scope. |
| CloudHealth / Vantage / Apptio | Tags + manual allocation rules | Tagged resources + rule-matched | Tag-based blind spot inherited; rules require maintenance. |
| CostDNA | Behavioral fingerprints (CloudTrail + IAM + VPC flow + cost shape) → GraphSAGE GNN | All AWS resources that emit CloudTrail | Inferred with calibrated confidence (ECE = 0.001). Writes tags back so downstream tools see them. |
Quantitative comparison on the synthetic 4-team / 68-resource env (5-seed mean accuracy on hard cases):
| Method | Clean | Cross-team | Reassigned | Shared-services | Sparse |
|---|---|---|---|---|---|
| Tag-based (CloudHealth, Vantage, etc.) ¹ | 100% | 0% | 0% | 0% | 0% |
| LogReg (feature-only) | 99% | 0% | 60% | 60% | 100% |
| LabelProp (graph-aware) | 100% | 40% | 100% | 100% | 100% |
| GraphSAGE (CostDNA) | 97% | 40% | 100% | 100% | 100% |
¹ Tag-based tools only attribute pre-tagged resources. The synthetic env has no team tags by design — that's the regime CostDNA is built for. On a tag-complete account, every tag-based tool is already 100% by definition; the question is what fraction of resources actually have tags. CostDNA's contribution is the inferred-tags layer for the resources that don't.
Positioning: CostDNA isn't a dashboard — it's the missing input layer that makes every other FinOps tool work on previously-unattributable resources. Run costdna apply, then your existing dashboard suddenly explains 90% of spend instead of 50%.
```shell
pip install -e .
costdna scan --synthetic --show-kind       # full pipeline
costdna benchmark --synthetic --seeds 5    # multi-seed evidence
costdna benchmark --synthetic --kfold 5    # stratified k-fold CV
costdna ablate --synthetic                 # feature & edge ablation
costdna calibrate --synthetic              # reliability diagram
costdna learn --synthetic --compare-all    # active learning curves
costdna discover                           # auto-find teams from IAM
```

```shell
costdna doctor --aws-profile prod                              # preflight first
costdna scan --aws-profile prod --save-dir runs/$(date +%F)
costdna apply --predictions runs/$(date +%F)/predictions.csv   # dry-run
costdna apply --predictions runs/$(date +%F)/predictions.csv --apply
```

Full walkthrough: see DEPLOYMENT.md. Considering running it on your team's account? Forward docs/evaluation.md — a one-pager covering the IAM policy needed, what it does, what it can't do, and what the failure modes look like.
```shell
cd terraform && terraform init && terraform apply
# run simulation/* on cron for 3-5 days, then:
costdna scan --aws-profile dev --save-dir runs/first
```

For FinOps engineers who'd rather click than type. Single-page app: upload a saved predictions.csv (or run a synthetic scan in-browser), filter by team / type / confidence, generate `aws ec2 create-tags` commands.
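Generating those commands from a predictions file is a small transformation: keep rows above the confidence threshold, emit one CLI call per resource. A sketch — the column names and 0.7 threshold mirror this README but are assumptions about the CSV layout:

```python
import csv
import io

def tag_commands(csv_text: str, threshold: float = 0.7) -> list[str]:
    """Turn high-confidence prediction rows into aws ec2 create-tags calls."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        f"aws ec2 create-tags --resources {r['resource_id']} "
        f"--tags Key=team,Value={r['team']}"
        for r in rows
        if float(r["confidence"]) >= threshold
    ]

sample = "resource_id,team,confidence\ni-0c4f3230,ml,0.92\ni-9f8e7d,data,0.54\n"
for cmd in tag_commands(sample):
    print(cmd)  # only the 0.92-confidence row survives the threshold
```

The below-threshold rows don't vanish; they are exactly the ones routed to the human-review / active-learning queue.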
```shell
pip install 'costdna[ui]'
costdna serve   # http://localhost:8501
```

`costdna watch` runs a fresh scan, diffs against the previous run, and posts a digest (drifted resources, new anomalies, lost-confidence flags) to Slack/Discord. Designed for cron:
```shell
# Daily at 6am UTC
0 6 * * * /usr/local/bin/costdna watch --aws-profile prod \
    --slack-webhook $SLACK_WEBHOOK_URL
```

Each run saves to a date-stamped subdirectory under `--state-dir` (default `runs/watch/`). The digest format works with both Slack and Discord webhooks.
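The core of the drift check is a comparison of two runs' resource-to-team maps. A stdlib sketch (illustrative, not the repo's `drift.py`):

```python
def diff_runs(prev: dict[str, str], curr: dict[str, str]) -> dict[str, tuple]:
    """Resources present in both runs whose predicted team changed."""
    return {r: (prev[r], curr[r])
            for r in prev.keys() & curr.keys()
            if prev[r] != curr[r]}

prev = {"i-0c4f3230": "ml", "prod-rds-985438": "data"}
curr = {"i-0c4f3230": "ml", "prod-rds-985438": "backend"}
print(diff_runs(prev, curr))  # {'prod-rds-985438': ('data', 'backend')}
```

Resources that appear only in the newer run would be reported separately as new anomalies rather than drift, since there is no previous attribution to compare against.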
The fastest way to try CostDNA — pulls a prebuilt image with all dependencies and the embedding model baked in:

```shell
# 30-second synthetic demo
docker run --rm pauti04/costdna scan --synthetic --epochs 50

# Live AWS scan (mount your AWS credentials)
docker run --rm -v ~/.aws:/root/.aws pauti04/costdna scan --aws-profile prod
```

The image is multi-arch (linux/amd64 and linux/arm64), built from this repo via GitHub Actions on every release tag. To build locally:

```shell
docker build -t costdna .
docker run --rm costdna scan --synthetic
```

src/costdna/
collectors/aws.py hardened boto3 collectors (retries, fallbacks, throttling)
collectors/synthetic.py realistic synthetic data with 4 hard-case kinds
features.py 9-feature behavioral extraction
graph.py NetworkX (VPC + IAM + VPC-CIDR edges) → PyG conversion
model.py GraphSAGE + supervised contrastive head
train.py training loop with stratified split
baselines.py Majority / LogReg / k-NN / LabelProp baselines
benchmark.py multi-seed + k-fold harness with mean ± std
ablate.py feature & edge ablation
calibrate.py ECE + reliability diagram
anomaly.py centroid-distance anomaly detection on GNN embeddings
active.py active-learning loop (random / least_confidence / margin)
explain.py Granger-causality spike explainer
summary.py executive summary builder ($ untagged → newly attributed)
tagger.py AWS tag write-back (dry-run + live)
drift.py diff two scans, surface resources with changed teams
doctor.py preflight checks for live AWS scans
discover.py team auto-discovery from IAM role naming patterns
output.py Rich-formatted tables, panels, sparklines
cli.py 14 subcommands wired to the above
terraform/ 4-team labeled AWS environment
simulation/ per-team workload generators
tests/ pipeline + baseline-failure invariants
DEPLOYMENT.md step-by-step runbook for real AWS
Four teams (backend, data, ml, platform) × four resource types × five resource "kinds":
| Kind | What it models | Why it's hard |
|---|---|---|
| `clean` | Single-team usage | Easy — any model gets these |
| `shared_service` | Backend's RDS/S3, hammered by data + ml (~65% cross-team callers) | Behavioral features point the wrong direction |
| `cross_team` | Used roughly equally by two teams (~70% noise) | Same |
| `reassigned` | Team A owned it for 7 days; team B took over | Time-window features blend two teams |
| `sparse` | Cold-storage S3, infrequent Lambdas | Few events → unstable fingerprint |
IAM roles use realistic patterns (`apicore-execution-role`, `etl-runner-role`, `mlops-sagemaker-training`, `devops-eks-node`) — the team is implied by the naming idiom, never spelled out. The model has to infer the team from behavior, not read it off the role name.
MIT



