A Claude Code skill that audits the operational health of a project in production.
Most vibe-coded projects ship fast, and every time I onboard onto a new project I find the same operational gaps. devopsiphai cuts through them by asking five questions:
- Can I onboard easily? Can a new developer clone the repo, start the project, and contribute without asking anyone for anything?
- Can I deploy safely? Can I ship a new version to staging and production without being afraid of breaking either environment?
- Do I know what is running where? After a deploy, can I tell exactly what version is live, when it was deployed, and what migration state the database is in?
- Can I see what is happening? Do I have metrics, error tracking, and analytics to know whether the deploy went well and whether users are behaving as expected?
- Can I recover if something goes wrong? If errors are reported, can I roll back the application, the database, and the frontend independently — and do I have a runbook for each?
These are the questions that matter when you are running a live product with a small team. The ARC score answers all five, expressed as a grade. Scoring gamifies the process of getting to an operationally sound project.
ARC — Automation, Reporting, Control — is a framework for operational excellence I built with previous CTOs. It is more digestible for small teams than existing frameworks. Every audit is scored against three pillars:
| Pillar | Answers | Covers |
|---|---|---|
| Automation | Can I deploy safely? · Do I know what is running where? | CI/CD, artifact promotion, version tracking, migration automation, security scanning |
| Reporting | Can I see what is happening? | Observability, metrics, error tracking, alerting, dashboards, product analytics |
| Control | Can I onboard easily? · Can I recover? | Onboarding, secrets management, backup/rollback, IaC reproducibility, git auditability |
A — Present and enforced (automated, no manual bypass possible)
B — Present, partially enforced
C — Present, not enforced (exists but can be bypassed)
D — Partially present, not enforced
F — Absent
One grade per pillar. One overall ARC grade.
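For intuition, per-domain grades could roll up into a pillar grade via the `arc_weights` in config.yaml. Here is a hedged sketch of one way to do that; the point mapping and thresholds are my assumption, not the skill's actual formula:

```python
# Hypothetical roll-up of per-domain letter grades into a pillar grade.
# The point scale and cutoffs below are assumptions for illustration only.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}
POINT_GRADES = [(3.5, "A"), (2.5, "B"), (1.5, "C"), (0.5, "D"), (0.0, "F")]

def pillar_grade(check_grades: dict, weights: dict) -> str:
    """Weighted average of letter grades, mapped back to a letter."""
    score = sum(GRADE_POINTS[check_grades[name]] * w for name, w in weights.items())
    return next(grade for threshold, grade in POINT_GRADES if score >= threshold)

# Example: automation pillar using the default weights from config.yaml
automation_weights = {"cicd": 0.40, "testing": 0.30,
                      "containers": 0.15, "code_quality": 0.15}
grades = {"cicd": "D", "testing": "F", "containers": "C", "code_quality": "B"}
print(pillar_grade(grades, automation_weights))  # → D
```

Weighting keeps the grade honest: a strong `code_quality` score cannot mask a failing CI/CD pipeline, because CI/CD carries 40% of the automation pillar.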
A full audit run produces two files in /tmp/devopsiphai/<timestamp>/:
The full audit report covering:
- Phase 1 — Factual map of the project: repo structure, stack, architecture, auth, database, secrets, observability, hosting, onboarding, git workflow, testing, AI usage, code quality, questionable decisions
- Phase 2 — Deep domain audits: CI/CD, containers, IaC, security, observability, onboarding
- Phase 3 — ARC score with the five questions answered, per-pillar grades, findings by severity, and priority actions
- Phase 4 — Workflow checks: runtime lockdown, developer environment, testing layers, task runner, local-first stack
- Phase 5 — Delivery checks: artifact identity, pipeline correctness, migration automation, rollback strategy, infrastructure codification
A flat, ordered, actionable checklist derived entirely from the audit findings:
# TODO — my-project
Generated by devopsiphai | 2026-03-17 | ARC: D/D/F overall: D
## 🔴 Immediate
- [ ] [XS] Rotate AWS IAM key AKIA... — go to AWS Console → IAM → Security credentials
- [ ] [XS] Run `git rm --cached backend/.aws/.env.dev` and push
## 🟠 Workflow Foundation
- [ ] [XS] Add `.nvmrc` to frontend/ with content `20` (matches node:20-slim in Dockerfile)
- [ ] [XS] Add `.python-version` to backend/ with content `3.11`
- [ ] [S] Add `scripts/db-dump.sh` — dumps staging Postgres using pg_dump to `./dumps/<timestamp>.sql`
- [ ] [S] Add `scripts/db-load.sh` — loads a given .sql file into local Supabase via pg_restore
- [ ] [S] Add `scripts/env-generate.sh` — reads ports and keys from running containers, writes backend/.env and frontend/.env
- [ ] [XS] Add `.env.example` to backend/ — list all 56 vars, mark Clerk/Stripe/OpenAI as "# obtain from tech lead"
## 🟡 Pipeline & Delivery
- [ ] [S] Tag Docker images with `${{ github.sha }}` in deploy-staging.yml
- [ ] [S] Add `versions.json` tracking artifact version + migration checkpoint per environment
- [ ] [M] Update deploy-prod.yml to promote staging image instead of rebuilding
- [ ] [XS] Read image tag from versions.json staging entry
- [ ] [XS] Pull that image from ECR
- [ ] [XS] Deploy to production App Runner
- [ ] [XS] Update versions.json production entry on success
...

Every item names an actual file, script, or config. M and L tasks are broken into XS subtasks. Nothing says "improve" or "consider."
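For illustration, the `versions.json` mentioned in the pipeline items might take a shape like this (field names are hypothetical; the audit output prescribes the actual schema):

```json
{
  "staging": {
    "image": "myapp-backend:3f2a1c9",
    "migration_checkpoint": "20260315_add_invoices",
    "deployed_at": "2026-03-16T14:02:00Z"
  },
  "production": {
    "image": "myapp-backend:8be40d1",
    "migration_checkpoint": "20260301_init",
    "deployed_at": "2026-03-02T09:41:00Z"
  }
}
```

A file like this is what makes promotion possible: the prod deploy reads the staging entry, pulls that exact image, and records what it shipped.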
# Clone the repo
git clone https://github.com/sanhajio/devopsiphai ~/.claude/skills/
# Or copy the skill to Claude Code's skills directory
cp -r devopsiphai/skills/devopsiphai ~/.claude/skills/

Restart Claude Code. The skill is available automatically.
- Claude Code with subagent support
- Projects accessible from your local filesystem or a mounted path
Point Claude Code at your project and trigger the skill naturally:
audit the devops side of this project
devopsiphai my project at ~/code/myapp
give me an ARC score for this repo
run just the preliminary audit on this project
generate a TODO for this project
| Intent | What runs |
|---|---|
| Full audit | Phases 1 → 2 → 3 → 4 → 5 → 6 |
| Exploration only | Phase 1 (preliminary audit) |
| Specific section | "just check secrets", "check my CI", etc. |
| ARC score only | Phase 3 (requires prior audit output) |
| TODO only | Phase 6 (requires Phases 3–5 output) |
skills/devopsiphai/config.yaml:
# Output directory for timestamped reports
output:
directory: /tmp/devopsiphai
# Sections to skip in the preliminary audit (1.1–1.17)
skip_sections: []
# - "1.14" # AI Usage
# - "1.16" # Questionable Architecture Decisions
# Domains to skip in the domain audit
skip_domains: []
# - iac
# - containers
# ARC pillar weights (must sum to 1.0 per pillar)
arc_weights:
automation:
cicd: 0.40
testing: 0.30
containers: 0.15
code_quality: 0.15
reporting:
observability: 0.50
logging_tracing: 0.30
user_audit_trail: 0.20
control:
security: 0.30
iac: 0.25
backup_rollback: 0.20
git_auditability: 0.25

Phase 1 is facts only. No suggestions, no judgement, no icons. The audit maps what exists before it evaluates anything. This prevents the common failure mode of AI audits: confidently critiquing things that don't exist or misreading what does.
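Since the weights must sum to 1.0 per pillar, a standalone sanity check over the defaults shown above might look like this (a sketch, not part of the skill itself):

```python
# Verify that each pillar's arc_weights sum to 1.0 before running an audit.
# The dict mirrors the default config.yaml values quoted above.
import math

arc_weights = {
    "automation": {"cicd": 0.40, "testing": 0.30, "containers": 0.15, "code_quality": 0.15},
    "reporting": {"observability": 0.50, "logging_tracing": 0.30, "user_audit_trail": 0.20},
    "control": {"security": 0.30, "iac": 0.25, "backup_rollback": 0.20, "git_auditability": 0.25},
}

for pillar, weights in arc_weights.items():
    total = sum(weights.values())
    # isclose avoids false failures from float rounding (e.g. 0.1 + 0.2)
    assert math.isclose(total, 1.0), f"{pillar} weights sum to {total}, expected 1.0"
print("all pillars sum to 1.0")
```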
Every check derives from what was found. Phase 4 and 5 checks are generated from Phase 1 findings. If the project uses Python, it checks for .python-version. If it uses MongoDB, it checks for mongodump. Nothing is assumed.
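That derivation step can be pictured as a simple fact-to-check mapping. The table below is illustrative, not the skill's actual rules:

```python
# Illustrative sketch of "checks derive from findings": only emit workflow
# checks for technologies the preliminary audit actually detected.
# The mapping below is an assumption for demonstration purposes.
CHECKS_BY_FACT = {
    "python": [".python-version exists", "dependency lockfile committed"],
    "node": [".nvmrc exists", "package-lock.json or pnpm-lock.yaml committed"],
    "postgres": ["pg_dump script exists", "migration tool configured"],
    "mongodb": ["mongodump script exists"],
}

def derive_checks(detected: set) -> list:
    """Return checks for detected stack facts; unknown facts add nothing."""
    return [check for fact in sorted(detected)
            for check in CHECKS_BY_FACT.get(fact, [])]

print(derive_checks({"python", "postgres"}))
```

A project without MongoDB never sees a `mongodump` check, so every line in the report stays relevant to the codebase at hand.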
The TODO is derived, not invented. Phase 6 reads FAIL/PARTIAL checks from Phases 4–5 and CRITICAL/HIGH/MEDIUM findings from Phase 3. Every item in the TODO names an actual file path, tool, or config key from the audit. No generic recommendations.
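A minimal sketch of that derivation, assuming hypothetical finding fields (`severity`, `size`, `action`, `path`) and an assumed severity-to-section mapping:

```python
# Hypothetical Phase 6 derivation: turn audit findings into checklist lines,
# preserving the concrete file path from the audit. Field names and the
# severity -> section mapping are assumptions, not the skill's actual schema.
SECTIONS = {
    "CRITICAL": "🔴 Immediate",
    "HIGH": "🟠 Workflow Foundation",
    "MEDIUM": "🟡 Pipeline & Delivery",
}

def todo_line(finding: dict) -> str:
    """One actionable checklist line: size tag, action, and a real path."""
    return f"- [ ] [{finding['size']}] {finding['action']} — {finding['path']}"

finding = {"severity": "HIGH", "size": "XS",
           "action": "Add `.nvmrc`", "path": "frontend/.nvmrc"}
print(f"## {SECTIONS[finding['severity']]}")
print(todo_line(finding))
```

Because every line carries a path taken from the audit, a reviewer can verify each TODO item against the repo instead of trusting a generic recommendation.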
Parallel subagents per section. Each of the 17 Phase 1 sections runs as an independent subagent. On a real codebase this makes Phase 1 fast enough to be useful in practice.
Local-first. The skill flags any service that could run locally but is hitting external SaaS, and suggests self-hostable alternatives. Anything that can run in Docker should run locally for development.
devopsiphai/
├── SKILL.md ← router, ARC definition, phase sequencing
├── preliminary-audit.md ← Phase 1: 17 factual sections
├── domain-audit.md ← Phase 2: routes to reference files
├── arc-scoring.md ← Phase 3: ARC grades + findings
├── workflow.md ← Phase 4: developer workflow checks
├── delivery.md ← Phase 5: pipeline + rollback checks
├── todogen.md ← Phase 6: TODO generation
├── post-run.md ← report dump + feedback prompt
├── config.yaml ← weights, skip lists, output directory
├── analysis/
│ ├── onboarding.md ← onboarding domain audit
│ ├── observability.md ← observability domain audit
│ └── security.md ← security domain audit
└── references/
├── cicd.md ← CI/CD domain audit
├── containers.md ← containers domain audit
└── iac.md ← IaC domain audit
If you use this, run it on a project, have opinions about the ARC framework, or just want to share what you found — please reach out. Feedback, disagreements, and improvements are all welcome and would genuinely make my day.
The skill improves through use. After each audit run, Claude Code will ask if you want to log feedback. That feedback is saved to /tmp/devopsiphai/<timestamp>/feedback.md.
To contribute improvements:
- Run the skill on a real project
- Note what was wrong, missing, or unhelpful
- Open a PR with the change to the relevant phase file
The most valuable contributions are: checks that should exist but don't, output formats that are hard to read, and TODO items that weren't granular enough to act on. You can find me on:
MIT