diagnos — Incident Intake & Operational Triage

name	diagnos
description	Structured incident intake, operational triage, blast-radius analysis, and investigation framing for engineering incidents, regressions, outages, production anomalies, rollout failures, and debugging requests. Use this skill before debugging begins — whenever a report is still ambiguous, severity is unclear, ownership is uncertain, or the organization needs to determine what is actually broken, who is affected, and what investigation path should start first. Trigger on phrases like "investigate this", "users are reporting", "something is broken", "production issue", "incident", "regression", "outage", "why is this happening", "triage this", "help classify this", or anytime the incoming report is operationally noisy, emotionally escalated, or technically incomplete.
license	MIT

diagnos — Incident Intake & Operational Triage

The purpose of this skill is not to debug.

It transforms ambiguous operational noise into a structured engineering investigation — before debugging begins.

Most incidents fail before debugging even starts: severity inflation, symptom confusion, chasing secondary effects, debugging before defining the failure, assigning the wrong owner, treating customer pain as implementation pain, conflating infrastructure symptoms with application defects.

diagnos establishes operational clarity so that investigation can begin from a known position.

Where This Fits

raw incident
    ↓
diagnos   ← this skill
    ↓
disciplina
    ↓
monumentum

This skill produces a structured handoff. It does not debug, fix, or postmortem.

Install

Using the skills CLI (recommended):

npx skills add naiplawan/diagnos

Or install manually:

# User-level (applies across all projects)
cp -r diagnos ~/.claude/skills/diagnos

# Or project-level
cp -r diagnos /path/to/project/.claude/skills/diagnos

Claude Code will pick up the skill on next session start.

Invoking

Explicit invocation:

/diagnos users can't check out, something is broken

Or describe an incident and Claude will apply the skill when the report is ambiguous, operationally noisy, or severity is unclear.

Core Principle

Do not debug a system until you can clearly describe what is failing, who is affected, how severe it is, what changed, what systems are involved, and what is still unknown.

Severity is operational, not emotional. It is determined by customer impact and blast radius, recovery ability and data integrity risk, revenue or workflow impact, and rollout blockage — NOT by log volume, emotional escalation, executive visibility, or engineering frustration.

Unknowns must remain unknown. Do not invent root causes, impact scope, mitigation status, or ownership certainty. Prefer:

"Impact scope still under verification."

over:

"Likely isolated."

if isolation has not been confirmed.

Responsibilities

This skill owns:

incident intake and symptom extraction
operational triage and blast-radius analysis
severity framing and ownership narrowing
timeline reconstruction and regression classification
dependency identification and investigation framing
missing-information detection

This skill does NOT:

implement fixes or perform deep debugging
produce postmortems or write leadership updates
route workflows across skills (that is ordinator)

Viability Check — Run Before Any Phase

Before extracting or structuring anything, check whether the report contains enough observable information to triage.

Minimum viable report requires at least two of:

a described symptom or behavior
an affected system, workflow, or user group
a time reference (even approximate)

If the report falls below this threshold, push back before proceeding:

"There is not yet enough operational evidence to classify this confidently. Can you share what was observed, which users or systems are affected, and when it started?"

Do not invent structure from noise. Proceed to Phase 1 only when minimum viability is met.

Workflow — Seven Phases

Phase 1 — Normalize the Report

Extract: exact symptoms, affected workflows, affected users/systems, timestamps, environments, release versions, regions, services, reproduction information, alerts/logs, observed failures, and operational changes.

Separate every piece of information into one of four categories — never merge them:

Category	Meaning
Fact	directly observed
Assumption	inferred but unverified
Hypothesis	possible explanation
Unknown	not yet established

Phase 2 — Define the Failure Boundary

What still works is often more valuable than what fails.

Is the failure global or partial?
Does rollback restore behavior?
Is the issue environment-specific?
Is this isolated to one workflow?
Does retry succeed?
Are reads affected or only writes?
Is degradation continuous or intermittent?

The failure boundary narrows the search space before any hypothesis is formed.

Phase 3 — Blast Radius Analysis

Level	Example
User-localized	One account broken
Workflow-specific	Checkout flow degraded
Service-wide	Entire API failing
Dependency-wide	Shared auth system unstable
Platform-wide	Infrastructure outage

Then identify downstream systems, upstream dependencies, release coordination implications, and customer-visible effects.

Phase 4 — Change Correlation

Identify recent changes: deployments, migrations, config changes, feature flags, infra scaling, dependency upgrades, certificate rotations, rollout sequencing, and operational policy changes.

Do NOT assume the most recent change caused the issue. Only establish:

"The issue began after X."

not:

"X caused the issue."

unless proven.

Phase 5 — Severity Classification

Severity	Definition
S0	catastrophic platform-wide outage
S1	major customer-impacting degradation
S2	significant workflow impairment
S3	localized regression or operational issue
S4	low-impact bug or investigation

Severity must be justified with blast radius, user impact, operational impact, recovery constraints, or data integrity implications. Avoid dramatic language.

Phase 6 — Ownership Narrowing

Identify likely ownership domains: frontend, backend, platform, infra, mobile, auth, data, networking, CI/CD, release engineering, observability.

Ownership narrowing is probabilistic. State confidence explicitly:

"Most likely within the API/auth boundary based on token refresh failures."

Phase 7 — Investigation Framing

Produce the next investigation steps with enough specificity to hand off cleanly into disciplina.

Good: "Focus next investigation on session refresh handling during reconnect events after token expiry." Bad: "Debug auth."

Output Structure

## Incident Summary
## Current Symptoms
## Affected Scope
## Severity Assessment
## Known Facts
## Unknowns
## Recent Changes
## Likely Ownership
## Initial Hypotheses
## Recommended Investigation Path
## Immediate Mitigations

Investigation Framing Rules

Good framing names a specific system, behavior, or time range. It narrows — it does not gesture.

Bad: "Investigate backend instability." Good: "Investigate whether the new deployment altered request timeout behavior between the API gateway and worker pool."

Bad: "The database is overloaded." Good: "Connection acquisition latency increased sharply after deployment; determine whether pool exhaustion or lock contention explains the queue growth."

Missing Information Detection

Aggressively identify what is absent: no timestamps, no reproduction path, no affected environment, no deployment history, no metrics or logs, unclear impact scope, unknown rollback state.

State gaps explicitly:

"The current report lacks enough information to determine whether this is isolated or systemic."

Refusal Conditions

Push back when:

severity is clearly inflated without observable evidence
the user demands certainty that has not been established
the report is purely hypothetical with no observed behavior
emotional escalation is being used as a substitute for operational detail

Do not catastrophize, do not minimize. Match the language to what has actually been confirmed.

Good: "Users intermittently lose saved progress after session refresh." Bad: "Critical customer data loss issue." — unless data was truly lost.

Success Criteria

Criterion	How to verify
Failure boundary is defined	Output distinguishes what works from what fails — not just what fails
Severity is calibrated	Severity level (S0–S4) is stated and justified with at least one of: blast radius, user impact, recovery constraint, data integrity risk
Ownership is narrowed	Output names at least one specific domain or system boundary, not just "the backend"
Uncertainty is explicit	Every unconfirmed claim is labeled Assumption, Hypothesis, or Unknown — none stated as fact
Investigation path is specific	Recommended Investigation Path names a system, behavior, or time range — not a generic verb like "investigate"
No severity inflation	Output contains no dramatic language unsupported by stated facts

The output should make the incident feel smaller, clearer, and more structured — not more dramatic.

License

MIT — see LICENSE.

Credit

Packaged by @rachaphol.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

diagnos — Incident Intake & Operational Triage

Where This Fits

Install

Invoking

Core Principle

Responsibilities

Viability Check — Run Before Any Phase

Workflow — Seven Phases

Phase 1 — Normalize the Report

Phase 2 — Define the Failure Boundary

Phase 3 — Blast Radius Analysis

Phase 4 — Change Correlation

Phase 5 — Severity Classification

Phase 6 — Ownership Narrowing

Phase 7 — Investigation Framing

Output Structure

Investigation Framing Rules

Missing Information Detection

Refusal Conditions

Success Criteria

License

Credit

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

diagnos — Incident Intake & Operational Triage

Where This Fits

Install

Invoking

Core Principle

Responsibilities

Viability Check — Run Before Any Phase

Workflow — Seven Phases

Phase 1 — Normalize the Report

Phase 2 — Define the Failure Boundary

Phase 3 — Blast Radius Analysis

Phase 4 — Change Correlation

Phase 5 — Severity Classification

Phase 6 — Ownership Narrowing

Phase 7 — Investigation Framing

Output Structure

Investigation Framing Rules

Missing Information Detection

Refusal Conditions

Success Criteria

License

Credit

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages