feat(skills): SRE skill-library expansion — incident playbooks + runtime/AI engines (11 skills)#94
Merged
…, SLO burn, connectivity, triage router

Adds five read-only diagnostic skills that compose existing tool groups (no new tools, no code change; go:embed picks up builtin/*.md):

- deploy-regression: align Argo sync time vs. error/latency onset, name the rollback-target revision.
- capacity-scheduling: classify Pending pods as capacity vs. constraint vs. downstream-block from the verbatim FailedScheduling reason.
- slo-burn: multi-window multi-burn-rate budget analysis, fast-burn vs. slow-burn page decision.
- network-connectivity: walk the request path (DNS → Service/endpoints → NetworkPolicy → Ingress/mesh) and stop at the first broken layer.
- triage-orchestrator: breadth-first first-responder router that localises the blast radius and hands off to one specialist skill.

Signed-off-by: rlaope <piyrw9754@gmail.com>
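The slo-burn page decision described in the commit above can be illustrated with a minimal Go sketch. It is not the skill's actual logic (the skills are markdown playbooks): the thresholds 14.4 and 6 and the window pairs 1h/5m and 6h/30m are the commonly cited multi-window defaults for a 30-day SLO and are assumptions here, as are the names `Windows`, `BurnRate`, and `Decide`.

```go
package main

import "fmt"

// Burn-rate thresholds for the two paging tiers. These are commonly cited
// defaults for a 30-day SLO window and are an assumption, not PR content.
const (
	fastBurn = 14.4 // evaluated over 1h, confirmed over 5m
	slowBurn = 6.0  // evaluated over 6h, confirmed over 30m
)

// Windows holds the observed error ratio (errors / requests) per lookback.
type Windows struct {
	H1, M5, H6, M30 float64
}

// BurnRate is the observed error ratio divided by the allowed error ratio.
// A value of 1 means the budget is being spent exactly on schedule.
func BurnRate(errorRatio, sloTarget float64) float64 {
	budget := 1 - sloTarget
	if budget <= 0 {
		return 0 // a 100% target leaves no budget; report no signal
	}
	return errorRatio / budget
}

// Decide requires both the long and the short window of a tier to exceed
// that tier's threshold, so an already-recovered spike does not page.
func Decide(sloTarget float64, w Windows) string {
	rate := func(r float64) float64 { return BurnRate(r, sloTarget) }
	switch {
	case rate(w.H1) >= fastBurn && rate(w.M5) >= fastBurn:
		return "fast-burn"
	case rate(w.H6) >= slowBurn && rate(w.M30) >= slowBurn:
		return "slow-burn"
	default:
		return "no-page"
	}
}

func main() {
	// 99.9% SLO with a 2% error ratio over 1h and 3% over 5m.
	w := Windows{H1: 0.02, M5: 0.03, H6: 0.01, M30: 0.02}
	fmt.Println(Decide(0.999, w))
}
```

Requiring the short window to agree with the long one is what keeps a burst that has already stopped from paging.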
…ruby, dotnet, native, ai)

Fills the engine-level monitoring gaps where the perf/prom tools already exist but no skill drove them. Each playbook embeds the runtime's compile/optimisation model so the diagnosis names the cause class and the lever, not just the symptom. Read-only; perf.* sampling is RiskHigh and gated on operator approval.

- node-runtime: V8 event-loop lag vs. GC (scavenge/mark-sweep) vs. TurboFan deopt; CPU profile via perf.v8_inspector_*.
- go-runtime: goroutine leaks, GC pacing (GOGC/GOMEMLIMIT), GOMAXPROCS oversubscription; pprof via perf.go_pprof_cpu.
- ruby-runtime: GVL contention, generational GC, YJIT, malloc bloat; stacks via perf.rbspy_dump.
- dotnet-runtime: gen2/LOH GC, ThreadPool starvation, Server-vs-Workstation GC, tiered JIT warmup (prom EventCounters).
- native-perf: C/C++/Rust CPU hotspots (codegen/cache/branch/contention cause classes); perf.linux_perf_record call graph.
- ai-inference: LLM serving TTFT vs. ITL decomposition, KV-cache/batch saturation, GPU compute-vs-memory bound across vLLM/Triton/TGI/TorchServe.

Signed-off-by: rlaope <piyrw9754@gmail.com>
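The TTFT vs. ITL decomposition named for ai-inference can be sketched from per-token arrival times. This is an illustration only: `Decompose` and `demoTrace` are hypothetical names, and the playbook itself would read these figures from serving metrics rather than compute them in Go.

```go
package main

import (
	"fmt"
	"time"
)

// Decompose splits one streamed response into time-to-first-token (TTFT)
// and mean inter-token latency (ITL). TTFT typically reflects queueing plus
// prefill, while ITL reflects the decode loop, so the two point at
// different cause classes. ok is false when no token was produced.
func Decompose(start time.Time, tokens []time.Time) (ttft, meanITL time.Duration, ok bool) {
	if len(tokens) == 0 {
		return 0, 0, false
	}
	ttft = tokens[0].Sub(start)
	if n := len(tokens) - 1; n > 0 {
		// Mean gap between consecutive tokens after the first one.
		meanITL = tokens[n].Sub(tokens[0]) / time.Duration(n)
	}
	return ttft, meanITL, true
}

// demoTrace returns a made-up request: first token after 400ms, then one
// token every 50ms. The numbers are illustrative, not measurements.
func demoTrace() (time.Time, []time.Time) {
	start := time.Unix(0, 0)
	var tokens []time.Time
	for i := 0; i < 4; i++ {
		offset := 400*time.Millisecond + time.Duration(i)*50*time.Millisecond
		tokens = append(tokens, start.Add(offset))
	}
	return start, tokens
}

func main() {
	ttft, itl, ok := Decompose(demoTrace())
	fmt.Println(ttft, itl, ok)
}
```

A high TTFT with a normal ITL suggests queueing or prefill pressure; a normal TTFT with a high ITL points at the decode path.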
…ng, Node heap default, vLLM KV-cache metric, .NET threadpool injection)

Signed-off-by: rlaope <piyrw9754@gmail.com>
This was referenced May 28, 2026
Summary
Expands the built-in skill library from 13 → 24, all read-only markdown playbooks composing existing tool groups (no new tools, no Go change: `//go:embed builtin/*.md` auto-loads them; `TestLoadBuiltin` asserts `>= 7`).

Incident / ops playbooks (5)

- capacity-scheduling: … `FailedScheduling` reason; reasons on requests, not utilisation.

Runtime / compile-engine + AI playbooks (6)

Fill the engine-level gaps where perf/prom tools existed but no skill drove them. Each embeds the runtime's compile/optimisation model so the output names a cause class + lever. `perf.*` sampling is RiskHigh (operator-approved).

- node-runtime: … `perf.v8_inspector_*`.
- go-runtime: … `perf.go_pprof_cpu`.
- ruby-runtime: … `perf.rbspy_dump`.
- native-perf: … `perf.linux_perf_record`.

Combined with existing jvm-gc / jvm-thread / py-perf, runtime coverage now spans Go, JVM, .NET, V8/Node, Python, Ruby, native, and AI serving.
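Why a new markdown file needs no Go change can be shown with a small sketch of glob-based loading. It assumes the loader walks `builtin/*.md` through an `fs.FS`; the real loader is not shown in this PR, so `LoadBuiltin` and `demoFS` are hypothetical, and an in-memory `fstest.MapFS` stands in for the embedded filesystem so the example runs without files on disk.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
	"testing/fstest"
)

// In a real package the filesystem would come from an embed directive,
// for example:
//
//	//go:embed builtin/*.md
//	var builtinFS embed.FS
//
// It is left as a comment here because embedding needs files at build time.

// LoadBuiltin reads every builtin/*.md file and keys its content by file
// stem, so a new playbook is picked up without touching this function.
func LoadBuiltin(fsys fs.FS) (map[string]string, error) {
	paths, err := fs.Glob(fsys, "builtin/*.md")
	if err != nil {
		return nil, err
	}
	skills := make(map[string]string, len(paths))
	for _, p := range paths {
		data, err := fs.ReadFile(fsys, p)
		if err != nil {
			return nil, fmt.Errorf("read %s: %w", p, err)
		}
		stem := strings.TrimSuffix(path.Base(p), ".md")
		skills[stem] = string(data)
	}
	return skills, nil
}

// demoFS is an in-memory stand-in with placeholder file contents.
func demoFS() fs.FS {
	return fstest.MapFS{
		"builtin/slo-burn.md":   {Data: []byte("placeholder body")},
		"builtin/go-runtime.md": {Data: []byte("placeholder body")},
		"builtin/notes.txt":     {Data: []byte("ignored: not markdown")},
	}
}

func main() {
	skills, err := LoadBuiltin(demoFS())
	if err != nil {
		panic(err)
	}
	fmt.Println(len(skills), "skills loaded")
}
```

A lower-bound check on the loaded count, like the `>= 7` assertion mentioned above, keeps passing as playbooks are added.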
Test plan
- `go test ./internal/core/skills/...`: all 24 builtin skills parse/load
- `go test ./...`: full suite green
- `allowed_tools` entry references a registered tool (k8s/prom/log/trace/db/alert/gitops/perf)
- `skills.Parse` invariants hold for each file (name==stem, description, ≥1 allowed_tools, non-empty body)
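The invariants in the last two items can be sketched as a standalone check. This is not the project's `skills.Parse`: the `Skill` struct and `Validate` function are hypothetical, and matching a tool by the group prefix before the first dot is an assumption about how tool names map to the groups listed above.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// Skill is a hypothetical parsed form of one playbook file.
type Skill struct {
	Name         string
	Description  string
	AllowedTools []string
	Body         string
}

// toolGroups mirrors the groups named in the test plan.
var toolGroups = map[string]bool{
	"k8s": true, "prom": true, "log": true, "trace": true,
	"db": true, "alert": true, "gitops": true, "perf": true,
}

// Validate checks name==stem, a non-empty description, at least one
// allowed tool, a non-empty body, and that each tool's group is known.
func Validate(filename string, s Skill) error {
	stem := strings.TrimSuffix(path.Base(filename), ".md")
	switch {
	case s.Name != stem:
		return fmt.Errorf("name %q does not match file stem %q", s.Name, stem)
	case strings.TrimSpace(s.Description) == "":
		return fmt.Errorf("%s: empty description", stem)
	case len(s.AllowedTools) == 0:
		return fmt.Errorf("%s: no allowed_tools", stem)
	case strings.TrimSpace(s.Body) == "":
		return fmt.Errorf("%s: empty body", stem)
	}
	for _, tool := range s.AllowedTools {
		// Assumption: the group is the part before the first dot.
		group, _, _ := strings.Cut(tool, ".")
		if !toolGroups[group] {
			return fmt.Errorf("%s: unregistered tool %q", stem, tool)
		}
	}
	return nil
}

func main() {
	s := Skill{
		Name:         "go-runtime",
		Description:  "placeholder description",
		AllowedTools: []string{"perf.go_pprof_cpu"},
		Body:         "placeholder body",
	}
	fmt.Println(Validate("builtin/go-runtime.md", s))
}
```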