Local tools for agent skill coverage and long-running agent job monitoring.
SkillScope is a local, plugin-first viewer for Codex and Claude Code sessions. It maps SKILL.md instructions to trace evidence, highlights the exact spans that were followed, violated, ignored, or not taken, and stores completed analyses in a local SQLite cache.
The same local app also includes Agent Monitor: a job-oriented control room for long-running local Codex / Claude / Docker benchmark work. It shows jobs, progress, artifacts, containers, safe-stop previews, and stop results instead of a raw PID list.
- SkillScope: inspect skill usage in Codex / Claude Code traces, judge compliance, and produce evidence-backed rewrite proposals.
- Agent Monitor: watch local long-running agent jobs, inspect artifacts and Docker containers, preview what a safe stop would touch, then stop only attributable child work.
Screenshots: Skill graph, Analysis process, Optimization results.
Early single-instance SkillsBench checks show the loop improving native verifier outcomes and reducing SkillScope non-compliance (NC) on real coding-agent traces.
| Task | Native verifier | SkillScope NC | Decision |
|---|---|---|---|
| court-form-filling | 4/5 -> 5/5 tests, reward 0 -> 1 | 25% -> 0% | accept optimized skill |
| manufacturing-fjsp-optimization | 13/15 -> 15/15 tests, reward 0 -> 1 | 50% -> 2.6% | accept optimized skill |
| azure-bgp-oscillation-route-leak | 3/4 -> 3/4 tests, reward 0 -> 0 | 27.8% -> 23.3%, final-output NC 35.9% -> 8.7% | not accepted; native still failed |
SkillScope coverage:
- Discovers local Codex and Claude Code sessions.
- Groups traces by project, then shows individual skill-use windows instead of forcing whole-session analysis.
- Highlights exact source spans in `SKILL.md`, not just whole lines.
- Compiles a skill into constraints, branches, ordering rules, numeric checks, output contracts, and a skill graph.
- Uses the skill graph to guide trace inspection instead of relying on keyword matching or blind summarization.
- Runs the matching local agent as judge: Codex captures use Codex; Claude Code captures use Claude Code.
- Streams analysis progress and generated artifacts.
- Caches completed findings in `.skilllens/skillscope.sqlite`.
- Produces anti-bloat skill rewrite proposals: minimal evidence-backed deltas, not automatic skill self-expansion.
Agent Monitor:
- Aggregates local agent work into jobs with status, root agent, runtime, progress, processes, containers, artifacts, and stale state.
- Polls active jobs and artifact tails every 2 seconds.
- Marks jobs stale after 30 seconds without artifact or log updates, but never auto-stops them.
- Shows Active, Recent, Stale, Docker, and attention-oriented filters.
- Previews safe stops before execution, including protected root agents, process groups, and Docker containers.
- Protects external Codex / Claude root process groups from stop actions.
- Cleans only containers listed in the stop preview, including detached Docker containers and Compose-project containers discovered through Docker labels.
- Persists stopped job history as local JSON under `.skilllens/monitor-runs/<date>/<jobId>.json`.
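The stale rule in the list above can be illustrated with a small sketch. This is not the monitor's actual code: the `JobActivity` record, its field names, and the strict `>` comparison are assumptions made for illustration; only the 30-second threshold comes from the feature list.

```python
from dataclasses import dataclass

STALE_AFTER_SECONDS = 30  # threshold stated in the feature list


@dataclass
class JobActivity:
    """Hypothetical record of when a job last produced output."""
    job_id: str
    last_artifact_update: float  # Unix timestamp in seconds
    last_log_update: float       # Unix timestamp in seconds


def is_stale(job: JobActivity, now: float) -> bool:
    """Return True when neither artifacts nor logs changed within the threshold."""
    latest = max(job.last_artifact_update, job.last_log_update)
    # Assumption: "after 30 seconds" is read as strictly more than 30 seconds.
    return now - latest > STALE_AFTER_SECONDS


job = JobActivity("demo-job", last_artifact_update=100.0, last_log_update=110.0)
print(is_stale(job, now=120.0))  # 10 s since the last log update -> False
print(is_stale(job, now=141.0))  # 31 s since the last log update -> True
```

Note that the function only reports staleness; consistent with the list above, nothing in it stops a job.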
Requirements:
- Node.js 20+
- npm
- Codex CLI if you want the `/skillscope` entry
npm install
npm run dev
Open:
http://localhost:5173
Open Agent Monitor directly:
http://localhost:5173/?view=monitor
Install the Codex slash prompt:
npm run install:codex-prompt
Then run inside Codex:
/skillscope
The launcher captures the current session, identifies trace-proven skill files, writes a local capture bundle, and opens the browser UI.
Agent Monitor is for local single-user work: Codex / Claude runs, SkillsBench experiments, benchmark scripts, docker run, Docker Compose projects, and Docker-contained codex-exec jobs.
Run the app:
npm install
npm run dev
Open:
http://localhost:5173/?view=monitor
Use the control room:
- Keep `Live` enabled for 2-second refresh, or pause it while inspecting a stopped job.
- Use `Active`, `Recent`, `Stale`, `Docker`, search, and sort controls to find the job.
- Open a job to inspect progress, latest output, artifact tail, process tree, container details, and persisted history paths.
- Click the safe-stop preview action first. The preview shows protected root processes, stoppable child process groups, and containers that would be removed.
- Execute stop only after the preview. The result shows killed processes, removed containers, residual checks, cleanup errors, and the saved history JSON.
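Safety rules:
- External Codex / Claude root process groups are never stopped.
- Only child process groups attributable to an agent job can be stopped.
- Docker cleanup only touches containers listed in the stop preview.
- Compose projects are cleaned up by finding their containers through Docker labels and running `docker rm -f`; `docker compose down` is never run.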
Local monitor data:
- `.skilllens/monitor-runs/<date>/<jobId>.json`: stopped / recent job history with `lastStopResult`.
- `.skilllens/monitor-smoke-screenshots/<run-id>/`: screenshot-backed verification reports.
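To inspect the persisted history outside the UI, something like the following sketch may help. It assumes each history file is a JSON object with `lastStopResult` as a top-level key; the README names the field but not its exact location or shape, so treat `load_stop_results` as a hypothetical helper.

```python
import json
from pathlib import Path


def load_stop_results(root: Path) -> dict:
    """Map each history file (relative path) to its lastStopResult, or None if absent."""
    results = {}
    # The layout <date>/<jobId>.json comes from the README; the glob mirrors it.
    for path in sorted(root.glob("*/*.json")):
        with path.open(encoding="utf-8") as handle:
            record = json.load(handle)
        # Assumption: the record is a JSON object and the key is top-level.
        results[str(path.relative_to(root))] = record.get("lastStopResult")
    return results


if __name__ == "__main__":
    for name, result in load_stop_results(Path(".skilllens/monitor-runs")).items():
        print(name, result)
```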
API surface:
- `GET /api/agent-jobs`: active and recent jobs.
- `POST /api/agent-jobs/:id/stop/preview`: safe-stop preview.
- `POST /api/agent-jobs/:id/stop`: execute the previously previewed stop plan.
- `GET /api/agent-processes`: compatibility endpoint for older process views.
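The preview-then-stop order of these endpoints can be sketched as a small client. Several things here are assumptions rather than documented behavior: that the API is reachable on the dev server address `http://localhost:5173`, that both POST endpoints accept an empty JSON body, and that they return JSON. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Callable, Optional
from urllib.parse import quote

BASE_URL = "http://localhost:5173"  # assumption: API shares the dev server address


def stop_preview_url(job_id: str) -> str:
    """Build the safe-stop preview URL for one job."""
    return f"{BASE_URL}/api/agent-jobs/{quote(job_id, safe='')}/stop/preview"


def stop_url(job_id: str) -> str:
    """Build the stop URL for one job."""
    return f"{BASE_URL}/api/agent-jobs/{quote(job_id, safe='')}/stop"


def post_json(url: str) -> dict:
    """POST an empty JSON object and decode a JSON response (both are assumptions)."""
    request = urllib.request.Request(
        url,
        data=json.dumps({}).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def preview_then_stop(job_id: str, confirm: Callable[[dict], bool]) -> Optional[dict]:
    """Fetch the preview first; execute the stop only if `confirm` approves it."""
    preview = post_json(stop_preview_url(job_id))
    if not confirm(preview):
        return None
    return post_json(stop_url(job_id))
```

Passing a `confirm` callback keeps the human review step explicit, which matches the preview-first workflow described for the control room.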
Every Agent Monitor feature is covered by screenshot-backed smoke evidence.
Run the full core verification:
npm run monitor:verify
Run Docker coverage as well:
npm run monitor:verify:docker
These commands start a temporary local Vite server, exercise the monitor UI, validate the generated evidence manifest, print the report path, and stop the server.
Evidence is written to:
.skilllens/monitor-smoke-screenshots/<run-id>/evidence.md
.skilllens/monitor-smoke-screenshots/<run-id>/evidence.json
The evidence contract currently checks 31 core screenshots and 42 Docker screenshots, including active job cards, live progress, stale state, protected root state, stop preview, stop result, recent history, detached Docker cleanup, Compose label cleanup, and Docker-contained codex-exec cleanup.
See docs/monitor-evidence.md for the complete feature-to-screenshot matrix.
- Select a project on the left.
- Select one skill-use window on the right.
- Open `Skill Highlight`, `Skill Graph`, `Trace`, `Analysis`, or `Rewrite Skill`.
- Click `Start Analysis` to launch the local agent-guided judge.
- Reopen the same window later; cached findings are restored from SQLite.
SkillScope treats a skill like a small specification program. The graph is not just a visualization; it is the analysis plan used to inspect the trace.
SKILL.md
-> constraint IR
-> skill graph: conditions, obligations, prohibitions, order, outputs
-> trace facts: commands, tools, files, edits, final output, event order
-> observed path through the skill graph
-> evidence-backed judgments
Status labels:
- `covered` (followed): the trace shows the required behavior.
- `violated`: the trace shows explicit conflicting behavior.
- `missed` (ignored): the instruction applied, but the required behavior is absent.
- `not_applicable` (not taken): the branch was not taken for this selected task.
- `unknown` (pending): evidence is insufficient.
Violated and missed are intentionally separate: a required action that is simply absent is missed (ignored); explicit conflicting behavior is violated.
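The distinction between the five labels can be written as a small decision function. This is an illustrative sketch, not SkillScope's judge: the three inputs and the precedence (explicit conflict wins over an observed required action) are assumptions.

```python
from enum import Enum
from typing import Optional


class Status(str, Enum):
    COVERED = "covered"
    VIOLATED = "violated"
    MISSED = "missed"
    NOT_APPLICABLE = "not_applicable"
    UNKNOWN = "unknown"


def classify(applies: Optional[bool], required_seen: Optional[bool], conflict_seen: bool) -> Status:
    """Classify one constraint; None means the trace gives no usable evidence."""
    if applies is None:
        return Status.UNKNOWN          # cannot tell whether the branch was taken
    if not applies:
        return Status.NOT_APPLICABLE   # branch not taken for this task
    if conflict_seen:
        return Status.VIOLATED         # explicit conflicting behavior
    if required_seen is None:
        return Status.UNKNOWN          # applied, but evidence is insufficient
    return Status.COVERED if required_seen else Status.MISSED


print(classify(True, False, False).value)  # missed: required action absent
print(classify(True, False, True).value)   # violated: explicit conflict
```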
SkillScope can turn violated and ignored findings into an optimized skill proposal, but it deliberately avoids automatic self-expansion.
The rewrite output is a reviewed delta:
- cite the violated or ignored constraint;
- cite evidence event IDs;
- propose the smallest replacement/addition/deletion;
- define how a future trace should prove compliance;
- avoid defensive text such as `no old logic`, `never repeat the previous mistake`, or generic `be careful`.
Proposals stay reviewable minimal patches: SkillScope never merges a proposal back into the original skill automatically.
Analyze artifacts without the browser:
npm run analyze -- \
--skill sample_data/pdf-edit/SKILL.md \
--trace sample_data/pdf-edit/with-skill.jsonl \
--result sample_data/pdf-edit/result.with-skill.json \
--task sample_data/pdf-edit/task.md \
--out skillscope-report
Analyze a capture bundle:
npm run analyze -- --bundle skillscope.capture.json --out skillscope-report
Internal repeated-finding audit for cached analyses:
npm run cia:audit
The audit writes ignored local artifacts under .skilllens/.
SkillsBench Codex/GPT-5.5 experiment control:
npm run skillsbench -- plan --skillsbench-root /path/to/skillsbench --prebuilt-skillsbench-ghcr --trials 3
See docs/skillsbench-experiment.md.
Use the single-instance loop first. One instance means one SkillsBench task plus one selected SKILL.md.
original run
-> native verifier + SkillScope judge
-> gate: skip low-risk passing cases, optimize high-NC or failing cases
-> one optimized skill
-> rerun + native verifier + SkillScope judge
-> compare verifier pass count and non-compliance
Find clean one-skill tasks:
git clone https://github.com/benchflow-ai/skillsbench.git .skilllens/vendor/skillsbench
python - <<'PY'
from pathlib import Path

# Print every task that ships exactly one SKILL.md, with the skill's relative path.
root = Path('.skilllens/vendor/skillsbench/tasks')
for task in sorted(root.iterdir()):
    skills_dir = task / 'environment' / 'skills'
    skills = sorted(skills_dir.glob('*/SKILL.md')) if skills_dir.exists() else []
    if len(skills) == 1:
        print(f'{task.name}\t{skills[0].relative_to(task)}')
PY
Run one selected instance with the bundled workflow skill:
Use $skillscope-e2e on taskId=<task-id>, skillRelPath=<environment/skills/.../SKILL.md>, slug=<experiment-slug>.
The equivalent CLI phases are:
npm run build
npm run skillsbench -- plan \
--skillsbench-root .skilllens/vendor/skillsbench \
--out .skilllens/experiments/<slug>/original \
--agent codex \
--model gpt-5.5 \
--trials 1 \
--task <task-id> \
--prebuilt-skillsbench-ghcr \
--bench-arg --usage-tracking \
--bench-arg off
bash .skilllens/experiments/<slug>/original/pull-prebuilt-images.sh
SKILLSCOPE_TRIAL_TIMEOUT_SECONDS=7200 \
SKILLSCOPE_SKIP_FAILED=1 \
SKILLSCOPE_RUN_CONCURRENCY=4 \
bash .skilllens/experiments/<slug>/original/run-original.sh
npm run skillsbench -- collect \
--runs-root .skilllens/experiments/<slug>/original/jobs \
--out .skilllens/experiments/<slug>/original/collected
npm run skillsbench -- judge \
--plan .skilllens/experiments/<slug>/original/run-plan.json \
--trials-file .skilllens/experiments/<slug>/original/collected/trials.json \
--out .skilllens/experiments/<slug>/original/agent-analysis \
--agent-concurrency 1 \
--agent-timeout-ms 1200000
Only optimize when native verifier failed, or selected-skill SkillScope non-compliance is high, or final/artifact constraints were violated or ignored. If native passes and NC is low, record pass-through and do not rewrite.
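The gate rule can be sketched as a single predicate. The `InstanceResult` record is hypothetical, and reading "high" as strictly above the threshold is an assumption; the default of 0.1 mirrors the `--optimize-nc-threshold 0.1` value used in the `propose` command.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """Hypothetical summary of one original run."""
    native_passed: bool            # did the native verifier pass?
    nc_rate: float                 # selected-skill non-compliance, 0.0 to 1.0
    final_or_artifact_issue: bool  # any final/artifact constraint violated or ignored


def should_optimize(result: InstanceResult, nc_threshold: float = 0.1) -> bool:
    """Return True when the instance should get an optimized skill."""
    return (
        not result.native_passed
        or result.nc_rate > nc_threshold  # assumption: strictly above counts as high
        or result.final_or_artifact_issue
    )


# Native passed and NC is low: pass-through, no rewrite.
print(should_optimize(InstanceResult(True, 0.02, False)))   # False
# Native failed: optimize regardless of NC.
print(should_optimize(InstanceResult(False, 0.0, False)))   # True
```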
--prebuilt-skillsbench-ghcr uses skillsbench@1.1 plus per-task GHCR images
such as ghcr.io/benchflow-ai/skillsbench-task-env:standard-v1-<task>, so the
generated BenchFlow command pulls an environment image instead of rebuilding
that task Dockerfile locally.
Then generate exactly one optimized skill and rerun:
npm run skillsbench -- propose \
--plan .skilllens/experiments/<slug>/original/run-plan.json \
--analysis .skilllens/experiments/<slug>/original/agent-analysis/violation-rates.json \
--out .skilllens/experiments/<slug>/optimized-skills \
--min-failures 1 \
--optimize-nc-threshold 0.1 \
--max-edits-per-skill 4 \
--agent-concurrency 1 \
--agent-timeout-ms 1200000
npm run skillsbench -- rerun-plan \
--plan .skilllens/experiments/<slug>/original/run-plan.json \
--optimized-skills-root .skilllens/experiments/<slug>/optimized-skills \
--out .skilllens/experiments/<slug>/optimized-run
SKILLSCOPE_TRIAL_TIMEOUT_SECONDS=7200 \
SKILLSCOPE_SKIP_FAILED=1 \
bash .skilllens/experiments/<slug>/optimized-run/run-optimized.sh
npm run skillsbench -- collect \
--runs-root .skilllens/experiments/<slug>/optimized-run/jobs \
--out .skilllens/experiments/<slug>/optimized-analysis/collected
npm run skillsbench -- analysis-plan \
--plan .skilllens/experiments/<slug>/original/run-plan.json \
--optimized-skills-root .skilllens/experiments/<slug>/optimized-skills \
--out .skilllens/experiments/<slug>/optimized-analysis
npm run skillsbench -- judge \
--plan .skilllens/experiments/<slug>/optimized-analysis/run-plan.json \
--trials-file .skilllens/experiments/<slug>/optimized-analysis/collected/trials.json \
--out .skilllens/experiments/<slug>/optimized-analysis/agent-analysis \
--agent-concurrency 1 \
--agent-timeout-ms 1200000
For more scale, launch multiple Codex workers in parallel from an outer coordinator, one worker per (taskId, skillRelPath, slug). Keep each worker single-instance so artifacts stay debuggable.
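One possible shape for such an outer coordinator is sketched below. How a worker is actually launched is not specified here, so `run_one` is a hypothetical callable supplied by the caller (for example, a function that shells out to the single-instance phases), and `Instance` is an illustrative type.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, NamedTuple


class Instance(NamedTuple):
    """One single-instance experiment: one task plus one selected SKILL.md."""
    task_id: str
    skill_rel_path: str
    slug: str


def run_all(instances: List[Instance], run_one: Callable[[Instance], object], max_workers: int = 4) -> dict:
    """Run each instance in its own worker and collect outcomes keyed by slug."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order; an exception in a worker is re-raised here.
        outcomes = list(pool.map(run_one, instances))
    return {inst.slug: outcome for inst, outcome in zip(instances, outcomes)}
```

Keying results by slug keeps each worker's artifacts traceable to exactly one experiment directory.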
Generated data stays local:
- `.skilllens/captures/`: captured Codex / Claude Code bundles.
- `.skilllens/registry.json`: local project and skill-use index.
- `.skilllens/skillscope.sqlite`: cached agent analyses and coverage findings.
- `.skilllens/agent-judge/`: per-run prompts, artifacts, raw output, findings, and rewrite proposals.
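The cache schema is not documented here, so a safe first step when exploring it is to list the tables rather than guess their names. The sketch below opens the database read-only; `list_tables` is a hypothetical helper, not part of SkillScope.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path: Path) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    # as_uri() yields a percent-encoded file: URI; mode=ro prevents accidental writes.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = Path(".skilllens/skillscope.sqlite")
    if db.exists():
        print(list_tables(db))
```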
src/
  App.tsx                   browser UI
  lib/skillParser.ts        SKILL.md to instruction units and constraints
  lib/traceParser.ts        Codex / Claude / ACP / generic JSONL parsers
  lib/coverage.ts           local constraint-level coverage pass
  lib/agentJudge.ts         parser for agent-generated findings and skill graphs
  lib/report.ts             Markdown / HTML / JSON export
scripts/
  codex.ts                  Codex /skillscope launcher
  capture.ts                capture bundle writer
  analyze.ts                CLI analyzer
  cia-audit.ts              local cached-analysis aggregator
integrations/
  skillscope-codex/         Codex slash prompt integration
  claude-code/              Claude Code /skillscope command template
skills/
  skillscope-analyzer/      skill that guides local agent analysis
docs/
  capture-and-analysis.md   capture contract
Alpha. The current focus is local-first, evidence-backed skill coverage for real Codex and Claude Code trajectories.



