Summary
Carry source, decision, and output provenance through the main workflow so downstream agents can audit and cite it.
This issue was generated from an org-wide EvalOps mining pass on 2026-05-10 07:57 UTC. It combines live GitHub repo signals with a per-repo arXiv search. Treat the research links as grounding for a concrete implementation, not as a request for a literature review.
Repo Evidence
- Repository description: Maestro — multi-model coding agent with TUI, web, IDE, Slack, and GitHub interfaces
- Tree signals: 79 docs files, 19 workflows, 1 proto file, 737 test-like files.
- AGENTS.md:31 includes latent-spec language: Critical: Consult .github/workflows/ (evals.yml, nx-ci.yml, release.yml) to mirror CI environments.
- AGENTS.md:36 includes latent-spec language: * Full Test Suite: npx nx run maestro:test --skip-nx-cache (Builds tui + maestro-web automatically). Run after every code change. * Linting: bun run bun:lint (Biome + Eval Verifier). Run after every code change. * Runtime Commands: Avoid long-lived dev/watch servers (e.g., npm run dev) unless th
- AGENTS.md:54 includes latent-spec language: 2. Never force-push to main. This rewrites shared history and breaks collaborators. 3. Atomic commits only. Each commit should be one logical change. Don't mix unrelated changes. 4. Never use --force or --force-with-lease on shared branches.
- AGENTS.md:187 includes latent-spec language: 4. Stop tasks when done - they'll auto-cleanup on Maestro exit, but explicit stops are cleaner 5. Use restart policies for resilient services - ideal for dev servers that should recover from crashes 6. Direct execution is safer - omit shell parameter for simple commands without pipes
- AGENTS.md:374 includes latent-spec language: 4. Wire handler in src/cli-tui/tui-renderer.ts: ```typescript
- CLAUDE.md:31 includes latent-spec language: Critical: Consult .github/workflows/ (evals.yml, nx-ci.yml, release.yml) to mirror CI environments.
Research Grounding
Repo axes: tooling, evaluation, security, desktop
Search keywords: run, maestro, use, dev, bun, never, servers, commands, test, build, command, npx
- arXiv:2508.07575v1 MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark (Shiqing Fan, Xichen Ding, Liang Zhang, Linjian Mo), 2025.
- arXiv:2603.24943v1 FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol (Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li), 2026.
- arXiv:2508.12566v1 Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models (Wei Song, Haonan Zhong, Ziqi Ding, Jingling Xue, Yuekang Li), 2025.
- arXiv:2602.01129v1 SMCP: Secure Model Context Protocol (Xinyi Hou, Shenao Wang, Yifan Zhang, Ziluo Xue, Yanjie Zhao, Cai Fu), 2026.
- arXiv:2603.00123v1 CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers (Yannian Gu, Xizhuo Zhang, Linjie Mu, Yongrui Yu, Zhongzhen Huang, Shaoting Zhang), 2026.
- arXiv:2507.19570v1 MCP4EDA: LLM-Powered Model Context Protocol RTL-to-GDSII Automation with Backend Aware Synthesis Optimization (Yiting Wang, Wanghao Ye, Yexiao He, Yiran Chen, Gang Qu, Ang Li), 2025.
- arXiv:2604.13849v1 MCPThreatHive: Automated Threat Intelligence for Model Context Protocol Ecosystems (Yi Ting Shen, Kentaroh Toyoda, Alex Leung), 2026.
- arXiv:2506.14683v2 Unified Software Engineering Agent as AI Software Engineer (Leonhard Applis, Yuntong Zhang, Shanchao Liang, Nan Jiang, Lin Tan, Abhik Roychoudhury), 2025.
- arXiv:2503.23803v2 Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute (Yingwei Ma, Yongbin Li, Yihong Dong, Xue Jiang, Rongyu Cao, Jue Chen), 2025.
- arXiv:2506.19998v1 Doc2Agent: Scalable Generation of Tool-Using Agents from API Documentation (Xinyi Ni, Haonan Jian, Qiuyang Wang, Vedanshi Chetan Shah, Pengyu Hong), 2025.
What To Build
- Add stable identifiers for source records, derived decisions, and emitted outputs.
- Thread those identifiers through logs/events/API responses without leaking secrets.
- Provide a query or debug surface that reconstructs the chain for one completed workflow.
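One possible shape for the three items above, offered as a minimal sketch and not as Maestro's actual design. Every name here (ProvenanceKind, ProvenanceRecord, ProvenanceLog, redact) is hypothetical, the counter-based id scheme is an assumption, and the redaction regex is a naive placeholder for whatever secret-scrubbing the repo already uses.

```typescript
// Hypothetical names throughout: none of these types exist in the Maestro codebase.
type ProvenanceKind = "source" | "decision" | "output";

interface ProvenanceRecord {
  id: string; // stable identifier, unique within this log
  kind: ProvenanceKind;
  workflowId: string;
  parents: string[]; // ids of the records this one was derived from
  summary: string; // redacted description, safe for logs and API responses
}

// Naive placeholder redaction: masks values that follow common secret-like keys.
function redact(text: string): string {
  return text.replace(
    /\b(api[_-]?key|token|secret|password)\s*[=:]\s*\S+/gi,
    "$1=[REDACTED]",
  );
}

class ProvenanceLog {
  private readonly records = new Map<string, ProvenanceRecord>();
  private seq = 0;

  // Store a record and return it; parents must already be known to the log.
  record(
    kind: ProvenanceKind,
    workflowId: string,
    summary: string,
    parents: string[] = [],
  ): ProvenanceRecord {
    for (const parent of parents) {
      if (!this.records.has(parent)) {
        throw new Error(`unknown parent id: ${parent}`);
      }
    }
    this.seq += 1;
    const rec: ProvenanceRecord = {
      id: `${kind}:${workflowId}:${this.seq}`, // assumed id scheme
      kind,
      workflowId,
      parents: [...parents],
      summary: redact(summary),
    };
    this.records.set(rec.id, rec);
    return rec;
  }

  // Reconstruct the chain behind one record, ancestors first.
  chain(id: string): ProvenanceRecord[] {
    const seen = new Set<string>();
    const ordered: ProvenanceRecord[] = [];
    const visit = (current: string): void => {
      if (seen.has(current)) return;
      seen.add(current);
      const rec = this.records.get(current);
      if (!rec) return;
      for (const parent of rec.parents) visit(parent);
      ordered.push(rec);
    };
    visit(id);
    return ordered;
  }
}

// Usage: one source feeds one decision, which produces one output.
const log = new ProvenanceLog();
const src = log.record("source", "wf-1", "AGENTS.md:31 token=abc123");
const dec = log.record("decision", "wf-1", "mirror CI environment", [src.id]);
const out = log.record("output", "wf-1", "proposed patch", [dec.id]);
console.log(log.chain(out.id).map((r) => r.id).join(" -> "));
// prints "source:wf-1:1 -> decision:wf-1:2 -> output:wf-1:3"
```

Keeping parent ids on each record means the debug surface only needs an output id to walk back to its sources, and redacting at write time keeps raw secrets out of anything that later serializes the log.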
Acceptance Criteria
Notes
- Generated issue 2/5 for evalops/maestro by evalops_org_miner.py.
- Before implementation, confirm the sampled latent-spec snippets still match main; this issue intentionally cites exact file paths/lines where the mining pass saw them.