Make provenance and evidence traceability first-class for maestro (maestro multi-model coding tui web) #384

## Summary

Carry source, decision, and output provenance through the main workflow so downstream agents can audit and cite it.

This issue was generated from an org-wide EvalOps mining pass on 2026-05-10 07:57 UTC. It combines live GitHub repo signals with a per-repo arXiv search. Treat the research links as grounding for a concrete implementation, not as a request for a literature review.

## Repo Evidence

- Repository description: Maestro — multi-model coding agent with TUI, web, IDE, Slack, and GitHub interfaces
- Tree signals: 79 docs files, 19 workflows, 1 proto file, 737 test-like files.
- `AGENTS.md:31` includes latent-spec language: **Critical:** Consult `.github/workflows/` (`evals.yml`, `nx-ci.yml`, `release.yml`) to mirror CI environments.
- `AGENTS.md:36` includes latent-spec language: * **Full Test Suite:** `npx nx run maestro:test --skip-nx-cache` (Builds `tui` + `maestro-web` automatically). Run after every code change. * **Linting:** `bun run bun:lint` (Biome + Eval Verifier). Run after every code change. * **Runtime Commands:** Avoid long-lived `dev`/watch servers (e.g., `npm run dev`) unless th
- `AGENTS.md:54` includes latent-spec language: 2. **Never force-push to main.** This rewrites shared history and breaks collaborators. 3. **Atomic commits only.** Each commit should be one logical change. Don't mix unrelated changes. 4. **Never use `--force` or `--force-with-lease` on shared branches.**
- `AGENTS.md:187` includes latent-spec language: 4. **Stop tasks when done** - they'll auto-cleanup on Maestro exit, but explicit stops are cleaner 5. **Use restart policies for resilient services** - ideal for dev servers that should recover from crashes 6. **Direct execution is safer** - omit `shell` parameter for simple commands without pipes
- `AGENTS.md:374` includes latent-spec language: **4. Wire handler** in `src/cli-tui/tui-renderer.ts`: ```typescript
- `CLAUDE.md:31` includes latent-spec language: **Critical:** Consult `.github/workflows/` (`evals.yml`, `nx-ci.yml`, `release.yml`) to mirror CI environments.

## Research Grounding

Repo axes: tooling, evaluation, security, desktop

Search keywords: run, maestro, use, dev, bun, never, servers, commands, test, build, command, npx

- [arXiv:2508.07575v1](https://arxiv.org/abs/2508.07575v1) MCPToolBench++: A Large Scale AI Agent Model Context Protocol MCP Tool Use Benchmark (Shiqing Fan, Xichen Ding, Liang Zhang, Linjian Mo), 2025.
- [arXiv:2603.24943v1](https://arxiv.org/abs/2603.24943v1) FinMCP-Bench: Benchmarking LLM Agents for Real-World Financial Tool Use under the Model Context Protocol (Jie Zhu, Yimin Tian, Boyang Li, Kehao Wu, Zhongzhi Liang, Junhui Li), 2026.
- [arXiv:2508.12566v1](https://arxiv.org/abs/2508.12566v1) Help or Hurdle? Rethinking Model Context Protocol-Augmented Large Language Models (Wei Song, Haonan Zhong, Ziqi Ding, Jingling Xue, Yuekang Li), 2025.
- [arXiv:2602.01129v1](https://arxiv.org/abs/2602.01129v1) SMCP: Secure Model Context Protocol (Xinyi Hou, Shenao Wang, Yifan Zhang, Ziluo Xue, Yanjie Zhao, Cai Fu), 2026.
- [arXiv:2603.00123v1](https://arxiv.org/abs/2603.00123v1) CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers (Yannian Gu, Xizhuo Zhang, Linjie Mu, Yongrui Yu, Zhongzhen Huang, Shaoting Zhang), 2026.
- [arXiv:2507.19570v1](https://arxiv.org/abs/2507.19570v1) MCP4EDA: LLM-Powered Model Context Protocol RTL-to-GDSII Automation with Backend Aware Synthesis Optimization (Yiting Wang, Wanghao Ye, Yexiao He, Yiran Chen, Gang Qu, Ang Li), 2025.
- [arXiv:2604.13849v1](https://arxiv.org/abs/2604.13849v1) MCPThreatHive: Automated Threat Intelligence for Model Context Protocol Ecosystems (Yi Ting Shen, Kentaroh Toyoda, Alex Leung), 2026.
- [arXiv:2506.14683v2](https://arxiv.org/abs/2506.14683v2) Unified Software Engineering Agent as AI Software Engineer (Leonhard Applis, Yuntong Zhang, Shanchao Liang, Nan Jiang, Lin Tan, Abhik Roychoudhury), 2025.
- [arXiv:2503.23803v2](https://arxiv.org/abs/2503.23803v2) Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute (Yingwei Ma, Yongbin Li, Yihong Dong, Xue Jiang, Rongyu Cao, Jue Chen), 2025.
- [arXiv:2506.19998v1](https://arxiv.org/abs/2506.19998v1) Doc2Agent: Scalable Generation of Tool-Using Agents from API Documentation (Xinyi Ni, Haonan Jian, Qiuyang Wang, Vedanshi Chetan Shah, Pengyu Hong), 2025.

## What To Build

- Add stable identifiers for source records, derived decisions, and emitted outputs.
- Thread those identifiers through logs/events/API responses without leaking secrets.
- Provide a query or debug surface that reconstructs the chain for one completed workflow.
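
The three items above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: `ProvenanceRecord`, `ProvenanceLog`, `redact`, and `fnv1a` are hypothetical names that do not exist in maestro, the secret pattern is deliberately naive, and FNV-1a is used only to keep the sketch dependency-free (a real implementation would likely want a cryptographic hash and persistent storage).

```typescript
// Hypothetical provenance model; none of these names exist in maestro today.
type ProvenanceKind = "source" | "decision" | "output";

interface ProvenanceRecord {
  id: string; // stable, content-derived identifier
  kind: ProvenanceKind;
  workflowId: string;
  parents: string[]; // ids of the records this one was derived from
  summary: string; // redacted, human-readable description
}

// FNV-1a (32-bit): a small non-cryptographic hash, used here for illustration only.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Naive redaction of key=value style secrets before anything is stored or logged.
function redact(text: string): string {
  return text.replace(
    /\b(api[_-]?key|token|secret|password)\s*[=:]\s*\S+/gi,
    "$1=[REDACTED]"
  );
}

class ProvenanceLog {
  private records = new Map<string, ProvenanceRecord>();

  // The id depends only on the redacted content, so the same input yields the same id.
  add(kind: ProvenanceKind, workflowId: string, parents: string[], summary: string): ProvenanceRecord {
    const safe = redact(summary);
    const id = `${kind}_${fnv1a([kind, workflowId, ...parents, safe].join("\u0000"))}`;
    const record: ProvenanceRecord = { id, kind, workflowId, parents, summary: safe };
    this.records.set(id, record);
    return record;
  }

  // Reconstructs the chain behind one record, ancestors first.
  // A missing parent throws instead of being silently skipped.
  chain(id: string): ProvenanceRecord[] {
    const ordered: ProvenanceRecord[] = [];
    const seen = new Set<string>();
    const visit = (current: string): void => {
      if (seen.has(current)) return;
      seen.add(current);
      const record = this.records.get(current);
      if (!record) throw new Error(`missing provenance record: ${current}`);
      record.parents.forEach((parent) => visit(parent));
      ordered.push(record);
    };
    visit(id);
    return ordered;
  }
}

// Usage: one source, one decision derived from it, one output derived from the decision.
const log = new ProvenanceLog();
const src = log.add("source", "wf-1", [], "read AGENTS.md token=abc123");
const dec = log.add("decision", "wf-1", [src.id], "chose to run lint");
const out = log.add("output", "wf-1", [dec.id], "emitted patch");
const chain = log.chain(out.id).map((r) => r.kind);
```

Deriving the identifier from the redacted summary rather than the raw text is a design choice in this sketch: it keeps ids reproducible across runs while ensuring the stored record never depends on secret material.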

## Acceptance Criteria

- [ ] A short design note names the repo-specific workflow, threat or correctness model, and the research assumptions being adopted.
- [ ] A runnable check, fixture, or verifier exercises the new contract in CI or an equivalent local command documented in the repo.
- [ ] The implementation emits or stores enough evidence for a downstream agent/operator to cite inputs, decisions, and outputs.
- [ ] At least one negative/degraded-mode case is covered so failures are observable rather than silently accepted.
- [ ] Documentation links the new behavior to the relevant EvalOps platform primitive or explicitly records why this repo remains standalone.
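
As one possible shape for the runnable check and the negative case in the criteria above, a verifier could scan a workflow's emitted events and report every broken link. The `TraceEvent` shape and `verifyTrace` function are assumptions made for this sketch, not an existing maestro or EvalOps contract.

```typescript
// Hypothetical event shape; the real contract would come from the design note.
interface TraceEvent {
  id: string;
  kind: "source" | "decision" | "output";
  parents: string[];
}

// Returns a list of human-readable violations; an empty list means the trace is citable.
function verifyTrace(events: TraceEvent[]): string[] {
  const violations: string[] = [];
  const ids = new Set<string>();
  for (const event of events) {
    if (ids.has(event.id)) violations.push(`duplicate id: ${event.id}`);
    ids.add(event.id);
  }
  for (const event of events) {
    // Decisions and outputs must cite at least one upstream record.
    if (event.kind !== "source" && event.parents.length === 0) {
      violations.push(`${event.kind} ${event.id} has no parents`);
    }
    for (const parent of event.parents) {
      if (!ids.has(parent)) {
        violations.push(`${event.kind} ${event.id} references unknown parent ${parent}`);
      }
    }
  }
  return violations;
}

// Healthy fixture: every link resolves.
const healthy: TraceEvent[] = [
  { id: "src-1", kind: "source", parents: [] },
  { id: "dec-1", kind: "decision", parents: ["src-1"] },
  { id: "out-1", kind: "output", parents: ["dec-1"] },
];

// Degraded fixture: the output cites a decision that was never recorded.
const degraded: TraceEvent[] = [
  { id: "src-1", kind: "source", parents: [] },
  { id: "out-1", kind: "output", parents: ["dec-9"] },
];

const healthyViolations = verifyTrace(healthy);
const degradedViolations = verifyTrace(degraded);
```

A CI job could fail whenever `verifyTrace` returns a non-empty list for a recorded workflow, which makes a dropped identifier an observable failure rather than a silently accepted gap.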

## Notes

- Generated issue 2/5 for `evalops/maestro` by `evalops_org_miner.py`.
- Before implementation, confirm the sampled latent-spec snippets still match `main`; this issue intentionally cites exact file paths/lines where the mining pass saw them.

