English · 简体中文
👉 Active work: open issues · project board · roadmap — every task is tracked as an issue and labeled by phase, so contributors always see the current plan, not yesterday's.
Production-grade AI agent system for mathematical modeling competitions (MCM / ICM / CUMCM / 华数杯).
One problem statement → a full submission-ready paper (PDF / DOCX / TeX / Markdown) in ≈ 30 minutes. Five agents (Analyzer → Searcher → Modeler → Coder → Writer) drive a live Jupyter kernel, stream tokens through a Rust gateway, and produce a 15–20 page paper with 5+ award-grade figures (sensitivity tornado, Monte-Carlo box-plot, 2-D heatmap, convergence curve, residual diagnostic) that follows COMAP Outstanding / CUMCM first-prize (一等奖) conventions.
- Why Mathodology
- Features
- Quick start
- How it works
- Paper export — 4 formats × 3 templates
- Chart catalog — 20 canonical types
- Award-mode prompts
- Multi-engine web search (MCP)
- Benchmarks
- Configuration
- Architecture
- Developer setup
- Testing
- Roadmap & milestones
- Known limitations
- Contributing
- License
Math modeling competitions reward a narrow, non-obvious skill stack: restate an ambiguous problem crisply, pick a model that actually fits, run the solve with the right diagnostics, and write it up in the style judges skim-read. The 72-hour contest format compresses this into one very stressful weekend.
Off-the-shelf LLM chat interfaces don't cut it — they hallucinate citations, forget they already picked a model two turns ago, and produce walls of text without figures. Mathodology is built specifically for this workflow:
- First-principles model derivation, not boilerplate method lookups
- Live Jupyter kernel execution with reproducible numerical results
- Figures are first-class — every chart is saved, captioned, and linked by id
- Award-mode prompts internalize COMAP official tips + CUMCM grading guidelines (评阅规范) + 2016 MCM B judges' commentary
- Multi-format export: the paper goes out as a submission-ready PDF with real \begin{figure} blocks, not screenshots
- 5-agent linear pipeline — Analyzer (scope + sub-questions + approaches) → Searcher (arXiv + web) → Modeler (ModelSpec) → Coder (Jupyter cells + figures) → Writer (structured paper draft)
- 3 competition paper templates: cumcm (Chinese ctexart · xelatex · Fandol fonts) · huashu (华数杯) · mcm (MCM/ICM English article)
- 4 export formats: PDF via Tectonic · DOCX via Pandoc · raw LaTeX · Markdown
- 20 canonical chart types injected into the Coder prompt — tornado, heatmap, Monte-Carlo box-plot, Pareto front, convergence, residual, QQ, ROC, confusion matrix, network graph, radar, contour, 3D surface, etc.
- MCP web search via open-webSearch — Bing · Baidu · DuckDuckGo · CSDN · Juejin · Brave · Exa · Startpage, no API keys needed
- Robust LLM routing — OpenAI-compatible / Anthropic / Ollama / vLLM multi-protocol gateway with automatic fallback, per-call cost accounting, transport-error retry with exponential backoff
- Streaming everything — tokens fan out via Redis Streams + WebSocket; frontend renders KaTeX math and shiki-highlighted code in real time
- Deterministic figure pipeline: Writer uses [[FIG:<id>]] placeholders; the pipeline substitutes them against the Coder's registered Figure list so figure references never break
- 242 tests: 45 gateway (Rust) + 197 worker (Python; catalog + MCP client + pipeline), plus vue-tsc + vite build
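The placeholder substitution described in the feature list can be illustrated with a minimal sketch. It is not the project's implementation: the function name `substitute_figures`, the registry shape (figure id mapped to a path and caption), and the Markdown image output are all assumptions made for illustration.

```python
import re

# Assumed placeholder grammar: [[FIG:<id>]] with a slug-like id.
FIG_PATTERN = re.compile(r"\[\[FIG:([A-Za-z0-9_\-]+)\]\]")


def substitute_figures(text, figures):
    """Replace each [[FIG:<id>]] with a Markdown image.

    `figures` is a hypothetical registry: id -> (path, caption).
    Unknown ids are left untouched and reported, so a broken reference
    is visible instead of silently disappearing.
    """
    missing = []

    def _replace(match):
        fig_id = match.group(1)
        if fig_id not in figures:
            missing.append(fig_id)
            return match.group(0)
        path, caption = figures[fig_id]
        return f"![{caption}]({path})"

    return FIG_PATTERN.sub(_replace, text), missing


draft = "Results are in [[FIG:tornado]] and [[FIG:unknown]]."
registry = {"tornado": ("figures/tornado.png", "Sensitivity tornado")}
rendered, missing = substitute_figures(draft, registry)
print(rendered)
print(missing)
```

Returning the list of unresolved ids lets a caller decide whether a missing figure should fail the export or only log a warning.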
Pick the path that matches your platform — each guide covers prerequisites, three install options (portable archive / native installer / source build), and troubleshooting.
| Platform | Guide | Native installer |
|---|---|---|
| macOS (Intel + Apple Silicon) | docs/install/macos.md | .pkg |
| Linux (Debian/Ubuntu/Fedora/Arch/Alpine; x86_64 + aarch64) | docs/install/linux.md | .deb |
| Windows 10/11 + Server 2022 | docs/install/windows.md | .msi |
| Production / Docker / systemd | docs/install/server.md | Multi-arch GHCR images |
Three-line version, if you already have a working dev environment:
git clone https://github.com/ymylive/mathodology.git && cd mathodology
cp .env.example .env # edit; add at least one *_API_KEY
./scripts/install.sh && just bootstrap && just migrate && just dev
./scripts/install.sh (and its Windows twin scripts\install.ps1) auto-detects your package manager (brew / apt / dnf / pacman / zypper / apk / winget / scoop) and installs Postgres, Redis, Python 3.11, uv, pandoc, and tectonic. Run ./scripts/preflight.sh any time to verify everything is in place.
A typical CUMCM-style problem run takes about 28 minutes end-to-end and costs around ¥2 on a mid-tier reasoning model.
Every release ships portable archives (.tar.gz / .zip) for 5 targets — Linux x86_64 + aarch64, macOS x86_64 + aarch64, Windows x86_64 — plus native installers (.deb, .pkg, .msi) and multi-arch Docker images on GHCR. See the latest release and docs/install/server.md for production deployment.
┌─────────┐ ProblemInput ┌──────────┐ AgentEvent stream ┌─────────┐
│ Web UI │ ──────────────▶ │ Gateway │ ◀──────────────────── │ Worker │
│ (Vue) │ │ (Rust) │ XADD mm:events:<run> │(Python) │
│ │ ◀── WS replay ─ │ │ │ │
└─────────┘ └──────────┘ └─────────┘
│ │
│ XADD mm:jobs │ spawns
▼ ▼
┌──────────┐ ┌──────────────┐
│ Redis │ │ Jupyter │
│ Streams │ │ kernel │
└──────────┘ └──────────────┘
stage.start(analyzer) → token*N → cost → agent.output(AnalyzerOutput) → stage.done
stage.start(searcher) → log(queries) → log(papers) → agent.output(SearchFindings) → stage.done
stage.start(modeler) → token*N → cost → agent.output(ModelSpec) → stage.done
stage.start(coder) → token*N → cost → log("executing cell N")*7
→ kernel.stdout*M
→ agent.output(CoderOutput) → stage.done
stage.start(writer) → token*N → cost → agent.output(PaperDraft) → stage.done
done(status=success, notebook_path, paper_path, meta_path)
| Agent | Job | Output contract |
|---|---|---|
| Analyzer | Restate the problem, enumerate sub-questions, list assumptions, propose 2-6 candidate approaches, identify data requirements | AnalyzerOutput |
| Searcher | Derive 4-5 focused queries, hit arXiv + (via MCP) Bing/Baidu/DuckDuckGo/CSDN/Juejin in parallel, dedupe by URL/DOI, LLM-synthesize key findings | SearchFindings (papers, key_findings, datasets_mentioned) |
| Modeler | Consult HMML library of canonical methods, produce ONE fully-specified modeling approach with first-principles-derived equations, variables with units, numbered algorithm outline, validation strategy with dimensional / boundary / baseline checks, sensitivity plan for ≥ 3 key parameters | ModelSpec |
| Coder | Iterate up to 7 turns in a persistent Jupyter kernel; each turn picks a chart type from the catalog, runs one focused cell, saves PNG+SVG figures, registers them in figures_saved | CoderOutput (cells, figures, notebook_path, summary) |
| Writer | Produce a 12-section paper (abstract with ≥ 2 numerical results, Problem Analysis, Assumptions with Justification+Impact, Symbol table, Model, Algorithm, Results, Sensitivity Analysis, Strengths & Weaknesses, Conclusion, References ≥ 15), embedding figures via [[FIG:<id>]] placeholders | PaperDraft |
The gateway exposes GET /runs/:run_id/export/:format?template=<t> which produces:
| Format | Pipeline | Typical size |
|---|---|---|
| pdf | Tera template → LaTeX → Tectonic (xelatex) | 900 KB – 1.5 MB |
| docx | paper.md → Pandoc → .docx with embedded PNGs | 600-800 KB |
| tex | Rendered Tera template | 30-40 KB |
| md | Pre-substituted paper.md | 25-35 KB |
Templates (crates/gateway/templates/):
| Template | Class | Cover | CJK |
|---|---|---|---|
| cumcm | ctexart | CUMCM abstract page + table of contents (国赛摘要页 + 目录) | Fandol Song/Hei bundled in Tectonic |
| huashu | ctexart | 华数杯 cover + page header (华数杯封面 + 页眉) | Fandol |
| mcm | article | Summary sheet with Team Control Number / Problem / Year | ctex fallback for accidental Chinese |
Frontend side, the ExportPanel component shows a template selector + 5 format buttons (PDF / DOCX / LaTeX / Markdown / Notebook) with error mapping:
- 404: 论文还未生成 (run not finished)
- 503: 服务器缺少 tectonic/pandoc (binary not on PATH)
- 500: displays ≤ 4 KB of the compile stderr
Error codes stay in the response body as a clean JSON { code, error }. Dev token auth via Authorization: Bearer <DEV_AUTH_TOKEN> or ?token=....
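A client for the export endpoint might look like the sketch below. Only the route shape, the format and template names, the `token` query parameter, and the three status codes come from the text above. The helper names, the English messages, and treating "≤ 4 KB" as 4096 characters are assumptions.

```python
from urllib.parse import quote, urlencode

FORMATS = ("pdf", "docx", "tex", "md")
TEMPLATES = ("cumcm", "huashu", "mcm")
MAX_STDERR_CHARS = 4096  # assumption: "<= 4 KB" approximated as 4096 characters


def export_url(base_url, run_id, fmt, template, token=None):
    """Build GET /runs/:run_id/export/:format?template=<t> (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if template not in TEMPLATES:
        raise ValueError(f"unknown template: {template}")
    query = {"template": template}
    if token is not None:
        query["token"] = token  # alternative to the Authorization header
    path = f"/runs/{quote(run_id, safe='')}/export/{fmt}"
    return f"{base_url.rstrip('/')}{path}?{urlencode(query)}"


def describe_failure(status, body):
    """Map an error response ({code, error} JSON already parsed) to a message."""
    detail = str(body.get("error", ""))
    if status == 404:
        return "run not finished: paper not generated yet"
    if status == 503:
        return "tectonic or pandoc is not on the server PATH"
    if status == 500:
        return "compile failed: " + detail[:MAX_STDERR_CHARS]
    return f"unexpected status {status}: {detail}"


print(export_url("http://127.0.0.1:8080", "abc", "pdf", "cumcm"))
print(describe_failure(503, {"code": "missing_binary", "error": ""}))
```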
apps/agent-worker/src/agent_worker/chart_catalog.py ships 20 vetted chart types the Coder picks from. Every entry has: id (slug), display name (bilingual Chinese/English), when-to-use, when-NOT-to-use, 2-3 typical pitfalls, and a runnable matplotlib template 17-32 lines long.
Grouped by purpose:
| Purpose | Types |
|---|---|
| Distribution | histogram_kde, boxplot_grouped, violinplot |
| Correlation / sensitivity | heatmap_correlation, heatmap_sensitivity, tornado_sensitivity |
| Optimization landscape | contour_2d, surface_3d, pareto_front |
| Time series / trend | line_plot, line_with_ci, convergence_curve |
| Regression diagnostics | scatter_regression, residual_plot, qq_plot |
| Classification diagnostics | roc_curve, confusion_matrix |
| Category comparison | bar_grouped_stacked, radar_chart |
| Relational | network_graph |
The Coder's prompt carries only a compact markdown index (~1 KB / 20 rows). Snippets stay in the module: the LLM never reads them; they exist for current and future maintainers.
Helpers styled_figure(), save_figure(fig, fig_id, caption, width), and annotate_peak(ax, x, y, label) are inlined into the Jupyter kernel bootstrap so they're globally available without imports.
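To make the split between full catalog entries and the compact prompt index concrete, here is a minimal sketch. The field names follow the entry description above, but the `ChartEntry` class, the `render_index` function, the index columns, and the sample entry text are hypothetical and not taken from chart_catalog.py.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChartEntry:
    """Hypothetical shape of one catalog entry."""
    id: str
    name: str
    when_to_use: str
    when_not_to_use: str
    pitfalls: tuple
    template: str  # matplotlib snippet; deliberately kept out of the prompt


def render_index(entries):
    """Render the compact markdown index: one row per chart, no snippets."""
    lines = ["| id | name | use when |", "|---|---|---|"]
    for entry in entries:
        lines.append(f"| {entry.id} | {entry.name} | {entry.when_to_use} |")
    return "\n".join(lines)


# Illustrative entries only; the wording is invented for this example.
catalog = [
    ChartEntry(
        id="tornado_sensitivity",
        name="Tornado chart",
        when_to_use="ranking parameters by output swing",
        when_not_to_use="strongly interacting parameters",
        pitfalls=("unsorted bars", "missing baseline line"),
        template="fig, ax = plt.subplots()\nax.barh(names, swings)",
    ),
    ChartEntry(
        id="qq_plot",
        name="QQ plot",
        when_to_use="checking residual normality",
        when_not_to_use="very small samples",
        pitfalls=("no reference line",),
        template="fig, ax = plt.subplots()\nax.scatter(theoretical, observed)",
    ),
]

index = render_index(catalog)
print(index)
```

Keeping the snippets out of the index is what keeps the prompt cost flat as templates grow longer.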
Writer / Modeler / Coder prompts are rewritten around 19 actionable rules distilled from 16 authoritative sources — COMAP official MCM/ICM Procedures and Tips PDF, 2016 MCM Problem B judges' commentary, Pitt MCM Guide, CUMCM 赛区评阅工作规范, Tsinghua 清风数学建模 materials. See memory/project_award_mode.md for the full distillation.
Highlights:
- Abstract ≤ 1 page, must include ≥ 2 concrete numerical results in the first paragraph (e.g., "prediction error 3.2%", "fuel saving 17%"). No vague "significant improvement".
- Four-element abstract order: context → method → results (with units) → conclusion. Forbidden opening: restating the problem.
- No school / student names / region anywhere. Signatures in memo problems: Sincerely, Team #<ID>.
- Assumptions as numbered triples {Assumption, Justification, Impact}: every assumption tied to literature / data / domain reasoning.
- References ≥ 15, each inline-cited ≥ 1x. English sources preferred for MCM; GB/T 7714 for CUMCM.
- Mandatory sections: Sensitivity Analysis · Strengths & Weaknesses (≥ 3 strengths, ≥ 2 weaknesses — honest limitations, not "time constraints").
- Pipeline self-check at end: abstract ≤ 1 page + ≥ 2 numbers · no names · every sub-question answered · ≥ 15 references · Sensitivity + S&W present · every [[FIG:xxx]] discussed in prose · no filler phrases.
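Some of the self-check items above are mechanical enough to sketch in code. The following is a crude illustration under stated assumptions, not the pipeline's actual check. The function name and argument shapes are hypothetical, numbers are counted with a simple regex, and "discussed in prose" is approximated as "referenced at least once".

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")
FIG = re.compile(r"\[\[FIG:([^\]]+)\]\]")
REQUIRED_SECTIONS = ("sensitivity analysis", "strengths & weaknesses")


def self_check(abstract, body, references, figure_ids):
    """Return a list of rule violations; an empty list means all checks pass."""
    problems = []
    if len(NUMBER.findall(abstract)) < 2:
        problems.append("abstract needs >= 2 numerical results")
    if len(references) < 15:
        problems.append("needs >= 15 references")
    lowered = body.lower()
    for section in REQUIRED_SECTIONS:
        if section not in lowered:
            problems.append(f"missing section: {section}")
    referenced = set(FIG.findall(body))
    for fig_id in figure_ids:
        if fig_id not in referenced:
            problems.append(f"figure never referenced: {fig_id}")
    return problems


good = self_check(
    abstract="Prediction error falls to 3.2% and fuel use drops by 17%.",
    body="## Sensitivity Analysis\nSee [[FIG:tornado]].\n## Strengths & Weaknesses\n",
    references=[f"ref {i}" for i in range(15)],
    figure_ids=["tornado"],
)
bad = self_check("We improved things.", "no sections here", [], ["tornado"])
print(good)
print(bad)
```

Checks such as "no filler phrases" or "every sub-question answered" need semantic judgment and are left out of this sketch.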
- Must propose ≥ 2 candidate models with explicit selection criterion (AIC / BIC / CV error / robustness / interpretability for the decision-maker). No "effect was better".
- Prefer first-principles derivation from the problem's physics / economics / business logic. Escalating complexity (LP → MILP → DL) requires justifying why the simpler method is insufficient (Occam's razor guardrail).
- validation_strategy MUST include all three of: dimensional analysis, boundary / limit-case test, baseline comparison.
- Sensitivity plan: ≥ 3 key parameters × ±10% / ±20% + Monte Carlo with N ≥ 1000 when parameters are highly uncertain. Tornado plot + heatmap figures requested explicitly.
- Chart captions must be independently readable: include variable names AND a key number ("残差直方图(均值 0.01,标准差 0.08,N=200)", i.e. "Residual histogram (mean 0.01, std 0.08, N=200)").
- Honor the Modeler's sensitivity plan: produce the requested tornado / heatmap / Monte-Carlo figures across iterations.
- np.random.seed(...) fixed before any stochastic call; constants named at the top of the first cell. Every printed numerical result must be reproducible from the notebook alone.
Raising MAX_ITERATIONS to 7 lets the Coder spread analyses across turns (baseline, tornado, heatmap, Monte-Carlo, scan, convergence, diagnostic) instead of cramming all 5 figures into one cell.
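The sensitivity plan (±10% / ±20% perturbations plus a seeded Monte Carlo with N ≥ 1000) can be sketched on a toy model. Everything here is illustrative: the `model` function and its parameters are invented, the 10% relative standard deviation is an assumption, and the standard-library `random` module stands in for the `np.random.seed(...)` convention so the example has no dependencies.

```python
import random
import statistics

# Constants named up front, as the Coder rules require.
BASE = {"demand": 100.0, "price": 2.0, "cost": 50.0}
DELTAS = (-0.2, -0.1, 0.1, 0.2)
SEED = 42


def model(params):
    """Toy stand-in for a solved model (invented for this sketch)."""
    return params["demand"] * params["price"] - params["cost"]


def one_at_a_time(base, deltas=DELTAS):
    """Perturb one parameter at a time; return (name, low, high) widest first."""
    baseline = model(base)
    rows = []
    for name in base:
        changes = []
        for delta in deltas:
            perturbed = dict(base)
            perturbed[name] = base[name] * (1 + delta)
            changes.append(model(perturbed) - baseline)
        rows.append((name, min(changes), max(changes)))
    return sorted(rows, key=lambda row: row[2] - row[1], reverse=True)


def monte_carlo(base, n=1000, rel_sd=0.1, seed=SEED):
    """Sample all parameters jointly with a fixed seed; return (mean, stdev)."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n):
        draw = {k: rng.gauss(v, abs(v) * rel_sd) for k, v in base.items()}
        outputs.append(model(draw))
    return statistics.mean(outputs), statistics.stdev(outputs)


tornado = one_at_a_time(BASE)
mc_mean, mc_sd = monte_carlo(BASE)
print(tornado)
print(mc_mean, mc_sd)
```

The sorted rows are exactly what a tornado chart plots, and the Monte Carlo samples would feed the box-plot.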
Searcher fans out across sources in parallel:
| Source | Transport | Typical hit rate | Auth |
|---|---|---|---|
| arXiv | HTTP Atom API | High for methods / English papers | none |
| Bing / DuckDuckGo / Brave / Exa / Startpage | MCP stdio → open-webSearch | Medium | none (scraping) |
| Baidu / CSDN / Juejin | MCP stdio → open-webSearch | High for Chinese competition context | none |
For CUMCM / 华数杯 (Chinese competitions), the Searcher auto-generates an extra Chinese methodology query to hit Baidu / CSDN / Juejin, which cover 国赛论文精选 (selected national-contest paper) blog posts and solution walkthroughs that arXiv misses entirely.
Graceful degradation paths:
- OPEN_WEBSEARCH_DISABLED=1 → disable MCP, fall back to arXiv-only
- MCP subprocess spawn fails (Node missing, binary not in PATH) → log warning, return empty web results, arXiv continues
- Per-engine rate limit (captcha / 429) → that engine returns empty for the run, others continue
- Per-query timeout (30s) → drop that query, keep others
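The "drop the failing query, keep the others" behaviour can be sketched with asyncio. This is an assumption-laden illustration rather than the Searcher's code: `fan_out`, `fake_search`, and the result dictionaries are hypothetical, and dedupe uses the DOI when present and the URL otherwise.

```python
import asyncio


async def fan_out(queries, search, timeout_s=30.0):
    """Run all queries in parallel; a slow or failing query yields no hits."""

    async def guarded(query):
        try:
            return await asyncio.wait_for(search(query), timeout=timeout_s)
        except (asyncio.TimeoutError, OSError):
            return []  # drop this query, keep the others

    batches = await asyncio.gather(*(guarded(q) for q in queries))

    # Dedupe by DOI when present, otherwise by URL, preserving order.
    seen, merged = set(), []
    for batch in batches:
        for hit in batch:
            key = hit.get("doi") or hit["url"]
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged


async def fake_search(query):
    """Hypothetical search backend used only for this demonstration."""
    if query == "slow":
        await asyncio.sleep(1.0)
    return [{"url": f"https://example.org/{query}"}]


results = asyncio.run(fan_out(["a", "slow", "a"], fake_search, timeout_s=0.05))
print(results)
```

Wrapping each query in its own guard means one timeout never cancels its siblings, which a bare `gather` with a shared timeout would not guarantee.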
Shared _stream_with_retry helper covers transport errors (httpx.RemoteProtocolError, ReadTimeout, ConnectError, PoolTimeout) AND silent empty 200 OK responses with exponential backoff 5s / 15s / 30s across 4 attempts. Empty responses were observed on the cornna/gpt-5.4 upstream; the retry logic keeps the Writer / Coder from failing on transient upstream hiccups.
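The retry policy (4 attempts, waits of 5 s / 15 s / 30 s, empty responses treated as failures) can be sketched as follows. This is not the `_stream_with_retry` helper itself: the function below is synchronous, uses built-in exception types in place of the httpx ones, and takes an injectable `sleep` so it can be exercised without waiting.

```python
import time

BACKOFF_S = (5, 15, 30)  # waits between the 4 attempts


class EmptyResponse(Exception):
    """Raised when the upstream answers 200 OK with no content."""


def call_with_retry(call, retryable=(ConnectionError, TimeoutError, EmptyResponse),
                    sleep=time.sleep):
    """Call `call()` up to len(BACKOFF_S) + 1 times; re-raise the last error."""
    for attempt in range(len(BACKOFF_S) + 1):
        try:
            result = call()
            if not result:
                raise EmptyResponse("upstream returned an empty 200 OK")
            return result
        except retryable:
            if attempt == len(BACKOFF_S):
                raise  # out of attempts: surface the failure to the caller
            sleep(BACKOFF_S[attempt])


# Demonstration with a flaky callable and a recording fake sleep.
outcomes = iter([ConnectionError("reset"), "", "ok"])
waits = []


def flaky():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_retry(flaky, sleep=waits.append)
print(result, waits)
```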
Representative run (bus-scheduling CUMCM problem, gpt-5.4 via cornna provider, reasoning=low):
| Stage | Duration | Notes |
|---|---|---|
| Analyzer | 132 s | 12 sub-questions, 5 proposed approaches |
| Searcher | 4 s | arXiv returned 0 (niche problem); MCP disabled in this run |
| Modeler | 312 s | 28 variables, 18 equations, 13-step algorithm, full sensitivity plan |
| Coder | 758 s | 7 cells, 5 figures (baseline / tornado / heatmap / MC box-plot / headway scan) |
| Writer | 501 s | 12 sections, 16 references, abstract with 4 numerical results |
| Total | 28 min 30 s | ¥2.22 |
Exported paper: 17 pages · 921 KB PDF · 5 figures embedded · 10 image streams. All section headings, equations (LaTeX), and references rendered in CUMCM template with Fandol CJK fonts via Tectonic.
Test suite baseline:
- cargo test -p gateway: 45 passed, 3 #[ignore] (tectonic/pandoc real-compile, opt-in via -- --ignored)
- uv run pytest apps/agent-worker: 197 passed (figure pipeline + chart catalog + MCP stdio client)
- pnpm --filter web typecheck && build: vue-tsc clean, bundle 171 KB gzip (shiki ~282 KB gzip lazy-loaded)
# Gateway
GATEWAY_HOST=127.0.0.1
GATEWAY_PORT=8080
DEV_AUTH_TOKEN=dev-local-insecure-token
# Infra
REDIS_URL=redis://127.0.0.1:6379/0
DATABASE_URL=postgres://mm:mm@127.0.0.1:5432/mm
RUNS_DIR=/absolute/path/to/runs # MUST be absolute
# LLM provider keys (fill in at least one)
DEEPSEEK_API_KEY=
ANTHROPIC_API_KEY=
OPENAI_API_KEY=
MOONSHOT_API_KEY=
CORNNA_API_KEY=sk-...
# Web search (MCP)
OPEN_WEBSEARCH_CMD=open-websearch # or absolute path
OPEN_WEBSEARCH_ENGINES=bing,duckduckgo,baidu,csdn,juejin
OPEN_WEBSEARCH_DISABLED=false # set to true to skip web search entirely
[[providers]]
name = "deepseek"
kind = "openai_compat"
base_url = "https://api.deepseek.com/v1"
api_key_env = "DEEPSEEK_API_KEY"
models = ["deepseek-chat", "deepseek-reasoner"]
price_input_per_1m = 1.0 # RMB per 1M tokens
price_output_per_1m = 2.0
[router]
default_model = "deepseek-chat"
fallback = ["deepseek-chat", "moonshot-v1-32k"]
Per-run model override via ProblemInput.model_override, e.g., force claude-sonnet-4-6 for one particularly hard problem while the rest default to deepseek-chat.
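The per-call cost implied by the `price_*_per_1m` fields, and one plausible reading of the override / default / fallback order, can be sketched as below. The prices come from the example config above. The function names, the selection order, and the notion of an "available" set are assumptions, not the gateway's documented behaviour.

```python
# RMB per 1M input / output tokens, taken from the example provider above.
PRICES = {"deepseek-chat": (1.0, 2.0)}


def call_cost(model, input_tokens, output_tokens, prices=PRICES):
    """Cost of one call in RMB: tokens are priced per million."""
    price_in, price_out = prices[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def pick_model(override, default, fallback, available):
    """Assumed order: per-run override, then default, then the fallback list."""
    candidates = [override] if override else []
    candidates.append(default)
    candidates.extend(m for m in fallback if m not in candidates)
    for model in candidates:
        if model in available:
            return model
    raise LookupError("no configured model is available")


cost = call_cost("deepseek-chat", input_tokens=200_000, output_tokens=50_000)
chosen = pick_model(
    override=None,
    default="deepseek-chat",
    fallback=["deepseek-chat", "moonshot-v1-32k"],
    available={"moonshot-v1-32k"},
)
print(cost, chosen)
```

With these prices, 200,000 input tokens and 50,000 output tokens cost (200000 × 1.0 + 50000 × 2.0) / 1,000,000 = ¥0.30.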
math_agent/
├── apps/
│ ├── agent-worker/ # Python worker
│ │ └── src/agent_worker/
│ │ ├── agents/ # analyzer / searcher / modeler / coder / writer
│ │ ├── prompts/ # v1.toml per agent
│ │ ├── tools/ # arxiv client + web_search_mcp stdio client
│ │ ├── kernel/ # Jupyter session manager (inline chart helpers)
│ │ ├── chart_catalog.py # 20 canonical chart types
│ │ ├── _chart_helpers.py # styled_figure / save_figure / annotate_peak
│ │ ├── pipeline.py # 5-agent orchestration + paper.meta.json
│ │ └── main.py # XREADGROUP consumer on mm:jobs
│ └── web/ # Vue 3 SPA
│ └── src/
│ ├── views/ # Showcase / Dashboard / Workbench
│ ├── components/ # ExportPanel, PaperDraft, AgentOutputCard…
│ ├── stores/ # Pinia: run, settings
│ └── api/ # figures + export + runs clients
├── crates/
│ └── gateway/
│ ├── src/
│ │ ├── routes/ # runs · figures · export · llm · stats · ws_run
│ │ ├── llm/ # OpenAI-compat + Anthropic streaming adapters
│ │ ├── auth.rs # dev-token middleware
│ │ ├── cost.rs # per-call cost accounting
│ │ └── app.rs # axum router wiring
│ ├── templates/ # cumcm / huashu / mcm .tex.tera
│ └── tests/ # integration tests (including #[ignore] real-compile)
├── packages/
│ ├── contracts/ # OpenAPI + event JSON schema (source of truth)
│ ├── py-contracts/ # pydantic v2 codegen target
│ └── ts-contracts/ # TypeScript codegen target
├── config/
│ └── providers.toml # LLM router
├── docs/
│ ├── demo.gif # README hero
│ └── promo/ # polished promo HTML + MP4 sources
├── scripts/ # bootstrap + smoke_e2e.sh
├── justfile # task runner
└── Procfile.dev # overmind dev orchestration
| Layer | Format | Generated from |
|---|---|---|
| Event envelope (WebSocket) | AgentEvent { run_id, agent, kind, seq, ts, payload } | packages/contracts/events.schema.json |
| LLM completion | OpenAI-compatible SSE | gateway translates from Anthropic / others |
| Agent I/O | Pydantic v2 (Python) + TS interface | packages/py-contracts/ + packages/ts-contracts/ |
| paper.meta.json | Structured paper + figure list with [[FIG:]] placeholders | Writer emits, gateway export consumes |
One Redis INCR mm:seq:<run_id> per event, shared between gateway and worker. WS replay filters by payload.seq. XADD uses * (time-based id); seq is the authoritative monotonic per-run counter.
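The sequencing scheme can be illustrated without Redis by an in-memory stand-in. The `EventLog` class is hypothetical: a dictionary counter plays the role of INCR mm:seq:<run_id> and a list plays the role of the stream. Placing `seq` at the top level of the event follows the envelope listed in the table above and is an assumption about where the replay filter reads it.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """In-memory stand-in for Redis: one counter and one stream per run."""
    counters: dict = field(default_factory=lambda: defaultdict(int))
    streams: dict = field(default_factory=lambda: defaultdict(list))

    def publish(self, run_id, agent, kind, payload):
        self.counters[run_id] += 1  # plays the role of INCR mm:seq:<run_id>
        event = {
            "run_id": run_id,
            "agent": agent,
            "kind": kind,
            "seq": self.counters[run_id],
            "payload": payload,
        }
        self.streams[run_id].append(event)  # plays the role of XADD ... *
        return event

    def replay(self, run_id, after_seq=0):
        """Return events a reconnecting client has not seen yet."""
        return [e for e in self.streams[run_id] if e["seq"] > after_seq]


log = EventLog()
log.publish("run-1", "analyzer", "stage.start", {})
log.publish("run-1", "analyzer", "token", {"text": "Restate"})
log.publish("run-2", "analyzer", "stage.start", {})
log.publish("run-1", "analyzer", "stage.done", {})

missed = log.replay("run-1", after_seq=1)
print([e["seq"] for e in missed])
```

Because the counter is per run, a client only needs to remember the last `seq` it rendered in order to resume without gaps or duplicates.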
For end-user install (any of the 5 supported platforms / 4 install paths), see Quick start and the per-platform guides under docs/install/. This section covers source-build setup for contributors.
git clone https://github.com/ymylive/mathodology.git && cd mathodology
./scripts/install.sh --with-source # adds node + pnpm on top of runtime tools
cp .env.example .env # at least one *_API_KEY
just bootstrap # cargo fetch + uv sync + pnpm install
just migrate # sqlx migrations
just dev # overmind: gateway + worker + vite dev :5173
scripts/install.sh auto-detects brew / apt / dnf / pacman / zypper / apk; on Windows use scripts\install.ps1 (winget / scoop). Both are idempotent and prompt before any sudo action; pass --yes / -Yes for CI. scripts/preflight.{sh,ps1} is the verifier: [OK] / [MISS] / [STALE] per tool with install hints inline.
Notes that bite if skipped:
- Tectonic downloads a ~200 MB TeXLive bundle into ~/Library/Caches/Tectonic (macOS) or ~/.cache/Tectonic (Linux) on first PDF render. CI must cache this directory.
- Pandoc + Tectonic are runtime dependencies, called as subprocesses from the gateway during paper export. The .deb declares them; the portable archive doesn't.
- just dev uses overmind (Linux/macOS) which depends on tmux. On Windows, run the three processes in separate terminals; see docs/install/windows.md §4.
For docker-compose Redis + Postgres instead of system services: just infra-up. Tectonic and Pandoc are still required on the host because the gateway spawns them as subprocesses there, not inside the infra containers.
just lint # cargo clippy + ruff + vue-tsc
just test # cargo test + pytest + vitest
just smoke # end-to-end ping through the live stack
# Subset runs:
cargo test -p gateway # 45 passed + 3 #[ignore]
cargo test -p gateway --test export_paper -- --ignored # real tectonic + pandoc compile
uv run pytest apps/agent-worker # 197 passed
uv run pytest apps/agent-worker -m mcp # real open-websearch subprocess (network)
pnpm --filter web typecheck && pnpm --filter web build
CI runs the equivalent on every PR; see .github/workflows/ci.yml.
| Phase | Status | Issues | Summary |
|---|---|---|---|
| Phase 1 — MVP | ✅ Shipped (M1–M8) | predates tracker | 4-agent linear pipeline, gateway, streaming UI, local Jupyter |
| Phase 2 — Knowledge base | ✅ Shipped (M9–M11) | #1 · #2 · #3 | HMML method library + BM25 retrieval · Searcher agent · hybrid search |
| Phase 3 — Award-grade output | ✅ Shipped (v0.3.0) | #4 · #5 | Award-mode prompts · 20-type chart catalog · MAX_ITER=7 · transport retry |
| Phase 3.5 — Export + MCP | ✅ Shipped (v0.3.0) | #6 · #7 · #8 | 4 formats × 3 templates · Tectonic + Pandoc · open-webSearch MCP |
| Phase 4 — Critic loop | 🟡 In branch | #9 | Multi-role Critic · score/checklist thresholds · 2-round self-refine · Coder rerun cap |
| Phase 5 — Productionization | 📋 Planned | #10 · #11 · #12 | E2B / Daytona cloud sandbox · multi-tenant JWT · usage metering |
| Phase 6 — Vision + multi-lang | 📋 Planned | #13 · #14 · #15 | GPT-4V for chart QA · R + MATLAB kernels · HITL review gates |
| Milestone | Commit | Summary |
|---|---|---|
| M1 | 97c1609 | Monorepo skeleton, hello-world ping |
| M2 | 926fc5f | Gateway + Postgres persistence + run lifecycle |
| M3 | f7aca8f | Full LLM gateway + streaming UI |
| M4 | 8151668 | BaseAgent + Analyzer calling real LLM gateway |
| M5 | 98d3e1e | Jupyter kernel + CoderAgent + figure artifact serving |
| M6 | c578b66 | Modeler + Writer → full 4-agent pipeline |
| M7 | 91833ba | shadcn-vue + KaTeX + shiki UI polish |
| M8 | 3d21ad0 | Anthropic adapter + provider fallback + CI |
| M9 | 7ec8e50 | HMML knowledge base + Modeler integration |
| M10 | 4ebf7b4 | Searcher agent + 5-agent pipeline |
| M11 | b7e8b08 | Hybrid BM25 + vector retrieval via fastembed |
| v0.2.0 | 10a06c6 | Editorial UI rebuild + reasoning effort + long context |
| v0.3.0 | b18df8d | Competition paper export + award-mode pipeline + MCP search |
- LLM API keys are loaded from env. Put at least one of DeepSeek / Anthropic / OpenAI / Moonshot / cornna keys in .env (see .env.example). Without a key, the pipeline fails at the first real LLM call.
- RUNS_DIR must be an absolute path when the worker and gateway are launched from different working directories, otherwise figure URLs (/runs/:id/figures/...) won't resolve. The default ./runs only works when both processes share cwd, which just dev (overmind) guarantees.
- Tectonic + Pandoc required for PDF/DOCX export. PDF/DOCX export returns HTTP 503 with a clear message if either binary is missing. TeX and Markdown export work without them.
- open-webSearch is a scraper, not an API. Rate limiting and captchas from Bing/Baidu are common. Per-engine failures are isolated; the other engines keep working. Set OPEN_WEBSEARCH_DISABLED=1 to skip it entirely.
- Rust 1.83 is pinned via rust-toolchain.toml. Several crates in Cargo.lock are held to 1.83-compatible versions; bumping the toolchain is an explicit future task.
- Contract codegen (just gen) requires network. datamodel-code-generator is not in uv.lock; it is installed ad-hoc. The contracts-drift CI job is non-blocking.
- Upstream LLM flakiness. Some OpenAI-compat proxies occasionally return empty 200 OK responses mid-run. The _stream_with_retry helper (base.py) retries on this with 5s / 15s / 30s backoff, but a persistently unavailable provider will still cause the run to fail after 4 attempts.
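Several of these limitations can be caught before a run starts. The sketch below is a hypothetical Python counterpart to scripts/preflight.sh, not a description of what that script does. The function name and messages are invented, and the key variable names are taken from the .env example.

```python
import os
import shutil

KEY_VARS = (
    "DEEPSEEK_API_KEY",
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "MOONSHOT_API_KEY",
    "CORNNA_API_KEY",
)


def preflight(env, which=shutil.which):
    """Return human-readable problems; `which` is injectable for testing."""
    problems = []
    if not any(env.get(name) for name in KEY_VARS):
        problems.append("no LLM API key set: the first LLM call will fail")
    runs_dir = env.get("RUNS_DIR", "./runs")
    if not os.path.isabs(runs_dir):
        problems.append(f"RUNS_DIR is relative ({runs_dir}): figure URLs may not resolve")
    for binary in ("tectonic", "pandoc"):
        if which(binary) is None:
            problems.append(f"{binary} not on PATH: PDF/DOCX export will return 503")
    return problems


# Check the current process environment and print any findings.
for problem in preflight(dict(os.environ)):
    print("[MISS]", problem)
```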
See CONTRIBUTING.md. Short version: feature branch → PR to main → CI green → squash merge. All new agents / prompts / chart types should have corresponding unit tests; real-external-service tests go behind #[ignore] (Rust) or @pytest.mark.slow|mcp (Python).
MIT (core). See LICENSE. Commercial-grade modules (specialized templates, enterprise HITL, multi-tenant billing) may be relicensed under a commercial license in Phase 5+.
