Skip to content

feat: Fix subagent responses#740

Merged
ncrispino merged 14 commits intodev/v0.1.34from
fix_subagent_responses
Jan 5, 2026
Merged

feat: Fix subagent responses#740
ncrispino merged 14 commits intodev/v0.1.34from
fix_subagent_responses

Conversation

@ncrispino
Copy link
Collaborator

@ncrispino ncrispino commented Jan 3, 2026

PR: Recover partial persona output on timeout

Linear Issue

  • Closes MAS-206: [FEATURE] Use partial persona output on timeout instead of generic fallback

Summary

When persona generation subagent times out, the system now checks for valid personas.json files produced before the timeout instead of immediately falling back to generic personas.

Problem

The persona generation feature uses a 5-minute timeout. When diversity_mode: implementation triggers a nested multi-agent orchestration, this often exceeds the timeout. Previously, the code only checked for output files when result.success was True, causing the system to fall back to generic personas (analytical/creative/systematic) even when valid task-specific personas had been generated.

Changes

massgen/persona_generator.py

  1. Check output files regardless of success status (lines 615-659)

    • Now checks for personas.json even when subagent fails/times out
    • Logs when recovering personas from partial output vs successful completion
  2. Enhanced _find_personas_json search patterns (lines 452-469)

    • Searches multiple locations where personas might exist:
      • full_logs/final/agent_*/workspace/ - completed workspaces
      • full_logs/agent_*/*/*/ - timestamped run directories
      • workspace/snapshots/agent_*/ - snapshot locations
      • workspace/agent_*/ - direct workspace directories
      • workspace/temp/agent_*/agent*/ - temp directories from nested agents
    • Sorts by modification time (most recent first)
    • Returns first complete set of personas found

Testing

  • Verified fix against actual log directory log_20260102_120203_588712 that had the timeout issue
  • Successfully recovered task-specific personas that were previously discarded
  • All existing tests pass (pytest massgen/tests/ -k persona)

Before/After

Before (generic fallback):

Generated persona for agent_a: analytical
Generated persona for agent_b: creative
Generated persona for agent_c: systematic

After (recovered task-specific):

Recovered 3 personas from partial output (subagent status: Subagent cancelled)
agent_a: "Technical Performance Utility"
agent_b: "Narrative-Driven Experience"
agent_c: "Collaborative Ecosystem Hub"

PR: Fix subagent cancellation recovery and status consolidation

Linear Issues

  • Closes MAS-207: [BUG] Cancelled/timed-out subagents return null answers even when work was completed

Summary

This PR fixes subagent timeout handling to properly recover completed work and consolidates the status.json files to a single source of truth.

Problems Fixed

  1. Status overwriting bug: After successfully recovering an answer and costs from the subagent's internal full_logs/status.json, the code was overwriting this with hardcoded "timeout" or "failed" status
  2. Empty token_usage: Recovered cost data was not being returned
  3. Dual status.json confusion: Two separate status files caused sync issues and data inconsistency
  4. Hardcoded timeout bounds: Users couldn't configure shorter timeouts

Changes

Status File Consolidation

Removed the outer status.json file (subagents/{id}/status.json). Now there is a single source of truth: full_logs/status.json written by the subagent's Orchestrator.

Before:

subagents/{id}/
├── status.json           # Written by SubagentManager (often stale/wrong)
└── full_logs/
    └── status.json       # Written by Orchestrator (authoritative)

After:

subagents/{id}/
├── conversation.jsonl
└── full_logs/
    └── status.json       # Single source of truth

massgen/subagent/manager.py

  1. Removed _write_status() method - No longer writing outer status.json
  2. Updated get_subagent_status() - Now reads from full_logs/status.json
  3. Added _transform_orchestrator_status() - Transforms rich Orchestrator status into simplified view for MCP
  4. Added configurable timeout bounds - min_timeout and max_timeout parameters
  5. Added _clamp_timeout() method - Consistent timeout clamping
  6. Fixed answer extraction - Check full_logs/{agentId}/{timestamp}/answer.txt for persisted answers

massgen/subagent/models.py

  • Changed SUBAGENT_MIN_TIMEOUT from 300 to 60 (now a default, not hardcoded)
  • Removed timeout clamping from SubagentConfig (moved to manager level)

massgen/agent_config.py

  • Added subagent_min_timeout and subagent_max_timeout to CoordinationConfig

massgen/orchestrator.py

  • Pass subagent_min_timeout and subagent_max_timeout to subagent MCP server

massgen/mcp_tools/subagent/_subagent_mcp_server.py

  • Accept --min-timeout and --max-timeout CLI arguments
  • Commented out timeout_seconds parameter from spawn_subagents tool

Documentation

  • Updated docs/source/user_guide/advanced/subagents.rst with new directory structure and status values

Specs

  • openspec/changes/fix-subagent-cancellation-recovery/specs/subagent-orchestration/spec.md - Recovery requirements
  • openspec/changes/fix-subagent-cancellation-recovery/specs/subagent-status-consolidation/spec.md - Status consolidation

Testing

  • All 22 subagent recovery tests pass
  • Verified consolidation works with real log directories

Status Values

Status success Description
completed true Normal completion
completed_but_timeout true Timed out but full answer recovered
partial false Partial work recovered (no winner selected)
timeout false No recoverable work
error false Exception occurred

Before/After

Before (bug):

{
  "status": "failed",
  "token_usage": {},
  "answer": null
}

After (fixed):

{
  "status": "completed_but_timeout",
  "token_usage": {"input_tokens": 50000, "output_tokens": 3000, "estimated_cost": 0.05},
  "answer": "Research completed. Created movies.md with...",
  "completion_percentage": 100
}

PR: Release Automation and CodeRabbit Integration

Closes MAS-195

Summary

Adds automated release workflows, CodeRabbit AI code review configuration, and a release-prep skill to streamline the release process.

Changes

CodeRabbit Configuration

  • .coderabbit.yaml - AI code review configuration with:
    • Path-specific review instructions for backends, configs, docs
    • Pre-merge checks for documentation updates on release branches (dev/v0.1.X)
    • Documentation consistency and quality checks

Release Automation

  • .github/workflows/pypi-publish.yml - New workflow triggered on GitHub Release publish:

    • Builds and publishes package to PyPI
    • Triggers ReadTheDocs documentation build
    • Posts summary with PyPI URL, docs URL, install command
  • .github/workflows/release-docs-automation.yml - Modified to validate announcement files exist on tag push

Announcement System

  • docs/announcements/README.md - Documents the announcement workflow
  • docs/announcements/feature-highlights.md - Long-lived feature list (~3k chars for LinkedIn)
  • docs/announcements/current-release.md - Template for active release announcement
  • docs/announcements/archive/ - Directory for past announcements

Release Prep Skill

  • massgen/skills/release-prep/SKILL.md - Claude Code skill that:
    • Archives previous announcement
    • Generates CHANGELOG entry draft
    • Creates release announcement
    • Validates documentation updates
    • Checks LinkedIn character count (~3000 limit)
    • Suggests screenshot/media based on release changes

Documentation Updates

  • CLAUDE.md - Added release workflow section, now tracked in git
  • CONTRIBUTING.md - Added sections for:
    • CodeRabbit AI reviews and commands
    • Claude Code integration (skills, key files)
    • Release process (automated workflows, maintainer checklist, announcement files)
  • .gitignore - Removed CLAUDE.md from ignore list

Files Changed

File Change
.coderabbit.yaml New
.github/workflows/pypi-publish.yml New
.github/workflows/release-docs-automation.yml Modified
.gitignore Modified
CLAUDE.md New (was ignored)
CONTRIBUTING.md Modified
docs/announcements/README.md New
docs/announcements/feature-highlights.md New
docs/announcements/current-release.md New
docs/announcements/archive/.gitkeep New
massgen/skills/release-prep/SKILL.md New

Secrets Required

Secret Purpose
PYPI_API_TOKEN PyPI publishing (repo secret)
READTHEDOCS_API_TOKEN Trigger docs build (repo secret)

Usage

Release Workflow

  1. Merge release PR to main
  2. Run release-prep v0.1.X - generates CHANGELOG, announcement, screenshot suggestions
  3. Review and commit generated files
  4. Create tag: git tag v0.1.X && git push origin v0.1.X
  5. Publish GitHub Release → triggers PyPI + ReadTheDocs automatically
  6. Post to LinkedIn/X with suggested screenshot

CodeRabbit Commands

In PR comments:

  • @coderabbitai review - Trigger incremental review
  • @coderabbitai resolve - Mark all comments as resolved
  • `

Summary by CodeRabbit

  • New Features

    • Unified Task Context for tools/agents (CONTEXT.md) and contextual prompts; richer subagent statuses and completion metadata; release-prep skill and announcement workflow.
  • Bug Fixes

    • Improved subagent cancellation/timeout recovery to surface recovered or partial work; safer compression/retry behavior to avoid leaking prior identifiers.
  • Documentation

    • New announcement/feature highlight pages, contributor guides, OpenSpec docs, and CLAUDE guidance.
  • Tests

    • Extensive tests covering subagent cancellation and recovery flows.
  • Chores

    • Automated publish and docs validation workflows; precommit wrapper scripts and AI-review config.

✏️ Tip: You can customize this high-level summary in your review settings.

` - Regenerate PR summary

PR: Unified Task Context for Tools and Subagents

Linear Issue

  • Closes MAS-208: [FEATURE] Custom tools lack project context, causing hallucinations

Summary

Custom tools (read_media, generate_media) and subagents making external API calls now receive task context, preventing hallucinations like GPT-4.1 interpreting "MassGen" as "Massachusetts General Hospital."

Problem

External API calls from multimodal tools and subagents had no project context. When analyzing images or generating media for a "MassGen website," external models would misinterpret terminology because they lacked context about what the user was building.

Solution

Main agents dynamically create a CONTEXT.md file before spawning subagents or using multimodal tools. This file is automatically:

  1. Read by multimodal tools and injected into external API calls
  2. Copied to subagent workspaces for their tools to use
  3. Included in subagent system prompts

Changes

New Files

File Purpose
massgen/context/__init__.py Module init, exports all context utilities
massgen/context/task_context.py load_task_context(), load_task_context_with_warning(), format_prompt_with_context(), TaskContextResult class
openspec/changes/add-unified-project-context/ OpenSpec proposal, tasks, design, spec

Modified Files

File Change
massgen/system_prompt_sections.py Added TaskContextSection with instructions for creating CONTEXT.md
massgen/system_message_builder.py Include TaskContextSection when multimodal tools or subagents enabled
massgen/backend/base_with_custom_tool_and_mcp.py Added task_context field to ExecutionContext
massgen/tool/_multimodal_tools/read_media.py Dynamic CONTEXT.md loading, fails if missing, truncation warning in response
massgen/tool/_multimodal_tools/understand_image.py Inject context into OpenAI API call
massgen/tool/_multimodal_tools/understand_audio.py Inject context into API calls
massgen/tool/_multimodal_tools/understand_video.py Inject context into all backend API calls
massgen/tool/_multimodal_tools/understand_file.py Inject context into API call
massgen/tool/_multimodal_tools/generation/generate_media.py Dynamic loading, context injection, fails if missing, truncation warning in response
massgen/subagent/manager.py Auto-copy CONTEXT.md to subagent workspaces, include in system prompt, uses shared load_task_context(), propagate truncation warning to SubagentResult
massgen/subagent/models.py Added warning field to SubagentResult, updated all factory methods to accept warning parameter
massgen/tests/test_generate_media_batch.py Updated fixtures to include CONTEXT.md for tests

Subagent System Prompt Enhancement

Added SUBAGENT RELIABILITY section to main agent's subagent delegation instructions:

  • Emphasizes subagents are best for research and exploration
  • Clarifies outputs are raw materials the main agent should refine
  • Sets expectation to fix broken/incomplete work rather than reporting failure

Key Design Decisions

  1. Agent-created, not user-created - Main agent creates CONTEXT.md dynamically as part of workflow
  2. Dynamic loading - Tools reload CONTEXT.md at execution time (fixes timing bug where agent creates file after backend starts)
  3. Require CONTEXT.md - Tools fail with clear error if missing (forces agent to create it)
  4. Auto-copy for subagents - CONTEXT.md automatically copied to subagent workspace
  5. Read-only for subagents - Subagents can read but not create/modify CONTEXT.md
  6. Truncation warning - If CONTEXT.md exceeds 10,000 chars, tools include a warning in the response so agents know to shorten it
  7. Warning visibility for subagents - Truncation warning propagated to SubagentResult so parent agents can see it (not just logged)

Context Injection Format

# For understanding and generation tools
if task_context:
    augmented_prompt = f"[Task Context]\n{task_context}\n\n[Request]\n{prompt}"

Example CONTEXT.md (agent-generated)

# Task Context

Building a Bob Dylan tribute website. The site includes a biography,
discography, and timeline of his career.

## Visual Style
- Artistic, vintage, and cinematic.
- Capturing different eras (1960s folk, 1970s rock, modern era).
- High-quality, poetic aesthetic.

Testing

Verified with real MassGen runs:

  • Agent creates CONTEXT.md early in workflow
  • [TaskContext] Loaded 441 chars from .../CONTEXT.md - dynamic loading works
  • generate_media succeeds with "success": true
  • Subagents receive context and produce correctly-grounded outputs

Before/After

Before (hallucination):

generate_media for "MassGen logo" → Image of Massachusetts General Hospital logo

After (grounded):

generate_media for "MassGen logo" → Image of multi-agent AI system logo with terminal aesthetic

@coderabbitai
Copy link

coderabbitai bot commented Jan 3, 2026

📝 Walkthrough

Walkthrough

Adds unified task-context (CONTEXT.md) propagation to ExecutionContext and multimodal tools, implements subagent cancellation recovery with configurable min/max timeout clamping and richer SubagentResult semantics, adds CI/workflows and scripts, and introduces extensive docs, specs, and tests supporting these features.

Changes

Cohort / File(s) Summary
CI / Automation
/.coderabbit.yaml, .github/workflows/pypi-publish.yml, .github/workflows/release-docs-automation.yml
New CodeRabbit config; PyPI publish workflow; release-docs workflow refactored to validation-only with aggregated warnings/errors and reduced write permissions.
Repo config & scripts
/.gitignore, scripts/precommit_sync_readme_wrapper.sh, scripts/precommit_validate_configs_wrapper.sh
.gitignore updated (track CLAUDE.md, ignore node_modules/); added CI-aware wrapper scripts for precommit tools.
Docs / Announcements / OpenSpec
CLAUDE.md, CONTRIBUTING.md, docs/announcements/*, openspec/**, massgen/skills/release-prep/SKILL.md, openspec/project.md
Large documentation additions: release-announcements workflow, contributor guidance, OpenSpec design/proposals/specs (task-context, subagent recovery), release-prep and PR-check skills.
Task Context infra
massgen/backend/base_with_custom_tool_and_mcp.py, massgen/system_prompt_sections.py, massgen/system_message_builder.py, openspec/changes/...
ExecutionContext gains task_context; new TaskContextSection added (duplicate defs present); stream_with_tools loads/passes task_context; docs/specs added.
Multimodal tools — context propagation
massgen/tool/_multimodal_tools/generation/generate_media.py, .../read_media.py, .../understand_*.py
Added task_context param across generation/understand tools, load/validate CONTEXT.md, augment prompts with context, thread warnings into responses; read_media enforces CONTEXT.md presence.
Subagent models & config
massgen/agent_config.py, massgen/subagent/models.py
New CoordinationConfig fields subagent_min_timeout/subagent_max_timeout; SubagentResult extended with completion_percentage, warning, and status literals completed_but_timeout/partial; factory for timeout-with-recovery.
Subagent manager & orchestrator
massgen/subagent/manager.py, massgen/orchestrator.py, massgen/mcp_tools/subagent/_subagent_mcp_server.py
Timeout clamping to [min,max]; CONTEXT.md auto-copy and context_warning threading; recovery logic reads orchestrator full_logs/status.json and workspace to reconstruct recovered/partial results; MCP server adds CLI flags --min-timeout/--max-timeout, removes timeout_seconds from spawn_subagents, and expands result schema.
Persona generation
massgen/persona_generator.py
Robust persona discovery with multiple search paths, mtime-based selection, and fallbacks to subagent outputs.
Backend response / compression retry
massgen/backend/response.py, massgen/backend/base_with_custom_tool_and_mcp.py
Compression-retry refactor: compute compressed input_count, avoid passing previous_response_id on compression retries, and thread task_context into ExecutionContext.
Tests
massgen/tests/test_subagent_cancellation_recovery.py
Comprehensive tests for cancellation-recovery: new statuses, workspace parsing, answer extraction, token usage, and recovered-timeout behaviors.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Agent as Main Agent
    participant SysMsg as System Message Builder
    participant Exec as ExecutionContext
    participant Context as CONTEXT.md Loader
    participant Tool as Multimodal Tool
    participant API as External API

    Agent->>SysMsg: build_coordination_message()
    SysMsg->>SysMsg: add TaskContextSection if tools/subagents enabled
    Agent->>Exec: stream_with_tools()
    Exec->>Context: load_task_context_with_warning()
    Context-->>Exec: task_context + optional warning
    Exec->>Tool: call tool with task_context
    Tool->>Tool: format_prompt_with_context(task_context)
    Tool->>API: invoke external API with augmented prompt
    API-->>Tool: response
    Tool-->>Agent: result (+ context warning if present)
Loading
sequenceDiagram
    autonumber
    participant Parent as Parent Agent
    participant MCP as Subagent MCP Server
    participant SubMgr as SubagentManager
    participant Orch as Orchestrator
    participant WS as Workspace / full_logs/status.json

    Parent->>MCP: spawn_subagents() (no timeout_seconds)
    MCP->>SubMgr: start with configured min/max timeouts
    SubMgr->>SubMgr: _clamp_timeout(requested)
    SubMgr->>Orch: run subagent with clamped timeout
    Orch->>WS: write full_logs/status.json (phase, costs, outputs)
    Note over Orch,SubMgr: timeout occurs
    SubMgr->>WS: read full_logs/status.json + workspace artifacts
    SubMgr->>SubMgr: extract recovered_answer, completion_percentage, token_usage
    SubMgr-->>MCP: SubagentResult(status, workspace_path, completion_percentage, token_usage, warning?)
    MCP-->>Parent: deliver recovered result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related issues

Possibly related PRs

Suggested reviewers

  • a5507203

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 2 inconclusive)
Check name Status Explanation Resolution
Documentation Updated ⚠️ Warning PR lacks documentation for Task Context feature and breaking timeout_seconds change; CLAUDE.md contains errors referencing non-existent script and incorrect skill directory name. Add user guide for CONTEXT.md with multimodal/subagent usage; create migration guide for timeout changes; fix CLAUDE.md script reference and skill directory name.
Title check ❓ Inconclusive The PR title is vague and generic. 'Fix subagent responses' is overly broad and does not clearly convey the main changes: persona recovery on timeout, subagent status consolidation, task context propagation for tools/subagents, and release automation. Consider a more descriptive title that captures the primary changes, such as 'fix: recover partial subagent output and consolidate status handling' or 'feat: add task context propagation and release automation' to better reflect the scope.
Config Parameter Sync ❓ Inconclusive Referenced functions get_base_excluded_config_params() do not exist in specified files; cannot validate new config parameters without implementation details. Clarify validation mechanism: provide actual function names/locations, specify which parameters require synchronization and should be excluded from APIs, or update check instructions.
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed The PR description is comprehensive and follows the template structure with clear sections covering all three major features (persona recovery, subagent status consolidation, task context propagation) plus release automation. It includes problem statements, changes, testing evidence, and before/after examples.
Docstring Coverage ✅ Passed Docstring coverage is 91.86% which is sufficient. The required threshold is 80.00%.
Capabilities Registry Check ✅ Passed No new backend or model changes requiring capabilities registry updates were detected in this PR.
✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

@ncrispino ncrispino marked this pull request as ready for review January 5, 2026 00:30
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
massgen/tool/_multimodal_tools/understand_file.py (1)

203-283: Document the new task_context parameter in the docstring.

The task_context parameter was added to the function signature but is not documented in the docstring's Args section. This is inconsistent with understand_video.py (lines 480-482), which properly documents this parameter.

🔎 Suggested docstring addition

Add the following to the Args section of the docstring (after agent_cwd):

         agent_cwd: Agent's current working directory (automatically injected, optional)
+        task_context: Context string used to augment the prompt with task-specific information (Optional[str])
+                     - When provided, prepends contextual information to help the model better understand
+                       task-specific terminology and requirements.
+                     - If None (default), no context-based augmentation is applied.
massgen/tool/_multimodal_tools/generation/generate_media.py (1)

148-167: Image-to-image path drops CONTEXT.md truncation warning from generate_media

You now surface context_warning in the main generate_media responses (batch/single), but the image‑to‑image helper _generate_single_with_input_images doesn’t receive or emit that warning. As a result, callers using input_images never see truncation or context issues even though the prompt is augmented.

To keep behavior consistent with other paths and the spec (“warning” field should be visible on tool responses), propagate context_warning into this helper and include it in the JSON payload when present.

Concrete patch sketch
 async def _generate_single_with_input_images(
@@
-    timestamp: str,
-    ext: str,
-    task_context: Optional[str] = None,
+    timestamp: str,
+    ext: str,
+    task_context: Optional[str] = None,
+    context_warning: Optional[str] = None,
@@
-        if result.success:
-            metadata = dict(result.metadata or {})
-            if input_image_paths:
-                metadata["input_image_paths"] = input_image_paths
-
-            return ExecutionResult(
-                output_blocks=[
-                    TextContent(
-                        data=json.dumps(
-                            {
-                                "success": True,
-                                "operation": "generate_media",
-                                "mode": mode,
-                                "file_path": str(result.output_path),
-                                "filename": result.output_path.name if result.output_path else None,
-                                "backend": result.backend_name,
-                                "model": result.model_used,
-                                "file_size": result.file_size_bytes,
-                                "duration_seconds": result.duration_seconds,
-                                "metadata": metadata,
-                            },
-                            indent=2,
-                        ),
-                    ),
-                ],
-            )
+        if result.success:
+            metadata = dict(result.metadata or {})
+            if input_image_paths:
+                metadata["input_image_paths"] = input_image_paths
+
+            response_data = {
+                "success": True,
+                "operation": "generate_media",
+                "mode": mode,
+                "file_path": str(result.output_path),
+                "filename": result.output_path.name if result.output_path else None,
+                "backend": result.backend_name,
+                "model": result.model_used,
+                "file_size": result.file_size_bytes,
+                "duration_seconds": result.duration_seconds,
+                "metadata": metadata,
+            }
+            if context_warning:
+                response_data["warning"] = context_warning
+
+            return ExecutionResult(
+                output_blocks=[
+                    TextContent(
+                        data=json.dumps(response_data, indent=2),
+                    ),
+                ],
+            )

And in generate_media, pass the warning through:

-        task_context, context_warning = load_task_context_with_warning(agent_cwd, task_context)
@@
         if not is_batch and input_images and media_type == MediaType.IMAGE:
             return await _generate_single_with_input_images(
@@
-                timestamp=timestamp,
-                ext=ext,
-                task_context=task_context,
+                timestamp=timestamp,
+                ext=ext,
+                task_context=task_context,
+                context_warning=context_warning,
             )

Also applies to: 201-220, 235-255

🧹 Nitpick comments (22)
openspec/AGENTS.md (3)

43-43: Fix markdown heading syntax—use ### instead of bold emphasis.

Lines 43, 293, 297, and 301 use bold text (**Workflow**, **"Change must..."**, etc.) where proper markdown headings (###) should be used. This improves document structure and rendering consistency.

🔎 Proposed fixes
- **Workflow**
+ ### Workflow

- **"Change must have at least one delta"**
+ ### "Change must have at least one delta"

- **"Requirement must have at least one scenario"**
+ ### "Requirement must have at least one scenario"

- **"Silent scenario parsing failures"**
+ ### "Silent scenario parsing failures"

Also applies to: 293-293, 297-297, 301-301


125-125: Add language identifiers to fenced code blocks.

Lines 125, 147, and 351 contain fenced code blocks without language specifiers. Adding them improves syntax highlighting and document clarity.

🔎 Proposed fixes
- ```
+ ```yaml
  openspec/
  ├── project.md              # Project conventions

- ```
+ ```
  New request?
  ├─ Bug fix restoring spec behavior? → Fix directly

- ```
+ ```
  openspec/changes/add-2fa-notify/
  ├── proposal.md
  ├── tasks.md

Note: Line 125 should be yaml (directory tree structure); lines 147 and 351 are decision/directory trees—consider text or leave as bash depending on intent.

Also applies to: 147-147, 351-351


29-30: Minor: Reduce repeated phrasing in examples.

Lines 29–30 repeat "I want to create" twice in succession. Slightly varying the trigger examples would improve writing variety.

🔎 Suggested revision
  - "Help me create a change proposal"
  - "Help me plan a change"
  - "Help me create a proposal"
- - "I want to create a spec proposal"
- - "I want to create a spec"
+ - "I want to create a spec proposal"
+ - "Let me spec out a new capability"
massgen/orchestrator.py (1)

869-923: Subagent min/max timeout wiring looks consistent with existing config pattern

The new min_timeout / max_timeout defaults, overrides from coordination_config, and propagation via --min-timeout / --max-timeout into the subagent MCP args are straightforward and align with how max_concurrent / default_timeout are handled. I don't see correctness or safety issues in this wiring.

If subagent_min_timeout / subagent_max_timeout are allowed to be None or non-ints at the config level, consider normalizing/clamping them there so this function can continue to assume plain ints.

openspec/project.md (1)

14-24: Consider trimming brittle details (exact models and file sizes) to keep this doc durable

The overall structure and intent of this project context doc are strong, but the very specific bits (exact model lists under “LLM SDKs/Providers” and approximate byte sizes for key modules) are likely to drift quickly as the codebase and dependencies evolve. You might want to either:

  • Soften these to examples (“e.g., claude-3-5-sonnet, gpt-4o, gemini-2.x, …”) without implying completeness, and/or
  • Drop the (~XXXK bytes) annotations and instead describe modules by role only.

This keeps the document useful for both humans and tools without requiring frequent touch-ups.

Also applies to: 101-120

massgen/tool/_multimodal_tools/understand_file.py (1)

501-504: Move import to module level for consistency and performance.

The format_prompt_with_context import is performed inside the function, which has minor performance implications (repeated imports on each call). In contrast, understand_video.py imports this at the module level (line 25), which is the preferred pattern for consistency across multimodal tools.

🔎 Proposed refactor

Move the import to the top of the file with other imports:

 from dotenv import load_dotenv
 from openai import OpenAI
 
+from massgen.context.task_context import format_prompt_with_context
 from massgen.tool._result import ExecutionResult, TextContent

Then remove the import from inside the function:

-        # Inject task context into prompt if available
-        from massgen.context.task_context import format_prompt_with_context
-
         augmented_prompt = format_prompt_with_context(prompt, task_context)
massgen/skills/release-prep/SKILL.md (1)

18-20: Consider adding language specifiers to fenced code blocks.

Static analysis flagged three code blocks without language specifiers. While these are example outputs rather than executable code, adding specifiers improves rendering and accessibility.

🔎 Suggested improvements

Line 18 (usage example):

-```
+```bash
 /release-prep v0.1.34

Line 164 (example output):
```diff
-```
+```text
 ### 📸 Suggested Screenshot for v0.1.34
 ...

Line 184 (checklist output):
```diff
-```
+```text
 ## Release Prep Summary for v0.1.34
 ...
</details>


Also applies to: 164-178, 184-207

</blockquote></details>
<details>
<summary>openspec/changes/add-unified-project-context/design.md (1)</summary><blockquote>

`72-76`: **Consider adding language specifier to error message code block.**

The error message example would render better with a language specifier. Since this is an error message (text output), `text` would be appropriate.



<details>
<summary>🔎 Suggested improvement</summary>

```diff
 **Error message**:
-```
+```text
 CONTEXT.md not found in workspace. Before using multimodal tools or spawning subagents,
 create a CONTEXT.md file with task context. See system prompt for instructions.
</details>

</blockquote></details>
<details>
<summary>massgen/tool/_multimodal_tools/understand_audio.py (1)</summary><blockquote>

`310-327`: **Note: OpenAI Whisper doesn't use the prompt parameter.**

The augmented prompt is correctly used for Gemini (line 320), but OpenAI's Whisper transcription API (line 326) doesn't actually use the `prompt` parameter—it's only included for API consistency, as noted in the docstring at line 133. The actual API call at lines 150-154 ignores it.

This is harmless but creates a false impression that context affects Whisper transcriptions.



<details>
<summary>🔎 Optional: Use original prompt for OpenAI to clarify intent</summary>

```diff
                 if selected_backend == "gemini":
                     transcription = await _process_with_gemini(
                         audio_path=audio_path,
                         prompt=augmented_prompt,
                         model=selected_model,
                     )
                 else:  # openai
                     transcription = await _process_with_openai(
                         audio_path=audio_path,
-                        prompt=augmented_prompt,
+                        prompt=None,  # Whisper doesn't use prompts
                         model=selected_model,
                     )
docs/announcements/README.md (1)

19-21: Add language identifier to code block.

The fenced code block should specify a language for proper syntax highlighting.

🔎 Proposed fix
-```
+```bash
 /release-prep v0.1.34

</details>

</blockquote></details>
<details>
<summary>docs/announcements/current-release.md (1)</summary><blockquote>

`24-26`: **Format URLs as markdown links.**

The bare URLs should be formatted as proper markdown links for better rendering and accessibility.



<details>
<summary>🔎 Proposed fix</summary>

```diff
 ## Links
 
-- **Release notes:** https://github.com/massgen/MassGen/releases/tag/v0.1.33
-- **X post:** https://x.com/massgen_ai/status/2007169573196615950
-- **LinkedIn post:** https://www.linkedin.com/posts/massgen-ai_were-excited-to-release-massgen-v0133-activity-7412935810997678080-HH1H
+- **Release notes:** [v0.1.33](https://github.com/massgen/MassGen/releases/tag/v0.1.33)
+- **X post:** [massgen_ai](https://x.com/massgen_ai/status/2007169573196615950)
+- **LinkedIn post:** [massgen-ai](https://www.linkedin.com/posts/massgen-ai_were-excited-to-release-massgen-v0133-activity-7412935810997678080-HH1H)
 Install: `pip install massgen==0.1.33`
 
-Release notes: https://github.com/massgen/MassGen/releases/tag/v0.1.33
+Release notes: [v0.1.33](https://github.com/massgen/MassGen/releases/tag/v0.1.33)

Also applies to: 40-40

.github/workflows/pypi-publish.yml (2)

34-36: Quote variables to prevent word splitting.

The shellcheck warnings are valid. While these specific variables are unlikely to contain spaces, quoting is a best practice for shell safety.

🔎 Proposed fix
       - name: Get version from tag
         id: version
         run: |
-          echo "VERSION=${GITHUB_REF#refs/tags/}" >> $GITHUB_OUTPUT
-          echo "VERSION_WITHOUT_V=${GITHUB_REF#refs/tags/v}" >> $GITHUB_OUTPUT
+          echo "VERSION=${GITHUB_REF#refs/tags/}" >> "$GITHUB_OUTPUT"
+          echo "VERSION_WITHOUT_V=${GITHUB_REF#refs/tags/v}" >> "$GITHUB_OUTPUT"

75-87: Quote variables in step summary.

Multiple shellcheck warnings flagged unquoted variables. While the ${{ }} syntax is GitHub Actions interpolation (not bash variables) and is safe, the $GITHUB_STEP_SUMMARY redirections should be quoted.

🔎 Proposed fix
       - name: Post publish summary
         run: |
-          echo "### 🎉 Published to PyPI Successfully!" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "**Version:** ${{ steps.version.outputs.VERSION }}" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "**PyPI URL:** https://pypi.org/project/massgen/${{ steps.version.outputs.VERSION_WITHOUT_V }}/" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "**Docs:** https://massgendoc.readthedocs.io/" >> $GITHUB_STEP_SUMMARY
-          echo "" >> $GITHUB_STEP_SUMMARY
-          echo "**Install:**" >> $GITHUB_STEP_SUMMARY
-          echo '```bash' >> $GITHUB_STEP_SUMMARY
-          echo "pip install massgen==${{ steps.version.outputs.VERSION_WITHOUT_V }}" >> $GITHUB_STEP_SUMMARY
-          echo '```' >> $GITHUB_STEP_SUMMARY
+          echo "### 🎉 Published to PyPI Successfully!" >> "$GITHUB_STEP_SUMMARY"
+          echo "" >> "$GITHUB_STEP_SUMMARY"
+          echo "**Version:** ${{ steps.version.outputs.VERSION }}" >> "$GITHUB_STEP_SUMMARY"
+          echo "" >> "$GITHUB_STEP_SUMMARY"
+          echo "**PyPI URL:** https://pypi.org/project/massgen/${{ steps.version.outputs.VERSION_WITHOUT_V }}/" >> "$GITHUB_STEP_SUMMARY"
+          echo "" >> "$GITHUB_STEP_SUMMARY"
+          echo "**Docs:** https://massgendoc.readthedocs.io/" >> "$GITHUB_STEP_SUMMARY"
+          echo "" >> "$GITHUB_STEP_SUMMARY"
+          echo "**Install:**" >> "$GITHUB_STEP_SUMMARY"
+          echo '```bash' >> "$GITHUB_STEP_SUMMARY"
+          echo "pip install massgen==${{ steps.version.outputs.VERSION_WITHOUT_V }}" >> "$GITHUB_STEP_SUMMARY"
+          echo '```' >> "$GITHUB_STEP_SUMMARY"
massgen/tool/_multimodal_tools/understand_image.py (1)

43-50: Function signature updated correctly for task_context.

The task_context parameter is added as the last optional parameter, maintaining backward compatibility. The docstring should be updated to document this new parameter.

🔎 Suggested docstring addition

Add to the Args section of the docstring:

         allowed_paths: List of allowed base paths for validation (optional)
         agent_cwd: Agent's current working directory (automatically injected)
+        task_context: Optional task context string to augment the prompt with
+                     project-specific terminology and context (auto-injected from CONTEXT.md)
massgen/tests/test_subagent_cancellation_recovery.py (1)

287-321: Verify test assertion on line 321.

The test test_extract_answer_selects_by_votes asserts that "agent_1" in answer on line 321. This assertion checks that the answer content mentions "agent_1", which works because the test creates answers with the pattern f"Answer from {agent_id}". However, this is a somewhat indirect assertion.

A more robust assertion might directly check the answer content:

assert answer == "Answer from agent_1"

This would make the test more precise and easier to debug if it fails.

openspec/changes/fix-subagent-cancellation-recovery/proposal.md (1)

20-39: Consider adding language specifiers to code blocks.

The flow diagrams at lines 20-26 and 29-39 are flagged by markdownlint for missing language specifiers. Since these are conceptual diagrams rather than executable code, you could use text or plaintext as the language:

-```
+```text
 Parent's asyncio.wait_for() times out

This is a minor documentation polish.

openspec/changes/add-unified-project-context/specs/task-context/spec.md (1)

45-51: Add a language identifier to the fenced code block for markdownlint compliance

The fenced code block showing the [Task Context]/[Request] template has no language tag, which trips MD040. Consider marking it as text (or markdown) for cleaner linting.

Suggested change
-``` 
+```text
 [Task Context]
 {context}

 [Request]
 {original_prompt}
massgen/tool/_multimodal_tools/generation/generate_media.py (1)

266-287: generate_media context handling is solid; consider documenting task_context in the docstring

The new flow—loading task_context via load_task_context_with_warning, requiring CONTEXT.md, and threading warnings into batch/single responses—looks correct and matches the intended behavior for multimodal tools.

Since task_context is now a first‑class context param, it would help future maintainers if the docstring’s Args section briefly described it (similar to agent_cwd / multimodal_config), e.g., “auto‑injected task context loaded from CONTEXT.md”.

Also applies to: 361-375, 449-471, 553-599

massgen/mcp_tools/subagent/_subagent_mcp_server.py (1)

45-47: Timeout configuration wiring is good; summary could include new recovery statuses

Passing _min_timeout/_max_timeout into SubagentManager and dropping timeout_seconds from the MCP interface is a clean way to centralize timeout policy and prevent models from picking bad values.

One behavioral nit: completed_but_timeout and partial results are not reflected in the summary counters, which still only track "completed" and "timeout". Since completed_but_timeout is a successful recovery and partial is a distinct state, you may want to:

  • Treat "completed_but_timeout" as completed in the summary, and/or
  • Add separate counts for "partial" and "completed_but_timeout".

This would make the high‑level summary better match the richer per‑subagent statuses you now expose.

Also applies to: 50-67, 94-99, 138-149, 199-204, 218-227, 330-356

.github/workflows/release-docs-automation.yml (1)

21-26: Quote GitHub output/summary file paths for shell robustness

The validation logic is solid, but shellcheck is right to flag unquoted $GITHUB_OUTPUT / $GITHUB_STEP_SUMMARY. Quoting avoids surprises if these paths ever contain spaces and silences SC2086.

You can safely tighten this without changing behavior:

Suggested shell quoting tweaks
-echo "tag=$TAG" >> $GITHUB_OUTPUT
-echo "version=${TAG#v}" >> $GITHUB_OUTPUT
+echo "tag=$TAG" >> "$GITHUB_OUTPUT"
+echo "version=${TAG#v}" >> "$GITHUB_OUTPUT"
@@
-            echo "has_errors=true" >> $GITHUB_OUTPUT
+            echo "has_errors=true" >> "$GITHUB_OUTPUT"
@@
-            echo "has_errors=false" >> $GITHUB_OUTPUT
+            echo "has_errors=false" >> "$GITHUB_OUTPUT"
@@
-            echo "has_warnings=true" >> $GITHUB_OUTPUT
+            echo "has_warnings=true" >> "$GITHUB_OUTPUT"
@@
-            echo "has_warnings=false" >> $GITHUB_OUTPUT
+            echo "has_warnings=false" >> "$GITHUB_OUTPUT"
@@
-          echo "## 📋 Release Documentation Validation" >> $GITHUB_STEP_SUMMARY
+          echo "## 📋 Release Documentation Validation" >> "$GITHUB_STEP_SUMMARY"
@@
-          echo "Release documentation validation failed. See logs above."
+          echo "Release documentation validation failed. See logs above."

(Apply the same quoting pattern to all >> $GITHUB_STEP_SUMMARY lines.)

Also applies to: 28-88, 90-118, 119-123

massgen/subagent/models.py (1)

355-414: Type hint mismatch for token_usage parameter.

The token_usage parameter type Optional[Dict[str, Any]] is inconsistent with the class field type Dict[str, int] (line 241). Since token usage includes estimated_cost (a float), consider updating the field type annotation to match actual usage:

🔎 Suggested fix
 # In SubagentResult class definition (around line 241)
-    token_usage: Dict[str, int] = field(default_factory=dict)
+    token_usage: Dict[str, Any] = field(default_factory=dict)
massgen/subagent/manager.py (1)

245-256: Consider narrowing the exception type for CONTEXT.md copy.

The broad Exception catch (line 254) works but could be more specific. Since shutil.copy2 typically raises OSError (or subclasses like PermissionError, FileNotFoundError), narrowing the catch would improve clarity while maintaining the graceful degradation behavior.

🔎 Optional refinement
             try:
                 shutil.copy2(context_md, dst)
                 copied.append("CONTEXT.md")
                 logger.info(f"[SubagentManager] Auto-copied CONTEXT.md for {subagent_id}")
-            except Exception as e:
+            except OSError as e:
                 logger.warning(f"[SubagentManager] Failed to copy CONTEXT.md: {e}")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 36f455b and 943dcbd.

📒 Files selected for processing (40)
  • .coderabbit.yaml
  • .github/workflows/pypi-publish.yml
  • .github/workflows/release-docs-automation.yml
  • .gitignore
  • CLAUDE.md
  • CONTRIBUTING.md
  • docs/announcements/README.md
  • docs/announcements/archive/.gitkeep
  • docs/announcements/current-release.md
  • docs/announcements/feature-highlights.md
  • docs/source/user_guide/advanced/subagents.rst
  • massgen/agent_config.py
  • massgen/backend/base_with_custom_tool_and_mcp.py
  • massgen/mcp_tools/subagent/_subagent_mcp_server.py
  • massgen/orchestrator.py
  • massgen/persona_generator.py
  • massgen/skills/release-prep/SKILL.md
  • massgen/subagent/manager.py
  • massgen/subagent/models.py
  • massgen/system_message_builder.py
  • massgen/system_prompt_sections.py
  • massgen/tests/test_subagent_cancellation_recovery.py
  • massgen/tool/_multimodal_tools/generation/generate_media.py
  • massgen/tool/_multimodal_tools/read_media.py
  • massgen/tool/_multimodal_tools/understand_audio.py
  • massgen/tool/_multimodal_tools/understand_file.py
  • massgen/tool/_multimodal_tools/understand_image.py
  • massgen/tool/_multimodal_tools/understand_video.py
  • openspec/AGENTS.md
  • openspec/changes/add-unified-project-context/design.md
  • openspec/changes/add-unified-project-context/proposal.md
  • openspec/changes/add-unified-project-context/specs/task-context/spec.md
  • openspec/changes/add-unified-project-context/tasks.md
  • openspec/changes/fix-subagent-cancellation-recovery/proposal.md
  • openspec/changes/fix-subagent-cancellation-recovery/specs/subagent-orchestration/spec.md
  • openspec/changes/fix-subagent-cancellation-recovery/specs/subagent-status-consolidation/spec.md
  • openspec/changes/fix-subagent-cancellation-recovery/tasks.md
  • openspec/project.md
  • scripts/precommit_sync_readme_wrapper.sh
  • scripts/precommit_validate_configs_wrapper.sh
🧰 Additional context used
📓 Path-based instructions (6)
massgen/skills/**/*.md

⚙️ CodeRabbit configuration file

massgen/skills/**/*.md: Skill documentation review:

  • Verify SKILL.md frontmatter is complete (name, description)
  • Check workflow steps are clear and actionable
  • Ensure reference files are accurate

Files:

  • massgen/skills/release-prep/SKILL.md
massgen/**/*.py

⚙️ CodeRabbit configuration file

massgen/**/*.py: Review Python code for:

  • PEP 8 compliance and Google-style docstrings
  • Proper async/await patterns
  • Type hints where appropriate
  • No security vulnerabilities (command injection, etc.)
  • Consistency with existing code patterns

Files:

  • massgen/tool/_multimodal_tools/understand_video.py
  • massgen/tool/_multimodal_tools/understand_file.py
  • massgen/orchestrator.py
  • massgen/tool/_multimodal_tools/generation/generate_media.py
  • massgen/tool/_multimodal_tools/understand_audio.py
  • massgen/tool/_multimodal_tools/read_media.py
  • massgen/system_message_builder.py
  • massgen/tool/_multimodal_tools/understand_image.py
  • massgen/agent_config.py
  • massgen/backend/base_with_custom_tool_and_mcp.py
  • massgen/subagent/models.py
  • massgen/system_prompt_sections.py
  • massgen/tests/test_subagent_cancellation_recovery.py
  • massgen/persona_generator.py
  • massgen/mcp_tools/subagent/_subagent_mcp_server.py
  • massgen/subagent/manager.py
docs/source/**/*.rst

⚙️ CodeRabbit configuration file

docs/source/**/*.rst: RST documentation review:

  1. Ensure proper RST syntax
  2. Check cross-references are valid
  3. Verify code examples are runnable and match current implementation
  4. For new features, ensure docs/source/index.rst "Recent Releases" is updated

Documentation quality:

  • Keep explanations concise - avoid bloated prose
  • Remove redundant or duplicate information
  • Ensure file paths and parameter names match actual code
  • Flag any examples that may be outdated or inconsistent with implementation

Files:

  • docs/source/user_guide/advanced/subagents.rst
docs/source/user_guide/**/*.rst

⚙️ CodeRabbit configuration file

docs/source/user_guide/**/*.rst: User guide documentation review:

Completeness:

  • Clear explanations with working code examples
  • Expected output where applicable
  • Links to related configuration examples

Consistency with implementation:

  • Parameter names must match actual code
  • File paths must be accurate
  • Behavior descriptions must match what the code does
  • Flag any discrepancies between docs and implementation

Usability:

  • Include runnable commands users can try immediately
  • Provide architecture diagrams for complex features
  • Show expected output so users know what to expect

Conciseness:

  • Avoid over-documentation and bloated explanations
  • One clear explanation is better than multiple redundant ones
  • Remove filler text and unnecessary verbosity

Files:

  • docs/source/user_guide/advanced/subagents.rst
massgen/backend/**/*.py

⚙️ CodeRabbit configuration file

massgen/backend/**/*.py: Backend code review - CHECK FOR THESE UPDATES:

  1. If adding new model support → ensure massgen/backend/capabilities.py is updated
  2. If adding new capabilities → update supported_capabilities in capabilities.py
  3. Verify inheritance from base classes is correct
  4. Check streaming buffer usage where applicable
  5. Ensure error handling follows existing patterns

Files:

  • massgen/backend/base_with_custom_tool_and_mcp.py
massgen/tests/**/*.py

⚙️ CodeRabbit configuration file

massgen/tests/**/*.py: Verify tests use pytest best practices:

  • @pytest.mark.expensive for API-calling tests
  • @pytest.mark.docker for Docker-dependent tests
  • @pytest.mark.asyncio for async tests
  • Clear test descriptions and proper fixtures

Files:

  • massgen/tests/test_subagent_cancellation_recovery.py
🧬 Code graph analysis (5)
massgen/tool/_multimodal_tools/generation/generate_media.py (3)
massgen/tool/_multimodal_tools/generation/_base.py (1)
  • GenerationConfig (27-56)
massgen/tool/_decorators.py (1)
  • context_params (7-52)
massgen/tool/_result.py (2)
  • ExecutionResult (47-69)
  • TextContent (23-27)
massgen/tool/_multimodal_tools/read_media.py (4)
massgen/tool/_decorators.py (1)
  • context_params (7-52)
massgen/tool/_multimodal_tools/understand_image.py (1)
  • understand_image (43-322)
massgen/tool/_multimodal_tools/understand_audio.py (1)
  • understand_audio (159-367)
massgen/tool/_multimodal_tools/understand_video.py (1)
  • understand_video (443-662)
massgen/system_message_builder.py (1)
massgen/system_prompt_sections.py (2)
  • TaskContextSection (2280-2340)
  • add_section (2363-2374)
massgen/persona_generator.py (1)
massgen/structured_logging.py (4)
  • debug (284-290)
  • info (276-282)
  • error (300-306)
  • warning (292-298)
massgen/subagent/manager.py (1)
massgen/subagent/models.py (6)
  • SubagentConfig (21-127)
  • create (50-93)
  • SubagentResult (212-414)
  • create_error (333-353)
  • SubagentState (478-506)
  • create_timeout_with_recovery (356-414)
🪛 actionlint (1.7.9)
.github/workflows/pypi-publish.yml

34-34: shellcheck reported issue in this script: SC2086:info:1:44: Double quote to prevent globbing and word splitting

(shellcheck)


34-34: shellcheck reported issue in this script: SC2086:info:2:55: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:10:19: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:11:79: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:12:15: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:1:49: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:2:12: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:3:61: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:4:12: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:5:106: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:6:12: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:7:56: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:8:12: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2086:info:9:24: Double quote to prevent globbing and word splitting

(shellcheck)


75-75: shellcheck reported issue in this script: SC2129:style:1:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)

.github/workflows/release-docs-automation.yml

30-30: shellcheck reported issue in this script: SC2086:info:48:29: Double quote to prevent globbing and word splitting

(shellcheck)


30-30: shellcheck reported issue in this script: SC2086:info:50:30: Double quote to prevent globbing and word splitting

(shellcheck)


30-30: shellcheck reported issue in this script: SC2086:info:55:31: Double quote to prevent globbing and word splitting

(shellcheck)


30-30: shellcheck reported issue in this script: SC2086:info:57:32: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:10:86: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:12:52: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:13:14: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:14:39: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:16:37: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:17:14: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:18:52: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:21:12: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:22:29: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:23:58: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:24:55: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:25:30: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:2:49: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:3:12: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:4:29: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:5:12: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:8:37: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2086:info:9:14: Double quote to prevent globbing and word splitting

(shellcheck)


92-92: shellcheck reported issue in this script: SC2129:style:12:3: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)


92-92: shellcheck reported issue in this script: SC2129:style:16:3: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)


92-92: shellcheck reported issue in this script: SC2129:style:21:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)


92-92: shellcheck reported issue in this script: SC2129:style:2:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)


92-92: shellcheck reported issue in this script: SC2129:style:8:3: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)

🪛 LanguageTool
CLAUDE.md

[style] ~192-~192: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...d changes - --type committed - Review only committed changes - --base develop - ...

(ADVERB_REPETITION_PREMIUM)

openspec/AGENTS.md

[style] ~30-~30: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ..."I want to create a spec proposal" - "I want to create a spec" Loose matching guidance...

(REP_WANT_TO_VB)

openspec/changes/add-unified-project-context/specs/task-context/spec.md

[style] ~19-~19: You have already used this phrasing in nearby sentences. Consider replacing it to add variety to your writing.
Context: ...ng multimodal tools - WHEN an agent needs to use read_media or generate_media - **TH...

(REP_NEED_TO_VB)

🪛 markdownlint-cli2 (0.18.1)
openspec/changes/fix-subagent-cancellation-recovery/specs/subagent-status-consolidation/spec.md

92-92: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)


150-150: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

massgen/skills/release-prep/SKILL.md

18-18: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


164-164: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


184-184: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

openspec/changes/add-unified-project-context/design.md

73-73: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/announcements/README.md

19-19: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

openspec/changes/fix-subagent-cancellation-recovery/proposal.md

20-20: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


29-29: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/announcements/feature-highlights.md

11-11: Bare URL used

(MD034, no-bare-urls)


13-13: Bare URL used

(MD034, no-bare-urls)


15-15: Bare URL used

(MD034, no-bare-urls)


17-17: Bare URL used

(MD034, no-bare-urls)


19-19: Bare URL used

(MD034, no-bare-urls)


21-21: Bare URL used

(MD034, no-bare-urls)


23-23: Bare URL used

(MD034, no-bare-urls)


25-25: Bare URL used

(MD034, no-bare-urls)


27-27: Bare URL used

(MD034, no-bare-urls)

openspec/AGENTS.md

43-43: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


125-125: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


147-147: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


293-293: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


297-297: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


301-301: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


351-351: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

openspec/changes/add-unified-project-context/specs/task-context/spec.md

45-45: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

docs/announcements/current-release.md

24-24: Bare URL used

(MD034, no-bare-urls)


25-25: Bare URL used

(MD034, no-bare-urls)


26-26: Bare URL used

(MD034, no-bare-urls)


40-40: Bare URL used

(MD034, no-bare-urls)

🪛 Ruff (0.14.10)
massgen/subagent/manager.py

254-254: Do not catch blind exception: Exception

(BLE001)


1151-1151: Do not catch blind exception: Exception

(BLE001)

fix: Fix responses compression input issue
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
massgen/backend/response.py (1)

183-234: LGTM - Compression recovery flow enhanced with better user feedback.

The implementation correctly handles context length errors by:

  • Gathering the streaming buffer for partial response content
  • Compressing messages via LLM-based summarization
  • Removing previous_response_id to prevent server-side context re-accumulation (which would defeat compression)
  • Retrying with compressed messages
  • Reporting success with accurate input counts

The logic is sound and the user-facing messages are informative.

Note: The compression recovery logic is duplicated between _stream_without_custom_and_mcp_tools (lines 183-234) and _stream_with_custom_and_mcp_tools (lines 395-452). Consider extracting the common compression flow into a helper method to improve maintainability and reduce the risk of divergent bug fixes.

💡 Suggested refactor to reduce duplication
async def _handle_compression_recovery(
    self,
    e: Exception,
    messages: List[Dict[str, Any]],
    all_params: Dict[str, Any],
    tools: List[Dict[str, Any]],
    agent_id: Optional[str],
    context_label: str = "",
) -> Tuple[Any, List[Dict[str, Any]]]:
    """Handle compression recovery for context length errors.
    
    Returns:
        Tuple of (retry_stream, compressed_messages)
    """
    logger.warning(
        f"[{self.get_provider_name()}] Context length exceeded{context_label}, "
        f"attempting compression recovery...",
    )
    
    # Notify user
    yield StreamChunk(
        type="compression_status",
        status="compressing",
        content=f"\n📦 [Compression] Context limit exceeded - compressing {len(messages)} messages...",
        source=agent_id,
    )
    
    # Compress messages
    buffer_for_compression = self._get_streaming_buffer()
    compressed_messages = await self._compress_messages_for_context_recovery(
        messages,
        buffer_content=buffer_for_compression,
    )
    
    # Remove previous_response_id
    compression_params = {k: v for k, v in all_params.items() if k != "previous_response_id"}
    api_params = await self.api_params_handler.build_api_params(
        compressed_messages,
        tools,
        compression_params,
    )
    
    # Retry
    client = self._create_client(**all_params)
    stream = await client.responses.create(**api_params)
    
    # Notify success
    input_count = len(compressed_messages) if compressed_messages else 0
    result_msg = f"✅ [Compression] Recovered via summarization ({input_count} items) - continuing..."
    yield StreamChunk(
        type="compression_status",
        status="compression_complete",
        content=result_msg,
        source=agent_id,
    )
    
    logger.info(
        f"[{self.get_provider_name()}] Compression recovery successful via summarization "
        f"({input_count} items)",
    )
    
    return stream, compressed_messages

Then call it from both locations:

if is_context_length_error(e) and not _compression_retry:
    stream, compressed_messages = await self._handle_compression_recovery(
        e, processed_messages, all_params, tools, agent_id, context_label=" in MCP mode"
    )
    current_messages = compressed_messages  # Update for subsequent recursions if needed
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 943dcbd and 84e657b.

📒 Files selected for processing (2)
  • massgen/backend/base_with_custom_tool_and_mcp.py
  • massgen/backend/response.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • massgen/backend/base_with_custom_tool_and_mcp.py
🧰 Additional context used
📓 Path-based instructions (2)
massgen/**/*.py

⚙️ CodeRabbit configuration file

massgen/**/*.py: Review Python code for:

  • PEP 8 compliance and Google-style docstrings
  • Proper async/await patterns
  • Type hints where appropriate
  • No security vulnerabilities (command injection, etc.)
  • Consistency with existing code patterns

Files:

  • massgen/backend/response.py
massgen/backend/**/*.py

⚙️ CodeRabbit configuration file

massgen/backend/**/*.py: Backend code review - CHECK FOR THESE UPDATES:

  1. If adding new model support → ensure massgen/backend/capabilities.py is updated
  2. If adding new capabilities → update supported_capabilities in capabilities.py
  3. Verify inheritance from base classes is correct
  4. Check streaming buffer usage where applicable
  5. Ensure error handling follows existing patterns

Files:

  • massgen/backend/response.py
🧬 Code graph analysis (1)
massgen/backend/response.py (3)
massgen/backend/_streaming_buffer_mixin.py (1)
  • _get_streaming_buffer (183-189)
massgen/backend/base_with_custom_tool_and_mcp.py (1)
  • _compress_messages_for_context_recovery (2890-2924)
massgen/api_params_handler/_response_api_params_handler.py (1)
  • build_api_params (64-176)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: pre-commit (macos-14, 3.10)
  • GitHub Check: pre-commit (macos-latest, 3.11)
  • GitHub Check: pre-commit (macos-latest, 3.10)
  • GitHub Check: pre-commit (macos-latest, 3.12)
  • GitHub Check: pre-commit (macos-14, 3.10)
🔇 Additional comments (3)
massgen/backend/response.py (3)

395-452: LGTM - Consistent compression recovery implementation.

This block mirrors the compression recovery logic from _stream_without_custom_and_mcp_tools with appropriate adaptations for the MCP tool execution path. The implementation is correct and properly:

  • Updates current_messages for subsequent recursive calls (line 432)
  • Tracks API timing with end_api_call_timing (line 402)
  • Provides context-appropriate log messages ("in MCP mode")

The duplication concern noted in the previous comment applies here as well.


587-605: LGTM - Clearer logic for response item handling.

The introduction of the will_use_previous_response_id variable (line 595) significantly improves readability by making the conditional logic explicit:

  • When previous_response_id will be used, the API automatically includes all items from that response
  • When staying stateless (compression retry), items must be manually added to avoid missing reasoning items

The extended comments (lines 587-593) provide valuable context about reasoning model requirements and the rationale for the conditional injection. This defensive approach prevents "Duplicate item found with id" errors while ensuring reasoning continuity.


1488-1520: LGTM - Improved docstring clarity.

The updated docstring accurately describes the compression mechanism as "LLM-based summarization" (line 1493) and provides a more precise return description (line 1500). This aligns with the implementation details and the comments added in the compression recovery blocks.

The method signature and implementation remain consistent with the parent class (base_with_custom_tool_and_mcp.py).

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Fix all issues with AI Agents 🤖
In @CLAUDE.md:
- Around line 1-337: The CLAUDE.md doc references missing files/dirs and a
misnamed skill; update the doc to either create or stop referencing them: remove
or correct the reference to docs/scripts/generate_backend_tables.py (or add that
script to docs/scripts if intended), remove or create the missing directories
docs/announcements/, docs/dev_notes/, and docs/source/ (or replace their
mentions with the actual existing paths), and fix the symlink instruction to use
the real skill directory name evolving-skill-creator instead of skill-creator so
the command and examples match the repository.

In @massgen/skills/pr-checks/SKILL.md:
- Around line 156-162: Add a short explanatory comment above the gh api graphql
mutation that tells the reader to replace THREAD_ID with the actual review
thread ID obtained from the /pr-comments output (step 3) and that the
resolveReviewThread mutation will mark that review thread as resolved after
addressing CodeRabbit's suggestion; reference the mutation name
resolveReviewThread and the placeholder THREAD_ID so it's clear what to
substitute and where the ID comes from.
🧹 Nitpick comments (5)
massgen/skills/pr-checks/SKILL.md (1)

83-83: Minor: Templates reference Claude-specific tooling.

The PR description and commit templates (lines 83 and 218–220) include footers referencing "Claude Code" and Claude authorship. While these are in template form meant to be customized, consider clarifying that users should adapt these for their development environment and team conventions.

For example, you might note inline:

🤖 Generated with [Claude Code](https://claude.com/claude-code)  
← Adapt this footer to your team's conventions

Also applies to: 218-220

massgen/agent_config.py (1)

146-148: Consider simplifying f-string concatenation.

The adjacent f-strings on line 147 could be combined into a single f-string for clarity:

f"subagent_min_timeout ({self.subagent_min_timeout}) must be <= subagent_max_timeout ({self.subagent_max_timeout})"

This also applies to line 152. Minor stylistic improvement—not blocking.

massgen/orchestrator.py (2)

870-923: Timeout min/max propagation looks correct; consider documenting expected units and valid ranges.

The new min_timeout/max_timeout defaults and coordination-config overrides are wired cleanly into the subagent MCP args, and the CLI flag names are consistent within this file. The only thing I’d consider is making it explicit (in config docs or a brief comment) that these values are seconds and should satisfy 0 < min_timeout <= default_timeout <= max_timeout so misconfigurations are easier to spot and don’t rely solely on _clamp_timeout() downstream.


1788-1898: Reading subagent metrics from full_logs/status.json is aligned; maybe distinguish “phase” vs “status”.

Using full_logs/status.json as the single source of truth and pulling costs.total_*, meta.elapsed_seconds, and meta.question looks robust and matches the new status file shape. One minor nit: you currently expose the coordination phase under the key "status", which is fine for internal metrics but can be confusing next to the spec’s notion of a simplified status (running/pending/failed/etc.). Either rename this field to "phase" or add a short comment that, in this metrics summary, "status" reflects raw coordination phase rather than the model-facing derived status.

openspec/changes/fix-subagent-cancellation-recovery/specs/subagent-status-consolidation/spec.md (1)

65-86: Clarify source of elapsed_seconds in the status transformation example.

In the “Status transformation” scenario, the input example JSON doesn’t show a meta block, but the derived status includes elapsed_seconds. To avoid confusion, consider either adding a meta section (with elapsed_seconds) to the input example or a short note that this field comes from meta.elapsed_seconds in full_logs/status.json, matching the implementation.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 56bfd8f and 59656ca.

📒 Files selected for processing (7)
  • CLAUDE.md
  • docs/source/user_guide/advanced/subagents.rst
  • massgen/agent_config.py
  • massgen/orchestrator.py
  • massgen/skills/pr-checks/SKILL.md
  • massgen/subagent/models.py
  • openspec/changes/fix-subagent-cancellation-recovery/specs/subagent-status-consolidation/spec.md
🧰 Additional context used
📓 Path-based instructions (4)
docs/source/**/*.rst

⚙️ CodeRabbit configuration file

docs/source/**/*.rst: RST documentation review:

  1. Ensure proper RST syntax
  2. Check cross-references are valid
  3. Verify code examples are runnable and match current implementation
  4. For new features, ensure docs/source/index.rst "Recent Releases" is updated

Documentation quality:

  • Keep explanations concise - avoid bloated prose
  • Remove redundant or duplicate information
  • Ensure file paths and parameter names match actual code
  • Flag any examples that may be outdated or inconsistent with implementation

Files:

  • docs/source/user_guide/advanced/subagents.rst
docs/source/user_guide/**/*.rst

⚙️ CodeRabbit configuration file

docs/source/user_guide/**/*.rst: User guide documentation review:

Completeness:

  • Clear explanations with working code examples
  • Expected output where applicable
  • Links to related configuration examples

Consistency with implementation:

  • Parameter names must match actual code
  • File paths must be accurate
  • Behavior descriptions must match what the code does
  • Flag any discrepancies between docs and implementation

Usability:

  • Include runnable commands users can try immediately
  • Provide architecture diagrams for complex features
  • Show expected output so users know what to expect

Conciseness:

  • Avoid over-documentation and bloated explanations
  • One clear explanation is better than multiple redundant ones
  • Remove filler text and unnecessary verbosity

Files:

  • docs/source/user_guide/advanced/subagents.rst
massgen/skills/**/*.md

⚙️ CodeRabbit configuration file

massgen/skills/**/*.md: Skill documentation review:

  • Verify SKILL.md frontmatter is complete (name, description)
  • Check workflow steps are clear and actionable
  • Ensure reference files are accurate

Files:

  • massgen/skills/pr-checks/SKILL.md
massgen/**/*.py

⚙️ CodeRabbit configuration file

massgen/**/*.py: Review Python code for:

  • PEP 8 compliance and Google-style docstrings
  • Proper async/await patterns
  • Type hints where appropriate
  • No security vulnerabilities (command injection, etc.)
  • Consistency with existing code patterns

Files:

  • massgen/agent_config.py
  • massgen/orchestrator.py
  • massgen/subagent/models.py
🧬 Code graph analysis (2)
massgen/agent_config.py (1)
massgen/structured_logging.py (1)
  • warning (292-298)
massgen/subagent/models.py (1)
massgen/agent_config.py (1)
  • to_dict (875-914)
🪛 LanguageTool
CLAUDE.md

[style] ~192-~192: This adverb was used twice in the sentence. Consider removing one of them or replacing them with a synonym.
Context: ...d changes - --type committed - Review only committed changes - --base develop - ...

(ADVERB_REPETITION_PREMIUM)

🪛 Ruff (0.14.10)
massgen/agent_config.py

140-140: Avoid specifying long messages outside the exception class

(TRY003)


143-143: Avoid specifying long messages outside the exception class

(TRY003)


146-148: Avoid specifying long messages outside the exception class

(TRY003)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: pre-commit (macos-14, 3.10)
  • GitHub Check: pre-commit (macos-14, 3.11)
  • GitHub Check: pre-commit (macos-latest, 3.11)
🔇 Additional comments (17)
CLAUDE.md (1)

191-192: Duplicate comment on adverbial structure.

This flagged item was already dismissed in a previous review as intentional parallel structure for documentation clarity. The "only" adverbs in both lines maintain consistent structure across the options list.

massgen/skills/pr-checks/SKILL.md (1)

200-204: The script reference is correct—scripts/validate_all_configs.py exists and the invocation pattern is consistent with other steps in the workflow.

docs/source/user_guide/advanced/subagents.rst (6)

77-85: LGTM!

The timeout configuration documentation is clear and matches the implementation defaults in agent_config.py (min=60, max=600). The note about clamping behavior provides helpful context for users.


210-241: LGTM!

The Status Values table provides clear documentation of all possible status outcomes with appropriate success flags and descriptions. The note about completed_but_timeout being a success case is helpful for users who might otherwise treat it as a failure.


296-302: LGTM - past concern addressed.

The completion_percentage documentation now correctly describes it as a continuous 0-100 value with the formula (answers + votes) / (agents × 2) × 100. The percentages are appropriately labeled as approximate milestones rather than discrete phase markers.


346-378: LGTM - past concern addressed.

The Mixed Results example now includes all fields from the actual return structure: workspace, execution_time_seconds, token_usage, completion_percentage per result, and the summary object at top level. The example effectively demonstrates handling of mixed outcomes (completed, completed_but_timeout, timeout).


465-479: LGTM - past concern addressed.

The check_subagent_status example now includes all fields: task, workspace, started_at, along with the existing status and metrics fields. This matches the actual transformed output from _transform_orchestrator_status.


389-429: The fix has been confirmed. The _collect_subagent_costs() method correctly reads from full_logs/status.json (line 1814) and uses the proper field paths: costs.total_input_tokens, costs.total_output_tokens, costs.total_estimated_cost, meta.elapsed_seconds, and coordination.phase. The implementation aligns with the documentation's claim that full_logs/status.json is the single source of truth.

massgen/agent_config.py (3)

94-95: LGTM!

The field docstrings clearly describe the purpose of each timeout bound.


125-126: LGTM!

The default values (60s min, 600s max) are reasonable bounds that match the documentation. The inline comments clarify the human-readable durations.


135-153: LGTM - past concern addressed.

The _validate_timeout_config method implements comprehensive validation:

  • Positive value checks for both min and max
  • Logical ordering constraint (min ≤ max)
  • Warning when default falls outside the configured bounds

This provides early feedback to users at config time, complementing the runtime clamping in SubagentManager.

massgen/subagent/models.py (6)

147-208: LGTM: enable_web_search field properly implemented.

The enable_web_search field is correctly added with:

  • Optional type with None default for inheritance semantics
  • Conditional serialization in to_dict (only when not None) for backwards compatibility
  • Clear documentation of inheritance behavior

218-284: LGTM: SubagentResult expanded with recovery semantics.

The expanded status literals and new fields properly support timeout recovery scenarios:

  • Clear documentation distinguishes "completed_but_timeout" vs "partial" vs "timeout"
  • completion_percentage and warning fields properly typed and serialized
  • Backwards-compatible serialization (conditional inclusion in to_dict)

286-353: LGTM: Factory methods consistently updated with warning parameter.

All factory methods (create_success, create_timeout, create_error) consistently accept an optional warning parameter and properly pass it to the constructor.


468-475: Verify: SubagentPointer doesn't distinguish "partial" status.

Line 470 maps SubagentResult statuses to SubagentPointer statuses. The new "partial" status (indicating partial work with no winner) gets mapped to "failed" because result.success = False and result.status != "timeout". This loses the distinction between partial work and outright failure.

While this may be an intentional simplification (SubagentPointer's status type doesn't include "partial"), consider whether tracking partial completions separately would be valuable for debugging and observability.


477-506: LGTM: SubagentState status expanded consistently.

The status literal in SubagentState is correctly expanded to include "completed_but_timeout" and "partial", maintaining consistency with the SubagentResult status values.


355-414: The parameter type at line 363 is already Optional[Dict[str, Any]] — the correct type for token usage data that includes float values like estimated_cost.

The review comment incorrectly claims line 363 uses Optional[Dict[str, Any]] due to a type mismatch with the field. However, examination of the code shows:

  • The parameter is already Optional[Dict[str, Any]] (line 363)
  • Tests demonstrate token_usage contains float values (e.g., "estimated_cost": 0.048142 at line 57-61 of test file)
  • The manager constructs token_usage with "estimated_cost": costs.get("total_estimated_cost", 0.0), explicitly using float defaults
  • The field definition at line 241 incorrectly declares Dict[str, int], which cannot accommodate these floats

The error message set at line 409 even when success=True is intentional design—the test at line 71 explicitly asserts this behavior ("Still notes it was a timeout").

No changes are needed for this method; the type annotation is correct.

Likely an incorrect or invalid review comment.

Comment on lines +1565 to +1571
# Find most recent log directory
log_dirs = sorted(massgen_logs.glob("log_*"), reverse=True)
for log_dir in log_dirs:
nested_status = log_dir / "turn_1" / "attempt_1" / "status.json"
if nested_status.exists():
status_file = nested_status
break
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessarily true because we could've run another massgen cmd afterwards so most recent doesn't belong here.

@ncrispino ncrispino changed the base branch from main to dev/v0.1.34 January 5, 2026 15:57
@ncrispino ncrispino merged commit d7f5da1 into dev/v0.1.34 Jan 5, 2026
24 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Jan 21, 2026
18 tasks
@coderabbitai coderabbitai bot mentioned this pull request Feb 6, 2026
18 tasks
@coderabbitai coderabbitai bot mentioned this pull request Feb 18, 2026
18 tasks
@coderabbitai coderabbitai bot mentioned this pull request Feb 27, 2026
18 tasks
@coderabbitai coderabbitai bot mentioned this pull request Mar 9, 2026
18 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant