Skip to content

feat: add task tool for spawning subagents#80

Merged
evansenter merged 17 commits intomainfrom
feat/task-tool
Jan 19, 2026
Merged

feat: add task tool for spawning subagents#80
evansenter merged 17 commits intomainfrom
feat/task-tool

Conversation

@evansenter
Copy link
Copy Markdown
Owner

Summary

  • Add task tool that spawns clemini subprocesses to handle delegated work
  • Supports foreground (wait for result) and background (fire-and-forget) modes
  • Reuses BACKGROUND_TASKS registry so kill_shell works for both bash and task processes

Changes

  • New src/tools/task.rs with TaskTool implementation
  • Make NEXT_TASK_ID pub(crate) in bash.rs for reuse
  • Register TaskTool in CleminiToolService

Design Decisions

Decision Choice Rationale
Registry Reuse BACKGROUND_TASKS KillShell works for both, unified management
Invocation Configurable current_exe() first, cargo run fallback
Output capture Fire-and-forget (MVP) Matches bash behavior, #79 tracks enhancement

Test plan

  • Unit tests for get_clemini_command() and declaration
  • make clippy passes
  • make fmt passes
  • make test passes
  • Manual testing with foreground/background modes

Closes #78

🤖 Generated with Claude Code

evansenter and others added 11 commits January 18, 2026 23:12
- Refactor clemini from pure binary to library+binary architecture
- Add src/lib.rs exposing AgentEvent, run_interaction, CleminiToolService
- Move logging infrastructure to src/logging.rs with OutputSink trait
- Add tests/confirmation_tests.rs with 4 live API tests:
  - test_destructive_command_requests_confirmation
  - test_confirmation_approval_executes_command
  - test_safe_command_no_confirmation
  - test_confirmation_response_is_semantic
- Add semantic validation using Gemini structured output
- Add serial_test dependency for global state tests

Run live tests with: cargo test --test confirmation_tests -- --include-ignored

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Drop MutexGuard before awaiting child.kill() in bash tests
- Remove unnecessary let binding in init_test_logging
- Collapse nested if statements using let chain

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add comment explaining unsafe env var usage in Rust 2024 edition
- Expand re-export pattern comment for clarity
- Improve semantic validator comment and use eprintln for warnings
- Add Makefile comment explaining separate test invocations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change tool executing prefix from 🔧 to ┌─ to create visual
bracket with tool result line (└─), making tool call output
more structured:

┌─ bash command="ls"
└─ bash 0.01s ~20 tok

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Change tool executing format from:
  ┌─ (dimmed) name (cyan) args (dimmed)
to:
  ┌─ name (cyan) args

This makes the opening bracket match the closing bracket (└─)
and improves readability of tool arguments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Fix ordering issues where tools called log_event() directly, causing
race conditions with async events. Tools now emit AgentEvent::ToolOutput
events via channels, ensuring all output flows through the event system
in correct order.

Changes:
- Add ToolOutput variant to AgentEvent
- Add on_tool_output() to EventHandler trait
- All 11 tools receive events_tx and emit via emit() helper
- Add graceful RwLock poisoning recovery
- Add 3 integration test files with semantic validation
- Wire integration tests into CI with GEMINI_API_KEY
- Add make test-all target for running full test suite

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add .gradle/, bin/, node_modules/, build/, .class files, and
package-lock.json patterns for benchmark exercises.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Allow dead_code on EXTENDED_TIMEOUT constant
- Remove unnecessary let binding in create_test_tool_service

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add --reset flag to restore exercises to clean state before running
- Warn users when exercises have uncommitted changes from previous runs
- Add -y/--yes flag to skip confirmation prompts
- Show list of modified files when dirty state detected

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The test_file_not_found_helpful_response test can take longer than
60s because the model may try multiple approaches (glob, read) before
giving up and suggesting alternatives. Use EXTENDED_TIMEOUT (120s) to
avoid flaky CI failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add a task tool that spawns clemini subprocesses to handle delegated
work. Supports foreground (wait for result) and background (fire-and-
forget) modes.

Key changes:
- New src/tools/task.rs with TaskTool implementation
- Make NEXT_TASK_ID pub(crate) for reuse across tools
- Register TaskTool in CleminiToolService

Design decisions:
- Reuses BACKGROUND_TASKS registry so kill_shell works for both
- Configurable invocation: tries current_exe(), falls back to cargo run
- Fire-and-forget MVP; output capture tracked in #79

Closes #78

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a task tool for spawning clemini subprocesses to handle delegated work, supporting both foreground (wait for result) and background (fire-and-forget) modes. It also refactors event handling to use AgentEvent::ToolOutput for tool visual output, adds integration tests with semantic validation, and updates CI to run integration tests.

Issues Found

Critical

None

Important

  • src/tools/task.rs:72-98 - Background tasks inherit parent's stdout/stderr set to null, but foreground mode captures output. The foreground subagent's streaming output to stderr will be captured and returned in the stderr field, but this may contain the assistant's response text mixed with tool output. Consider documenting this behavior or separating response text from debug/tool output in the returned JSON.

  • src/tools/task.rs:65 - The --cwd flag is passed to the subagent, but the CLAUDE.md or issue Implement Task tool for spawning subagents #78 doesn't specify if the subagent should have the same allowed_paths restrictions. Currently it would default to the cwd which may be more permissive than intended.

Suggestions

  • src/tools/task.rs:27-34 - The get_clemini_command() fallback to cargo run --quiet only makes sense during development. In production, if current_exe() fails (which is unusual), this fallback would attempt to compile the project. Consider logging a warning or returning an error instead of silently falling back to cargo.

  • src/tools/task.rs:89 - The stdin(Stdio::null()) for foreground mode means the subagent cannot use ask_user tool since it needs stdin. This limitation should be documented in the tool description.

  • tests/common/mod.rs:99-106 - The #[allow(dead_code)] on get_api_key and other functions suggests these may be unused in some test files. Consider moving only shared utilities here and keeping test-specific helpers in their respective test files.

  • src/tools/mod.rs:63-89 - The set_events_tx and events_tx() methods use interior mutability with RwLock. While this works, the lock is held very briefly. Consider documenting why RwLock was chosen over Mutex (reader performance isn't the reason since writes are also frequent).

Verdict

REQUEST_CHANGES - The task tool implementation is solid overall, but there are a couple of important issues around subagent behavior (output capture semantics, path restrictions) that should be clarified either in code or documentation before merging.


Automated review by Claude Code

- Add tracing::warn when falling back to cargo run (clarifies dev vs prod)
- Document stdin limitation in tool description (ask_user not available)
- Add code comment explaining allowed_paths inheritance is intentional

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@evansenter
Copy link
Copy Markdown
Owner Author

Feedback Addressed

Implemented

  • [Important] Documented paths/sandbox behavior - Added code comment explaining allowed_paths inheritance is intentional (subagent operates independently)
  • [Suggestion] Added tracing::warn for cargo run fallback - Clarifies this is expected during development but indicates an issue in production
  • [Suggestion] Documented stdin limitation - Added to tool description that subagents cannot use interactive tools (ask_user)

Skipped

  • [Important] Output capture behavior - This is intentional MVP behavior. Fire-and-forget for background mode matches bash tool. Issue Add TaskOutput tool for retrieving background task results #79 tracks streaming output enhancement.
  • [Suggestion] cargo run fallback coverage - This fallback only triggers in edge cases (broken install, deleted binary). The warning log makes the behavior visible for debugging.

@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a task tool for spawning clemini subprocesses (subagents) to handle delegated work, supporting both foreground and background modes. It also refactors event handling to use AgentEvent::ToolOutput for tool visual output, adds semantic integration tests with a common test framework, updates CI to run integration tests, and makes various visual output improvements.

Issues Found

Critical

None

Important

None

Suggestions

  • src/tools/mod.rs:63-89 - The set_events_tx and events_tx() methods use interior mutability with RwLock. The comments explain what but not why RwLock was chosen over Mutex. Since write operations are frequent (before/after every interaction) and reads only happen during tool creation, reader-writer lock performance benefits don't apply. Consider adding a brief comment explaining the choice, or using Mutex for simplicity since the lock is held very briefly in both cases.

  • tests/common/mod.rs:99-106 - Several functions have #[allow(dead_code)] annotations (get_api_key, create_tool_service, create_temp_dir, init_test_logging). While these are genuinely used across different test files, the pattern suggests these might benefit from being marked pub(crate) or having the test modules explicitly import them to suppress the warnings organically rather than with attributes.

Previously Addressed (Filtered)

  • src/tools/task.rs:65 - paths/sandbox behavior (Implemented: added code comment explaining intentional behavior)
  • src/tools/task.rs:27-34 - cargo run fallback (Implemented: added tracing::warn for visibility)
  • Tool description - stdin limitation (Implemented: documented that subagents cannot use ask_user)
  • Output capture behavior (Skipped: intentional MVP, tracked in Add TaskOutput tool for retrieving background task results #79)

4 items from prior feedback rounds were not re-raised.

Verdict

REQUEST_CHANGES - Two minor suggestions around code clarity. The implementation is solid and addresses the linked issue #78 well.


Automated review by Claude Code

evansenter and others added 3 commits January 19, 2026 10:45
- Document task tool parameters, returns, limitations
- Update kill_shell to mention it works with task tool too
- Add task tool use cases to "When to Use Which Tool" table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a task tool for spawning clemini subprocesses (subagents) to handle delegated work, supporting both foreground and background modes. It also refactors event handling so tools emit AgentEvent::ToolOutput instead of calling log_event() directly, adds a library crate for integration tests with semantic validation, updates CI to run integration tests, and includes various visual output improvements.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

  • src/tools/task.rs:65 - Paths/sandbox behavior (Implemented: added code comment explaining intentional behavior)
  • src/tools/task.rs:27-34 - Cargo run fallback warning (Implemented: added tracing::warn for visibility)
  • Tool description - stdin limitation (Implemented: documented that subagents cannot use ask_user)
  • Output capture behavior (Skipped: intentional MVP, tracked in Add TaskOutput tool for retrieving background task results #79)
  • src/tools/mod.rs:63-89 - RwLock vs Mutex choice (Implemented: prior feedback adequately addressed; the choice is reasonable for the use case)
  • tests/common/mod.rs - #[allow(dead_code)] annotations (Prior feedback: these are genuinely used across different test files via mod common; the pattern is acceptable)

6 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The implementation is solid and addresses the linked issue #78 well. The task tool design is clean, the event-driven refactoring for tool output is well-structured, and the integration test framework provides good coverage. Previous feedback has been addressed appropriately.


Automated review by Claude Code

@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a task tool for spawning clemini subprocesses (subagents) to handle delegated work, supporting both foreground and background modes. It also refactors tool output to use AgentEvent::ToolOutput for proper event ordering, adds a library crate for integration tests with semantic validation, updates CI to run integration tests, and includes visual output improvements (replacing emoji with box-drawing characters).

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

  • src/tools/task.rs:65 - Paths/sandbox behavior (Implemented: added code comment explaining intentional behavior)
  • src/tools/task.rs:27-34 - Cargo run fallback warning (Implemented: added tracing::warn for visibility)
  • Tool description - stdin limitation (Implemented: documented that subagents cannot use ask_user)
  • Output capture behavior (Skipped: intentional MVP, tracked in Add TaskOutput tool for retrieving background task results #79)
  • src/tools/mod.rs:63-89 - RwLock vs Mutex choice (Prior feedback: reasonable for the use case)
  • tests/common/mod.rs - #[allow(dead_code)] annotations (Prior feedback: acceptable pattern for shared test utilities)

6 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The implementation is solid and fully addresses the linked issue #78. The task tool design is clean, reusing the existing BACKGROUND_TASKS registry for unified process management. The event-driven refactoring ensures correct output ordering across all UI modes. The integration test framework with semantic validation provides meaningful coverage. All prior feedback has been appropriately addressed.


Automated review by Claude Code

Makes MCP config portable across machines by using $HOME instead of
hardcoded absolute path.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a task tool for spawning clemini subprocesses (subagents) to handle delegated work, supporting both foreground and background modes. It also refactors tool output to use AgentEvent::ToolOutput for proper event ordering, creates a library crate for integration tests with semantic validation, updates CI to run integration tests, and replaces the emoji tool indicator (🔧) with box-drawing characters (┌─).

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

  • src/tools/task.rs:65 - Paths/sandbox behavior (Implemented: added code comment explaining intentional behavior - subagent operates independently)
  • src/tools/task.rs:27-34 - Cargo run fallback warning (Implemented: added tracing::warn for visibility)
  • Tool description - stdin limitation (Implemented: documented that subagents cannot use ask_user)
  • Output capture behavior (Skipped: intentional MVP, tracked in Add TaskOutput tool for retrieving background task results #79)
  • src/tools/mod.rs:63-89 - RwLock vs Mutex choice (Prior feedback: reasonable for the use case)
  • tests/common/mod.rs - #[allow(dead_code)] annotations (Prior feedback: acceptable pattern for shared test utilities used via mod common)

6 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The implementation is solid and fully addresses the linked issue #78. Key strengths:

  1. Clean task tool design - Reuses the existing BACKGROUND_TASKS registry for unified process management, allowing kill_shell to work for both bash and task processes
  2. Event-driven refactoring - Tools now emit AgentEvent::ToolOutput instead of calling log_event() directly, ensuring correct output ordering across all UI modes (Terminal, TUI, MCP)
  3. Comprehensive test coverage - New integration tests with semantic validation (validate_response_semantically()) provide meaningful coverage without brittle string assertions
  4. Well-documented limitations - Tool description clearly states that subagents cannot use interactive tools (ask_user) and have their own sandbox
  5. All prior feedback appropriately addressed - Either implemented or consciously skipped with tracked follow-up (Add TaskOutput tool for retrieving background task results #79)

Automated review by Claude Code

Machine-specific paths don't work with $HOME expansion, so keep .mcp.json
gitignored and provide an example template for users to copy.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a task tool for spawning clemini subprocesses (subagents) to handle delegated work, supporting both foreground and background modes. It also refactors tool output to use AgentEvent::ToolOutput for proper event ordering, creates a library crate for integration tests with semantic validation, updates CI to run integration tests, and replaces the emoji tool indicator (🔧) with box-drawing characters (┌─).

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

  • src/tools/task.rs:65 - Paths/sandbox behavior (Implemented: added code comment explaining intentional behavior - subagent operates independently)
  • src/tools/task.rs:27-34 - Cargo run fallback warning (Implemented: added tracing::warn for visibility)
  • Tool description - stdin limitation (Implemented: documented that subagents cannot use ask_user)
  • Output capture behavior (Skipped: intentional MVP, tracked in Add TaskOutput tool for retrieving background task results #79)
  • src/tools/mod.rs:63-89 - RwLock vs Mutex choice (Prior feedback: reasonable for the use case)
  • tests/common/mod.rs - #[allow(dead_code)] annotations (Prior feedback: acceptable pattern for shared test utilities used via mod common)

6 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The implementation is solid and fully addresses the linked issue #78. Key strengths:

  1. Clean task tool design - Reuses the existing BACKGROUND_TASKS registry for unified process management, allowing kill_shell to work for both bash and task processes
  2. Event-driven refactoring - Tools now emit AgentEvent::ToolOutput instead of calling log_event() directly, ensuring correct output ordering across all UI modes (Terminal, TUI, MCP)
  3. Comprehensive test coverage - New integration tests with semantic validation (validate_response_semantically()) provide meaningful coverage without brittle string assertions
  4. Well-documented limitations - Tool description clearly states that subagents cannot use interactive tools (ask_user) and have their own sandbox
  5. All prior feedback appropriately addressed - Either implemented or consciously skipped with tracked follow-up (Add TaskOutput tool for retrieving background task results #79)

Automated review by Claude Code

@evansenter evansenter merged commit fd0fa4e into main Jan 19, 2026
8 checks passed
@evansenter evansenter deleted the feat/task-tool branch January 19, 2026 10:53
@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR adds a task tool for spawning clemini subprocesses (subagents) to handle delegated work, supporting both foreground and background modes. It also refactors tool output to use AgentEvent::ToolOutput for proper event ordering, creates a library crate for integration tests with semantic validation, updates CI to run integration tests, and replaces the emoji tool indicator (🔧) with box-drawing characters (┌─).

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

  • src/tools/task.rs:65 - Paths/sandbox behavior (Implemented: added code comment explaining intentional behavior - subagent operates independently)
  • src/tools/task.rs:27-34 - Cargo run fallback warning (Implemented: added tracing::warn for visibility)
  • Tool description - stdin limitation (Implemented: documented that subagents cannot use ask_user)
  • Output capture behavior (Skipped: intentional MVP, tracked in Add TaskOutput tool for retrieving background task results #79)
  • src/tools/mod.rs:63-89 - RwLock vs Mutex choice (Prior feedback: reasonable for the use case)
  • tests/common/mod.rs - #[allow(dead_code)] annotations (Prior feedback: acceptable pattern for shared test utilities used via mod common)

6 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The implementation is solid and fully addresses the linked issue #78. Key strengths:

  1. Clean task tool design - Reuses the existing BACKGROUND_TASKS registry for unified process management, allowing kill_shell to work for both bash and task processes
  2. Event-driven refactoring - Tools now emit AgentEvent::ToolOutput instead of calling log_event() directly, ensuring correct output ordering across all UI modes (Terminal, TUI, MCP)
  3. Comprehensive test coverage - New integration tests with semantic validation (validate_response_semantically()) provide meaningful coverage without brittle string assertions
  4. Well-documented limitations - Tool description clearly states that subagents cannot use interactive tools (ask_user) and have their own sandbox
  5. All prior feedback appropriately addressed - Either implemented or consciously skipped with tracked follow-up (Add TaskOutput tool for retrieving background task results #79)

Automated review by Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement Task tool for spawning subagents

1 participant