feat: expose agent as library with live integration tests #74
evansenter wants to merge 10 commits into main
- Refactor clemini from pure binary to library+binary architecture
- Add src/lib.rs exposing AgentEvent, run_interaction, CleminiToolService
- Move logging infrastructure to src/logging.rs with OutputSink trait
- Add tests/confirmation_tests.rs with 4 live API tests:
  - test_destructive_command_requests_confirmation
  - test_confirmation_approval_executes_command
  - test_safe_command_no_confirmation
  - test_confirmation_response_is_semantic
- Add semantic validation using Gemini structured output
- Add serial_test dependency for global state tests

Run live tests with:
  cargo test --test confirmation_tests -- --include-ignored

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
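To make the OutputSink idea from this commit concrete, here is a minimal sketch of how a trait-based sink could let the binary print to the terminal while tests capture output in memory. The method name `emit`, the `StdoutSink` and `CaptureSink` types, and the `log_line` function are assumptions made for illustration; they are not taken from clemini's src/logging.rs.

```rust
use std::sync::Mutex;

/// Hypothetical sink abstraction: decides where log lines end up.
pub trait OutputSink: Send + Sync {
    fn emit(&self, line: &str);
}

/// Sink a binary might install: write straight to stdout.
pub struct StdoutSink;

impl OutputSink for StdoutSink {
    fn emit(&self, line: &str) {
        println!("{line}");
    }
}

/// Sink a test might install: keep lines in memory for assertions.
#[derive(Default)]
pub struct CaptureSink {
    lines: Mutex<Vec<String>>,
}

impl CaptureSink {
    /// Return a copy of everything captured so far.
    pub fn lines(&self) -> Vec<String> {
        self.lines.lock().unwrap().clone()
    }
}

impl OutputSink for CaptureSink {
    fn emit(&self, line: &str) {
        self.lines.lock().unwrap().push(line.to_string());
    }
}

/// Library code logs through whichever sink the caller supplies.
pub fn log_line(sink: &dyn OutputSink, msg: &str) {
    sink.emit(msg);
}
```

With this shape, library code never decides where output goes: a test can pass a `CaptureSink` and assert on `lines()`, while a binary passes `StdoutSink`.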
Code Review

Summary
This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into [...]

Issues Found
Critical: None
Important: None
Suggestions: [...]

Verdict
REQUEST_CHANGES - Minor suggestions for code clarity and robustness.

Automated review by Claude Code
- Drop MutexGuard before awaiting child.kill() in bash tests
- Remove unnecessary let binding in init_test_logging
- Collapse nested if statements using let chain

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
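The first item above refers to a common async pitfall: holding a std MutexGuard across an `.await` (Clippy flags this with its `await_holding_lock` lint). The sketch below shows one way to avoid it, by taking the value out of the mutex inside a short scope. `FakeChild`, `kill_child`, and the toy `block_on` executor are hypothetical stand-ins so the example runs with the standard library alone; the actual bash tests presumably use a real child process and an async runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a spawned child process handle.
struct FakeChild {
    killed: bool,
}

impl FakeChild {
    async fn kill(&mut self) {
        self.killed = true;
    }
}

/// Take the child out of the shared slot inside a short scope, so the
/// MutexGuard is dropped before the `.await` below.
async fn kill_child(slot: &Mutex<Option<FakeChild>>) -> bool {
    let child = {
        let mut guard = slot.lock().unwrap();
        guard.take()
    }; // guard is dropped here, before any await point

    match child {
        Some(mut c) => {
            c.kill().await;
            c.killed
        }
        None => false,
    }
}

/// Waker that does nothing; enough for futures that never park.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor for this example only: polls the future until it is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}
```

Because the guard lives only inside the inner block, other tasks can lock the mutex while the kill is in flight, and the resulting future does not carry a lock guard across the await.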
Code Review

Summary
This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into [...]

Issues Found
Critical: None
Important: None
Suggestions: [...]

Verdict
REQUEST_CHANGES - Four suggestions for code clarity and robustness. The library extraction and test infrastructure look well-designed overall.

Automated review by Claude Code
- Add comment explaining unsafe env var usage in Rust 2024 edition
- Expand re-export pattern comment for clarity
- Improve semantic validator comment and use eprintln for warnings
- Add Makefile comment explaining separate test invocations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
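For context on the first item: in the Rust 2024 edition, `std::env::set_var` is an unsafe function, so call sites need an `unsafe` block and, by convention, a SAFETY comment. A minimal sketch of what such a commented call might look like follows; the helper name `set_test_env` and the variable name used below are hypothetical, not taken from the PR.

```rust
use std::env;

/// Hypothetical test helper: set an environment variable during test setup.
fn set_test_env(key: &str, value: &str) {
    // SAFETY: `env::set_var` is unsafe in the 2024 edition because changing
    // the process environment is not thread-safe on some platforms. This
    // sketch assumes the helper is called only while no other thread is
    // reading or writing the environment (for example, from serialized tests).
    unsafe {
        env::set_var(key, value);
    }
}
```

On earlier editions the same code still compiles, but the compiler warns that the `unsafe` block is unnecessary.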
Code Review

Summary
This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into [...]

Issues Found
Critical: None
Important: None
Suggestions: [...]

Verdict
REQUEST_CHANGES - Four minor suggestions for code clarity and robustness. The library extraction architecture and test infrastructure are well-designed. The changes follow project conventions appropriately.

Automated review by Claude Code
Change tool executing prefix from 🔧 to ┌─ to create visual bracket with tool result line (└─), making tool call output more structured:

  ┌─ bash command="ls"
  └─ bash 0.01s ~20 tok

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
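The bracketed layout described in this commit can be sketched with two small formatting helpers. The function names and parameter types are assumptions for illustration; only the output shape comes from the commit message.

```rust
/// Opening line printed when a tool starts executing.
fn format_tool_start(name: &str, args: &str) -> String {
    format!("┌─ {name} {args}")
}

/// Closing line printed when the tool finishes: elapsed time in seconds
/// (two decimals) and an approximate token count.
fn format_tool_result(name: &str, elapsed_secs: f64, approx_tokens: usize) -> String {
    format!("└─ {name} {elapsed_secs:.2}s ~{approx_tokens} tok")
}
```

Calling `format_tool_start("bash", "command=\"ls\"")` and `format_tool_result("bash", 0.01, 20)` reproduces the two lines shown in the commit message.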
Code Review

Summary
This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into [...]

Issues Found
Critical: None
Important: None
Suggestions: None - Previous feedback has been addressed: [...]

Verdict
APPROVE - Code looks good. The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow. Previous review suggestions have been addressed with appropriate comments and logging.

Automated review by Claude Code
Change tool executing format from:
  ┌─ (dimmed) name (cyan) args (dimmed)
to:
  ┌─ name (cyan) args

This makes the opening bracket match the closing bracket (└─) and improves readability of tool arguments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Code Review

Summary
This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into [...]

Issues Found
Critical: None
Important: None
Suggestions: None

Previously Addressed (Filtered)
Prior review rounds raised 4 suggestions that have been addressed in subsequent commits: [...]

Verdict
APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow. All previous review suggestions have been addressed with appropriate comments and error handling. The code follows project conventions and the architecture maintains the agent isolation principles documented in CLAUDE.md.

Automated review by Claude Code
Fix ordering issues where tools called log_event() directly, causing race conditions with async events. Tools now emit AgentEvent::ToolOutput events via channels, ensuring all output flows through the event system in correct order.

Changes:
- Add ToolOutput variant to AgentEvent
- Add on_tool_output() to EventHandler trait
- All 11 tools receive events_tx and emit via emit() helper
- Add graceful RwLock poisoning recovery
- Add 3 integration test files with semantic validation
- Wire integration tests into CI with GEMINI_API_KEY
- Add make test-all target for running full test suite

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
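The ordering fix described above can be illustrated with a small, simplified model: if every piece of tool output travels through the same channel as the other agent events, a single consumer sees everything in the order it was sent. The sketch below uses `std::sync::mpsc` so it runs without dependencies; the real project may well use an async channel instead. The enum fields, the handler method signatures, `Recorder`, `run_fake_tool`, and `run_and_record` are all assumptions for illustration. Only the names `AgentEvent::ToolOutput`, `on_tool_output`, `events_tx`, and `emit` appear in the commit message.

```rust
use std::sync::mpsc::{channel, Sender};

/// Simplified event type; the real AgentEvent has more variants.
#[derive(Debug)]
enum AgentEvent {
    ToolExecuting { name: String },
    ToolOutput(String),
    ToolResult { name: String },
}

/// Hypothetical handler trait including an on_tool_output hook.
trait EventHandler {
    fn on_tool_executing(&mut self, name: &str);
    fn on_tool_output(&mut self, output: &str);
    fn on_tool_result(&mut self, name: &str);
}

/// Handler that records what it sees, in arrival order.
#[derive(Default)]
struct Recorder {
    lines: Vec<String>,
}

impl EventHandler for Recorder {
    fn on_tool_executing(&mut self, name: &str) {
        self.lines.push(format!("┌─ {name}"));
    }
    fn on_tool_output(&mut self, output: &str) {
        self.lines.push(format!("  {output}"));
    }
    fn on_tool_result(&mut self, name: &str) {
        self.lines.push(format!("└─ {name}"));
    }
}

/// A tool sends its output as an event instead of logging directly.
fn emit(events_tx: &Sender<AgentEvent>, output: &str) {
    // Ignore the error if the receiving side has already gone away.
    let _ = events_tx.send(AgentEvent::ToolOutput(output.to_string()));
}

/// Hypothetical tool run: start event, two output lines, result event.
fn run_fake_tool(events_tx: &Sender<AgentEvent>) {
    let _ = events_tx.send(AgentEvent::ToolExecuting { name: "bash".into() });
    emit(events_tx, "line 1");
    emit(events_tx, "line 2");
    let _ = events_tx.send(AgentEvent::ToolResult { name: "bash".into() });
}

/// Run the fake tool, then drain the channel and dispatch each event.
fn run_and_record() -> Vec<String> {
    let (tx, rx) = channel();
    run_fake_tool(&tx);
    drop(tx); // close the channel so the loop below terminates

    let mut recorder = Recorder::default();
    for event in rx {
        match event {
            AgentEvent::ToolExecuting { name } => recorder.on_tool_executing(&name),
            AgentEvent::ToolOutput(out) => recorder.on_tool_output(&out),
            AgentEvent::ToolResult { name } => recorder.on_tool_result(&name),
        }
    }
    recorder.lines
}
```

Because there is one producer path and one consumer, the output lines always land between the opening and closing bracket lines, which is the property the direct log_event() calls could not guarantee.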
Code Review

Summary
This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into [...]

Issues Found
Critical: None
Important: None
Suggestions: None

Previously Addressed (Filtered)
Prior review rounds raised 4 suggestions that have been addressed in subsequent commits: [...]

Verdict
APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow, tool output events, and multi-turn state preservation. All previous review suggestions have been addressed. The code follows project conventions documented in CLAUDE.md.

Automated review by Claude Code
Add .gradle/, bin/, node_modules/, build/, .class files, and package-lock.json patterns for benchmark exercises.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Allow dead_code on EXTENDED_TIMEOUT constant
- Remove unnecessary let binding in create_test_tool_service

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Code Review

Summary
This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into [...]

Issues Found
Critical: None
Important: None
Suggestions: None

Previously Addressed (Filtered)
Prior review rounds raised 4 suggestions that have been addressed in subsequent commits: [...]

Verdict
APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow, tool output events, and multi-turn state preservation. All previous review suggestions have been addressed. The code follows project conventions documented in CLAUDE.md, including the event-driven architecture pattern where tools emit [...]

Automated review by Claude Code
- Add --reset flag to restore exercises to clean state before running
- Warn users when exercises have uncommitted changes from previous runs
- Add -y/--yes flag to skip confirmation prompts
- Show list of modified files when dirty state detected

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Code Review

Summary
This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into [...]

Issues Found
Critical: None
Important: None
Suggestions: None

Previously Addressed (Filtered)
Prior review rounds raised 4 suggestions that have been addressed in subsequent commits: [...]

Verdict
APPROVE - The library extraction is well-designed with clean separation of concerns. The new [...]

Automated review by Claude Code
The test_file_not_found_helpful_response test can take longer than 60s because the model may try multiple approaches (glob, read) before giving up and suggesting alternatives. Use EXTENDED_TIMEOUT (120s) to avoid flaky CI failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
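As a rough sketch of the timeout change described in this commit: a test helper can take the timeout as a parameter, so slow tests opt into the longer limit. Only `EXTENDED_TIMEOUT` and its 120s value come from the commit message. `DEFAULT_TIMEOUT` (inferred from the 60s figure) and `run_with_timeout` are hypothetical, and the thread-plus-channel mechanism is a standard-library stand-in for whatever async timeout the real tests use.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Assumed default limit, inferred from the 60s figure in the commit message.
#[allow(dead_code)]
const DEFAULT_TIMEOUT: Duration = Duration::from_secs(60);

/// Longer limit for tests where the model may try several approaches.
#[allow(dead_code)]
const EXTENDED_TIMEOUT: Duration = Duration::from_secs(120);

/// Run `work` on a helper thread and wait up to `timeout` for its result.
/// Returns None if the work has not finished in time.
fn run_with_timeout<T, F>(timeout: Duration, work: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already have given up; ignore a failed send.
        let _ = tx.send(work());
    });
    rx.recv_timeout(timeout).ok()
}
```

A slow test would then call `run_with_timeout(EXTENDED_TIMEOUT, ...)`, while the rest keep the default. Note that this sketch does not cancel the helper thread when the deadline passes; it only stops waiting for it.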
Code Review

Summary
This PR refactors clemini from a pure binary to a library+binary architecture, exposing core functionality through [...]

Issues Found
Critical: None
Important: None
Suggestions: None

Previously Addressed (Filtered)
Prior review rounds raised 4 suggestions that have been addressed in subsequent commits: [...]

Verdict
APPROVE - The library extraction is well-designed with clean separation of concerns. The [...]

Automated review by Claude Code
Summary
Changes
- AgentEvent, run_interaction, CleminiToolService
- OutputSink trait

Test plan
- make clippy passes
- make fmt is clean
- make test passes (237 tests)
- cargo test --test confirmation_tests -- --include-ignored --nocapture

🤖 Generated with Claude Code