feat: expose agent as library with live integration tests by evansenter · Pull Request #74 · evansenter/clemini

evansenter · 2026-01-18T23:13:02Z

Summary

Refactors clemini from pure binary to library+binary architecture
Adds live integration tests for the confirmation flow using Gemini API
Adds semantic validation capability using Gemini structured output

Changes

src/lib.rs: New library crate exposing AgentEvent, run_interaction, CleminiToolService
src/logging.rs: Moved logging infrastructure with OutputSink trait
tests/confirmation_tests.rs: 4 live API tests for confirmation flow
tests/common/mod.rs: Shared test utilities including semantic validation

Test plan

make clippy passes
make fmt is clean
make test passes (237 tests)
Live tests pass: cargo test --test confirmation_tests -- --include-ignored --nocapture

🤖 Generated with Claude Code

- Refactor clemini from pure binary to library+binary architecture - Add src/lib.rs exposing AgentEvent, run_interaction, CleminiToolService - Move logging infrastructure to src/logging.rs with OutputSink trait - Add tests/confirmation_tests.rs with 4 live API tests: - test_destructive_command_requests_confirmation - test_confirmation_approval_executes_command - test_safe_command_no_confirmation - test_confirmation_response_is_semantic - Add semantic validation using Gemini structured output - Add serial_test dependency for global state tests Run live tests with: cargo test --test confirmation_tests -- --include-ignored Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-18T23:14:03Z

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs, enabling integration tests. It adds 4 live API tests for the confirmation flow with a semantic validation utility using Gemini structured output.

Issues Found

Critical

None

Important

None

Suggestions

src/logging.rs:187,201,220,223 - The tests use unsafe blocks for std::env::set_var and std::env::remove_var which is sound here but unconventional. Consider using a crate like temp-env or documenting why unsafe is needed (Rust 2024 made env var manipulation unsafe due to thread-safety concerns).
tests/common/mod.rs:873-876 - The semantic validator defaults to true (valid) when parsing fails. This could mask validation errors. Consider returning Err or at least logging at warn level rather than just printing.
src/main.rs:281-284 - The re-export pattern pub(crate) mod logging { pub use clemini::logging::*; } is a bit unusual. A comment explaining this is for crate::logging:: compatibility with mcp.rs would help future readers understand the indirection.
Makefile:15-17 - The make test target now runs three separate cargo test commands sequentially. This works but is slower than cargo test --all-targets which runs lib, bin, and integration tests in one invocation. Consider if the separation is intentional (e.g., for isolation) or if a single command would suffice.

Verdict

REQUEST_CHANGES - Minor suggestions for code clarity and robustness.

Automated review by Claude Code

- Drop MutexGuard before awaiting child.kill() in bash tests - Remove unnecessary let binding in init_test_logging - Collapse nested if statements using let chain Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-18T23:16:56Z

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs, enabling integration testing. It adds 4 live API tests for the confirmation flow with a semantic validation utility.

Issues Found

Critical

None

Important

None

Suggestions

src/logging.rs:187,201,220,223 - The tests use unsafe blocks for std::env::set_var and std::env::remove_var. This is correct for Rust 2024 edition but could benefit from a brief comment explaining why unsafe is needed (thread-safety concerns in stdlib). Alternatively, consider the temp-env crate for a safer abstraction.
tests/common/mod.rs:993-994 - The semantic validator defaults to true (valid) when JSON parsing fails (.unwrap_or(true)). This design choice could mask validation errors silently. Consider either returning an error, logging at warn level instead of just println, or documenting this permissive behavior as intentional to avoid blocking tests on validator issues.
src/main.rs:281-284 - The re-export pattern pub(crate) mod logging { pub use clemini::logging::*; } enables crate::logging:: access from mcp.rs. A one-line comment explaining this indirection would help future readers understand why the module wrapper exists rather than direct clemini::logging imports.
Makefile:15-17 - The make test target now runs three separate cargo test invocations sequentially (--lib, --bin clemini, --test confirmation_tests). This works but is slower than a single cargo test --all-targets. If the separation is intentional for isolation or clarity, a comment would help; otherwise, consolidating would speed up CI.

Verdict

REQUEST_CHANGES - Four suggestions for code clarity and robustness. The library extraction and test infrastructure look well-designed overall.

Automated review by Claude Code

- Add comment explaining unsafe env var usage in Rust 2024 edition - Expand re-export pattern comment for clarity - Improve semantic validator comment and use eprintln for warnings - Add Makefile comment explaining separate test invocations Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-18T23:19:40Z

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs, enabling integration tests. It adds 4 live API tests for the confirmation flow with a semantic validation utility using Gemini structured output.

Issues Found

Critical

None

Important

None

Suggestions

src/logging.rs:186-236 - The tests use unsafe blocks for std::env::set_var and std::env::remove_var. While the code is correct for Rust 2024 edition and the #[serial] attribute makes it sound, a brief comment explaining the reason for unsafe (thread-safety concerns in stdlib) was added at line 186-188 which is good. However, consider using the temp-env crate for a cleaner abstraction that handles environment variable cleanup automatically.
tests/common/mod.rs:1003 - The semantic validator defaults to true (valid) when JSON parsing fails (.unwrap_or(true)). While the comment at lines 1012-1015 explains this is intentional to avoid blocking tests on validator issues, consider logging at warn level via eprintln! consistently rather than mixing println! (line 1007) with eprintln! (line 1015). The permissive default could mask subtle validation failures.
src/main.rs:288-293 - The re-export pattern pub(crate) mod logging { pub use clemini::logging::*; } has a good comment explaining the indirection. This is a reasonable workaround for the crate:: vs clemini:: path discrepancy, but consider whether future refactoring to use clemini::logging directly in mcp.rs would be cleaner (though it would require changing many call sites).
Makefile:15-20 - The make test target now runs three separate cargo test invocations. The comment explains this is for "clearer output when one type fails" which is reasonable. However, this triples the test compilation overhead. If CI performance becomes a concern, cargo test --all-targets in a single invocation would be faster.

Verdict

REQUEST_CHANGES - Four minor suggestions for code clarity and robustness. The library extraction architecture and test infrastructure are well-designed. The changes follow project conventions appropriately.

Automated review by Claude Code

Change tool executing prefix from 🔧 to ┌─ to create visual bracket with tool result line (└─), making tool call output more structured: ┌─ bash command="ls" └─ bash 0.01s ~20 tok Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-18T23:39:18Z

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs, enabling integration testing. It adds 4 live API tests for the confirmation flow with a semantic validation utility using Gemini structured output.

Issues Found

Critical

None

Important

None

Suggestions

None - Previous feedback has been addressed:

src/logging.rs:277-279 now has a clear comment explaining why unsafe is needed for env var manipulation in Rust 2024 and why #[serial] makes it sound
src/main.rs:279-284 now has a comment explaining the re-export pattern for crate::logging:: compatibility
tests/common/mod.rs:1216-1219 includes an eprintln! warning when parse fails, making the permissive default visible
Makefile:40-41 has a comment explaining the rationale for separate test invocations

Verdict

APPROVE - Code looks good. The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow. Previous review suggestions have been addressed with appropriate comments and logging.

Automated review by Claude Code

Change tool executing format from: ┌─ (dimmed) name (cyan) args (dimmed) to: ┌─ name (cyan) args This makes the opening bracket match the closing bracket (└─) and improves readability of tool arguments. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-19T00:00:41Z

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs. It adds 4 live API integration tests for the confirmation flow with a semantic validation utility using Gemini structured output, and includes minor formatting improvements (emoji → box-drawing character for tool output).

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation (Implemented)
src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow. All previous review suggestions have been addressed with appropriate comments and error handling. The code follows project conventions and the architecture maintains the agent isolation principles documented in CLAUDE.md.

Automated review by Claude Code

Fix ordering issues where tools called log_event() directly, causing race conditions with async events. Tools now emit AgentEvent::ToolOutput events via channels, ensuring all output flows through the event system in correct order. Changes: - Add ToolOutput variant to AgentEvent - Add on_tool_output() to EventHandler trait - All 11 tools receive events_tx and emit via emit() helper - Add graceful RwLock poisoning recovery - Add 3 integration test files with semantic validation - Wire integration tests into CI with GEMINI_API_KEY - Add make test-all target for running full test suite Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-19T03:39:31Z

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs. It adds comprehensive integration tests for the confirmation flow, tool output events, and multi-turn interactions using semantic validation via Gemini structured output. Visual formatting is improved by replacing the tool emoji (🔧) with box-drawing characters (┌─).

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation (Implemented)
src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow, tool output events, and multi-turn state preservation. All previous review suggestions have been addressed. The code follows project conventions documented in CLAUDE.md.

Automated review by Claude Code

Add .gradle/, bin/, node_modules/, build/, .class files, and package-lock.json patterns for benchmark exercises. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Allow dead_code on EXTENDED_TIMEOUT constant - Remove unnecessary let binding in create_test_tool_service Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-19T03:42:11Z

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs. It adds comprehensive integration tests for the confirmation flow, tool output events, and multi-turn state preservation using semantic validation via Gemini structured output. Visual formatting is improved by replacing the tool emoji (🔧) with box-drawing characters (┌─) for cleaner output.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation in Rust 2024 (Implemented)
src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow, tool output events, and multi-turn state preservation. All previous review suggestions have been addressed. The code follows project conventions documented in CLAUDE.md, including the event-driven architecture pattern where tools emit AgentEvent::ToolOutput for visual output instead of calling log_event() directly.

Automated review by Claude Code

claude · 2026-01-19T03:43:19Z

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs. It introduces comprehensive integration tests for the confirmation flow, tool output events, semantic validation, and multi-turn state preservation. Visual output is improved by replacing the tool emoji (🔧) with box-drawing characters (┌─), and tools now emit AgentEvent::ToolOutput events for proper output ordering.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation in Rust 2024 (Implemented)
src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow, tool output events, and multi-turn state preservation. All previous review suggestions have been addressed. The code follows project conventions documented in CLAUDE.md, including the event-driven architecture pattern where tools emit AgentEvent::ToolOutput for visual output instead of calling log_event() directly. The emit() helper pattern documented in CLAUDE.md is consistently implemented across all tools.

Automated review by Claude Code

- Add --reset flag to restore exercises to clean state before running - Warn users when exercises have uncommitted changes from previous runs - Add -y/--yes flag to skip confirmation prompts - Show list of modified files when dirty state detected Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-19T03:45:12Z

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs. It introduces comprehensive integration tests with semantic validation using Gemini structured output, adds the AgentEvent::ToolOutput variant for proper event ordering, and updates visual formatting (replacing 🔧 with ┌─ for tool output). All tools now emit events through the channel instead of calling log_event() directly, ensuring correct output ordering.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation in Rust 2024 (Implemented)
src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The new AgentEvent::ToolOutput variant and emit() helper pattern follow the architecture principles documented in CLAUDE.md. Integration tests provide valuable coverage for confirmation flow, tool output events, and multi-turn state preservation. Visual formatting improvements (┌─ instead of 🔧) create cleaner output. All previous review suggestions have been addressed. The code follows Rust 2024 edition idioms and project conventions.

Automated review by Claude Code

The test_file_not_found_helpful_response test can take longer than 60s because the model may try multiple approaches (glob, read) before giving up and suggesting alternatives. Use EXTENDED_TIMEOUT (120s) to avoid flaky CI failures. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

claude · 2026-01-19T03:52:44Z

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture, exposing core functionality through src/lib.rs and src/logging.rs for integration testing. It introduces comprehensive integration tests with semantic validation using Gemini structured output, adds the AgentEvent::ToolOutput variant for proper event ordering, and updates visual formatting (replacing 🔧 with ┌─ box-drawing characters). All tools now emit events through the channel instead of calling log_event() directly, following the documented architecture principles.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation in Rust 2024 (Implemented)
src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The AgentEvent::ToolOutput variant and emit() helper pattern follow the architecture principles documented in CLAUDE.md (tool output via events, not log_event() directly). Integration tests provide valuable coverage for confirmation flow, tool output events, semantic validation, and multi-turn state preservation. Visual formatting improvements create cleaner terminal output. All previous review suggestions have been addressed.

Automated review by Claude Code

evansenter · 2026-01-22T02:08:12Z

Closing as stale - this work was implemented incrementally through PRs #80, #84, and #87. The library architecture and integration test infrastructure now exist on main.

fix: resolve clippy warnings in tests

bfab2f7

- Drop MutexGuard before awaiting child.kill() in bash tests - Remove unnecessary let binding in init_test_logging - Collapse nested if statements using let chain Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

evansenter and others added 2 commits January 19, 2026 03:40

chore: ignore benchmark build artifacts

770e0d3

Add .gradle/, bin/, node_modules/, build/, .class files, and package-lock.json patterns for benchmark exercises. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

fix: resolve clippy warnings in tests

fac6e0b

- Allow dead_code on EXTENDED_TIMEOUT constant - Remove unnecessary let binding in create_test_tool_service Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

evansenter closed this Jan 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: expose agent as library with live integration tests#74

feat: expose agent as library with live integration tests#74
evansenter wants to merge 10 commits intomainfrom
feat/library-integration-tests

evansenter commented Jan 18, 2026

Uh oh!

claude bot commented Jan 18, 2026

Uh oh!

claude bot commented Jan 18, 2026

Uh oh!

claude bot commented Jan 18, 2026

Uh oh!

claude bot commented Jan 18, 2026

Uh oh!

claude bot commented Jan 19, 2026

Uh oh!

claude bot commented Jan 19, 2026

Uh oh!

claude bot commented Jan 19, 2026

Uh oh!

claude bot commented Jan 19, 2026

Uh oh!

claude bot commented Jan 19, 2026

Uh oh!

claude bot commented Jan 19, 2026

Uh oh!

evansenter commented Jan 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

evansenter commented Jan 18, 2026

Summary

Changes

Test plan

Uh oh!

claude bot commented Jan 18, 2026

Code Review

Summary

Issues Found

Critical

Important

Suggestions

Verdict

Uh oh!

claude bot commented Jan 18, 2026

Code Review

Summary

Issues Found

Critical

Important

Suggestions

Verdict

Uh oh!

claude bot commented Jan 18, 2026

Code Review

Summary

Issues Found

Critical

Important

Suggestions

Verdict

Uh oh!

claude bot commented Jan 18, 2026

Code Review

Summary

Issues Found

Critical

Important

Suggestions

Verdict

Uh oh!

claude bot commented Jan 19, 2026

Code Review

Summary

Issues Found

Critical

Important

Suggestions

Previously Addressed (Filtered)

Verdict

Uh oh!

claude bot commented Jan 19, 2026

Code Review

Summary

Issues Found

Critical

Important

Suggestions

Previously Addressed (Filtered)

Verdict

Uh oh!

claude bot commented Jan 19, 2026

Code Review

Summary

Issues Found

Critical

Important

Suggestions

Previously Addressed (Filtered)

Verdict

Uh oh!

claude bot commented Jan 19, 2026

Code Review

Summary

Issues Found

Critical

Important

Suggestions

Previously Addressed (Filtered)