Skip to content

feat: expose agent as library with live integration tests#74

Closed
evansenter wants to merge 10 commits intomainfrom
feat/library-integration-tests
Closed

feat: expose agent as library with live integration tests#74
evansenter wants to merge 10 commits intomainfrom
feat/library-integration-tests

Conversation

@evansenter
Copy link
Copy Markdown
Owner

Summary

  • Refactors clemini from pure binary to library+binary architecture
  • Adds live integration tests for the confirmation flow using Gemini API
  • Adds semantic validation capability using Gemini structured output

Changes

  • src/lib.rs: New library crate exposing AgentEvent, run_interaction, CleminiToolService
  • src/logging.rs: Moved logging infrastructure with OutputSink trait
  • tests/confirmation_tests.rs: 4 live API tests for confirmation flow
  • tests/common/mod.rs: Shared test utilities including semantic validation

Test plan

  • make clippy passes
  • make fmt is clean
  • make test passes (237 tests)
  • Live tests pass: cargo test --test confirmation_tests -- --include-ignored --nocapture

🤖 Generated with Claude Code

- Refactor clemini from pure binary to library+binary architecture
- Add src/lib.rs exposing AgentEvent, run_interaction, CleminiToolService
- Move logging infrastructure to src/logging.rs with OutputSink trait
- Add tests/confirmation_tests.rs with 4 live API tests:
  - test_destructive_command_requests_confirmation
  - test_confirmation_approval_executes_command
  - test_safe_command_no_confirmation
  - test_confirmation_response_is_semantic
- Add semantic validation using Gemini structured output
- Add serial_test dependency for global state tests

Run live tests with: cargo test --test confirmation_tests -- --include-ignored

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 18, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs, enabling integration tests. It adds 4 live API tests for the confirmation flow with a semantic validation utility using Gemini structured output.

Issues Found

Critical

None

Important

None

Suggestions

  • src/logging.rs:187,201,220,223 - The tests use unsafe blocks for std::env::set_var and std::env::remove_var which is sound here but unconventional. Consider using a crate like temp-env or documenting why unsafe is needed (Rust 2024 made env var manipulation unsafe due to thread-safety concerns).

  • tests/common/mod.rs:873-876 - The semantic validator defaults to true (valid) when parsing fails. This could mask validation errors. Consider returning Err or at least logging at warn level rather than just printing.

  • src/main.rs:281-284 - The re-export pattern pub(crate) mod logging { pub use clemini::logging::*; } is a bit unusual. A comment explaining this is for crate::logging:: compatibility with mcp.rs would help future readers understand the indirection.

  • Makefile:15-17 - The make test target now runs three separate cargo test commands sequentially. This works but is slower than cargo test --all-targets which runs lib, bin, and integration tests in one invocation. Consider if the separation is intentional (e.g., for isolation) or if a single command would suffice.

Verdict

REQUEST_CHANGES - Minor suggestions for code clarity and robustness.


Automated review by Claude Code

- Drop MutexGuard before awaiting child.kill() in bash tests
- Remove unnecessary let binding in init_test_logging
- Collapse nested if statements using let chain

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 18, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs, enabling integration testing. It adds 4 live API tests for the confirmation flow with a semantic validation utility.

Issues Found

Critical

None

Important

None

Suggestions

  • src/logging.rs:187,201,220,223 - The tests use unsafe blocks for std::env::set_var and std::env::remove_var. This is correct for Rust 2024 edition but could benefit from a brief comment explaining why unsafe is needed (thread-safety concerns in stdlib). Alternatively, consider the temp-env crate for a safer abstraction.

  • tests/common/mod.rs:993-994 - The semantic validator defaults to true (valid) when JSON parsing fails (.unwrap_or(true)). This design choice could mask validation errors silently. Consider either returning an error, logging at warn level instead of just println, or documenting this permissive behavior as intentional to avoid blocking tests on validator issues.

  • src/main.rs:281-284 - The re-export pattern pub(crate) mod logging { pub use clemini::logging::*; } enables crate::logging:: access from mcp.rs. A one-line comment explaining this indirection would help future readers understand why the module wrapper exists rather than direct clemini::logging imports.

  • Makefile:15-17 - The make test target now runs three separate cargo test invocations sequentially (--lib, --bin clemini, --test confirmation_tests). This works but is slower than a single cargo test --all-targets. If the separation is intentional for isolation or clarity, a comment would help; otherwise, consolidating would speed up CI.

Verdict

REQUEST_CHANGES - Four suggestions for code clarity and robustness. The library extraction and test infrastructure look well-designed overall.


Automated review by Claude Code

- Add comment explaining unsafe env var usage in Rust 2024 edition
- Expand re-export pattern comment for clarity
- Improve semantic validator comment and use eprintln for warnings
- Add Makefile comment explaining separate test invocations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 18, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs, enabling integration tests. It adds 4 live API tests for the confirmation flow with a semantic validation utility using Gemini structured output.

Issues Found

Critical

None

Important

None

Suggestions

  • src/logging.rs:186-236 - The tests use unsafe blocks for std::env::set_var and std::env::remove_var. While the code is correct for Rust 2024 edition and the #[serial] attribute makes it sound, a brief comment explaining the reason for unsafe (thread-safety concerns in stdlib) was added at line 186-188 which is good. However, consider using the temp-env crate for a cleaner abstraction that handles environment variable cleanup automatically.

  • tests/common/mod.rs:1003 - The semantic validator defaults to true (valid) when JSON parsing fails (.unwrap_or(true)). While the comment at lines 1012-1015 explains this is intentional to avoid blocking tests on validator issues, consider logging at warn level via eprintln! consistently rather than mixing println! (line 1007) with eprintln! (line 1015). The permissive default could mask subtle validation failures.

  • src/main.rs:288-293 - The re-export pattern pub(crate) mod logging { pub use clemini::logging::*; } has a good comment explaining the indirection. This is a reasonable workaround for the crate:: vs clemini:: path discrepancy, but consider whether future refactoring to use clemini::logging directly in mcp.rs would be cleaner (though it would require changing many call sites).

  • Makefile:15-20 - The make test target now runs three separate cargo test invocations. The comment explains this is for "clearer output when one type fails" which is reasonable. However, this triples the test compilation overhead. If CI performance becomes a concern, cargo test --all-targets in a single invocation would be faster.

Verdict

REQUEST_CHANGES - Four minor suggestions for code clarity and robustness. The library extraction architecture and test infrastructure are well-designed. The changes follow project conventions appropriately.


Automated review by Claude Code

Change tool executing prefix from 🔧 to ┌─ to create visual
bracket with tool result line (└─), making tool call output
more structured:

┌─ bash command="ls"
└─ bash 0.01s ~20 tok

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 18, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs, enabling integration testing. It adds 4 live API tests for the confirmation flow with a semantic validation utility using Gemini structured output.

Issues Found

Critical

None

Important

None

Suggestions

None - Previous feedback has been addressed:

  • src/logging.rs:277-279 now has a clear comment explaining why unsafe is needed for env var manipulation in Rust 2024 and why #[serial] makes it sound
  • src/main.rs:279-284 now has a comment explaining the re-export pattern for crate::logging:: compatibility
  • tests/common/mod.rs:1216-1219 includes an eprintln! warning when parse fails, making the permissive default visible
  • Makefile:40-41 has a comment explaining the rationale for separate test invocations

Verdict

APPROVE - Code looks good. The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow. Previous review suggestions have been addressed with appropriate comments and logging.


Automated review by Claude Code

Change tool executing format from:
  ┌─ (dimmed) name (cyan) args (dimmed)
to:
  ┌─ name (cyan) args

This makes the opening bracket match the closing bracket (└─)
and improves readability of tool arguments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs. It adds 4 live API integration tests for the confirmation flow with a semantic validation utility using Gemini structured output, and includes minor formatting improvements (emoji → box-drawing character for tool output).

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

  • src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation (Implemented)
  • src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
  • tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
  • Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

4 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow. All previous review suggestions have been addressed with appropriate comments and error handling. The code follows project conventions and the architecture maintains the agent isolation principles documented in CLAUDE.md.


Automated review by Claude Code

Fix ordering issues where tools called log_event() directly, causing
race conditions with async events. Tools now emit AgentEvent::ToolOutput
events via channels, ensuring all output flows through the event system
in correct order.

Changes:
- Add ToolOutput variant to AgentEvent
- Add on_tool_output() to EventHandler trait
- All 11 tools receive events_tx and emit via emit() helper
- Add graceful RwLock poisoning recovery
- Add 3 integration test files with semantic validation
- Wire integration tests into CI with GEMINI_API_KEY
- Add make test-all target for running full test suite

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs. It adds comprehensive integration tests for the confirmation flow, tool output events, and multi-turn interactions using semantic validation via Gemini structured output. Visual formatting is improved by replacing the tool emoji (🔧) with box-drawing characters (┌─).

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

  • src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation (Implemented)
  • src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
  • tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
  • Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

4 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow, tool output events, and multi-turn state preservation. All previous review suggestions have been addressed. The code follows project conventions documented in CLAUDE.md.


Automated review by Claude Code

evansenter and others added 2 commits January 19, 2026 03:40
Add .gradle/, bin/, node_modules/, build/, .class files, and
package-lock.json patterns for benchmark exercises.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Allow dead_code on EXTENDED_TIMEOUT constant
- Remove unnecessary let binding in create_test_tool_service

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs. It adds comprehensive integration tests for the confirmation flow, tool output events, and multi-turn state preservation using semantic validation via Gemini structured output. Visual formatting is improved by replacing the tool emoji (🔧) with box-drawing characters (┌─) for cleaner output.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

  • src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation in Rust 2024 (Implemented)
  • src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
  • tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
  • Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

4 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow, tool output events, and multi-turn state preservation. All previous review suggestions have been addressed. The code follows project conventions documented in CLAUDE.md, including the event-driven architecture pattern where tools emit AgentEvent::ToolOutput for visual output instead of calling log_event() directly.


Automated review by Claude Code

@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs. It introduces comprehensive integration tests for the confirmation flow, tool output events, semantic validation, and multi-turn state preservation. Visual output is improved by replacing the tool emoji (🔧) with box-drawing characters (┌─), and tools now emit AgentEvent::ToolOutput events for proper output ordering.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

  • src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation in Rust 2024 (Implemented)
  • src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
  • tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
  • Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

4 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The integration tests provide valuable coverage for the confirmation flow, tool output events, and multi-turn state preservation. All previous review suggestions have been addressed. The code follows project conventions documented in CLAUDE.md, including the event-driven architecture pattern where tools emit AgentEvent::ToolOutput for visual output instead of calling log_event() directly. The emit() helper pattern documented in CLAUDE.md is consistently implemented across all tools.


Automated review by Claude Code

- Add --reset flag to restore exercises to clean state before running
- Warn users when exercises have uncommitted changes from previous runs
- Add -y/--yes flag to skip confirmation prompts
- Show list of modified files when dirty state detected

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture by extracting core functionality into src/lib.rs and src/logging.rs. It introduces comprehensive integration tests with semantic validation using Gemini structured output, adds the AgentEvent::ToolOutput variant for proper event ordering, and updates visual formatting (replacing 🔧 with ┌─ for tool output). All tools now emit events through the channel instead of calling log_event() directly, ensuring correct output ordering.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

  • src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation in Rust 2024 (Implemented)
  • src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
  • tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
  • Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

4 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The new AgentEvent::ToolOutput variant and emit() helper pattern follow the architecture principles documented in CLAUDE.md. Integration tests provide valuable coverage for confirmation flow, tool output events, and multi-turn state preservation. Visual formatting improvements (┌─ instead of 🔧) create cleaner output. All previous review suggestions have been addressed. The code follows Rust 2024 edition idioms and project conventions.


Automated review by Claude Code

The test_file_not_found_helpful_response test can take longer than
60s because the model may try multiple approaches (glob, read) before
giving up and suggesting alternatives. Use EXTENDED_TIMEOUT (120s) to
avoid flaky CI failures.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude bot commented Jan 19, 2026

Prompt: evansenter/dotfiles/.../claude-review.md

Code Review

Summary

This PR refactors clemini from a pure binary to a library+binary architecture, exposing core functionality through src/lib.rs and src/logging.rs for integration testing. It introduces comprehensive integration tests with semantic validation using Gemini structured output, adds the AgentEvent::ToolOutput variant for proper event ordering, and updates visual formatting (replacing 🔧 with ┌─ box-drawing characters). All tools now emit events through the channel instead of calling log_event() directly, following the documented architecture principles.

Issues Found

Critical

None

Important

None

Suggestions

None

Previously Addressed (Filtered)

Prior review rounds raised 4 suggestions that have been addressed in subsequent commits:

  • src/logging.rs:277-279 - Comment explaining unsafe for env var manipulation in Rust 2024 (Implemented)
  • src/main.rs:279-284 - Comment explaining re-export pattern for crate::logging:: compatibility (Implemented)
  • tests/common/mod.rs:1216-1219 - eprintln! warning when semantic validator parse fails (Implemented)
  • Makefile:40-41 - Comment explaining rationale for separate test invocations (Implemented)

4 items from prior feedback rounds were not re-raised.

Verdict

APPROVE - The library extraction is well-designed with clean separation of concerns. The AgentEvent::ToolOutput variant and emit() helper pattern follow the architecture principles documented in CLAUDE.md (tool output via events, not log_event() directly). Integration tests provide valuable coverage for confirmation flow, tool output events, semantic validation, and multi-turn state preservation. Visual formatting improvements create cleaner terminal output. All previous review suggestions have been addressed.


Automated review by Claude Code

@evansenter
Copy link
Copy Markdown
Owner Author

Closing as stale - this work was implemented incrementally through PRs #80, #84, and #87. The library architecture and integration test infrastructure now exist on main.

@evansenter evansenter closed this Jan 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant