Merged
26 changes: 26 additions & 0 deletions .github/workflows/rust.yml
@@ -37,6 +37,32 @@ jobs:
run: cargo nextest run --workspace
# Note: No doctests - clemini is a binary crate without a library target

test-integration:
name: Integration Tests
needs: check
runs-on: ubuntu-latest
# Only run on push to main or PRs from same repo (forks don't have secrets)
if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name == github.repository
steps:
- uses: actions/checkout@v4
- uses: dtolnay/rust-toolchain@stable
- uses: taiki-e/install-action@cargo-nextest
- uses: Swatinem/rust-cache@v2
with:
shared-key: integration
- name: Install mold linker
run: sudo apt-get update && sudo apt-get install -y mold
- name: Run integration tests
run: |
set -e
for test in confirmation_tests tool_output_tests semantic_integration_tests; do
echo "::group::Running $test"
cargo nextest run --test $test --run-ignored all
echo "::endgroup::"
done
env:
GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}

fmt:
name: Format
runs-on: ubuntu-latest
10 changes: 9 additions & 1 deletion .gitignore
@@ -17,7 +17,7 @@ Thumbs.db
.env.local
tmp/

# MCP config (local)
# MCP config (machine-specific paths)
.mcp.json

# Claude Code local settings
@@ -30,3 +30,11 @@ tmp/
*.log
error.log
output.txt

# Benchmark build artifacts
benchmark/exercises/**/.gradle/
benchmark/exercises/**/bin/
benchmark/exercises/**/node_modules/
benchmark/exercises/**/build/
benchmark/exercises/**/*.class
benchmark/exercises/**/package-lock.json
11 changes: 11 additions & 0 deletions .mcp.json.example
@@ -0,0 +1,11 @@
{
"mcpServers": {
"clemini": {
"command": "/path/to/clemini/target/release/clemini",
"args": ["--mcp-server"],
"env": {
"GEMINI_API_KEY": "${GEMINI_API_KEY}"
}
}
}
}
29 changes: 26 additions & 3 deletions CLAUDE.md
@@ -19,7 +19,8 @@ Clemini is a Gemini-powered coding CLI built with genai-rs. It's designed to be
make check # Fast type checking
make build # Debug build
make release # Release build
make test # Run tests
make test # Unit tests only (fast, no API key)
make test-all # Full suite including integration tests (requires GEMINI_API_KEY)
make clippy # Lint with warnings as errors
make fmt # Format code
make logs # Tail human-readable logs
@@ -72,7 +73,7 @@ run_interaction() UI Layer
- `McpEventHandler` (`mcp.rs`) - MCP server mode

All handlers use shared formatting functions:
- `format_tool_executing()` - Format tool executing line (`🔧 name args`)
- `format_tool_executing()` - Format tool executing line (`┌─ name args`)
- `format_tool_result()` - Format tool completion line (`└─ name duration ~tokens tok`)
- `format_tool_args()` - Format tool arguments as key=value pairs (used by format_tool_executing)
- `format_context_warning()` - Format context window warnings
@@ -113,6 +114,7 @@ Debugging: `LOUD_WIRE=1` logs all HTTP requests/responses.

## Documentation

- [docs/TOOLS.md](docs/TOOLS.md) - Tool reference, design philosophy, implementation guide
- [docs/TUI.md](docs/TUI.md) - TUI architecture (ratatui, event loop, output channels)
- [docs/TEXT_RENDERING.md](docs/TEXT_RENDERING.md) - Output formatting guidelines (colors, truncation, spacing)

@@ -141,11 +143,20 @@ Debugging: `LOUD_WIRE=1` logs all HTTP requests/responses.

Don't skip tests. If a test is flaky or legitimately broken by your change, fix the test as part of the PR.

**Integration tests** - Tests in `tests/` that require `GEMINI_API_KEY` use semantic validation:
- `confirmation_tests.rs` - Confirmation flow for destructive commands
- `tool_output_tests.rs` - Tool output events and model interpretation
- `semantic_integration_tests.rs` - Multi-turn state, error recovery, code analysis

Run locally with: `cargo test --test <name> -- --include-ignored --nocapture`

These use `validate_response_semantically()` from `tests/common/mod.rs` - a second Gemini call with structured output that judges whether responses are appropriate. This provides a middle ground between brittle string assertions and purely structural checks.
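A minimal, self-contained sketch of the judge contract (only `validate_response_semantically` comes from the codebase; the `Verdict` shape and the stub body are assumptions — the real helper makes a second Gemini call with a structured-output schema):

```rust
// Sketch only: in tests/common/mod.rs the body sends both strings to Gemini
// and parses a structured judgment; here a stub stands in for the API call.
#[derive(Debug)]
struct Verdict {
    appropriate: bool,
    reasoning: String,
}

fn validate_response_semantically(response: &str, criteria: &str) -> Verdict {
    // Stub judgment: a real judge evaluates the response against the criteria.
    let appropriate = !response.is_empty();
    Verdict {
        appropriate,
        reasoning: format!("judged against: {criteria}"),
    }
}

fn main() {
    let v = validate_response_semantically(
        "The panic is a use-after-free in the event loop.",
        "identifies the root cause of the panic",
    );
    assert!(v.appropriate);
    println!("{}", v.reasoning);
}
```

The point of the contract: tests assert on `appropriate` and surface `reasoning` in the failure message, so a rejected response explains itself.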

**Visual output changes** - Tool output formatting is centralized in `src/events.rs`:

| Change | Location |
|--------|----------|
| Tool executing format (`🔧 name...`) | `format_tool_executing()` in `events.rs` |
| Tool executing format (`┌─ name...`) | `format_tool_executing()` in `events.rs` |
| Tool result format (`└─ name...`) | `format_tool_result()` in `events.rs` |
| Tool error detail (`└─ error:...`) | `format_error_detail()` in `events.rs` |
| Tool args format (`key=value`) | `format_tool_args()` in `events.rs` |
@@ -178,6 +189,18 @@ Test visual changes by running clemini in each mode and verifying the output loo
- `TuiEventHandler` in `main.rs` (needs `AppEvent`)
- `McpEventHandler` in `mcp.rs` (needs MCP notification channel)

**Tool output via events** - Tools emit `AgentEvent::ToolOutput` for visual output, never call `log_event()` directly. This ensures correct ordering (all output flows through the event channel) and keeps tools decoupled from the UI layer. The standard `emit()` helper pattern:
```rust
fn emit(&self, output: &str) {
if let Some(tx) = &self.events_tx {
let _ = tx.try_send(AgentEvent::ToolOutput(output.to_string()));
} else {
crate::logging::log_event(output);
}
}
```
Uses `try_send` (non-blocking) to avoid stalling tools on slow consumers. The fallback to `log_event()` allows tools to work in contexts where events aren't available (e.g., direct tool tests).

### Module Responsibilities

| Module | Responsibility |
1 change: 1 addition & 0 deletions Cargo.toml
@@ -59,3 +59,4 @@ similar = "2"
[dev-dependencies]
tempfile = "3.10"
mockito = "1.2"
serial_test = "3.1"
11 changes: 9 additions & 2 deletions Makefile
@@ -1,4 +1,4 @@
.PHONY: check build release test clippy fmt logs
.PHONY: check build release test test-all clippy fmt logs

LOG_DIR = $(HOME)/.clemini/logs
LOG_FILE = $(LOG_DIR)/clemini.log.$(shell date +%Y-%m-%d)
@@ -12,8 +12,15 @@ build:
release:
cargo build --release

# Run unit tests only (fast, no API key required)
test:
cargo test
cargo test --lib
cargo test --bin clemini
cargo test --test event_ordering_tests

# Run all tests including integration tests (requires GEMINI_API_KEY)
test-all:
cargo nextest run --run-ignored all

clippy:
cargo clippy -- -D warnings
61 changes: 59 additions & 2 deletions benchmark/run.py
@@ -8,6 +8,40 @@
from pathlib import Path
from concurrent.futures import ThreadPoolExecutor, as_completed


def check_exercises_dirty():
"""Check if benchmark/exercises has uncommitted changes. Returns list of modified files."""
result = subprocess.run(
["git", "status", "--porcelain", "benchmark/exercises/"],
capture_output=True,
text=True,
)
if result.returncode != 0:
return []
# Filter to only modified tracked files (M or space+M), not untracked (??)
modified = []
for line in result.stdout.strip().split("\n"):
if line and not line.startswith("??"):
# Extract filename (after the status prefix)
modified.append(line[3:] if len(line) > 3 else line)
return [f for f in modified if f]


def reset_exercises():
"""Reset benchmark/exercises to clean state using git checkout."""
print("Resetting exercises to clean state...")
result = subprocess.run(
["git", "checkout", "--", "benchmark/exercises/"],
capture_output=True,
text=True,
)
if result.returncode != 0:
print(f"Warning: git checkout failed: {result.stderr}")
return False
print("Exercises reset successfully.")
return True


def run_clemini(prompt, cwd):
"""Call clemini via subprocess with the given prompt."""
cmd = [
@@ -117,16 +151,39 @@ def main():
parser = argparse.ArgumentParser(description="Run clemini benchmark on exercises.")
parser.add_argument("--parallel", type=int, default=2, help="Number of exercises to run in parallel.")
parser.add_argument("--time-limit", type=int, default=5, help="Time limit in minutes.")
parser.add_argument("--reset", action="store_true", help="Reset exercises to clean state before running.")
parser.add_argument("-y", "--yes", action="store_true", help="Skip confirmation prompts.")
args = parser.parse_args()

repo_root = Path(__file__).parent.parent.absolute()
os.chdir(repo_root)

base_dir = Path("benchmark/exercises")
if not base_dir.exists():
print(f"Error: {base_dir} not found. Run setup.py first.")
sys.exit(1)


# Handle reset flag
if args.reset:
reset_exercises()
else:
# Check for dirty state and warn
dirty_files = check_exercises_dirty()
if dirty_files:
print(f"\n⚠️ Warning: {len(dirty_files)} exercise file(s) have uncommitted changes:")
for f in dirty_files[:10]: # Show first 10
print(f" {f}")
if len(dirty_files) > 10:
print(f" ... and {len(dirty_files) - 10} more")
print("\nBenchmark results may be affected by previous runs.")
print("Use --reset to restore exercises to clean state.\n")

if not args.yes:
response = input("Continue anyway? [y/N] ").strip().lower()
if response not in ("y", "yes"):
print("Aborted.")
sys.exit(0)

exercises = sorted([d.name for d in base_dir.iterdir() if d.is_dir()])
random.shuffle(exercises)

10 changes: 5 additions & 5 deletions docs/TEXT_RENDERING.md
@@ -39,7 +39,7 @@ All three UI modes (Terminal, TUI, MCP) implement the `EventHandler` trait in `e

| Function | Output |
|----------|--------|
| `format_tool_executing()` | `🔧 tool_name args...` |
| `format_tool_executing()` | `┌─ tool_name args...` |
| `format_tool_result()` | `└─ tool_name 0.02s ~18 tok` |
| `format_error_detail()` | ` └─ error: message` |
| `format_tool_args()` | `key=value key2=value2` |
@@ -57,7 +57,7 @@ Uses the `colored` crate for ANSI terminal colors:
| Tool names | Cyan | `.cyan()` |
| Duration | Yellow | `.yellow()` |
| Error labels | Bright red + bold | `.bright_red().bold()` |
| Tool emoji (🔧) | Dimmed grey | `.dimmed()` |
| Tool bracket (┌─) | Dimmed grey | `.dimmed()` |
| Tool arguments | Dimmed grey | `.dimmed()` |
| Bash command/output | Dimmed grey + italic | `.dimmed().italic()` |
| Diff deletions | Red | `.red()` |
@@ -72,16 +72,16 @@ Uses the `colored` crate for ANSI terminal colors:
### Executing Line (Before Execution)

```
🔧 <tool_name> <formatted_args>
┌─ <tool_name> <formatted_args>
```

- `🔧`: Dimmed
- `┌─`: Dimmed
- `<tool_name>`: Cyan
- `<formatted_args>`: Dimmed grey, key=value pairs

Example:
```
🔧 read_file file_path="/src/main.rs"
┌─ read_file file_path="/src/main.rs"
```

### Result Line (After Execution)
29 changes: 27 additions & 2 deletions docs/TOOLS.md
@@ -222,17 +222,40 @@ Execute shell commands.
---

#### kill_shell
Kill a background bash task.
Kill a background task (bash or subagent).

**Parameters:**
| Name | Type | Required | Description |
|------|------|----------|-------------|
| task_id | string | yes | Task ID from bash with `run_in_background=true` |
| task_id | string | yes | Task ID from bash or task tool |

**Returns:** `{task_id, status, success}`

---

#### task
Spawn a clemini subagent to handle delegated work.

**Parameters:**
| Name | Type | Required | Description |
|------|------|----------|-------------|
| prompt | string | yes | The task/prompt for the subagent |
| background | boolean | no | Return immediately with a task_id (default: false) |

**Returns:** `{status, stdout, stderr, exit_code}` or `{task_id, status, prompt}` when `background=true`

**Limitations:**
- Subagent cannot use interactive tools (`ask_user`) - stdin is null
- Subagent gets its own sandbox based on cwd (does not inherit parent's `allowed_paths`)
- Background tasks are fire-and-forget (no output capture yet - see issue #79)

**Use cases:**
- Parallel work on independent subtasks
- Breaking down complex tasks for focused execution
- Long-running operations that don't need real-time output
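A hypothetical call, with illustrative argument values:

```json
{
  "prompt": "Run cargo clippy and summarize the warnings by category",
  "background": true
}
```

With `background=true` this returns immediately with a `task_id`, which can later be passed to `kill_shell`.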

---

### Interaction

#### ask_user
@@ -306,5 +329,7 @@ Fetch and optionally process a web page.
| Create new files | `write_file` | Only for new files or complete rewrites |
| Run builds/tests | `bash` | Shell commands with output capture |
| Long-running commands | `bash` + `run_in_background` | Don't block on slow operations |
| Delegate complex work | `task` | Spawn focused subagent for subtasks |
| Parallel subtasks | `task` + `background=true` | Multiple subagents working concurrently |
| Need user input | `ask_user` | Rather than guessing |
| Multi-step tasks | `todo_write` | Create todos FIRST, then work through them |