Status: Draft
Every piece of context that enters the agent's prompt must be explicit and traceable. No hidden globals, no implicit merging of instruction files from parent directories.
- Local-only contexts. No global `agents.md`, `CLAUDE.md`, or similar files injected behind the scenes. The agent sees exactly what's in its config and project directory. Nothing more.
- Context merging is explicit. When multiple context sources exist (system prompt, project instructions, skills), the merge order and result must be inspectable. No magic layering.
Extensibility through skills — structured instructions that expand the agent's knowledge, not its runtime.
A skill is a SKILL.md file with YAML frontmatter (name, description, triggers) and a markdown body (workflows, rules, references). The agent loads a skill into its context when a trigger matches, then executes the task using its existing tools (bash, read, write). No new runtime capabilities needed — skills teach the agent how to use what it already has.
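A hypothetical SKILL.md to make the shape concrete (the `git-release` skill, its triggers, and its workflow are invented for illustration; only the frontmatter keys `name`, `description`, and `triggers` come from this spec):

```markdown
---
name: git-release
description: Assemble release notes from commit history
triggers: ["release notes", "changelog"]
---

## Workflow
1. Run `git log --oneline <last-tag>..HEAD` with the bash tool.
2. Group commits by prefix (feat, fix, chore).
3. Write a dated section to CHANGELOG.md with the write tool.
```

Everything below the frontmatter stays out of the context window until a trigger matches.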
This replaces MCP (Model Context Protocol), which solves the same problem — extensibility — through a runtime protocol layer with dynamic discovery, handshake, and invocation. MCP adds indirection and opacity. Skills are just documents:
- No protocol layer. A skill is a file, not a service. No discovery, no handshake, no transport.
- Progressive disclosure. Frontmatter (~100 words) is always visible for triggering. Body loads only when triggered. Bundled references/scripts load on demand. Context window is not wasted.
- Opt-in via project config. Global skills (`~/.config/agent/skills/`) exist but are never loaded unless explicitly listed in the project config. If it's not in the config, it doesn't exist.
- Inspectable. You can read every skill the agent has access to. The merge order is explicit. No magic.
- Project-level context assembly. The project spec defines not just which skills are loaded, but how their contexts compose — order, priority, which parts of a skill's body are included or excluded. The agent runtime reads this spec and assembles the final context accordingly. In Codex CLI / Claude Code, this assembly happens inside the runtime opaquely. Here, the project owns the recipe.
Skills can bundle scripts and reference docs, but these are executed by existing tools (bash runs a script, read loads a reference), not by a plugin runtime.
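A sketch of what such a project spec might look like (every field name here is invented; the spec only requires that order, priority, and inclusion/exclusion be expressible):

```json
{
  "skills": [
    { "path": "~/.config/agent/skills/git-release", "priority": 1 },
    { "path": "./skills/deploy", "priority": 2, "exclude": ["references"] }
  ]
}
```

The runtime would read this, load each SKILL.md in priority order, and drop the excluded sections before assembling the final context.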
Current sub-agent implementations (Claude Code, Cursor, etc.) create opaque systems — you can't see what a sub-agent is doing, what context it has, or why it made a decision. The root TUI has no visibility into child agents.
We don't have a solution yet, but the constraint is clear:
- Sub-agent context, state, and progress must be visible from the root TUI
- No fire-and-forget agents running in invisible contexts
- The user must be able to inspect any agent's full context at any time
This needs rethinking before implementation. Not just "not in MVP" — architecturally unsolved.
A compiled Go binary that implements a ReAct agent loop with pluggable tools and a configurable LLM endpoint.
- ReAct loop: send -> stream response -> tool calls -> execute -> send results -> repeat
- OpenAI Responses API (single transport, replaceable base URL)
- Streaming: SSE-based, parsed manually (~30 LOC)
- Roles: system, developer, user
- System prompt: configurable via file, flag, or env (not hardcoded)
- Pluggable tools: Tool interface + registry, register at startup
- Initial tools: bash, read, write (edit later)
- TUI mode: Bubbletea (input bottom, chat top, streaming)
- Headless mode: plain stdout (print mode)
- Config: single JSON file + CLI flags + env vars, priority: flag > env > file > defaults
- Sessions: JSONL append-only persistence, resume support
- Compaction: not implemented in MVP but architecture supports it (conversation state designed for it)
- Multiple LLM providers (only OpenAI Responses API shape for now)
- Image input
- Permission system (yolo mode for now)
- OAuth
- Web UI
- Skills (format is defined, runtime loading is deferred — see Philosophy)
- MCP — skills achieve the same goal with less indirection (see Philosophy)
- Global context injection — violates Full Context Control (see Philosophy)
- Opaque sub-agents — violates Transparent Orchestration (see Philosophy)
The loop emits events into a channel. TUI and print mode are different consumers.
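A sketch of the print-mode consumer under that design. The event type names follow the spec; `PrintConsumer` and its return-a-string shape are illustrative, since the real headless mode would stream to stdout.

```go
package main

// Mirrors the spec's event set, trimmed to what this sketch needs.
type Event interface{ eventMarker() }

type EventText struct{ Delta string }
type EventDone struct{ StopReason string }

func (EventText) eventMarker() {}
func (EventDone) eventMarker() {}

// PrintConsumer drains the loop's event channel and concatenates text
// deltas; the TUI would be a different consumer of the same channel.
func PrintConsumer(events <-chan Event) string {
	var out string
	for ev := range events {
		switch e := ev.(type) {
		case EventText:
			out += e.Delta
		case EventDone:
			return out
		}
	}
	return out
}
```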
```
cmd/agent/main.go        # entry, flags, mode
internal/
  config/
    config.go            # Config struct, load from file/env/flags
  llm/
    client.go            # OpenAI Responses API client (raw HTTP + SSE)
    types.go             # Message, ToolCall, ToolResult, Response
  agent/
    loop.go              # ReAct cycle, emits events to channel
    conversation.go      # Message history, system prompt management
    event.go             # Event types (EventText, EventToolCall, etc.)
  tool/
    registry.go          # Tool interface + registry
    bash.go
    read.go
    write.go
  session/
    writer.go            # JSONL append-only writer
    reader.go            # JSONL reader for session resume
  ui/
    tui.go               # bubbletea Model
    print.go             # headless stdout mode
```
```go
type Event interface{ eventMarker() }

type EventText struct{ Delta string }
type EventToolCall struct {
	ID, Name string
	Args     json.RawMessage
}
type EventToolResult struct {
	ID      string
	Output  string
	IsError bool
}
type EventDone struct{ StopReason string }
type EventError struct{ Err error }
```

```go
type Tool struct {
	Name        string
	Description string
	Schema      json.RawMessage
	Execute     func(ctx context.Context, args json.RawMessage) (string, error)
}

type Registry struct {
	tools map[string]Tool
}
```

No magic, no plugins, no dynamic loading. Structs in a map, registered at startup.
```go
type Config struct {
	BaseURL          string `json:"base_url"`
	APIKey           string `json:"api_key"`
	Model            string `json:"model"`
	SystemPrompt     string `json:"system_prompt"`
	SystemPromptFile string `json:"system_prompt_file"`
	MaxTokens        int    `json:"max_tokens"`
}
```

Priority: CLI flag > env > config file (`~/.config/agent/config.json`) > defaults.
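The priority chain is just first-non-empty-wins per setting; a sketch (the `resolve` helper is illustrative, not part of the spec):

```go
package main

// resolve applies the spec's priority order for one string setting:
// CLI flag > environment variable > config file > default.
func resolve(flagVal, envVal, fileVal, def string) string {
	for _, v := range []string{flagVal, envVal, fileVal} {
		if v != "" {
			return v
		}
	}
	return def
}
```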
```go
type Entry struct {
	Type      string          `json:"type"`
	Timestamp time.Time       `json:"timestamp"`
	Role      string          `json:"role,omitempty"`
	Content   string          `json:"content,omitempty"`
	Raw       json.RawMessage `json:"raw,omitempty"`
}
```

Append-only. One file = one session. Crash-safe (lose at most the last line).
Storage: `~/.config/agent/sessions/<timestamp>-<short-id>.jsonl`
```go
type Conversation struct {
	SystemPrompt string
	Messages     []Message
}
```

System prompt always preserved separately. When compaction comes:
- Count tokens (heuristic: `len/4`; later tiktoken-go)
- Summarize old messages via LLM call
- Replace N messages with one summary message
- Append a compaction entry to the JSONL
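Those steps can be sketched as pure functions, with the LLM summarization stubbed out (`estimateTokens` and `compact` are illustrative names, not the spec's API):

```go
package main

type Message struct {
	Role    string
	Content string
}

// estimateTokens is the spec's len/4 heuristic (tiktoken-go later).
func estimateTokens(msgs []Message) int {
	n := 0
	for _, m := range msgs {
		n += len(m.Content) / 4
	}
	return n
}

// compact replaces everything except the last `keep` messages with one
// summary message; `summarize` is an LLM call in the real system.
func compact(msgs []Message, keep int, summarize func([]Message) string) []Message {
	if len(msgs) <= keep {
		return msgs
	}
	old := msgs[:len(msgs)-keep]
	out := []Message{{Role: "system", Content: "Summary of earlier conversation: " + summarize(old)}}
	return append(out, msgs[len(msgs)-keep:]...)
}
```

The system prompt never enters `Messages`, so compaction can never summarize it away.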
| Package | Purpose | When |
|---|---|---|
| stdlib net/http | HTTP client | Step 1 |
| stdlib encoding/json | JSON | Step 1 |
| stdlib bufio | SSE parsing | Step 1 |
| stdlib os | File I/O, config | Step 1 |
| charmbracelet/bubbletea | TUI | Step 6 |
| charmbracelet/lipgloss | TUI styles | Step 6 |
| charmbracelet/glamour | Markdown rendering | Step 6 |
No LLM SDKs. Raw HTTP. SSE parsed manually.
Step 1: config + llm/types + llm/client
Config loading, types, streaming HTTP client
Test: send message, receive streamed text
Step 2: tool/registry + tool/bash
One tool, registration, JSON Schema
Step 3: agent/loop + agent/conversation + session/writer
ReAct loop + conversation state + JSONL persistence
Test: "list files" -> bash ls -> response
Step 4: ui/print + cmd/agent/main
Headless mode, working CLI with persistence
**MILESTONE: working agent in one binary**
Step 5: tool/read + tool/write + tool/edit
File operations
Step 6: ui/tui
Bubbletea: input bottom, chat top, streaming
**MILESTONE: full TUI agent**
After step 4: ~800-1000 LOC, working agent. After step 6: ~1500-2000 LOC, full TUI agent. Binary size: ~8-12MB.
In rough priority order:
- Compaction — token counting + LLM summarization
- Skills — SKILL.md loading by triggers, opt-in via project config, progressive disclosure
- Edit tool — diff-based file editing
- Multiple providers — Anthropic Messages API, etc.
- Transparent sub-agents — requires design work (see Philosophy)
| Project | What to learn |
|---|---|
| CCX-Go (10K LOC) | Cleanest minimal agent loop, raw HTTP to Claude, Bubbletea TUI |
| Workshop (1.9K LOC) | Progressive build, 6 files, educational |
| OpenCode/Crush (42K LOC) | Production-grade architecture, LSP, sessions |
| Goat (48K LOC) | Embeddable library design, 22 tools, permissions |
| Pi (102K LOC TS) | Original inspiration: extensions, compaction, session format |