grinnan/cos


CoS -- Chief of Staff

A self-hosted, local-first personal AI coordination system that gives every LLM agent on your machine -- Claude Code sessions, Ollama agents, scripts importing the cos package -- a shared memory, a shared work queue, and a shared awareness of every other agent. A background daemon harvests session transcripts, runs specialist agents against them, and feeds the results back into the same database that the next session will read at startup.

CoS is for developers who already use multiple AI coding agents and want them to remember decisions, share context, and coordinate work without relying on a cloud SaaS. Everything runs against your own PostgreSQL + pgvector database and your own Ollama instance. Cloud LLMs are an optional fallback, not the default.

How is this different from CrewAI / AutoGen / mem0 / MemGPT / Letta?

| System | Local-first | Cross-session memory | Background daemon | Multi-agent swarm | Self-expanding | Harvests existing agent transcripts |
|---|---|---|---|---|---|---|
| CoS | Yes (Ollama) | Yes (pgvector) | Yes | Yes | Yes (TriggerAgentDesigner) | Yes (Claude Code JSONL) |
| CrewAI | Any model | Partial | No | Yes (roles) | No | No |
| AutoGen / MS Agent Framework | Azure-first | Partial | No | Yes | No | No |
| LangGraph | Any model | Tiered | No | Yes (graph) | No | No |
| mem0 | Yes | Yes (LLM facts) | No | No | No | No |
| Letta / MemGPT | Yes | Yes (tiered) | No (agent-driven) | Limited | Partial | No |
| Zep / Graphiti | Self-host | Yes (temporal KG) | No | No | No | No |
| Cognee | Yes | Yes (vector+graph) | No (library) | No | Partial | No |
| Khoj | Yes | Document index | No | No | No | No |
| Claude Memory Tool | Client-side | File-based | No | No | No | No |

The two distinguishing properties are the always-on daemon that harvests Claude Code session JSONL transcripts and feeds them to specialist agents, and the self-expanding agent taxonomy -- when the system encounters a session pattern it doesn't recognize, it can code-generate, register, and hot-load a new specialist agent at runtime. No other tool in the survey combines those with a swarm registry and an urgency-tiered persistent agenda. (See COS_COMPARISON.md for the full survey of 40+ peer systems.)

Introduction

What is CoS?

CoS acts as a persistent "Chief of Staff" layer that sits between you and your AI agents. In a typical workflow, you might have several Claude Code sessions open across different projects, an Ollama-based agent running a code review, and a background process ingesting meeting notes -- all at the same time. Without coordination, each of these agents operates in isolation: they don't know what the others are doing, they can't share context, and decisions made in one session are invisible to the rest.

CoS solves this by providing:

  • Shared memory -- Every agent writes to and reads from the same semantic search database. A decision recorded in a morning Claude Code session is instantly recallable by an afternoon Ollama agent working on a different project.
  • A persistent work queue -- Agenda items survive across sessions. You can add "deploy v2.1" at 9am, and whichever agent you open at 3pm will see it waiting.
  • Swarm awareness -- Every active agent registers itself. Agents can see who else is running, what they're working on, and send messages to each other.
  • Autonomous background processing -- A daemon continuously harvests session transcripts, extracts decisions and action items, discovers dropped ideas, and surfaces them as agenda items -- all using local LLM inference to keep costs near zero.
  • Intelligent LLM routing -- Rather than always calling a cloud API, CoS routes each task to the best available model. Embedding and triage run locally on Ollama. Complex reasoning falls back to cloud only when needed.

When is CoS useful?

You're juggling multiple projects with AI agents. You have Claude Code open in three terminals -- one for a backend service, one for a frontend, one for infrastructure. A decision in the backend session ("we're switching from REST to gRPC") should be visible when the frontend agent asks about API integration. CoS makes this automatic: the daemon's SessionHarvestWorker picks up the transcript, the DecisionRecorder agent extracts the decision, and it becomes searchable via cos recall "API protocol" from any session.

You lose track of what was discussed and decided. After a long coding session, it's easy to forget the three minor decisions you made, the two ideas you deferred, and the one bug you noticed but didn't fix. CoS's specialist agents -- DecisionRecorder, VisionKeeperExpert, ActionItemExtractor, BugRecorder -- run automatically against your session transcripts and surface everything as structured, searchable records.

You want AI agents to work autonomously in the background. From the web dashboard, you can spawn pre-configured agents to run code reviews, fix failing tests, run linters, perform security audits, or write documentation -- all without occupying a terminal session. The agent task queue coordinates everything.

You want to minimize cloud API costs. CoS's routing layer prefers local Ollama models for everything they can handle well (embedding, classification, summarization, simple extraction) and only falls back to cloud APIs for tasks that genuinely require it. On typical hardware (M3 Max, 64GB), this reduces cloud API spend by roughly 95% compared to routing everything through Claude or GPT.

You want a morning briefing that knows what happened yesterday. Run cos brief and get a formatted summary of your priorities, open agenda items, active projects, and what the swarm has been doing -- all pulled from the database, not regenerated from scratch.

How does it work in practice?

Here's a typical day with CoS running:

  1. 8:00am -- You open a Claude Code session. CoS bootstraps automatically: registers the session as a boss, checks for messages from other agents, and shows you any urgent agenda items and hot items from overnight processing.

  2. 8:05am -- You start coding. In the background, the daemon is already processing last night's sessions. The TranscriptAnomalyExpert found a spec discussion in yesterday's evening session and queued a SpecFileExpert, which extracted 4 action items and registered a formal spec document.

  3. 10:30am -- You finish a feature and open a new session in a different project. CoS shows you the 4 action items from the spec as this_week urgency. You also see that Boss B (an Ollama agent running a security audit) finished and found 2 medium-severity issues.

  4. 2:00pm -- From the web dashboard, you spawn a "fix tests" agent against the morning project. It claims the task from the queue, runs the test suite, fixes 3 failures, and marks the task complete.

  5. 5:00pm -- You run cos brief before shutting down. It shows you what was accomplished, what's still open, and what the VisionKeeperExpert surfaced as a dropped idea worth revisiting.

The Trigger Pipeline

One of CoS's most powerful features is its self-expanding trigger pipeline. When the TranscriptAnomalyExpert encounters a conversation pattern it doesn't recognize, it proposes a new trigger type. The TriggerReviewExpert evaluates whether it's genuinely new or a duplicate. If it's new, the TriggerAgentDesigner writes a new specialist agent -- including the Python code, the system prompt, and the database registration -- and hot-loads it into the running daemon. The system literally grows new capabilities by observing your work.

```mermaid
flowchart LR
    SESSION["New session<br/>transcript"] --> TAE["TranscriptAnomaly<br/>Expert"]
    TAE -->|"known trigger"| SPECIALIST["Existing specialist<br/>(SpecFileExpert,<br/>DecisionRecorder, etc.)"]
    TAE -->|"unknown pattern"| PROPOSE["Propose new<br/>trigger type"]
    PROPOSE --> TRE["TriggerReview<br/>Expert"]
    TRE -->|"duplicate/noise"| REJECT["Rejected"]
    TRE -->|"genuinely new"| TAD["TriggerAgent<br/>Designer"]
    TAD --> NEW["New agent written,<br/>registered, and<br/>hot-loaded"]

    style TAE fill:#4a4a6a,color:#fff
    style TAD fill:#2d6a4f,color:#fff
    style NEW fill:#2d6a4f,color:#fff
```
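To make the hot-loading step concrete, here is a minimal sketch of how generated agent source could be compiled and registered into a live registry at runtime. This is illustrative only: the class name convention, the module naming, and the `registry` dict are assumptions, and the real TriggerAgentDesigner also persists the code and a database registration.

```python
import types

def hot_load_agent(name: str, source: str, registry: dict) -> type:
    """Compile generated agent source and register the class in a live registry.

    Sketch of the hot-load step only; assumes the generated module defines a
    class named after the agent.
    """
    module = types.ModuleType(f"cos_agents_generated.{name}")
    exec(compile(source, filename=f"<{name}>", mode="exec"), module.__dict__)
    agent_cls = getattr(module, name)   # convention: class named after the agent
    registry[name] = agent_cls          # visible to the running daemon immediately
    return agent_cls

# A trivial "generated" agent (hypothetical trigger name):
SOURCE = '''
class MeetingRecapExpert:
    trigger = "meeting_recap"
    def run(self, transcript):
        return {"summary": transcript[:40]}
'''

registry = {}
hot_load_agent("MeetingRecapExpert", SOURCE, registry)
```

Once registered, the daemon's dispatch loop can route matching sessions to the new class without a restart.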

Architecture

```mermaid
graph TB
    subgraph Interfaces
        CLI["cos CLI"]
        API["Python API<br/><code>import cos</code>"]
        DASH["Web Dashboard<br/>:7432"]
    end

    subgraph Daemon["cos-daemon (background workers)"]
        SH["SessionHarvest<br/>120s"]
        MS["MemorySync<br/>300s"]
        SW["SwarmWorker<br/>60s"]
        CU["ContextUpdate<br/>120s"]
        LD["LLMDiscovery<br/>300s"]
        BW["BossWakeup<br/>180s"]
    end

    subgraph Storage
        PG["PostgreSQL + pgvector<br/><code>chief_of_staff</code>"]
    end

    subgraph Inference
        OL["Ollama (local)"]
        CL["Cloud LLMs<br/>(fallback)"]
    end

    CLI --> PG
    API --> PG
    DASH --> PG
    SH --> PG
    MS --> PG
    SW --> PG
    CU --> PG
    LD --> OL
    SH --> OL
    BW --> PG

    API --> OL
    API --> CL

    style PG fill:#336791,color:#fff
    style OL fill:#1a1a2e,color:#fff
    style CL fill:#4a4a6a,color:#fff
```

Key Concepts

```mermaid
graph LR
    B1["Boss A<br/>(Claude Code)"] -- heartbeat --> SWARM["Swarm Registry<br/><code>ops.bosses</code>"]
    B2["Boss B<br/>(Ollama agent)"] -- heartbeat --> SWARM
    B1 -- "spawns" --> M1["Minion"]
    B1 -- "queues" --> AT["Agent Task<br/><code>ops.agent_tasks</code>"]
    AT -- "claimed by" --> AG["Specialist Agent<br/>(TranscriptAnomalyExpert, etc.)"]
    B1 -. "message" .-> B2

    style SWARM fill:#2d6a4f,color:#fff
    style AT fill:#6a2d4f,color:#fff
```
| Term | Description |
|---|---|
| Boss | Any LLM agent (Claude Code, Ollama, Codex) registered in the swarm. Each boss has a name, model, working directory, and heartbeat. A single developer might have 3-5 bosses active simultaneously across different projects. |
| Minion | A subagent spawned by a boss for a specific subtask. For example, a boss working on a feature might spawn a minion to run the test suite. Minions are tracked in ops.minions with their parent boss, task, and result. |
| Agent task | A queued job for a specialist agent. Unlike minions (which are ad-hoc), agent tasks are typed: TranscriptAnomalyExpert, SpecFileExpert, DecisionRecorder, etc. The daemon claims and runs them automatically. |
| Agenda item | A work item with an urgency level (now, today, this_week, soon, later). Items can be created by humans (cos add), by agents (ActionItemExtractor), or by the dashboard. They persist until explicitly resolved. |
| Memory/RAG | Semantic vector search across three sources: memories (typed notes with importance), facts (subject-predicate-object triples), and document chunks (auto-split with 50% overlap). All embeddings are 768-dimensional via nomic-embed-text. |

Specialist Agents

CoS ships with 16 baseline specialist agents, each designed for a single responsibility. New agents can be code-generated and hot-loaded at runtime by the TriggerAgentDesigner, so the live registry typically grows past the baseline. The current registry is defined in cos/agents.py (AGENT_REGISTRY).
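A registry keyed by trigger type also makes delegation simple: classify the session, look up the specialist, and fall through to the self-expansion path for unknown triggers. The sketch below shows that shape with stand-in agent classes; the actual structure of AGENT_REGISTRY in cos/agents.py may differ.

```python
# Stand-in specialists; real agents run LLM calls against the transcript.
class SpecFileExpert:
    def run(self, session):
        return {"agent": "SpecFileExpert"}

class DecisionRecorder:
    def run(self, session):
        return {"agent": "DecisionRecorder"}

# Hypothetical registry shape: trigger type -> agent class.
AGENT_REGISTRY = {
    "spec_session": SpecFileExpert,
    "decision_made": DecisionRecorder,
}

def dispatch(trigger: str, session: dict) -> dict:
    """Route a classified session to its specialist; unknown triggers are
    surfaced so the self-expansion pipeline can propose a new agent."""
    agent_cls = AGENT_REGISTRY.get(trigger)
    if agent_cls is None:
        return {"agent": None, "propose_new_trigger": trigger}
    return agent_cls().run(session)
```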

```mermaid
graph TB
    subgraph "Session Analysis Pipeline"
        TAE["TranscriptAnomalyExpert<br/><i>Classifies sessions against<br/>trigger taxonomy</i>"]
        SFE["SpecFileExpert<br/><i>Extracts specs from design<br/>discussions, creates action items</i>"]
        VKE["VisionKeeperExpert<br/><i>Finds dropped ideas and<br/>deferred thoughts</i>"]
        DR["DecisionRecorder<br/><i>Extracts decisions (tech choices,<br/>config, architecture)</i>"]
        AIE["ActionItemExtractor<br/><i>Pulls concrete tasks<br/>and next steps</i>"]
        BR["BugRecorder<br/><i>Logs bug context,<br/>creates fix items</i>"]
        ADE["ArchDocExpert<br/><i>Creates architecture notes<br/>from design discussions</i>"]
    end

    subgraph "Self-Expansion Pipeline"
        TRE["TriggerReviewExpert<br/><i>Evaluates proposed new<br/>trigger types</i>"]
        TAD["TriggerAgentDesigner<br/><i>Writes and registers<br/>new agents</i>"]
    end

    TAE -->|"spec_session"| SFE
    TAE -->|"vision_drift"| VKE
    TAE -->|"decision_made"| DR
    TAE -->|"action_items"| AIE
    TAE -->|"bug_session"| BR
    TAE -->|"architecture_discussion"| ADE
    TAE -->|"unknown"| TRE
    TRE -->|"genuinely new"| TAD

    style TAE fill:#4a4a6a,color:#fff
    style TAD fill:#2d6a4f,color:#fff
```
| Agent | Trigger | What it produces |
|---|---|---|
| TranscriptAnomalyExpert | Every significant session | Classification + delegation to the right specialist |
| SpecFileExpert | spec_session | Formal spec document in ops.specs, action items on the agenda |
| VisionKeeperExpert | vision_drift | [Vision] agenda items for dropped ideas (importance >= 6) |
| DecisionRecorder | decision_made | Records in ops.decisions, vectorized for recall |
| ActionItemExtractor | action_items | Agenda items with urgency, assigned to the source project |
| BugRecorder | bug_session | Bug summaries vectorized for search, fix items if unresolved |
| ArchDocExpert | architecture_discussion | Architecture notes ingested as documents |
| TriggerReviewExpert | Unknown trigger proposed | Dedup check, approves or rejects proposals |
| TriggerAgentDesigner | Approved new trigger | New agent code written, registered, and hot-loaded |

Prerequisites

| Requirement | Version | Notes |
|---|---|---|
| Python | 3.11+ | For local install; Docker handles this otherwise |
| PostgreSQL | 14+ | With pgvector extension enabled |
| Ollama | latest | Running locally or on a reachable host |
| Docker + Compose | latest | For containerized deployment only |

Installation

CoS is designed for multi-host deployment. A typical setup has a server (Linux) that hosts the database and the canonical repo, and one or more client machines (macOS laptops, other Linux hosts) that connect to the same database. Every machine runs its own daemon to harvest local sessions and keep local status files current.

The examples below use /srv/apps/cos as the server install path -- substitute any path you prefer. The bootstrap script in ~/.claude/CLAUDE.md probes both /srv/apps/cos/.venv/bin/python (Linux server) and ~/.cos/venv/bin/python (client) by default; if you install elsewhere, edit the probe paths in your bootstrap accordingly.

```mermaid
graph TB
    subgraph Server ["Server (Linux)"]
        REPO["Git repo<br/>/srv/apps/cos"]
        VENV_S[".venv"]
        DAEMON_S["cos-daemon<br/>(Docker or systemd)"]
        PG["PostgreSQL + pgvector"]
        OL["Ollama"]
    end

    subgraph Laptop ["Client (macOS laptop)"]
        VENV_L["~/.cos/venv<br/>(pip install from git)"]
        DAEMON_L["cos-daemon<br/>(LaunchAgent)"]
        CLAUDE_L["Claude Code sessions<br/>(~/.claude/)"]
    end

    DAEMON_S --> PG
    DAEMON_L --> PG
    DAEMON_S --> OL
    DAEMON_L -.->|"optional"| OL_L["Local Ollama"]
    CLAUDE_L --> DAEMON_L

    style PG fill:#336791,color:#fff
    style OL fill:#1a1a2e,color:#fff
```

Server install (Linux)

The server hosts the git repo, the database, and optionally runs Ollama for local inference.

# Clone the repo
git clone https://github.com/YOUR-USER/cos.git /srv/apps/cos
cd /srv/apps/cos

# Create a virtualenv and install
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

# Verify installation
cos status

This installs two entry points:

| Command | Description |
|---|---|
| cos | The CLI (all subcommands) |
| cos-daemon | The background daemon |

Client install (macOS / remote Linux)

Client machines don't need a full repo checkout -- just pip install the package from the git remote. The venv lives at ~/.cos/venv so it's colocated with the rest of the CoS state.

# Install cos from the git repo (no local clone needed)
python3 -m venv ~/.cos/venv
~/.cos/venv/bin/pip install git+https://github.com/YOUR-USER/cos.git

# Verify it can reach the database
~/.cos/venv/bin/python -c "import cos; cos.briefing()"

To upgrade later:

~/.cos/venv/bin/pip install --upgrade git+https://github.com/YOUR-USER/cos.git

Docker (server)

A pre-built image is available from the container registry:

# Pull from registry
docker pull ghcr.io/YOUR-USER/cos:latest

# Or build locally
make build

Run via docker-compose:

# Start the daemon (background)
make up

# Run CLI commands inside the container
make cli CMD="status"
make cli CMD="recall 'some query'"

# View daemon logs
make logs

# Stop everything
make down

To build and push a new image:

docker build --provenance=false -t ghcr.io/YOUR-USER/cos:latest .
docker push ghcr.io/YOUR-USER/cos:latest

Note: The --provenance=false flag is required. Without it, Docker attaches a build attestation manifest that some container registries don't support (you'll see "invalid tag: missing manifest digest").

Docker volume mounts

The docker-compose binds host paths into the container so the daemon reads ~/.claude/ sessions directly and writes status files to ~/.cos/ on the host filesystem:

volumes:
  - ${HOME}/.cos:/root/.cos          # status.json, hot_items.md, harvested_sessions.json
  - ${HOME}/.claude:/root/.claude:ro  # session JSONL files, memory markdown files

This is critical -- if ~/.cos is a Docker volume instead of a host bind mount, the statusline and session bootstraps won't see the daemon's output. If ~/.claude isn't mounted, the SessionHarvestWorker won't find any sessions to process.
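A quick sanity check from inside the container can confirm both binds are visible before debugging further. The helper below is hypothetical (not part of the cos package); the paths match the compose file above.

```python
from pathlib import Path

def missing_mounts(required=("/root/.cos", "/root/.claude")):
    """Return mount points the daemon expects but cannot see.

    Run inside the container; an empty list means both host binds landed.
    Adjust the defaults if you changed the compose volume targets.
    """
    return [p for p in required if not Path(p).is_dir()]
```

For example, `docker compose exec cos-daemon python -c "..."` with this check would print the missing paths.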

Optional: myma integration

CoS runs fully standalone by default. The myma integration (meeting-notes sync + notebook push from SpecFileExpert/PlanExpert) is opt-in.

To enable it:

  1. Set COS_ENABLE_MYMA=1 in your .env (and optionally COS_MYMA_URL / COS_MYMA_TOKEN if your myma instance isn't on the defaults).

  2. Start the stack with the myma profile so the daemon/CLI containers join the myma_myma-net network and mount the myma_myma-db volume:

    docker compose --profile myma up -d

    This brings up cos-daemon-myma (and cos-cli-myma for one-shot CLI runs) instead of the plain cos-daemon. Without the profile, no myma volumes or networks are touched.

When COS_ENABLE_MYMA is unset, all myma calls in cos.integrations.myma short-circuit to no-ops.
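One way such an env-gated short-circuit can be implemented is a decorator that skips the wrapped call unless the flag is set. This is a sketch of the opt-in pattern, not the actual cos.integrations.myma code; `push_notebook` is a made-up example function.

```python
import os
from functools import wraps

def myma_gated(fn):
    """Turn a myma call into a no-op unless COS_ENABLE_MYMA=1 is set."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        if os.environ.get("COS_ENABLE_MYMA") != "1":
            return None                 # integration disabled: do nothing
        return fn(*args, **kwargs)
    return wrapper

@myma_gated
def push_notebook(spec_id: str) -> str:
    # Hypothetical integration call; the real one would talk to myma's API.
    return f"pushed {spec_id}"
```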

Database Setup

Create the database with pgvector, then apply migrations:

# Create database (if it doesn't exist)
createdb -h <host> -U postgres chief_of_staff
psql -h <host> -U postgres -d chief_of_staff -c 'CREATE EXTENSION IF NOT EXISTS vector;'

# Apply migrations in order
psql -h <host> -U postgres -d chief_of_staff -f migrations/001_swarm_and_pipeline.sql
psql -h <host> -U postgres -d chief_of_staff -f migrations/002_meeting_assistant_sync.sql

Database Schema

```mermaid
erDiagram
    ops_bosses {
        uuid id PK
        text name
        text model
        text working_dir
        timestamp last_seen
        text status
    }
    ops_agenda_items {
        uuid id PK
        text title
        text body
        text urgency
        text category
        timestamp resolved_at
    }
    ops_agent_tasks {
        uuid id PK
        text agent_type
        text context_ref
        jsonb context_raw
        int priority
        text status
    }
    rag_memories {
        uuid id PK
        text mem_type
        text title
        text body
        vector embedding
        int importance
    }
    rag_chunks {
        uuid id PK
        uuid doc_id FK
        text content
        vector embedding
    }
    rag_documents {
        uuid id PK
        text title
        text source
        text doc_type
    }
    sessions_conversations {
        uuid id PK
        text project_dir
        text summary
    }
    sessions_turns {
        uuid id PK
        uuid conversation_id FK
        text role
        text content
        int turn_index
    }

    rag_documents ||--o{ rag_chunks : "chunked into"
    sessions_conversations ||--o{ sessions_turns : "contains"
    ops_bosses ||--o{ ops_agent_tasks : "spawns"
```

The full schema spans five Postgres schemas:

| Schema | Purpose |
|---|---|
| principal | User profile, preferences, sensitivities, values, habits |
| people | Contacts, relationships, interaction history |
| ops | Bosses, agenda, agent tasks, triggers, specs, LLM nodes, routing log |
| rag | Memories, documents, chunks, facts (all with vector embeddings) |
| sessions | Conversation transcripts, turns, breadcrumbs |

Pull Required Ollama Models

# Required: embedding model
ollama pull nomic-embed-text

# Recommended: local inference models
ollama pull qwen2.5:7b         # triage, classification (fast)
ollama pull qwen2.5:14b        # summarization
ollama pull qwen2.5-coder:14b  # code generation

See COSWORK.md for the full cost/quality analysis of local vs. cloud models.
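To verify the required models are actually installed, you can compare the output of Ollama's GET /api/tags endpoint against the list above. The helper below is a sketch (not part of the cos package); the tag-normalization is deliberately simple.

```python
import json
import urllib.request

REQUIRED = {"nomic-embed-text", "qwen2.5:7b", "qwen2.5:14b", "qwen2.5-coder:14b"}

def missing_models(tags: dict, required=REQUIRED) -> set:
    """Given the JSON body of Ollama's GET /api/tags, return the required
    models that are not installed. Treats 'name:latest' as equal to 'name'."""
    installed = {m["name"].removesuffix(":latest") for m in tags.get("models", [])}
    return {r for r in required if r not in installed}

def check_ollama(base="http://127.0.0.1:11434") -> set:
    # Requires a running Ollama instance at `base`.
    with urllib.request.urlopen(f"{base}/api/tags") as resp:
        return missing_models(json.load(resp))
```

An empty set from `check_ollama()` means the daemon's local-inference paths should all work.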

Configuration

All configuration is via environment variables. Set them in your shell, a .env file (loaded by docker-compose), or export before running.

# Example .env file
COS_DB_HOST=db.example.internal
COS_DB_PORT=5432
COS_DB_NAME=chief_of_staff
COS_DB_USER=postgres
COS_DB_PASS=
OLLAMA_BASE=http://127.0.0.1:11434
COS_EMBED_MODEL=nomic-embed-text
COS_NOTIFY_BACKEND=log
COS_AUTO_SPAWN=true
COS_DASHBOARD_PORT=7432
| Variable | Default | Description |
|---|---|---|
| COS_DB_HOST | db.example.internal | PostgreSQL host (use `postgres` when running via the bundled docker-compose.yml, since that is the compose service name) |
| COS_DB_PORT | 5432 | PostgreSQL port |
| COS_DB_NAME | chief_of_staff | Database name |
| COS_DB_USER | postgres | Database user |
| COS_DB_PASS | (empty) | Database password |
| OLLAMA_BASE | http://127.0.0.1:11434 | Ollama API base URL |
| COS_EMBED_MODEL | nomic-embed-text | Embedding model name |
| COS_NOTIFY_BACKEND | macos (Darwin) / log (Linux) | Notification backend(s), comma-separated |
| COS_AUTO_SPAWN | true | Auto-spawn agents from the task queue |
| COS_DASHBOARD_PORT | 7432 | Web dashboard port |
| COS_MEETING_NOTES_DIR | /data/meeting-notes | Meeting notes directory |
| COS_MEETING_DB | /data/myma/meetings.db | Meeting assistant SQLite DB |
| COS_OLLAMA_NODES | (empty) | Extra Ollama node URLs (comma-separated) |
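The resolution order (environment variable, else the documented default) can be expressed as a small settings object. This mirrors a few rows of the table above for illustration; it is not the actual cos configuration module.

```python
import os
from dataclasses import dataclass, field

@dataclass
class CosConfig:
    """Environment-driven settings, defaults as documented above (sketch)."""
    db_host: str = field(default_factory=lambda: os.environ.get("COS_DB_HOST", "db.example.internal"))
    db_port: int = field(default_factory=lambda: int(os.environ.get("COS_DB_PORT", "5432")))
    ollama_base: str = field(default_factory=lambda: os.environ.get("OLLAMA_BASE", "http://127.0.0.1:11434"))
    auto_spawn: bool = field(default_factory=lambda: os.environ.get("COS_AUTO_SPAWN", "true").lower() == "true")
```

Because the values are read at construction time, each process picks up whatever its shell or .env file exported.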

CLI Reference

cos                     Show status (default when no subcommand)
cos status              Dashboard overview: bosses, agenda, tasks, LLM nodes
cos brief               Full morning briefing
cos ls [--all] [--cat]  List agenda items grouped by urgency
cos add <title>         Add agenda item
cos done <item_id>      Resolve an agenda item
cos recall <query>      Semantic search across all memory
cos swarm               Show all registered bosses
cos tasks [--all]       Show agent task queue
cos specs [id]          List or view specs
cos routing             Show LLM routing stats
cos sessions            List harvested sessions
cos dashboard           Open web dashboard in browser
cos daemon              Start the background daemon (foreground)

CLI Examples

# Quick status check
cos status

# Add work items at different urgency levels
cos add "Review PR #42" --urgency today --category code-review
cos add "Investigate flaky test" --urgency this_week --body "test_auth_flow fails intermittently"
cos add "Upgrade dependencies" --urgency later

# Search memory semantically
cos recall "authentication middleware changes"
cos recall "deployment process" -n 10

# Mark something done
cos done 3fa8b2c1-...

# Morning briefing
cos brief

# See who's active in the swarm
cos swarm

# Check agent task queue
cos tasks
cos tasks --all    # include completed

# LLM routing performance
cos routing

# Open the web dashboard
cos dashboard

# Start daemon in foreground (for debugging)
cos daemon

Scenario: Triaging your morning

You sit down and want to know what happened overnight:

# What's the system state?
cos status

# Full briefing with priorities, agenda, and swarm activity
cos brief

# Anything about that deploy discussion yesterday?
cos recall "production deploy timeline"

# See what agents ran overnight
cos tasks --all

# Check if any specs were generated from yesterday's architecture chat
cos specs --project my-backend

Scenario: Capturing work during a session

You're midway through a session and want to note things for later:

# Quick capture -- will show up in your next briefing
cos add "Revisit caching strategy for /api/users" --urgency soon

# High-priority item that should be addressed today
cos add "Fix broken migration on staging" --urgency now --category fix

# Done with the migration fix
cos done <item_id>

Makefile Shortcuts (Docker)

make build            # docker compose build
make up               # start cos-daemon (background)
make down             # stop all services
make restart          # restart cos-daemon
make logs             # tail daemon logs
make cli CMD="status" # run any CLI subcommand in container
make status           # shortcut for cos status
make agenda           # shortcut for cos agenda
make routing          # shortcut for cos routing

Python API

The package exports a flat public API via import cos. Every function listed below works from any Python process -- a script, a REPL, a Jupyter notebook, or inside a running LLM agent session.

Memory & RAG

CoS provides three kinds of semantic storage, all searchable through a single recall() call:

  • Memories -- Typed notes (project, feedback, user, reference) with an importance score. Good for capturing context, decisions, and observations.
  • Facts -- Subject-predicate-object triples with a confidence score. Good for structured knowledge ("auth-middleware is-blocked-by legal-review").
  • Document chunks -- Long-form text auto-split into 400-word overlapping chunks. Good for specs, meeting notes, architecture docs.

```mermaid
flowchart LR
    INPUT["Text input"] --> CHUNK["Chunking<br/>(400 words, 50% overlap)"]
    CHUNK --> EMBED["Embedding<br/>(nomic-embed-text)"]
    EMBED --> STORE["pgvector storage"]
    QUERY["Search query"] --> QEMBED["Embed query"]
    QEMBED --> SIM["Cosine similarity<br/>search"]
    STORE --> SIM
    SIM --> RESULTS["Ranked results<br/>(memories + facts + chunks)"]
```
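The chunking step above (400-word windows, 50% overlap) can be sketched as a plain word-window splitter. The parameters come from the text; the actual implementation inside the package may differ in details such as boundary handling.

```python
def chunk_words(text: str, size: int = 400, overlap: float = 0.5) -> list[str]:
    """Split text into windows of `size` words with fractional overlap.

    With size=400 and overlap=0.5 each chunk shares 200 words with its
    neighbor, so content near a boundary appears in two chunks and stays
    retrievable by similarity search.
    """
    words = text.split()
    step = max(1, int(size * (1 - overlap)))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):   # final window reached the end
            break
    return chunks
```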

Python:

import cos

# Store a memory (auto-embedded)
cos.remember("project", "Auth rewrite",
             "Rewriting auth middleware for compliance.",
             importance=8, domain="security")

# Semantic search
results = cos.recall("authentication changes", n=5)
for r in results:
    print(r["title"], r["similarity"])

# Store a structured fact (subject-predicate-object triple)
cos.add_fact("auth-middleware", "blocked_by", "legal-review", confidence=0.9)

# Ingest a document (auto-chunked, auto-embedded)
cos.ingest_document("Architecture Decision Record",
                    open("adr-003.md").read(),
                    source="adr-003.md", doc_type="adr")

# Sync Claude Code local memory files (~/.claude/projects/*/memory/*.md) to DB
cos.sync_local_memories()

Shell (via python -c or interactive):

# Semantic recall from the command line
cos recall "authentication middleware changes"

# Quick recall from Python one-liner
python -c "import cos; [print(r['title'], round(r['similarity'],2)) for r in cos.recall('auth changes')]"

# Store a memory
python -c "import cos; cos.remember('project', 'Deploy v2.1', 'Deployed to prod successfully', importance=5)"

# Ingest a file
python -c "
import cos
cos.ingest_document('Meeting Notes', open('notes.md').read(), source='notes.md', doc_type='meeting')
"

Scenario: Recovering context from last week

You're picking up a project after a week away and don't remember the details:

# What do we know about this project's auth work?
cos recall "auth middleware refactor"

# Were any decisions made?
cos recall "decision auth"

# Any specs generated?
cos specs --project my-backend
# Or programmatically in a script
import cos
results = cos.recall("auth middleware", n=10)
for r in results:
    print(f"[{r['source']}] {r['title']} (similarity: {r['similarity']:.2f})")
    print(f"  {r['body'][:120]}...")

Scenario: Building a knowledge base from documents

You have a folder of architecture decision records you want to make searchable:

# Ingest all markdown files in a directory
for f in docs/adr/*.md; do
  python -c "
import cos
title = '$(basename "$f" .md)'
content = open('$f').read()
cos.ingest_document(title, content, source='$f', doc_type='adr')
print(f'Ingested: {title}')
"
done

# Now they're all searchable
cos recall "database migration strategy"

Agenda

The agenda is a persistent work queue that survives across sessions. Items have an urgency level that determines their sort order and visual treatment in the CLI and dashboard.

Urgency levels, lowest to highest:

```mermaid
graph LR
    L["later"] --> S["soon"] --> TW["this_week"] --> T["today"] --> N["now"]
    style L fill:#555,color:#fff
    style S fill:#666,color:#fff
    style TW fill:#0891b2,color:#fff
    style T fill:#ca8a04,color:#fff
    style N fill:#dc2626,color:#fff
```

Items marked now or today appear in ~/.cos/hot_items.md and are shown at the top of every new agent session. Items are automatically deduplicated -- if you add an item that's 85% similar to an existing one, the duplicate is silently dropped.
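The 85% threshold presumably compares embedding vectors; a minimal cosine-similarity version of the dedup rule looks like this. It is illustrative only — in CoS the embeddings live in pgvector and the comparison happens server-side.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_duplicate(new_vec, existing_vecs, threshold=0.85):
    """Mirror of the agenda dedup rule: drop the new item if any existing
    embedding is at least `threshold` similar."""
    return any(cosine_similarity(new_vec, v) >= threshold for v in existing_vecs)
```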

Python:

import cos

cos.add_agenda("Deploy v2.1", body="Staging passed, ready for prod",
               urgency="today", category="deploy")

items = cos.get_agenda()                      # unresolved, sorted by urgency
all_items = cos.get_agenda(include_all=True)   # include resolved

cos.resolve_agenda(item_id)

Shell:

# Add items
cos add "Deploy v2.1" --urgency today --category deploy --body "Staging passed"
cos add "Write tests for auth module" --urgency this_week

# List items
cos ls                # open items grouped by urgency
cos ls --all          # include resolved
cos ls --cat deploy   # filter by category

# Resolve
cos done <item_id>

Scenario: Managing a release checklist

# Build up the checklist
cos add "Run full test suite on staging" --urgency today --category release
cos add "Update CHANGELOG for v2.1" --urgency today --category release
cos add "Tag release v2.1" --urgency today --category release
cos add "Deploy to production" --urgency today --category release
cos add "Post-deploy smoke tests" --urgency today --category release

# View just the release items
cos ls --cat release

# Mark them off as you go
cos done <test_item_id>
cos done <changelog_item_id>

Swarm Coordination

The swarm system lets multiple agents see each other and communicate. Every agent that calls register_boss() becomes visible in the swarm registry. Agents that stop heartbeating for 15 minutes are automatically marked as gone by the SwarmWorker.

This is especially useful when you have agents working on related projects -- a frontend agent can check whether the backend agent is still active before making assumptions about API contracts.

```mermaid
sequenceDiagram
    participant B1 as Boss A (Claude Code)
    participant DB as PostgreSQL
    participant B2 as Boss B (Ollama)

    B1->>DB: register_boss()
    B2->>DB: register_boss()
    B1->>DB: heartbeat() (every few min)
    B2->>DB: heartbeat()
    B1->>DB: log_activity("Refactoring auth")
    B1->>DB: send_message("Need review", to=B2)
    B2->>DB: check_messages()
    DB-->>B2: [message from B1]
    B1->>DB: spawn_minion("Run tests")
```

Python:

import cos

# Register this agent as a boss
boss_id = cos.register_boss("my-project", "claude-sonnet-4-6",
                             working_dir="/path/to/project")

# Heartbeat (call periodically to stay "active")
cos.heartbeat()

# Log what you're doing
cos.log_activity("Refactoring auth module", event_type="code")

# See who else is running
bosses = cos.get_active_bosses(stale_minutes=10)

# Message another boss (or broadcast to all)
cos.send_message("Need help", "Can someone review auth changes?")
cos.send_message("Direct msg", "Check the tests", to_boss_id=other_boss_id)

# Check inbox
messages = cos.check_messages(unread_only=True)
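Since an agent stays "active" only while it keeps heartbeating, long-running bosses typically call heartbeat() on a timer. A generic loop like the one below can run on a background thread; in practice you would pass cos.heartbeat as `beat` (the loop itself is a sketch, not part of the package).

```python
import threading

def run_heartbeat(beat, interval: float, stop: threading.Event) -> None:
    """Call `beat()` every `interval` seconds until `stop` is set.

    `stop.wait(interval)` sleeps between beats but wakes immediately when
    the event is set, so shutdown is prompt.
    """
    while not stop.is_set():
        beat()
        stop.wait(interval)

# Usage sketch (every 2 minutes, well inside the 15-minute staleness window):
# stop = threading.Event()
# threading.Thread(target=run_heartbeat, args=(cos.heartbeat, 120, stop),
#                  daemon=True).start()
```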

Shell:

# See active bosses
cos swarm

# Register + interact via Python one-liners
python -c "import cos; bid = cos.register_boss('cli-task', 'manual'); print(bid)"
python -c "import cos; print(cos.get_active_bosses())"
python -c "import cos; cos.send_message('Status update', 'Auth module complete')"

Scenario: Coordinating agents across projects

You have a frontend and backend agent running simultaneously. The backend agent finishes an API change:

# In the backend agent session
import cos
cos.log_activity("Completed gRPC migration for /api/users", event_type="code")
cos.send_message("API change", "Switched /api/users from REST to gRPC. Proto files in shared/proto/.")
cos.add_agenda("Update frontend API client for gRPC", urgency="today",
               category="build", responsible_boss="/path/to/frontend")

The next time the frontend agent starts, it sees the message and the agenda item automatically.

Agent Tasks

The agent task queue is how the daemon dispatches work to specialist agents. Tasks are claimed atomically (FOR UPDATE SKIP LOCKED) so multiple daemon instances won't double-process. Each task has a type (which agent handles it), a priority (1-10, higher = more urgent), and a context payload.
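
The claim step boils down to a single UPDATE that pops one pending row. The query below is a sketch of that pattern only -- the table and column names are assumptions, not the actual schema; the point is the FOR UPDATE SKIP LOCKED shape, which guarantees concurrent workers never grab the same row:

```python
# Sketch of the atomic-claim pattern described above. Column names are
# assumptions; the locking shape is what matters.
CLAIM_SQL = """
UPDATE ops.agent_tasks
   SET status = 'running', claimed_at = now()
 WHERE id = (
       SELECT id
         FROM ops.agent_tasks
        WHERE status = 'pending'
          AND agent_type = %(agent_type)s
        ORDER BY priority DESC, created_at
          FOR UPDATE SKIP LOCKED
        LIMIT 1)
RETURNING id, agent_type, context_ref, context_raw;
"""

def claim_query() -> str:
    """Return the parameterized claim statement (run inside a transaction)."""
    return CLAIM_SQL
```

A second daemon running the same statement concurrently simply skips the locked row and claims the next pending task.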

flowchart LR
    Q["queue_agent_task()"] --> QUEUE["ops.agent_tasks<br/>(pending)"]
    QUEUE --> CLAIM["claim_agent_task()<br/>(atomic pop)"]
    CLAIM --> RUN["Specialist Agent<br/>runs"]
    RUN --> DONE["complete_agent_task()<br/>(result stored)"]

Python:

import cos

# Queue a specialist agent
cos.queue_agent_task("TranscriptAnomalyExpert",
                     context_ref="session-abc-123",
                     context_raw={"transcript": "..."},
                     priority=5)

# Claim a task (used by daemon/workers)
task = cos.claim_agent_task(agent_type="TranscriptAnomalyExpert")

# Complete a task
cos.complete_agent_task(task["id"], result={"findings": [...]})

Shell:

# View the agent task queue
cos tasks
cos tasks --all   # include completed/failed

# Queue a task from the command line
python -c "
import cos
cos.queue_agent_task('ActionItemExtractor',
    context_ref='session-id-here',
    context_raw={'source': 'manual'},
    priority=8)
print('Task queued')
"

Scenario: Re-processing a past session

You realize an important meeting transcript wasn't fully analyzed:

# Find the session ID
cos sessions

# Manually queue the agents you want to run against it
python -c "
import cos
sid = 'abc123-session-id'
cos.queue_agent_task('DecisionRecorder', context_ref=sid, priority=8)
cos.queue_agent_task('ActionItemExtractor', context_ref=sid, priority=8)
cos.queue_agent_task('VisionKeeperExpert', context_ref=sid, priority=5)
print('Queued 3 agents')
"

# Watch them process
cos tasks

LLM Routing

The router is CoS's inference layer. Rather than hardcoding which model to use, it maintains a registry of available Ollama nodes and their models, scores them based on historical performance (success rate, latency, quality), and picks the best match for each task type.
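
As an illustration of that selection step, a scoring blend over those dimensions might look like the following. The weights and formula here are assumptions for illustration, not the router's actual math:

```python
def score_candidate(model_match: bool, success_rate: float,
                    avg_latency_s: float, avg_quality: float) -> float:
    """Illustrative blend of the routing signals described above.
    All weights are made up for this sketch."""
    score = 0.0
    score += 3.0 if model_match else 0.0       # preferred model for the task type
    score += 4.0 * success_rate                # fraction of past calls that succeeded
    score += 2.0 * (avg_quality / 10.0)        # mean quality score (0-10 scale)
    score -= min(avg_latency_s / 10.0, 1.0)    # mild penalty for slow nodes
    return score

# A well-matched, reliable, fast node outscores a poor match with spotty history.
fast = score_candidate(True, 1.0, 0.0, 10.0)
slow = score_candidate(False, 0.5, 30.0, 2.0)
```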

The quality feedback loop is key: every time an agent calls _llm_json(), the response is automatically scored (parseable JSON = 8.0, unparseable = 2.0). Over time, the router learns which models actually work well for each task and adjusts its scoring. You can also provide manual feedback via rate_response().
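
The automatic scoring rule is simple enough to sketch in a few lines (a restatement of the rule above, not the actual implementation inside _llm_json()):

```python
import json

def auto_score(response_text: str) -> float:
    """Score an LLM response per the rule above:
    parseable JSON earns 8.0, anything else 2.0."""
    try:
        json.loads(response_text)
        return 8.0
    except (json.JSONDecodeError, TypeError):
        return 2.0
```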

flowchart TB
    REQ["Task request<br/>(e.g. summarize)"] --> ROUTER["Router"]
    ROUTER --> SCORE["Score nodes by:<br/>- model match<br/>- historical success rate<br/>- avg latency<br/>- avg quality"]
    SCORE --> PICK["Best (node, model)"]
    PICK --> CALL["Ollama API call"]
    CALL --> RESP["Response"]
    RESP --> LOG["Log to ops.llm_routing_log"]
    RESP -.-> FB["Quality feedback<br/>(rate_response)"]
    FB -.-> SCORE

    style ROUTER fill:#4a4a6a,color:#fff

Task types and their preferred model tiers:

| Task Type | Best Models | Fallback |
| --- | --- | --- |
| embed | nomic-embed-text | -- |
| triage | qwen3:8b, llama3, mistral | phi3 |
| extract | qwen3:8b, qwen2.5:14b | llama3 |
| summarize | qwen3:8b, qwen2.5:14b | mistral |
| analyze | qwq:32b, deepseek-r1:32b, qwen3:30b | llama3 |
| spec | qwq:32b, qwen3:30b, deepseek-r1:32b | llama3 |
| code | qwen3-coder:30b, devstral, codestral | llama3 |

Python:

import cos

# Route a task to the best available model
base_url, model = cos.route("summarize")

# Use the routed LLM directly (returns text + log_id)
text, log_id = cos.routed_llm("Summarize this document...",
                               task_type="summarize",
                               system="You are a summarizer.")

# Provide quality feedback (improves future routing)
cos.rate_response(log_id, quality_score=8)

# Refresh node discovery
cos.refresh_nodes()
nodes = cos.get_active_nodes()

Shell:

# View routing stats
cos routing

# Refresh node discovery manually
python -c "import cos; cos.refresh_nodes(); print(cos.get_active_nodes())"

# Quick routed LLM call
python -c "
import cos
text, lid = cos.routed_llm('What is 2+2?', task_type='triage')
print(text)
"

Scenario: Adding a new Ollama node

You set up Ollama on a second machine and want CoS to use it:

# Option 1: Add via environment variable (persists in .env)
echo 'COS_OLLAMA_NODES=http://192.168.1.50:11434' >> .env

# Option 2: Add to the database directly
psql -h <host> -U postgres -d chief_of_staff -c \
  "INSERT INTO ops.llm_nodes (base_url, label, priority) VALUES ('http://192.168.1.50:11434', 'workstation', 5);"

# Force a refresh
python -c "import cos; cos.refresh_nodes()"

# Verify it's discovered
cos routing

Standing Orders & Context

Standing orders are persistent instructions stored in the database that apply to every boss regardless of model. They encode your preferences and operating rules -- things like "prefer short responses", "no emoji", or "always confirm before destructive actions". Every boss loads them at session start.

Python:

import cos

orders = cos.get_standing_orders(scope="global")
context = cos.get_principal_context()   # full user profile
cos.briefing()                          # print formatted briefing

Shell:

# Morning briefing
cos brief

# Dump standing orders
python -c "import cos; [print(o['title']) for o in cos.get_standing_orders()]"

# View your full principal context
python -c "
import cos
ctx = cos.get_principal_context()
for section, content in ctx.items():
    print(f'=== {section} ===')
    print(content[:200])
    print()
"

Daemon

The daemon (cos-daemon) runs background worker threads that keep the system alive. Each worker runs in its own thread, on its own interval, with independent error handling -- a failure in one worker doesn't affect the others.
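
That isolation can be sketched as a loop that catches its own exceptions and sleeps on a shared stop event. This is a sketch of the pattern, not the daemon's actual code:

```python
import logging
import threading
import time

def run_worker(name, interval, tick, stop):
    """Generic worker loop: run tick() every `interval` seconds, log (but
    never propagate) exceptions, and exit when `stop` is set."""
    while not stop.is_set():
        try:
            tick()
        except Exception:
            logging.exception("%s tick failed; continuing", name)
        stop.wait(interval)

# Demo: a worker whose ticks alternate between failing and succeeding
# keeps running either way.
ticks = []
def flaky_tick():
    ticks.append(len(ticks))
    if len(ticks) % 2:
        raise RuntimeError("simulated failure")

stop = threading.Event()
t = threading.Thread(target=run_worker, args=("demo", 0.01, flaky_tick, stop),
                     daemon=True)
t.start()
deadline = time.time() + 2.0
while len(ticks) < 4 and time.time() < deadline:
    time.sleep(0.01)
stop.set()
t.join()
```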

flowchart TB
    MAIN["cos-daemon main()"] --> SH & MS & SW & CU & LD & BW

    SH["SessionHarvestWorker<br/>every 120s"]
    MS["MemorySyncWorker<br/>every 300s"]
    SW["SwarmWorker<br/>every 60s"]
    CU["ContextUpdateWorker<br/>every 120s"]
    LD["LLMDiscoveryWorker<br/>every 300s"]
    BW["BossWakeupWorker<br/>every 180s"]

    SH -->|"scans"| JSONL["~/.claude/**/*.jsonl"]
    SH -->|"archives to"| DB["PostgreSQL"]
    SH -->|"skims via"| OL["Ollama"]
    SH -->|"queues"| TASKS["ops.agent_tasks"]

    MS -->|"syncs"| MEM["~/.claude/**/memory/*.md"]
    MS -->|"embeds into"| DB

    SW -->|"writes"| SS["~/.cos/swarm_status.md"]
    CU -->|"writes"| HI["~/.cos/hot_items.md"]
    CU -->|"writes"| SJ["~/.cos/status.json"]
    LD -->|"probes"| OL

    style MAIN fill:#4a4a6a,color:#fff

| Worker | Interval | What it does |
| --- | --- | --- |
| SessionHarvestWorker | 120s | Scans Claude Code session JSONL files, archives transcripts, skims for significance via local LLM, queues specialist agents. Tracks processed files in ~/.cos/harvested_sessions.json to avoid re-work. |
| MemorySyncWorker | 300s | Syncs ~/.claude/projects/*/memory/*.md into the RAG database with embeddings. Watches file mtimes to detect changes and re-embed updated files. |
| SwarmWorker | 60s | Marks stale bosses (no heartbeat in 15 min) as gone, writes ~/.cos/swarm_status.md with a human-readable summary of who's active and what they're doing. |
| ContextUpdateWorker | 120s | Writes ~/.cos/hot_items.md (urgent agenda + recent completions/failures) and ~/.cos/status.json. These files are read by new agent sessions at bootstrap. |
| LLMDiscoveryWorker | 300s | Probes Ollama nodes from DB + COS_OLLAMA_NODES, updates availability and model lists. Detects when nodes go offline or new models are pulled. |
| BossWakeupWorker | 180s | Checks for agenda items assigned to a specific boss that has gone inactive. Sends a notification to remind you to restart that agent. |

Running the Daemon on Multiple Hosts

The daemon is safe to run on multiple hosts simultaneously. Each instance harvests sessions and syncs memories from its own ~/.claude/ directory, but they all write to the same database. This means Claude Code sessions on your laptop get harvested by the laptop's daemon, and sessions on the server get harvested by the server's daemon.

flowchart LR
    subgraph Server
        DS["cos-daemon"] -->|"harvests"| SS["~/.claude/<br/>(server sessions)"]
        DS -->|"writes"| DB["PostgreSQL"]
    end
    subgraph Laptop
        DL["cos-daemon"] -->|"harvests"| SL["~/.claude/<br/>(laptop sessions)"]
        DL -->|"writes"| DB
    end
    DS -->|"writes"| HS["~/.cos/status.json<br/>hot_items.md<br/>(server)"]
    DL -->|"writes"| HL["~/.cos/status.json<br/>hot_items.md<br/>(laptop)"]

Starting the Daemon

Linux server (Docker)

# Background via Docker (logs via `make logs`)
make up

# View logs
make logs

# Restart after code changes
make restart

Linux server (foreground, for debugging)

cos-daemon
# or
cos daemon

macOS client (LaunchAgent)

On macOS, set up a LaunchAgent so the daemon starts automatically and restarts on crash:

cat > ~/Library/LaunchAgents/com.cos.daemon.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
  "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.cos.daemon</string>

    <key>ProgramArguments</key>
    <array>
        <string>/Users/YOUR_USER/.cos/venv/bin/cos-daemon</string>
    </array>

    <key>EnvironmentVariables</key>
    <dict>
        <key>OLLAMA_BASE</key>
        <string>http://127.0.0.1:11434</string>
        <key>COS_DB_HOST</key>
        <string>db.example.internal</string>
        <key>COS_NOTIFY_BACKEND</key>
        <string>macos</string>
        <key>PATH</key>
        <string>/Users/YOUR_USER/.cos/venv/bin:/usr/local/bin:/usr/bin:/bin</string>
    </dict>

    <key>RunAtLoad</key>
    <true/>

    <key>KeepAlive</key>
    <true/>

    <key>StandardOutPath</key>
    <string>/Users/YOUR_USER/.cos/daemon.log</string>

    <key>StandardErrorPath</key>
    <string>/Users/YOUR_USER/.cos/daemon.err</string>

    <key>ThrottleInterval</key>
    <integer>30</integer>
</dict>
</plist>
EOF

# Load it (starts immediately)
launchctl load ~/Library/LaunchAgents/com.cos.daemon.plist

# Verify it's running
launchctl list | grep cos

Managing the LaunchAgent:

# Check status (PID and exit code)
launchctl list | grep cos

# Stop
launchctl unload ~/Library/LaunchAgents/com.cos.daemon.plist

# Restart (unload + load)
launchctl unload ~/Library/LaunchAgents/com.cos.daemon.plist
launchctl load ~/Library/LaunchAgents/com.cos.daemon.plist

# View logs
tail -f ~/.cos/daemon.log
tail -f ~/.cos/daemon.err

If Ollama isn't running on the client, the LLMDiscoveryWorker will log a connection error but the daemon continues running normally. If you later start Ollama on the client, the worker will automatically discover it on the next probe cycle (every 300s).

Scenario: Debugging the daemon

Something isn't being processed and you want to see what's happening:

# Run in foreground with full logs
cos daemon

# In another terminal, check what's been harvested
cat ~/.cos/harvested_sessions.json | python -m json.tool | tail -20

# Check for agent task failures
cos tasks --all

# Look at what hot_items the daemon is generating
cat ~/.cos/hot_items.md

Web Dashboard

The dashboard is a lightweight web UI on port 7432 -- pure HTML/CSS/JS served by Python's built-in HTTP server, with zero external dependencies.

# Open in browser
cos dashboard

# Or access directly
# http://localhost:7432

Features:

  • View and filter agenda items by urgency
  • Add and resolve agenda items
  • Monitor agent task queue
  • Send messages to bosses
  • Spawn agents from pre-configured templates
  • Dark/light theme toggle
  • Collapsible sections and toast notifications

Spawn Templates

The dashboard includes pre-configured agent templates for common tasks. Each template has a prompt and a budget:

| Template | What it does |
| --- | --- |
| code_review | Reviews git diff against main, flags bugs/security/missing tests |
| fix_tests | Runs test suite, fixes failures without changing test logic |
| lint_fix | Runs linter/type checker, fixes style issues |
| security_audit | Audits for hardcoded secrets, injection risks, insecure deps |
| write_readme | Writes/updates README based on actual code |
| write_docs | Adds docstrings to undocumented public functions |
| changelog | Generates CHANGELOG from git log |
| cos_status | Generates a CoS system health report |
| cos_agenda | Reviews and triages the agenda |
| research_deps | Researches dependency updates and security advisories |
| perf_profile | Profiles the codebase for performance bottlenecks |

Claude Code Integration

CoS bootstraps automatically in Claude Code sessions via ~/.claude/CLAUDE.md. This means every Claude Code session is CoS-aware without any manual setup -- on any machine where the cos package is installed.

flowchart TB
    START["Claude Code session starts"] --> DETECT["Detect Python<br/>/srv/apps/cos/.venv/bin/python (Linux)<br/>~/.cos/venv/bin/python (macOS)"]
    DETECT --> CHECK["Exclusion check<br/>(~/.cos/exclude_dirs.txt)"]
    CHECK -->|excluded| SKIP["Skip CoS"]
    CHECK -->|active| REG["register_boss()"]
    REG --> MSG["check_messages()"]
    REG --> AGENDA["get_agenda() (urgent)"]
    MSG --> LOAD["Read ~/.cos/hot_items.md<br/>Read ~/.cos/swarm_status.md"]
    AGENDA --> LOAD
    LOAD --> READY["Session ready<br/>(urgent items shown first)"]

Multi-platform Python detection

The bootstrap in ~/.claude/CLAUDE.md automatically finds the right Python venv on each platform:

COS_PYTHON="$([ -x /srv/apps/cos/.venv/bin/python ] && echo /srv/apps/cos/.venv/bin/python || echo $HOME/.cos/venv/bin/python)"

This checks for the server venv at /srv/apps/cos/.venv/bin/python first (Linux). If that doesn't exist, it falls back to ~/.cos/venv/bin/python (macOS client install). Since ~/.claude/CLAUDE.md syncs across machines, this single file works on both without modification.

Bootstrap sequence

  1. Python detection -- Finds the cos venv (server path first, then client path).
  2. Exclusion check -- Reads ~/.cos/exclude_dirs.txt. If the current working directory matches any listed path, CoS is skipped entirely. This is useful for third-party repos or sensitive projects.
  3. Registration -- Calls register_boss() to announce this session to the swarm.
  4. Message check -- Reads any unread messages from other bosses or agents.
  5. Agenda check -- Pulls urgent items (now, today) to display immediately.
  6. Context load -- Reads ~/.cos/hot_items.md (recent completions, failures, urgent work) and ~/.cos/swarm_status.md (what other bosses are doing).

Statusline

The Claude Code statusline shows CoS:<bosses> <urgent>! -- for example, CoS:2 3! means 2 active bosses and 3 urgent agenda items. This is read from ~/.cos/status.json, which the daemon's ContextUpdateWorker writes every 120 seconds.

The status.json file contains:

{
  "active_bosses": 2,
  "urgent_items": 3,
  "approved_proposals": 0,
  "updated_at": "2026-04-03T06:30:04.323141+00:00"
}

Since each machine runs its own daemon, the statusline stays current on every host. If the daemon isn't running, the statusline will show stale data from the last time it was updated.
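
Rendering the segment from that file is only a few lines. The sketch below mirrors the example format above; the actual statusline script's logic may differ (e.g. it also colors the segment red when urgent items exist):

```python
import json

def render_cos_segment(status_json: str) -> str:
    """Build the `CoS:<bosses> <urgent>!` segment from status.json contents.
    Field names match the documented status.json example."""
    status = json.loads(status_json)
    segment = f"CoS:{status['active_bosses']}"
    if status.get("urgent_items"):
        segment += f" {status['urgent_items']}!"
    return segment

print(render_cos_segment('{"active_bosses": 2, "urgent_items": 3}'))  # CoS:2 3!
```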

Excluding directories

To exclude a project directory from CoS:

# Add the absolute path to the exclusion file (one per line)
echo "/path/to/excluded/project" >> ~/.cos/exclude_dirs.txt

# You can also add comments
echo "# Third-party repos" >> ~/.cos/exclude_dirs.txt
echo "/home/user/vendor/some-lib" >> ~/.cos/exclude_dirs.txt

Claude Code Settings

CoS requires the following Claude Code configuration in ~/.claude/settings.json. These entries use platform-agnostic paths so a single settings file works on both Linux and macOS:

UserPromptSubmit hook -- Fires on every user message. Heartbeats the boss, checks for hot items changes, and injects project-scoped context (agenda items, recent sessions, relevant memories) on session start.

{
  "hooks": {
    "UserPromptSubmit": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "COS_PYTHON=\"$([ -x /srv/apps/cos/.venv/bin/python ] && echo /srv/apps/cos/.venv/bin/python || echo $HOME/.cos/venv/bin/python)\"; \"$COS_PYTHON\" $HOME/.cos/prompt_submit.py"
          }
        ]
      }
    ]
  }
}

Statusline -- Displays CoS:<active_bosses> <urgent_items>! in the Claude Code status bar by reading ~/.cos/status.json.

{
  "statusLine": {
    "type": "command",
    "command": "zsh $HOME/.claude/statusline-command.sh"
  }
}

The statusline script (~/.claude/statusline-command.sh) reads JSON from stdin (provided by Claude Code with workspace, model, and context info) and appends the CoS segment from ~/.cos/status.json. The segment turns red when there are urgent items.

Key files:

| File | Location | Purpose |
| --- | --- | --- |
| statusline-command.sh | ~/.claude/ | Statusline rendering script (zsh) |
| prompt_submit.py | ~/.cos/ | UserPromptSubmit hook (Python) |
| status.json | ~/.cos/ | Written by daemon, read by statusline |
| hot_items.md | ~/.cos/ | Written by daemon, injected by hook on change |

Prompt Submit Hook Details

The ~/.cos/prompt_submit.py hook runs on every user message and does the following:

flowchart TB
    MSG["User sends message"] --> EXCL["Check exclude_dirs.txt"]
    EXCL -->|excluded| EXIT["Exit silently"]
    EXCL -->|active| HB["register_boss() / heartbeat"]
    HB --> CHECK["Compare hot_items.md mtime<br/>vs last seen"]
    CHECK -->|changed or gap > 15min| INJECT["Print hot items<br/>to conversation"]
    CHECK -->|unchanged| SKIP["Skip injection"]
    INJECT --> CTX{"Session start?<br/>(gap > 15min)"}
    SKIP --> CTX
    CTX -->|yes| PROJECT["Inject project context:<br/>- Agenda items for this dir<br/>- Recent session summaries<br/>- Relevant memories"]
    CTX -->|no| DONE["Done"]
    PROJECT --> DONE

On session start (defined as >15 minutes since the last hook run), the hook also queries the database for project-specific context: open agenda items assigned to the current working directory, recent session summaries, and the top relevant memories from semantic search.
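
The session-start test reduces to a simple gap check. A sketch (where the hook persists its last-run timestamp is an assumption; presumably a state file under ~/.cos/):

```python
import time

SESSION_GAP_SECONDS = 15 * 60  # ">15 minutes since the last hook run"

def is_session_start(last_run_epoch, now=None):
    """True when the gap since the previous hook run exceeds 15 minutes,
    matching the session-start definition above."""
    now = time.time() if now is None else now
    return last_run_epoch is None or (now - last_run_epoch) > SESSION_GAP_SECONDS
```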

Key File Paths

graph LR
    subgraph "~/.cos/"
        HI["hot_items.md<br/><i>urgent agenda</i>"]
        SS["swarm_status.md<br/><i>active bosses</i>"]
        SJ["status.json<br/><i>statusline data</i>"]
        EX["exclude_dirs.txt<br/><i>opt-out directories</i>"]
        DL["daemon.log<br/><i>daemon stdout</i>"]
        DE["daemon.err<br/><i>daemon stderr</i>"]
        HS["harvested_sessions.json<br/><i>processed session tracker</i>"]
        VN["venv/<br/><i>client Python venv</i>"]
        WT["worktrees/<br/><i>git worktrees for agents</i>"]
        NT["notes/<br/><i>Marimo notebooks</i>"]
    end

| Path | Purpose |
| --- | --- |
| ~/.cos/hot_items.md | Urgent items injected at session start |
| ~/.cos/swarm_status.md | Active bosses summary |
| ~/.cos/status.json | Machine-readable status for statusline integrations |
| ~/.cos/exclude_dirs.txt | Project paths to skip (one per line) |
| ~/.cos/daemon.log | Daemon stdout log |
| ~/.cos/daemon.err | Daemon stderr log (macOS LaunchAgent) |
| ~/.cos/harvested_sessions.json | Tracks which session files have been processed |
| ~/.cos/venv/ | Client-install Python venv (macOS / remote hosts) |
| ~/.cos/prompt_submit.py | UserPromptSubmit hook script |
| ~/.cos/worktrees/ | Git worktrees for autonomous agent workspaces |
| ~/.cos/notes/ | Marimo notebook notes |
| ~/.claude/statusline-command.sh | Statusline rendering script |
| ~/.claude/CLAUDE.md | Bootstrap instructions (loaded by Claude Code) |
| ~/.claude/settings.json | Claude Code settings (hooks, statusline, env) |

Notification Backends

Set COS_NOTIFY_BACKEND to one or more (comma-separated):

# Single backend
export COS_NOTIFY_BACKEND=desktop

# Multiple backends
export COS_NOTIFY_BACKEND=log,desktop

| Backend | Platform | Method |
| --- | --- | --- |
| macos | macOS | osascript display notification |
| desktop | Linux | notify-send (freedesktop) |
| log | Any | Python logger.info |
| none | Any | Silent |

Notifications are sent by agents when they complete significant work -- for example, when the SpecFileExpert registers a new spec, or when the TriggerAgentDesigner creates a new agent.
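
A dispatcher honoring that comma-separated contract could look like this. It is a sketch only: the command invocations match the table above, but the real notify.py may be structured differently:

```python
import logging
import os
import shutil
import subprocess

def notify(title: str, body: str, backends=None):
    """Send a notification through each configured backend; returns the
    list of backends actually dispatched to. Sketch of the documented
    COS_NOTIFY_BACKEND behavior, not the real implementation."""
    backends = backends or os.environ.get("COS_NOTIFY_BACKEND", "log").split(",")
    dispatched = []
    for backend in (b.strip() for b in backends):
        if backend == "log":
            logging.getLogger("cos").info("%s: %s", title, body)
            dispatched.append("log")
        elif backend == "desktop" and shutil.which("notify-send"):
            subprocess.run(["notify-send", title, body], check=False)
            dispatched.append("desktop")
        elif backend == "macos" and shutil.which("osascript"):
            script = f'display notification "{body}" with title "{title}"'
            subprocess.run(["osascript", "-e", script], check=False)
            dispatched.append("macos")
        # "none" and unknown backends fall through silently
    return dispatched
```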

Testing

# Install with test dependencies
pip install -e ".[test]"

# Run tests
pytest

# Run a specific test file
pytest tests/test_router.py

# Verbose output
pytest -v

Tests use a real PostgreSQL connection with transaction rollback -- no separate test DB needed. LLM calls are mocked via fixtures in tests/conftest.py.
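
The rollback pattern is easy to sketch as a context manager (illustrative only; the actual fixtures in tests/conftest.py may be structured differently):

```python
import contextlib

@contextlib.contextmanager
def rollback_connection(connect):
    """Yield a DB connection and roll back everything on exit, so a test
    can INSERT/UPDATE freely without mutating the real database."""
    conn = connect()
    try:
        yield conn
    finally:
        conn.rollback()   # undo everything the test did
        conn.close()

# Demo with a stand-in connection object (no real database needed here).
class FakeConn:
    rolled_back = closed = False
    def rollback(self): self.rolled_back = True
    def close(self): self.closed = True

fake = FakeConn()
with rollback_connection(lambda: fake):
    pass  # a test body would exercise the schema here
```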

Project Structure

cos/
  __init__.py       Public API (import cos) -- 63 exported functions
  __main__.py       Entry point for python -m cos
  client.py         PostgreSQL client: memory, agenda, swarm, sessions, sync
  router.py         LLM routing: node discovery, task dispatch, quality tracking
  daemon.py         Background workers (session harvest, memory sync, swarm, etc.)
  agents.py         Specialist agents (transcript analysis, trigger matching, etc.)
  cli.py            Click-based CLI (cos status, cos recall, etc.)
  dashboard.py      Web UI on port 7432 (zero dependencies)
  notify.py         Multi-backend notifications (macOS, Linux, log)
  note.py           Marimo-based note system with AST-validated headers
migrations/
  001_swarm_and_pipeline.sql     Core schema (swarm, pipeline, sessions)
  002_meeting_assistant_sync.sql Meeting assistant sync tables
tests/
  conftest.py       DB fixtures (rollback), LLM mocks
  test_*.py         Unit tests
Dockerfile          Container image for daemon
docker-compose.yml  Multi-service orchestration
Makefile            Convenience targets (build, up, down, cli, logs)

Further Reading

| Document | Description |
| --- | --- |
| HANDOFF.md | Full system architecture, inference philosophy, schema reference, bootstrap mechanism |
| COS_CHEATSHEET.md | Quick-reference card for CLI commands, Python API, daemon workers, DB tables |
| COSWORK.md | Cost/quality analysis of local vs. cloud LLM routing, model recommendations, measurement plan |
