Global Workspace Theory-inspired context management for LLM agents. MCP server with bounded workspace selection, broadcast-bus subscribers, specialist competition, and multi-hop reasoning.
LLMs lose information in long contexts: multi-hop reasoning degrades (Sequential-NIAH best: 63.15%) and information aggregation suffers (LongBench Pro T6: 57.72%). This project focuses on the architectural parts of GWT that are useful for agent memory: bounded selection, global availability, recurrent activation, and independent post-broadcast processors.
gwt-context implements a concrete selection-broadcast runtime as an MCP server. Specialist processors compete to surface relevant information into a capacity-limited workspace, then a broadcast bus fans the selected content out to independent subscribers that can propose recall, exact resolution, contradiction flags, or follow-up actions. It is not a biological GWT simulator; it is a practical agent-memory architecture with explicit GWT markers.
| Marker | Implementation |
|---|---|
| Global availability | Workspace broadcast — all items visible simultaneously |
| Functional concurrency | 6 specialists score independently |
| Coordinated selection | CompetitionEngine — single arbitration point |
| Capacity limitation | Workspace capacity = 7 (Miller's 7±2) |
| Persistence with controlled update | Items persist until displaced by competition |
| Goal-modulated arbitration | ×1.3 multiplicative boost by goal relevance |
| Post-broadcast processors | BroadcastBus fan-out with accepted/inhibited proposals |
pip install gwt-context
Requirements: Python 3.11+, sentence-transformers (all-MiniLM-L6-v2, downloaded on first run).
Add to claude_desktop_config.json:
{
  "mcpServers": {
    "gwt-context": {
      "command": "python",
      "args": ["-m", "gwt_context"]
    }
  }
}
Run the server directly:
python -m gwt_context
Use hash embeddings when you want a fully local startup check without downloading a sentence-transformer model:
GWT_EMBEDDING_PROVIDER=hash GWT_EMBEDDING_MODEL=hash python -m gwt_context.smoke
The same command is available after installation as gwt-context-smoke.
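To illustrate why a hash provider needs no model download, the sketch below shows one common way to build a deterministic hash embedding from the standard library alone. Whether gwt-context uses this exact scheme is an assumption; `hash_embed` and its token-bucketing strategy are hypothetical names and choices made for illustration only.

```python
import hashlib
import math


def hash_embed(text: str, dim: int = 32) -> list[float]:
    """Map text to a unit-length vector using only token hashes.

    Illustrative sketch, not the project's implementation: each
    whitespace-separated token is hashed into one of `dim` buckets
    with a pseudo-random sign, then the vector is L2-normalized.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % dim  # bucket for this token
        sign = 1.0 if digest[4] % 2 == 0 else -1.0       # reduces collision bias
        vec[index] += sign
    norm = math.sqrt(sum(v * v for v in vec))
    # An empty (or fully cancelled) text yields the zero vector.
    return [v / norm for v in vec] if norm else vec
```

The same text always produces the same vector, which is what makes smoke runs reproducible. Such vectors capture token overlap rather than meaning, so they suit startup checks more than retrieval quality measurements.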
Run a real stdio MCP client smoke against the packaged server entrypoint:
gwt-context-mcp-smoke
Add it to Codex as a local MCP server:
codex mcp add gwt-context \
--env GWT_EMBEDDING_PROVIDER=hash \
--env GWT_EMBEDDING_MODEL=hash \
--env GWT_EMBEDDING_DIM=32 \
--env GWT_DATA_DIR=/home/netsky/.gwt-context-codex/projects/gwt-context \
-- python -m gwt_context
New Codex MCP tools are available to new Codex sessions after the config is loaded.
Use gwt-global with /home/netsky/.gwt-context-codex/global for cross-project
memory. See docs/codex-mcp.md for namespace setup and
cleanup.
| Tool | Description |
|---|---|
| gwt_store | Store content in long-term memory (embed + index + buffer) |
| gwt_set_goal | Set the active goal; biases competition toward goal-relevant items |
| gwt_broadcast | Run a selection-broadcast cycle and return the workspace content |
| gwt_compete | Competition round without broadcast (dry-run) |
| gwt_query | Semantic search over long-term memory, optionally admitted to competition |
| gwt_attend | One-call goal-directed attention pass with semantic, structured, graph, hybrid, or auto planning |
| gwt_resolve | Resolve a question against runtime structured memory without broadcasting |
| gwt_collection_query | Run exact count/filter/top-k/average/sum/distinct/min/max/compare operations over runtime structured memory |
| gwt_trace_explain | Explain the most recent explicit attention trace |
| gwt_memory_profile | Inspect active namespace, persisted counts, embedding settings, and structured read-model state |
| gwt_readiness_check | Return compact runtime readiness checks, counts, namespace, and bus health |
| gwt_backup_memory | Produce a JSONL backup payload with namespace metadata |
| gwt_export_memory | Export persisted memory as JSONL without embeddings |
| gwt_import_memory | Import JSONL memory into the active namespace and re-embed records |
| gwt_restore_memory | Restore JSONL memory in merge or confirmed replace mode |
| gwt_compact_working_memory | Dry-run or confirmed compaction of old working-memory records |
| gwt_reset | Clear runtime, workspace, or confirmed persistent memory with explicit confirmation |
| gwt_evict | Manual eviction from workspace |
| gwt_link | Bidirectional link between items (enables multi-hop chains) |
| gwt_inspect | Observe workspace, buffer, goals, stats |
┌─────────────────────────────────────────────────┐
│ Long-Term Memory │
│ (SQLite + Vector Index) │
└──────────────────┬──────────────────────────────┘
│ candidates
▼
┌──────────────────────────────────────────────────┐
│ Specialist Processors │
│ │
│ Relevance (0.35) Recency (0.20) │
│ Novelty (0.15) Frequency (0.10) │
│ Structural Linkage (0.10) Goal Linkage (0.10) │
└──────────────────┬───────────────────────────────┘
│ scored candidates
▼
┌──────────────────────────────────────────────────┐
│ Competition Engine │
│ weighted scores + goal modulation (×1.3) │
│ top-N admitted, losers evicted │
└──────────────────┬───────────────────────────────┘
│ winners
▼
┌──────────────────────────────────────────────────┐
│ Global Workspace (capacity=7) │
│ │
│ Broadcast → formatted text returned to LLM │
└──────────────────────────────────────────────────┘
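The pipeline in the diagram can be sketched in a few lines of Python. The specialist weights, the capacity of 7, the 0.2 activation threshold, and the 0.3 modulation strength come from this README; the `Candidate` type, the function names, and the exact boost formula (`1 + goal_modulation * goal_relevance`) are assumptions made for illustration and may differ from the real CompetitionEngine.

```python
from dataclasses import dataclass, field

# Specialist weights as shown in the diagram; they sum to 1.0.
WEIGHTS = {
    "relevance": 0.35,
    "recency": 0.20,
    "novelty": 0.15,
    "frequency": 0.10,
    "structural_linkage": 0.10,
    "goal_linkage": 0.10,
}


@dataclass
class Candidate:
    """Hypothetical candidate record: per-specialist scores in [0, 1]."""
    item_id: str
    scores: dict[str, float] = field(default_factory=dict)
    goal_relevance: float = 0.0  # in [0, 1]


def activation(c: Candidate, goal_modulation: float = 0.3) -> float:
    """Weighted specialist sum followed by a multiplicative goal boost.

    Assumption: a fully goal-relevant item is multiplied by 1.3 when
    goal_modulation is 0.3; partially relevant items get a smaller boost.
    """
    base = sum(w * c.scores.get(name, 0.0) for name, w in WEIGHTS.items())
    return base * (1.0 + goal_modulation * c.goal_relevance)


def compete(candidates, capacity: int = 7, min_activation: float = 0.2):
    """Return ids of the top-`capacity` candidates at or above the threshold."""
    scored = [(activation(c), c.item_id) for c in candidates]
    eligible = [(s, i) for s, i in scored if s >= min_activation]
    # Highest activation first; ties broken by id to keep the result stable.
    eligible.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in eligible[:capacity]]
```

Keeping arbitration in a single function mirrors the "single arbitration point" marker: specialists only produce scores, and only `compete` decides what enters the workspace.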
Items can be linked bidirectionally via gwt_link. The GoalLinkageSpecialist weights each link by how relevant the linked item is to the current goal — multi-hop chains are boosted only when they lead toward the goal. The StructuralLinkageSpecialist preserves chains across minor goal shifts.
When the goal changes, GoalLinkageSpecialist re-weights all links by relevance to the new goal. Items linked to now-irrelevant content lose their boost and get evicted, making room for goal-relevant items.
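A minimal sketch of goal-weighted link scoring follows. It assumes that an item's goal-linkage score is the average goal relevance of its linked neighbours, measured by cosine similarity; the real GoalLinkageSpecialist may aggregate differently. The names `link`, `cosine`, and `goal_linkage_score` are hypothetical.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def link(links: dict, a: str, b: str) -> None:
    """Record a bidirectional link, as gwt_link is described to do."""
    links.setdefault(a, set()).add(b)
    links.setdefault(b, set()).add(a)


def goal_linkage_score(item_id, links, embeddings, goal_embedding) -> float:
    """Average goal relevance of an item's neighbours; 0.0 if it has none.

    Negative similarities are clamped to 0 so that unrelated neighbours
    never reduce the score below that of an unlinked item.
    """
    neighbours = links.get(item_id, ())
    if not neighbours:
        return 0.0
    relevances = [max(0.0, cosine(embeddings[n], goal_embedding)) for n in neighbours]
    return sum(relevances) / len(relevances)
```

Because the score is recomputed against the current goal embedding, changing the goal changes the boost without touching the stored links, which is the behaviour the two paragraphs above describe.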
gwt_context.application.attention.AttentionController provides a reusable path for deterministic selection: set the goal, resolve an evidence plan, query/admit matching memories, then broadcast. Production planning supports semantic lookup, exact structured collection evidence, relation-graph continuation, hybrid mode, and auto mode. The controller itself depends only on application ports.
SelectionBroadcastCycle publishes each workspace broadcast to a subscriber
bus. Structured resolve, semantic recall, relation continuation, contradiction
checking, and plan critique subscribers read the same broadcast and return
proposals. gwt_attend applies accepted proposals through public ports:
follow-up memory queries, deterministic answer resolution, contradiction flags,
and follow-up flags are recorded in the trace. Repeated proposals are inhibited
across broadcasts.
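The fan-out and inhibition behaviour can be sketched as follows. This is an illustration under stated assumptions, not the project's BroadcastBus: `Proposal`, `BusResult`, and `SketchBus` are hypothetical names, and inhibition is assumed to key on a `(kind, key)` pair remembered across broadcasts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposal:
    subscriber: str
    kind: str  # e.g. "recall", "resolve", "contradiction", "follow_up"
    key: str   # identity used to detect repeats


@dataclass
class BusResult:
    accepted: list = field(default_factory=list)
    inhibited: list = field(default_factory=list)


class SketchBus:
    """Fans one broadcast out to every subscriber and arbitrates proposals."""

    def __init__(self):
        self._subscribers = []
        self._seen = set()  # (kind, key) pairs accepted in earlier broadcasts

    def subscribe(self, fn) -> None:
        # fn takes the broadcast (a list of item strings) and returns proposals.
        self._subscribers.append(fn)

    def publish(self, broadcast) -> BusResult:
        result = BusResult()
        for fn in self._subscribers:
            for proposal in fn(broadcast):
                ident = (proposal.kind, proposal.key)
                if ident in self._seen:
                    result.inhibited.append(proposal)  # repeat: suppress it
                else:
                    self._seen.add(ident)
                    result.accepted.append(proposal)
        return result
```

With a subscriber that proposes one recall per broadcast item, publishing the same two-item broadcast twice yields two accepted proposals on the first call and two inhibited proposals on the second.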
Conscious items also reactivate their gwt_link targets into the preconscious
buffer for the next cycle, so recurrent attention can follow explicit memory
links instead of only parsing names from rendered broadcast text.
The strict state/admission rules are documented in
docs/gwt-runtime-contracts.md.
Start from docs/quickstart.md for local usage, docs/demo-scenarios.md for
reproducible demos, and docs/external-subscribers.md for LLM/NLI subscriber
adapters.
MCP clients can call gwt_attend(question, keywords?, k?, planner?) for this path without
manually sequencing gwt_set_goal, gwt_query(admit=true), and gwt_broadcast.
They can also call gwt_resolve or gwt_collection_query when they need an exact runtime answer without a broadcast. The most recent attention trace is available at gwt://attention/last and summarized by gwt_trace_explain.
gwt_bus_inspect exposes the latest cycle-level bus result, including
subscriber statuses and accepted/inhibited proposal counts.
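For clients that speak MCP directly, a `gwt_attend` call is an ordinary JSON-RPC `tools/call` request. The sketch below builds that payload with the standard library only. The parameter names come from the signature above; the assumption that `keywords` is a list of strings, that `k` is an integer, and that the planner names are passed as the literal strings `semantic`, `structured`, `graph`, `hybrid`, and `auto` is mine, so check docs/mcp-tool-contracts.md for the actual shapes. `attend_request` is a hypothetical helper.

```python
import json

# Planner names as listed in the tool table; assumed to be the literal values.
PLANNERS = {"semantic", "structured", "graph", "hybrid", "auto"}


def attend_request(question, keywords=None, k=None, planner=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for gwt_attend.

    Optional arguments are omitted from the payload when not given, so
    the server's own defaults apply.
    """
    if planner is not None and planner not in PLANNERS:
        raise ValueError(f"unknown planner: {planner!r}")
    arguments = {"question": question}
    if keywords is not None:
        arguments["keywords"] = list(keywords)
    if k is not None:
        arguments["k"] = int(k)
    if planner is not None:
        arguments["planner"] = planner
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "gwt_attend", "arguments": arguments},
    }


if __name__ == "__main__":
    request = attend_request("Who approved the budget?", keywords=["budget"], k=5)
    print(json.dumps(request, indent=2))
```

In practice an MCP client library handles the transport and framing; the payload is shown here only to make the single-call path concrete.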
src/gwt_context/
├── domain/ # Pure domain, no I/O
│ ├── models.py # MemoryItem, Goal, WorkspaceSlot, etc.
│ ├── workspace.py # GlobalWorkspace (capacity-limited slots)
│ ├── specialists.py # 6 specialist scoring functions
│ ├── competition.py # CompetitionEngine (scoring + eviction)
│ └── broadcast.py # BroadcastAssembler (workspace → text)
├── application/ # Orchestration
│ ├── broadcast_bus.py # Post-broadcast subscribers and proposal arbitration
│ ├── attention.py # Explicit attention controller
│ ├── structured.py # Runtime collection and relation evidence
│ ├── cycle.py # SelectionBroadcastCycle + PreconsciousBuffer
│ ├── ingestion.py # Content → MemoryItem pipeline
│ └── goal_manager.py
├── infrastructure/ # Storage, embeddings
│ ├── storage.py # SQLiteMemoryStore
│ ├── vector_index.py # Numpy cosine similarity index
│ ├── embeddings.py # SentenceTransformerEmbedder
│ └── config.py # GWTConfig
├── mcp/ # MCP interface
│ ├── tools.py # MCP tool definitions
│ ├── resources.py # MCP resources
│ └── prompts.py # System + multi-hop prompts
└── server.py # FastMCP wiring + entry point
pip install -e ".[dev]"
pytest tests/unit/ tests/integration/ -q
Tests cover domain logic, storage, vector index, MCP boundaries, and full selection-broadcast cycles.
Benchmark harness for evaluation against OpenAI-compatible APIs (Qwen, Llama, etc. via vLLM/TGI):
pip install -e ".[dev,bench]"
cp .env.example .env
python -m tests.benchmarks.ruler_multi_hop
python -m tests.benchmarks.longbench_pro
See tests/benchmarks/README.md for the full variable matrix, command examples, and reproducible output behavior.
See docs/attention-controller.md for the architecture note behind the controlled/hybrid design.
See docs/release-readiness.md for current release gates and Qwen smoke status.
See docs/honest-gwt-report.md for the current GWT claim and limitations.
See docs/dogfood-report.md for the latest real MCP/Qwen dogfood evidence.
See docs/benchmark-report-v0.3.md for the current v0.3 benchmark summary.
See docs/mcp-tool-contracts.md for stable MCP response shapes.
Each benchmark runs GWT mode (with tools) and baseline mode (all context in prompt) for comparison.
Results are saved as JSON in BENCHMARK_RESULTS_DIR (default tests/benchmarks/results/) using deterministic filenames:
{benchmark}_{model}_{timestamp}_{config_hash}.json
Benchmark modes include prompt-only baseline, model-controlled tools,
production generic attend, deterministic controlled, and hybrid mode
where GWT selection is deterministic and the model only performs final
synthesis.
Analyze failures and runtime metrics with:
python -m tests.benchmarks.analyze_results tests/benchmarks/results
Render trace-heavy results as HTML with:
python -m tests.benchmarks.render_trace tests/benchmarks/results/<result>.json
The HTML report groups bus proposals by subscriber/kind, lists inhibited proposal keys and rationale, and summarizes workspace changes across trace phases.
Run a small local MCP-facing scenario without downloading embedding models:
python examples/mcp_demo.py
Run the post-release usage loop that exercises store, attend, bus inspection, trace explanation, and workspace inspection:
python examples/real_usage_loop.py
Run the external subscriber proof-of-concept:
python examples/external_subscriber_poc.py
Run the deterministic benchmark smoke used by npm test:
npm run benchmark:smoke
Run a tiny model-backed Qwen/OpenAI-compatible smoke while keeping local GWT embeddings deterministic:
GWT_EMBEDDING_PROVIDER=hash GWT_EMBEDDING_MODEL=hash \
python -m tests.benchmarks.ruler_multi_hop \
--hops 2 --distractors 3 --tasks-per-config 1 --max-tasks 1 \
--gwt-mode attend
Or use the bounded sanity wrapper:
npm run qwen:sanity -- --run --max-tasks 1
The benchmark entrypoints load .env automatically if it exists. The repository now includes .env.example with the current RunPod-compatible defaults:
BENCHMARK_API_BASE=https://example-openai-compatible-endpoint/v1
BENCHMARK_API_PATH=/v1
BENCHMARK_MODEL=qwen3.6-35b-a3b
BENCHMARK_API_KEY=test
BENCHMARK_TIMEOUT_SECONDS=30
BENCHMARK_MAX_RETRIES=2
BENCHMARK_CONCURRENCY=16
BENCHMARK_RESULTS_DIR=tests/benchmarks/results
.env is ignored by git, while .env.example is tracked so the shared setup stays visible.
You can still override everything explicitly on the CLI:
python -m tests.benchmarks.ruler_multi_hop \
--api-base "$BENCHMARK_API_BASE" \
--api-path "$BENCHMARK_API_PATH" \
--model "$BENCHMARK_MODEL" \
--api-key "$BENCHMARK_API_KEY" \
--max-tasks 3
Environment variables:
| Variable | Default | Description |
|---|---|---|
| GWT_WORKSPACE_CAPACITY | 7 | Max items in workspace |
| GWT_BUFFER_SIZE | 50 | Preconscious buffer size |
| GWT_GOAL_MODULATION | 0.3 | Goal boost strength (0-1) |
| GWT_MIN_ACTIVATION | 0.2 | Ignition threshold for admitting new workspace candidates |
| GWT_EMBEDDING_PROVIDER | sentence-transformer | sentence-transformer or deterministic local hash |
| GWT_EMBEDDING_MODEL | all-MiniLM-L6-v2 | Sentence transformer model |
| GWT_EMBEDDING_DIM | 384 | Vector dimension for storage/search |
| GWT_DATA_DIR | ~/.gwt-context | Storage directory |
| GWT_DB_PATH | unset | Optional exact SQLite DB path override |
| GWT_VECTOR_INDEX_PATH | unset | Optional exact vector index path override |
| GWT_MAX_BROADCAST_TOKENS | 4000 | Max tokens per broadcast |
| GWT_MAX_VECTOR_ELEMENTS | 100000 | Max vector index capacity setting |
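The sketch below shows how these variables and defaults could be read into a typed config object. The variable names and default values are taken from the table; `SketchConfig` and `load_config` are hypothetical names, only a subset of the variables is covered, and the range check on the goal boost is an assumption. The project's own GWTConfig may be structured differently.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class SketchConfig:
    """Illustrative subset of the settings in the table above."""
    workspace_capacity: int
    buffer_size: int
    goal_modulation: float
    min_activation: float
    embedding_provider: str
    embedding_model: str
    embedding_dim: int
    data_dir: Path
    max_broadcast_tokens: int


def load_config(env=None) -> SketchConfig:
    """Read settings from `env` (defaults to os.environ), applying defaults."""
    env = os.environ if env is None else env
    cfg = SketchConfig(
        workspace_capacity=int(env.get("GWT_WORKSPACE_CAPACITY", 7)),
        buffer_size=int(env.get("GWT_BUFFER_SIZE", 50)),
        goal_modulation=float(env.get("GWT_GOAL_MODULATION", 0.3)),
        min_activation=float(env.get("GWT_MIN_ACTIVATION", 0.2)),
        embedding_provider=env.get("GWT_EMBEDDING_PROVIDER", "sentence-transformer"),
        embedding_model=env.get("GWT_EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        embedding_dim=int(env.get("GWT_EMBEDDING_DIM", 384)),
        data_dir=Path(env.get("GWT_DATA_DIR", "~/.gwt-context")).expanduser(),
        max_broadcast_tokens=int(env.get("GWT_MAX_BROADCAST_TOKENS", 4000)),
    )
    # Assumed validation: the table documents the boost strength as 0-1.
    if not 0.0 <= cfg.goal_modulation <= 1.0:
        raise ValueError("GWT_GOAL_MODULATION must be between 0 and 1")
    return cfg
```

Passing a plain dict as `env` keeps the loader easy to test, for example `load_config({"GWT_EMBEDDING_PROVIDER": "hash", "GWT_EMBEDDING_DIM": "32"})` for the hash-embedding setup used in the smoke commands.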
- Baars, B.J. (1988). A Cognitive Theory of Consciousness
- Hsieh et al. (2024). RULER: What's the Real Context Size of Your Long-Context Language Models?
- Dehaene & Naccache (2001). Towards a cognitive neuroscience of consciousness
MIT