Glyphh Code

File-level codebase intelligence for Claude Code. Encodes every source file in your repo as an HDC vector. Claude Code queries the index instead of scanning files.

Same architecture as glyphh-pipedream (3,146 apps) and glyphh-bfcl (#1 on BFCL V4). No LLM at build time. No LLM at search time. Pure HDC encoding and cosine search.

Built on Glyphh Ada 1.1.


WORK IN PROGRESS: This model is under active development. Benchmarks show Glyphh uses 20% fewer tokens and 22% fewer turns than bare Claude Code, with equal search accuracy (13/15). Overall accuracy is lower (76% vs 84%) because MCP startup latency causes timeouts, which is not an HDC issue. See benchmark/BENCHMARK.md for full results and analysis.

Getting Started

1. Install the Glyphh CLI

# Create and activate a virtual environment (recommended)
python3 -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Install with runtime dependencies (includes FastAPI, SQLAlchemy, pgvector)
pip install 'glyphh[runtime]'

2. Clone and start the model

This model requires PostgreSQL + pgvector for similarity search.

git clone https://github.com/glyphh-ai/model-code.git
cd model-code

# Start the Glyphh shell (prompts login on first run)
glyphh

# Inside the shell:
# glyphh> docker init       # generates docker-compose.yml + init.sql
# glyphh> exit

# Start PostgreSQL + pgvector and the Glyphh runtime
docker compose up -d --wait

This starts:

  • PostgreSQL 16 + pgvector on port 5432 (with HNSW indexing)
  • Glyphh Runtime on port 8002

Swagger docs available at http://localhost:8002/docs in local mode.

3. Deploy the model

glyphh
# glyphh> model deploy .     # deploy code model to runtime

4. Compile your codebase

# Full compile (all indexable files)
python compile.py /path/to/your/repo --runtime-url http://localhost:8002

# Incremental (changed files since last commit)
python compile.py /path/to/your/repo --incremental

# Incremental from a child repo / submodule commit
python compile.py /path/to/your/repo --incremental --diff-repo /path/to/child/repo

# Dry run (show what would be indexed)
python compile.py /path/to/your/repo --dry-run

The --diff-repo flag tells compile.py to run git diff HEAD^ HEAD in a different repo than the source directory. Changed file paths are resolved relative to the child repo but the source directory is still used as the compile root. This is how the post-commit hook handles commits in monorepo subdirectories, child repos, and submodules.
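
The path handling described above can be sketched roughly as follows. This is an illustration only, not the code in compile.py: the helper names `changed_files` and `resolve_against_root` are hypothetical, and the sketch assumes the child repo sits somewhere under the source directory.

```python
import subprocess
from pathlib import Path


def changed_files(diff_repo: str) -> list[str]:
    """Paths changed in the last commit of diff_repo, relative to that repo."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "HEAD^", "HEAD"],
        cwd=diff_repo, capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line.strip()]


def resolve_against_root(source_root: str, diff_repo: str,
                         rel_paths: list[str]) -> list[str]:
    """Re-express child-repo paths relative to the compile root (hypothetical helper)."""
    root = Path(source_root).resolve()
    repo = Path(diff_repo).resolve()
    resolved = []
    for rel in rel_paths:
        full = (repo / rel).resolve()
        # Drop anything that falls outside the compile root.
        if full.is_relative_to(root):
            resolved.append(full.relative_to(root).as_posix())
    return resolved
```

For example, a change to `src/a.py` committed in `/path/to/monorepo/child-repo` would be compiled as `child-repo/src/a.py` under the monorepo root.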

5. Connect Claude Code

Add the MCP server using the Claude Code CLI:

claude mcp add --transport http glyphh http://localhost:8002/{org_id}/code/mcp

To find your org ID, run glyphh auth status in the Glyphh shell:

glyphh
# glyphh> auth status
#   org_id: your-org-id-here

In local mode the org ID is local-dev-org:

claude mcp add --transport http glyphh http://localhost:8002/local-dev-org/code/mcp

Restart Claude Code to pick up the MCP config. In VS Code: Cmd+Shift+P → "Claude Code: Restart". In the CLI: exit and re-enter the session.

Verify the connection with /mcp — you should see glyphh_search, glyphh_related, and glyphh_stats listed as available tools.

6. Add CLAUDE.md (recommended)

Copy the included CLAUDE.md into your project root:

cp CLAUDE.md /path/to/your/project/CLAUDE.md

Claude Code loads this file automatically at the start of every conversation. It teaches Claude Code to always search the Glyphh index before reading files, check blast radius before editing, and use top_tokens and imports from search results to avoid unnecessary file reads.

Without it, Claude Code will still have the MCP tools available but will fall back to its default file scanning behavior.


What it does

Compiles your codebase into a vector index. Exposes it to Claude Code via MCP.

Without Glyphh: Claude reads project structure, scans likely files, reads module, reads tests. ~6,000 tokens before first useful output.

With Glyphh: Claude calls glyphh_search("auth token validation"). Returns: file path, confidence, top concepts, imports, related files. Claude reads one file and acts. ~400 tokens before first useful output.

The index stores not just the vector but the token vocabulary of every file. Search results return enough context that Claude often does not need to read the file at all. When it does read, it already knows exactly what to look for.

Architecture

Same paradigm as all Glyphh models. The file is the exemplar.

Build time:  read file → tokenize path + identifiers + imports
             → encode into HDC vector → store vector + metadata in pgvector

Runtime:     NL query → encode with same pipeline
             → cosine search against index
             → return file path + top tokens + imports
             → Claude reads one file, acts

No LLM at build time. No LLM at runtime for search.
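
To make the build-time and runtime flow concrete, here is a minimal, generic HDC sketch in pure Python. It is not the Glyphh encoder: the random bipolar token vectors, the summation bundling, and the names `token_vector`, `encode`, and `search` are all assumptions chosen for illustration. It only shows how a file and a query can share one encoding pipeline and be compared by cosine similarity.

```python
import hashlib
import math
import random

DIM = 2000  # same dimensionality the Encoder section mentions


def token_vector(token: str, dim: int = DIM) -> list[int]:
    """Deterministic pseudo-random bipolar (+1/-1) vector for a token."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [1 if rng.random() < 0.5 else -1 for _ in range(dim)]


def encode(tokens: list[str], dim: int = DIM) -> list[int]:
    """Bundle token vectors by element-wise summation."""
    acc = [0] * dim
    for tok in tokens:
        for i, v in enumerate(token_vector(tok, dim)):
            acc[i] += v
    return acc


def cosine(a: list[int], b: list[int]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query_tokens: list[str], index: dict, top_k: int = 5):
    """Encode the query with the same pipeline and rank files by cosine."""
    q = encode(query_tokens)
    scored = [(cosine(q, vec), path) for path, vec in index.items()]
    return sorted(scored, reverse=True)[:top_k]


# Tiny in-memory index standing in for pgvector.
index = {
    "src/auth.py": encode(["auth", "token", "validate", "jwt"]),
    "src/billing.py": encode(["invoice", "payment", "stripe"]),
}
results = search(["auth", "token", "validation"], index)
```

Because the query shares two tokens with `src/auth.py` and none with `src/billing.py`, the first file ranks highest. No model inference is involved at any step.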

Encoder

Two-layer HDC encoder at 2,000 dimensions (pgvector HNSW compatible):

| Layer | Weight | Signal |
|---|---|---|
| path | 0.30 | File path tokens (BoW): src/services/user_service.py → src services user service py |
| content | 0.70 | Source file vocabulary |
| ↳ identifiers | 1.0 | All tokens from file content. camelCase/snake_case split before encoding |
| ↳ imports | 0.8 | Import/require/include targets. Strong cross-file dependency signal |
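
The path and identifier splitting can be illustrated with a short sketch. The regular expression and the function names `split_identifier` and `path_tokens` are assumptions for illustration; the actual tokenizer may handle edge cases differently.

```python
import re

# Matches acronym runs (HTTP), capitalized or lowercase words, and digit runs.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_identifier(name: str) -> list[str]:
    """Split camelCase, PascalCase, and snake_case into lowercase tokens."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", name):
        tokens.extend(_WORD.findall(part))
    return [t.lower() for t in tokens]


def path_tokens(path: str) -> list[str]:
    """Bag-of-words tokens for a file path, separators and dots removed."""
    return split_identifier(path)
```

With this sketch, `path_tokens("src/services/user_service.py")` yields `src services user service py`, and `split_identifier("getUserName")` yields `get user name`.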

Metadata stored per file (not encoded, returned at search time):

  • top_tokens: 20 most frequent meaningful tokens
  • imports: list of imported module/package names
  • extension: file type
  • file_size: bytes
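
A record with these four fields could be assembled along the following lines. This is a sketch under assumptions: the README does not define "meaningful", so the stop list and the minimum token length here are hypothetical, as are the names `top_tokens` and `build_metadata`.

```python
import os
from collections import Counter

# Hypothetical stop list; the real filter is not documented.
STOPWORDS = {"self", "return", "import", "from", "def", "class", "if", "else"}


def top_tokens(tokens: list[str], n: int = 20) -> list[str]:
    """The n most frequent tokens, ignoring stop words and 1-char tokens."""
    counts = Counter(t for t in tokens if len(t) > 1 and t not in STOPWORDS)
    return [tok for tok, _ in counts.most_common(n)]


def build_metadata(path: str, tokens: list[str],
                   imports: list[str], size_bytes: int) -> dict:
    """Per-file metadata returned alongside a search hit."""
    return {
        "top_tokens": top_tokens(tokens),
        "imports": list(imports),
        "extension": os.path.splitext(path)[1],
        "file_size": size_bytes,
    }
```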

MCP Tools

Exposed through the runtime's model-specific MCP tool system:

glyphh_search

Find files by natural language query. Returns ranked matches with confidence scores, top tokens, and import lists.

{"tool": "glyphh_search", "arguments": {"query": "auth token validation", "top_k": 5}}

Confidence gate: when the best match scores below the threshold, the tool returns ASK with a list of candidates instead of silently routing to the wrong file.
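
The gating logic amounts to a simple threshold check. In the sketch below the threshold value, the `DONE` state name, and the response shape are assumptions; only the `ASK` state comes from this README.

```python
ASK_THRESHOLD = 0.35  # hypothetical value; the real threshold is not documented here


def gate(matches, threshold=ASK_THRESHOLD, max_candidates=3):
    """matches: list of (path, score) pairs sorted by descending score."""
    if not matches:
        return {"state": "ASK", "candidates": []}
    best_path, best_score = matches[0]
    if best_score >= threshold:
        # Confident enough to route directly to the top file.
        return {"state": "DONE", "match": best_path, "confidence": best_score}
    # Not confident: hand back candidates rather than guessing.
    return {"state": "ASK", "candidates": [p for p, _ in matches[:max_candidates]]}
```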

glyphh_related

Find files semantically related to a given file. Use before editing to understand blast radius.

{"tool": "glyphh_related", "arguments": {"file_path": "src/services/auth.py", "top_k": 5}}

glyphh_stats

Index statistics: total files, extension breakdown.
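
The JSON snippets in this section use a shorthand. On the wire, MCP clients send tool invocations as JSON-RPC 2.0 requests with the method `tools/call`. The sketch below converts the shorthand into that envelope; the helper name `to_jsonrpc` is hypothetical, and session setup (the MCP initialize handshake, auth headers) is deliberately left out.

```python
import json
from itertools import count

_ids = count(1)  # simple incrementing request ids


def to_jsonrpc(call: dict) -> dict:
    """Wrap a {"tool": ..., "arguments": ...} shorthand in an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {
            "name": call["tool"],
            "arguments": call.get("arguments", {}),
        },
    }


request = to_jsonrpc(
    {"tool": "glyphh_search", "arguments": {"query": "auth token validation", "top_k": 5}}
)
body = json.dumps(request)  # what an HTTP client would POST to the MCP endpoint
```

Claude Code builds these requests itself; the sketch is only useful for debugging the endpoint by hand.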

Drift scoring

The drift.py module computes semantic drift between file versions:

| Drift | Label | Meaning |
|---|---|---|
| 0.00–0.10 | cosmetic | Formatting, comments, rename |
| 0.10–0.30 | moderate | Logic update, new function |
| 0.30–0.60 | significant | Behavioral change, new dependency |
| 0.60–1.00 | architectural | Rewrite, interface change |
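
The mapping from score to label can be sketched as below. Two things here are assumptions rather than documented behavior of drift.py: that drift is computed as one minus cosine similarity (clamped to the 0 to 1 range), and that a score sitting exactly on a boundary such as 0.10 falls into the higher band.

```python
import math

# (exclusive upper bound, label) pairs taken from the table above.
BANDS = [(0.10, "cosmetic"), (0.30, "moderate"), (0.60, "significant")]


def drift_score(old_vec, new_vec) -> float:
    """Assumed definition: 1 - cosine similarity, clamped to [0, 1]."""
    dot = sum(a * b for a, b in zip(old_vec, new_vec))
    na = math.sqrt(sum(a * a for a in old_vec))
    nb = math.sqrt(sum(b * b for b in new_vec))
    if not na or not nb:
        return 1.0  # treat an empty version as maximal drift
    return min(1.0, max(0.0, 1.0 - dot / (na * nb)))


def drift_label(score: float) -> str:
    if not 0.0 <= score <= 1.0:
        raise ValueError("drift score must be between 0 and 1")
    for upper, label in BANDS:
        if score < upper:
            return label
    return "architectural"
```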

Incremental compile

# Recompile only files changed in the last commit
python compile.py . --incremental

# Recompile files changed in a specific commit
python compile.py . --diff abc123

# Recompile when commit was in a child repo / submodule
python compile.py /path/to/monorepo --incremental --diff-repo /path/to/monorepo/child-repo

The index is updated automatically after every commit via the Claude Code PostToolUse hook (see Claude Code hooks below). For non-Claude workflows, a git post-commit hook is included at hooks/post-commit.

File support

Indexes: .py, .ts, .tsx, .js, .jsx, .java, .cpp, .c, .h, .go, .rs, .rb, .cs, .swift, .sql, .graphql, .yaml, .json, .sh, .css, .html, .svelte, .vue, .md, .proto, .tf, and more.

Skips: .git, node_modules, __pycache__, dist, build, vendor, target, and other build/cache directories.

Max file size: 500 KB. Binary files auto-skipped.
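
These rules translate into a straightforward directory walk. The sketch below is not compile.py: it uses only a subset of the extensions and skip directories listed above, assumes 500 KB means 500 × 1024 bytes, and detects binary files with a null-byte check, which is a common heuristic but an assumption here.

```python
import os

INDEXABLE = {".py", ".ts", ".tsx", ".js", ".jsx", ".go", ".rs", ".md"}  # subset
SKIP_DIRS = {".git", "node_modules", "__pycache__", "dist", "build", "vendor", "target"}
MAX_BYTES = 500 * 1024  # assumes KB = 1024 bytes


def looks_binary(path: str, sample: int = 8192) -> bool:
    """Heuristic: a null byte in the first few KB suggests a binary file."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(sample)


def indexable_files(root: str):
    """Yield paths (relative to root) that pass the extension, size, and binary checks."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if os.path.splitext(name)[1].lower() not in INDEXABLE:
                continue
            if os.path.getsize(full) > MAX_BYTES:
                continue
            if looks_binary(full):
                continue
            yield os.path.relpath(full, root)
```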

Disable MCP permission prompts

By default Claude Code prompts for permission each time it calls an MCP tool. To allow Glyphh tools silently, add them to .claude/settings.json in your project:

{
  "permissions": {
    "allow": [
      "mcp__glyphh__glyphh_search",
      "mcp__glyphh__glyphh_related",
      "mcp__glyphh__glyphh_stats",
      "mcp__glyphh__glyphh_drift",
      "mcp__glyphh__glyphh_risk"
    ]
  }
}

Or use a wildcard to allow all tools from the Glyphh server:

{
  "permissions": {
    "allow": [
      "mcp__glyphh__*"
    ]
  }
}

The first matching rule wins — Glyphh tools run silently while everything else still prompts.

Claude Code hooks

Two hooks are included to integrate Glyphh with Claude Code:

  1. enforce-glyphh-search.sh (PreToolUse) — blocks Grep and Glob calls, redirecting Claude to glyphh_search instead
  2. post-commit-compile.sh (PostToolUse) — runs compile.py --incremental after every git commit to keep the index up to date

post-commit-compile.sh

The post-commit hook takes a source directory as its first argument. This is the root of the codebase you want indexed. The hook fires on any git commit that happens inside that directory — whether the commit is in the source directory itself, a child repo, or a submodule.

When a commit lands in a child repo, the hook passes --diff-repo to compile.py so it diffs the correct repo while still compiling against the source directory root.
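
The decision the hook makes can be sketched in Python, although the shipped hook is a shell script and may differ. The sketch assumes the hook receives Claude Code's JSON payload with `tool_name` and `tool_input.command` fields; the function names `is_git_commit` and `compile_args` are hypothetical.

```python
import os
import re

# Matches "git commit" and "git -C <dir> commit", but not "git commit-graph".
_GIT_COMMIT = re.compile(r"\bgit(?:\s+-C\s+\S+)?\s+commit(?=\s|$)")


def is_git_commit(hook_input: dict) -> bool:
    """True when a Bash tool call ran a git commit."""
    if hook_input.get("tool_name") != "Bash":
        return False
    command = hook_input.get("tool_input", {}).get("command", "")
    return bool(_GIT_COMMIT.search(command))


def compile_args(source_dir: str, commit_repo: str) -> list[str]:
    """Arguments for compile.py; adds --diff-repo when the commit was in a child repo."""
    args = ["compile.py", source_dir, "--incremental"]
    if os.path.realpath(commit_repo) != os.path.realpath(source_dir):
        args += ["--diff-repo", commit_repo]
    return args
```

A real hook would read the payload with `json.load(sys.stdin)`, call `is_git_commit`, and exit early for every other Bash command so that ordinary tool calls stay fast.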

Add both hooks to .claude/settings.json in your project:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Grep|Glob",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/model-code/hooks/enforce-glyphh-search.sh"
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "/path/to/model-code/hooks/post-commit-compile.sh /path/to/source/dir"
          }
        ]
      }
    ]
  }
}

Replace /path/to/model-code with wherever you cloned this repo and /path/to/source/dir with the root of the codebase to index.

Environment variables

| Variable | Default | Description |
|---|---|---|
| GLYPHH_RUNTIME_URL | http://localhost:8002 | Runtime endpoint |
| GLYPHH_TOKEN | Auto-resolved from CLI session | Auth token |
| GLYPHH_ORG_ID | Auto-resolved from CLI session | Org ID |
| GLYPHH_PYTHON | /opt/homebrew/anaconda3/bin/python | Python interpreter (must have requests) |
| GLYPHH_HOOK_DISABLE | (unset) | Set to 1 to temporarily disable the hook |
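
As an illustration of how these variables and their defaults fit together, the sketch below resolves them into one config dict. The function name `hook_config` and the dict keys are hypothetical, and the fallback to the CLI session for the token and org ID is not modeled: a `None` value simply marks where that lookup would happen.

```python
import os


def hook_config(env=None) -> dict:
    """Resolve hook settings from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    return {
        "runtime_url": env.get("GLYPHH_RUNTIME_URL", "http://localhost:8002"),
        "token": env.get("GLYPHH_TOKEN"),    # None: resolve from the CLI session
        "org_id": env.get("GLYPHH_ORG_ID"),  # None: resolve from the CLI session
        "python": env.get("GLYPHH_PYTHON", "/opt/homebrew/anaconda3/bin/python"),
        "disabled": env.get("GLYPHH_HOOK_DISABLE") == "1",
    }
```

The default GLYPHH_PYTHON path is specific to a Homebrew Anaconda install, so most setups will want to set it explicitly.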

Tests

cd glyphh-models/code
PYTHONPATH=../../glyphh-runtime pytest tests/ -v
