Project Cortex

Make your AI coding assistant literally smarter.

Project Cortex provides deep semantic understanding of both code and documentation to LLM-powered coding tools like Claude Code, Cursor, and others. By parsing, indexing, and chunking your code and docs into a searchable vector database, it enables AI assistants to grasp not just what the code does, but why it exists—surfacing architectural decisions, design patterns, and team knowledge that lives in documentation.

What It Does

Project Cortex has two main components:

Intelligent Code & Documentation Indexer - Extracts structured knowledge from your project:

Code Extraction (via tree-sitter):
- Symbols: High-level overview (packages, imports, type/function names with line numbers)
- Definitions: Full type definitions, interfaces, and function signatures
- Data: Constants and initialized variables

Documentation Extraction:
- Semantic chunking: Splits docs by headers/sections when token limits allow
- Architectural context: Surfaces design docs, ADRs, best practices
- Multi-format support: Markdown, RST, and text files

MCP Server - Loads indexed chunks into an in-memory vector database (chromem-go) and exposes them via the Model Context Protocol, enabling AI coding assistants to semantically search both your code and documentation simultaneously.

Why Project Cortex?

Architectural Understanding: LLMs access design decisions, system architecture, and the "why" behind code—not just the "what"
Semantic Search: Find relevant code and docs by meaning, not just keywords
Unified Knowledge Base: Search implementation and rationale together, bridging the gap between code and intent
Privacy-First: Supports local embedding models for sensitive codebases
Fast Incremental Updates: Only reprocesses changed files
Git-Friendly: Indexes stored as JSON files that can be version controlled
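
One common way to reprocess only changed files is to compare content digests between runs. The sketch below illustrates that general technique; it is not taken from the Cortex source, and the names `hashContent` and `changedFiles` are hypothetical. It assumes the previous run's digests are available as a map from path to hex digest.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// hashContent returns the hex-encoded SHA-256 digest of a file's contents.
func hashContent(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// changedFiles returns the sorted paths in current whose digest differs from
// the digest recorded in prev, or that have no recorded digest at all.
// (Hypothetical helper: deleted files are not handled in this sketch.)
func changedFiles(prev map[string]string, current map[string][]byte) []string {
	var changed []string
	for path, data := range current {
		if prev[path] != hashContent(data) {
			changed = append(changed, path)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	// Digests recorded on an earlier (imaginary) indexing run.
	prev := map[string]string{"a.go": hashContent([]byte("package a"))}
	current := map[string][]byte{
		"a.go": []byte("package a"), // unchanged
		"b.go": []byte("package b"), // new file
	}
	fmt.Println(changedFiles(prev, current))
}
```

Hashing contents rather than comparing modification times keeps the check stable across checkouts and clones, at the cost of reading every file once.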

Quick Start

Installation

Option 1: Install via go install (Recommended)

go install github.com/mvp-joe/project-cortex/cmd/cortex@v1.3.1

This installs the cortex CLI, which includes:

Code and documentation indexer
MCP server for AI assistants

Option 2: Download pre-built binary

Download the latest release for your platform from GitHub Releases:

cortex - Main CLI for indexing and MCP server

Index Your Project

Navigate to your project directory and run:

# One-time indexing
cortex index

# Watch mode for active development
cortex index --watch

This creates a .cortex/ directory with:

.cortex/
  config.yml                 # Configuration
  chunks/
    code-symbols.json        # High-level code map
    code-definitions.json    # Type/function signatures
    code-data.json           # Constants and values
    doc-chunks.json          # Documentation (README, guides, etc.)

The doc-chunks.json file contains chunked documentation (split by headers/sections within token limits), enabling your AI assistant to understand architectural decisions, design patterns, and the reasoning behind implementation choices.
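
To make header-based chunking concrete, here is a minimal sketch that splits Markdown text into one chunk per heading. It is an illustration only, not the indexer's actual code: the `DocChunk` type and both function names are hypothetical, token-limit enforcement is omitted, and `#` lines inside code fences would be misread as headings.

```go
package main

import (
	"fmt"
	"strings"
)

// DocChunk is a hypothetical in-memory form of one documentation section.
type DocChunk struct {
	Heading string
	Body    string
}

// isHeading reports whether line is an ATX heading: 1-6 '#' followed by a space.
func isHeading(line string) bool {
	rest := strings.TrimLeft(line, "#")
	level := len(line) - len(rest)
	return level >= 1 && level <= 6 && strings.HasPrefix(rest, " ")
}

// splitByHeaders returns one chunk per heading. Text that appears before the
// first heading is kept as a chunk with an empty Heading.
func splitByHeaders(doc string) []DocChunk {
	var chunks []DocChunk
	var cur DocChunk
	var body []string

	// flush finalizes the section collected so far, skipping empty ones.
	flush := func() {
		cur.Body = strings.TrimSpace(strings.Join(body, "\n"))
		if cur.Heading != "" || cur.Body != "" {
			chunks = append(chunks, cur)
		}
		body = nil
	}

	for _, line := range strings.Split(doc, "\n") {
		if isHeading(line) {
			flush()
			cur = DocChunk{Heading: strings.TrimSpace(strings.TrimLeft(line, "#"))}
			continue
		}
		body = append(body, line)
	}
	flush()
	return chunks
}

func main() {
	doc := "Intro text\n# Setup\nRun it.\n## Details\nMore.\n"
	for _, c := range splitByHeaders(doc) {
		fmt.Printf("%q: %q\n", c.Heading, c.Body)
	}
}
```

A fuller version would also cap each chunk at a token budget, for example by splitting oversized sections on paragraph boundaries.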

Configure MCP Integration

Option 1: Per-Project Configuration (Recommended)

Create .mcp.json in your project root:

{
  "mcpServers": {
    "cortex": {
      "command": "cortex",
      "args": ["mcp"]
    }
  }
}

Option 2: Global Configuration

Add to ~/.claude/mcp.json:

{
  "mcpServers": {
    "cortex": {
      "command": "cortex",
      "args": ["mcp"],
      "cwd": "/path/to/your/project"
    }
  }
}

See MCP Integration Guide for detailed setup instructions.

Supported Languages

Go
TypeScript / JavaScript (including JSX/TSX)
Python
Rust
C / C++
PHP
Ruby
Java

See Language Support for details on what gets extracted from each language.

How It Works

Parse: Tree-sitter analyzes your code's AST
Extract: Three-tier extraction creates structured representations
Chunk: Code and docs are chunked for optimal vector search
Embed: Content is embedded using configurable models
Index: Chunks stored as version-controlled JSON files
Search: MCP server loads chunks into in-memory vector DB for semantic queries
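
The final Search step boils down to ranking stored vectors by similarity to a query vector. The sketch below shows that idea with a brute-force cosine-similarity ranking over an in-memory slice. It is not chromem-go's API or Cortex's implementation; `EmbeddedChunk`, `cosine`, and `topK` are hypothetical names, and the two-dimensional vectors are toy values chosen for readability.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// EmbeddedChunk is a hypothetical pairing of a chunk ID with its embedding.
type EmbeddedChunk struct {
	ID     string
	Vector []float64
}

// cosine returns the cosine similarity of a and b, or 0 when the vectors are
// empty, differ in length, or either has zero magnitude.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns up to k chunks ordered from most to least similar to query.
// The input slice is copied, so the caller's ordering is left untouched.
func topK(query []float64, chunks []EmbeddedChunk, k int) []EmbeddedChunk {
	ranked := make([]EmbeddedChunk, len(chunks))
	copy(ranked, chunks)
	sort.SliceStable(ranked, func(i, j int) bool {
		return cosine(query, ranked[i].Vector) > cosine(query, ranked[j].Vector)
	})
	if k < 0 {
		k = 0
	}
	if k > len(ranked) {
		k = len(ranked)
	}
	return ranked[:k]
}

func main() {
	chunks := []EmbeddedChunk{
		{ID: "auth-design-doc", Vector: []float64{1, 0}},
		{ID: "billing-handler", Vector: []float64{0, 1}},
		{ID: "auth-middleware", Vector: []float64{0.9, 0.1}},
	}
	for _, c := range topK([]float64{1, 0}, chunks, 2) {
		fmt.Println(c.ID)
	}
}
```

Because code and documentation chunks share one vector space, a single query can return an implementation and the design note that explains it.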

For a deep dive, see Architecture.

About Embeddings

Project Cortex uses vector embeddings to enable semantic search: finding code and documentation by meaning, not just keywords. By default, Cortex uses cortex-embed, a standalone embedding server that:

Runs as a shared service across all your projects
Loads an ML model once into memory (instead of per project)
Provides local, privacy-first embeddings (your code never leaves your machine)
Automatically downloads and starts when needed, with no manual setup

Note on Binary Size: The cortex-embed binary is ~300MB because it bundles a complete Python 3.11 runtime and ML libraries (sentence-transformers, PyTorch). This design choice prioritizes zero-dependency installation over file size—users don't need to manage Python environments, pip dependencies, or model downloads. The binary is downloaded once to ~/.cortex/bin/ and shared across all projects.

See cortex-embed documentation for technical details.

Future support: We plan to support remote embedding providers (OpenAI, Anthropic, etc.) for users who prefer hosted embeddings over a local model.

Configuration

Create or edit .cortex/config.yml:

Example:

# Embedding model configuration
embedding:
  provider: "local"  # or "openai"
  model: "BAAI/bge-small-en-v1.5"
  dimensions: 384  # Vector size (must match model)
  endpoint: "http://localhost:8080/embed"

# Indexing options
indexing:
  ignore_patterns:
    - "node_modules/**"
    - "vendor/**"
    - ".git/**"
  max_chunk_size: 1000

# Languages to index (default: all supported)
languages:
  - go
  - typescript
  - python

See Configuration Guide for all options.
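
The comment on `dimensions` says the value must match the model. As a rough illustration of why, the sketch below rejects a vector whose length disagrees with the configured size. The `EmbeddingConfig` struct and `validateVector` function are hypothetical and mirror only two fields from the example above; they are not Cortex's actual types.

```go
package main

import "fmt"

// EmbeddingConfig is a hypothetical mirror of two fields from the
// embedding section of .cortex/config.yml.
type EmbeddingConfig struct {
	Model      string
	Dimensions int
}

// validateVector returns an error when vec does not have the number of
// dimensions the configuration declares for the model.
func validateVector(cfg EmbeddingConfig, vec []float32) error {
	if len(vec) != cfg.Dimensions {
		return fmt.Errorf("model %q: got %d dimensions, config expects %d",
			cfg.Model, len(vec), cfg.Dimensions)
	}
	return nil
}

func main() {
	cfg := EmbeddingConfig{Model: "BAAI/bge-small-en-v1.5", Dimensions: 384}

	// A correctly sized vector passes; a mismatched one is reported.
	fmt.Println(validateVector(cfg, make([]float32, 384)))
	fmt.Println(validateVector(cfg, make([]float32, 768)))
}
```

A mismatch usually means the model was changed without re-indexing, so vectors from the old and new models can no longer be compared.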

Development

This project uses Task for building and development. Common commands:

# List all available tasks
task --list

# Build binaries
task build              # Build cortex CLI
task build:embed        # Build cortex-embed with Python runtime
task build:cross:all    # Cross-compile for all platforms

# Run
task run                # Build and run cortex
task run:embed          # Build and run embedding server

# Testing & Quality
task test               # Run tests
task test:coverage      # Run tests with coverage report
task check              # Run all checks (fmt, vet, lint, test)

# Development
task fmt                # Format code
task lint               # Run linter
task info               # Show build information

# Python Dependencies (for cortex-embed)
task python:deps:darwin-arm64    # Generate for macOS ARM64 (fast)
task python:deps:all             # Generate for all platforms (slow)

# Clean
task clean              # Remove build artifacts
task clean:all          # Remove builds and Python deps

See task --list for all available commands or check the Taskfile.yml.

Adding Language Support

See Contributing Guide for how to add new language parsers.

Documentation

Architecture - How Project Cortex works under the hood
Configuration - Configuration options and customization
MCP Integration - Setting up with Claude Code, Cursor, etc.
Language Support - Supported languages and extraction details
Contributing - Development workflow and guidelines

Use Cases

Large codebases: Maintain architectural context across thousands of files and understand system design, not just individual functions
Onboarding for humans and AI: New engineers grasp design philosophy, best practices, and the "why" behind technical decisions
Legacy systems: Discover architectural decisions and constraints that aren't obvious from code alone
Complex domains: Projects where understanding requires both code and extensive domain knowledge documentation
Well-documented projects: Teams that invest in design docs, ADRs, and architectural guides benefit from semantic access to this knowledge
Regulated industries: Medical, financial, or compliance-heavy codebases where documentation explains constraints and requirements
Understanding trade-offs: Surface documented discussions of why approach A was chosen over approach B

License

Project Cortex is licensed under the Apache License 2.0. See LICENSE for the full license text.

Contributing

Contributions welcome! See Contributing Guide.
