Pearl

Pearl is a RAG (Retrieval-Augmented Generation) system built with Elixir and Phoenix. It generates comprehensive wikis from code repositories, allowing you to ask questions about any codebase using natural language.

This project was inspired by DeepWiki from Devin and created as a learning exercise to explore Elixir, Phoenix LiveView, and RAG architectures—starting with naive RAG and progressing through techniques from recent research papers.

Named after Pearl I. Young (1895–1968), the first female technical employee of NACA (which became NASA) and the second female physicist in the U.S. federal government. After earning degrees in physics, chemistry, and mathematics from the University of North Dakota in 1919, she joined NACA's Langley Laboratory in 1922 as a physicist calibrating flight instrumentation. In 1929, she became Langley's Chief Technical Editor and established the NACA technical reports system, authoring the Style Manual for Engineering Authors that shaped how government aerospace engineers communicated for decades. NASA's History Office called her "the architect of the NACA technical reports system." In 2015, she was inducted into NASA Langley's Hall of Honor.

What Does Pearl Do?

  1. Clone any Git repository — Point Pearl at a GitHub URL and it fetches the code
  2. Generate a wiki — An LLM analyzes the codebase and creates structured documentation
  3. Ask questions — Use the built-in chat to ask questions about the code; Pearl finds relevant code snippets and explains them

Prerequisites

Before setting up Pearl, you'll need to install:

1. Elixir and Erlang

Elixir is the programming language Pearl is written in. The easiest way to install it:

macOS (using Homebrew)

brew install elixir

Other platforms

Follow the official Elixir installation guide.

Verify the installation:

elixir --version
# Should show Elixir 1.15 or higher

2. PostgreSQL with pgvector

Pearl uses PostgreSQL to store repository data and vector embeddings for search. The easiest way to run it is with Docker (recommended):

docker compose up -d

This starts PostgreSQL 18 with pgvector pre-installed. Data persists across restarts via a named volume.
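The repository ships the compose file, but as a rough sketch of what such a setup involves (the service name, image tag, and volume name here are assumptions, not the actual file):

```yaml
# Hypothetical sketch of a pgvector compose service; names and image tag
# are assumptions, not Pearl's actual docker-compose.yml.
services:
  db:
    image: pgvector/pgvector:pg18        # PostgreSQL 18 with pgvector included
    environment:
      POSTGRES_PASSWORD: postgres
    ports:
      - "${PEARL_DB_PORT:-5432}:5432"    # host port overridable via PEARL_DB_PORT
    volumes:
      - pearl_pgdata:/var/lib/postgresql/data  # named volume, so data persists
volumes:
  pearl_pgdata:
```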

Port conflict? If port 5432 is already in use:

export PEARL_DB_PORT=5433
docker compose up -d

Alternative: Native install

macOS (using Homebrew)

brew install postgresql@16 pgvector
brew services start postgresql@16

Other platforms

See the PostgreSQL download page and pgvector installation instructions.
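With a native install, pgvector also has to be enabled inside the database itself. Pearl's migrations may already do this during `mix setup`; if not, run it once per database (e.g. from psql):

```sql
-- Enable the extension once per database; Pearl's migrations may
-- already handle this during `mix setup`.
CREATE EXTENSION IF NOT EXISTS vector;

-- Sanity check: should return the installed pgvector version.
SELECT extversion FROM pg_extension WHERE extname = 'vector';
```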

3. LLM Provider

Pearl needs an LLM to generate wikis and answer questions. Choose one:

Option A: OpenRouter (Recommended for beginners)

  1. Create an account at openrouter.ai

  2. Generate an API key

  3. Set the environment variable:

    export OPENROUTER_API_KEY=sk-your-key-here

Option B: Ollama (Run models locally)

  1. Install from ollama.ai

  2. Pull a model:

    ollama pull llama3.2:3b

Setup

  1. Clone this repository:

    git clone https://github.com/existential-birds/pearl.git
    cd pearl/pearl
  2. Start PostgreSQL (if using Docker):

    docker compose up -d
  3. Configure your LLM provider by setting environment variables (either export directly in your terminal or add to a .env file to source later):

    # For OpenRouter (recommended)
    export LLM_PROVIDER=openrouter
    export LLM_MODEL=openai/gpt-5.2
    export EMBEDDING_MODEL=openai/text-embedding-3-small
    export OPENROUTER_API_KEY=sk-your-key-here
    
    # For Ollama (local)
    # export LLM_PROVIDER=ollama
    # export OLLAMA_HOST=http://localhost:11434
    # export OLLAMA_DEFAULT_MODEL=llama3.2:3b
  4. Run setup:

    mix setup
  5. Start the server:

    mix phx.server
  6. Open Pearl in your browser at http://localhost:4000

Usage

  1. On the home page, paste a GitHub repository URL and click "Clone"
  2. Once cloned, click "Generate Wiki" to create documentation
  3. Browse the generated wiki pages
  4. Use the chat panel to ask questions about the codebase

Architecture

Pearl combines several components:

  • Phoenix LiveView — Real-time web interface with no JavaScript required
  • RAG Pipeline — Chunks code files, generates embeddings, and searches for relevant context
  • LLM Integration — Supports both cloud (OpenRouter) and local (Ollama) providers
  • pgvector — Stores and searches vector embeddings for similarity matching
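The pgvector similarity search at the heart of the pipeline boils down to a query of roughly this shape (the `chunks` table and `embedding` column names are illustrative assumptions, not Pearl's actual schema):

```sql
-- Illustrative top-5 cosine retrieval. `<=>` is pgvector's cosine
-- distance operator, so ascending order means most similar first.
SELECT id, content
FROM chunks
ORDER BY embedding <=> $1   -- $1 = the query embedding, e.g. vector(1536)
LIMIT 5;

-- The kind of HNSW index that accelerates this query:
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
```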

For detailed architecture documentation, see CLAUDE.md.

RAG Implementation

One goal of Pearl is to explore different RAG architectures. We start with the simplest approach and progressively implement more sophisticated techniques from research papers.

Current: Naive RAG

Pearl currently implements Naive RAG, the baseline architecture:

  • Chunking: Fixed 500-token chunks with semantic break detection (paragraph boundaries preferred)
  • Embedding: OpenAI text-embedding-3-small (1536 dimensions) via OpenRouter, or nomic-embed-text via Ollama
  • Vector Store: PostgreSQL with pgvector extension, HNSW indexing
  • Retrieval: Top-5 chunks by cosine similarity
  • Generation: Retrieved chunks concatenated into system prompt with chat history

This approach is simple and works well for small-to-medium codebases, but has known limitations: no chunk overlap means context can be lost at boundaries, fixed-size chunking ignores code semantics, and top-k retrieval may miss relevant but dissimilar chunks.
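Pearl itself is written in Elixir; as a language-neutral sketch, the two core steps of the naive pipeline (fixed-size chunking with a preference for paragraph boundaries, and top-k cosine retrieval) look roughly like this. Function names and details are illustrative, not Pearl's actual code:

```python
import math

def chunk_text(text, max_tokens=500):
    """Split text into chunks of at most `max_tokens` whitespace tokens,
    preferring to break at blank lines (paragraph boundaries). A single
    paragraph longer than the limit still becomes one oversized chunk,
    which is one of the weaknesses noted above."""
    chunks, current, current_len = [], [], 0
    for para in text.split("\n\n"):
        n = len(para.split())
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, embeddings, k=5):
    """Return indices of the k most similar embeddings, best first.
    (In production this is what the pgvector index computes.)"""
    order = sorted(range(len(embeddings)),
                   key=lambda i: cosine(query_vec, embeddings[i]),
                   reverse=True)
    return order[:k]
```

Note how nothing here overlaps chunks or looks at code structure: that is exactly the boundary-loss limitation the roadmap techniques address.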

Roadmap: Advanced RAG Techniques

Future implementations will explore strategies from ottomator-agents, combining 3-5 techniques for optimal results:

  • Re-ranking — Two-stage retrieval with cross-encoder scoring (MS MARCO)
  • Contextual Retrieval — LLM adds context to chunks before embedding (Anthropic)
  • Context-aware Chunking — Split at semantic boundaries via Docling
  • Late Chunking — Embed full document, then chunk (arXiv:2409.04701)
  • Query Expansion / Multi-Query — Generate query variations for broader coverage
  • Hierarchical RAG — Search child chunks, return parent context
  • Knowledge Graphs — Vector search + graph traversal (Graphiti)
  • Agentic RAG — Agent chooses retrieval method per query (arXiv:2501.09136)
  • Self-Reflective RAG — LLM grades and refines retrieval (arXiv:2310.11511)
  • Fine-tuned Embeddings — Domain-specific embedding models for 5-10% accuracy gain
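To make one roadmap item concrete: Query Expansion / Multi-Query fans the user's question out into variants and merges the retrieved sets. A minimal sketch, where `expand_query` is a hypothetical stand-in for what would really be an LLM call:

```python
def expand_query(query):
    """Stand-in for an LLM paraphrase call: returns hand-written
    variants of the query for illustration only."""
    return [query,
            f"How does {query} work?",
            f"Where is {query} implemented?"]

def multi_query_retrieve(query, retrieve, k=5):
    """Run retrieval once per query variant and merge the results,
    deduplicated, preserving first-seen order. `retrieve(variant, k)`
    is any function returning a list of document ids."""
    seen, merged = set(), []
    for variant in expand_query(query):
        for doc_id in retrieve(variant, k):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged
```

The payoff is broader coverage: chunks that are dissimilar to the original phrasing but similar to a paraphrase still get retrieved.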

Development

# Run tests
mix test

# Format code
mix format

# Run pre-commit checks
mix precommit

License

Apache 2.0
