High-performance RAG system for code analysis with hybrid semantic and keyword search, optimized for 16GB VRAM systems.
```mermaid
graph TB
    subgraph "Client Layer"
        A[Open WebUI] -->|HTTP/JSON| B[Python Tool]
    end

    subgraph "CodeRefinery Server - Go"
        B -->|POST /search| C[Gin Router]
        C --> D[Search Handler]
        D --> E[Query Embedding]
        E -->|Ollama API| F[nomic-embed-text]
        F -->|Vector| E
        D --> G[Hybrid Search Engine]
        G --> H[Semantic Search]
        G --> I[Keyword Search]
        H --> J[Cosine Similarity]
        I --> K[TF-IDF Scoring]
        J --> L[Reciprocal Rank Fusion]
        K --> L
        L --> M[Adaptive Filtering]
        M --> N[Top-K Results]
    end

    subgraph "Storage & Indexing"
        O[File System] -->|Watch| P[Indexer]
        P --> Q[Universal AST Parser]
        Q --> R[Language Profiles]
        R --> S[Code Chunks]
        S --> T[Batch Embeddings]
        T -->|Ollama| F
        T --> U[SQLite DB]
        U --> V[Vector Store]
        V --> H
        V --> I
    end

    N -->|JSON Response| B
    B -->|Formatted Results| A

    style A fill:#e1f5ff
    style G fill:#fff4e6
    style U fill:#f3e5f5
    style F fill:#e8f5e9
```
```mermaid
sequenceDiagram
    participant User
    participant WebUI as Open WebUI
    participant Tool as Python Tool
    participant Server as Go Server
    participant Ollama
    participant DB as SQLite + Vectors

    Note over User,DB: Initial Setup
    User->>Server: ./refinery serve /project
    Server->>DB: Load existing index
    Server->>Ollama: Health check
    Server->>Server: Scan files (.go, .py, .rs, etc.)

    loop For each file
        Server->>Server: Parse with Universal AST
        Server->>Server: Create code chunks
        Server->>Ollama: Generate embeddings
        Ollama-->>Server: Vector embeddings
        Server->>DB: Store chunks + vectors
    end

    Server-->>User: Server ready on :8080

    Note over User,DB: Search Flow
    User->>WebUI: "How is auth implemented?"
    WebUI->>Tool: search_codebase("authentication")
    Tool->>Server: POST /search
    Server->>Ollama: Embed query
    Ollama-->>Server: Query vector
    Server->>DB: Retrieve all chunks

    par Semantic Search
        Server->>Server: Cosine similarity
    and Keyword Search
        Server->>Server: TF-IDF + phrase match
    end

    Server->>Server: Reciprocal Rank Fusion
    Server->>Server: Adaptive filtering
    Server-->>Tool: Top-K results + timing
    Tool-->>WebUI: Formatted code snippets
    WebUI-->>User: Answer with citations
```
The Universal AST Parser is language-agnostic, supporting 50+ programming languages through intelligent heuristics:
Supported Language Families:
- C-Family: C, C++, Rust, Go, Java, C#, JavaScript, TypeScript, Kotlin, Swift, Scala, PHP
- Python-Family: Python (indentation-based)
- Ruby-Family: Ruby, Crystal (begin/end blocks)
- Lisp-Family: Lisp, Scheme, Clojure (S-expressions)
- ML-Family: OCaml, F#, Haskell, Elm (functional)
- Shell: Bash, Zsh, Fish
- Assembly: x86, ARM, MIPS
- SQL: Stored Procedures and Queries
- Lua: Gaming and scripting
Parsing Strategies:
- Block-based: Tracks `{}`, `begin/end`, `do/end` delimiters
- Indentation-based: Python, YAML syntax awareness
- Generic: Fallback for unknown languages
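As a rough illustration of the block-based strategy, the sketch below tracks `{}` nesting depth to find top-level block boundaries. It is a simplified, standalone example (the actual parser also handles `begin/end`-style delimiters, strings, and comments):

```go
// Sketch of block-based boundary detection. Illustrative only:
// the real parser also handles strings, comments, and non-brace delimiters.
package main

import "fmt"

// topLevelBlocks returns the byte offsets of top-level {...} blocks.
func topLevelBlocks(src string) [][2]int {
    var blocks [][2]int
    depth, start := 0, -1
    for i, r := range src {
        switch r {
        case '{':
            if depth == 0 {
                start = i // opening brace of a new top-level block
            }
            depth++
        case '}':
            depth--
            if depth == 0 && start >= 0 {
                blocks = append(blocks, [2]int{start, i + 1})
                start = -1
            }
        }
    }
    return blocks
}

func main() {
    src := "func a() { x := 1 }\nfunc b() { y := 2 }"
    fmt.Println(topLevelBlocks(src)) // [[9 19] [29 39]]
}
```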
Reciprocal Rank Fusion (RRF):
RRF Score = Σ(1 / (k + rank_i))
where k = 60 (the standard value established in the RRF literature), combining the semantic and keyword rankings.
Advantages over weighted addition:
- Scale-invariant (no manual weight tuning)
- Robust against score inflation
- Industry standard (used by Elasticsearch, OpenSearch)
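As a concrete illustration, here is a minimal RRF fusion sketch; the function name and types are assumptions for this example, not the project's actual code:

```go
// rrfFuse combines two ranked lists of chunk IDs into one score map using
// Reciprocal Rank Fusion. With k = 60, a chunk ranked first in both lists
// scores 2/61 ≈ 0.0328; ranks, not raw scores, drive the result.
package main

import "fmt"

func rrfFuse(semantic, keyword []string, k float64) map[string]float64 {
    scores := make(map[string]float64)
    for rank, id := range semantic {
        scores[id] += 1.0 / (k + float64(rank+1)) // ranks are 1-based in the formula
    }
    for rank, id := range keyword {
        scores[id] += 1.0 / (k + float64(rank+1))
    }
    return scores
}

func main() {
    semantic := []string{"chunkA", "chunkB", "chunkC"}
    keyword := []string{"chunkB", "chunkA", "chunkD"}
    fmt.Println(rrfFuse(semantic, keyword, 60))
    // chunkA: 1/61 + 1/62, chunkB: 1/62 + 1/61, chunkC: 1/63, chunkD: 1/63
}
```

Because only ranks enter the sum, a runaway cosine score in one list cannot dominate the fused ordering, which is what makes RRF scale-invariant.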
Keyword Scoring Features:
- Exact phrase matching (highest priority)
- TF-IDF inspired term weighting
- Stopword filtering (removes "the", "is", "get", etc.)
- Term length weighting (longer = more specific)
- Path and signature boosting
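A simplified sketch of the phrase, stopword, and term-length heuristics (path/signature boosting and the full TF-IDF weighting are omitted; the weights and stopword list here are illustrative assumptions):

```go
// keywordScore sketches three of the described heuristics: an exact-phrase
// bonus, stopword filtering, and longer-term weighting. Illustrative only.
package main

import (
    "fmt"
    "strings"
)

var stopwords = map[string]bool{"the": true, "is": true, "get": true, "a": true}

func keywordScore(query, chunk string) float64 {
    q, c := strings.ToLower(query), strings.ToLower(chunk)
    score := 0.0
    if strings.Contains(c, q) {
        score += 5.0 // exact phrase match gets the highest priority
    }
    for _, term := range strings.Fields(q) {
        if stopwords[term] {
            continue // skip low-signal words
        }
        if strings.Contains(c, term) {
            score += 1.0 + 0.1*float64(len(term)) // longer terms are more specific
        }
    }
    return score
}

func main() {
    fmt.Println(keywordScore("database connection pool",
        "func newConnectionPool(db *sql.DB) { ... }")) // 3.4
}
```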
Semantic Scoring:
- 768-dimensional embeddings (nomic-embed-text)
- Cosine similarity for relevance
- Context-aware understanding
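The relevance measure itself is ordinary cosine similarity over the embedding vectors, as in this standalone sketch (the project's own implementation presumably lives in `pkg/mathutil/vector.go`):

```go
// cosine computes the cosine similarity between two embedding vectors,
// e.g. the 768-dimensional outputs of nomic-embed-text.
package main

import (
    "fmt"
    "math"
)

func cosine(a, b []float64) float64 {
    var dot, na, nb float64
    for i := range a {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    if na == 0 || nb == 0 {
        return 0 // zero vector has no direction
    }
    return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
    fmt.Println(cosine([]float64{1, 0, 1}, []float64{1, 1, 0})) // 0.5
}
```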
Multi-stage filtering for optimal result quality:
- Hard limit check (user-specified max results)
- Absolute minimum score threshold
- Relative quality filter (40% of top score)
- Elbow detection (50% score drop = thematic break)
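Assuming results are sorted by descending score, the four stages can be combined in a single pass, as in this illustrative sketch (thresholds mirror the documented values; all names are assumptions):

```go
// adaptiveFilter sketches the multi-stage filtering described above.
// Input scores must be sorted in descending order.
package main

import "fmt"

func adaptiveFilter(scores []float64, maxResults int, minScore float64) []float64 {
    var out []float64
    for i, s := range scores {
        if len(out) >= maxResults { // hard limit check
            break
        }
        if s < minScore { // absolute minimum score threshold
            break
        }
        if len(out) > 0 {
            if s < 0.4*scores[0] { // relative quality: 40% of top score
                break
            }
            if s < 0.5*scores[i-1] { // elbow: 50% drop signals a thematic break
                break
            }
        }
        out = append(out, s)
    }
    return out
}

func main() {
    fmt.Println(adaptiveFilter([]float64{0.92, 0.88, 0.41, 0.12}, 5, 0.2))
    // [0.92 0.88] — 0.41 trips the 50%-drop elbow check against 0.88
}
```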
System Requirements:
- Go 1.21 or higher
- Ollama running locally
- Open WebUI installed (Docker or native)
- Minimum 4GB free RAM
Ollama Models:
```bash
# Embedding model (274MB)
ollama pull nomic-embed-text

# Your main LLM (optional)
ollama pull deepseek-r1:14b
```

- Clone and initialize:
```bash
git clone https://github.com/marw-dev/coderefinery.git
cd coderefinery
go mod init coderefinery
```

- Install dependencies:
```bash
go get github.com/gin-gonic/gin
go get github.com/fsnotify/fsnotify
go get github.com/smacker/go-tree-sitter
go get github.com/mattn/go-sqlite3
```

- Build:
```bash
# Standard build
go build -o refinery cmd/refinery/main.go

# Optimized build (smaller binary)
go build -ldflags="-s -w" -o refinery cmd/refinery/main.go
```

Project structure:

```
coderefinery/
├── cmd/
│ └── refinery/
│ └── main.go
├── internal/
│ ├── config/
│ │ └── config.go
│ ├── domain/
│ │ └── models.go
│ ├── embedding/
│ │ ├── embedder.go
│ │ └── ollama.go
│ ├── indexer/
│ │ ├── indexer.go
│ │ ├── db.go
│ │ └── parser/
│ │ ├── parser.go
│ │ ├── universal.go
│ │ ├── treesitter.go
│ │ └── go.go
│ ├── search/
│ │ └── searcher.go
│ └── server/
│ ├── server.go
│ └── handlers.go
├── pkg/
│ └── mathutil/
│ └── vector.go
├── go.mod
├── go.sum
└── README.md
```
Default configuration in `internal/config/config.go`:

```go
type Config struct {
    Server  ServerConfig
    Ollama  OllamaConfig
    Indexer IndexerConfig
}

type ServerConfig struct {
    Port           string        // "8080"
    ReadTimeout    time.Duration // 10s
    WriteTimeout   time.Duration // 10s
    MaxRequestSize int64         // 10MB
    EnableCORS     bool          // true
}

type OllamaConfig struct {
    BaseURL string        // "http://localhost:11434"
    Model   string        // "nomic-embed-text"
    Timeout time.Duration // 30s
}

type IndexerConfig struct {
    SupportedExts map[string]string
    ExcludePaths  []string
    MinChunkSize  int           // 50
    MaxChunkSize  int           // 2000
    WatchDebounce time.Duration // 2s
}
```

Create `config.json`:

```json
{
  "project_path": "/path/to/project",
  "server": {
    "port": "8080"
  },
  "ollama": {
    "base_url": "http://localhost:11434",
    "model": "nomic-embed-text"
  },
  "indexer": {
    "supported_extensions": {
      ".go": "go",
      ".py": "python",
      ".rs": "rust"
    },
    "exclude_paths": [
      "node_modules",
      ".git",
      "vendor"
    ]
  }
}
```

Load with:

```bash
./refinery serve --config config.json /path/to/project
```

```bash
# Index current directory
./refinery serve

# Index specific project
./refinery serve /path/to/project

# With custom config
./refinery serve --config config.json --port 9000 /path/to/project
```

Expected Output:

```
Loading chunks from database...
Loaded 0 chunks from history
Scanning for files (Universal Mode)...
Loaded 15 patterns from .gitignore
Re-indexing 247 files (Dynamic Language Detection)...
Progress: 50/247
Progress: 100/247
...
Indexed 1,847 chunks from 247 files in 18.4s
File watcher enabled
Server running on http://localhost:8080
```

Health Check:

```bash
curl http://localhost:8080/health
```

Response:

```json
{
  "status": "ok",
  "chunks": 1847,
  "files": 247
}
```

Statistics:

```bash
curl http://localhost:8080/stats
```

Response:

```json
{
  "TotalChunks": 1847,
  "TotalFiles": 247,
  "Languages": {
    "go": 1245,
    "python": 312,
    "rust": 290
  },
  "LastIndexed": "2025-01-04T15:23:45Z"
}
```

Search:

```bash
curl -X POST http://localhost:8080/search \
-H "Content-Type: application/json" \
-d '{
"query": "database connection pool",
"limit": 5,
"min_score": 0.3,
"languages": ["go"],
"chunk_types": ["function"]
}'-
Open WebUI Settings → Functions → Add Function
-
Copy Python tool code (see artifacts)
-
Configure Valves:
- Docker:
REFINERY_URL = "http://host.docker.internal:8080" - Native:
REFINERY_URL = "http://localhost:8080" - Linux Docker:
REFINERY_URL = "http://172.17.0.1:8080"
- Docker:
-
Enable function and save
-
In chat, enable CodeRefinery tool
Basic Search:
"How is user authentication implemented?"
Filtered Search:
"Show me database queries in Go"
Architecture Questions:
"Explain the error handling strategy"
Code Location:
"Where is the JWT token validation?"
| Component | VRAM Usage |
|---|---|
| CodeRefinery Server | 0 MB (uses RAM) |
| nomic-embed-text | 274 MB |
| DeepSeek-R1 14B | 11.2 GB |
| Context Buffer | ~4 GB |
| Total | ~15.5 GB |
Reduce Chunk Size:
```go
// internal/config/config.go
IndexerConfig{
    MinChunkSize: 30,   // from 50
    MaxChunkSize: 1500, // from 2000
}
```

Limit Results:
```python
# Open WebUI Tool
class Valves:
    DEFAULT_LIMIT = 3  # from 5
    MAX_LIMIT = 10     # from 15
```

Batch Size Tuning:
```go
// internal/embedding/ollama.go
maxConcurrency := 3  // from 5 (less memory, slower)
maxConcurrency := 10 // from 5 (more memory, faster)
```

Add custom language support in `internal/indexer/parser/universal.go`:
profiles["your-lang"] = LanguageProfile{
BlockStart: []string{"begin"},
BlockEnd: []string{"end"},
FunctionPatterns: []*regexp.Regexp{
regexp.MustCompile(`(?m)^\s*function\s+\w+`),
},
LineComment: []string{"//"},
}Extend internal/domain/models.go:
const (
    ChunkTypeFunction  ChunkType = "function"
    ChunkTypeClass     ChunkType = "class"
    ChunkTypeInterface ChunkType = "interface"
    ChunkTypeStruct    ChunkType = "struct"
    ChunkTypeEnum      ChunkType = "enum"     // NEW
    ChunkTypeConstant  ChunkType = "constant" // NEW
    ChunkTypeGeneric   ChunkType = "generic"
)
```

Extend the server to handle multiple codebases:

```go
type RepositoryManager struct {
    indices map[string]*indexer.Indexer
    mu      sync.RWMutex
}

func (rm *RepositoryManager) Search(repo string, query string) []SearchResult {
    rm.mu.RLock()
    idx, ok := rm.indices[repo]
    rm.mu.RUnlock()
    if !ok {
        return nil // unknown repository
    }
    return idx.Search(query)
}
```

API Usage:

```bash
curl -X POST http://localhost:8080/search?repo=backend \
-d '{"query": "authentication"}'Store embeddings on disk to speed up restarts:
// internal/indexer/db.go
func (db *DB) SaveEmbeddings() error {
    // Already implemented via SQLite
    return nil
}

func (db *DB) LoadEmbeddings() (map[string][]CodeChunk, error) {
    return db.LoadAllChunks()
}
```

Problem: "Cannot connect to CodeRefinery server"
Diagnosis:

```bash
# Check if server is running
ps aux | grep refinery
# Check port availability
netstat -tuln | grep 8080
# Check server logs
./refinery serve  # run in foreground
```

Solutions:
- Start server: `./refinery serve`
- Check firewall: `sudo ufw allow 8080`
- Verify that the URL in the Open WebUI tool matches the server address
Problem: "Indexed 0 chunks"
Diagnosis:

```bash
# Check for supported files
find . -name "*.go" -o -name "*.py" -o -name "*.rs" | head -20
# Check excludes
grep -r "node_modules\|.git\|vendor" .gitignore
```

Solutions:
- Add file extensions to config
- Remove unnecessary excludes
- Check file permissions: `ls -la`
Problem: "Failed to generate embeddings"
Diagnosis:

```bash
# Check Ollama status
ollama list
# Test embedding directly
curl http://localhost:11434/api/embeddings \
-d '{"model":"nomic-embed-text","prompt":"test"}'Solutions:
- Pull model: `ollama pull nomic-embed-text`
- Restart Ollama: `systemctl restart ollama`
- Check Ollama logs: `journalctl -u ollama -f`
Problem: "host.docker.internal" not resolving
Linux Solution:

```
# Use the bridge network IP
REFINERY_URL = "http://172.17.0.1:8080"
```

Alternative: use host networking

```bash
docker run --network host open-webui/open-webui
```

Problem: Search takes over 1 second
Diagnosis:
- Check chunk count: `curl http://localhost:8080/stats`
- Monitor VRAM: `nvidia-smi`
- Check Ollama response time
Solutions:
- Reduce indexed files (add excludes)
- Increase batch size for embeddings
- Try a different embedding model (e.g., mxbai-embed-large)
- Add indexes to SQLite DB
Effective Queries:
- "Show JWT token validation in auth module"
- "How is database connection pooling implemented?"
- "Find error handling in API handlers"
Ineffective Queries:
- "Show me code" (too vague)
- "Everything about users" (too broad)
- "Fix this" (no search context)
Start broad, then narrow:
User: "How does logging work?"
→ search_codebase("logging system")
User: "Focus on error logs"
→ search_codebase("error logging implementation")
User: "Show the log rotation logic"
→ search_codebase("log rotation", path_filter="logging/")
Combine filters for precision:

```json
{
  "query": "database queries",
  "languages": ["go"],
  "path_filter": "internal/db",
  "chunk_types": ["function", "method"]
}
```

Track system health:

```bash
# Watch VRAM usage
watch -n 1 'nvidia-smi --query-gpu=memory.used --format=csv'
# Monitor search latency
curl -w "@curl-format.txt" -X POST http://localhost:8080/search
# Check index freshness
curl http://localhost:8080/stats | jq '.LastIndexed'
```

- Firewall Configuration:

```bash
# Allow only local connections
sudo ufw deny 8080
sudo ufw allow from 127.0.0.1 to any port 8080
```

- Reverse Proxy (Production):

```nginx
server {
    listen 443 ssl;
    server_name coderefinery.internal;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
    }
}
```

- Code embeddings are stored locally in SQLite
- No data sent to external services
- Ollama runs entirely on-premises
- All processing happens in local RAM/VRAM
Implement authentication in production:

```go
// internal/server/middleware.go
func AuthMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        token := c.GetHeader("Authorization")
        if !validateToken(token) {
            c.AbortWithStatus(401)
            return
        }
        c.Next()
    }
}
```

- Fork repository
- Create feature branch: `git checkout -b feature/new-parser`
- Install development tools:

```bash
go install golang.org/x/tools/cmd/goimports@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
```

Run before committing:

```bash
# Format code
# Format code
gofmt -w .
goimports -w .
# Lint
golangci-lint run
# Test
go test ./...
```

- Add to `LanguageProfile` in `universal.go`
- Test with sample code
- Update documentation
- Submit PR with examples
MIT License - See LICENSE file for details
- Tree-sitter for AST parsing
- Ollama for local embeddings
- Gin framework for HTTP server
- Open WebUI for LLM interface
- GitHub Issues: Report bugs and feature requests
- Documentation: This README and inline code comments
- Health endpoint: Monitor server status at `/health`