High-performance RAG system for code analysis with hybrid semantic and keyword search, optimized for 16GB VRAM systems.
```mermaid
graph TB
    subgraph "Client Layer"
        A[Open WebUI] -->|HTTP/JSON| B[Python Tool]
    end

    subgraph "CodeRefinery Server - Go"
        B -->|POST /search| C[Gin Router]
        C --> D[Search Handler]
        D --> E[Query Embedding]
        E -->|Ollama API| F[nomic-embed-text]
        F -->|Vector| E
        D --> G[Hybrid Search Engine]
        G --> H[Semantic Search]
        G --> I[Keyword Search]
        H --> J[Cosine Similarity]
        I --> K[TF-IDF Scoring]
        J --> L[Reciprocal Rank Fusion]
        K --> L
        L --> M[Adaptive Filtering]
        M --> N[Top-K Results]
    end

    subgraph "Storage & Indexing"
        O[File System] -->|Watch| P[Indexer]
        P --> Q[Universal AST Parser]
        Q --> R[Language Profiles]
        R --> S[Code Chunks]
        S --> T[Batch Embeddings]
        T -->|Ollama| F
        T --> U[SQLite DB]
        U --> V[Vector Store]
        V --> H
        V --> I
    end

    N -->|JSON Response| B
    B -->|Formatted Results| A

    style A fill:#e1f5ff
    style G fill:#fff4e6
    style U fill:#f3e5f5
    style F fill:#e8f5e9
```
```mermaid
sequenceDiagram
    participant User
    participant WebUI as Open WebUI
    participant Tool as Python Tool
    participant Server as Go Server
    participant Ollama
    participant DB as SQLite + Vectors

    Note over User,DB: Initial Setup
    User->>Server: ./refinery serve /project
    Server->>DB: Load existing index
    Server->>Ollama: Health check
    Server->>Server: Scan files (.go, .py, .rs, etc.)

    loop For each file
        Server->>Server: Parse with Universal AST
        Server->>Server: Create code chunks
        Server->>Ollama: Generate embeddings
        Ollama-->>Server: Vector embeddings
        Server->>DB: Store chunks + vectors
    end

    Server-->>User: Server ready on :8080

    Note over User,DB: Search Flow
    User->>WebUI: "How is auth implemented?"
    WebUI->>Tool: search_codebase("authentication")
    Tool->>Server: POST /search
    Server->>Ollama: Embed query
    Ollama-->>Server: Query vector
    Server->>DB: Retrieve all chunks

    par Semantic Search
        Server->>Server: Cosine similarity
    and Keyword Search
        Server->>Server: TF-IDF + phrase match
    end

    Server->>Server: Reciprocal Rank Fusion
    Server->>Server: Adaptive filtering
    Server-->>Tool: Top-K results + timing
    Tool-->>WebUI: Formatted code snippets
    WebUI-->>User: Answer with citations
```
The Universal AST Parser is language-agnostic, supporting 50+ programming languages through intelligent heuristics:
Supported Language Families:
- C-Family: C, C++, Rust, Go, Java, C#, JavaScript, TypeScript, Kotlin, Swift, Scala, PHP
- Python-Family: Python (indentation-based)
- Ruby-Family: Ruby, Crystal (begin/end blocks)
- Lisp-Family: Lisp, Scheme, Clojure (S-expressions)
- ML-Family: OCaml, F#, Haskell, Elm (functional)
- Shell: Bash, Zsh, Fish
- Assembly: x86, ARM, MIPS
- SQL: Stored Procedures and Queries
- Lua: Gaming and scripting
Parsing Strategies:
- Block-based: Tracks `{}`, `begin/end`, `do/end` delimiters
- Indentation-based: Python, YAML syntax awareness
- Generic: Fallback for unknown languages
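As a rough illustration of the block-based strategy, the sketch below tracks `{}` nesting depth to find top-level block boundaries. It is a simplified, standalone example (the actual parser also handles `begin/end`-style delimiters, strings, and comments):

```go
// Sketch of block-based boundary detection. Illustrative only:
// the real parser also handles strings, comments, and non-brace delimiters.
package main

import "fmt"

// topLevelBlocks returns the byte offsets of top-level {...} blocks.
func topLevelBlocks(src string) [][2]int {
    var blocks [][2]int
    depth, start := 0, -1
    for i, r := range src {
        switch r {
        case '{':
            if depth == 0 {
                start = i // opening brace of a new top-level block
            }
            depth++
        case '}':
            depth--
            if depth == 0 && start >= 0 {
                blocks = append(blocks, [2]int{start, i + 1})
                start = -1
            }
        }
    }
    return blocks
}

func main() {
    src := "func a() { x := 1 }\nfunc b() { y := 2 }"
    fmt.Println(topLevelBlocks(src)) // [[9 19] [29 39]]
}
```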
Reciprocal Rank Fusion (RRF):
RRF Score = Σ(1 / (k + rank_i))
where k = 60 (the standard value established in the RRF literature), combining the semantic and keyword rankings.
Advantages over weighted addition:
- Scale-invariant (no manual weight tuning)
- Robust against score inflation
- Industry standard (used by Elasticsearch, OpenSearch)
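As a concrete illustration, here is a minimal RRF fusion sketch; the function name and types are assumptions for this example, not the project's actual code:

```go
// rrfFuse combines two ranked lists of chunk IDs into one score map using
// Reciprocal Rank Fusion. With k = 60, a chunk ranked first in both lists
// scores 2/61 ≈ 0.0328; ranks, not raw scores, drive the result.
package main

import "fmt"

func rrfFuse(semantic, keyword []string, k float64) map[string]float64 {
    scores := make(map[string]float64)
    for rank, id := range semantic {
        scores[id] += 1.0 / (k + float64(rank+1)) // ranks are 1-based in the formula
    }
    for rank, id := range keyword {
        scores[id] += 1.0 / (k + float64(rank+1))
    }
    return scores
}

func main() {
    semantic := []string{"chunkA", "chunkB", "chunkC"}
    keyword := []string{"chunkB", "chunkA", "chunkD"}
    fmt.Println(rrfFuse(semantic, keyword, 60))
    // chunkA: 1/61 + 1/62, chunkB: 1/62 + 1/61, chunkC: 1/63, chunkD: 1/63
}
```

Because only ranks enter the sum, a runaway cosine score in one list cannot dominate the fused ordering, which is what makes RRF scale-invariant.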
Keyword Scoring Features:
- Exact phrase matching (highest priority)
- TF-IDF inspired term weighting
- Stopword filtering (removes "the", "is", "get", etc.)
- Term length weighting (longer = more specific)
- Path and signature boosting
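A simplified sketch of the phrase, stopword, and term-length heuristics (path/signature boosting and the full TF-IDF weighting are omitted; the weights and stopword list here are illustrative assumptions):

```go
// keywordScore sketches three of the described heuristics: an exact-phrase
// bonus, stopword filtering, and longer-term weighting. Illustrative only.
package main

import (
    "fmt"
    "strings"
)

var stopwords = map[string]bool{"the": true, "is": true, "get": true, "a": true}

func keywordScore(query, chunk string) float64 {
    q, c := strings.ToLower(query), strings.ToLower(chunk)
    score := 0.0
    if strings.Contains(c, q) {
        score += 5.0 // exact phrase match gets the highest priority
    }
    for _, term := range strings.Fields(q) {
        if stopwords[term] {
            continue // skip low-signal words
        }
        if strings.Contains(c, term) {
            score += 1.0 + 0.1*float64(len(term)) // longer terms are more specific
        }
    }
    return score
}

func main() {
    fmt.Println(keywordScore("database connection pool",
        "func newConnectionPool(db *sql.DB) { ... }")) // 3.4
}
```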
Semantic Scoring:
- 768-dimensional embeddings (nomic-embed-text)
- Cosine similarity for relevance
- Context-aware understanding
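The relevance measure itself is ordinary cosine similarity over the embedding vectors, as in this standalone sketch (the project's own implementation presumably lives in `pkg/mathutil/vector.go`):

```go
// cosine computes the cosine similarity between two embedding vectors,
// e.g. the 768-dimensional outputs of nomic-embed-text.
package main

import (
    "fmt"
    "math"
)

func cosine(a, b []float64) float64 {
    var dot, na, nb float64
    for i := range a {
        dot += a[i] * b[i]
        na += a[i] * a[i]
        nb += b[i] * b[i]
    }
    if na == 0 || nb == 0 {
        return 0 // zero vector has no direction
    }
    return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

func main() {
    fmt.Println(cosine([]float64{1, 0, 1}, []float64{1, 1, 0})) // 0.5
}
```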
Multi-stage filtering for optimal result quality:
- Hard limit check (user-specified max results)
- Absolute minimum score threshold
- Relative quality filter (40% of top score)
- Elbow detection (50% score drop = thematic break)
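Assuming results are sorted by descending score, the four stages can be combined in a single pass, as in this illustrative sketch (thresholds mirror the documented values; all names are assumptions):

```go
// adaptiveFilter sketches the multi-stage filtering described above.
// Input scores must be sorted in descending order.
package main

import "fmt"

func adaptiveFilter(scores []float64, maxResults int, minScore float64) []float64 {
    var out []float64
    for i, s := range scores {
        if len(out) >= maxResults { // hard limit check
            break
        }
        if s < minScore { // absolute minimum score threshold
            break
        }
        if len(out) > 0 {
            if s < 0.4*scores[0] { // relative quality: 40% of top score
                break
            }
            if s < 0.5*scores[i-1] { // elbow: 50% drop signals a thematic break
                break
            }
        }
        out = append(out, s)
    }
    return out
}

func main() {
    fmt.Println(adaptiveFilter([]float64{0.92, 0.88, 0.41, 0.12}, 5, 0.2))
    // [0.92 0.88] — 0.41 trips the 50%-drop elbow check against 0.88
}
```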
System Requirements:
- Go 1.21 or higher
- Ollama running locally
- Open WebUI installed (Docker or native)
- Minimum 4GB free RAM
Ollama Models:
```bash
# Embedding model (274MB)
ollama pull nomic-embed-text

# Your main LLM (optional)
ollama pull deepseek-r1:14b
```

- Clone and initialize:
```bash
git clone https://github.com/marw-dev/coderefinery.git
cd coderefinery
go mod init coderefinery
```

- Install dependencies:
```bash
go get github.com/gin-gonic/gin
go get github.com/fsnotify/fsnotify
go get github.com/smacker/go-tree-sitter
go get github.com/mattn/go-sqlite3
```

- Build:
```bash
# Standard build
go build -o refinery cmd/refinery/main.go

# Optimized build (smaller binary)
go build -ldflags="-s -w" -o refinery cmd/refinery/main.go
```

Project structure:

```
coderefinery/
├── cmd/
│ └── refinery/
│ └── main.go
├── internal/
│ ├── config/
│ │ └── config.go
│ ├── domain/
│ │ └── models.go
│ ├── embedding/
│ │ ├── embedder.go
│ │ └── ollama.go
│ ├── indexer/
│ │ ├── indexer.go
│ │ ├── db.go
│ │ └── parser/
│ │ ├── parser.go
│ │ ├── universal.go
│ │ ├── treesitter.go
│ │ └── go.go
│ ├── search/
│ │ └── searcher.go
│ └── server/
│ ├── server.go
│ └── handlers.go
├── pkg/
│ └── mathutil/
│ └── vector.go
├── go.mod
├── go.sum
└── README.md
```
Default configuration in `internal/config/config.go`:

```go
type Config struct {
    Server  ServerConfig
    Ollama  OllamaConfig
    Indexer IndexerConfig
}

type ServerConfig struct {
    Port           string        // "8080"
    ReadTimeout    time.Duration // 10s
    WriteTimeout   time.Duration // 10s
    MaxRequestSize int64         // 10MB
    EnableCORS     bool          // true
}

type OllamaConfig struct {
    BaseURL string        // "http://localhost:11434"
    Model   string        // "nomic-embed-text"
    Timeout time.Duration // 30s
}

type IndexerConfig struct {
    SupportedExts map[string]string
    ExcludePaths  []string
    MinChunkSize  int           // 50
    MaxChunkSize  int           // 2000
    WatchDebounce time.Duration // 2s
}
```

Create `config.json`:

```json
{
  "project_path": "/path/to/project",
  "server": {
    "port": "8080"
  },
  "ollama": {
    "base_url": "http://localhost:11434",
    "model": "nomic-embed-text"
  },
  "indexer": {
    "supported_extensions": {
      ".go": "go",
      ".py": "python",
      ".rs": "rust"
    },
    "exclude_paths": [
      "node_modules",
      ".git",
      "vendor"
    ]
  }
}
```

Load with:

```bash
./refinery serve --config config.json /path/to/project
```

```bash
# Index current directory
./refinery serve

# Index specific project
./refinery serve /path/to/project

# With custom config
./refinery serve --config config.json --port 9000 /path/to/project
```

Expected Output:

```
Loading chunks from database...
Loaded 0 chunks from history
Scanning for files (Universal Mode)...
Loaded 15 patterns from .gitignore
Re-indexing 247 files (Dynamic Language Detection)...
Progress: 50/247
Progress: 100/247
...
Indexed 1,847 chunks from 247 files in 18.4s
File watcher enabled
Server running on http://localhost:8080
```

Health Check:

```bash
curl http://localhost:8080/health
```

Response:

```json
{
  "status": "ok",
  "chunks": 1847,
  "files": 247
}
```

Statistics:

```bash
curl http://localhost:8080/stats
```

Response:

```json
{
  "TotalChunks": 1847,
  "TotalFiles": 247,
  "Languages": {
    "go": 1245,
    "python": 312,
    "rust": 290
  },
  "LastIndexed": "2025-01-04T15:23:45Z"
}
```

Search:

```bash
curl -X POST http://localhost:8080/search \
-H "Content-Type: application/json" \
-d '{
"query": "database connection pool",
"limit": 5,
"min_score": 0.3,
"languages": ["go"],
"chunk_types": ["function"]
}'-
Open WebUI Settings → Functions → Add Function
-
Copy Python tool code (see artifacts)
-
Configure Valves:
- Docker:
REFINERY_URL = "http://host.docker.internal:8080" - Native:
REFINERY_URL = "http://localhost:8080" - Linux Docker:
REFINERY_URL = "http://172.17.0.1:8080"
- Docker:
-
Enable function and save
-
In chat, enable CodeRefinery tool
Basic Search:
"How is user authentication implemented?"
Filtered Search:
"Show me database queries in Go"
Architecture Questions:
"Explain the error handling strategy"
Code Location:
"Where is the JWT token validation?"
| Component | VRAM Usage |
|---|---|
| CodeRefinery Server | 0 MB (uses RAM) |
| nomic-embed-text | 274 MB |
| DeepSeek-R1 14B | 11.2 GB |
| Context Buffer | ~4 GB |
| Total | ~15.5 GB |
Reduce Chunk Size:
```go
// internal/config/config.go
IndexerConfig{
    MinChunkSize: 30,   // from 50
    MaxChunkSize: 1500, // from 2000
}
```

Limit Results:
```python
# Open WebUI Tool
class Valves:
    DEFAULT_LIMIT = 3  # from 5
    MAX_LIMIT = 10     # from 15
```

Batch Size Tuning:
```go
// internal/embedding/ollama.go
maxConcurrency := 3  // from 5 (less memory, slower)
maxConcurrency := 10 // from 5 (more memory, faster)
```

Add custom language support in `internal/indexer/parser/universal.go`:
profiles["your-lang"] = LanguageProfile{
BlockStart: []string{"begin"},
BlockEnd: []string{"end"},
FunctionPatterns: []*regexp.Regexp{
regexp.MustCompile(`(?m)^\s*function\s+\w+`),
},
LineComment: []string{"//"},
}Extend internal/domain/models.go:
const (
    ChunkTypeFunction  ChunkType = "function"
    ChunkTypeClass     ChunkType = "class"
    ChunkTypeInterface ChunkType = "interface"
    ChunkTypeStruct    ChunkType = "struct"
    ChunkTypeEnum      ChunkType = "enum"     // NEW
    ChunkTypeConstant  ChunkType = "constant" // NEW
    ChunkTypeGeneric   ChunkType = "generic"
)
```

Extend the server to handle multiple codebases:

```go
type RepositoryManager struct {
    indices map[string]*indexer.Indexer
    mu      sync.RWMutex
}

func (rm *RepositoryManager) Search(repo string, query string) []SearchResult {
    rm.mu.RLock()
    idx, ok := rm.indices[repo]
    rm.mu.RUnlock()
    if !ok {
        return nil // unknown repository
    }
    return idx.Search(query)
}
```

API Usage:

```bash
curl -X POST http://localhost:8080/search?repo=backend \
-d '{"query": "authentication"}'Store embeddings on disk to speed up restarts:
// internal/indexer/db.go
func (db *DB) SaveEmbeddings() error {
    // Already implemented via SQLite
    return nil
}

func (db *DB) LoadEmbeddings() (map[string][]CodeChunk, error) {
    return db.LoadAllChunks()
}
```

Problem: "Cannot connect to CodeRefinery server"
Diagnosis:

```bash
# Check if server is running
ps aux | grep refinery
# Check port availability
netstat -tuln | grep 8080
# Check server logs
./refinery serve  # run in foreground
```

Solutions:
- Start server: `./refinery serve`
- Check firewall: `sudo ufw allow 8080`
- Verify that the URL in the Open WebUI tool matches the server address
Problem: "Indexed 0 chunks"
Diagnosis:

```bash
# Check for supported files
find . -name "*.go" -o -name "*.py" -o -name "*.rs" | head -20
# Check excludes
grep -r "node_modules\|.git\|vendor" .gitignore
```

Solutions:
- Add file extensions to config
- Remove unnecessary excludes
- Check file permissions: `ls -la`
Problem: "Failed to generate embeddings"
Diagnosis:

```bash
# Check Ollama status
ollama list
# Test embedding directly
curl http://localhost:11434/api/embeddings \
-d '{"model":"nomic-embed-text","prompt":"test"}'Solutions:
- Pull model: `ollama pull nomic-embed-text`
- Restart Ollama: `systemctl restart ollama`
- Check Ollama logs: `journalctl -u ollama -f`
Problem: "host.docker.internal" not resolving
Linux Solution:

```
# Use the bridge network IP
REFINERY_URL = "http://172.17.0.1:8080"
```

Alternative: use host networking

```bash
docker run --network host open-webui/open-webui
```

Problem: Search takes over 1 second
Diagnosis:
- Check chunk count: `curl http://localhost:8080/stats`
- Monitor VRAM: `nvidia-smi`
- Check Ollama response time
Solutions:
- Reduce indexed files (add excludes)
- Increase batch size for embeddings
- Try a different embedding model (e.g., mxbai-embed-large)
- Add indexes to SQLite DB
Effective Queries:
- "Show JWT token validation in auth module"
- "How is database connection pooling implemented?"
- "Find error handling in API handlers"
Ineffective Queries:
- "Show me code" (too vague)
- "Everything about users" (too broad)
- "Fix this" (no search context)
Start broad, then narrow:
User: "How does logging work?"
→ search_codebase("logging system")
User: "Focus on error logs"
→ search_codebase("error logging implementation")
User: "Show the log rotation logic"
→ search_codebase("log rotation", path_filter="logging/")
Combine filters for precision:

```json
{
  "query": "database queries",
  "languages": ["go"],
  "path_filter": "internal/db",
  "chunk_types": ["function", "method"]
}
```

Track system health:

```bash
# Watch VRAM usage
watch -n 1 'nvidia-smi --query-gpu=memory.used --format=csv'
# Monitor search latency
curl -w "@curl-format.txt" -X POST http://localhost:8080/search
# Check index freshness
curl http://localhost:8080/stats | jq '.LastIndexed'
```

- Firewall Configuration:

```bash
# Allow only local connections
sudo ufw deny 8080
sudo ufw allow from 127.0.0.1 to any port 8080
```

- Reverse Proxy (Production):

```nginx
server {
    listen 443 ssl;
    server_name coderefinery.internal;

    location / {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
    }
}
```

- Code embeddings are stored locally in SQLite
- No data sent to external services
- Ollama runs entirely on-premises
- All processing happens in local RAM/VRAM
Implement authentication in production:

```go
// internal/server/middleware.go
func AuthMiddleware() gin.HandlerFunc {
    return func(c *gin.Context) {
        token := c.GetHeader("Authorization")
        if !validateToken(token) {
            c.AbortWithStatus(401)
            return
        }
        c.Next()
    }
}
```

- Fork repository
- Create feature branch: `git checkout -b feature/new-parser`
- Install development tools:

```bash
go install golang.org/x/tools/cmd/goimports@latest
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
```

Run before committing:

```bash
# Format code
# Format code
gofmt -w .
goimports -w .
# Lint
golangci-lint run
# Test
go test ./...
```

- Add to `LanguageProfile` in `universal.go`
- Test with sample code
- Update documentation
- Submit PR with examples
MIT License - See LICENSE file for details
- Tree-sitter for AST parsing
- Ollama for local embeddings
- Gin framework for HTTP server
- Open WebUI for LLM interface
- GitHub Issues: Report bugs and feature requests
- Documentation: This README and inline code comments
- Health endpoint: Monitor server status at `/health`