innomon/classifier-rag

classifier - RAG-based Topic Classifier

A Retrieval-Augmented Classification system implemented in Go using:

  • YZMA (hybridgroup/yzma) for local embedding generation via llama.cpp
  • PostgreSQL + pgvector as the vector database for similarity search
  • MCP (Model Context Protocol) for AI assistant integration
  • OpenAI-Compatible API for seamless integration with LLM frameworks

Use Case

Instead of fine-tuning a model for classification, this project uses a RAG approach to classify user queries into topic names. This allows for dynamic addition or modification of query-to-topic mappings without retraining.

For a given user query, the system creates an embedding, searches for the top match in the classifier table using cosine similarity, and returns the associated topic. If no match meets the threshold or the database is empty, it returns a configurable default_topic (default: "none").
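
The fallback logic above can be sketched in Go. This is a minimal illustration, not the project's actual code: the match struct, the threshold parameter, and the pickTopic helper are all assumptions made for the example.

```go
package main

import "fmt"

// match is a hypothetical result row from the similarity search.
type match struct {
	Topic string
	Score float64 // cosine similarity
}

// pickTopic returns the best match's topic, or defaultTopic when the
// search returned no rows or the best score is below the threshold.
func pickTopic(matches []match, threshold float64, defaultTopic string) (string, float64) {
	if len(matches) == 0 {
		return defaultTopic, 0
	}
	best := matches[0]
	for _, m := range matches[1:] {
		if m.Score > best.Score {
			best = m
		}
	}
	if best.Score < threshold {
		return defaultTopic, best.Score
	}
	return best.Topic, best.Score
}

func main() {
	topic, score := pickTopic([]match{{"billing", 0.91}, {"technical", 0.42}}, 0.5, "none")
	fmt.Printf("Classified Topic: %s (Score: %.4f)\n", topic, score)
}
```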

Features

  • Local embedding generation using any GGUF embedding model
  • Vector similarity search using PostgreSQL's pgvector and HNSW indexing
  • OpenAI-compatible /v1/chat/completions endpoint for classification
  • MCP server with stdio transport for integration with AI assistants (Claude, Amp, etc.)
  • JSONL-based export and import for easy data portability
  • Handcrafted CLI for efficient management of classification entries

Prerequisites

  • Go 1.25+
  • PostgreSQL with the pgvector extension installed
  • llama.cpp library — download or build, then set YZMA_LIB
  • GGUF embedding model — e.g., T5Gemma 2-270M (recommended; 640-dim vectors)

Installing llama.cpp

# Using yzma CLI to download
go install github.com/hybridgroup/yzma/cmd/yzma@latest
yzma lib install

# Set the library path
export YZMA_LIB=/path/to/libllama.so    # Linux
export YZMA_LIB=/path/to/libllama.dylib  # macOS

Build & Run

# Build the binary (no CGo required)
go build -o classifier .

# Run the classifier: serves both the OpenAI-compatible API and the MCP server (stdio)
./classifier serve

Usage

Add Classification Entries

./classifier add billing "How can I pay my bill?"
./classifier add billing "I have a question about my invoice"
./classifier add technical "My internet is not working"
./classifier add technical "I am experiencing slow speeds"

Classify a Query

./classifier query "I want to pay my monthly subscription"

Output:

Classified Topic: billing (Score: 0.9123)

Export & Import (JSONL)

# Export all entries to a directory (grouped by topic)
./classifier export ./backup

# Import entries from a directory
./classifier import ./data

OpenAI-Compatible API

The classifier provides an endpoint at http://localhost:8080/v1/chat/completions. It takes the last message's content as the query and returns the classified topic as the assistant's message.

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "classifier",
    "messages": [{"role": "user", "content": "My router is blinking red"}]
  }'

Configuration

classifier supports configuration via config.yaml, environment variables, and command-line flags.

Configuration File (config.yaml)

model: "./models/nomic-embed-text-v1.5.Q8_0.gguf"
lib_path: "/usr/local/lib/libllama.so"
database_url: "postgres://postgres@localhost:5432/postgres?sslmode=disable"
default_topic: "none"
context_size: 512
batch_size: 512
verbose: false
server:
  port: "8080"
  transport: "stdio"

Environment Variables

Variable                    Description                             Default
CLASSIFIER_MODEL            Path to GGUF embedding model
YZMA_LIB                    Path to llama.cpp library
CLASSIFIER_DATABASE_URL     PostgreSQL connection string            postgres://...
CLASSIFIER_DEFAULT_TOPIC    Topic returned when no match is found   none
CLASSIFIER_CONTEXT_SIZE     Context size for embeddings             512
CLASSIFIER_BATCH_SIZE       Batch size for processing               512
CLASSIFIER_VERBOSE          Enable verbose logging                  false

Project Structure

.
├── api.go           # OpenAI-compatible HTTP API
├── mcp_server.go    # MCP server implementation
├── rag.go           # RAG core (PostgreSQL + pgvector + YZMA)
├── main.go          # CLI entry point and flag parsing
├── config.go        # Configuration management
├── command.go       # Handcrafted command registry
├── cmd_add.go       # Add entry command
├── cmd_query.go     # Query/Classify command
├── cmd_export.go    # JSONL Export command
├── cmd_import.go    # JSONL Import command
└── cmd_serve.go     # Start API and MCP servers

How It Works

  1. Automatic Prefixing: To optimize retrieval for instruction-tuned models such as T5Gemma, the system prepends the classification: prefix to any text before generating its embedding (unless the prefix is already present). This steers the model toward classification-relevant semantic features while keeping the stored content clean.
  2. Embedding Generation: The text (with prefix) is tokenized and passed through the local GGUF model via YZMA/llama.cpp to produce a high-dimensional vector.
  3. Vector Storage: The original content, topic, and normalized embedding are stored in PostgreSQL.
  4. Similarity Search: For classification queries, the query text is embedded with the same prefix and compared against stored vectors using cosine similarity (1 - (embedding <=> $1)).
  5. Classification: The system returns the topic with the highest similarity score, or the default_topic if no matches are found.

License

MIT
