A Retrieval-Augmented Classification system implemented in Go using:
- YZMA (hybridgroup/yzma) for local embedding generation via llama.cpp
- PostgreSQL + pgvector as the vector database for similarity search
- MCP (Model Context Protocol) for AI assistant integration
- OpenAI-Compatible API for seamless integration with LLM frameworks
Instead of fine-tuning a model for classification, this project uses a RAG approach to classify user queries into topic names. This allows for dynamic addition or modification of query-to-topic mappings without retraining.
For a given user query, the system creates an embedding, searches for the top match in the classifier table using cosine similarity, and returns the associated topic. If no match meets the threshold or the database is empty, it returns a configurable default_topic (default: "none").
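The threshold-and-fallback logic described above can be sketched in plain Go. This is an illustrative model only: the `classify` helper, the in-memory entry map, and the threshold value are assumptions for the sketch, not the project's actual API (which searches pgvector in PostgreSQL).

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// classify returns the topic of the best-scoring entry, or defaultTopic
// when no entry reaches the threshold (or there are no entries at all).
func classify(query []float64, entries map[string][]float64, threshold float64, defaultTopic string) (string, float64) {
	topic, score := defaultTopic, 0.0
	for t, emb := range entries {
		if s := cosine(query, emb); s > score {
			topic, score = t, s
		}
	}
	if score < threshold {
		return defaultTopic, score
	}
	return topic, score
}

func main() {
	entries := map[string][]float64{
		"billing":   {1, 0},
		"technical": {0, 1},
	}
	topic, score := classify([]float64{0.9, 0.1}, entries, 0.5, "none")
	fmt.Printf("Classified Topic: %s (Score: %.4f)\n", topic, score)
}
```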
- Local embedding generation using any GGUF embedding model
- Vector similarity search using PostgreSQL's `pgvector` and HNSW indexing
- OpenAI-compatible `/v1/chat/completions` endpoint for classification
- MCP server with stdio transport for integration with AI assistants (Claude, Amp, etc.)
- JSONL-based export and import for easy data portability
- Handcrafted CLI for efficient management of classification entries
- Go 1.25+
- PostgreSQL with the pgvector extension installed
- llama.cpp library — download or build, then set `YZMA_LIB`
- GGUF embedding model — e.g., T5Gemma 2-270M (recommended for 640-dim vectors)
```shell
# Using yzma CLI to download
go install github.com/hybridgroup/yzma/cmd/yzma@latest
yzma lib install

# Set the library path
export YZMA_LIB=/path/to/libllama.so    # Linux
export YZMA_LIB=/path/to/libllama.dylib # macOS
```

```shell
# Build the binary (no CGo required)
go build -o classifier .
```
```shell
# Run the classifier, both OpenAI API and MCP Server (stdio)
./classifier serve
```

```shell
./classifier add billing "How can I pay my bill?"
./classifier add billing "I have a question about my invoice"
./classifier add technical "My internet is not working"
./classifier add technical "I am experiencing slow speeds"
```

```shell
./classifier query "I want to pay my monthly subscription"
```

Output:

```
Classified Topic: billing (Score: 0.9123)
```
```shell
# Export all entries to a directory (grouped by topic)
./classifier export ./backup

# Import entries from a directory
./classifier import ./data
```

The classifier provides an endpoint at `http://localhost:8080/v1/chat/completions`. It takes the last message's content as the query and returns the classified topic as the assistant's message.
```shell
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "classifier",
    "messages": [{"role": "user", "content": "My router is blinking red"}]
  }'
```

`classifier` supports configuration via `config.yaml`, environment variables, and command-line flags.
```yaml
model: "./models/nomic-embed-text-v1.5.Q8_0.gguf"
lib_path: "/usr/local/lib/libllama.so"
database_url: "postgres://postgres@localhost:5432/postgres?sslmode=disable"
default_topic: "none"
context_size: 512
batch_size: 512
verbose: false
server:
  port: "8080"
  transport: "stdio"
```

| Variable | Description | Default |
|---|---|---|
| `CLASSIFIER_MODEL` | Path to GGUF embedding model | — |
| `YZMA_LIB` | Path to llama.cpp library | — |
| `CLASSIFIER_DATABASE_URL` | PostgreSQL connection string | `postgres://...` |
| `CLASSIFIER_DEFAULT_TOPIC` | Topic returned when no match found | `none` |
| `CLASSIFIER_CONTEXT_SIZE` | Context size for embeddings | `512` |
| `CLASSIFIER_BATCH_SIZE` | Batch size for processing | `512` |
| `CLASSIFIER_VERBOSE` | Enable verbose logging | `false` |
```
.
├── api.go        # OpenAI-compatible HTTP API
├── mcp_server.go # MCP server implementation
├── rag.go        # RAG core (PostgreSQL + pgvector + YZMA)
├── main.go       # CLI entry point and flag parsing
├── config.go     # Configuration management
├── command.go    # Handcrafted command registry
├── cmd_add.go    # Add entry command
├── cmd_query.go  # Query/Classify command
├── cmd_export.go # JSONL Export command
├── cmd_import.go # JSONL Import command
└── cmd_serve.go  # Start API and MCP servers
```
- Automatic Prefixing: To optimize performance for instruction-tuned models like T5Gemma, the system automatically prepends the `classification: ` prefix to any text before generating its embedding (if not already present). This ensures the model weights semantic features correctly for classification tasks while keeping your stored content clean.
- Embedding Generation: The text (with prefix) is tokenized and passed through the local GGUF model via YZMA/llama.cpp to produce a high-dimensional vector.
- Vector Storage: The original content, topic, and normalized embedding are stored in PostgreSQL.
- Similarity Search: For classification queries, the query text is embedded with the same prefix and compared against stored vectors using cosine similarity (`1 - (embedding <=> $1)`).
- Classification: The system returns the topic with the highest similarity score, or the `default_topic` if no matches are found.
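The automatic-prefixing step above can be sketched as a small idempotent helper. The exact prefix literal is assumed here to be `"classification: "`; check the source for the precise string the project uses.

```go
package main

import (
	"fmt"
	"strings"
)

// ensurePrefix prepends the classification prefix unless the text
// already starts with it, so stored content is never double-prefixed.
func ensurePrefix(text string) string {
	const p = "classification: " // assumed literal, see rag.go
	if strings.HasPrefix(text, p) {
		return text
	}
	return p + text
}

func main() {
	fmt.Println(ensurePrefix("My internet is not working"))
	fmt.Println(ensurePrefix("classification: already prefixed"))
}
```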
MIT