emb

redis-cli EMB minilm "hello world"
→ \x7c\x8e\x80\xbd...   (384 float32s × 4 bytes)

Install

curl -fsSL https://github.com/elcuervo/emb/raw/main/install.sh | sh

Installs to /usr/local/bin. Set EMB_INSTALL_DIR to change the target:

curl -fsSL https://github.com/elcuervo/emb/raw/main/install.sh | EMB_INSTALL_DIR=~/.local/bin sh

Platforms: macOS (Apple Silicon), Linux (amd64, arm64).

Quick start

# Auto-downloads a model from HuggingFace and starts the server
emb -model-repo Xenova/all-MiniLM-L6-v2

# In another terminal:
redis-cli EMB minilm "hello world"
→ \x7c\x8e\x80\xbd...   (384 float32s × 4 bytes)

Features

  • Redis protocol: any Redis client works (redis-cli, redis-py, redis-rb, etc.)
  • ONNX Runtime: fast CPU/GPU inference via CGo bindings
  • HuggingFace integration: auto-downloads models and auto-detects dim, max_length, output tensor, and pooling strategy from the ONNX graph and config.json
  • Multi-model queries: EMB.MULTI calls different models in one command (MGET-style partial failures)
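
Because the server speaks the Redis protocol, an EMB call travels as an ordinary RESP array of bulk strings. The sketch below shows how a client would frame such a command on the wire. `encode_command` is a hypothetical helper written for illustration; it is not part of emb or of any client library.

```python
def encode_command(*args: str) -> bytes:
    """Frame a command as a RESP array of bulk strings (what Redis clients send)."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Each argument is length-prefixed in bytes, not characters.
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


print(encode_command("EMB", "minilm", "hello world"))
# b'*3\r\n$3\r\nEMB\r\n$6\r\nminilm\r\n$11\r\nhello world\r\n'
```

This is why no emb-specific client is required: any library that can send an arbitrary Redis command can send EMB.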

Quick start

One-liner (no config file)

# Auto-downloads a model from HuggingFace and starts the server
emb -model-repo Xenova/all-MiniLM-L6-v2

# In another terminal:
redis-cli EMB model "hello world"

Two models inline

emb \
  -model minilm -model-onnx ./models/minilm/model.onnx -model-tokenizer ./models/minilm/tokenizer.json \
  -model bge   -model-repo Xenova/bge-small-en-v1.5

redis-cli EMB.MULTI minilm "hello" bge "world"

Local development (with config file)

# Download a model from HuggingFace
just download-model

# Start the server
just dev

# In another terminal:
redis-cli EMB minilm "hello world"

Commands

Command                                        Description
EMB <model> <text> [text...]                   Embed one or more texts. Single text → bulk string, multiple → array of bulk strings
EMB.MODELS                                     List loaded models with dimensions and status
EMB.INFO <model>                               Model details: dim, workers, requests served, avg latency
EMB.STATS                                      Server statistics: uptime, total requests, per-model breakdown
EMB.MULTI <model> <text> [<model> <text>...]   Embed texts across different models in one call
EMB.HELP                                       Command reference
PING                                           Returns PONG
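
Since EMB returns a single bulk string for one text and an array of bulk strings for several, client code has to handle both reply shapes. A minimal sketch, assuming the client hands back `bytes` for a bulk string and a `list` for an array (as redis-py does by default); `decode_vector` and `decode_reply` are hypothetical helper names.

```python
import struct


def decode_vector(raw: bytes) -> list[float]:
    """Decode raw little-endian float32 bytes into a list of floats."""
    if len(raw) % 4 != 0:
        raise ValueError(f"expected a multiple of 4 bytes, got {len(raw)}")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def decode_reply(reply) -> list[list[float]]:
    """Normalize an EMB reply to a list of vectors, one per input text."""
    if isinstance(reply, (bytes, bytearray)):
        return [decode_vector(bytes(reply))]  # single text -> bulk string
    return [decode_vector(item) for item in reply]  # multiple texts -> array


# Simulated replies, so the sketch runs without a server.
one = struct.pack("<3f", 1.0, 2.0, 3.0)
print(decode_reply(one))
print(decode_reply([one, one]))
```

Always returning a list of vectors keeps calling code the same whether it embeds one text or many.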

EMB.MULTI example

redis-cli EMB.MULTI minilm "hello" siglip2 "a photo of a cat"
1) \x7c\x8e\x80\xbd...   (minilm, 384 floats)
2) \x4a\x9f\x31\xc2...   (siglip2, 768 floats)
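
EMB.MULTI takes alternating model/text arguments and returns one entry per pair, in order. The sketch below builds the argument list and pairs each result with its model. It assumes a failed entry comes back as nil (`None` in Python), by analogy with MGET; the source only describes partial failures as "MGET-style", so check the actual reply before relying on this. `multi_args` and `pair_results` are hypothetical helpers.

```python
import struct


def multi_args(pairs):
    """Flatten [(model, text), ...] into EMB.MULTI arguments."""
    args = ["EMB.MULTI"]
    for model, text in pairs:
        args += [model, text]
    return args


def pair_results(pairs, reply):
    """Attach each reply entry to its model; None marks an assumed failure."""
    results = []
    for (model, _text), raw in zip(pairs, reply):
        vec = None if raw is None else list(struct.unpack(f"<{len(raw) // 4}f", raw))
        results.append((model, vec))
    return results


pairs = [("minilm", "hello"), ("siglip2", "a photo of a cat")]
print(multi_args(pairs))

# Simulated reply: first model succeeded, second failed (assumed nil).
fake_reply = [struct.pack("<2f", 0.25, 0.5), None]
print(pair_results(pairs, fake_reply))
```

With redis-py, the arguments could be sent as `r.execute_command(*multi_args(pairs))`, where `r` is a connected client.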

Configuration

listen: ":6379"

models:
  minilm:
    onnx: ./models/minilm/model.onnx

  siglip2:
    onnx: ./models/siglip2/text_model.onnx
    tokenizer: ./models/siglip2/tokenizer.json
    output_tensor: pooler_output
    pooling: none
    normalize: true
    dim: 768

  # Auto-download from HuggingFace
  e5:
    model_repo: intfloat/e5-small-v2
    pooling: none
    normalize: false

Model options

Field          Default                       Description
onnx           -                             Path to ONNX model file
tokenizer      <model-dir>/tokenizer.json    Path to HuggingFace tokenizer JSON
model_repo     -                             HuggingFace repo (auto-downloads ONNX + tokenizer)
dim            auto-detected                 Embedding dimension
max_length     auto-detected (or 512)        Max token sequence length
pooling        auto-detected                 mean (3D output) or none (2D pre-pooled)
normalize      false                         L2-normalize the output
output_tensor  auto-detected                 ONNX output tensor name
preload        false                         Load model at startup instead of on first request
pad_output     false                         Pad sequences to max_length with trailing zeros (compatibility with legacy implementations that don't pass attention mask)
workers        auto-tuned                    Number of worker goroutines
batching       {timeout: 1, max_batch: 32}   Smart batching settings (set timeout: 0 to disable)
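
To make the pooling and normalize options concrete: mean pooling averages the per-token vectors of a 3D model output over the real (non-padding) tokens, and L2 normalization scales the result to unit length. The following is a generic pure-Python sketch of those two techniques for a single sequence. It is not emb's implementation, and the function names are illustrative.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token vectors where the mask is 1, ignoring padding tokens."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [x / count for x in total]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


# Two real tokens and one padding token, dim = 2.
tokens = [[3.0, 0.0], [3.0, 8.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)   # [3.0, 4.0]
print(l2_normalize(pooled))        # [0.6, 0.8]
```

With pooling: none, the model output is already one vector per text, so only the normalization step would apply.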

Clients

The response is raw little-endian float32 bytes. Any Redis client works.

Ruby:

require "redis_client"

redis = RedisClient.new(port: 6379)
raw = redis.call("EMB", "minilm", "hello world")
emb = raw.unpack("e*")

Or use the emb gem:

require "emb"

Emb[:minilm]["hello world"]
# => [0.0123, -0.0456, 0.0789, ...]

Python:

import struct
import redis

r = redis.Redis(port=6379)
raw = r.execute_command("EMB", "minilm", "hello world")
emb = list(struct.unpack(f"<{len(raw)//4}f", raw))

Go:

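vec := make([]float32, len(raw)/4) // binary.Read fills a pre-sized slice; a nil slice reads nothing
if err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, vec); err != nil {
	log.Fatal(err)
}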

Ruby Gems

Ruby gems for emb:

  • emb: Client library with connection pooling, proxy, and multi-model support. Auto-decodes float32 responses.

  • emb-server: Precompiled server binary. Install and run emb directly.

Development

Commands

just format          # Format all Go code
just lint            # Run linters
just test            # Run tests
just bench           # Run benchmarks
just build           # Build the emb binary
just dev             # Build and run the server
just download-model  # Download a model from HuggingFace

Nix

A flake.nix is provided for reproducible development shells:

nix develop

This provides Go, ONNX Runtime, golangci-lint, just, and all CGo configuration.

Docker

# Run with a model mounted:
docker run -v ./models:/models elcuervo/emb \
  -config /models/config.yaml

About

emb - text embedding generation server
