redis-cli EMB minilm "hello world"
→ \x7c\x8e\x80\xbd... (384 float32s × 4 bytes)
curl -fsSL https://github.com/elcuervo/emb/raw/main/install.sh | sh

Installs to `/usr/local/bin`. Set `EMB_INSTALL_DIR` to change the target:

curl -fsSL https://github.com/elcuervo/emb/raw/main/install.sh | EMB_INSTALL_DIR=~/.local/bin sh

Platforms: macOS (Apple Silicon), Linux (amd64, arm64).

# Auto-downloads a model from HuggingFace and starts the server
emb -model-repo Xenova/all-MiniLM-L6-v2
# In another terminal:
redis-cli EMB minilm "hello world"
→ \x7c\x8e\x80\xbd... (384 float32s × 4 bytes)

- Redis protocol: any Redis client works (`redis-cli`, `redis-py`, `redis-rb`, etc.)
- ONNX Runtime: fast CPU/GPU inference via CGo bindings
- HuggingFace integration: auto-download models and auto-detect dim, max_length, output tensor, and pooling strategy from the ONNX graph and config.json
- Multi-model queries: `EMB.MULTI` calls different models in one command (MGET-style partial failures)

# Auto-downloads a model from HuggingFace and starts the server
emb -model-repo Xenova/all-MiniLM-L6-v2
# In another terminal:
redis-cli EMB model "hello world"

emb \
-model minilm -model-onnx ./models/minilm/model.onnx -model-tokenizer ./models/minilm/tokenizer.json \
-model bge -model-repo Xenova/bge-small-en-v1.5
redis-cli EMB.MULTI minilm "hello" bge "world"

# Download a model from HuggingFace
just download-model
# Start the server
just dev
# In another terminal:
redis-cli EMB minilm "hello world"

| Command | Description |
|---|---|
| `EMB <model> <text> [text...]` | Embed one or more texts. Single text → bulk string, multiple → array of bulk strings |
| `EMB.MODELS` | List loaded models with dimensions and status |
| `EMB.INFO <model>` | Model details: dim, workers, requests served, avg latency |
| `EMB.STATS` | Server statistics: uptime, total requests, per-model breakdown |
| `EMB.MULTI <model> <text> [<model> <text>...]` | Embed texts across different models in one call |
| `EMB.HELP` | Command reference |
| `PING` | PONG |

redis-cli EMB.MULTI minilm "hello" siglip2 "a photo of a cat"
1) \x7c\x8e\x80\xbd... (minilm, 384 floats)
2) \x4a\x9f\x31\xc2... (siglip2, 768 floats)
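
Because `EMB.MULTI` is described as having MGET-style partial failures, a client should probably expect some entries in the reply array to be missing. The following is a minimal Python sketch of decoding such a reply. It assumes a failed entry arrives as nil (`None` in redis-py) rather than as an error value; `decode_embedding` and `decode_multi` are hypothetical helpers, not part of emb or its client libraries. The reply is simulated so the snippet runs without a server.

```python
import struct


def decode_embedding(raw):
    """Decode raw little-endian float32 bytes into a list of floats."""
    if len(raw) % 4 != 0:
        raise ValueError("embedding length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def decode_multi(reply):
    """Decode an EMB.MULTI-style reply, keeping None for failed entries.

    Assumption: a failed model/text pair is returned as nil (None).
    """
    return [None if raw is None else decode_embedding(raw) for raw in reply]


# Simulated reply: the first entry succeeded, the second failed.
reply = [struct.pack("<3f", 0.5, -1.0, 2.0), None]
vectors = decode_multi(reply)
```

Keeping the `None` placeholders preserves the position of each result, so `vectors[i]` still corresponds to the i-th model/text pair in the request.
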
listen: ":6379"
models:
  minilm:
    onnx: ./models/minilm/model.onnx
  siglip2:
    onnx: ./models/siglip2/text_model.onnx
    tokenizer: ./models/siglip2/tokenizer.json
    output_tensor: pooler_output
    pooling: none
    normalize: true
    dim: 768
  # Auto-download from HuggingFace
  e5:
    model_repo: intfloat/e5-small-v2
    pooling: none
    normalize: false

| Field | Default | Description |
|---|---|---|
| `onnx` | (none) | Path to ONNX model file |
| `tokenizer` | `<model-dir>/tokenizer.json` | Path to HuggingFace tokenizer JSON |
| `model_repo` | (none) | HuggingFace repo (auto-downloads ONNX + tokenizer) |
| `dim` | auto-detected | Embedding dimension |
| `max_length` | auto-detected (or 512) | Max token sequence length |
| `pooling` | auto-detected | `mean` (3D output) or `none` (2D pre-pooled) |
| `normalize` | `false` | L2-normalize the output |
| `output_tensor` | auto-detected | ONNX output tensor name |
| `preload` | `false` | Load model at startup instead of on first request |
| `pad_output` | `false` | Pad sequences to max_length with trailing zeros (compatibility with legacy implementations that don't pass attention mask) |
| `workers` | auto-tuned | Number of worker goroutines |
| `batching` | `{timeout: 1, max_batch: 32}` | Smart batching settings (set `timeout: 0` to disable) |

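
To make the `pooling` and `normalize` fields more concrete, here is a small Python sketch of what mean pooling over a 3D (per-token) output followed by L2 normalization usually looks like. It illustrates the standard technique only and is not taken from the emb source; in particular, weighting by an attention mask is an assumption, and `mean_pool` and `l2_normalize` are hypothetical names.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [x / count for x in total]


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return vec if norm == 0.0 else [x / norm for x in vec]


# Three token vectors; the last one is padding and is masked out.
tokens = [[1.0, 2.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
unit = l2_normalize(pooled)
```

With `pooling: none`, the model output is already one vector per input, so only the normalization step would apply.
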
The response is raw little-endian float32 bytes. Any Redis client works.
Ruby:
require "redis_client"
redis = RedisClient.new(port: 6379)
raw = redis.call("EMB", "minilm", "hello world")
emb = raw.unpack("e*")

Or use the emb gem:
require "emb"
Emb[:minilm]["hello world"]
# => [0.0123, -0.0456, 0.0789, ...]

Python:
import struct
import redis
r = redis.Redis(port=6379)
raw = r.execute_command("EMB", "minilm", "hello world")
# 4 bytes per little-endian float32
emb = list(struct.unpack(f"<{len(raw)//4}f", raw))

Go:
// binary.Read fills only len(vec) elements, so size the slice first
vec := make([]float32, len(raw)/4)
err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, vec)

Ruby gems for emb:
- `emb`: Client library with connection pooling, proxy, and multi-model support. Auto-decodes float32 responses.
- `emb-server`: Precompiled server binary. Install and run `emb` directly.

just format # Format all Go code
just lint # Run linters
just test # Run tests
just bench # Run benchmarks
just build # Build the emb binary
just dev # Build and run the server
just download-model # Download a model from HuggingFace

A flake.nix is provided for reproducible development shells:

nix develop

This provides Go, ONNX Runtime, golangci-lint, just, and all CGo configuration.

# Run with a model mounted:
docker run -v ./models:/models elcuervo/emb \
-config /models/config.yaml