rcarmo/gte-go (forked from antirez/gte-pure-C)
GTE-Small in Go

A pure Go implementation of the GTE-small text embedding model, ported directly from @antirez's C implementation. It produces 384-dimensional, L2-normalized embeddings suitable for similarity search and clustering.

Performance trails the C version: embeddings are generated roughly 3x slower, since this port does not use the low-level optimizations available in C.

Quick Start

# Install deps for conversion
pip install safetensors requests

# Download Hugging Face weights and convert to .gtemodel
python - <<'PY'
import requests, pathlib
base = "https://huggingface.co/thenlper/gte-small/resolve/main"
files = ["config.json", "vocab.txt", "tokenizer_config.json", "special_tokens_map.json", "model.safetensors"]
out = pathlib.Path("models/gte-small")
out.mkdir(parents=True, exist_ok=True)
for name in files:
    url = f"{base}/{name}"
    path = out / name
    if path.exists():
        continue
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(8192):
                if chunk:
                    f.write(chunk)
PY
python convert_model.py models/gte-small gte-small.gtemodel

# Run the demo
go run ./cmd/gte --model-path gte-small.gtemodel "I love cats" "I love dogs" "The stock market crashed"

# Or via make
make                # builds Go binaries and runs tests
make run-go         # runs the demo with sample sentences
make run-bench      # runs the single-model benchmark (reports ms/op and throughput)
make go-bench       # go test benchmark (ms/op_avg via go test)

Sample output:

Model loaded in 0.11 s
Embedding dimension: 384
Max sequence length: 512

Cosine similarity matrix:
       S1     S2     S3
S1:  1.000  0.898  0.727
S2:  0.898  1.000  0.722
S3:  0.727  0.722  1.000

Go API

import "github.com/rcarmo/gte-go/gte"

model, _ := gte.Load("gte-small.gtemodel")
defer model.Close()

emb, _ := model.Embed("Hello world")          // []float32 length 384, L2-normalized
embBatch, _ := model.EmbedBatch([]string{"hi", "there"})
sim, _ := gte.CosineSimilarity(embBatch[0], embBatch[1])

Model Format

.gtemodel uses the same binary format as the original C project: a header, the vocabulary, and contiguous float32 weights. Use convert_model.py to export it from Hugging Face weights.

Testing & Benchmarks

GTE_MODEL_PATH=gte-small.gtemodel go test ./...
GTE_MODEL_PATH=gte-small.gtemodel go test -bench=BenchmarkEmbed -benchmem ./gte
make run-bench   # convenient single-model benchmark with human-readable output
  • gte/gte_test.go embeds three reference sentences and checks cosine similarities within a small tolerance.
  • gte/bench_test.go reports per-embedding latency (ms/op_avg) via go test.
  • cmd/bench prints total calls, average ms per embedding (derived from total_time/total_calls), and throughput.

License

MIT
