# VectrixDB Custom Models

Use your own embedding models from HuggingFace or sentence-transformers.

## Using a HuggingFace Model

Pass the ID of any sentence-transformers-compatible model through the `model` argument. The model files are downloaded the first time the model is used.

In [1]:
from vectrixdb import Vectrix

# Use a small, fast model from HuggingFace
db = Vectrix(
    "custom_model_test",
    model="sentence-transformers/all-MiniLM-L6-v2"  # 22M params, very fast
)

# Add some data
db.add([
    "The quick brown fox jumps over the lazy dog.",
    "A fast auburn canine leaps above a sleepy hound.",
    "Python is a popular programming language.",
    "Machine learning models need training data."
])

print(f"Documents added: {len(db)}")

Documents added: 4


In [2]:
# Search
results = db.search("fox jumping")

print("Search results:")
for r in results:
    print(f"  {r.score:.4f}: {r.text}")

Search results:
  0.0180: The quick brown fox jumps over the lazy dog.
  0.0081: A fast auburn canine leaps above a sleepy hound.
  0.0079: Python is a popular programming language.
  0.0078: Machine learning models need training data.
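
As background for how an embedding model drives this ranking, here is a minimal sketch of similarity-based ordering. It assumes cosine similarity over embedding vectors, which is the usual choice for sentence-transformers models; the source does not say how VectrixDB computes the scores printed above, so they need not be raw cosine values. The three-dimensional vectors and their labels are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings", invented for this example only.
# Real models such as all-MiniLM-L6-v2 produce much longer vectors.
toy_vectors = {
    "fox sentence": [0.9, 0.1, 0.0],
    "canine sentence": [0.7, 0.3, 0.1],
    "python sentence": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.0, 0.0]

# Rank the stored items from most to least similar to the query.
ranked = sorted(
    toy_vectors,
    key=lambda name: cosine_similarity(toy_query, toy_vectors[name]),
    reverse=True,
)
print(ranked)
```

Swapping the embedding model changes the vectors, and therefore the ordering, while the ranking step itself stays the same.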


## Model Comparison

Index the same texts with two different models and compare the top match each one returns for the same query.

In [3]:
# Test data
test_texts = [
    "Climate change affects global weather patterns.",
    "Rising temperatures cause ice caps to melt.",
    "Renewable energy reduces carbon emissions.",
    "Solar panels convert sunlight to electricity."
]

query = "global warming impact"

# Default model (e5-small English)
db_default = Vectrix("compare_default", language="en")
db_default.add(test_texts)
results_default = db_default.search(query)

print(f"Query: '{query}'\n")
print("Default model (e5-small):")
print(f"  Top match: {results_default.top.text}\n")

# MiniLM model
db_minilm = Vectrix("compare_minilm", model="sentence-transformers/all-MiniLM-L6-v2")
db_minilm.add(test_texts)
results_minilm = db_minilm.search(query)

print("MiniLM model:")
print(f"  Top match: {results_minilm.top.text}")

Query: 'global warming impact'

Default model (e5-small):
  Top match: Climate change affects global weather patterns.

MiniLM model:
  Top match: Climate change affects global weather patterns.
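
Both models agree on the top match here, but that says little about the rest of the ranking. One way to compare further down the list is a top-k overlap score, sketched below. The helper `top_k_overlap` is hypothetical and not part of VectrixDB; it works on plain lists of strings, which could be built from the result objects with something like `[r.text for r in results]`. The two rankings in the example are written by hand, not produced by either model.

```python
def top_k_overlap(ranking_a, ranking_b, k=3):
    """Fraction of the top-k items shared by two rankings (order ignored)."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_a = set(ranking_a[:k])
    top_b = set(ranking_b[:k])
    return len(top_a & top_b) / k

# Hand-written stand-ins for two models' ranked result texts.
ranking_model_a = ["climate", "ice caps", "renewables", "solar"]
ranking_model_b = ["climate", "renewables", "solar", "ice caps"]

overlap_at_1 = top_k_overlap(ranking_model_a, ranking_model_b, k=1)
overlap_at_3 = top_k_overlap(ranking_model_a, ranking_model_b, k=3)
print(f"overlap@1 = {overlap_at_1:.2f}")
print(f"overlap@3 = {overlap_at_3:.2f}")
```

An overlap of 1.0 at `k=1` with a lower value at `k=3` would indicate that the models agree on the best hit but order the remaining documents differently.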


## Available Small Models

Some recommended small models for testing:
- `sentence-transformers/all-MiniLM-L6-v2` - 22M params, English
- `sentence-transformers/paraphrase-MiniLM-L3-v2` - 17M params, English
- `sentence-transformers/all-MiniLM-L12-v2` - 33M params, English
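
The parameter counts in this list give a rough idea of download and memory cost. The sketch below is a back-of-the-envelope estimate that assumes the weights are stored as 32-bit floats (4 bytes per parameter) and ignores tokenizer and config files; the helper name `estimate_weight_size_mb` is hypothetical.

```python
BYTES_PER_FLOAT32 = 4

# Parameter counts in millions, taken from the list above.
param_counts_millions = {
    "sentence-transformers/all-MiniLM-L6-v2": 22,
    "sentence-transformers/paraphrase-MiniLM-L3-v2": 17,
    "sentence-transformers/all-MiniLM-L12-v2": 33,
}

def estimate_weight_size_mb(params_millions, bytes_per_param=BYTES_PER_FLOAT32):
    """Approximate weight size in megabytes (1 MB = 10**6 bytes).

    Millions of parameters times bytes per parameter gives megabytes directly.
    """
    return params_millions * bytes_per_param

for name, params in param_counts_millions.items():
    print(f"{name}: ~{estimate_weight_size_mb(params)} MB")
```

Under these assumptions, the three models come out at roughly 88 MB, 68 MB, and 132 MB of weights.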

In [4]:
# Test with paraphrase model
db_para = Vectrix(
    "paraphrase_test",
    model="sentence-transformers/paraphrase-MiniLM-L3-v2"
)

db_para.add([
    "I love programming in Python.",
    "Python coding is my favorite activity.",
    "JavaScript is used for web development.",
    "The weather is nice today."
])

results = db_para.search("I enjoy writing Python code")
print("Paraphrase model results:")
for r in results:
    print(f"  {r.score:.4f}: {r.text}")

Paraphrase model results:
  0.0179: I love programming in Python.
  0.0179: Python coding is my favorite activity.
  0.0079: JavaScript is used for web development.
  0.0078: The weather is nice today.


## Cleanup

In [5]:
import os
import shutil

for folder in ["custom_model_test", "compare_default", "compare_minilm", "paraphrase_test"]:
    if os.path.exists(folder):
        shutil.rmtree(folder)
        print(f"Deleted {folder}")