# **Text Embeddings with GGUF Models Using llama.cpp**

This notebook shows how to generate text embeddings from a locally stored GGUF model with the `llama-cpp-python` library. It uses IBM’s `granite-embedding-30m-english` model as an example; any other llama.cpp-compatible GGUF embedding model from Hugging Face can be substituted. The notebook covers loading the model, generating embeddings for one or more input texts, and previewing the resulting vectors.


In [1]:
from llama_cpp import Llama

## Load the Embedding Model from Local File
- Example: IBM's `granite-embedding-30m-english` model in GGUF format
- Pre-downloaded from Hugging Face and saved in the `models/` directory
- You can swap in any GGUF text embedding model from Hugging Face
- See this [guide to finding a compatible GGUF model](https://shaikhonai.substack.com/i/162148895/select-and-download-a-gguf-model)

In [7]:
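# embedding=True loads the model in embedding mode; verbose=False silences llama.cpp's load-time logging
embedding_model = Llama(model_path="models/granite-embedding-30m-english-Q6_K.gguf", embedding=True, verbose=False)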
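
Because the model is loaded from a local path, a mistyped or missing file is an easy failure to hit. The following is a minimal sketch of a pre-load check, not part of the original notebook; the helper name `require_model_file` is hypothetical and uses only the standard library.

```python
from pathlib import Path


def require_model_file(model_path: str) -> Path:
    """Return the path as a Path, or raise if it is not a .gguf file on disk.

    Hypothetical helper for illustration; it only inspects the file name and
    its presence, and does not validate the GGUF contents.
    """
    path = Path(model_path)
    if path.suffix.lower() != ".gguf":
        raise ValueError(f"Expected a .gguf file, got: {path.name}")
    if not path.is_file():
        raise FileNotFoundError(f"Model file not found: {path}")
    return path
```

In the notebook, the result could be passed along as `Llama(model_path=str(require_model_file("models/granite-embedding-30m-english-Q6_K.gguf")), embedding=True, verbose=False)`.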

## Embedding a Single Text Input

In [None]:
input_text = "Paris is known for the Eiffel Tower."
# create_embedding returns a response dict; each entry in "data" holds one input's vector
embedding = embedding_model.create_embedding(input_text)
vector = embedding['data'][0]['embedding']

print(f"Text: {input_text}")
print("Embedding (partial):", vector[:12], "...")


Text: Paris is known for the Eiffel Tower.
Embedding (partial): [-1.3312551975250244, 1.8344782590866089, 1.8325539827346802, 0.19754132628440857, -0.3855469822883606, -2.0754051208496094, -0.12226647138595581, -2.131216287612915, 0.5409421920776367, -0.7406637072563171, 0.04207378625869751, -2.238772392272949] ...
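
The preview shows only the first 12 values, and several of them have magnitudes above 1, so the vector is not unit length. The sketch below summarizes a vector by its dimension count and L2 norm. The helper name `describe_vector` and the stand-in values are assumptions made for illustration; in the notebook, `vector` would be passed instead.

```python
import math


def describe_vector(vector: list[float]) -> dict:
    """Summarize an embedding: number of dimensions and L2 (Euclidean) norm."""
    return {
        "dimensions": len(vector),
        "l2_norm": math.sqrt(sum(x * x for x in vector)),
    }


# Stand-in values for illustration only, not real model output.
sample = [3.0, 4.0]
print(describe_vector(sample))  # {'dimensions': 2, 'l2_norm': 5.0}
```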


## Embedding Multiple Text Inputs

In [16]:
input_texts = [
    "Paris is known for the Eiffel Tower.",
    "Machine learning helps computers learn from data."
]

embeddings = embedding_model.create_embedding(input_texts)

for text, item in zip(input_texts, embeddings["data"]):
    print(f"Text: {text}")
    print("Embedding (partial):", item["embedding"][:12], "...\n")

Text: Paris is known for the Eiffel Tower.
Embedding (partial): [-1.3312551975250244, 1.8344782590866089, 1.8325539827346802, 0.19754132628440857, -0.3855469822883606, -2.0754051208496094, -0.12226647138595581, -2.131216287612915, 0.5409421920776367, -0.7406637072563171, 0.04207378625869751, -2.238772392272949] ...

Text: Machine learning helps computers learn from data.
Embedding (partial): [-0.6378023624420166, 0.6769125461578369, 1.4247599840164185, 2.2569093704223633, -0.4022442102432251, 0.6476706266403198, 0.6126803159713745, -0.08391742408275604, 1.828855276107788, 0.36864525079727173, 1.2115947008132935, -0.962933361530304] ...
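
A common reason to embed several texts is to compare them. The notebook itself stops at previewing the vectors, so the following is only a sketch under the assumption that cosine similarity is the metric of interest; the function is written in plain Python and is not part of `llama-cpp-python`.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("Cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

With the variables from the cell above, the two vectors could be collected as `vectors = [item["embedding"] for item in embeddings["data"]]` and compared with `cosine_similarity(vectors[0], vectors[1])`. Dividing by both norms makes the score independent of vector length, which matters here because the embeddings are not unit-normalized.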

