# 📚 VectorVet Demo Notebook  
  
This notebook walks through an end-to-end `VectorVet` workflow: generating embeddings with several local GGUF models, loading the saved embeddings, computing intrinsic quality metrics, and summarizing the results in a single table.

---    
## 🔧 Setup and Imports  

In [1]:
# Standard Libraries  
import sys  
from pathlib import Path  
  
# Data Science Libraries  
import numpy as np  
import pandas as pd  
from tqdm.auto import tqdm  
  
# Text Processing  
from sklearn.datasets import fetch_20newsgroups  
from langchain.text_splitter import RecursiveCharacterTextSplitter  
  
# Embedding Generation  
from llama_cpp import Llama  
  
# Custom VectorVet Modules  
PROJECT_ROOT = Path.cwd().parent  
sys.path.append(str(PROJECT_ROOT))  
  
from vectorvet.core.loader import load_multiple_embeddings  
from vectorvet.core.metrics import run_all_metrics  
from vectorvet.core.summarizer import summarize_to_dataframe  
from vectorvet.core.utils import timer  
  
# Display Configuration  
pd.set_option("display.max_columns", None) 

---  
  
## 🗃️ Load and Chunk the Dataset  
  
This demo uses the training split of the 20 Newsgroups dataset with headers, footers, and quotes removed. Each non-empty post is split into chunks of at most 1,024 characters with a 50-character overlap.  

In [2]:
# Fetch the 20 Newsgroups dataset  
news = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))  
texts = [t for t in news.data if t.strip()]  
  
# Initialize a text splitter  
splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=50)  
  
# Chunk the dataset  
chunked_data = []  
for idx, text in enumerate(tqdm(texts, desc="Chunking texts")):  
    chunks = splitter.split_text(text)  
    for chunk_idx, chunk in enumerate(chunks):  
        chunked_data.append({  
            "original_index": idx,  
            "chunk_index": chunk_idx,  
            "chunk": chunk  
        })  
  
# Create DataFrame of chunks  
chunked_df = pd.DataFrame(chunked_data)  
chunk_texts = chunked_df["chunk"].tolist()  
  
print(f"✅ Total chunks created: {len(chunk_texts)}")

Chunking texts:   0%|          | 0/11014 [00:00<?, ?it/s]

✅ Total chunks created: 21975
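
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-window chunking with overlap. It is a simplification and not what `RecursiveCharacterTextSplitter` does internally (that splitter prefers to break on separators such as paragraph and line breaks); the helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1024, chunk_overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Tiny demo: windows of 4 characters that share 1 character with their neighbor.
demo_chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)
```

The overlap means the last characters of one chunk reappear at the start of the next, so a sentence cut at a boundary is still seen with some of its context.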


In [3]:
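# Keep only the first 1,000 chunks so the demo runs quickly.
# Note: chunked_df still holds all chunks; only chunk_texts is truncated.
chunk_texts = chunk_texts[:1000]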

---  
  
## 📌 Generate Embeddings for Each Model  
  
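Each GGUF model listed below is loaded with `llama_cpp` in embedding mode and used to embed every chunk; the resulting matrix is saved as a `.npy` file for later analysis. Models whose output file already exists are skipped.  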

In [4]:
# Define paths  
MODEL_DIR = PROJECT_ROOT / "models"  
EMB_DIR = PROJECT_ROOT / "embeddings"  
EMB_DIR.mkdir(exist_ok=True, parents=True)  
  
# List of models to embed with  
MODELS = [  
    "Phi-3-mini-4k-instruct-q4.gguf",  
    "Llama-3.2-1B-Instruct.Q6_K.gguf",  
    "Llama-3.1-8b-instruct-q6_k.gguf",  
    "phi-2.Q6_K.gguf",  
]  
  
# Generate embeddings  
for fname in MODELS:  
    model_path = MODEL_DIR / fname  
    model_name = model_path.stem  
    out_file = EMB_DIR / f"{model_name}_20news_chunks.npy"  
  
    if out_file.exists():  
        print(f"✔ {out_file.name} already exists – skipping")  
        continue  
  
    print(f"→ Embedding with {model_name} …")  
    llm = Llama(  
        model_path=str(model_path),  
        n_gpu_layers=-1,  
        embedding=True,  
    )  
  
    # Pre-allocate the output matrix; rows that are skipped below stay all-zero.
    embs = np.zeros((len(chunk_texts), llm.n_embd()), dtype=np.float32)

    with timer(f"Embedding generation for {model_name}"):
        for i, txt in enumerate(tqdm(chunk_texts, desc=f"Embedding ({model_name})")):
            emb = np.asarray(llm.embed(txt), dtype=np.float32)

            # A 2-D result is treated as one vector per token; mean-pool to one vector per chunk.
            if emb.ndim > 1:
                emb = emb.mean(axis=0)

            emb = emb.flatten()

            if emb.shape[0] != llm.n_embd():
                print(f"Warning: text {i} has unexpected embedding shape {emb.shape}; row {i} is left as zeros")
                continue

            embs[i] = emb
  
    np.save(out_file, embs)  
    print(f"✔ Saved embeddings to {out_file}")  

→ Embedding with Phi-3-mini-4k-instruct-q4 …


ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA RTX A3000 12GB Laptop GPU, compute capability 8.6, VMM: yes
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA RTX A3000 12GB Laptop GPU) - 11230 MiB free
llama_model_loader: loaded meta data with 24 key-value pairs and 195 tensors from /workspaces/VectorVet/models/Phi-3-mini-4k-instruct-q4.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = phi3
llama_model_loader: - kv   1:                               general.name str              = Phi3
llama_model_loader: - kv   2:                        phi3.context_length u32              = 4096
llama_model_loader: - kv   3:                      phi3.embedding_length u32              = 3072
llama_model_loader: - kv   4:         

Embedding (Phi-3-mini-4k-instruct-q4):   0%|          | 0/1000 [00:00<?, ?it/s]

llama_perf_context_print:        load time =    2423.91 ms
llama_perf_context_print: prompt eval time =    2422.42 ms /   136 tokens (   17.81 ms per token,    56.14 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =    2429.73 ms /   137 tokens
llama_perf_context_print:        load time =    2423.91 ms
llama_perf_context_print: prompt eval time =     104.45 ms /   134 tokens (    0.78 ms per token,  1282.95 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =     111.73 ms /   135 tokens
llama_perf_context_print:        load time =    2423.91 ms
llama_perf_context_print: prompt eval time =     130.44 ms /   216 tokens (    0.60 ms per token,  1655.87 tokens per second)
llama_perf_context_print:        eval time = 

[Embedding generation for Phi-3-mini-4k-instruct-q4] 179.04s
✔ Saved embeddings to /workspaces/VectorVet/embeddings/Phi-3-mini-4k-instruct-q4_20news_chunks.npy
→ Embedding with Llama-3.2-1B-Instruct.Q6_K …


llama_model_loader: loaded meta data with 35 key-value pairs and 147 tensors from /workspaces/VectorVet/models/Llama-3.2-1B-Instruct.Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Models Meta Llama Llama 3.2 1B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = models-meta-llama-Llama-3.2
llama_model_loader: - kv   5:                         general.size_label str              = 1B
llama_model_loader: - kv   6:                            general.license str            

Embedding (Llama-3.2-1B-Instruct.Q6_K):   0%|          | 0/1000 [00:00<?, ?it/s]

llama_perf_context_print:        load time =      80.02 ms
llama_perf_context_print: prompt eval time =      79.33 ms /   122 tokens (    0.65 ms per token,  1537.98 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =      82.35 ms /   123 tokens
llama_perf_context_print:        load time =      80.02 ms
llama_perf_context_print: prompt eval time =      31.45 ms /   116 tokens (    0.27 ms per token,  3688.63 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =      34.27 ms /   117 tokens
llama_perf_context_print:        load time =      80.02 ms
llama_perf_context_print: prompt eval time =      39.23 ms /   182 tokens (    0.22 ms per token,  4639.78 tokens per second)
llama_perf_context_print:        eval time = 

[Embedding generation for Llama-3.2-1B-Instruct.Q6_K] 62.73s
✔ Saved embeddings to /workspaces/VectorVet/embeddings/Llama-3.2-1B-Instruct.Q6_K_20news_chunks.npy
→ Embedding with Llama-3.1-8b-instruct-q6_k …


llama_model_loader: loaded meta data with 33 key-value pairs and 292 tensors from /workspaces/VectorVet/models/Llama-3.1-8b-instruct-q6_k.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv   3:                           general.finetune str              = Instruct
llama_model_loader: - kv   4:                           general.basename str              = Meta-Llama-3.1
llama_model_loader: - kv   5:                         general.size_label str              = 8B
llama_model_loader: - kv   6:                            general.license str              = llama3.1
llama_model_l

Embedding (Llama-3.1-8b-instruct-q6_k):   0%|          | 0/1000 [00:00<?, ?it/s]

llama_perf_context_print:        load time =     165.04 ms
llama_perf_context_print: prompt eval time =     164.24 ms /   122 tokens (    1.35 ms per token,   742.81 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =     170.62 ms /   123 tokens
llama_perf_context_print:        load time =     165.04 ms
llama_perf_context_print: prompt eval time =     156.21 ms /   116 tokens (    1.35 ms per token,   742.58 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =     161.66 ms /   117 tokens
llama_perf_context_print:        load time =     165.04 ms
llama_perf_context_print: prompt eval time =     198.29 ms /   182 tokens (    1.09 ms per token,   917.84 tokens per second)
llama_perf_context_print:        eval time = 

[Embedding generation for Llama-3.1-8b-instruct-q6_k] 306.61s
✔ Saved embeddings to /workspaces/VectorVet/embeddings/Llama-3.1-8b-instruct-q6_k_20news_chunks.npy
→ Embedding with phi-2.Q6_K …


llama_model_loader: loaded meta data with 20 key-value pairs and 325 tensors from /workspaces/VectorVet/models/phi-2.Q6_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = phi2
llama_model_loader: - kv   1:                               general.name str              = Phi2
llama_model_loader: - kv   2:                        phi2.context_length u32              = 2048
llama_model_loader: - kv   3:                      phi2.embedding_length u32              = 2560
llama_model_loader: - kv   4:                   phi2.feed_forward_length u32              = 10240
llama_model_loader: - kv   5:                           phi2.block_count u32              = 32
llama_model_loader: - kv   6:                  phi2.attention.head_count u32              = 32
llama_model_loader: - kv   7:               phi2.attention.head_count_kv

Embedding (phi-2.Q6_K):   0%|          | 0/1000 [00:00<?, ?it/s]

llama_perf_context_print:        load time =      74.78 ms
llama_perf_context_print: prompt eval time =      73.28 ms /   126 tokens (    0.58 ms per token,  1719.48 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =      78.29 ms /   127 tokens
llama_perf_context_print:        load time =      74.78 ms
llama_perf_context_print: prompt eval time =      65.92 ms /   120 tokens (    0.55 ms per token,  1820.42 tokens per second)
llama_perf_context_print:        eval time =       0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:       total time =      70.54 ms /   121 tokens
llama_perf_context_print:        load time =      74.78 ms
llama_perf_context_print: prompt eval time =      92.15 ms /   196 tokens (    0.47 ms per token,  2126.94 tokens per second)
llama_perf_context_print:        eval time = 

[Embedding generation for phi-2.Q6_K] 165.17s
✔ Saved embeddings to /workspaces/VectorVet/embeddings/phi-2.Q6_K_20news_chunks.npy
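
The embedding loop above mean-pools 2-D results and leaves an all-zero row whenever a text is skipped. The following sketch isolates that pooling step and, as one possible variation, records the indices of skipped texts instead of keeping zero rows. The helper `pool_embedding` and the toy inputs are hypothetical; no model is loaded here.

```python
import numpy as np


def pool_embedding(raw, n_embd: int):
    """Return a 1-D float32 vector of length n_embd, or None if the shape is unexpected."""
    emb = np.asarray(raw, dtype=np.float32)
    if emb.ndim == 2:
        emb = emb.mean(axis=0)  # mean-pool over the token axis
    if emb.shape != (n_embd,):
        return None
    return emb


# Toy stand-ins for what an embedding call might return (assumed shapes).
token_level = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]  # one vector per token
sequence_level = [0.5, 0.5, 0.5]                  # already one vector per text
wrong_size = [1.0, 2.0]                           # unexpected dimensionality

rows, skipped = [], []
for i, raw in enumerate([token_level, sequence_level, wrong_size]):
    vec = pool_embedding(raw, n_embd=3)
    if vec is None:
        skipped.append(i)  # remember the index rather than storing zeros
    else:
        rows.append(vec)

embs_demo = np.vstack(rows)
print(embs_demo.shape, "skipped:", skipped)
```

Dropping skipped rows changes the row-to-chunk alignment, so the `skipped` list would need to be stored alongside the matrix if chunk indices matter downstream.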


---  
  
## 📊 Compute Metrics and Summarize Results  
  
Now let's load the embeddings we've saved, compute intrinsic embedding metrics, and summarize results into a tidy dataframe.  

In [5]:
# Find saved embeddings  
files = {  
    p.stem.split("_20news_chunks")[0]: str(p)  
    for p in EMB_DIR.glob("*_20news_chunks.npy")  
}  
  
print("🗂️ Embedding sets detected:", list(files.keys()))  
  
# Load embeddings  
embs = load_multiple_embeddings(files)  
  
# Compute all metrics  
results = {}  
for name, mat in embs.items():  
    print(f"\n📐 Running metrics for: {name}")  
    with timer(f"Metrics computation for {name}"):  
        results[name] = run_all_metrics(mat)  
  
# Summarize results into DataFrame  
summary_df = summarize_to_dataframe(results)  
  
# Display summarized metrics nicely  
summary_df.style.format(precision=3) 

🗂️ Embedding sets detected: ['Llama-3.1-8b-instruct-q6_k', 'Llama-3.2-1B-Instruct.Q6_K', 'phi-2.Q6_K', 'Phi-3-mini-4k-instruct-q4']

📐 Running metrics for: Llama-3.1-8b-instruct-q6_k
Calculating Isotropy...
Calculating Hubness...
Calculating Clustering Quality...
Calculating Pairwise Cosine Similarity...
[Metrics computation for Llama-3.1-8b-instruct-q6_k] 1.04s

📐 Running metrics for: Llama-3.2-1B-Instruct.Q6_K
Calculating Isotropy...
Calculating Hubness...
Calculating Clustering Quality...
Calculating Pairwise Cosine Similarity...
[Metrics computation for Llama-3.2-1B-Instruct.Q6_K] 0.64s

📐 Running metrics for: phi-2.Q6_K
Calculating Isotropy...
Calculating Hubness...
Calculating Clustering Quality...
Calculating Pairwise Cosine Similarity...
[Metrics computation for phi-2.Q6_K] 0.57s

📐 Running metrics for: Phi-3-mini-4k-instruct-q4
Calculating Isotropy...
Calculating Hubness...
Calculating Clustering Quality...
Calculating Pairwise Cosine Similarity...
[Metrics computation for Phi

| Model | IsoScore | skewness | robin_hood | antihub_rate | silhouette | davies_bouldin | cos_mean | cos_std |
|---|---|---|---|---|---|---|---|---|
| Llama-3.1-8b-instruct-q6_k | 0.003 | 1.741 | 0.272 | 0.0 | 0.177 | 2.752 | 0.449 | 0.224 |
| Llama-3.2-1B-Instruct.Q6_K | 0.001 | 2.651 | 0.325 | 0.0 | 0.141 | 2.658 | 0.653 | 0.219 |
| Phi-3-mini-4k-instruct-q4 | 0.001 | 3.406 | 0.358 | 0.0 | 0.086 | 2.869 | 0.731 | 0.183 |
| phi-2.Q6_K | 0.001 | 2.291 | 0.323 | 0.0 | 0.118 | 2.546 | 0.635 | 0.213 |
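
As a rough guide to what the `cos_mean` and `cos_std` columns measure, the sketch below computes the mean and standard deviation of cosine similarity over all distinct pairs of rows. This is one straightforward definition; VectorVet's own implementation may differ (for example by sampling pairs), and the function name `cosine_stats` is hypothetical.

```python
import numpy as np


def cosine_stats(X) -> tuple[float, float]:
    """Mean and standard deviation of cosine similarity over all distinct row pairs."""
    X = np.asarray(X, dtype=np.float64)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)  # unit-normalize rows; zero rows stay zero
    sims = Xn @ Xn.T                      # full cosine similarity matrix
    upper = np.triu_indices(len(X), k=1)  # each unordered pair once, no diagonal
    pair_sims = sims[upper]
    return float(pair_sims.mean()), float(pair_sims.std())


# Three toy vectors: two orthogonal axes and their diagonal.
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
cos_mean, cos_std = cosine_stats(X_demo)
print(f"cos_mean={cos_mean:.3f}, cos_std={cos_std:.3f}")
```

Under this definition, a high mean with a low standard deviation indicates that most vectors point in a similar direction. Any all-zero row gets similarity 0 with every other row here, which would pull the mean down.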


---  
  
## ✅ Final Results  
  
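The table above summarizes intrinsic embedding quality for each model on the first 1,000 chunks: isotropy (`IsoScore`), hubness (`skewness`, `robin_hood`, `antihub_rate`), clustering quality (`silhouette`, `davies_bouldin`), and pairwise cosine similarity (`cos_mean`, `cos_std`). In this run, Llama-3.1-8b-instruct-q6_k has the lowest hubness skewness (1.741), the highest silhouette score (0.177), and the lowest mean cosine similarity (0.449), while Phi-3-mini-4k-instruct-q4 is at the opposite end on all three (3.406, 0.086, and 0.731).  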
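
For readers unfamiliar with hubness, the sketch below shows common definitions from the hubness literature for three quantities named like the table's hubness columns, all derived from k-occurrence counts (how often each point appears in other points' k-nearest-neighbor lists). The choice of `k=10`, cosine similarity as the neighbor criterion, and the function name `hubness_stats` are assumptions; VectorVet's implementation may use different settings or definitions.

```python
import numpy as np


def hubness_stats(X, k: int = 10):
    """Return (stats, counts) where counts[i] is how often row i is among others' k nearest neighbors."""
    X = np.asarray(X, dtype=np.float64)
    n = len(X)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of rows")

    Xn = X / np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)  # a point is never its own neighbor

    # Indices of the k most similar rows for every row.
    neighbors = np.argsort(-sims, axis=1)[:, :k]
    counts = np.bincount(neighbors.ravel(), minlength=n).astype(np.float64)

    centered = counts - counts.mean()
    std = counts.std()
    skewness = float((centered ** 3).mean() / std ** 3) if std > 0 else 0.0

    stats = {
        # Skewness of the k-occurrence distribution; larger means a few strong hubs.
        "skewness": skewness,
        # Robin Hood index: share of k-occurrences to redistribute for a uniform distribution.
        "robin_hood": float(0.5 * np.abs(centered).sum() / counts.sum()),
        # Fraction of points that never appear in any neighbor list.
        "antihub_rate": float((counts == 0).mean()),
    }
    return stats, counts


# Synthetic data only; these numbers say nothing about the models above.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 16))
stats_demo, counts_demo = hubness_stats(X_demo, k=10)
print(stats_demo)
```

Because every point contributes exactly `k` neighbors, the counts always average to `k`; the statistics describe how unevenly those occurrences are spread.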
  
🎉 **You're all set!**  