# Embedding Generation and Vector Indexing (FAISS)

This notebook shows how to generate image and text embeddings with OpenCLIP and how to build a FAISS vector index over the image embeddings for text-to-image similarity search.

### Generate Embeddings (OpenCLIP)

In [2]:
from open_clip import create_model_and_transforms, tokenize
import torch
from PIL import Image
import os

# Load model
model, _, preprocess = create_model_and_transforms('ViT-B-32', pretrained='openai')
tokenizer = tokenize

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

# Load and preprocess image
image_path = "../data/processed/10001.jpg"
image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)

# Encode image
with torch.no_grad():
    image_embedding = model.encode_image(image)
    image_embedding /= image_embedding.norm(dim=-1, keepdim=True)

image_embedding.shape

torch.Size([1, 512])
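The normalization step in the cell above matters for the index built later: once vectors have unit L2 norm, their inner product equals their cosine similarity. A minimal NumPy sketch of that identity follows. The 4-dimensional vectors are made up for illustration and are not real CLIP embeddings, and `l2_normalize` is a hypothetical helper name.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row of x to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Two arbitrary illustrative vectors (not real embeddings)
a = np.array([[1.0, 2.0, 3.0, 4.0]], dtype="float32")
b = np.array([[4.0, 3.0, 2.0, 1.0]], dtype="float32")

# Cosine similarity computed from the raw vectors
cosine = float((a @ b.T).item() / (np.linalg.norm(a) * np.linalg.norm(b)))

# Inner product of the normalized vectors
inner = float((l2_normalize(a) @ l2_normalize(b).T).item())

print(cosine, inner)  # the two values agree up to float rounding
```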

### Build FAISS Index

In [3]:
import faiss
import numpy as np

# Simulate a matrix of 100 image embeddings with random vectors.
# In practice, replace this with real embeddings from model.encode_image.
num_images = 100
dim = image_embedding.shape[-1]
image_embeddings = np.random.randn(num_images, dim).astype("float32")
image_embeddings = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)

# Build FAISS index
index = faiss.IndexFlatIP(dim)  # Inner product; equals cosine similarity for L2-normalized vectors
index.add(image_embeddings)

print("FAISS index contains", index.ntotal, "vectors")

FAISS index contains 100 vectors
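In a real pipeline the random matrix would be replaced by actual image embeddings. FAISS expects a float32 array of shape (n, d), so per-image vectors need to be stacked and converted before calling `index.add`. The sketch below is one possible way to do that. `stack_embeddings` is a hypothetical helper, and the input vectors are random stand-ins for the output of `model.encode_image(...)` after moving it to NumPy.

```python
import numpy as np

def stack_embeddings(vectors):
    """Stack per-image vectors into an L2-normalized, C-contiguous float32 matrix of shape (n, d)."""
    matrix = np.vstack([np.asarray(v, dtype="float32").reshape(1, -1) for v in vectors])
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    matrix = matrix / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return np.ascontiguousarray(matrix, dtype="float32")

# Stand-in vectors (hypothetical); each would come from one encoded image
rng = np.random.default_rng(0)
per_image = [rng.standard_normal(512) for _ in range(3)]

image_matrix = stack_embeddings(per_image)
print(image_matrix.shape, image_matrix.dtype)
```

The resulting `image_matrix` could then be passed to `index.add` in place of the simulated `image_embeddings`.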


### Query with Text

In [4]:
# Encode a text query
text = "a photo of a mountain landscape"
text_tokens = tokenizer([text]).to(device)

with torch.no_grad():
    text_embedding = model.encode_text(text_tokens)
    text_embedding /= text_embedding.norm(dim=-1, keepdim=True)

text_embedding_np = text_embedding.cpu().numpy().astype("float32")

In [5]:
# Search top 5 matches
k = 5
D, I = index.search(text_embedding_np, k)

print("Top indices:", I)
print("Scores:", D)

Top indices: [[20 62  6 29 31]]
Scores: [[0.13009575 0.11412328 0.10218724 0.10147056 0.09993938]]
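`IndexFlatIP` performs an exact, exhaustive search: it scores the query against every stored vector and returns the k largest inner products. The scores shown here are close to zero because the indexed vectors are random noise, not real image embeddings. The NumPy sketch below reproduces the same computation on small random stand-in data. The function name `brute_force_search` is hypothetical, and this is an illustration of the idea rather than FAISS's actual implementation.

```python
import numpy as np

def brute_force_search(queries, database, k):
    """Exact top-k inner-product search; returns (scores, indices), each of shape (n_queries, k)."""
    scores = queries @ database.T                # (n_queries, n_database)
    top = np.argsort(-scores, axis=1)[:, :k]     # indices of the k largest scores per query
    return np.take_along_axis(scores, top, axis=1), top

# Random stand-in data (hypothetical), normalized to unit length
rng = np.random.default_rng(0)
database = rng.standard_normal((100, 512)).astype("float32")
database /= np.linalg.norm(database, axis=1, keepdims=True)
query = rng.standard_normal((1, 512)).astype("float32")
query /= np.linalg.norm(query, axis=1, keepdims=True)

scores, indices = brute_force_search(query, database, k=5)
print(indices.shape, scores.shape)
```

As with `index.search`, both returned arrays have one row per query and one column per result, sorted from highest to lowest score.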


In [6]:
# Map retrieved indices back to image metadata.
# Placeholder paths: entry i must correspond to row i added to the FAISS index.
image_paths = [f"data/images/image_{i}.jpg" for i in range(num_images)]

top_results = [image_paths[i] for i in I[0]]
top_results

['data/images/image_20.jpg',
 'data/images/image_62.jpg',
 'data/images/image_6.jpg',
 'data/images/image_29.jpg',
 'data/images/image_31.jpg']
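When results are consumed downstream, it is often convenient to keep each path together with its score. FAISS pads the index array with -1 when fewer than k results are available, so a lookup should skip those entries instead of indexing the list with -1. A small sketch: `format_results` is a hypothetical helper, and the values are copied from the example output above, with one -1 entry added for illustration.

```python
def format_results(indices, scores, paths):
    """Pair each retrieved index with its path and score, skipping -1 padding entries."""
    results = []
    for idx, score in zip(indices, scores):
        if idx < 0:  # -1 means no result for this slot
            continue
        results.append({"path": paths[idx], "score": float(score)})
    return results

paths = [f"data/images/image_{i}.jpg" for i in range(100)]
# Values taken from the example search output, plus one -1 entry for illustration
indices = [20, 62, 6, -1]
scores = [0.13009575, 0.11412328, 0.10218724, 0.0]  # last score is a placeholder

for row in format_results(indices, scores, paths):
    print(row["path"], round(row["score"], 4))
```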

## Summary

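- Used OpenCLIP to encode an image and a text query into the same 512-dimensional embedding space.
- Built a FAISS `IndexFlatIP` index; because the vectors are L2-normalized, inner product equals cosine similarity.
- Ran exact top-K retrieval for a text query. The indexed embeddings were simulated, so the returned scores and indices are illustrative only.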