In [None]:
%pip install --quiet --upgrade torch torchvision pillow sentence-transformers faiss-cpu pandas

In [None]:
import urllib.request
import zipfile

url = "https://raw.githubusercontent.com/jsoma/dataharvest25-ai-images-video/main/cat.png"
urllib.request.urlretrieve(url, "cat.png")

url = "https://raw.githubusercontent.com/jsoma/dataharvest25-ai-images-video/main/tattoos.zip"
urllib.request.urlretrieve(url, "tattoos.zip")

with zipfile.ZipFile('tattoos.zip', 'r') as zip_ref:
    zip_ref.extractall()

# Image-based Semantic Search

## aka searching by vibes

When AI models "think" about a cat, they don't actually think about my adorable cats. They think about the *mathematical representation* of a cat.

When multimodal models represent the *word* cat in their tiny electric brains, it's similar to when they represent an *image* of a cat. We can use this to build a **text search engine of images**.

In [None]:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('clip-ViT-B-32')

Let's take this model and see what it thinks about when you tell it the word "cat".

In [None]:
embedding = model.encode('cat')
embedding[:200]

Ah, yes, sure, me too! And if we look at an image of a cat?

In [None]:
from PIL import Image

image = Image.open("cat.png").convert("RGB")
image

In [None]:
embedding = model.encode(image)
embedding[:200]

I'm not going to go through those numbers one by one, but the model puts text and images into the same space, so I guarantee the two lists have some similarities!
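
One common way to measure how alike two embeddings are is cosine similarity: values near 1.0 mean the vectors point the same way, values near 0 mean they have little in common. Here's a minimal sketch using short made-up vectors in place of the real embeddings. The `cosine_similarity` helper and all the numbers are just for illustration, not output from the model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 4-number stand-ins for real embeddings
text_vec = [0.2, -0.1, 0.7, 0.4]
image_vec = [0.25, -0.05, 0.6, 0.5]
unrelated_vec = [-0.6, 0.8, 0.1, -0.2]

print(cosine_similarity(text_vec, image_vec))      # close to 1
print(cosine_similarity(text_vec, unrelated_vec))  # much lower
```

With the real embeddings you'd pass in the two arrays returned by `model.encode` instead of the toy lists.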

## Processing images

So this becomes useful when you have a bunch of images and want to search through them. You create embeddings for your images, embed your search term, and then say "find me all of the images that are kind of similar to this search term."

We're going to use a collection of tattoo and non-tattoo images from a machine-learning project I did a few years ago. The process of building an embedding index works like this:

In [None]:
import glob
from tqdm import tqdm
from PIL import Image

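# Sort so the row order of the index is the same on every run
filenames = sorted(glob.glob("tattoos/*.jp*"))

# One embedding per image; embeddings[i] belongs to filenames[i]
embeddings = []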
for path in tqdm(filenames):
    image = Image.open(path).convert("RGB")
    embedding = model.encode(image)
    embeddings.append(embedding)
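
Encoding one image at a time is fine for a small folder. For bigger collections, a common pattern is handing the model several images per call. Here's a minimal sketch of the chunking part, with a hypothetical `chunked` helper and made-up paths. The commented lines show how a batch might be fed to `model.encode`, which I'm assuming accepts a list of images the same way it accepts a single one.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Made-up paths, just to show the grouping
example_paths = [f"tattoos/img_{n}.jpg" for n in range(7)]

for batch in chunked(example_paths, 3):
    print(batch)

# With the real model, each batch could be encoded in one call (assumption):
#   images = [Image.open(p).convert("RGB") for p in batch]
#   embeddings.extend(model.encode(images))
```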


We're using [FAISS](https://github.com/facebookresearch/faiss), a library for fast similarity search over vectors, which is absolutely overkill here, but oh well.

In [None]:
import faiss
import numpy as np

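# Stack the per-image vectors into one (n_images, n_dimensions) float32 matrix
embedding_matrix = np.vstack(embeddings).astype('float32')
# Scale every row to length 1 (in place) so inner product equals cosine similarity
faiss.normalize_L2(embedding_matrix)

# Exact (brute-force) inner-product index
index = faiss.IndexFlatIP(embedding_matrix.shape[1])
index.add(embedding_matrix)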
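
To see why FAISS is overkill, here's roughly what an exact inner-product search boils down to, sketched in plain NumPy on a toy matrix. `top_k_by_cosine` and the toy numbers are made up for illustration. This isn't FAISS's actual implementation, just the same math at small scale.

```python
import numpy as np

def top_k_by_cosine(matrix, query, k):
    """Return (scores, row_indices) for the k rows most similar to query."""
    # Scale rows and query to length 1 so the dot product is cosine similarity
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = matrix @ query
    # Highest score first, keep the first k
    order = np.argsort(-scores)[:k]
    return scores[order], order

# Toy "embeddings": 4 items, 3 dimensions each
toy_matrix = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
], dtype="float32")
toy_query = np.array([1.0, 0.2, 0.0], dtype="float32")

toy_scores, toy_indices = top_k_by_cosine(toy_matrix, toy_query, k=2)
print(toy_indices, toy_scores)
```

A dedicated index starts to pay off once the collection is too big for this kind of compare-against-everything approach to feel fast.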

## Search by text

To find images that match a text query, we just encode the text and say "find me things that are similar!"

In [None]:
query = "colorful bird"
match_count = 10

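# FAISS expects a 2-D float32 array: one row per query
query_embedding = model.encode(query, convert_to_numpy=True).astype('float32').reshape(1, -1)
faiss.normalize_L2(query_embedding)  # in place, same scaling as the indexed images
D, I = index.search(query_embedding, match_count)  # D = scores, I = row numbers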

matches = [filenames[i] for i in I[0]]
scores = D[0]

for filename, score in zip(matches, scores):
    print(f"{filename}: {score}")

In [None]:
from IPython.display import HTML
import pandas as pd

df = pd.DataFrame({
    'scores': scores,
    'filename': matches,
    'query': query
})
df['preview'] = df['filename'].apply(lambda filename: f'<img src="{filename}" width="100"/>')


HTML(df.to_html(escape=False))
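
The search always hands back `match_count` results, whether or not anything in the collection really matches the query. One option is to drop results below a score cutoff. Here's a minimal sketch with made-up filenames and scores. `keep_above` and the 0.25 cutoff are hypothetical, and a sensible cutoff depends on your images and your model.

```python
def keep_above(filenames, scores, min_score):
    """Pair filenames with scores, keeping pairs scoring at least min_score."""
    return [(name, score) for name, score in zip(filenames, scores) if score >= min_score]

example_matches = ["tattoos/a.jpg", "tattoos/b.jpg", "tattoos/c.jpg"]  # made-up names
example_scores = [0.31, 0.27, 0.19]                                    # made-up scores

for name, score in keep_above(example_matches, example_scores, min_score=0.25):
    print(f"{name}: {score:.2f}")
```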

## Search by image

To find images that are similar to another image (where 'similar' means whatever the model decides, not something we get to control), we just encode the image and say "find me things that are similar!"

In [None]:
image = Image.open('tattoo.png').convert("RGB")
image

In [None]:
match_count = 10

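# Same steps as the text search, but the query is an image
query_embedding = model.encode(image, convert_to_numpy=True).astype('float32').reshape(1, -1)
faiss.normalize_L2(query_embedding)  # in place, same scaling as the indexed images
D, I = index.search(query_embedding, match_count)  # D = scores, I = row numbers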

matches = [filenames[i] for i in I[0]]
scores = D[0]

for filename, score in zip(matches, scores):
    print(f"{filename}: {score}")

In [None]:
from IPython.display import HTML
import pandas as pd

df = pd.DataFrame({
    'scores': scores,
    'filename': matches,
})
df['preview'] = df['filename'].apply(lambda filename: f'<img src="{filename}" width="100"/>')


HTML(df.to_html(escape=False))

Why are they similar? NO IDEA. *Because the model thinks so.*