# Similarity search with images

txtai, as the name implies, works with text and AI. But that doesn't mean it can't work with other types of content. For example, an image can be described with words, and that description can be compared to a query or to other documents. This notebook shows how images and text can be embedded into the same space to support similarity search.

A future version of txtai will add support for image captioning, which will enable images, audio, documents and text to all live in the same embeddings index. The approach in this notebook keeps images in a separate embeddings index. Stay tuned for more on image captioning!

# Install dependencies

Install `txtai` and all dependencies. Since this notebook uses sentence-transformers directly, we need to install the similarity extras package.

In [5]:
%%capture
!pip install torchvision ipyplot git+https://github.com/neuml/txtai#egg=txtai[similarity]

# Get test data
!wget -N https://github.com/neuml/txtai/releases/download/v2.0.0/tests.tar.gz
!tar -xvzf tests.tar.gz

# Create an Embeddings model

[sentence-transformers](https://github.com/UKPLab/sentence-transformers/tree/master/examples/applications/image-search) recently added support for the [OpenAI CLIP model](https://github.com/openai/CLIP). This model embeds text and images into the same space, enabling image similarity search. txtai can directly utilize these models through sentence-transformers. Check out the sentence-transformers link above for additional examples on how to use this model.
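To make the idea of a shared embedding space concrete, here is a minimal sketch in plain Python. It is not how CLIP or txtai work internally: the 3-dimensional vectors, the file names and the `cosine` and `search` helpers are all made up for illustration. It assumes some model has already turned both the images and the text query into vectors, and only shows the ranking step.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vector, index, limit=1):
    """Rank indexed items by similarity to the query vector, best first."""
    scored = [(uid, cosine(query_vector, vector)) for uid, vector in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]

# Hypothetical "image" vectors; a real model produces hundreds of dimensions
image_index = {
    "office.jpg": [0.9, 0.1, 0.0],
    "yard.jpg": [0.1, 0.9, 0.1],
    "stars.jpg": [0.0, 0.1, 0.9],
}

# Hypothetical vector for a text query such as "The universe is massive"
query_vector = [0.1, 0.0, 0.8]

# Because text and images share one space, a text vector can rank image vectors
results = search(query_vector, image_index, limit=1)
print(results)
```

With these toy values, the query vector sits closest to the `stars.jpg` vector, so that image is returned first.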

This section builds an embeddings index over a series of images.



In [6]:
%%capture

import glob

from PIL import Image

from txtai.embeddings import Embeddings

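def images():
  # Yield (id, data, tags) tuples; the file path doubles as the document id
  for path in glob.glob('txtai/*.jpg'):
    yield (path, Image.open(path), None)

# Index images with the CLIP model loaded through sentence-transformers
embeddings = Embeddings({"method": "sentence-transformers", "path": "sentence-transformers/clip-ViT-B-32"})
embeddings.index(images())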
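The index above only picks up files ending in `jpg`. If a directory mixes formats, a small helper along these lines could gather the paths first. This is a sketch, not part of txtai: `image_paths` and its default extension list are hypothetical names chosen for this example, and it assumes PIL can open every format listed.

```python
import glob
import os

def image_paths(directory, extensions=("jpg", "jpeg", "png")):
    """Return sorted paths of files in directory with a matching extension (case-insensitive)."""
    paths = []
    for path in glob.glob(os.path.join(directory, "*")):
        # Compare the extension without its leading dot, ignoring case
        extension = os.path.splitext(path)[1].lstrip(".").lower()
        if os.path.isfile(path) and extension in extensions:
            paths.append(path)

    # Sort so that the indexing order is the same on every run
    return sorted(paths)

# An empty list is returned if the directory is missing or has no matching files
print(image_paths("txtai"))
```

Each returned path could then be fed to the generator above, for example `yield (path, Image.open(path), None)`.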

# Search the index

Now that we have an index, let's search it! This section runs a list of queries against the index and shows the top result for each query. Have to say this is pretty 🔥🔥🔥

In [7]:
import ipyplot
from PIL import Image

images, labels = [], []
for query in ["Walking into the office", "Saturday cleaning the yard", "Working on the latest analysis", "Working on my homework", "Watching an exciting race",
              "The universe is massive", "Time lapse video of traffic", "Relaxing Thanksgiving day"]:
  # search returns a list of (id, score) tuples; ids here are image paths
  path, _ = embeddings.search(query, 1)[0]

  images.append(Image.open(path))
  labels.append(query)

ipyplot.plot_images(images, labels, img_width=425, force_b64=True)
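The loop above always takes the first result and ignores its score. A hedged sketch of a more defensive variant follows. It assumes search results are a best-first list of `(id, score)` tuples, as unpacked in the loop above. `best_match` and the `0.25` threshold are hypothetical, and a sensible cutoff would have to be found by experiment.

```python
def best_match(results, threshold=0.0):
    """Return the first (id, score) pair meeting the threshold, or None if there is none."""
    for uid, score in results:
        if score >= threshold:
            return uid, score
    return None

# Hypothetical results shaped like the output of a top-2 search
results = [("txtai/books.jpg", 0.31), ("txtai/stars.jpg", 0.12)]

match = best_match(results, threshold=0.25)
if match:
    uid, score = match
    print("%s (%.2f)" % (uid, score))
else:
    print("No result above threshold")
```

Returning `None` instead of indexing `[0]` directly also avoids an `IndexError` when a search comes back empty.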

# Multilingual Support

sentence-transformers also has a [model](https://huggingface.co/sentence-transformers/clip-ViT-B-32-multilingual-v1) that supports 50+ languages. This enables running queries in those languages against an image index.

Note that this model only supports text, so images must first be indexed with the model used above.

In [8]:
import ipyplot
from PIL import Image

from txtai.pipeline import Translation

# Update model at query time to support multilingual queries
embeddings.config["path"] = "sentence-transformers/clip-ViT-B-32-multilingual-v1"
embeddings.model = embeddings.loadVectors()

# Translate queries to German
queries = ["Walking into the office", "Saturday cleaning the yard", "Working on the latest analysis", "Working on my homework", "Watching an exciting race",
           "The universe is massive", "Time lapse video of traffic", "Relaxing Thanksgiving day"]
translate = Translation()
translated = translate(queries, "de")

images, labels = [], []
for x, query in enumerate(translated):
  # Search with the translated query; ids here are image paths
  path, _ = embeddings.search(query, 1)[0]

  images.append(Image.open(path))
  labels.append("%s<br/>(%s)" % (query, queries[x]))

ipyplot.plot_images(images, labels, img_width=425, force_b64=True)