# Image search tutorial

In this tutorial, we'll walk through how to use Lexy to create a multimodal search application. We'll use the [CLIP](https://openai.com/blog/clip/) model from OpenAI to create embeddings for images, and then use those embeddings to find matching images for a given text query, or vice versa.

In [None]:
from lexy_py import LexyClient

lexy = LexyClient()

## Create collection

Let's first create a collection to store our images. We'll name it `images_tutorial`.

In [None]:
# create a new collection
images_tutorial = lexy.create_collection('images_tutorial')
images_tutorial

## Create index and binding

### Define index

First, we'll define an index to store the image embeddings. The `model` value `*.embeddings.clip` is a wildcard pattern: it says the `embedding` field holds CLIP embeddings and accepts any transformer whose name matches the pattern, including `image.embeddings.clip` and `text.embeddings.clip`.

In [None]:
# define index fields
index_fields = {
    "embedding": {"type": "embedding", "extras": {"dims": 512, "model": "*.embeddings.clip"}},
}

# create index
idx = lexy.create_index(
    index_id='image_tutorial_index', 
    description='Index for images tutorial',
    index_fields=index_fields
)
idx
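
One way to picture the wildcard in the `model` field is as shell-style glob matching. The sketch below uses Python's standard `fnmatch` module purely as an illustration; whether Lexy resolves the pattern this way internally is an assumption, and `text.embeddings.minilm` is a hypothetical transformer ID included only as a non-matching example.

```python
from fnmatch import fnmatchcase

pattern = "*.embeddings.clip"  # the value used for "model" in index_fields

candidates = [
    "image.embeddings.clip",
    "text.embeddings.clip",
    "text.embeddings.minilm",  # hypothetical non-CLIP transformer ID
]

# Map each candidate transformer ID to whether it matches the pattern.
matches = {name: fnmatchcase(name, pattern) for name in candidates}
print(matches)
```

Under this reading, both CLIP transformers can write to or query the same `embedding` field, while a transformer from a different model family cannot.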

We'll use the CLIP image embeddings transformer, which is available on [Hugging Face](https://huggingface.co/openai/clip-vit-base-patch32) and uses the [CLIP](https://openai.com/blog/clip/) model from OpenAI to create embeddings for images.

CLIP is a transformer model trained on a large dataset of image-text pairs. It learns to map images and text into a shared embedding space, where the embeddings of matching images and text lie close together. That shared space is what lets us search images with a text query, or vice versa.
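
To make the idea of a shared embedding space concrete, here is a minimal sketch that ranks images against a text query by cosine similarity. The vectors are made-up 3-dimensional stand-ins, not real CLIP outputs (the index in this tutorial uses 512 dimensions), and the file names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for image embeddings.
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "city.jpg": [0.1, 0.9, 0.2],
}
# Pretend this vector encodes the text "a photo of a dog".
text_embedding = [0.8, 0.2, 0.1]

# Rank image names from most to least similar to the text query.
ranked = sorted(
    image_embeddings,
    key=lambda name: cosine_similarity(text_embedding, image_embeddings[name]),
    reverse=True,
)
print(ranked)
```

Because text and images live in the same space, the same ranking step works whether the query vector comes from a caption or from another image.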

In [None]:
lexy.transformers

### Create binding

We'll create a binding that will process images added to our `images_tutorial` collection using the CLIP image embeddings transformer, and store the results in `image_tutorial_index`.

In [None]:
binding = lexy.create_binding(collection_id='images_tutorial',
                              transformer_id='image.embeddings.clip',
                              index_id='image_tutorial_index')
binding
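
Conceptually, a binding ties a collection to a transformer and an index. The sketch below is a toy, in-memory illustration of that idea only. `ToyIndex`, `ToyBinding`, and `fake_embed` are hypothetical names that are not part of `lexy_py`, and Lexy's actual processing may work quite differently.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ToyIndex:
    """Hypothetical in-memory stand-in for an index; not a Lexy class."""
    records: list = field(default_factory=list)

@dataclass
class ToyBinding:
    """Hypothetical: links one transformer function to one target index."""
    transformer: Callable[[str], list]
    index: ToyIndex

    def process(self, document_id, content):
        # Run the transformer on the new document and store the result.
        self.index.records.append(
            {"document_id": document_id, "embedding": self.transformer(content)}
        )

def fake_embed(content):
    # Placeholder "embedding": not CLIP, just two numbers derived from the text.
    return [float(len(content)), float(content.count(" "))]

toy_index = ToyIndex()
toy_binding = ToyBinding(transformer=fake_embed, index=toy_index)

# Each document added to the "collection" is passed through the binding.
for doc_id, content in [("doc-1", "two dogs"), ("doc-2", "city at night")]:
    toy_binding.process(doc_id, content)

print(toy_index.records)
```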

## Upload images to the collection

Let's upload some images from the [image-text-demo dataset](https://huggingface.co/datasets/jamescalam/image-text-demo) to the collection. The dataset is hosted on Hugging Face, and loading it requires the `datasets` package.

In [None]:
! pip install datasets

In [None]:
# load the test data from Hugging Face (requires the `datasets` package)
from datasets import load_dataset

data = load_dataset("jamescalam/image-text-demo", split="train")

In [None]:
len(data)

In [None]:
# add documents to the collection
for i, row in enumerate(data, start=1):
    print(i, row['text'])
    lexy.upload_documents(files=row['image'], 
                          filenames=row['text'] + '.jpg', 
                          collection_id='images_tutorial')
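
The loop above builds each filename directly from the caption text. If captions might contain characters that are awkward in filenames (slashes, punctuation), one option is to normalize them first. The helper below, `caption_to_filename`, is a hypothetical sketch and not part of `lexy_py`; whether Lexy needs or expects sanitized filenames is an assumption.

```python
import re

def caption_to_filename(caption, extension=".jpg", max_length=80):
    """Turn free-form caption text into a conservative filename (hypothetical helper)."""
    # Replace every run of characters other than letters, digits, '-' or '_' with '_'.
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", caption).strip("_")
    # Cap the length, and fall back to a fixed name if nothing usable is left.
    stem = stem[:max_length] or "image"
    return stem + extension

print(caption_to_filename("two dogs running / playing!"))
```

With this helper, the upload call could pass `filenames=caption_to_filename(row['text'])` instead of `row['text'] + '.jpg'`.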

In [None]:
# check the collection
images_tutorial.list_documents()

## Query index

Let's first define some helper functions to display our image results.

In [None]:
import io

import httpx
from IPython.display import display, HTML
from PIL import Image

def image_from_url(url):
    # download the image and open it from the raw response bytes
    response = httpx.get(url)
    response.raise_for_status()
    return Image.open(io.BytesIO(response.content))

def display_results_html(records):
    html_content = ""
    for r in records:
        d = r['document']
        thumbnail_url = d.thumbnail_url
        fname = d.meta.get('filename')
        score = f"score: {r['distance']:.4f}"
        # Creating a row for each result with image on the left and text on the right
        html_content += f"""
        <div style='display: flex; align-items: center; margin-bottom: 20px; margin-top: 20px;'>
            <img src='{thumbnail_url}' style='width: auto; height: auto; margin-right: 20px;'/>
            <div>
                <p>{fname}</p>
                <p>{score}</p>
            </div>
        </div>
        """
    # Display all results as HTML
    display(HTML(html_content))
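
Outside a notebook, where `IPython.display` is not useful, a plain-text variant of the same helper can be handy. The sketch below reads the same fields as `display_results_html` (`r['document']`, `d.meta`, `r['distance']`). The function name `format_results_text` is hypothetical, and the stand-in records use made-up values rather than real query output.

```python
from types import SimpleNamespace

def format_results_text(records):
    """Return one 'filename<TAB>distance' line per result, in the order given."""
    lines = []
    for r in records:
        d = r["document"]
        fname = d.meta.get("filename")
        lines.append(f"{fname}\tdistance: {r['distance']:.4f}")
    return "\n".join(lines)

# Stand-in records with made-up values, shaped like the ones the HTML helper reads.
fake_records = [
    {"document": SimpleNamespace(meta={"filename": "two dogs.jpg"}), "distance": 0.25},
    {"document": SimpleNamespace(meta={"filename": "city at night.jpg"}), "distance": 0.5},
]
text_report = format_results_text(fake_records)
print(text_report)
```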


### Query by text

We can query our index by text to find matching images.

In [None]:
results = idx.query(query_text='best friends', return_document=True)
display_results_html(results)

In [None]:
results = idx.query(query_text='gotham city', return_document=True)
display_results_html(results)

### Query by image

We can also query our index by image to find matching images.

In [None]:
img = image_from_url('https://getlexy.com/assets/images/dalle-agi.jpeg')
img

In [None]:
results = idx.query(query_image=img, return_document=True)
display_results_html(results)

In [None]:
img = image_from_url('https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Night_in_the_Greater_Tokyo_Area_ISS054.jpg/2560px-Night_in_the_Greater_Tokyo_Area_ISS054.jpg')
img

In [None]:
results = idx.query(query_image=img, return_document=True)
display_results_html(results)

In [None]:
img = image_from_url('https://upload.wikimedia.org/wikipedia/commons/e/ed/Shanghai_skyline_2018%28cropped%29.jpg')
img

In [None]:
results = idx.query(query_image=img, return_document=True)
display_results_html(results)

## Clean up

To remove the resources created in this tutorial, uncomment and run the cells below.

In [None]:
# lexy.delete_binding(binding_id=binding.id)

In [None]:
# lexy.delete_index('image_tutorial_index', drop_table=True)

In [None]:
# lexy.delete_collection('images_tutorial', delete_documents=True)