# Multimodal search with CLIP

In this notebook we showcase SuperDuperDB's support for searching with multiple types of data over
the same `VectorIndex`. This follows naturally from the fact that SuperDuperDB lets
users and developers add arbitrary models and, provided those models output vectors, use
them at search/inference time to vectorize diverse queries.
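To make the idea concrete, here is a minimal plain-Python sketch. It assumes two hypothetical encoders (`encode_text`, `encode_image`) whose toy lookup tables stand in for real models trained to map into the same vector space. The index is built with the image encoder only, yet it can be queried with either one. This illustrates the principle and is not SuperDuperDB's implementation.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical encoders: toy tables standing in for two real models that
# were trained to map images and text into the SAME 3-dimensional space.
def encode_image(name: str) -> Vector:
    table: Dict[str, Vector] = {
        "photo_of_mushroom": [0.9, 0.1, 0.0],
        "photo_of_cat": [0.1, 0.9, 0.1],
        "photo_of_car": [0.0, 0.2, 0.95],
    }
    return table[name]

def encode_text(text: str) -> Vector:
    table: Dict[str, Vector] = {
        "mushroom": [0.8, 0.2, 0.1],
        "cat": [0.2, 0.85, 0.0],
        "car": [0.1, 0.1, 0.9],
    }
    return table[text]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# The index is populated using the image encoder only.
index = {name: encode_image(name) for name in ["photo_of_mushroom", "photo_of_cat", "photo_of_car"]}

def search(query_vector: Vector) -> str:
    # Return the stored item whose vector is most similar to the query vector.
    return max(index, key=lambda name: cosine(index[name], query_vector))

print(search(encode_text("mushroom")))    # photo_of_mushroom (text query)
print(search(encode_image("photo_of_cat")))  # photo_of_cat (image query)
```

The only requirement on a query-time model is that its vectors live in the same space as the indexed ones; which modality it consumes does not matter to the index.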

To this end, we'll be using the [CLIP multimodal architecture](https://openai.com/research/clip).

In [None]:
!pip install git+https://github.com/openai/CLIP.git
!pip install pinnacledb
!pip install datasets

So let's start. 

SuperDuperDB supports MongoDB as a data backend. Correspondingly, we'll import the Python MongoDB client `pymongo`
and "wrap" our database to convert it to a SuperDuper `Datalayer`:

In [1]:
import os
import clip
import pymongo
from pinnacledb.ext.torch.model import TorchModel
from pinnacledb.ext.pillow.image import pil_image as i
from pinnacledb.db.mongodb.query import Collection
from pinnacledb.container.document import Document as D
from IPython.display import display


# Uncomment one of the following lines to use a bespoke MongoDB deployment.
# For testing, the default connection is to mongomock.

mongodb_uri = os.getenv("MONGODB_URI","mongomock://test")
# mongodb_uri = "mongodb://localhost:27017"
# mongodb_uri = "mongodb://pinnacle:pinnacle@mongodb:27017/documents"
# mongodb_uri = "mongodb://<user>:<pass>@<mongo_cluster>/<database>"
# mongodb_uri = "mongodb+srv://<username>:<password>@<atlas_cluster>/<database>"

# Super-Duper your Database!
from pinnacledb import pinnacle
db = pinnacle(mongodb_uri)

collection = Collection(name='tiny-imagenet')



To make this notebook easy to execute and play with, we'll use a sub-sample of the [Tiny-Imagenet
dataset](https://paperswithcode.com/dataset/tiny-imagenet).

Everything we do here generalizes to much larger datasets and higher-resolution images without
further changes. For such use cases, however, it's advisable to use a machine with a GPU; otherwise
there will be significant thumb-twiddling while the images are vectorized.

To get the images into the database, we use the `Encoder`-`Document` framework. This allows
us to save Python class instances as blobs in the `Datalayer` and retrieve them as Python objects,
which makes it far easier to integrate Python AI models with the datalayer.

To this end, SuperDuperDB ships with pre-configured support for `PIL.Image` instances. You can also
create your own encoders.
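The round trip behind this framework can be sketched without SuperDuperDB at all. The `ToyEncoder`, `wrap`, `unwrap` and `BoundingBox` names below are hypothetical, and `pickle` is only a stand-in serializer (an image encoder would more plausibly store PNG or JPEG bytes). The point is that the database only ever sees bytes tagged with an encoder name, while Python code gets objects back.

```python
import pickle
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class ToyEncoder:
    """Hypothetical encoder: a name plus functions to and from bytes."""
    identifier: str
    encode: Callable[[Any], bytes]
    decode: Callable[[bytes], Any]

# Stand-in serializer; only unpickle data you trust.
pickle_encoder = ToyEncoder("pickle", pickle.dumps, pickle.loads)
REGISTRY: Dict[str, ToyEncoder] = {pickle_encoder.identifier: pickle_encoder}

def wrap(value: Any, encoder: ToyEncoder) -> dict:
    # What is stored in the database: raw bytes tagged with the encoder's name.
    return {"_encoder": encoder.identifier, "_blob": encoder.encode(value)}

def unwrap(stored: Any) -> Any:
    # On read, look the encoder up by name and rebuild the Python object.
    if isinstance(stored, dict) and "_encoder" in stored:
        return REGISTRY[stored["_encoder"]].decode(stored["_blob"])
    return stored

@dataclass
class BoundingBox:
    x: int
    y: int
    w: int
    h: int

stored = {"label": "mushroom", "box": wrap(BoundingBox(1, 2, 30, 40), pickle_encoder)}
restored = {key: unwrap(value) for key, value in stored.items()}
print(restored["box"])  # BoundingBox(x=1, y=2, w=30, h=40)
```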

In [None]:
from pinnacledb.container.document import Document as D
from pinnacledb.ext.pillow.image import pil_image as i
from datasets import load_dataset
import random

dataset = load_dataset("zh-plus/tiny-imagenet")['valid']
# Sample 1000 row indices first, so only the selected images are decoded and wrapped
indices = random.sample(range(len(dataset)), 1000)
dataset = [D({'image': i(dataset[j]['image'])}) for j in indices]


The wrapped Python dictionaries can be inserted directly into the `Datalayer`:

In [None]:
db.execute(collection.insert_many(dataset, encoders=(i,)))

We can verify that the images are correctly stored:

In [None]:
x = db.execute(collection.find_one())['image'].x
x

We can now wrap the CLIP model to ready it for multimodal search. This involves two components:

- text-encoding
- visual-encoding

Once both parts are set up, we will be able to search for matching items with both images
and text:
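For intuition about why the two encoders can share one index: CLIP was trained so that an image and a matching caption end up close together after L2 normalization. The sketch below shows CLIP-style scoring on toy vectors. `clip_probs` is a hypothetical helper, and the default `logit_scale=100.0` is an assumption based on CLIP capping its learned temperature at 100.

```python
import math
from typing import List, Sequence

def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def clip_probs(image_vec: Sequence[float], text_vecs: Sequence[Sequence[float]],
               logit_scale: float = 100.0) -> List[float]:
    img = l2_normalize(image_vec)
    # Cosine similarity = dot product of unit vectors, scaled by the temperature
    logits = [logit_scale * sum(a * b for a, b in zip(img, l2_normalize(t))) for t in text_vecs]
    # Numerically stable softmax over the candidate captions
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy embeddings: one image and two candidate captions
image = [0.9, 0.1, 0.0]
captions = {"a mushroom": [0.8, 0.2, 0.1], "a cat": [0.2, 0.85, 0.0]}
probs = clip_probs(image, list(captions.values()))
best = max(zip(captions, probs), key=lambda kv: kv[1])[0]
print(best)  # a mushroom
```

A vector index only needs the cosine ranking; the softmax step is what turns those scores into zero-shot class probabilities.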

In [None]:
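from pinnacledb.ext.torch.tensor import tensor
import torch

# CLIP RN50 produces 1024-dimensional embeddings for both text and images
t = tensor(torch.float, shape=(1024,))

model, preprocess = clip.load("RN50", device='cpu')

text_model = TorchModel(
    identifier='clip_text',
    object=model,
    # clip.tokenize returns a batch of shape (1, 77); [0] keeps the single row
    preprocess=lambda x: clip.tokenize(x)[0],
    forward_method='encode_text',
    encoder=t
)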
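The `lambda x: clip.tokenize(x)[0]` above is needed because `clip.tokenize` always returns a batch, even for a single string. The sketch below mimics that contract with a hypothetical `tokenize_batch` and a hypothetical `predict_one` wrapper. It assumes that `predict(..., one=True)` roughly preprocesses one item, re-batches it, runs the forward method and strips the batch dimension; it is not `TorchModel`'s actual code.

```python
from typing import Callable, List, Sequence

def tokenize_batch(texts: Sequence[str], context_length: int = 8) -> List[List[int]]:
    # Hypothetical tokenizer mimicking only the shape contract of clip.tokenize:
    # one fixed-length, zero-padded row per input string.
    rows = []
    for text in texts:
        ids = [ord(c) % 100 for c in text][:context_length]
        rows.append(ids + [0] * (context_length - len(ids)))
    return rows

def predict_one(item, preprocess_fn: Callable, forward_fn: Callable):
    # Hypothetical single-item predict: preprocess, add a batch dimension,
    # run the batched forward pass, then drop the batch dimension again.
    batch = [preprocess_fn(item)]
    return forward_fn(batch)[0]

def text_preprocess(text: str) -> List[int]:
    # Analogue of `lambda x: clip.tokenize(x)[0]`: take row 0 of a batch of one
    return tokenize_batch([text])[0]

def toy_forward(batch: List[List[int]]) -> List[List[float]]:
    # Stand-in for encode_text: one 1-dimensional "embedding" per row
    return [[float(sum(row))] for row in batch]

print(text_preprocess("hi"))                            # [4, 5, 0, 0, 0, 0, 0, 0]
print(predict_one("hi", text_preprocess, toy_forward))  # [9.0]
```

Dropping the batch dimension in `preprocess` keeps each item's shape uniform, so the wrapper can stack any number of items into a batch itself.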

Let's verify this works:

In [None]:
text_model.predict('this is a test', one=True)

We follow a similar procedure for the visual part, which takes `PIL.Image` instances as inputs.

In [None]:
visual_model = TorchModel(
    identifier='clip_image',
    preprocess=preprocess,
    object=model.visual,
    encoder=t,
)

In [None]:
visual_model.predict(x, one=True)
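The `preprocess` returned by `clip.load` is a torchvision transform that, for RN50, resizes the shorter side to 224 pixels, center-crops to 224×224, scales pixels to [0, 1] and normalizes each channel. The sketch below reproduces the geometry and the normalization arithmetic in plain Python. The helper names are hypothetical and the rounding is assumed to follow torchvision's `Resize`/`CenterCrop`; the mean/std constants are the ones CLIP's preprocessing uses. Tiny ImageNet images are 64×64, so they are upsampled before encoding.

```python
from typing import Sequence, Tuple

# Per-channel mean/std used by CLIP's image preprocessing (RGB order)
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

def resized_size(width: int, height: int, size: int = 224) -> Tuple[int, int]:
    # Resize so the shorter side equals `size`, keeping the aspect ratio
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size

def center_crop_box(width: int, height: int, size: int = 224) -> Tuple[int, int, int, int]:
    # (left, top, right, bottom) of a size x size window centred in the image
    left = int(round((width - size) / 2.0))
    top = int(round((height - size) / 2.0))
    return left, top, left + size, top + size

def normalize_pixel(rgb: Sequence[float]) -> Tuple[float, ...]:
    # Scale 0-255 to [0, 1] (as ToTensor does), then standardize each channel
    return tuple((c / 255.0 - m) / s for c, m, s in zip(rgb, CLIP_MEAN, CLIP_STD))

print(resized_size(64, 64))                      # (224, 224)
print(center_crop_box(*resized_size(320, 240)))  # (37, 0, 261, 224)
```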

Now let's create the index for searching by vector. We register both models with the index at the same time,
but specify that the `visual_model` is responsible for creating the vectors in the database
(`indexing_listener`). The `compatible_listener` specifies an alternative model that can be used to vectorize
queries against those vectors. Because the two models expect different types of input but output vectors in
the same space, this gives us multimodal search with no extra work.
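The division of labour between the two listeners can be sketched as a toy index. `ToyMultimodalIndex` and the toy models are hypothetical, and this is not how SuperDuperDB implements `VectorIndex`. It only assumes what the text describes: stored documents are vectorized by the indexing model, and a query is vectorized by whichever registered model matches the key present in the query document (here `'text'`).

```python
from typing import Callable, Dict, List

Vector = List[float]
Model = Callable[[object], Vector]

class ToyMultimodalIndex:
    """Hypothetical sketch, not SuperDuperDB's VectorIndex implementation."""

    def __init__(self, indexing_key: str, indexing_model: Model):
        self.indexing_key = indexing_key
        self.models: Dict[str, Model] = {indexing_key: indexing_model}
        self.vectors: Dict[str, Vector] = {}

    def add_compatible(self, key: str, model: Model) -> None:
        # Compatible models only vectorize queries; they never write vectors
        self.models[key] = model

    def index_document(self, doc_id: str, doc: dict) -> None:
        # Stored documents are always vectorized by the indexing model
        self.vectors[doc_id] = self.models[self.indexing_key](doc[self.indexing_key])

    def query_vector(self, query_doc: dict) -> Vector:
        # Use the model registered for whichever key the query contains
        for key, model in self.models.items():
            if key in query_doc:
                return model(query_doc[key])
        raise KeyError(f"no model registered for keys {sorted(query_doc)}")

def toy_image_model(path: object) -> Vector:
    return [1.0, 0.0] if path == "mushroom.png" else [0.0, 1.0]

def toy_text_model(text: object) -> Vector:
    return [0.9, 0.1] if "mushroom" in str(text) else [0.1, 0.9]

idx = ToyMultimodalIndex("image", toy_image_model)
idx.add_compatible("text", toy_text_model)
idx.index_document("a", {"image": "mushroom.png"})
idx.index_document("b", {"image": "cat.png"})
print(idx.vectors)                             # {'a': [1.0, 0.0], 'b': [0.0, 1.0]}
print(idx.query_vector({"text": "mushroom"}))  # [0.9, 0.1]
```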

In [None]:
from pinnacledb.container.vector_index import VectorIndex
from pinnacledb.container.listener import Listener

db.add(
    VectorIndex(
        'my-index',
        indexing_listener=Listener(
            model=visual_model,
            key='image',
            select=collection.find(),
        ),
        compatible_listener=Listener(
            model=text_model,
            key='text',
            active=False,
        )
    )
)

We can now demonstrate searching by text for images:

In [None]:
out = db.execute(
    collection.like(D({'text': 'mushroom'}), vector_index='my-index', n=3).find({})
)
for r in out:
    display(r['image'].x)
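The query above combines a vector step (`like(..., n=3)`) with an ordinary `find({})`. A brute-force sketch of that pattern could look like the following, assuming the vector step returns the `n` most similar items and the filter is then applied to those candidates; whether SuperDuperDB filters before or after the vector search is not stated here. `like_then_find` and the toy data are hypothetical.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def like_then_find(
    query_vec: List[float],
    vectors: Dict[str, List[float]],
    docs: Dict[str, dict],
    n: int = 3,
    where: Optional[Callable[[dict], bool]] = None,
) -> List[str]:
    # Vector step: brute-force ids of the n most similar stored vectors
    top_ids = heapq.nlargest(n, vectors, key=lambda doc_id: cosine(vectors[doc_id], query_vec))
    # Filter step: keep candidates matching the ordinary predicate, in similarity order
    return [doc_id for doc_id in top_ids if where is None or where(docs[doc_id])]

vectors = {"m1": [1.0, 0.0], "m2": [0.9, 0.2], "c1": [0.0, 1.0], "m3": [0.8, 0.3]}
docs = {
    "m1": {"split": "valid"}, "m2": {"split": "train"},
    "c1": {"split": "valid"}, "m3": {"split": "valid"},
}
query = [1.0, 0.1]
print(like_then_find(query, vectors, docs, n=3))  # ['m1', 'm2', 'm3']
print(like_then_find(query, vectors, docs, n=3, where=lambda d: d["split"] == "valid"))  # ['m1', 'm3']
```

Filtering after the vector step keeps the search cheap but can return fewer than `n` results, as in the second call; filtering first guarantees `n` hits at the cost of searching only the matching subset.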