<!-- TABS -->
# Multimodal vector search - Image

In [None]:
# Wrap the first 100 images from `data` as documents with an 'img' field
datas = [{'img': d} for d in data[:100]]

## Build multimodal embedding models

We define the output data type of the models as a 1024-dimensional vector, so that the embeddings they produce can be stored and indexed.

In [None]:
# <tab: MongoDB>
from pinnacledb.components.vector_index import vector
output_datatpye = vector(shape=(1024,))

In [None]:
# <tab: SQL>
from pinnacledb.components.vector_index import sqlvector
output_datatpye = sqlvector(shape=(1024,))

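Then we define two models based on OpenAI's CLIP (`RN50`): one for text embedding and one for image embedding. CLIP maps text and images into the same embedding space, so a text query can be compared directly with image embeddings.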

In [None]:
# <tab: Text-Image>
!pip install git+https://github.com/openai/CLIP.git
import clip
from pinnacledb.ext.torch import TorchModel

# Load the CLIP model and its image preprocessing function
clip_model, preprocess = clip.load("RN50", device='cpu')

# Create a TorchModel for text encoding
compatible_model = TorchModel(
    identifier='clip_text',  # Unique identifier for the model
    object=clip_model,  # Full CLIP model
    preprocess=lambda x: clip.tokenize(x)[0],  # Tokenize with CLIP; [0] drops the batch dimension
    postprocess=lambda x: x.tolist(),  # Convert the output tensor to a list
    datatype=output_datatpye,  # Output vector type with shape (1024,)
    forward_method='encode_text',  # Use CLIP's 'encode_text' method for the forward pass
)

# Create a TorchModel for image encoding
model = TorchModel(
    identifier='clip_image',  # Unique identifier for the model
    object=clip_model.visual,  # Visual encoder of the CLIP model
    preprocess=preprocess,  # CLIP's image preprocessing (resize, crop, normalize)
    postprocess=lambda x: x.tolist(),  # Convert the output tensor to a list
    datatype=output_datatpye,  # Output vector type with shape (1024,)
)

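Because we use two models, one per modality, we define separate keys that tell the vector index which model computes which embeddings: the image model embeds the `img` field of stored documents, and the text model embeds the `text` field of queries.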

In [4]:
indexing_key = 'img'  # Field embedded by the image model and stored in the index
compatible_key = 'text'  # Field embedded by the text model at query time
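As a toy illustration of what the two keys do (a sketch of the idea, not pinnacledb's implementation), the routing can be pictured as a lookup from field name to encoder. `toy_image_encoder`, `toy_text_encoder`, `ENCODERS` and `embed_document` are hypothetical names, and the encoders are placeholders that return small fixed-size lists instead of real CLIP embeddings.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the two CLIP encoders. Both map into the same
# 3-dimensional toy space (CLIP RN50 uses 1024 dimensions).
def toy_image_encoder(img: str) -> List[float]:
    return [float(len(img)), 1.0, 0.0]

def toy_text_encoder(text: str) -> List[float]:
    return [float(len(text)), 0.0, 1.0]

# Map each document field to the encoder responsible for it, mirroring
# indexing_key -> image model and compatible_key -> text model.
ENCODERS: Dict[str, Callable[[str], List[float]]] = {
    'img': toy_image_encoder,
    'text': toy_text_encoder,
}

def embed_document(doc: Dict[str, str]) -> List[float]:
    """Embed a document with the encoder registered for the field it carries."""
    for key, encoder in ENCODERS.items():
        if key in doc:
            return encoder(doc[key])
    raise KeyError(f"no encoder registered for fields {sorted(doc)}")

print(embed_document({'text': 'Find a black dog'}))  # [16.0, 0.0, 1.0]
```

Because both encoders write into the same space, a text query and a stored image can be compared directly, which is what makes the text-to-image search in this notebook possible.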

## Perform a vector search

We can run vector searches with two types of query:

- Text: find images that match a text description.
- Image: find images similar to a given image.

In [None]:
# <tab: Text>
from pinnacledb import Document

# Build a text query; it is embedded with the text model
item = Document({compatible_key: "Find a black dog"})

In [None]:
# <tab: Image>
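from IPython.display import display
from pinnacledb import Document

# Use the first image as the query; it is also in the index,
# so it should appear among its own nearest neighbors
search_image = data[0]
display(search_image)
item = Document({indexing_key: search_image})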

Once we have this search target, we can execute a search as follows.

In [None]:
# Retrieve the 5 documents whose embeddings are closest to the query
select = query_table_or_collection.like(item, vector_index=vector_index_name, n=5).select()
results = list(db.execute(select))
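Conceptually, a query with `n=5` ranks the indexed embeddings by similarity to the query embedding and keeps the top five. The sketch below shows that ranking in plain Python, assuming cosine similarity as the measure (the actual measure is configured when the vector index is created, which this section does not show). `cosine` and `top_n_similar` are hypothetical helpers, and the index is a toy list of `(id, vector)` pairs rather than a real vector store.

```python
import math
from typing import List, Sequence, Tuple

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_n_similar(query: Sequence[float],
                  index: List[Tuple[str, List[float]]],
                  n: int = 5) -> List[str]:
    """Return the ids of the n indexed vectors most similar to the query, best first."""
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:n]]

# Toy index of (id, embedding) pairs
index = [('dog', [1.0, 0.0]), ('cat', [0.0, 1.0]), ('puppy', [0.9, 0.1])]
print(top_n_similar([1.0, 0.0], index, n=2))  # ['dog', 'puppy']
```

This brute-force scan is linear in the size of the index; it only illustrates the ranking, not how a production vector index is organized.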

## Visualize Results

In [None]:
from IPython.display import display
for result in results:
    display(result[indexing_key])

## Check that the system stays up to date

You can add new data at any time. When new documents are inserted, the listeners attached to the models compute their embeddings and the vector index is updated accordingly, so every subsequent query runs against the latest data.

In [None]:
# Insert the next 100 images; the listeners embed and index them automatically
new_datas = [{'img': d} for d in data[100:200]]
ids = db.execute(table_or_collection.insert(new_datas))
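The update behavior described above can be pictured with a small in-memory model: a listener watches one field and embeds every newly inserted document as part of the insert, so the index never lags behind the data. `ToyListenerIndex` is a hypothetical class written for illustration only; it does not reflect how pinnacledb stores documents or vectors.

```python
from typing import Callable, Dict, List, Tuple

class ToyListenerIndex:
    """Hypothetical in-memory stand-in for a listener plus a vector index."""

    def __init__(self, key: str, encoder: Callable[[str], List[float]]):
        self.key = key              # Field the listener watches, e.g. 'img'
        self.encoder = encoder      # Model applied to that field
        self.documents: Dict[int, Dict[str, str]] = {}
        self.vectors: List[Tuple[int, List[float]]] = []

    def insert(self, docs: List[Dict[str, str]]) -> List[int]:
        """Store documents and embed the watched field of each one right away."""
        ids = []
        for doc in docs:
            doc_id = len(self.documents)
            self.documents[doc_id] = doc
            if self.key in doc:
                self.vectors.append((doc_id, self.encoder(doc[self.key])))
            ids.append(doc_id)
        return ids

# Toy encoder: the "embedding" is just the length of the string
idx = ToyListenerIndex('img', lambda s: [float(len(s))])
idx.insert([{'img': 'a'}, {'img': 'bb'}])
new_ids = idx.insert([{'img': 'ccc'}])
print(new_ids, idx.vectors[-1])  # [2] (2, [3.0])
```

Embedding inside `insert` keeps the example simple; a real system may compute embeddings asynchronously, in which case there can be a short delay before new documents become searchable.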