# 2. Visual Search - OpenAI Clip and VecText Clusters

<img src="https://github.com/retkowsky/images/blob/master/visualsearchlogo.jpg?raw=true">

# Visual Search with Azure Cognitive Search, Sentence Transformers, Azure Computer Vision and bar code/QR code detection

## Description
The goal of this **Azure AI asset is to enable search over text and images using Azure Cognitive Search**. The technique was inspired by a research article that shows how to **convert vectors (embeddings) to text, which allows the Cognitive Search service to leverage its inverted index to quickly find the most relevant items**. For this reason, any model that converts an object to a vector can be used, provided the resulting vector has fewer than 3,000 dimensions. It also allows users to leverage existing pretrained or fine-tuned models.<br><br>

This technique has proven to be effective and easy to implement. We use **Sentence Transformers, which provides a wrapper for the OpenAI CLIP model**. First we embed every image in the existing catalog. Each embedding is then converted into a set of fake terms, and the results are stored in an Azure Cognitive Search index that handles all search requests.
For example, if an embedding looked like [-0.21, .123, ..., .876], it might be converted to a set of fake terms such as: “A1 B3 … FED0”. This is what is sent as the search query to Azure Cognitive Search.<br><br>
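
The conversion from an embedding to fake terms can be sketched as follows. This is a minimal illustration, assuming each dimension has its own list of cluster centers and that a token simply combines the dimension index with the index of the nearest center. The `D{dim}C{cluster}` token format, the function names, and the toy cluster centers are assumptions made for this example; they are not the format produced by `vec2Text.py`.

```python
def nearest_center_index(value, centers):
    """Return the index of the cluster center closest to value."""
    return min(range(len(centers)), key=lambda i: abs(centers[i] - value))


def vector_to_fake_terms(vector, cluster_centers):
    """Map every component of a vector to a token such as 'D0C1'.

    cluster_centers[dim] holds the centers for dimension dim.
    The token format is hypothetical and chosen for readability.
    """
    terms = []
    for dim, value in enumerate(vector):
        cluster = nearest_center_index(value, cluster_centers[dim])
        terms.append(f"D{dim}C{cluster}")
    return " ".join(terms)


# Toy setup: 3 dimensions, each with 3 made-up cluster centers
cluster_centers = [
    [-0.5, 0.0, 0.5],
    [-0.2, 0.1, 0.4],
    [0.0, 0.5, 1.0],
]
embedding = [-0.21, 0.123, 0.876]

fake_terms = vector_to_fake_terms(embedding, cluster_centers)
print(fake_terms)  # D0C1 D1C1 D2C2
```

Two embeddings that are close in a given dimension tend to share the token for that dimension, which is what lets a text index approximate vector similarity.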

We can **enrich the Azure Cognitive Search index with text extracted from the images using the Azure Read API**. We can also detect and extract information from any **bar code and/or QR code** present in the product catalog images. Finally, we can use **Azure Computer Vision to detect the dominant colors of each image, the tags that describe it, and its caption**. All this information is ingested into the Azure Cognitive Search index.<br><br>

The goal of this asset is to use the inverted index within Azure Cognitive Search to quickly find vectors stored in the search index that are similar to a vector provided as part of a search query, and/or to search on any AI-extracted information (text, dominant colors, …). Unlike brute-force cosine similarity, which becomes slow when comparing against large numbers of items, this approach leverages an inverted index, which enables much more data to be indexed and searched.<br>
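
To illustrate why an inverted index helps, here is a toy sketch in plain Python. It maps every fake term to the set of images that contain it, so a query only touches images sharing at least one term instead of comparing against every stored vector. The image IDs, the tokens, and the overlap-count scoring are assumptions for illustration; Azure Cognitive Search applies its own relevance scoring.

```python
from collections import Counter, defaultdict


def build_inverted_index(documents):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, terms in documents.items():
        for term in terms.split():
            index[term].add(doc_id)
    return index


def search(index, query_terms, top=3):
    """Rank documents by how many query terms they share."""
    scores = Counter()
    for term in query_terms.split():
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return scores.most_common(top)


# Hypothetical catalog: image ID -> fake terms
catalog = {
    "img_001": "D0C1 D1C1 D2C2",
    "img_002": "D0C1 D1C0 D2C2",
    "img_003": "D0C2 D1C0 D2C0",
}

index = build_inverted_index(catalog)
results = search(index, "D0C1 D1C1 D2C2")
print(results)  # [('img_001', 3), ('img_002', 2)]
```

Note that `img_003` is never scored because it shares no term with the query, which is the source of the speed-up on large catalogs.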

## Process

- We start with a collection of catalog images (466 images).
- We embed each of these images using Sentence Transformers. Sentence Transformers can map images and text to the same vector space. As the model, we use the OpenAI CLIP model, which was trained on a large set of images and image alt texts.
- We can retrieve any text from these images using the Azure Read API (if any text is available).
- We can retrieve any text information from any bar code or QR code (if any).
- All this information is ingested into an Azure Cognitive Search index.
- Given a field image, you can then embed it, extract any text or barcode information, and query the Azure Cognitive Search index to retrieve similar catalog images using VecText similarity and/or a text query built from the extracted text.
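
The steps above end with one search document per catalog image. A minimal sketch of what such a document could look like is shown below. All field names (`vectext`, `ocr_text`, `barcode`, `colors`, `tags`, `caption`) and all sample values are assumptions for illustration; the actual index schema is defined in the index generation notebook.

```python
import json


def build_index_document(image_id, vectext, ocr_text="", barcode="",
                         colors=None, tags=None, caption=""):
    """Assemble one search document; every field name is illustrative."""
    return {
        "@search.action": "upload",
        "id": image_id,
        "vectext": vectext,      # fake terms derived from the embedding
        "ocr_text": ocr_text,    # text read from the image
        "barcode": barcode,      # decoded bar code or QR code content
        "colors": colors or [],  # dominant colors
        "tags": tags or [],      # descriptive tags
        "caption": caption,      # generated caption
    }


# Made-up values for a single catalog image
document = build_index_document(
    image_id="catalog_0001",
    vectext="D0C1 D1C1 D2C2",
    ocr_text="organic green tea 20 bags",
    barcode="0123456789012",
    colors=["green", "white"],
    tags=["box", "tea"],
    caption="a box of green tea",
)

# Documents are sent in batches wrapped in a "value" array
payload = json.dumps({"value": [document]}, indent=2)
print(payload)
```

Keeping the fake terms and the AI-extracted fields in the same document is what allows a single query to combine visual similarity with text filters.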


<img src="https://github.com/retkowsky/images/blob/master/process.png?raw=true">

Field images are available in the field images directory (53 images).


## Azure products documentation
- https://azure.microsoft.com/en-us/products/search/ 
- https://azure.microsoft.com/en-us/products/cognitive-services/computer-vision/#overview 
- https://learn.microsoft.com/en-us/azure/cognitive-services/Computer-vision/how-to/call-read-api 
- https://zbar.sourceforge.net/ 
- https://github.com/liamca/vector-search

## Research article
https://www.researchgate.net/publication/305910626_Large_Scale_Indexing_and_Searching_Deep_Convolutional_Neural_Network_Features
    
## Directories
- **images**: We have two directories (catalog images, field images)
- **model**: Directory to save the clusters of the model
- **results**: Directory to save some results
- **test**: Directory that contains some testing images

## Python notebooks

### 0. Settings.ipynb
This notebook contains the links to the images and imports the required Python libraries.

### 1. Catalog images exploration.ipynb
This notebook will display some catalog and field images

### 2. OpenAI Clip and VecText Clusters.ipynb
This notebook explains what Sentence Transformers is and generates the clusters. It analyzes a set of existing images to determine the "cluster centers" that decide which "fake words" are generated for a vector. It takes a test set of files (`testSamplesToTest`) and determines the optimal way to cluster vectors into fake words that will be indexed into Azure Cognitive Search.

### 3. VecText generation.ipynb
This notebook will generate the VecText embeddings for all the catalog images.

### 4. BarCode Information extraction.ipynb
This notebook will detect any barcode or QR code in the catalog images and extract its information.

### 5. Azure CV for OCR, tags, colors and captions.ipynb
This notebook will use Azure Computer Vision for OCR, colors, tags and caption extraction for each of the catalog images.

### 6. Azure Cognitive Search Index Generation.ipynb
This notebook will show how to ingest all the information into an Azure Cognitive Search index.

### 7. Calling Azure Cognitive Search.ipynb
We can now test the index with image similarity (visual search) queries or free-text queries using Azure Cognitive Search.

## Python files

- **azureCognitiveSearch.py**
This Python file contains functions to manage and use Azure Cognitive Search.

- **myfunctions.py**
This Python file contains generic functions used in all the notebooks.

- **vec2Text.py**
This Python file contains functions for the Sentence Transformers model.


24-oct-2022 Serge Retkowsky | serge.retkowsky@microsoft.com | https://www.linkedin.com/in/serger/

In [1]:
import concurrent.futures
import configparser
import glob
import json
import os  # needed for os.chdir when saving the cluster centers
import pandas as pd
import pickle
import seaborn as sns
import sys
import torch

from IPython.display import Image as IPDImage
from PIL import Image
from sentence_transformers import SentenceTransformer, util

import myfunctions as my
import vec2Text

from azureml.core import Workspace, Dataset, Datastore
import azureml.core
from azureml.data.datapath import DataPath

%matplotlib inline

In [2]:
print(my.get_today())

24-10-2022 14:28:22


## 1. Settings

In [3]:
config_file = 'azureservices.py'

config = configparser.ConfigParser()
config.read(config_file)

subscription_id = config.get('AzureML', 'subscription_id')
resource_group = config.get('AzureML', 'resource_group')
workspace_name = config.get('AzureML', 'workspace_name')
                            
ws = Workspace(subscription_id, resource_group, workspace_name)
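
The cell above expects the configuration file to follow the INI layout understood by `configparser`. The sketch below shows that layout with placeholder values and reads it from a string so that it runs without any file. The section and key names come from the cell above; the values and the `settings` variable are assumptions. Note that `ConfigParser.read` silently skips files it cannot find, so checking its return value is a cheap safeguard.

```python
import configparser

# Placeholder values; replace them with your own Azure ML details
SAMPLE_CONFIG = """
[AzureML]
subscription_id = 00000000-0000-0000-0000-000000000000
resource_group = my-resource-group
workspace_name = my-workspace
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CONFIG)

settings = {
    key: config.get("AzureML", key)
    for key in ("subscription_id", "resource_group", "workspace_name")
}
print(settings["workspace_name"])  # my-workspace

# read() returns the list of files it managed to parse,
# so an empty list means the file was missing or unreadable
loaded = configparser.ConfigParser().read("this_file_does_not_exist.ini")
print("Files loaded:", loaded)  # Files loaded: []
```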

In [4]:
from azureml.core.experiment import Experiment
experiment = Experiment(workspace=ws, name="VisualSearch")

In [5]:
experiment

Name,Workspace,Report Page,Docs Page
VisualSearch,azuremlvision,Link to Azure Machine Learning studio,Link to Documentation


In [6]:
cluster_center_file = 'cluster_centers_images.pkl'
testSamplesToTest = 100

In [7]:
IMAGES_DIR = "./images/catalog_images"
print("Image are available here:", IMAGES_DIR)

Image are available here: ./images/catalog_images


## 2. OpenAI CLIP with Sentence Transformers

<img src="https://github.com/openai/CLIP/raw/main/CLIP.png">

Here we use Sentence Transformers, which provides a wrapper around the OpenAI CLIP model<br>
https://github.com/UKPLab/sentence-transformers
<br><br>

- Blog: https://openai.com/blog/clip/
- Model Card: https://github.com/openai/CLIP/blob/main/model-card.md
- Paper: https://arxiv.org/abs/2103.00020

List of models: https://www.sbert.net/docs/pretrained_models.html#image-text-models

In [8]:
model = vec2Text.openai_clip_model()

Loading OpenAI Clip model: clip-ViT-B-32
Done


## 3. VecText Clusters

In [9]:
files = vec2Text.get_files_in_dir(IMAGES_DIR)
print('Total images Catalog files:', len(files))

Total images Catalog files: 466


In [10]:
# Look at a single document and determine the number
# of dimensions the resulting vectors will have
dimensions = vec2Text.calculate_dimensions(files[0], model=model)
print('Vector Dimensions:', dimensions)

Vector Dimensions: 512


In [11]:
start = my.now()

print(my.get_today(), "Adding test vectors to the dictionnary...")
print("Test Samples:", testSamplesToTest, "\n")

# One list of values per dimension, keyed by the dimension index as a string
vecDict = vec2Text.initialize_vector_dictionary(dimensions)

# Embed only the first testSamplesToTest catalog images
for idx, file in enumerate(files[:testSamplesToTest], start=1):
    cur_vec = vec2Text.image_embedding(file, model=model)

    # Append each component to the list of its dimension
    for d in range(dimensions):
        vecDict[str(d)].append(cur_vec[d])

    if idx % 10 == 0:
        print('Processed:', idx, 'of', testSamplesToTest)

print("\nDone in", (my.now() - start).in_words(locale='en'))

24-10-2022 14:28:26 Adding test vectors to the dictionnary...
Test Samples: 100 

Processed: 10 of 100
Processed: 20 of 100
Processed: 30 of 100
Processed: 40 of 100
Processed: 50 of 100
Processed: 60 of 100
Processed: 70 of 100
Processed: 80 of 100
Processed: 90 of 100
Processed: 100 of 100

Done in 14 seconds


### Running the k-means algorithm with the optimal k values to find the cluster centers for the 512 dimensions
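
The implementation of `vec2Text.find_cluster_centers` is not shown in this notebook. As a rough sketch of the idea, the code below runs a simple one-dimensional k-means (Lloyd's algorithm) independently on the values collected for each dimension. The function name `kmeans_1d`, the fixed `k=3`, the deterministic initialization, and the toy data are all assumptions; in particular, how the optimal k is chosen is not covered here.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster scalar values with Lloyd's algorithm; return sorted centers."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centers over the sorted values
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]

    for _ in range(iterations):
        # Assignment step: attach every value to its nearest center
        groups = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(centers[i] - value))
            groups[nearest].append(value)

        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center)
        new_centers = [
            sum(group) / len(group) if group else centers[i]
            for i, group in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    return sorted(centers)


# Toy dictionary with the same shape as vecDict: dimension -> observed values
vec_dict = {
    "0": [-0.52, -0.48, 0.01, 0.03, 0.49, 0.55],
    "1": [0.10, 0.12, 0.90, 0.95, 0.11, 0.93],
}

cluster_centers = {dim: kmeans_1d(vals, k=3) for dim, vals in vec_dict.items()}
for dim, centers in cluster_centers.items():
    print(dim, [round(c, 3) for c in centers])
```

Clustering each dimension separately keeps every problem one-dimensional and small, at the cost of ignoring correlations between dimensions.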

In [12]:
start = my.now()

print(my.get_today(), "Finding the cluster centers...\n")
clusterCenters = vec2Text.find_cluster_centers(dimensions, vecDict)
print("\nDone in", (my.now() - start).in_words(locale='en'))

24-10-2022 14:28:40 Finding the cluster centers...

Processed 10 of 512
Processed 20 of 512
Processed 30 of 512
Processed 40 of 512
Processed 50 of 512
Processed 60 of 512
Processed 70 of 512
Processed 80 of 512
Processed 90 of 512
Processed 100 of 512
Processed 110 of 512
Processed 120 of 512
Processed 130 of 512
Processed 140 of 512
Processed 150 of 512
Processed 160 of 512
Processed 170 of 512
Processed 180 of 512
Processed 190 of 512
Processed 200 of 512
Processed 210 of 512
Processed 220 of 512
Processed 230 of 512
Processed 240 of 512
Processed 250 of 512
Processed 260 of 512
Processed 270 of 512
Processed 280 of 512
Processed 290 of 512
Processed 300 of 512
Processed 310 of 512
Processed 320 of 512
Processed 330 of 512
Processed 340 of 512
Processed 350 of 512
Processed 360 of 512
Processed 370 of 512
Processed 380 of 512
Processed 390 of 512
Processed 400 of 512
Processed 410 of 512
Processed 420 of 512
Processed 430 of 512
Processed 440 of 512
Processed 450 of 512
Processed 46

## 4. Saving clusters

In [13]:
PKL_FOLDER = "model"
my.create_dir(PKL_FOLDER)

Done. Directory: model has been created


In [14]:
print("Saving cluster centers into:", cluster_center_file)

os.chdir(PKL_FOLDER)
with open(cluster_center_file, 'wb') as pickle_out:
    pickle.dump(clusterCenters, pickle_out)
os.chdir("..")

print("\nDone")

Saving cluster centers into: cluster_centers_images.pkl

Done
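
Later notebooks need to read these cluster centers back. The sketch below shows a save and load round trip with `pickle`, building the path with `os.path.join` rather than changing the working directory. The temporary folder and the toy `cluster_centers` dictionary are assumptions so that the example runs anywhere; the actual structure returned by `find_cluster_centers` may differ.

```python
import os
import pickle
import tempfile

# Toy stand-in for the real cluster centers
cluster_centers = {"0": [-0.5, 0.02, 0.52], "1": [0.11, 0.9, 0.94]}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "cluster_centers_images.pkl")

    # Save without changing the current working directory
    with open(path, "wb") as pickle_out:
        pickle.dump(cluster_centers, pickle_out)

    # Load the file back, as a downstream notebook would
    with open(path, "rb") as pickle_in:
        restored = pickle.load(pickle_in)

    file_size = os.path.getsize(path)

print("Round trip OK:", restored == cluster_centers)
print("File size in bytes:", file_size)
```

Only unpickle files you created yourself or trust, since loading a pickle can execute arbitrary code.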


### Saving to Azure ML datastore

In [15]:
workspace = Workspace(subscription_id, resource_group, workspace_name)
datastore = workspace.get_default_datastore()

ds = Dataset.File.upload_directory(src_dir=PKL_FOLDER,
                                   target=DataPath(datastore, PKL_FOLDER),
                                   show_progress=True,
                                   overwrite=True)

Validating arguments.
Arguments validated.
Uploading file to model
Uploading an estimated of 1 files
Uploading model/cluster_centers_images.pkl
Uploaded model/cluster_centers_images.pkl, 1 files out of an estimated total of 1
Uploaded 1 files
Creating new dataset


## 5. Clusters file

In [16]:
my.list_dir(PKL_FOLDER)

Files in directory: model 

1 	 2022-10-24 14:36:57.169631 29.1 kB 	 cluster_centers_images.pkl


> End of notebook