# Run Image Similarity Search with Milvus on Different Models

In this example, we use two models (VGG16 and ResNet50) to generate image feature vectors and run similarity search on them with [Milvus](https://milvus.io/). We use [ONNX](https://github.com/onnx/onnx) to manage the two models so they can be run the same way.

## Data
We use the [COCO animals dataset](http://cs231n.stanford.edu/coco-animals.zip), a smaller subset of the COCO dataset with 800 training images and 200 test images across 8 animal classes: bear, bird, cat, dog, giraffe, horse, sheep, and zebra.

We extracted the 200 test images from it. You can download them as a zip file from [Google Drive](https://drive.google.com/file/d/1Nz67USZbz3bloUcf6PFrt7SZ7p6oMSS_/view?usp=sharing).

```bash
# unzip the coco-images
unzip coco-images.zip
```

## Requirements
| Packages | Servers |
| --------------- | -------------- |
| pymilvus_orm 2.0.0rc1 | Milvus 2.0rc1 |
| redis | Redis |
| onnx | |
| onnxruntime | |
| tensorflow | |
| tf2onnx | |

## Up and running


#### 1. Start Milvus server

```bash
wget https://raw.githubusercontent.com/milvus-io/milvus/master/deployments/docker/standalone/docker-compose.yml -O docker-compose.yml
docker-compose up -d
```

This demo uses Milvus 2.0. Refer to [Install Milvus](https://milvus.io/docs/v2.0.0/install_standalone-docker.md) for how to install Milvus with Docker.

#### 2. Start Redis
Start [Redis](https://hub.docker.com/_/redis/) in a Docker container. Redis stores the mapping between Milvus IDs and image file names.

```bash
docker run -d -p 6379:6379 redis
```
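
Before running the rest of the notebook, it can help to confirm that both servers accept connections. The following is a minimal sketch that is not part of the original demo: it only checks that a TCP connection succeeds on the ports used here (19530 for Milvus, 6379 for Redis), and the helper name `is_port_open` is hypothetical.

```python
import socket

def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Ports used in this demo: 19530 for Milvus and 6379 for Redis
for name, port in [("Milvus", 19530), ("Redis", 6379)]:
    print(name, "reachable:", is_port_open("127.0.0.1", port))
```

A successful TCP connection only shows that something is listening on the port; it does not confirm that the service is fully initialized.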

#### 3. Install Python packages

```bash
pip install -r requirements.txt
```

## Code Overview
Once the environment is ready, we can run the code. Let's build a similar-image search step by step.

#### 1. Load the ResNet50 model and convert it to ONNX
The first step is to extract a feature vector from each image. As mentioned earlier, we use two models: a Keras-based ResNet50 model and a VGG16 model.

The VGG16 model is provided as a pre-trained ONNX file, so we also convert ResNet50 to ONNX. This way, both models can be loaded and run the same way through ONNX Runtime.

```python
import os

import numpy as np
import onnxruntime
import tensorflow as tf
from numpy import linalg as LA
# Keras ships with TensorFlow 2.x; keras2onnx is not needed because
# the conversion below uses the tf2onnx command-line tool
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image
```

```python
# Load the Keras ResNet50 model (ImageNet weights, no classification head,
# global max pooling) and save it in TensorFlow SavedModel format
model_resnet50 = ResNet50(include_top=False, pooling='max', weights='imagenet')
tf.saved_model.save(model_resnet50, "keras_resnet50_model")
```
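
The 2048-dimensional ResNet50 vector comes from `include_top=False` combined with `pooling='max'`: for a 224x224 input, the last convolutional block produces a 7x7 grid with 2048 channels, and global max pooling keeps the maximum of each channel over that grid. The NumPy sketch below illustrates the pooling step on a random stand-in tensor (not real model output); `global_max_pool` is a hypothetical helper name.

```python
import numpy as np

def global_max_pool(feature_map: np.ndarray) -> np.ndarray:
    """Collapse (batch, height, width, channels) to (batch, channels)
    by taking the maximum over the two spatial axes."""
    return feature_map.max(axis=(1, 2))

# Random stand-in for ResNet50's final feature map on a 224x224 input:
# 7x7 spatial positions with 2048 channels
rng = np.random.default_rng(0)
fmap = rng.standard_normal((1, 7, 7, 2048)).astype(np.float32)

pooled = global_max_pool(fmap)
print(pooled.shape)  # (1, 2048)
```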

```bash
# convert the ResNet50 SavedModel to ONNX
python -m tf2onnx.convert --saved-model "keras_resnet50_model" --output "onnx_resnet50.onnx"
```

#### 2. Get the image embeddings with the ONNX models (ResNet50 and VGG16)
Next, we use the two ONNX models to generate feature vectors. As the code below shows, ResNet50 and VGG16 produce different vectors that do not even have the same dimensionality: ResNet50 vectors have 2048 dimensions and VGG16 vectors have 512.

You can download the `onnx_vgg16.onnx` file from Google Drive: https://drive.google.com/file/d/1QRVlLD9qgbY2K8ezM83BYyvnQZVCKuTO/view?usp=sharing

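```python
# Cache one ONNX Runtime session per model file, so each model
# is loaded once instead of once per image
_sessions = {}

def get_session(onnx_model):
    if onnx_model not in _sessions:
        _sessions[onnx_model] = onnxruntime.InferenceSession(onnx_model)
    return _sessions[onnx_model]

# Get the L2-normalized feature vector of one image with an ONNX model
def get_onnx_vectors(onnx_model, img_path):
    # Load and resize to 224x224, then add a batch axis: (1, 224, 224, 3)
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    # Keras ResNet50 ImageNet preprocessing, applied to both models here
    x = preprocess_input(x)

    sess = get_session(onnx_model)
    # Both models take a single input tensor
    input_name = sess.get_inputs()[0].name
    feat = sess.run(None, {input_name: x})[0]

    # Normalize to unit length and convert to Python floats for Milvus
    norm_feat = feat[0] / LA.norm(feat[0])
    return norm_feat.tolist()
```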
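
For reference, Keras's ResNet50 `preprocess_input` uses the "caffe" convention: it converts RGB to BGR and subtracts the per-channel ImageNet mean, without scaling. The NumPy sketch below approximates that step so you can see what the ONNX models receive; `caffe_preprocess` is a hypothetical name, and the Keras function remains the one actually used above.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by Keras "caffe" preprocessing
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_preprocess(x: np.ndarray) -> np.ndarray:
    """Approximate preprocess_input for an RGB batch of shape
    (batch, height, width, 3) with pixel values in 0..255."""
    bgr = x[..., ::-1]              # RGB -> BGR
    return bgr - IMAGENET_BGR_MEAN  # zero-center each channel, no scaling

# A pure red 224x224 image
batch = np.zeros((1, 224, 224, 3), dtype=np.float32)
batch[..., 0] = 255.0
out = caffe_preprocess(batch)
print(out[0, 0, 0])  # blue, green, red values after mean subtraction
```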

```python
# Generate vectors for the same image with the ResNet50 and the prepared VGG16 ONNX models
vec_resnet = get_onnx_vectors("onnx_resnet50.onnx", "./pic/example.jpg")
vec_vgg = get_onnx_vectors("onnx_vgg16.onnx", "./pic/example.jpg")
```

```python
# Show the dimensionality of each model's vectors
print("The dimensions of ResNet50:", len(vec_resnet))
print("The dimensions of VGG16:", len(vec_vgg))
```

```python
# Generate vectors for every image in the directory, in sorted file-name order
# (the same order is used later to map Milvus IDs to file names)
def read_all_images(onnx_model, img_directory):
    images = sorted(os.listdir(img_directory))
    vectors = []
    for img_name in images:  # avoid shadowing the keras `image` module
        vec = get_onnx_vectors(onnx_model, os.path.join(img_directory, img_name))
        vectors.append(vec)
    return vectors
```
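
`os.listdir` returns every file in the directory, so a stray file such as `.DS_Store` would be passed to the image loader and would also shift the ordering used to pair vectors with file names. A possible safeguard, sketched below under the assumption that the images use common extensions, is a shared helper that filters and sorts the file list; `list_images` and `IMAGE_EXTENSIONS` are hypothetical names not in the original notebook.

```python
import os
import tempfile

# File extensions treated as images (an assumption; adjust for your data)
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def list_images(img_directory):
    """Return the image file names in img_directory, sorted, so that
    vector extraction and the ID mapping use the same order."""
    return sorted(
        name for name in os.listdir(img_directory)
        if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    )

# Example: non-image files are skipped and the rest are sorted
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["b.jpg", "a.PNG", ".DS_Store", "notes.txt"]:
        open(os.path.join(demo_dir, name), "w").close()
    demo_images = list_images(demo_dir)
print(demo_images)  # ['a.PNG', 'b.jpg']
```

Both `read_all_images` and `img_ids_to_redis` could call such a helper instead of `os.listdir`, so the two orderings cannot drift apart.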

```python
# Directory of the unzipped COCO images; this takes about one minute
IMG_DICT = 'coco-images'
resnet_vectors = read_all_images("onnx_resnet50.onnx", IMG_DICT)
print(len(resnet_vectors))
```

```python
# This also takes about one minute
vgg_vectors = read_all_images("onnx_vgg16.onnx", IMG_DICT)
print(len(vgg_vectors))
```

#### 3. Create collections and insert the vectors into Milvus
In the previous section, we generated feature vectors with the two models. Next, we store them in Milvus.

Milvus is an open-source vector search engine that stores vectors and retrieves the most similar ones. Before we can search, we need to create a collection and insert the vectors. Because the two models produce vectors of different dimensionality (2048 and 512), each model gets its own collection.

```python
# Connect to the Milvus server (default port 19530)
from pymilvus_orm import *
connections.connect(host="127.0.0.1", port=19530)
```

```python
# ResNet50 collection: auto-generated INT64 IDs and 2048-dimensional vectors
resnet_collection_name = "resnet_img_search"
resnet_VECTOR_DIMENSION = 2048
resnet_fields = [
    schema.FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    schema.FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=resnet_VECTOR_DIMENSION)
]

# VGG16 collection: same layout with 512-dimensional vectors
vgg_collection_name = "vgg_img_search"
vgg_VECTOR_DIMENSION = 512
vgg_fields = [
    schema.FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    schema.FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=vgg_VECTOR_DIMENSION)
]

TOP_K = 5
```
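
Each collection accepts only vectors whose length matches its `dim`, so inserting VGG16 vectors into the ResNet50 collection (or vice versa) is rejected. A cheap pre-insert check can give a clearer error message; the sketch below is a hypothetical helper, not part of pymilvus_orm, that you could call as `check_dimensions(resnet_vectors, resnet_VECTOR_DIMENSION)` before inserting.

```python
def check_dimensions(vectors, expected_dim):
    """Raise ValueError if any vector's length differs from expected_dim."""
    for i, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"vector {i} has {len(vec)} dimensions, expected {expected_dim}"
            )

# Example with toy vectors: the second one has the wrong length
try:
    check_dimensions([[0.0] * 4, [0.0] * 3], expected_dim=4)
    mismatch_found = False
except ValueError as err:
    mismatch_found = True
    print(err)  # vector 1 has 3 dimensions, expected 4
```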

```python
# Create the two collections

resnet_schema = schema.CollectionSchema(fields=resnet_fields, description="Resnet test collection")
resnet_collection = Collection(name=resnet_collection_name, schema=resnet_schema)

vgg_schema = schema.CollectionSchema(fields=vgg_fields, description="Vgg test collection")
vgg_collection = Collection(name=vgg_collection_name, schema=vgg_schema)
```

```python
# Insert the vectors (IDs are generated because auto_id=True), then
# load each collection into memory so it can be searched
resnet_mr = resnet_collection.insert([resnet_vectors])
resnet_collection.load()
resnet_ids = resnet_mr.primary_keys
print("Inserted ResNet50 vectors into Milvus:", len(resnet_ids))

vgg_mr = vgg_collection.insert([vgg_vectors])
vgg_collection.load()
vgg_ids = vgg_mr.primary_keys
print("Inserted VGG16 vectors into Milvus:", len(vgg_ids))
```

#### 4. Store the image names and IDs in Redis
After the image vectors are stored in Milvus, each one has an auto-generated ID. The next step is to store the mapping from these IDs to the image file names in Redis.

```python
# Connect to Redis
import redis
red = redis.Redis(host='127.0.0.1', port=6379, db=0)
```

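```python
# Map each Milvus ID to its image file name. This relies on listing the
# images in the same sorted order as read_all_images did.
def img_ids_to_redis(img_directory, res_ids):
    images = sorted(os.listdir(img_directory))
    for img, milvus_id in zip(images, res_ids):
        red.set(milvus_id, img)
```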
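
One subtlety of `zip` is that it stops at the shorter input without any error, so if the number of images and IDs ever differs, some images would silently be left unmapped. The in-memory sketch below shows a stricter pairing; `build_id_to_image` is a hypothetical helper, and the IDs in the example are made up for illustration.

```python
def build_id_to_image(image_names, milvus_ids):
    """Pair IDs with image names, failing loudly if the counts differ
    instead of silently truncating like a bare zip() would."""
    if len(image_names) != len(milvus_ids):
        raise ValueError(
            f"{len(image_names)} images but {len(milvus_ids)} ids"
        )
    return dict(zip(milvus_ids, image_names))

# Toy example with made-up IDs
mapping = build_id_to_image(["a.jpg", "b.jpg"], [101, 102])
print(mapping)  # {101: 'a.jpg', 102: 'b.jpg'}
```

The resulting dictionary could then be written to Redis entry by entry with `red.set`.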

```python
img_ids_to_redis(IMG_DICT, resnet_ids)
img_ids_to_redis(IMG_DICT, vgg_ids)
```

#### 5. Search for similar images
Now we can search Milvus with an image's feature vector. Milvus returns the IDs of the most similar vectors, and we then look up the corresponding image file names in Redis.

```python
# Search a collection with one vector and return the result IDs and distances
def search_in_milvus(collection, search_vector):
    # nprobe applies to IVF-type indexes; this demo does not build an index
    search_params = {"metric_type": "L2", "params": {"nprobe": 32}}
    results = collection.search([search_vector], "vector", param=search_params, limit=10, expr=None)
    re_ids = [x.id for x in results[0]]
    re_distance = [x.distance for x in results[0]]
    return re_ids, re_distance
```
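
Because `get_onnx_vectors` normalizes every vector to unit length, ranking by L2 distance gives the same order as ranking by cosine similarity: for unit vectors, the squared distance is ||a - b||^2 = 2 - 2 cos(a, b), so a smaller distance means a larger cosine, and an identical image has distance 0. The NumPy check below verifies the identity on random vectors (not real image features).

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.standard_normal(512)
b = rng.standard_normal(512)
# L2-normalize, as get_onnx_vectors does
a /= np.linalg.norm(a)
b /= np.linalg.norm(b)

squared_l2 = np.sum((a - b) ** 2)
cosine = np.dot(a, b)
# For unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(np.isclose(squared_l2, 2 - 2 * cosine))  # True
```

Whether the reported distance is squared or not, the ranking of results is the same, since the square root is monotonic.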

```python
# Look up the image file names for the result IDs in Redis
def get_sim_imgs(collection, search_vector):
    ids, distance = search_in_milvus(collection, search_vector)
    img = [red.get(i).decode("utf-8") for i in ids]
    return ids, distance, img
```

```python
resnet_ids, resnet_distance, resnet_img = get_sim_imgs(resnet_collection, vec_resnet)
vgg_ids, vgg_distance, vgg_img = get_sim_imgs(vgg_collection, vec_vgg)
```

#### 6. Show the results
Finally, we have the most similar images (ID, distance, and file name) for the target image, and we can compare what each model retrieves.

For both models, the first result is the target image itself, since its distance is 0. The top five results of the two models look the same, but their distances differ; distances from different models are not directly comparable, because each model has its own embedding space. On this small dataset, VGG16 and ResNet50 retrieve results of about the same quality, but with more or larger data the results may differ.
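
To put a number on "about the same", one option is the overlap of the two models' top-k result lists. The sketch below uses toy file names rather than actual search output; `overlap_at_k` is a hypothetical helper that assumes both lists contain at least k items. With the real results you could call `overlap_at_k(resnet_img, vgg_img, k=5)`.

```python
def overlap_at_k(results_a, results_b, k):
    """Fraction of the top-k items of results_a that also appear in
    the top-k of results_b (assumes both lists have at least k items)."""
    top_a, top_b = set(results_a[:k]), set(results_b[:k])
    return len(top_a & top_b) / k

# Toy lists of file names (hypothetical, not actual search output)
resnet_top = ["cat1.jpg", "cat2.jpg", "cat3.jpg", "dog1.jpg"]
vgg_top = ["cat1.jpg", "cat3.jpg", "cat4.jpg", "cat2.jpg"]
print(overlap_at_k(resnet_top, vgg_top, k=3))  # 0.6666666666666666
```

This measure ignores the order within the top k; a rank-aware metric would be needed to compare orderings as well.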

```python
from IPython.display import Image, display

# Print the ID and distance of each result and display its image
def show_results(img, ids, distance):
    for img_name, milvus_id, dist in zip(img, ids, distance):
        print(milvus_id, dist)
        display(Image(filename=os.path.join(IMG_DICT, img_name)))
```

```python
show_results(resnet_img, resnet_ids, resnet_distance)
```

```python
show_results(vgg_img, vgg_ids, vgg_distance)
```