<i>Copyright (c) Microsoft Corporation. All rights reserved.</i>

<i>Licensed under the MIT License.</i>

# Image Similarity Export

In the Scenario->Image Similarity notebook [12_fast_retrieval.ipynb](12_fast_retrieval.ipynb) we implemented an approximate nearest neighbor search to find the reference images most similar to a given query image. This notebook repeats some of those steps with the goal of exporting the computed reference image features to a text file, so that the results can be visualized in an HTML web interface.

To test the model in a simple HTML interface, we export three things: the computed reference image features, a separate text file of reference image file names, and thumbnail versions of the reference images. The first two are initially written as text files and then compressed into ZIP files to minimize file size. The reference images are converted to thumbnails of at most 150x150 pixels and stored in a flat directory. All exports are saved to the UICode folder. Notebook **2_upload_ui** is then used to upload the exports to your Azure Blob storage account for easy public access.

It is assumed that you have already completed the steps in notebook [12_fast_retrieval.ipynb](12_fast_retrieval.ipynb) and have deployed your query image processing model to an Azure ML resource (container services, Kubernetes services, ML web app, etc.) with a queryable, CORS-compliant API endpoint.

## Initialization

In [1]:
# Ensure edits to libraries are loaded and plotting is shown in the notebook.
%matplotlib inline
%reload_ext autoreload
%autoreload 2

In [2]:
# Standard python libraries
import sys
import os
import numpy as np
from pathlib import Path
import random
import scrapbook as sb
from sklearn.neighbors import NearestNeighbors
from tqdm import tqdm
import zipfile
from zipfile import ZipFile

# Fast.ai
import fastai
from fastai.vision import (
    load_learner,
    cnn_learner,
    DatasetType,
    ImageList,
    imagenet_stats,
    models,
    PIL
)

# Computer Vision repository
sys.path.extend([".", "../../.."])  # to access the utils_cv library
from utils_cv.classification.data import Urls
from utils_cv.common.data import unzip_url
from utils_cv.common.gpu import which_processor, db_num_workers
from utils_cv.similarity.metrics import compute_distances
from utils_cv.similarity.model import compute_features_learner
from utils_cv.similarity.plot import plot_distances, plot_ranks_distribution

In [3]:
print(f"Fast.ai version = {fastai.__version__}")
which_processor()

Fast.ai version = 1.0.57
Cuda is not available. Torch is using CPU


## Data preparation
We start with parameter specifications and data preparation. We use the *Fridge objects* dataset, which is composed of 134 images, divided into 4 classes: can, carton, milk bottle and water bottle. 

In [4]:
# Data location
DATA_PATH = unzip_url(Urls.fridge_objects_path, exist_ok=True)

# Image reader configuration
BATCH_SIZE = 16
IM_SIZE = 300

# Number of comparisons of nearest neighbor versus exhaustive search used for accuracy computation
NUM_RANK_ITER = 100

# Size of thumbnail images in pixels
MAX_SIZE = (150, 150)

In [5]:
# Load images into fast.ai's ImageDataBunch object
random.seed(642)
data = (
    ImageList.from_folder(DATA_PATH)
    .split_by_rand_pct(valid_pct=0.8, seed=20)
    .label_from_folder()
    .transform(size=IM_SIZE)
    .databunch(bs=BATCH_SIZE, num_workers = db_num_workers())
    .normalize(imagenet_stats)
)
print(f"Training set: {len(data.train_ds.x)} images, validation set: {len(data.valid_ds.x)} images")

Training set: 27 images, validation set: 107 images


## Load model

Below we load a [ResNet18](https://arxiv.org/pdf/1512.03385.pdf) CNN from fast.ai's library which is pre-trained on ImageNet.

In [6]:
learn = cnn_learner(data, models.resnet18, ps=0)

Alternatively, one can load a model which was trained using the [01_training_and_evaluation_introduction.ipynb](01_training_and_evaluation_introduction.ipynb) notebook using these lines of code:
```python
learn = load_learner(".", 'image_similarity_01_model')
learn.data = data
```

## Feature extraction

We now compute the DNN features for each image in our validation set, which serves as the set of reference images. We use the output of the penultimate layer as our image representation, which for the ResNet-18 model has a dimensionality of 512 floating point values.

In [7]:
# Use penultimate layer as image representation
embedding_layer = learn.model[1][-2] 
print(embedding_layer)

BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)


In [8]:
# Compute DNN features for all validation images
valid_features = compute_features_learner(data, DatasetType.Valid, learn, embedding_layer)
print(f"Computed DNN features for the {len(list(valid_features))} validation images, "
      f"each consisting of {len(valid_features[list(valid_features)[0]])} floating point values.\n")

# Normalize all reference features to be of unit length
valid_features_list = np.array(list(valid_features.values()))
valid_features_list /= np.linalg.norm(valid_features_list, axis=1)[:,None]

Computed DNN features for the 107 validation images, each consisting of 512 floating point values.
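
To illustrate why the reference features are normalized to unit length, here is a minimal sketch on toy data. The arrays `ref` and `query` and their values are hypothetical stand-ins, not outputs of the model. For unit-length vectors, the squared L2 distance equals `2 - 2 * cosine similarity`, so ranking by either measure gives the same order.

```python
import numpy as np

# Toy stand-in for the reference features: 3 images, 4 dimensions (hypothetical values)
ref = np.array([[3.0, 4.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0],
                [1.0, 1.0, 1.0, 1.0]])
# Same row-wise normalization as in the cell above
ref /= np.linalg.norm(ref, axis=1)[:, None]

# Hypothetical query feature, normalized the same way
query = np.array([6.0, 8.0, 0.0, 0.0])
query /= np.linalg.norm(query)

# Squared L2 distance and cosine similarity between the query and every reference row
sq_dists = np.sum((ref - query) ** 2, axis=1)
cos_sims = ref @ query

# Indices of the reference rows, from most to least similar
ranking = np.argsort(sq_dists)
print(ranking, sq_dists, cos_sims)
```

A consumer of the exported features (for example, the HTML interface) would presumably need to normalize the query feature in the same way before comparing it to the references.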



## Export for HTML Interface

Here we package all of the data for upload to Blob Storage so that we can interact with the model in a simple HTML interface.

First, we export the computed reference features to a text file.

In [9]:
# Write the features as a nested array: one bracketed row of comma-separated values per image
with open("ref_features.txt", 'w') as f:
    f.write('[')
    f.write('],\n'.join('[' + ','.join(map(str, i)) for i in valid_features_list))
    f.write(']]')
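
The string written by this cell looks like a JSON-style nested array. The following sketch, which assumes the consumer parses the file as JSON (the source does not state this), builds the same string in memory for a small hypothetical matrix `rows` and checks that it parses back to the original values.

```python
import json

# Hypothetical 2x3 feature matrix standing in for valid_features_list
rows = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Same string construction as the export cell, built in memory instead of written to disk
text = '[' + '],\n'.join('[' + ','.join(map(str, r)) for r in rows) + ']]'
print(text)

# Parse the text back into nested Python lists
parsed = json.loads(text)
print(parsed == rows)
```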

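Then we export the reference image file names to disk. Each exported name is the image path relative to the dataset folder with path separators replaced by underscores, so the parent directory name is included as a prefix.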

In [10]:
# Write the file names as an array of quoted strings, in the same order as the features
with open("ref_filenames.txt", 'w') as f:
    f.write('["')
    f.write('",\n"'.join((i[len(DATA_PATH)+1:]).replace("/","_").replace("\\","_") for i in valid_features.keys()))
    f.write('"]')
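
As a small illustration of the name flattening used in this cell, the sketch below wraps the same expression in a hypothetical helper, `flatten_name`, and applies it to made-up POSIX and Windows paths. The helper name and the example paths are assumptions for illustration only.

```python
def flatten_name(full_path: str, data_root: str) -> str:
    """Hypothetical helper mirroring the expression used in the export cell."""
    # Drop the dataset root and the separator that follows it
    relative = full_path[len(data_root) + 1:]
    # Replace both POSIX and Windows separators with underscores
    return relative.replace("/", "_").replace("\\", "_")

name_posix = flatten_name("/data/fridgeObjects/can/1.jpg", "/data/fridgeObjects")
name_windows = flatten_name("C:\\data\\fridgeObjects\\milk_bottle\\7.jpg", "C:\\data\\fridgeObjects")
print(name_posix, name_windows)
```

Because the parent folder becomes part of the name, images that share a file name in different folders stay distinct once they are stored in a flat directory.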

Next we compress the exported text data into ZIP files.

In [11]:
# Write each text file to its own ZIP archive (avoid naming the handle `zip`, which shadows the built-in)
with ZipFile('ref_features.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.write("ref_features.txt")
with ZipFile('ref_filenames.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    zf.write("ref_filenames.txt")
    
# Remove the txt files
os.remove("ref_features.txt")
os.remove("ref_filenames.txt")

# Make a folder to hold all HTML demo files and a subfolder for the ZIP files
os.makedirs('../UICode/data', exist_ok=True)
    
# Move the zip files to the new directory
os.replace("ref_features.zip", "../UICode/data/ref_features.zip")
os.replace("ref_filenames.zip", "../UICode/data/ref_filenames.zip")
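
To sanity-check an archive like the ones created above, its contents can be read back without extracting to disk. The sketch below works on a temporary directory with made-up file names and contents, so it does not touch the real exports. It also assumes the text is parsed as JSON, which the source does not state.

```python
import json
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "ref_filenames.txt")
    zip_path = os.path.join(tmp, "ref_filenames.zip")

    # Write a small stand-in text file (hypothetical file names)
    with open(txt_path, "w") as f:
        f.write('["can_1.jpg",\n"carton_2.jpg"]')

    # Compress it; arcname keeps the temporary directory out of the archive
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(txt_path, arcname="ref_filenames.txt")

    # Read the archived member straight from the ZIP file
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.namelist()
        content = zf.read("ref_filenames.txt").decode("utf-8")

names = json.loads(content)
print(members, names)
```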

Next, we resize the reference images to thumbnails of at most 150x150 pixels (the aspect ratio is preserved) and save them in a new directory called 'small-150'.

In [12]:
# Make a subfolder to hold the thumbnail images
path_mr = '../UICode/small-150'
os.makedirs(path_mr, exist_ok=True)

# Resize each JPEG image to a thumbnail; thumbnail() shrinks the image in place and keeps its aspect ratio
for root, dirs, files in os.walk(DATA_PATH):
    for file in files:
        if file.endswith(".jpg"):
            # Prefix the file name with its parent folder, matching the exported file names
            fname = os.path.join(path_mr, root[len(DATA_PATH)+1:] + '_' + file)
            im = PIL.Image.open(os.path.join(root, file))
            im.thumbnail(MAX_SIZE)
            im.save(fname, 'JPEG', quality=70)