## Creating a Simple Image Search Script

Download the Oxford-17 flowers image data set, available at this link:

https://www.robots.ox.ac.uk/~vgg/data/flowers/17/

Choose one image in your data that you want to be the 'target image'. Write a Python script or Notebook which does the following:

1. Use the cv2.compareHist() function to compare the 3D color histogram for your target image to each of the other images in the corpus one-by-one.
2. In particular, use the chi-square distance method, as we did in class. Round the distance to 2 decimal places.
3. Save the results from this comparison as a single .csv file, showing the distance between your target image and each of the other images. The .csv file should show the filename for every image in your data except the target and the distance metric between that image and your target. Call your columns: filename, distance.
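
To make the distance in step 2 concrete, here is a minimal NumPy sketch of the chi-square formula that OpenCV documents for cv2.HISTCMP_CHISQR, namely the sum over bins of (H1 - H2)^2 / H1. The function name chi_square_distance and the toy histograms are illustrative assumptions, not part of the assignment; skipping bins where the first histogram is zero is my understanding of how OpenCV avoids division by zero, so treat that detail as an assumption too.

```python
import numpy as np


def chi_square_distance(hist_a, hist_b):
    """Chi-square distance: sum((a - b)^2 / a) over bins where a > 0."""
    hist_a = np.asarray(hist_a, dtype=np.float64).ravel()
    hist_b = np.asarray(hist_b, dtype=np.float64).ravel()
    # Assumption: bins that are empty in the first histogram are skipped
    nonzero = hist_a > 0
    difference = hist_a[nonzero] - hist_b[nonzero]
    return float(np.sum(difference ** 2 / hist_a[nonzero]))


# Toy 4-bin histograms (hypothetical values)
target_hist = np.array([4.0, 2.0, 0.0, 1.0])
other_hist = np.array([2.0, 2.0, 3.0, 3.0])

same_distance = chi_square_distance(target_hist, target_hist)
forward_distance = chi_square_distance(target_hist, other_hist)
reverse_distance = chi_square_distance(other_hist, target_hist)
print(same_distance, forward_distance, round(reverse_distance, 2))
```

Note that this form of the distance is not symmetric: swapping the two histograms changes the denominator, so the target histogram should always be passed in the same position.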


__General instructions__

- For this exercise, you can upload either a standalone script OR a Jupyter Notebook.
- Save your script as image_search.py OR image_search.ipynb.
- If you have external dependencies, you must include a requirements.txt.
- You can either upload the script here or push to GitHub and include a link - or both!
- Your code should be clearly documented in a way that allows others to easily follow along.
- Similarly, remember to use descriptive variable names! A name like hist is more readable than h.
- The filenames of the saved images should clearly relate to the original image.


__Purpose__

This assignment is designed to test that you have an understanding of:

1. how to extract features from images based on colour space;
2. how to compare images for similarity based on their colour histogram;
3. how to combine these skills to create an image 'search engine'
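
As an illustration of the first point, the sketch below builds a 3D colour histogram with 8 bins per channel using plain NumPy. It uses np.histogramdd on a synthetic random image instead of cv2.calcHist on a real flower photo, so the image, its size, and the variable names are assumptions made only for demonstration.

```python
import numpy as np

# Synthetic 32x32 image with 3 colour channels (stand-in for a real photo)
rng = np.random.default_rng(seed=0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Flatten to a list of pixels, one row per pixel and one column per channel
pixels = image.reshape(-1, 3).astype(np.float64)

# Count pixels in an 8x8x8 grid of colour bins covering 0..255 per channel
colour_hist, _ = np.histogramdd(
    pixels,
    bins=(8, 8, 8),
    range=((0, 256), (0, 256), (0, 256)),
)

print(colour_hist.shape)
print(int(colour_hist.sum()))
```

Each of the 512 bins counts the pixels whose three channel values fall into that combination of ranges, so the counts add up to the number of pixels in the image.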

__Load Libraries__

import os
import sys
sys.path.append(os.path.join(".."))
import cv2
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import pandas as pd
import argparse

__Define function__

def main():
    
    # argparse 
    ap = argparse.ArgumentParser()
    # parameters
    ap.add_argument("-p", "--path", required = True, help= "Path to directory of images")
    ap.add_argument("-t", "--target_image", required = True, help= "Filename of the target image")
    # parse arguments
    args = vars(ap.parse_args())
    
    # get path to image directory
    image_directory = args["path"]
    # get name of the target image
    target_name = args["target_image"]
    
    # empty dataframe to save data
    data = pd.DataFrame(columns=["filename", "distance"])
    
    # read target image
    target_image = cv2.imread(os.path.join(image_directory, target_name))
    # create histogram for all 3 color channels
    target_hist = cv2.calcHist([target_image], [0,1,2], None, [8,8,8], [0,256, 0,256, 0,256])
    # normalise the histogram
    target_hist_norm = cv2.normalize(target_hist, target_hist, 0,255, cv2.NORM_MINMAX)
    
    # for each image (ending with .jpg) in the directory
    for image_path in Path(image_directory).glob("*.jpg"):
        # only get the image name
        _, image = os.path.split(image_path)
        # if not the target image
        if image != target_name:
            # read the image and save as comparison image
            comparison_image = cv2.imread(os.path.join(image_directory, image))
            # create histogram for comparison image
            comparison_hist = cv2.calcHist([comparison_image], [0,1,2], None, [8,8,8], [0,256, 0,256, 0,256])
            # normalising the comparison image histogram
            comparison_hist_norm = cv2.normalize(comparison_hist, comparison_hist, 0,255, cv2.NORM_MINMAX)    
            # calculate the chisquare distance
            distance = round(cv2.compareHist(target_hist_norm, comparison_hist_norm, cv2.HISTCMP_CHISQR), 2)
            # append info to dataframe
            # add a row to the dataframe (DataFrame.append was removed in pandas 2.0)
            data.loc[len(data)] = [image, distance]
    
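    # save as csv in current directory, without the index column
    data.to_csv(f"{target_name}_comparison.csv", index=False)
    # print that file has been saved
    print(f"output file is saved in current directory as {target_name}_comparison.csv")

# run main() when the file is executed as a script
if __name__ == "__main__":
    main()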
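
Once the distances are saved, the 'search' step amounts to sorting them, since a smaller chi-square distance means a more similar colour histogram. The sketch below shows this with pandas on a small in-memory table; the filenames and distance values are made up for illustration, and in practice the table would be read from the saved .csv with pd.read_csv.

```python
import pandas as pd

# Hypothetical results in the same layout as the saved .csv
results = pd.DataFrame({
    "filename": ["image_0002.jpg", "image_0003.jpg", "image_0004.jpg", "image_0005.jpg"],
    "distance": [1523.41, 310.07, 2980.55, 845.90],
})

# Smallest distance first: the top rows are the closest matches to the target
closest_matches = results.sort_values("distance").head(3).reset_index(drop=True)
print(closest_matches)
```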