# Search Engine

Credits: Thanks to PyImageSearch for the tutorial this project is built from: https://www.pyimagesearch.com/2014/01/27/hobbits-and-histograms-a-how-to-guide-to-building-your-first-image-search-engine-in-python/


### The 4 Steps to Building an Image Search Engine
On the most basic level, there are four steps to building an image search engine:

1. Define your descriptor: What type of descriptor are you going to use? Are you describing color? Texture? Shape?
2. Index your dataset: Apply your descriptor to each image in your dataset, extracting a set of features.
3. Define your similarity metric: How are you going to define how “similar” two images are? You’ll likely be using some sort of distance metric. Common choices include Euclidean, Cityblock (Manhattan), Cosine, and chi-squared to name a few.
4. Searching: To perform a search, apply your descriptor to your query image, and then use your distance metric to rank the images in your index by how similar they are to the query image. Sort your results by similarity and then examine them.
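
The distance metrics named in step 3 can each be written in a few lines. The following is a minimal sketch in plain Python, assuming feature vectors are equal-length lists of non-negative floats. The function names are illustrative and not taken from any library, and the chi-squared form shown (with a factor of 0.5 and a small `eps` to avoid division by zero) is one common variant.

```python
import math


def euclidean(a, b):
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cityblock(a, b):
    """Manhattan (L1) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def chi_squared(a, b, eps=1e-10):
    """Chi-squared distance, often used to compare histograms."""
    return 0.5 * sum((x - y) ** 2 / (x + y + eps) for x, y in zip(a, b))


# Two toy 4-bin histograms (made-up values for illustration)
h1 = [0.5, 0.5, 0.0, 0.0]
h2 = [0.0, 0.0, 0.5, 0.5]
for name, fn in [("euclidean", euclidean), ("cityblock", cityblock),
                 ("cosine", cosine_distance), ("chi-squared", chi_squared)]:
    # Distance to a different histogram, then distance to itself
    print(name, round(fn(h1, h2), 4), round(fn(h1, h1), 4))
```

For every metric here, a smaller value means the two images are more alike, and comparing a histogram with itself gives a distance of (approximately) zero.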


In [5]:
import numpy as np
import matplotlib.pyplot as plt
import imutils
import cv2
import os
from imutils.paths import list_images
import pickle

In [11]:
args = {
    "dataset" : "dataset/images/",
    "index" : "index.cpickle" 
}

In [7]:
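class RGBHistogram:

    def __init__(self, bins):
        # Number of bins per channel, e.g. [8, 8, 8]
        self.bins = bins

    def describe(self, image):
        # Compute a 3D color histogram over all three channels
        # (OpenCV loads images in BGR order), using self.bins bins per channel
        hist = cv2.calcHist([image], [0, 1, 2], None, self.bins,
                            [0, 256, 0, 256, 0, 256])
        # Normalize the histogram in place
        hist = cv2.normalize(hist, hist)
        # Flatten the 3D histogram into a 1D feature vector
        return hist.flatten()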
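
To make the shape of the feature vector concrete, here is a minimal pure-Python sketch of the same binning idea, without OpenCV. It assumes pixels are `(c0, c1, c2)` tuples of 8-bit values. It normalizes so that the bins sum to 1, which is a choice made for this sketch and may differ from what `cv2.normalize` does with its default arguments. The function name `color_histogram` is hypothetical.

```python
def color_histogram(pixels, bins=(8, 8, 8)):
    """Flattened, normalized 3D color histogram of 8-bit pixels."""
    b0, b1, b2 = bins
    hist = [0.0] * (b0 * b1 * b2)
    for c0, c1, c2 in pixels:
        # Map each 0..255 channel value to its bin index
        i0 = c0 * b0 // 256
        i1 = c1 * b1 // 256
        i2 = c2 * b2 // 256
        # Row-major position in the flattened histogram
        hist[(i0 * b1 + i1) * b2 + i2] += 1.0
    total = sum(hist)
    # Normalize so the bins sum to 1 (assumption of this sketch)
    return [h / total for h in hist] if total else hist


# A toy "image": three dark pixels and one bright pixel
pixels = [(0, 0, 0), (10, 20, 30), (31, 31, 31), (255, 255, 255)]
features = color_histogram(pixels)
print(len(features))   # 8 * 8 * 8 = 512 values
print(features[0])     # share of pixels in the darkest bin
```

With 8 bins per channel, every image is reduced to a vector of 512 numbers regardless of its resolution, which is what makes the indexed images directly comparable.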
        

In [9]:
# Step 2 : Indexing our Dataset
# initialize the index dictionary to store our quantified
# images, with the 'key' of the dictionary being the image
# filename and the 'value' our computed features
index = {}
desc = RGBHistogram([8, 8, 8])

In [10]:
for imagepath in list_images(args["dataset"]):
    # Use the file name as the image ID (the dictionary key)
    k = os.path.basename(imagepath)
    image = cv2.imread(imagepath)
    features = desc.describe(image)
    index[k] = features
    

In [14]:
# Serialize the index to disk; the file is closed automatically
with open(args["index"], "wb") as f:
    pickle.dump(index, f)
print("Done Indexing {} Images ".format(len(index)))

Done Indexing 25 Images 
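
Before searching, the saved index has to be read back from disk. Below is a minimal sketch of that round trip, using a small made-up index and a temporary file rather than the real `index.cpickle`; the file names and feature values are placeholders.

```python
import os
import pickle
import tempfile

# Toy index: file name -> feature vector (values are made up)
toy_index = {"a.png": [0.75, 0.25], "b.png": [0.1, 0.9]}

path = os.path.join(tempfile.mkdtemp(), "toy_index.pickle")

# Write the index to disk in binary mode
with open(path, "wb") as f:
    pickle.dump(toy_index, f)

# Read it back, e.g. in a separate search script
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == toy_index)
```

Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code embedded in untrusted data.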


In [15]:
# Step 3 : Searching
# Search on basis of similarities
class Searcher:
    
    def __init__(self, index):
        self.index = index
    
    def search(self, queryFeatures):
        results = {}
        for (k,features) in self.index.items():
            # Use the chi-squared distance to compare the two histograms
            d = self.chi2distance(features, queryFeatures)
            results[k] = d
        results = sorted([(v,k) for (k,v) in results.items()])
        return results
    
    def chi2distance(self, histA, histB, eps = 1e-10):
        d = 0.5 * np.sum([((a-b) ** 2) / (a+b+eps) for (a,b) in zip(histA,histB)])
        return d        
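
As a usage illustration, the ranking logic of `search` can be exercised on a tiny hand-made index. The sketch below restates the chi-squared distance in plain Python so that it runs on its own. The file names and histogram values are invented, and `rank` is a hypothetical helper, not part of the class above.

```python
def chi2_distance(hist_a, hist_b, eps=1e-10):
    """Chi-squared distance between two equal-length histograms."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps)
                     for a, b in zip(hist_a, hist_b))


def rank(index, query_features, top_k=3):
    """Return the top_k (distance, key) pairs, most similar first."""
    scored = [(chi2_distance(feats, query_features), key)
              for key, feats in index.items()]
    return sorted(scored)[:top_k]


# Invented 3-bin histograms keyed by file name
toy_index = {
    "dark.png": [0.8, 0.1, 0.1],
    "mid.png": [0.1, 0.8, 0.1],
    "bright.png": [0.1, 0.1, 0.8],
}
query = [0.7, 0.2, 0.1]

for distance, name in rank(toy_index, query):
    print(f"{name}: {distance:.4f}")
```

Because the results are sorted by ascending distance, the first entry is the closest match, and a distance of 0 would mean the histograms are identical.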