# Computing similarity measures between two images

In [4]:
import matplotlib.pyplot as plt
import numpy as np
from similarity.similarity import Measures

Similarity is the state or fact of being similar, and a similarity measure quantifies how much two objects are alike. In a data mining context, similarity is usually expressed through a distance in a space whose dimensions represent features of the objects. If the distance is small, the two objects are very similar, whereas if the distance is large we observe a low degree of similarity.
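To make the link between distance and similarity concrete, here is a minimal sketch. It assumes Euclidean distance and uses `1 / (1 + d)`, which is one common convention (not the only one) for mapping a distance `d` in `[0, ∞)` to a similarity in `(0, 1]`. The helper names are illustrative.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))

def distance_to_similarity(d):
    """Map a distance in [0, inf) to a similarity in (0, 1]; identical points give 1."""
    return 1.0 / (1.0 + d)

# Nearby points get a similarity close to 1, distant points a value close to 0
close = distance_to_similarity(euclidean_distance([1, 2], [1, 2.5]))
far = distance_to_similarity(euclidean_distance([1, 2], [10, 20]))
print(close > far)  # True
```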

There are many similarity measures; here we look at two of the most widely used ones.

## 1. Cosine similarity measure

Cosine similarity is a measure of how similar two documents (or, more generally, two vectors) are irrespective of their size. Mathematically, it is the cosine of the angle between two vectors in a multi-dimensional space. Cosine similarity is advantageous because even if two similar documents are far apart by Euclidean distance (due to the size of the documents), they may still be oriented close together. The smaller the angle, the higher the cosine similarity.

![Cosine similarity](images/cosine_similarity.png)

$$\cos \theta=\frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\|\|\vec{b}\|}=\frac{\sum_{i=1}^{n} a_{i} b_{i}}{\sqrt{\sum_{i=1}^{n} a_{i}^{2}} \sqrt{\sum_{i=1}^{n} b_{i}^{2}}}$$
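The formula translates directly into NumPy. The sketch below (the name `cosine_sim` is illustrative; it is not the implementation of `Measures.cosine`) computes the dot product in the numerator and the product of the two Euclidean norms in the denominator. The second call scales one vector by 10 to show the size invariance described above.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between vectors a and b, following the formula above."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Numerator: dot product sum_i a_i * b_i
    dot = np.dot(a, b)
    # Denominator: product of the Euclidean norms ||a|| * ||b||
    norms = np.linalg.norm(a) * np.linalg.norm(b)
    if norms == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(dot / norms)

print(cosine_sim([0, 1], [1, 1]))    # ≈ 0.7071
# Scaling a vector changes its length but not its direction, so the value is unchanged
print(cosine_sim([0, 1], [10, 10]))  # ≈ 0.7071
```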

In [5]:
# Read images
img_1 = plt.imread('images/image1_.jpg')
img_2 = plt.imread('images/image2_.jpg')
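The images read above are not used in the cosine cells that follow. As a sketch, they could be compared with cosine similarity by flattening each one into a pixel vector. The helper name `image_cosine` is hypothetical, and it assumes both images have the same shape. JPEG images loaded with `plt.imread` are typically `uint8`, and a NumPy dot product computed in `uint8` wraps around on overflow, so the sketch casts to float first.

```python
import numpy as np

def image_cosine(img_a, img_b):
    """Cosine similarity between two same-shape images treated as flat pixel vectors."""
    if img_a.shape != img_b.shape:
        raise ValueError(f"shape mismatch: {img_a.shape} vs {img_b.shape}")
    # Cast to float first: a uint8 dot product would silently overflow
    a = np.asarray(img_a, dtype=np.float64).ravel()
    b = np.asarray(img_b, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for an all-zero image")
    return float(a @ b / denom)
```

With the images above, this would be called as `image_cosine(img_1, img_2)`.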

In [6]:
c = Measures.cosine([0, 1], [1, 1])

In [7]:
c

0.7071067811865475

In [8]:
# Let's use the built-in function in scikit-learn to compute the cosine similarity
from sklearn.metrics.pairwise import cosine_similarity

In [9]:
cosine_similarity([[0, 1]], [[1, 1]])

array([[0.70710678]])

Indeed, we get the same result.
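`cosine_similarity` takes 2-D inputs (one sample per row) and returns the matrix of pairwise similarities, which is why the single pair above comes back as `array([[0.70710678]])`. A NumPy sketch of the same pairwise computation normalizes every row to unit length and then takes a matrix product. The name `pairwise_cosine` is illustrative, and the sketch assumes no row is all zeros.

```python
import numpy as np

def pairwise_cosine(X, Y):
    """Matrix S with S[i, j] = cosine similarity between row i of X and row j of Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Divide each row by its Euclidean norm so that dot products become cosines
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    return Xn @ Yn.T

S = pairwise_cosine([[0, 1], [1, 0]], [[1, 1], [0, 2]])
print(S.shape)  # (2, 2)
```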

## 2. Jaccard similarity measure
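For two sets $A$ and $B$, the Jaccard index is the size of their intersection divided by the size of their union, $J(A,B) = \frac{|A \cap B|}{|A \cup B|}$. A binary image can be viewed as the set of pixel positions that are on, so for two boolean arrays the intersection is the element-wise AND and the union is the element-wise OR. The following is a minimal sketch of that binary case; the name `jaccard_binary` is illustrative, and it is not the implementation of `Measures.jaccard`.

```python
import numpy as np

def jaccard_binary(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two boolean arrays of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    intersection = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    # Convention assumed here: two all-False arrays count as identical (other libraries may return 0)
    return 1.0 if union == 0 else float(intersection / union)

print(jaccard_binary([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.3333333333333333
```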

In [10]:
img_1 = plt.imread('images/image1_.jpg')
img_2 = plt.imread('images/image2_.jpg')

In [11]:
Measures.jaccard(img_1, img_2)

0.1423841059602649

In [12]:
# Use the built-in function in scikit-learn
from sklearn.metrics import jaccard_score

In [13]:
# Convert the images into boolean arrays first (threshold = 175)
from utils.tr_color_2_bi_img import tr_color_2_bi_img
bi_img_1 = tr_color_2_bi_img(img_1, 175).flatten()
bi_img_2 = tr_color_2_bi_img(img_2, 175).flatten()

jaccard_score(bi_img_1, bi_img_2)

0.1423841059602649
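`tr_color_2_bi_img` comes from the project's own `utils` package, and its source is not shown here. As an illustration only, a hypothetical `to_binary` helper could binarize an image by averaging the color channels and comparing the result against the threshold. Whether the real helper marks bright or dark pixels as `True`, and how it converts color to grayscale, are assumptions; an RGBA image would also need its alpha channel dropped first.

```python
import numpy as np

def to_binary(img, threshold):
    """Hypothetical stand-in for a color-to-binary conversion.

    Assumes an RGB or grayscale image; a pixel is True when its mean
    channel intensity exceeds the threshold.
    """
    img = np.asarray(img, dtype=np.float64)
    # Average over the channel axis for color images; grayscale is used as-is
    gray = img.mean(axis=-1) if img.ndim == 3 else img
    return gray > threshold

rgb = np.array([[[255, 255, 255], [0, 0, 0]],
                [[200, 180, 160], [100, 100, 100]]], dtype=np.uint8)
print(to_binary(rgb, 175).tolist())  # [[True, False], [True, False]]
```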

In [14]:
# We got the same result.

In [15]:
# Jaccard similarity between non-binary (integer) arrays
Measures.jaccard([[1,2,3], [1,2,3]], [[1,3,6], [1,2,3]], as_binary=False)

0.6470588235294118
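For non-negative, non-binary data, one common generalization is the weighted Jaccard (Ruzicka) similarity, which divides the sum of element-wise minima by the sum of element-wise maxima. This is a swapped-in definition for illustration: on the same inputs it returns 0.75 rather than the 0.647 printed above, so `Measures.jaccard` with `as_binary=False` uses a different definition, and its source should be checked before comparing numbers across tools. The name `weighted_jaccard` is illustrative.

```python
import numpy as np

def weighted_jaccard(a, b):
    """Weighted Jaccard: sum(min(a_i, b_i)) / sum(max(a_i, b_i)) for non-negative arrays."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("inputs must have the same number of elements")
    if (a < 0).any() or (b < 0).any():
        raise ValueError("this sketch is defined for non-negative values only")
    denom = np.maximum(a, b).sum()
    # Convention assumed here: two all-zero inputs count as identical
    return 1.0 if denom == 0 else float(np.minimum(a, b).sum() / denom)

print(weighted_jaccard([[1, 2, 3], [1, 2, 3]], [[1, 3, 6], [1, 2, 3]]))  # 0.75
```

For 0/1 inputs, the element-wise minimum acts as AND and the maximum as OR, so this reduces to the ordinary binary Jaccard index.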