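## What Is Image Hashing?

- It is also called **perceptual hashing**
- It examines the contents of an image
- It constructs a hash value that identifies an input image based on its contents, so that visually similar images get identical or nearby hashes
- There are many kinds of image hashing, such as aHash (average hash), pHash (perceptual hash), and dHash (difference hash)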

## Perceptual Image Hashing / Difference Hashing (dHash)
1. Convert to grayscale (reduce the number of channels)
    - hashing is faster because only one channel is examined
    - images that are identical but have slightly different color spaces can still match
    
    
2. Resize (reduce the size)
    - ignore the aspect ratio of the image
    - resize to a small fixed size such as 9x8, 5x7, or 10x11 (width and height differ; with 9x8, the 9 columns give 8 adjacent-pixel comparisons per row)


3. Compute the difference
    - compare adjacent pixels of the resized image
    - the chosen size determines the hash length, so different sizes (9x8, 13x15, 14x17 ...) hash the same image differently
    
    
4. Build the hash
    - assign one bit per comparison and concatenate the bits into the resulting hash
    - given a difference image D and its corresponding set of pixels P, set each bit with the test: 1 if P[x] > P[x + 1], otherwise 0
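
The four steps above can be sketched in plain Python for an image that has already been converted to grayscale and resized. This is a minimal illustration rather than the notebook's implementation: `dhash_bits` and the 2x3 `grid` are hypothetical, and a real 9x8 input would produce a 64-bit value.

```python
def dhash_bits(gray_rows):
    """Pack the row-wise test P[x] > P[x + 1] into a single integer."""
    value = 0
    for row in gray_rows:
        # compare each pixel with its right-hand neighbour
        for left, right in zip(row, row[1:]):
            value = (value << 1) | int(left > right)
    return value


# hypothetical grayscale grid: 2 rows, 3 columns, so 2 comparisons per row
grid = [
    [10, 20, 15],
    [30, 30, 5],
]
hash_value = dhash_bits(grid)
print(format(hash_value, '04b'))  # prints 0101
```

Each row contributes one bit fewer than it has columns, which is why the resized image is one column wider than the number of bits wanted per row.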
    

### Benefits of dHash
- The image hash won’t change if the aspect ratio of the input image changes (since the aspect ratio is ignored).
- Adjusting brightness or contrast will either (1) not change the hash value or (2) change it only slightly, so the hashes stay close together.
- Difference hashing is extremely fast.
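
The brightness and contrast claim follows from the hash recording only which neighbour is brighter, not by how much. A small sketch, assuming a hypothetical helper `row_bits` and made-up pixel values, and ignoring clipping at 0 or 255 (which is one place where real hashes can change slightly):

```python
def row_bits(pixels):
    """Return the bits for the test P[x] > P[x + 1] along one row."""
    return [int(a > b) for a, b in zip(pixels, pixels[1:])]


row = [12, 40, 40, 7, 90]            # hypothetical grayscale row
brighter = [p + 25 for p in row]     # uniform brightness shift
more_contrast = [p * 2 for p in row]  # uniform contrast gain

print(row_bits(row))                             # prints [0, 0, 1, 0]
print(row_bits(brighter) == row_bits(row))       # prints True
print(row_bits(more_contrast) == row_bits(row))  # prints True
```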

### Comparing difference hashes
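- Use the [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) to compare hashes: it counts the positions at which two hashes differ, so a smaller distance means more similar images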
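
For hashes stored as integers, the Hamming distance is the number of set bits in their XOR. A minimal sketch with a hypothetical helper `hamming` and two made-up 8-bit hashes:

```python
def hamming(hash_a, hash_b):
    """Count the bit positions at which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count('1')


a = 0b10110100
b = 0b10010110
print(hamming(a, b))  # prints 2
print(hamming(a, a))  # prints 0
```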

## Hashing with OpenCV + Python

In [7]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import cv2

import os
import time
from hashlib import md5
import scipy

In [8]:
IMAGE_DIR = '../../../Data/Liver_png/1859/masked'

os.chdir(IMAGE_DIR)
os.getcwd()

'/home/jaewon/MyWork/Data/Liver_png/1859/masked'

In [9]:
image_files = os.listdir()
print(len(image_files))

13


In [10]:
image_files[0]

'1859-0007_masked.png'

In [11]:
cv2.imread(image_files[0]).shape

(600, 800, 3)

### Helper Functions

In [12]:
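def filter_images(images):
    """Keep only the files that OpenCV can read as 3-channel images."""
    image_list = []
    for image in images:
        img = cv2.imread(image)
        # cv2.imread returns None (it does not raise) when a file cannot be read
        if img is not None and img.ndim == 3 and img.shape[2] == 3:
            image_list.append(image)
        else:
            print(f'Skipping unreadable or non-3-channel file: {image}')
    return image_list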

# convert the image to grayscale
def img_gray(image):
    image = cv2.imread(image)
    # cv2.imread returns channels in BGR order, so the luma weights are listed as B, G, R
    return np.average(image, weights=[0.114, 0.587, 0.299], axis=2)

# resize the image and flatten it in row-major and column-major order
def resize(image, height=30, width=30):
    # cv2.resize expects dsize as (width, height)
    resized = cv2.resize(image, (width, height), interpolation=cv2.INTER_AREA)
    row_res = resized.flatten()
    col_res = resized.flatten('F')
    return row_res, col_res

# gradient direction based on intensity
def intensity_diff(row_res, col_res):
    # True where the next pixel is brighter than the current one
    difference_row = np.diff(row_res) > 0
    difference_col = np.diff(col_res) > 0
    # concatenate the row-wise and column-wise bits into one boolean vector
    return np.vstack((difference_row, difference_col)).flatten()
    
def file_hash(array):
    # hash the raw bytes so that non-contiguous arrays are handled as well
    return md5(array.tobytes()).hexdigest()

def difference_score(image, height = 30, width = 30):
    gray = img_gray(image)
    row_res, col_res = resize(gray, height, width)
    difference = intensity_diff(row_res, col_res)
    
    return difference

def difference_score_dict_hash(image_list):
    ds_dict = {}      # md5 digest -> first image seen with that digest
    duplicates = []   # (duplicate, original) pairs with identical digests
    hash_ds = []      # difference vector of every image, in input order
    for image in image_list:
        ds = difference_score(image)
        hash_ds.append(ds)
        filehash = file_hash(ds)
        if filehash not in ds_dict:
            ds_dict[filehash] = image
        else:
            duplicates.append((image, ds_dict[filehash]))

    return duplicates, ds_dict, hash_ds

In [13]:
image_files = filter_images(image_files)
duplicates, ds_dict, hash_ds = difference_score_dict_hash(image_files)

In [14]:
len(duplicates)

0
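
An empty result here is expected whenever no two images produce bit-for-bit identical difference vectors: MD5 is a cryptographic digest, so changing a single bit yields an unrelated digest, and the dictionary lookup only catches exact matches. A small sketch, with hypothetical byte strings standing in for two difference vectors:

```python
from hashlib import md5

# hypothetical difference vectors, one byte per bit
bits_a = bytes([1, 0, 1, 1, 0, 0, 1, 0])
bits_b = bytes([1, 0, 1, 1, 0, 1, 1, 0])  # differs in one position

differing = sum(x != y for x, y in zip(bits_a, bits_b))
same_digest = md5(bits_a).hexdigest() == md5(bits_b).hexdigest()

print(differing)    # prints 1
print(same_digest)  # prints False
```

This is why near-duplicates need a distance measure on the raw bits rather than a digest comparison.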

In [15]:
for file_names in duplicates[:30]:
    try:
        # plt.imread loads the files in RGB order for display
        plt.subplot(121), plt.imshow(plt.imread(file_names[0]))
        plt.title('Duplicate'), plt.xticks([]), plt.yticks([])

        plt.subplot(122), plt.imshow(plt.imread(file_names[1]))
        plt.title('Original'), plt.xticks([]), plt.yticks([])
        plt.show()

    except OSError:
        continue

In [21]:
import scipy.spatial

def hamming_distance(image, image2):
    # scipy returns the fraction of positions that differ (between 0 and 1), not a raw count
    score = scipy.spatial.distance.hamming(image, image2)
    return score


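# Hamming: keep the raw difference vectors so they can be compared pairwise
def difference_score_dict(image_list):
    ds_dict = {}      # file name -> difference vector
    duplicates = []   # filled only if the same file name appears twice in image_list
    for image in image_list:
        ds = difference_score(image)

        if image not in ds_dict:
            ds_dict[image] = ds
        else:
            duplicates.append((image, ds_dict[image]))

    return duplicates, ds_dict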

In [22]:
image_files = filter_images(image_files)
duplicates, ds_dict = difference_score_dict(image_files)

In [23]:
len(duplicates)

0

In [24]:
len(ds_dict.keys())

13

In [25]:
import itertools
# compare every pair of images; pairs that differ in under 10% of positions count as duplicates
for k1, k2 in itertools.combinations(ds_dict, 2):
    if hamming_distance(ds_dict[k1], ds_dict[k2]) < .10:
        duplicates.append((k1, k2))

In [26]:
len(duplicates)

1

In [35]:
duplicates

[('1859-0002_masked.png', '1859-0013_masked.png')]
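
The pairwise search above can be illustrated on toy data. This sketch assumes hypothetical 20-bit vectors and file names, and reimplements the normalized Hamming distance in plain Python (the fraction of differing positions, which is what `scipy.spatial.distance.hamming` returns) so that it runs without the image files:

```python
from itertools import combinations


def normalized_hamming(u, v):
    """Fraction of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v)) / len(u)


base = [1, 0] * 10                     # hypothetical 20-bit difference vector
vectors = {
    'a.png': base,
    'b.png': base[:-1] + [1],          # 1 of 20 positions differs from a.png
    'c.png': [1 - bit for bit in base],  # every position differs from a.png
}

threshold = 0.10  # same cut-off as the notebook cell above
near = [
    (k1, k2)
    for k1, k2 in combinations(vectors, 2)
    if normalized_hamming(vectors[k1], vectors[k2]) < threshold
]
print(near)  # prints [('a.png', 'b.png')]
```

Comparing every pair costs n(n-1)/2 distance computations, which is 78 for the 13 images here, so this approach suits small collections.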

In [29]:
for file_names in duplicates[:1]:
    try:
        # plt.imread loads the files in RGB order for display
        plt.subplot(121), plt.imshow(plt.imread(file_names[0]))
        plt.title('Duplicate'), plt.xticks([]), plt.yticks([])

        plt.subplot(122), plt.imshow(plt.imread(file_names[1]))
        plt.title('Original'), plt.xticks([]), plt.yticks([])
        plt.show()

    except OSError:
        continue

AttributeError: module 'PIL.TiffTags' has no attribute 'IFD'

<Figure size 432x288 with 2 Axes>

Sources:
- https://www.pyimagesearch.com/2017/11/27/image-hashing-opencv-python/
- http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html
- https://www.pyimagesearch.com/2019/08/26/building-an-image-hashing-search-engine-with-vp-trees-and-opencv/
- https://github.com/moondra2017/Computer-Vision/blob/master/DHASH%20AND%20HAMMING%20DISTANCE_Lesson.ipynb