# Determine whether 2 images are equal or not

A perceptual hash (pHash) is a fingerprint computed from the visual content of an image, so that images which look alike produce hashes that are alike.

<img src='https://miro.medium.com/max/1440/1*_RT4M10OFKqmA8XYavEN3Q.png' style='width:300px;' />


Refer: [phash.org](https://www.phash.org/)

**This hash is a fingerprint, which can be used to compare images by calculating the Hamming distance.**

<img src='https://lh3.googleusercontent.com/proxy/uBDHlo_Itx9EoM97qU2ifhWMH4ovCBSAHk4d8O98ry_2-R8YuBNDSgrz0Qvt2_0oHuSRdRe98Wr2tfN77el0CopNW8WxDbnzWqj6aRqe3f-QEZPjdTmNuXB-6Yg' style='width:300px;' />

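The Hamming distance is the number of bit positions in which two hashes differ.
<br/>
Unlike a cryptographic hash, a perceptual hash changes only slightly when the image changes slightly, so a small distance indicates similar images. Several hashing techniques work this way: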

* average hashing (aHash)
* perceptual hashing (pHash)
* difference hashing (dHash)
* wavelet hashing (wHash)
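
To make the Hamming distance concrete, here is a minimal sketch that compares two hashes written as hex strings, which is how the hashes in this notebook are printed. The function name `hamming_distance` and the two sample hashes are made up for illustration.

```python
def hamming_distance(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equal-length hex hashes differ."""
    if len(hex_a) != len(hex_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree.
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical 64-bit hashes written as 16 hex digits (not taken from the dataset).
hash_a = "ffff000000000000"
hash_b = "fffe000000000001"
print(hamming_distance(hash_a, hash_b))
```

With `imagehash`, subtracting one `ImageHash` object from another returns the same count, and `imagehash.hex_to_hash` turns a hex string into an `ImageHash` first.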

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
from PIL import Image
import imagehash

In [None]:
train = pd.read_csv('/kaggle/input/shopee-product-matching/train.csv')
train.head()


### [average hashing (aHash)](http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html)
Average hash: the image is reduced to a small grayscale thumbnail, and each pixel yields a 1 if it is greater than or equal to the thumbnail's average brightness and a 0 otherwise.

In [None]:
# Path to the first training image, plus the pHash that ships with train.csv
path = '/kaggle/input/shopee-product-matching/train_images/'
src = path + train.iloc[0].image
train_hash = train.iloc[0].image_phash

# Average hash of the same image
average_hash = imagehash.average_hash(Image.open(src))

print('train_hash:\t', train_hash)
print('average_hash:\t', average_hash)
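
The aHash rule can be sketched without the library. This is an illustration only: it assumes the image has already been shrunk to a tiny grayscale grid, and both the function name `average_hash_bits` and the 4x4 sample values are hypothetical.

```python
def average_hash_bits(pixels):
    """Return one bit per pixel: 1 if the pixel is >= the mean brightness, else 0."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [[1 if value >= mean else 0 for value in row] for row in pixels]


# Hypothetical 4x4 grayscale thumbnail (0-255): dark on the left, bright on the right.
thumbnail = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
for row in average_hash_bits(thumbnail):
    print(row)
```

Each row comes out as two zeros followed by two ones, mirroring the dark and bright halves of the thumbnail.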

### perceptual hashing ([pHash](https://fullstackml.com/wavelet-image-hash-in-python-3504fdd282b5))
Perceptual hash: works like aHash, but first applies a Discrete Cosine Transform (DCT) and thresholds the resulting coefficients in the frequency domain instead of the raw pixels.


In [None]:
p_hash = imagehash.phash(Image.open(src))
print('phash:\t\t',p_hash)
print('train_hash:\t',train_hash)
print('average_hash:\t',average_hash)
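
As a rough sketch of the idea behind pHash, the code below runs a naive 2-D DCT over a tiny grayscale grid, keeps the lowest-frequency coefficients, and thresholds them. It is not the library's implementation: the helper names, the `keep` parameter, and the sample grid are hypothetical, and the median is used as the threshold, which is my understanding of what `imagehash.phash` does.

```python
import math
from statistics import median


def dct_1d(values):
    """Naive, unnormalized DCT-II of a sequence."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * k / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def dct_2d(pixels):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in pixels]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def phash_bits(pixels, keep=2):
    """Threshold the keep x keep lowest-frequency DCT coefficients at their median."""
    coeffs = dct_2d(pixels)
    low = [value for row in coeffs[:keep] for value in row[:keep]]
    med = median(low)
    return [1 if value > med else 0 for value in low]


# Hypothetical 4x4 grayscale thumbnail (0-255).
thumbnail = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
print(phash_bits(thumbnail))
```

Because only the low-frequency coefficients are kept, fine detail and uniform brightness scaling have little effect on the resulting bits.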

### difference hashing (dHash)
Difference hash (also called gradient hash): computes the brightness difference between each pixel and its neighbor and records one bit per pair, 1 if the brightness increases and 0 otherwise.
![](https://miro.medium.com/fit/c/184/184/1*8oFOoawtCHeglebMnN9ryA.png)

In [None]:
d_hash = imagehash.dhash(Image.open(src))
print('dhash:\t\t',d_hash)
print('phash:\t\t',p_hash)
print('train_hash:\t',train_hash)
print('average_hash:\t',average_hash)
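
A common formulation of dHash, and the one I understand `imagehash.dhash` to follow, compares each pixel with its right-hand neighbor. The sketch below assumes the image is already a tiny grayscale grid; the function name `difference_hash_bits` and the sample values are hypothetical.

```python
def difference_hash_bits(pixels):
    """One bit per horizontal neighbor pair: 1 if brightness increases to the right."""
    return [
        [1 if right > left else 0 for left, right in zip(row, row[1:])]
        for row in pixels
    ]


# Hypothetical 3x4 grayscale thumbnail (0-255); each row yields 3 bits.
gradient_sample = [
    [10, 50, 40, 90],
    [80, 70, 60, 65],
    [5, 5, 30, 20],
]
for row in difference_hash_bits(gradient_sample):
    print(row)
```

A row of `n` pixels produces `n - 1` bits, which is why a thumbnail one pixel wider than the hash size is needed to obtain a square hash.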

### wavelet hashing (wHash)
Wavelet hash, a later addition to the `imagehash` library: it works in the frequency domain like pHash, but uses a Discrete Wavelet Transform (DWT) instead of the DCT.
![](https://www.researchgate.net/profile/Mariusz_Jakubowski/publication/224067859/figure/fig1/AS:375162487754761@1466457115717/Random-tiling-of-Lenas-coarse-subband-in-a-threelevel-wavelet-decomposition-using-Haar.png/)
Refer: [fullstackml.com](https://fullstackml.com/wavelet-image-hash-in-python-3504fdd282b5)

In [None]:
w_hash = imagehash.whash(Image.open(src))
print('whash:\t\t',w_hash)
print('dhash:\t\t',d_hash)
print('phash:\t\t',p_hash)
print('train_hash:\t',train_hash)
print('average_hash:\t',average_hash)
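
To give a feel for the wavelet variant, here is a heavily simplified sketch: a single level of the 2-D Haar transform, keeping only the approximation band, thresholded at its median. The library's `whash` relies on PyWavelets and does more than this, so treat the helper names and the sample grid as hypothetical illustrations rather than a reimplementation.

```python
from statistics import median


def haar_approximation(pixels):
    """One level of the 2-D Haar DWT, keeping only the approximation (LL) band."""
    return [
        [
            (pixels[r][c] + pixels[r][c + 1] + pixels[r + 1][c] + pixels[r + 1][c + 1]) / 2
            for c in range(0, len(pixels[0]) - 1, 2)
        ]
        for r in range(0, len(pixels) - 1, 2)
    ]


def whash_bits(pixels):
    """Threshold the Haar approximation coefficients at their median."""
    flat = [value for row in haar_approximation(pixels) for value in row]
    med = median(flat)
    return [1 if value > med else 0 for value in flat]


# Hypothetical 4x4 grayscale thumbnail (0-255): dark on the left, bright on the right.
wavelet_sample = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
print(whash_bits(wavelet_sample))
```

Each 2x2 block of pixels collapses into one coefficient, so a 4x4 grid yields a 4-bit hash here.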