## Copy-Move Forgery Detection (DBSCAN Clustering)

This notebook implements copy-move forgery detection with DBSCAN clustering, using OpenCV for SIFT feature extraction and scikit-learn for the clustering. The technique detects forgeries in many cases, but its accuracy is limited and can be improved with other techniques.

In [3]:
print("CoMoFoD using DBSCAN Clustering")

CoMoFoD using DBSCAN Clustering


Installing OpenCV library for processing images

In [6]:
!pip install opencv-python

Collecting opencv-python
  Downloading opencv_python-4.5.2.54-cp39-cp39-win_amd64.whl (34.7 MB)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.5.2.54



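Installing an older version of OpenCV to use cv2.xfeatures2d.SIFT_create()

Because SIFT was patented, it was kept out of the main OpenCV package and placed in the contrib `xfeatures2d` module, so we install an older opencv-contrib-python build that provides it. (The patent expired in 2020, and OpenCV 4.4.0 and later expose SIFT in the main package as cv2.SIFT_create().)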

In [9]:
!pip install opencv-contrib-python==3.4.11.45



Importing libraries

In [13]:
import os
import cv2
import matplotlib.pyplot as plt
import re
from sklearn.cluster import DBSCAN  # For DBSCAN
import numpy as np
%matplotlib inline

Creating a list of the paths of the images in the dataset folder (MICC-F220)

In [14]:
image_paths=[] #List to store path of all images

for dirname, _, filenames in os.walk('./MICC-F220'):
    for filename in filenames:
        if '.txt' in filename:
            continue
        image_paths.append(os.path.join(dirname, filename))

Dividing the dataset into original and tampered sets

In [15]:
original_images=[]
tampered_images=[]

for path in image_paths:
    
    if 'tamp' in path:              # Tampered image file names contain 'tamp'
        tampered_images.append(path)
    else:
        original_images.append(path)
tampered_images.sort()
original_images.sort()
print(len(original_images),len(tampered_images))

110 110


### Helper Functions

* **plot_image(img, size=(8,8))**: Plots an image.
* **siftDetector(img)**: Extracts SIFT keypoints and descriptors.
* **get_original(tampered)**: Given the path of a tampered image, returns the matching original image and its index in `original_images`; the index is -1 if no original is found.
* **show_sift_features(color_img, kp, size=(8,8))**: Draws the extracted SIFT keypoints `kp` on the image `color_img`.

In [None]:
def plot_image(img,size=(8,8)):
    plt.figure(figsize = size)
    plt.imshow(cv2.cvtColor(img,cv2.COLOR_BGR2RGB)) # OpenCV stores images as BGR, matplotlib expects RGB

def siftDetector(img):
    # OpenCV >= 4.4 exposes SIFT as cv2.SIFT_create(); older contrib builds use cv2.xfeatures2d
    sift = cv2.SIFT_create() if hasattr(cv2, 'SIFT_create') else cv2.xfeatures2d.SIFT_create()
    gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
    key_points, descriptors = sift.detectAndCompute(gray, None)
    return key_points,descriptors

def get_original(tampered):
    # Returns (image, index) of the matching original, or (None, -1) if there is none
    name=re.findall(r'(.*)tamp',os.path.basename(tampered))
    if len(name)<1:
        return None,-1
    for index,path in enumerate(original_images):
        # The part of the file name before 'tamp' identifies the original image
        if name[0] in os.path.basename(path):
            return cv2.imread(path),index
    return None,-1

def show_sift_features(color_img, kp,size=(8,8)):
    gray_img=cv2.cvtColor(color_img,cv2.COLOR_BGR2GRAY)
    plt.figure(figsize = size)
    # Draw the keypoints on top of the grayscale image
    plt.imshow(cv2.drawKeypoints(gray_img, kp, color_img.copy()))


In [None]:
tampered1=cv2.imread(tampered_images[0])
plot_image(tampered1)


In [None]:
original1 , index=get_original(tampered_images[0])
if index!=-1:
    plot_image(original1)

## DBSCAN Clustering
DBSCAN is a density-based, non-parametric clustering algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors) and marks as outliers the points that lie alone in low-density regions (whose nearest neighbors are too far away).
The DBSCAN algorithm requires two parameters:
* **eps:** specifies how close points must be to each other to be considered part of a cluster. If the distance between two points is lower than or equal to this value (eps), the points are considered neighbors.
* **minPoints:** the minimum number of points needed to form a dense region (called `min_samples` in scikit-learn). For example, if we set the minPoints parameter to 5, then we need at least 5 points to form a dense region.
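
To make the roles of the two parameters concrete, here is a minimal pure-Python sketch of the DBSCAN idea. It is a toy illustration, not scikit-learn's implementation; the function name `dbscan_sketch` and the sample points are made up for this example, and a point is assumed to count as its own neighbor.

```python
import math

def dbscan_sketch(points, eps, min_samples):
    """Toy DBSCAN: returns one label per point, with -1 meaning noise."""
    n = len(points)
    labels = [None] * n  # None = not visited yet

    def neighbors(i):
        # All points within eps of point i (the point itself included)
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster_id += 1  # point i is a core point, so it starts a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is a core point too, so keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
print(dbscan_sketch(points, eps=1.5, min_samples=2))  # [0, 0, 0, 1, 1, -1]
print(dbscan_sketch(points, eps=1.5, min_samples=3))  # [0, 0, 0, -1, -1, -1]
```

Raising `min_samples` from 2 to 3 turns the two-point group into noise, because neither of its points has three neighbors within `eps`.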

Two functions build the clusters and use them to detect forgery:
* make_clusters(de,eps,min_sample): Performs DBSCAN clustering on `de`, the SIFT descriptors of the image; `eps` and `min_sample` are the parameters described above.
* locate_forgery(img,clustering,kps): Takes the image, the fitted clustering, and the SIFT keypoints, and marks the forgery on the image by drawing lines between keypoints assigned to the same cluster.
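
The pairing step described for locate_forgery can be sketched without OpenCV: group keypoint coordinates by cluster label, skip the noise label -1, and list every pair inside each group. The helper name `pairs_within_clusters` and the sample labels and coordinates below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from itertools import combinations

def pairs_within_clusters(labels, points):
    """Return every pair of points that share a (non-noise) cluster label."""
    groups = defaultdict(list)
    for label, pt in zip(labels, points):
        if label != -1:  # -1 marks noise and is never connected
            groups[label].append(pt)
    # Each pair corresponds to one line that would be drawn on the image
    return [pair for pts in groups.values() for pair in combinations(pts, 2)]

labels = [0, 0, -1, 1, 0, 1]
points = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
for a, b in pairs_within_clusters(labels, points):
    print(a, "<->", b)
```

Grouping by label in a dictionary avoids having to know the number of clusters in advance, which sidesteps off-by-one mistakes when the noise label is absent.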


In [None]:
def make_clusters(de,eps=40,min_sample=2):
    # Cluster the SIFT descriptors; points labelled -1 are noise
    clustering = DBSCAN(eps=eps, min_samples=min_sample).fit(de)
    return clustering

def locate_forgery(img,clustering,kps):
    forgery=img.copy()
    labels=clustering.labels_
    # Cluster labels run from 0 to labels.max(); the noise label -1 may or may not be present
    clusters = [[] for _ in range(labels.max()+1)]
    for idx in range(len(kps)):
        if labels[idx]!=-1:
            clusters[labels[idx]].append((int(kps[idx].pt[0]),int(kps[idx].pt[1])))
    for points in clusters:
        if len(points)>1:
            for idx1 in range(len(points)):
                for idx2 in range(idx1+1,len(points)):
                    # Connect every pair of keypoints in the same cluster (blue in BGR)
                    cv2.line(forgery,points[idx2],points[idx1],(255,0,0),5)
    plot_image(forgery)

Let's check if we are able to detect forgery.

In [None]:
#First, let us extract SIFT features
key_points,descriptors=siftDetector(tampered1)
show_sift_features(tampered1,key_points)

Making clusters and locating the forgery

In [None]:
#Now Let's make clusters and locate forgery

clusters=make_clusters(descriptors)
locate_forgery(tampered1,clusters,key_points)

## Why does this approach work?
This approach works because SIFT features are scale-invariant and DBSCAN groups points that lie close together.

The features extracted from the forged region should be similar to those of the original region, so they are very likely to lie close together in feature space and be clustered together. This is the idea behind the above implementation.
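
The closeness argument can be illustrated with a toy experiment. The vectors below are random stand-ins for 128-dimensional SIFT descriptors, not real ones, and the assumption that a copied region changes each descriptor value by at most 2 is made up for the example.

```python
import math
import random

random.seed(0)  # fixed seed so the toy example is repeatable

def toy_descriptor(dim=128):
    # Hypothetical stand-in for a SIFT descriptor: 128 values in [0, 255]
    return [random.uniform(0, 255) for _ in range(dim)]

original = toy_descriptor()
# Copied region: the same descriptor up to a small perturbation per dimension
copied = [v + random.uniform(-2, 2) for v in original]
unrelated = toy_descriptor()

d_copy = math.dist(original, copied)      # at most sqrt(128 * 2**2), about 22.6
d_other = math.dist(original, unrelated)  # typically far larger
print(f"copy: {d_copy:.1f}, unrelated: {d_other:.1f}")
```

Under these assumptions the copied descriptor always falls within the default `eps=40` of the original, while an unrelated descriptor lands far outside it, so only the copied pair would be treated as neighbors.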


### Some other examples and the role of eps

In [None]:
tampered=cv2.imread(tampered_images[20])
key_points,descriptors=siftDetector(tampered)
clusters=make_clusters(descriptors)
locate_forgery(tampered,clusters,key_points)

# Change the eps parameter to mark more or fewer features
clusters=make_clusters(descriptors,eps=80)
locate_forgery(tampered,clusters,key_points)

In [None]:
tampered=cv2.imread(tampered_images[50])
key_points,descriptors=siftDetector(tampered)
clusters=make_clusters(descriptors)
locate_forgery(tampered,clusters,key_points)

# Change the eps parameter to mark more or fewer features
clusters=make_clusters(descriptors,eps=80)
locate_forgery(tampered,clusters,key_points)
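
The effect of eps seen in the examples above can be reproduced on toy data. With `min_samples=2`, any two descriptors within eps of each other end up in the same cluster, so counting such pairs shows how many lines a given eps would draw. The helper `linked_pairs` and the 2-D points standing in for descriptors are hypothetical.

```python
import math
from itertools import combinations

def linked_pairs(descriptors, eps):
    """Index pairs whose descriptors are within eps of each other."""
    return [(i, j)
            for (i, p), (j, q) in combinations(enumerate(descriptors), 2)
            if math.dist(p, q) <= eps]

# Toy 2-D stand-ins for descriptors, spaced 30, 70, 60 and 240 apart
toy = [(0, 0), (30, 0), (100, 0), (160, 0), (400, 0)]
print(linked_pairs(toy, eps=40))  # [(0, 1)]
print(linked_pairs(toy, eps=80))  # [(0, 1), (1, 2), (2, 3)]
```

At `eps=80` the first four points are chained into a single cluster even though the first and fourth are 160 apart, which is why a larger eps marks more features but also risks connecting regions that were never copied.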