<a href="https://colab.research.google.com/github/sayan0506/Deep-Learning-CV-Hackathon/blob/main/Face_Verification_Based_Attendance_System_using_Arcface_Transfer_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Face Verification using ArcFace: Siamese Network Training with Triplet Loss**

## **Import Dependencies**

In [1]:
import zipfile
import os

import torch

## **Environment Setup**

#### **Check GPU availability**

In [2]:
# use the first CUDA GPU when available, otherwise fall back to the CPU
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print('Running on device: {}'.format(device))

Running on device: cuda:0


In [18]:
print(f'Device info\n{torch.cuda.get_device_properties(0)}')

Device info
_CudaDeviceProperties(name='Tesla T4', major=7, minor=5, total_memory=15109MB, multi_processor_count=40)


## **Drive mount**

In [3]:
# drive mount
from google.colab import drive

drive.mount('/content/drive')

Mounted at /content/drive


## **Data Load**

#### Unzipping

Defining the unzipping function

In [5]:
# unzip function
def unzip_file(src, dst):
  # create the destination folder (and any missing parents) if it does not exist
  os.makedirs(dst, exist_ok=True)
  print(f'The unzipped files will be stored to "{dst}" destination folder')

  with zipfile.ZipFile(src, 'r') as zip_ref:
    zip_ref.extractall(dst)
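
Before extracting a large archive, it can help to look at its top-level entries, since they determine the folder layout under the destination. The following is a minimal sketch using only the standard library; `top_level_entries` is a hypothetical helper and not part of the original notebook.

```python
import zipfile


def top_level_entries(src):
    """Return the sorted top-level names found inside the zip archive `src`."""
    with zipfile.ZipFile(src, 'r') as zip_ref:
        # member paths inside a zip always use '/' as the separator
        return sorted({name.split('/')[0] for name in zip_ref.namelist() if name})
```

If this returns a single folder name, the extracted data will sit one level below the destination folder.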

**Drive link: [face_train_set.zip](https://drive.google.com/file/d/1r5QjkBxspHILq1Bc_xpr7apbGWx1cmzy/view?usp=sharing)**

In [6]:
# path to the zip archive (in Drive) that contains one folder per class
zip_filepath = '/content/drive/MyDrive/face_trainset/face_train_set.zip'

# destination face trainset folder
dst_path = 'face_trainset'

**Unzipping**

In [7]:
unzip_file(zip_filepath, dst_path)

The unzipped files will be stored to "face_trainset" destination folder


## **Data Analysis**

Note: We checked that the dataset folder contains no .DS_Store or other stray files, so each folder corresponds to a unique identity and the number of directories equals the number of identities.
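
A check like this can also be done programmatically. The sketch below is one possible way to do it, not the procedure used for this notebook; `find_unexpected_entries` and the `IMAGE_EXTENSIONS` tuple are assumptions introduced for illustration.

```python
import os

# assumed set of valid image extensions (hypothetical, adjust to the dataset)
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def find_unexpected_entries(dataset_path):
    """Return paths that are neither identity folders nor image files."""
    unexpected = []
    for entry in sorted(os.listdir(dataset_path)):
        entry_path = os.path.join(dataset_path, entry)
        if not os.path.isdir(entry_path):
            # a loose file at the top level is not an identity folder
            unexpected.append(entry_path)
            continue
        for fname in sorted(os.listdir(entry_path)):
            if not fname.lower().endswith(IMAGE_EXTENSIONS):
                unexpected.append(os.path.join(entry_path, fname))
    return unexpected
```

An empty result means every top-level entry is a folder that holds only image files.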

In [19]:
face_dataset_path = "/content/face_trainset/face_train_set"

#### Fetching image info corresponding to individual classes

We collect the image info into a dictionary: each key is an identity (class), and its value is the list of image file names for that identity.

In [20]:
# image dictionary
img_dict = {}

# total image count
img_count = 0

for identity in os.listdir(face_dataset_path):
  path = os.path.join(face_dataset_path, identity)
  # list each identity folder once and reuse the result for the count
  img_dict[identity] = os.listdir(path)
  img_count += len(img_dict[identity])

print(f'Image dictionary\n{img_dict}')

Image dictionary
{'0013_0002037': ['0000011.jpg', '0000006.jpg', '0000002.jpg', '0000008.jpg', '0000005.jpg', '0013_0002037_script_2.jpg', '0013_0002037_script.jpg'], '0013_0002431': ['0013_0002431_script.jpg', '0000007.jpg', '0000006.jpg', '0000010.jpg', '0000008.jpg'], '0007_0000692': ['0000011.jpg', '0000006.jpg', '0000003.jpg', '0000002.jpg', '0007_0000692_script.jpg', '0000010.jpg', '0000008.jpg', '0000005.jpg', '0000000.jpg'], '0003_0000353': ['0000011.jpg', '0000007.jpg', '0003_0000353_script.jpg', '0000008.jpg'], '0007_0001050': ['0000011.jpg', '0000003.jpg', '0000001.jpg', '0000009.jpg', '0000010.jpg', '0007_0001050_script.jpg', '0000008.jpg', '0000005.jpg', '0000000.jpg'], '0012_0001660': ['0000007.jpg', '0000006.jpg', '0000004.jpg', '0000010.jpg', '0012_0001660_script.jpg', '0000005.jpg', '0000000.jpg'], '0007_0000980': ['0000011.jpg', '0000007.jpg', '0000006.jpg', '0000001.jpg', '0000002.jpg', '0000009.jpg', '0000010.jpg', '0000012.jpg', '0000008.jpg', '0000005.jpg', '0007_

In [21]:
# class (identity) labels
classifiers = img_dict.keys()

print(f'Total {img_count} images available corresponds to {len(img_dict.keys())} identities')

Total 4419 images available corresponds to 1012 identities
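
These totals give about 4.4 images per identity on average, but the spread matters too: forming an anchor-positive pair for triplet loss requires at least two images of the same identity. The sketch below summarizes the per-identity counts for a dictionary shaped like `img_dict`; `summarize_identities` and its `min_images` parameter are hypothetical names introduced here.

```python
from collections import Counter


def summarize_identities(img_dict, min_images=2):
    """Return (histogram of images-per-identity, identities below min_images)."""
    counts = {identity: len(files) for identity, files in img_dict.items()}
    # histogram maps "number of images" -> "how many identities have that many"
    histogram = Counter(counts.values())
    too_small = sorted(i for i, n in counts.items() if n < min_images)
    return histogram, too_small
```

Calling `summarize_identities(img_dict)` would list any identities that cannot supply an anchor-positive pair on their own.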


**To prepare the dataset, we first need to extract the faces for each identity, which requires an efficient face detector.**

## **Face Extraction Test check**

For face detection we use a pre-trained MTCNN, the model introduced in "Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks".

[Reference paper](https://arxiv.org/abs/1604.02878) 

Clone the facenet-pytorch repository, which includes the MTCNN face detector.

[Git Repo](https://github.com/timesler/facenet-pytorch)

In [22]:
!git clone https://github.com/timesler/facenet-pytorch.git

Cloning into 'facenet-pytorch'...
remote: Enumerating objects: 1264, done.
remote: Counting objects: 100% (29/29), done.
remote: Compressing objects: 100% (27/27), done.
remote: Total 1264 (delta 11), reused 7 (delta 2), pack-reused 1235
Receiving objects: 100% (1264/1264), 22.89 MiB | 24.83 MiB/s, done.
Resolving deltas: 100% (613/613), done.


**Rename the folder to "facenet_pytorch", because a Python module name used in an import statement cannot contain the "-" character.**

In [24]:
os.rename('facenet-pytorch','facenet_pytorch')
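
Note that `os.rename` raises an error when the source folder no longer exists, so re-running the cell above fails after the first successful rename. A minimal sketch of a re-runnable variant follows; `rename_if_needed` is a hypothetical helper, not part of the notebook.

```python
import os


def rename_if_needed(src, dst):
    """Rename `src` to `dst` unless `dst` already exists; return True if renamed."""
    if os.path.isdir(dst):
        # the folder was already renamed on an earlier run
        return False
    os.rename(src, dst)
    return True
```

With this helper, `rename_if_needed('facenet-pytorch', 'facenet_pytorch')` can be executed repeatedly without raising once the renamed folder is in place.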

#### Import Dependencies for MTCNN face detector

In [25]:
from facenet_pytorch.models.mtcnn import MTCNN

#### **Define MTCNN module**

The default parameters are shown for illustration and can be omitted. Because MTCNN is a collection of neural networks and other code, the device must be passed as a constructor argument, as shown below, so that objects can be copied to it when needed internally.

See help(MTCNN) for more details.

In [27]:
# face detector object
mtcnn = MTCNN(
    image_size=160, margin=0, min_face_size=20,
    thresholds=[0.6, 0.7, 0.7], factor=0.709, post_process=True,
    device=device
)
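
To build some intuition for `min_face_size` and `factor`: MTCNN scans the image at several scales, and its first-stage network (P-Net) works on 12x12 windows. The code below is a simplified sketch of how such an image pyramid is commonly derived from these two parameters. It is written from scratch for illustration (the function name `pyramid_scales` is hypothetical), and the internals of facenet_pytorch may differ in detail.

```python
def pyramid_scales(height, width, min_face_size=20, factor=0.709):
    """Return the list of image scales for an MTCNN-style image pyramid."""
    scales = []
    # first scale maps a face of min_face_size pixels onto the 12-pixel P-Net window
    scale = 12.0 / min_face_size
    min_side = min(height, width) * scale
    # keep shrinking by `factor` until the image is smaller than one P-Net window
    while min_side >= 12:
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales
```

A smaller `min_face_size` starts the pyramid at a larger scale and adds levels, which tends to find smaller faces at the cost of more computation. The three `thresholds` values apply to the three cascade stages (P-Net, R-Net, O-Net).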