# Face Recognition using FaceNet trained on VGGFace2
This code is a trimmed-down version of [facenet-pytorch](https://github.com/timesler/facenet-pytorch). It also uses the pretrained weights from the facenet-pytorch project.

## Getting Started
When you start this project for the first time, you'll have only an empty folder. Here, we'll
- Set up repositories
    - Set up a git repository (you already have one if you cloned an existing repo)
    - Set up a hangar repository
- Add the models to stockroom
    - Pull our pretrained models
    - Commit using stockroom
- Add data to hangar
    - Commit using stockroom

### Setup repositories
Stockroom needs an initialized hangar repository, and it relies on git to determine the current version. So let's start by setting up a hangar repository in the current folder, which is already a git repository. We can use stockroom's CLI, `stock`, to set up the repository, as in the example below. You could also use the `hangar` CLI to initialize the hangar repository, but `stock` makes sure that the hangar repository and git are properly connected.

```bash
stock init --name sherin --email a@b.c
```

Once the hangar repository is initialized, run the cell below to verify that both repositories exist.

```python
# verify that the git and hangar repositories exist in the current folder
from pathlib import Path
import warnings

cwd = Path.cwd()
if not cwd.joinpath('.git').exists():
    warnings.warn("git repository does not exist")
if not cwd.joinpath('.hangar').exists():
    warnings.warn("hangar repository does not exist")
```

### Adding model to stockroom
Using pretrained weights is very common in the deep learning community because it avoids training a huge network from scratch. For our face recognition model, we'll download the pretrained weights published by [facenet-pytorch](https://github.com/timesler/facenet-pytorch). Once we have the pretrained weights, we need to add them to hangar using stockroom.
The network we use has several components, and the weights of each component are saved separately. Here we download all of them and then load them into the runtime using `torch.load` and the model's `load_state_dict` method.

```python
# `download` is a helper from this project's local `utils` module
from utils import download

# pretrained weights for the three MTCNN sub-networks (O-Net, P-Net, R-Net) and InceptionResnetV1
onet_url = 'https://drive.google.com/uc?export=download&id=1dcyEOAa2fc4lILKKDbMWFaBpLyA_7GOe'
pnet_url = 'https://drive.google.com/uc?export=download&id=1p-aeR9jQ4kQNrPMVMTC5l_aTVrmJZ5c9'
rnet_url = 'https://drive.google.com/uc?export=download&id=1olU2yzLX1g2wQ6sTKqzktb2Q1oyfli4t'
resnet_url = 'https://drive.google.com/uc?export=download&id=1TES47D1ZP6NGF2GFw8ZUcKL205L9q3_f'

# cache=True skips files that are already on disk
download(onet_url, 'onet.pth', cache=True)
download(pnet_url, 'pnet.pth', cache=True)
download(rnet_url, 'rnet.pth', cache=True)
download(resnet_url, 'resnet.pth', cache=True)
```
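The `download` helper lives in this project's local `utils` module, which isn't shown here. To give a rough idea of what `cache=True` does (judging from the "File cache exists, skipping download" message it prints), here is a hypothetical stand-in built only on the standard library. The name `download_sketch` and its behaviour are assumptions, not the real `utils.download`.

```python
import shutil
import urllib.request
from pathlib import Path


def download_sketch(url: str, filename: str, cache: bool = True) -> Path:
    """Hypothetical stand-in for utils.download: fetch `url` into `filename`.

    With cache=True an existing file is reused and no request is made.
    """
    path = Path(filename)
    if cache and path.exists():
        print("File cache exists, skipping download")
        return path
    # Stream the response body straight to disk instead of holding it in memory.
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

With caching on, re-running the cell is cheap: only missing weight files trigger a network request.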


#### Committing it to hangar using stockroom
Here we use stockroom's Python API to add the model to hangar and the `stock` CLI to commit it.

```python
import torch
from models import mtcnn
from models import resnet
```

```python
# prefix each component's keys with its sub-module name so they match MTCNN's state_dict
state_dict = {}
pnet_wt = {f"pnet.{key}": val for key, val in torch.load('pnet.pth').items()}
state_dict.update(pnet_wt)
rnet_wt = {f"rnet.{key}": val for key, val in torch.load('rnet.pth').items()}
state_dict.update(rnet_wt)
onet_wt = {f"onet.{key}": val for key, val in torch.load('onet.pth').items()}
state_dict.update(onet_wt)
```
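The cell above merges three separately saved `state_dict`s into one, prefixing every key with the name of its sub-network (`pnet.`, `rnet.`, `onet.`) so the keys line up with the sub-modules inside `MTCNN`. Here is the same idea as a small, framework-free helper; the name `merge_prefixed` is ours, and plain strings stand in for tensors.

```python
from typing import Any, Dict, Mapping


def merge_prefixed(parts: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Flatten {submodule_name: state_dict} into one dict with 'submodule_name.' key prefixes."""
    merged: Dict[str, Any] = {}
    for prefix, weights in parts.items():
        for key, value in weights.items():
            merged[f"{prefix}.{key}"] = value
    return merged


# Toy weights: strings instead of tensors, just to show the resulting key layout.
parts = {
    "pnet": {"conv1.weight": "w_p", "conv1.bias": "b_p"},
    "rnet": {"conv1.weight": "w_r"},
}
print(merge_prefixed(parts))
```

With real weights, `merge_prefixed({name: torch.load(f'{name}.pth') for name in ('pnet', 'rnet', 'onet')})` would build the same dictionary as the cell above.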

```python
mtcnn_model = mtcnn.MTCNN()
mtcnn_model.load_state_dict(state_dict)
```

```
<All keys matched successfully>
```

```python
from stockroom import ModelStore
ms = ModelStore('torch')
ms['mtcnn'] = mtcnn_model.state_dict()
```


Now that we have added the model to `ModelStore`, let's commit. Remember, if we need this commit to be part of the git history (i.e. we might need to come back to this stage of the code and data, not just the data), we need to run `stock commit` first. This adds the relevant information to the stock file, which then needs to be committed to git.

```bash
stock commit -m 'adding mtcnn model'
git add head.stock
git commit -m 'added mtcnn model'
```
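If you repeat this three-step sequence often, you could wrap it in a small Python helper. This is a sketch assuming `stock` and `git` are on your `PATH` and the working directory is the repository root; the name `commit_everywhere` is hypothetical, and the `run` parameter exists only so the commands can be inspected without executing them.

```python
import subprocess
from typing import Callable, List


def commit_everywhere(message: str, git_message: str,
                      run: Callable[..., object] = subprocess.run) -> List[List[str]]:
    """Run `stock commit`, then stage head.stock and commit it to git.

    `run` defaults to subprocess.run; check=True stops at the first failing command.
    """
    commands = [
        ["stock", "commit", "-m", message],
        ["git", "add", "head.stock"],
        ["git", "commit", "-m", git_message],
    ]
    for cmd in commands:
        run(cmd, check=True)
    return commands
```

Usage: `commit_everywhere('adding mtcnn model', 'added mtcnn model')`. Running `stock commit` first matters, because `head.stock` only records the new hangar commit after that step.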

Once the model weights are committed, we can get them back using dictionary-style access.
#### Fetching the weights back

```python
state_dict = ms['mtcnn']
mtcnn_model.load_state_dict(state_dict)
```

```
 * Checking out COMMIT: 528e1d249ffafc7ccde04ac8fcc97d0ff21f5dba
ValueError: not enough values to unpack (expected 3, got 2)
```

`ModelStore` also comes with a built-in API that loads the `state_dict` into the model for you, so you don't have to do it explicitly.
#### Using `set_weights` API

```python
ms.set_weights(mtcnn_model, 'mtcnn')
```

```
 Neither BRANCH or COMMIT specified.
 * Checking out writing HEAD BRANCH: master
```


Let's also save the resnet model to stockroom.

```python
from stockroom import ModelStore
from models import resnet
import torch

ms = ModelStore('torch')
num_classes = 5  # let's assume we know this beforehand
resnet_model = resnet.InceptionResnetV1(num_classes=num_classes)
state_dict = torch.load('resnet.pth')
```

The saved `state_dict` does not have the weights for the final layer. The final layer needs to be initialized, and later fine-tuned, based on `num_classes`. What is `num_classes`? The pretrained FaceNet is good at picking out the key features it needs to recognize a face, but now we need this network to recognize a few of our friends. For that, we'll teach FaceNet what our friends look like by giving it a few pictures of each of them. FaceNet needs to know how many friends it is going to meet before we start the training, and `num_classes` is that number. Here we create a dummy PyTorch layer to get initial weights for the final layer.

```python
# 512 is known beforehand: it is the size of the embedding that feeds the final `logits` layer
dummy_linear_layer = torch.nn.Linear(512, num_classes)
state_dict.update({'logits.weight': dummy_linear_layer.state_dict()['weight']})
state_dict.update({'logits.bias': dummy_linear_layer.state_dict()['bias']})

resnet_model.load_state_dict(state_dict)
state_dict = resnet_model.state_dict()
```
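For reference, `torch.nn.Linear(512, num_classes)` computes an affine map from the 512-dimensional face embedding $x$ to one score per class, with $C$ standing for `num_classes`. Its parameters, which end up under `logits.weight` and `logits.bias`, have these shapes:

$$
\text{logits} = W x + b, \qquad x \in \mathbb{R}^{512}, \quad W \in \mathbb{R}^{C \times 512}, \quad b \in \mathbb{R}^{C}
$$

With `num_classes = 5`, `logits.weight` has shape `(5, 512)` and `logits.bias` has shape `(5,)`, so only these $5 \times 512 + 5 = 2565$ parameters start from random values, while every other layer keeps its pretrained weights from `resnet.pth`.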

```python
ms['resnet'] = state_dict
```

```
ValueError: Arrayset name provided: `__STOCKROOM--_resnet--_36--_repeat_1.0.branch0.conv.weight--_32_256_1_1` is invalid. Can only contain alpha-numeric or "." "_" "-" ascii characters (no whitespace). Must be <= 64 characters long
```
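The error message spells out hangar's naming rule for arraysets: only ASCII letters, digits, `.`, `_` and `-`, no whitespace, and at most 64 characters. Judging from the rejected name, `ModelStore` appears to build one arrayset name per layer from the model name, a layer index, the layer name and the tensor shape, so deeply nested layers in `InceptionResnetV1` overflow the limit. A small standard-library check can confirm why a name is rejected. The helper `is_valid_arrayset_name` is ours, and the rule comes from the error text, not from hangar's source.

```python
import re

# Rule as quoted in hangar's error: alphanumeric ASCII plus ".", "_", "-"; no whitespace; <= 64 chars.
_ALLOWED = re.compile(r"[A-Za-z0-9._-]+")
MAX_LEN = 64


def is_valid_arrayset_name(name: str) -> bool:
    """True if `name` satisfies the character and length rule from the error message."""
    return len(name) <= MAX_LEN and _ALLOWED.fullmatch(name) is not None


failing = "__STOCKROOM--_resnet--_36--_repeat_1.0.branch0.conv.weight--_32_256_1_1"
print(len(failing), is_valid_arrayset_name(failing))  # prints: 71 False
```

Every character in the rejected name is allowed, so the length alone (71 > 64) is what triggers the error.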

### Adding data to stockroom
Data can be added in different ways; here we take the most obvious one: read each image from disk with a package like PIL and add it to stockroom's `DataStore`, just as we added the model to `ModelStore`. First we need to create arraysets (the arrayset is hangar's fundamental data structure for storing tensors/arrays; read more about arraysets [here](https://hangar-py.readthedocs.io/en/stable/concepts.html#abstraction-2-what-makes-up-a-arrayset)). Hangar has CLI commands for creating arraysets. Here we create a variably shaped image arrayset that can hold images of different sizes, and a fixed-size label arrayset that stores the classes/labels.

```bash
hangar arrayset create images UINT8 1000 1000 3 --variable-shape
hangar arrayset create label INT64 1
hangar commit -m 'arrayset init'
```
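As we understand hangar's variable-shape arraysets, the shape given at creation (`1000 1000 3` for `images`) acts as an upper bound: each sample may be smaller in any dimension, but must have the same number of dimensions and never exceed the bound. Under that assumption, a quick pre-check before adding an image could look like this (the helper name is hypothetical):

```python
from typing import Sequence

# Upper bound passed to `hangar arrayset create images UINT8 1000 1000 3 --variable-shape`
IMAGES_MAX_SHAPE = (1000, 1000, 3)


def fits_max_shape(shape: Sequence[int], max_shape: Sequence[int] = IMAGES_MAX_SHAPE) -> bool:
    """True if `shape` has the same rank as `max_shape` and no dimension exceeds it."""
    return len(shape) == len(max_shape) and all(s <= m for s, m in zip(shape, max_shape))


print(fits_max_shape((480, 640, 3)))   # a typical RGB photo
print(fits_max_shape((1200, 800, 3)))  # too tall for this arrayset
```

`np.array(Image.open(item).convert('RGB')).shape` is `(height, width, 3)`, so the rank always matches. Images that fail the check would need to be shrunk first, for example with PIL's `Image.thumbnail((1000, 1000))`, which resizes in place while keeping the aspect ratio.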

Note: hangar also has a built-in plugin system that can make data addition easier.

```python
from pathlib import Path
from PIL import Image
import numpy as np

raw_data = Path('raw_data')
```

```python
from stockroom import DataStore

ds = DataStore()
i = 0  # running sample index shared by both arraysets
label_dict = {}  # person (folder name) -> integer label
# each sub-folder of raw_data holds the pictures of one person
for label_folder in raw_data.iterdir():
    if label_folder.name not in label_dict:
        label_dict[label_folder.name] = len(label_dict)
    for item in label_folder.iterdir():
        arr = np.array(Image.open(item).convert('RGB'))
        ds['images', i] = arr
        ds['label', i] = np.array([label_dict[label_folder.name]])
        i += 1
```
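The loop above assigns label ids in whatever order `Path.iterdir()` yields the folders, and the Python documentation says that order is arbitrary, so the same person could get a different id on another machine. A small variation that sorts the folder and file names keeps the mapping reproducible. This is our suggestion rather than part of the original notebook, it assumes the `raw_data/<person>/<image>` layout that the loop implies, and both helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterator, Tuple


def build_label_dict(root: Path) -> Dict[str, int]:
    """Map each sub-folder name (one per person) to an integer id, in sorted order."""
    folders = sorted(p.name for p in root.iterdir() if p.is_dir())
    return {name: idx for idx, name in enumerate(folders)}


def iter_samples(root: Path) -> Iterator[Tuple[Path, int]]:
    """Yield (image_path, label_id) pairs in a stable, sorted order."""
    label_dict = build_label_dict(root)
    for name, label in label_dict.items():
        for item in sorted((root / name).iterdir()):
            if item.is_file():
                yield item, label
```

Inside the original loop you would then iterate over `enumerate(iter_samples(raw_data))` and store `np.array(Image.open(item).convert('RGB'))` and `np.array([label])` at index `i`.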

We can now commit the added data to hangar using `stock`:
```bash
stock commit -m 'data added'
```

Ok, great! We have finished the quick "getting started" guide. You should now have a hangar repository that stores the models and data required to train the face recognition algorithm. Let's move on to the next notebook, where we actually train a neural network and see how stockroom can run alongside your normal workflow while versioning your data, models, hyperparameters, and even metrics.