# Project: Visual Embedding Chapter
## Fashion: selecting visually similar apparel
## This notebook illustrates how to train a model using triplet metric learning

## Ingredients:
### Machine configurations
* A machine with a GPU supporting CUDA 9.2+
* TensorFlow 1.4
* Python 3

Instead of building the above environment yourself, you can use a Docker image that already contains the relevant components. You can download it from here - [TF Docker](https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel_19.07.html#rel_19.07).

### Dataset
* Download the fashion dataset - [In-Shop Clothes Retrieval Benchmark](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html)

### Preprocessing the dataset
To use the training code, we first write the lists of images (train, test, query) to CSV files, which can then be fed to the training code directly.

*REQUIREMENT* Each row of a CSV file must contain the image location (`image_location`) and the class name (`class_name`). Two sample rows of such a file would look like:

`train_images, 12`

`train_images, 33`

So three output files are needed, one for each of `train_set`, `test_set` and `query_set`.

The folder `preprocessors/` has a utility script for obtaining the CSV files from the downloaded dataset.
For the fashion dataset it is `Fashion_convert2defense_triplet_format.py`. Please set the variable `split_file` to the corresponding file in the downloaded dataset.
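
As a rough illustration of what such a conversion involves, the sketch below splits a partition listing into one CSV file per split. It is not the code in `Fashion_convert2defense_triplet_format.py`. It assumes that each data row of the split file has three whitespace-separated fields (image path, item id, and one of `train`, `query`, `gallery`), and it writes the image location first and the class second, following the sample rows above. The function names and output file names are hypothetical; check the column order your training code expects before relying on it.

```python
import csv
import os
from collections import defaultdict

SPLITS = ("train", "query", "gallery")  # assumed split names in the listing


def parse_split_file(lines):
    """Group (image_path, class_id) pairs by split name.

    Assumes data rows look like '<image_path> <item_id> <split>'.
    Anything else (count line, header row, blank lines) is skipped.
    """
    rows = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or fields[2] not in SPLITS:
            continue
        image_path, item_id, split = fields
        rows[split].append((image_path, item_id))
    return rows


def write_split_csvs(split_file, out_dir):
    """Write one CSV per split and return a {split: path} mapping."""
    with open(split_file) as handle:
        rows = parse_split_file(handle)
    written = {}
    for split, pairs in rows.items():
        out_path = os.path.join(out_dir, f"{split}.csv")
        with open(out_path, "w", newline="") as handle:
            csv.writer(handle).writerows(pairs)
        written[split] = out_path
    return written
```

Calling `write_split_csvs(split_file, out_dir)` with the partition listing of the downloaded dataset would then produce up to three files, one per split found in the listing.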

### Clone the infrastructure code

Location - [GitHub](https://github.com/VisualComputingInstitute/triplet-reid/tree/sampling)
*Note* that we are cloning the `sampling` branch, which has three sampling (mining) variants:
1. Batch All (BA)
2. Batch Hard (BH)
3. Batch Sample (BS)

To pin the repository to the state used for the book's results, you can refer to this fork - [Link](https://github.com/ratnesh1729/triplet-reid/tree/sampling)
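
To make the difference between the mining variants listed above more concrete, here is a dependency-free sketch of Batch Hard and Batch All on a small batch of embeddings. It is an illustration of the general idea, not the repository's TensorFlow implementation: the function names are hypothetical, Batch All is averaged over all valid triplets (other averaging choices exist), and Batch Sample is omitted. The `margin` argument accepts either `"soft"` (softplus) or a float (hard margin).

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def apply_margin(diff, margin):
    """'soft' -> softplus(diff); a float -> hinge with that margin."""
    if margin == "soft":
        # Numerically stable log(1 + exp(diff)).
        return max(diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return max(diff + float(margin), 0.0)


def triplet_distances(embeddings, labels):
    """Yield (positive_distances, negative_distances) per usable anchor."""
    for i, anchor in enumerate(embeddings):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == labels[i]]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != labels[i]]
        if pos and neg:  # skip anchors without a valid triplet
            yield pos, neg


def batch_hard_loss(embeddings, labels, margin="soft"):
    """Batch Hard: per anchor, farthest positive vs. closest negative."""
    losses = [apply_margin(max(pos) - min(neg), margin)
              for pos, neg in triplet_distances(embeddings, labels)]
    return sum(losses) / len(losses) if losses else 0.0


def batch_all_loss(embeddings, labels, margin="soft"):
    """Batch All: average over every (anchor, positive, negative) triplet."""
    losses = [apply_margin(p - n, margin)
              for pos, neg in triplet_distances(embeddings, labels)
              for p in pos for n in neg]
    return sum(losses) / len(losses) if losses else 0.0
```

Batch Hard uses only one triplet per anchor, so its value is driven by the most difficult examples in the batch, while Batch All also counts the many easy triplets.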

### Training Hyper-parameters 

The following parameters need to be set for training:
1. Network model: Choose a base architecture. The code above supplies options for ResNet-50, ResNet-101 and MobileNet_v1.
2. Pre-trained: Whether to start from a pre-trained version of the above model. Generally the answer is `yes`.
3. Data augmentation: We can choose to randomly flip and crop images.
4. Embedding dimension: Any feasible number (the training script in this notebook uses 128).
5. Batch size: Parameterized by `P, K`, where `P` is the number of classes per batch and `K` is the number of samples drawn from each class. The total batch size is therefore `P*K`.
6. Crop initial images: If crop augmentation is used, we need to supply the initial crop width and height.
7. Network input size: ImageNet models are generally trained with `224x224` inputs and apparel images are generally roughly square, so we stick to that. (Note that for person re-id this assumption generally does not hold, as a person's `height > width`.)
8. Mining variant: BS, BH or BA.
9. Learning rate: Generally a low setting if we are using a pre-trained network.
10. Learning rate decay: Number of iterations before the learning rate starts to drop.
11. Metric to compare: Choices are `square_euclidean` or `euclidean`. In practice `euclidean` seems to work better (also on this dataset).
12. Margin: We will use the `softplus` option. The other possibility is to use a `hard margin`, by supplying a float parameter.
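
The `P, K` batch construction from item 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's input pipeline: `sample_pk_batch` is a hypothetical helper, and classes with fewer than `K` images are filled by sampling with replacement, which is one possible choice.

```python
import random
from collections import defaultdict


def sample_pk_batch(labels, batch_p, batch_k, rng=random):
    """Return dataset indices for one batch of batch_p classes x batch_k samples.

    Raises ValueError if there are fewer than batch_p distinct classes.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    batch = []
    for label in rng.sample(sorted(by_class), batch_p):
        indices = by_class[label]
        if len(indices) >= batch_k:
            batch.extend(rng.sample(indices, batch_k))
        else:
            # Too few images for this class: sample with replacement.
            batch.extend(rng.choices(indices, k=batch_k))
    return batch
```

With `batch_p=18` and `batch_k=4`, as in the training script in this notebook, each batch contains `18*4 = 72` images.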

### Training configuration
The following bash script can be used directly for training on the fashion dataset.

*The paths should be set appropriately* - train CSV file, output location, input pre-trained model, image file location

```bash
#!/bin/sh
####
#### This file calls train.py with all hyperparameters for the triplet metric learning experiment on the In-Shop Clothes Retrieval benchmark.

#### Dataset and output locations. Any extra command-line arguments are forwarded to train.py via "$@".
IMAGE_ROOT=/datasets/fashion/
EXP_ROOT=/training_output/

CUDA_VISIBLE_DEVICES=0 python train.py \
    --train_set /fashion/codes_preprocessing/in_shop_defense_triplet_loss_format_TRAIN.csv \
    --model_name mobilenet_v1_1_224 \
    --image_root "$IMAGE_ROOT" \
    --initial_checkpoint /home/Downloads/mobilenet_v1_1.0_224/mobilenet_v1_1.0_224.ckpt \
    --experiment_root "$EXP_ROOT" \
    --flip_augment \
    --embedding_dim 128 \
    --batch_p 18 \
    --batch_k 4 \
    --net_input_height 224 --net_input_width 224 \
    --margin soft \
    --metric euclidean \
    --loss batch_all \
    --learning_rate 3e-4 \
    --train_iterations 100000 \
    --head_name direct \
    --decay_start_iteration 25000 \
    "$@"
```

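To get a feeling for how `--learning_rate`, `--decay_start_iteration` and `--train_iterations` interact, the sketch below evaluates one plausible schedule: the rate stays constant until the decay start and then decays exponentially until the last iteration. The final factor of `0.001` and the exact shape of the curve are assumptions made for illustration; check `train.py` for the schedule that is actually used.

```python
def learning_rate_at(step, base_lr=3e-4, decay_start=25000,
                     total_steps=100000, final_factor=0.001):
    """Learning rate at a given iteration under the assumed schedule.

    Constant up to decay_start, then exponential decay that reaches
    base_lr * final_factor at total_steps (final_factor is an assumption).
    """
    if step <= decay_start:
        return base_lr
    progress = (step - decay_start) / (total_steps - decay_start)
    return base_lr * final_factor ** progress


if __name__ == "__main__":
    for step in (0, 25000, 50000, 75000, 100000):
        print(step, learning_rate_at(step))
```

Under these assumptions the rate is still `3e-4` at iteration 25000 and has dropped by three orders of magnitude at iteration 100000.
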
#### Checking the training progress:
As demonstrated by the authors of the above code, we can use `tensorboard` to visualize the training output.
Here is the output of a healthy run:


![title](healthy-run.png)

### Testing a trained model, Qualitatively

We can visualize the embeddings as follows. The first image in each row is the `query` image, while the rest are the `top-k` retrievals.

![title](viz/sample_visuals/000184.png)
![title](viz/sample_visuals/004032.png)
![title](viz/sample_visuals/007947.png)

##### Utility for the above visualizations
* `python viz/viz_retrievals.py -h`. You will need Spotify's approximate nearest neighbor library - `pip install annoy`
* Sample command:
`python viz_retrievals.py --img /datasets/fashion/ --query_csv /datasets/fashion/codes_preprocessing/in_shop_defense_triplet_loss_format_QUERY.csv --query_h5 trained_model_folder/in_shop_defense_triplet_loss_format_QUERY_embeddings.h5 --gallery_csv /datasets/fashion/codes_preprocessing/in_shop_defense_triplet_loss_format_GALLERY.csv --gallery_h5 trained_model_folder/in_shop_defense_triplet_loss_format_GALLERY_embeddings.h5 --k 5 --output top_5_viz_results`

*Notice* that the above code uses Spotify's Annoy library (approximate nearest neighbors) for efficient retrievals.


### Testing a trained model, Quantitatively

1. Step 1: Generate embeddings (stored in an `.h5` file) using `embed.py` for both the `Query` and `Test` sets.

A sample for query - `python embed.py --experiment_root /train_output --dataset in_shop_defense_triplet_loss_format_QUERY.csv --image_root /datasets/fashion/ --checkpoint checkpoint-100000`

A sample for test - `python embed.py --experiment_root /train_output --dataset in_shop_defense_triplet_loss_format_GALLERY.csv --image_root /datasets/fashion/ --checkpoint checkpoint-100000`

*Note* By default the `.h5` files are stored in the experiment root directory (`/train_output` in the samples above).

2. Step 2: Evaluate these embeddings using `evaluate.py`. This will print the `top-k` and `mAP` scores to the terminal.

A sample -
`python evaluate.py --excluder diagonal --query_dataset in_shop_defense_triplet_loss_format_QUERY.csv --query_embeddings /train_output/in_shop_defense_triplet_loss_format_QUERY_embeddings.h5 --gallery_dataset in_shop_defense_triplet_loss_format_GALLERY.csv --gallery_embeddings /train_output/in_shop_defense_triplet_loss_format_GALLERY_embeddings.h5 --metric euclidean`

A sample output would look like: 
`mAP: 72.40% | top-1: 86.40% top-2: 91.22% | top-5: 95.43% | top-10:96.85% | top-20: 97.83%`
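
For readers unfamiliar with these numbers, the sketch below shows how `top-k` accuracy and `mAP` are commonly computed from ranked retrieval results. It uses the textbook definitions and hypothetical function names; `evaluate.py` may differ in details such as how it excludes gallery entries or computes average precision. Each ranking is a list of booleans, ordered from closest to farthest gallery image, where `True` marks a gallery image of the same class as the query.

```python
def average_precision(ranked_matches):
    """Average precision for one query.

    ranked_matches[i] is True if the i-th retrieved item is a correct match.
    """
    hits, precision_sum = 0, 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def evaluate_rankings(all_ranked_matches, ks=(1, 2, 5, 10, 20)):
    """Return (mAP, {k: top-k accuracy}) over a list of per-query rankings."""
    n = len(all_ranked_matches)
    mean_ap = sum(average_precision(r) for r in all_ranked_matches) / n
    # A query counts as a top-k hit if any of its first k results matches.
    top_k = {k: sum(any(r[:k]) for r in all_ranked_matches) / n for k in ks}
    return mean_ap, top_k
```

A `top-1` score therefore measures how often the closest gallery image is correct, while `mAP` also rewards ranking all remaining matches near the top.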