# Project: Visual Embedding Chapter
## Fashion - select visually similar apparel
### This notebook illustrates how to train a model using triplet metric learning

## High level summary of steps
1. Download the dataset.
2. Use the pre-processor script to generate `.csv` files listing the images and their identities. These CSV files are used for training and testing.
3. Train the network (after downloading the pre-trained TensorFlow ImageNet weights).
4. Test the output network - both qualitatively and quantitatively.

## Ingredients:
### Machine configuration
* A machine with a GPU supporting CUDA 9.2+
* Ubuntu 16.04 or later
* TensorFlow 1.4+
* Python 3
* For the qualitative testing scripts - OpenCV and Annoy (for nearest neighbors) - `pip install annoy` and `pip install opencv-python`

As an alternative to building the above environment, you can directly use a Docker image that already contains the relevant setup. You can download it from here - [TF Docker](https://docs.nvidia.com/deeplearning/frameworks/tensorflow-release-notes/rel_19.07.html#rel_19.07).

#### In the following we use Python `virtualenv` to create an environment with the above prerequisite configuration.
Please execute the following in a Linux terminal.

1. `python3 -m pip install --upgrade pip`
2. `pip3 install virtualenv`
3. `mkdir ~/venv`
4. `virtualenv -p /usr/bin/python3.5 ~/venv/tf_1.9.0_cuda9.2`
5. Activate the environment - `source ~/venv/tf_1.9.0_cuda9.2/bin/activate`
6. Install TensorFlow - `pip install tensorflow_gpu==1.9.0`
7. Optionally install the visualization dependencies - `pip install annoy` and `pip install opencv-python`

**Below we use Nvidia CUDA 9.2 on an RTX 2080 Ti.**

### Steps to download the above dataset
1. Visit the download link under item 2, [In-Shop Clothes Retrieval Benchmark](http://mmlab.ie.cuhk.edu.hk/projects/DeepFashion.html).
![title](viz/in-shop-retrieval.png)
2. The link points to a Google Drive; open the folder "In shop clothes retrieval benchmark".
3. Download the 3 folders -- `Anno/` , `Eval/` and `Img/`
4. Unzip them.
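
Before moving on, it can help to confirm that the three folders ended up where the rest of the notebook expects them. The following is a minimal sketch, assuming the archives were extracted under `/datasets/fashion` (the location used by the cells below); the helper name `missing_dataset_folders` is hypothetical and not part of any downloaded code.

```python
import os


def missing_dataset_folders(dataset_root, expected=("Anno", "Eval", "Img")):
    """Return the expected sub-folders that are absent under dataset_root."""
    return [name for name in expected
            if not os.path.isdir(os.path.join(dataset_root, name))]
```

Calling `missing_dataset_folders("/datasets/fashion")` should return an empty list once all three folders are in place; any names it returns still need to be downloaded or unzipped.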

### Preprocessing the dataset
To use the training code, we first write the lists of images (train, query, gallery) into CSV files, which can then be fed directly to the training and testing scripts.

*REQUIREMENT* Every image of the dataset must be listed in a CSV file together with its class. Each row holds the class id followed by the image location, which is the order the pre-processor below writes. Two sample rows of such a file would look like:

`12,<image_location>`

`33,<image_location>`

So three output files are needed, one each for the `train`, `query` and `gallery` sets.

(The folder `preprocessors/` has a utility script for obtaining the CSV files from the downloaded dataset: `Fashion_convert2defense_triplet_format.py`. Please set the variable `split_file` to the matching path in the downloaded dataset.)

**Alternatively** you could use the cell below to get the required csv.

In [1]:
# Set up variables for the pre-processor
# The file behind split_file is obtained by downloading the dataset
split_file = "/datasets/fashion/Eval/list_eval_partition.txt" # this text file is inside the downloaded folder
# The following are the output CSVs we need for train, query and gallery.
output_train_csv = "/datasets/fashion/in_shop_defense_triplet_loss_format_TRAIN.csv"
output_query_csv = "/datasets/fashion/in_shop_defense_triplet_loss_format_QUERY.csv"
output_gallery_csv = "/datasets/fashion/in_shop_defense_triplet_loss_format_GALLERY.csv"

In [2]:
id_mapper = {}  # item id string from the split file -> contiguous integer class id
# This snippet uses the configuration input and output paths set above
with open(split_file) as fp, open(output_train_csv, "w") as tr, \
        open(output_gallery_csv, "w") as ga, open(output_query_csv, "w") as qu:
    writers = {"train": tr, "query": qu, "gallery": ga}
    for cnt, line in enumerate(fp, start=1):
        if cnt < 3:
            continue  # the first two lines of the split file are not image entries
        metadata = line.split()  # splits on any run of whitespace
        if not metadata:
            continue  # skip blank lines
        # print("metadata: {}".format(metadata))  # uncomment to display the inputs
        _path, _id, _categ = metadata[0], metadata[1], metadata[2]
        assert _categ in writers, "unexpected split: {}".format(_categ)
        if _id not in id_mapper:
            id_mapper[_id] = len(id_mapper)
        # Write the id mapped to this image's own identity, not the latest counter value
        writers[_categ].write("{},{}\n".format(id_mapper[_id], _path))

#### Please check the output folder for the CSV files written by the above pre-processor. There should be 3 output CSV files, at the paths set in the pre-processor variables above.
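
A quick way to check the generated files is to count images and classes in each of them. The sketch below assumes the `class_id,image_path` row layout written by the pre-processor cell; the helper names `images_per_class` and `summarize` are hypothetical.

```python
import csv
from collections import Counter


def images_per_class(csv_path):
    """Count the rows per class id in a `class_id,image_path` CSV file."""
    counts = Counter()
    with open(csv_path, newline="") as fp:
        for row in csv.reader(fp):
            if row:  # skip blank lines
                counts[row[0]] += 1
    return counts


def summarize(csv_path):
    """Print and return (number of images, number of classes) for one CSV."""
    counts = images_per_class(csv_path)
    n_images = sum(counts.values())
    print("{}: {} images, {} classes".format(csv_path, n_images, len(counts)))
    return n_images, len(counts)
```

For example, `summarize(output_train_csv)` reports the size of the training split. The per-class counts are also useful later, because the batch construction draws several images from each selected class.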

## Clone the infrastructure code (training and testing)

Location - [Github](https://github.com/VisualComputingInstitute/triplet-reid/tree/sampling)
*Note* that we are cloning the `sampling` branch, which has three sampling (mining) variants:
1. Batch All (BA)
2. Batch Hard (BH)
3. Batch Sample (BS)
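
To make the difference between the first two variants concrete, here is a minimal pure-Python sketch of a soft-margin triplet loss with Batch Hard and Batch All mining. It is an illustration under stated assumptions, not the repository's TensorFlow implementation: the function names are hypothetical, distances are plain Euclidean, and Batch Sample is not shown.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def softplus(x):
    """Soft margin log(1 + exp(x)), written in a numerically stable form."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def batch_hard_loss(embeddings, labels):
    """Per anchor, keep only the farthest positive and the closest negative."""
    terms = []
    for i, anchor in enumerate(embeddings):
        pos = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if j != i and labels[j] == labels[i]]
        neg = [euclidean(anchor, e) for j, e in enumerate(embeddings)
               if labels[j] != labels[i]]
        if pos and neg:
            terms.append(softplus(max(pos) - min(neg)))
    return sum(terms) / len(terms) if terms else 0.0


def batch_all_loss(embeddings, labels):
    """Average over every valid (anchor, positive, negative) triplet."""
    terms = []
    n = len(embeddings)
    for a in range(n):
        for p in range(n):
            if p == a or labels[p] != labels[a]:
                continue
            d_ap = euclidean(embeddings[a], embeddings[p])
            for neg in range(n):
                if labels[neg] != labels[a]:
                    d_an = euclidean(embeddings[a], embeddings[neg])
                    terms.append(softplus(d_ap - d_an))
    return sum(terms) / len(terms) if terms else 0.0
```

In a batch with the same number of images per class, the Batch Hard value is never smaller than the Batch All value, since each anchor contributes its worst triplet instead of the average over its triplets.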

In order to freeze the repo status used for the book results, you can refer to the fork here - [Link](https://github.com/ratnesh1729/triplet-reid/tree/sampling)

#### Steps to follow to procure the above codebase
Please execute the following in a Linux terminal.
* For the project, we will clone it inside `/code/`
* `cd /code`
* `git clone https://github.com/VisualComputingInstitute/triplet-reid`
* `cd triplet-reid`
* `git checkout sampling`

### Training Hyper-parameters 

The following parameters need to be set for training:
1. Network model: Choose a base architecture. The above code supplies options for ResNet-50, ResNet-101 and MobileNet_v1.
2. Pre-trained: Whether to start from a pre-trained model of the above. Generally the answer is `yes`.
3. Data augmentation: We can choose to randomly flip and crop images.
4. Embedding dimension: Any feasible number.
5. Batch size: Parameterized by `P, K`, where `P` is the number of classes to choose and `K` is the number of samples from each class. The total batch size is `P*K`.
6. Crop initial images: If crop augmentation is used, we need to supply the initial crop width and height.
7. Network input size: ImageNet models are generally trained with `224x224` inputs, and apparel images generally have similar height and width, so we stick to that. (Notice that for person re-id this assumption generally does not apply, as a person's `height > width`.)
8. Mining variant: BS, BH or BA.
9. Learning rate: Generally a low setting if we're using a pre-trained network.
10. Learning rate decay: The number of iterations before the learning rate starts to drop.
11. Metric to compare: Choices are `square_euclidean` or `euclidean`. In practice `euclidean` seems to work better (also on this dataset).
12. Margin: We will use the softplus option (`--margin soft` in the training script). The other possibility is a hard margin, set by supplying a float parameter.
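
The `P, K` batch construction from item 5 can be sketched in a few lines of plain Python. This is an illustration only, not the repository's data pipeline: the function name `sample_pk_batch` is hypothetical, and drawing with replacement for classes that have fewer than `K` images is an assumption made here.

```python
import random
from collections import defaultdict


def sample_pk_batch(labels, P, K, rng=random):
    """Pick P classes, then K image indices per class; returns P*K indices."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    batch = []
    for label in rng.sample(sorted(by_class), P):
        indices = by_class[label]
        if len(indices) >= K:
            batch.extend(rng.sample(indices, K))
        else:
            # Assumption: classes with fewer than K images are drawn with replacement
            batch.extend(rng.choice(indices) for _ in range(K))
    return batch
```

With the settings used in the training script below (`--batch_p 18`, `--batch_k 4`), each batch holds `18 * 4 = 72` images, so every anchor has 3 positives and 68 negatives inside its own batch.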

### Training configuration and Download Imagenet pre-trained weights
The following bash script can be used directly for training on the fashion dataset.

*The paths should be set appropriately* - train CSV file, output location, input pre-trained model, image file location.

### Download the pre-trained MobileNet_v1 from [here](https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models) and put it inside `/datasets/train/pre-trained/`. Alternatively, you could choose one of the other available architectures, ResNet-50 or ResNet-101.
Please make sure to extract the archive - `tar -xvzf mobilenet_v1_1.0_224.tgz`

In [9]:
%%bash
#!/bin/sh
####
#### This file calls train.py with all hyperparameters for the triplet metric learning experiment on the In-Shop Clothes Retrieval project.

source ~/venv/tf_1.9.0_cuda9.2/bin/activate ## This is needed if you're using virtual environment in python.

cd /code/triplet-reid ## IMP: Change to the clone directory as to run the code

IMAGE_ROOT=/datasets/fashion/
EXP_ROOT=/datasets/train/BS_fashion/ ## THIS WILL BE THE OUTPUT FOLDER

CUDA_VISIBLE_DEVICES=0 python train.py \
    --train_set /datasets/fashion/in_shop_defense_triplet_loss_format_TRAIN.csv \
    --model_name mobilenet_v1_1_224 \
    --image_root $IMAGE_ROOT \
    --initial_checkpoint /datasets/train/pre-trained/mobilenet_v1_1.0_224.ckpt \
    --experiment_root $EXP_ROOT \
    --flip_augment \
    --embedding_dim 128 \
    --batch_p 18 \
    --batch_k 4 \
    --net_input_height 224 --net_input_width 224 \
    --margin soft \
    --metric euclidean \
    --loss batch_all \
    --learning_rate 3e-4 \
    --train_iterations 100000 \
    --head_name direct \
    --decay_start_iteration 25000 \
    "$@"


## Checking the training progress:

### Training logs will be stored in the output folder above - i.e. `/datasets/train/BS_fashion/`
Users can use `tail -f /datasets/train/BS_fashion/train.log` to see the ETA and related details in the terminal.

### Visualizing the training progress
TensorFlow training can be monitored with TensorBoard, which visualizes the training progress, e.g. the train precision at the current iteration and more. More details can be found here: `https://www.tensorflow.org/tensorboard`


* To run TensorBoard from a terminal - `tensorboard --logdir=/datasets/train/BS_fashion`
* This serves a visualizer that can be opened in a browser at `localhost:6006`. The exact address is also printed in the terminal.

#### Sample healthy run


As demonstrated by the authors of the above code, we can use `tensorboard` to visualize the output.
Here is the output of a healthy run:


![title](healthy-run.png)

### Testing a trained model, Quantitatively

1. Step 1: Generate embeddings (stored in an `.h5` file) using `embed.py` for both the `Query` and `Gallery` sets.

*Note* By default the `.h5` files are stored in the training output directory (the `experiment_root`).

2. Step 2: Evaluate these embeddings using `evaluate.py`. This prints the `top-k` and `mAP` scores in the terminal.

Let's look at both these steps in the cells below.

A sample output from `evaluate.py` would look like: 
`mAP: 72.40% | top-1: 86.40% top-2: 91.22% | top-5: 95.43% | top-10:96.85% | top-20: 97.83%`
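
To clarify what these two kinds of numbers measure, the following sketch computes `mAP` and `top-k` from a query-by-gallery distance matrix in plain Python. It uses the standard definitions and is not the code of `evaluate.py`; the function names are hypothetical, and the excluder logic selected with `--excluder` is not modeled.

```python
def average_precision(ranked_matches):
    """ranked_matches: booleans for the gallery sorted by increasing distance."""
    hits, precision_sum = 0, 0.0
    for rank, is_match in enumerate(ranked_matches, start=1):
        if is_match:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def evaluate_rankings(distances, query_ids, gallery_ids, ks=(1, 5)):
    """distances[q][g] is the distance between query q and gallery item g."""
    aps = []
    topk_hits = {k: 0 for k in ks}
    for row, qid in zip(distances, query_ids):
        order = sorted(range(len(row)), key=row.__getitem__)
        matches = [gallery_ids[g] == qid for g in order]
        aps.append(average_precision(matches))
        for k in ks:
            # top-k counts a query as correct if any of its k nearest items match
            topk_hits[k] += any(matches[:k])
    n = len(query_ids)
    return sum(aps) / n, {k: hits / n for k, hits in topk_hits.items()}
```

`top-k` only asks whether a correct item appears among the first `k` retrievals, while `mAP` also rewards ranking all correct items early, which is why `mAP` is typically the lower of the two.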

In [12]:
%%bash
# Generate embeddings for the gallery and query images. Input to this code is the appropriate csv we generated above.

### Set up bash paths to point to the code
source ~/venv/tf_1.9.0_cuda9.2/bin/activate 
cd /code/triplet-reid
## Generate embeddings for query set
python embed.py --experiment_root /datasets/train/BS_fashion/ --dataset /datasets/fashion/in_shop_defense_triplet_loss_format_QUERY.csv --image_root /datasets/fashion/ --checkpoint checkpoint-100000

## Generate embeddings for the gallery set
python embed.py --experiment_root /datasets/train/BS_fashion/ --dataset /datasets/fashion/in_shop_defense_triplet_loss_format_GALLERY.csv --image_root /datasets/fashion/ --checkpoint checkpoint-100000

Loading args from /datasets/train/BS_fashion/args.json.
Evaluating using the following parameters:
aggregator: None
batch_k: 4
batch_p: 18
batch_size: 256
checkpoint: checkpoint-100000
checkpoint_frequency: 1000
crop_augment: None
dataset: /datasets/fashion/in_shop_defense_triplet_loss_format_QUERY.csv
decay_start_iteration: 25000
detailed_logs: False
embedding_dim: 128
experiment_root: /datasets/train/BS_fashion/
filename: /datasets/train/BS_fashion/in_shop_defense_triplet_loss_format_QUERY_embeddings.h5
flip_augment: False
head_name: direct
image_root: /datasets/fashion/
initial_checkpoint: /datasets/train/pre-trained/mobilenet_v1_1.0_224.ckpt
learning_rate: 0.0003
loading_threads: 8
loss: batch_all
loss_ignore_zero: False
margin: soft
metric: euclidean
model_name: mobilenet_v1_1_224
net_input_height: 224
net_input_width: 224
optim: AdamOptimizer(learning_rate)
pre_crop_height: 288
pre_crop_width: 144
quiet: False
resume: False
train_iterations: 100000
train_set: /datasets/fashion/in

Instructions for updating:
keep_dims is deprecated, use keepdims instead
2020-02-16 07:48:06.553114: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties: 
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:4c:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2020-02-16 07:48:06.697790: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 1 with properties: 
name: TITAN X (Pascal) major: 6 minor: 1 memoryClockRate(GHz): 1.531
pciBusID: 0000:4b:00.0
totalMemory: 11.91GiB freeMemory: 9.92GiB
2020-02-16 07:48:06.697824: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0, 1
2020-02-16 07:48:07.458774: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-16 07:48:07.458804: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958]      0 1 
2020-02-16 07:48:07.458809: I tensorflow/core/common_runtim

In [13]:
%%bash
# Quantitative testing - Using the embeddings generated from the cell above, we will compute accuracies
### Set up bash paths to point to the code
source ~/venv/tf_1.9.0_cuda9.2/bin/activate 
cd /code/triplet-reid

python evaluate.py --excluder diagonal --query_dataset /datasets/fashion/in_shop_defense_triplet_loss_format_QUERY.csv --query_embeddings /datasets/train/BS_fashion/in_shop_defense_triplet_loss_format_QUERY_embeddings.h5 --gallery_dataset /datasets/fashion/in_shop_defense_triplet_loss_format_GALLERY.csv --gallery_embeddings /datasets/train/BS_fashion/in_shop_defense_triplet_loss_format_GALLERY_embeddings.h5 --metric euclidean

Evaluating batch 0-256/14218Evaluating batch 256-512/14218Evaluating batch 512-768/14218Evaluating batch 768-1024/14218Evaluating batch 1024-1280/14218Evaluating batch 1280-1536/14218Evaluating batch 1536-1792/14218Evaluating batch 1792-2048/14218Evaluating batch 2048-2304/14218Evaluating batch 2304-2560/14218Evaluating batch 2560-2816/14218Evaluating batch 2816-3072/14218Evaluating batch 3072-3328/14218Evaluating batch 3328-3584/14218Evaluating batch 3584-3840/14218Evaluating batch 3840-4096/14218Evaluating batch 4096-4352/14218Evaluating batch 4352-4608/14218Evaluating batch 4608-4864/14218Evaluating batch 4864-5120/14218Evaluating batch 5120-5376/14218Evaluating batch 5376-5632/14218Evaluating batch 5632-5888/14218Evaluating batch 5888-6144/14218Evaluating batch 6144-6400/14218Evaluating batch 6400-6656/14218Evaluating batch 6656-6912/14218Evaluating batch 6912-7168/14218Evaluating batch 7168-7424/14218Evaluating batch 7424-7680/14218Evaluating batch 7


### Testing a trained model, Qualitatively

We can visualize the retrievals as follows. The first image in each row is the `query` image, while the rest are its `top-k` retrievals.

![title](viz/sample_visuals/000184.png)
![title](viz/sample_visuals/004032.png)
![title](viz/sample_visuals/007947.png)

##### Utility for the above visualization
* `python viz/viz_retrievals.py --h`
* To use the above script, we need to point it to the query and gallery CSV files and their `.h5` embeddings (obtained from the quantitative testing above).
* `--output` is the desired output folder for the above images.

*Notice* that the above code uses Spotify's Annoy library (Approximate Nearest Neighbors) for efficient retrievals.
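
For intuition about what the retrieval step returns, here is an exact brute-force nearest-neighbor search in plain Python. It is a stand-in for illustration, not how `viz_retrievals.py` works: Annoy builds an index and returns approximate neighbors much faster on large galleries, whereas this hypothetical `top_k_neighbors` helper compares the query against every gallery vector.

```python
import heapq
import math


def top_k_neighbors(query, gallery, k):
    """Return the indices of the k gallery vectors closest to query (Euclidean)."""
    def distance(index):
        return math.sqrt(sum((q - g) ** 2
                             for q, g in zip(query, gallery[index])))
    # nsmallest returns the indices ordered from nearest to farthest
    return heapq.nsmallest(k, range(len(gallery)), key=distance)
```

An exact search like this is also a handy reference for spot-checking approximate results on a small subset of the gallery.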


In [17]:
%%bash
# Get output qualitative images as shown in above cell.
source ~/venv/tf_1.9.0_cuda9.2/bin/activate 
cd viz

# Sample command: 
python viz_retrievals.py --img /datasets/fashion/ --query_csv /datasets/fashion/in_shop_defense_triplet_loss_format_QUERY.csv --query_h5 /datasets/train/BS_fashion/in_shop_defense_triplet_loss_format_QUERY_embeddings.h5 --gallery_csv /datasets/fashion/in_shop_defense_triplet_loss_format_GALLERY.csv --gallery_h5 /datasets/train/BS_fashion/in_shop_defense_triplet_loss_format_GALLERY_embeddings.h5 --k 5 --output top_5_viz_results

# Troubleshooting

1. Every Python script used above has a `--h` option that lists the required arguments and instructions.
2. If you re-run `train.py`, make sure to look into `train.log` for the ETA and any misbehavior.