# Experiment framework
This notebook shows how the framework can be used to train and evaluate models on standard vehicle identification datasets. The framework can be invoked either through its CLI or through its Python classes.

This notebook runs the experiments using the CLI.


## Colab specific
Modify and run these cells to prepare the Colab environment for the project:
- Set up integration with Google Drive
    - Requires these paths: the mount point, the dataset storage path, the checkpoint storage path, and the prediction storage path
- Set up the content folder as a git repo and pull the codebase from GitHub
    - For now, this is done manually with a [PAT](https://docs.github.com/en/github/authenticating-to-github/keeping-your-account-and-data-secure/creating-a-personal-access-token) (personal access token), which is read from a file on Google Drive


### Project folders
The project expects a few folders in the root directory. Since the Colab environment is transient, these folders need to be recreated every time a fresh runtime is started. The cell below creates links to the source folders on Google Drive:
- checkpoints (for storing logs and model checkpoints)
- predictions (for storing predictions during evaluation)
- carzam (for downloading the compressed dataset files; used by `setup_dataset.py`)
- dataset (NOT created here; running the `setup_dataset.py` script creates this folder and copies the extracted dataset files into it)

In [None]:
import os
# Path constants
STORAGE_ROOT = '/content/drive'                                     # Google Drive mount point
CARZAM_ROOT = os.path.join(STORAGE_ROOT, 'MyDrive/Gatech/CARZAM')   # Project root within Drive
CHECKPOINT_ROOT = os.path.join(CARZAM_ROOT, 'checkpoints')          # Experiment root within the project
PREDICTION_ROOT = os.path.join(CARZAM_ROOT, 'predictions')          # Prediction files within the project
DATASET_ROOT = os.path.join(CARZAM_ROOT, 'Datasets')                # Dataset storage within the project (reassigned later in this notebook)

In [None]:
from google.colab import drive
drive.mount(STORAGE_ROOT)
!mkdir -p "{CHECKPOINT_ROOT}"
!mkdir -p "{PREDICTION_ROOT}"
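# -sfn replaces an existing link instead of nesting a new one inside its target, so re-running this cell is safe
!ln -sfn "{CHECKPOINT_ROOT}" "checkpoints"
!ln -sfn "{PREDICTION_ROOT}" "predictions"
!ln -sfn "{CARZAM_ROOT}" "carzam"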
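After the linking cell above, a quick check that each link resolves to a real directory on Drive can save confusion later (for example, if the Drive path in `CARZAM_ROOT` is mistyped, the link is created but dangles). The sketch below uses only the standard library; `check_links` is a hypothetical helper, not part of the project.

```python
import os

def check_links(names=("checkpoints", "predictions", "carzam"), base="."):
    """Map each name to True if base/name is a symlink that resolves to an existing directory."""
    status = {}
    for name in names:
        path = os.path.join(base, name)
        # islink() is True even for a dangling link; isdir() follows the link,
        # so together they flag links whose Drive target is missing.
        status[name] = os.path.islink(path) and os.path.isdir(path)
    return status

# Usage, from the project root in Colab:
# missing = [name for name, ok in check_links().items() if not ok]
# print("All links OK" if not missing else f"Broken or missing: {missing}")
```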


### Set up the codebase from GitHub
Download the code from GitHub using an access token stored on Google Drive. Change the access token path to match your Google Drive folder structure.

In [None]:

GITHUB_PAT_PATH=os.path.join(STORAGE_ROOT, 'MyDrive/Gatech/github_pat_colab.txt')

with open(GITHUB_PAT_PATH) as reader:
    GITHUB_PAT = reader.readline().strip()  # strip the trailing newline so the token can be embedded in the remote URL
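The cell above reads the token into `GITHUB_PAT`. A slightly more defensive variant is sketched below with hypothetical helper names (`read_pat`, `remote_url`, `mask`): it strips the trailing newline, fails loudly on an empty file, builds the same remote URL that the git cell uses, and masks the token if you need to print it. Keep in mind that `git remote add` stores the full URL, token included, in `.git/config`.

```python
def read_pat(path: str) -> str:
    """Read a GitHub personal access token from the first line of a file, without the newline."""
    with open(path) as reader:
        token = reader.readline().strip()
    if not token:
        raise ValueError(f"No token found in {path}")
    return token

def remote_url(token: str, repo: str = "piyengar/vehicle-predictor") -> str:
    """Build the token-authenticated HTTPS remote URL for the repo."""
    return f"https://{token}@github.com/{repo}.git"

def mask(token: str, keep: int = 4) -> str:
    """Hide all but the last `keep` characters so the token is never printed in full."""
    if len(token) <= keep:
        return "*" * len(token)
    return "*" * (len(token) - keep) + token[-keep:]

# Hypothetical usage:
# GITHUB_PAT = read_pat(GITHUB_PAT_PATH)
# print("Using token", mask(GITHUB_PAT))
```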


Initialize the current folder as a git repo and link it to the remote GitHub repo before pulling. This lets us pull the code into a folder that already contains files, where a plain `git clone` would refuse to run. We also set `origin/master` as the upstream of the local `master` branch so that changes made in Colab can be pushed back to the project.

In [None]:
!git init
!git remote add origin "https://{GITHUB_PAT}@github.com/piyengar/vehicle-predictor.git" 
!git pull origin master
!git branch --set-upstream-to=origin/master master

## Install project
This installs the project in editable mode (`pip install -e .`), along with the packages it depends on, into the Python environment.

In [None]:
# %%capture --no-stderr
!pip install -e .

## Define dataset location and experiment


In [None]:
# Local folder that setup_dataset.py extracts the datasets into (replaces the Drive path defined earlier)
DATASET_ROOT = './dataset'
# Name of the experiment; used to create a folder for storing the logs, predictions and model checkpoints
EXPERIMENT_NAME = 'brand'
# Path to the experiment script
EXPERIMENT_SCRIPT = "experiments/brand/train_brand.py"

### Download and set up datasets
The cells below download each dataset's compressed files into `./dataset_source` and extract them into the dataset folder (`DATASET_ROOT`) in the project root. The table gives an estimate of the disk space each dataset takes.

| Dataset         | Size (GB) |
|-----------------|----------:|
| VeRi_with_plate | 1.1 |
| CompCars        | 2.5 |
| Cars196         | 1.9 |
| BoxCars116k     | 9.2 |
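Before downloading, it can help to compare the sizes in the table against the free space on the runtime's disk. The sketch below copies the table's approximate sizes (VehicleID is not listed there, so it is left out), treats GB as 10**9 bytes, and applies a hypothetical safety `margin` to allow for the compressed archives in `./dataset_source`; the margin value is an assumption, not something the project specifies.

```python
import shutil

# Approximate sizes in GB, copied from the table above (VehicleID is not listed there).
DATASET_SIZES_GB = {
    "VeRi_with_plate": 1.1,
    "CompCars": 2.5,
    "Cars196": 1.9,
    "BoxCars116k": 9.2,
}

def required_gb(names):
    """Sum the approximate sizes of the selected datasets."""
    return sum(DATASET_SIZES_GB[name] for name in names)

def free_gb(path="."):
    """Free space, in GB (10**9 bytes), on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9

def enough_space(names, path=".", margin=1.5):
    """Heuristic check; `margin` is an assumed allowance for the downloaded archives."""
    return free_gb(path) >= margin * required_gb(names)

# Usage:
# print(required_gb(["CompCars", "Cars196"]), "GB needed,", round(free_gb(), 1), "GB free")
```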

### CompCars dataset setup

In [None]:
%%capture --no-stderr
!python setup_dataset.py download -n COMP_CARS -s gdrive -d ./dataset_source
!python setup_dataset.py setup -n COMP_CARS -s ./dataset_source -d {DATASET_ROOT}

### BoxCars116k dataset setup

In [None]:
%%capture --no-stderr
!python setup_dataset.py download -n BOXCARS116K -s gdrive -d ./dataset_source
!python setup_dataset.py setup -n BOXCARS116K -s ./dataset_source -d {DATASET_ROOT}

### Cars196 dataset setup

In [None]:
# %%capture --no-stderr
!python setup_dataset.py download -n CARS196 -s gdrive -d ./dataset_source
!python setup_dataset.py setup -n CARS196 -s ./dataset_source -d {DATASET_ROOT}

### VehicleID dataset setup

In [None]:
%%capture --no-stderr
!python setup_dataset.py download -n VEHICLE_ID -s gdrive -d ./dataset_source
!python setup_dataset.py setup -n VEHICLE_ID -s ./dataset_source -d {DATASET_ROOT}
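Each dataset cell above runs the same two `setup_dataset.py` subcommands with the same flags. If you want several datasets in one go, the pattern can be looped from Python. This sketch reuses exactly the arguments shown in the cells (`-n`, `-s gdrive`, `-d ./dataset_source`); `setup_commands` and `setup_datasets` are hypothetical helpers, and `sys.executable` is used instead of a bare `python` so the commands run with the notebook's own interpreter.

```python
import subprocess
import sys

def setup_commands(name, dataset_root, source_dir="./dataset_source", source="gdrive"):
    """Build the download and setup invocations of setup_dataset.py used in the cells above."""
    download = [sys.executable, "setup_dataset.py", "download",
                "-n", name, "-s", source, "-d", source_dir]
    setup = [sys.executable, "setup_dataset.py", "setup",
             "-n", name, "-s", source_dir, "-d", dataset_root]
    return [download, setup]

def setup_datasets(names, dataset_root):
    """Download and set up each dataset in turn, stopping at the first failing command."""
    for name in names:
        for cmd in setup_commands(name, dataset_root):
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit

# Usage in the notebook (DATASET_ROOT is defined earlier):
# setup_datasets(["COMP_CARS", "CARS196"], DATASET_ROOT)
```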

## Training

### Training params
Change the training hyperparameters as required.

In [None]:
# One of: VEHICLE_ID, BOXCARS116K, CARS196, COMBINED
# train_dataset_type = 'CARS196'
# train_dataset_type = 'BOXCARS116K'
train_dataset_type = 'VEHICLE_ID'
# train_dataset_type = 'COMBINED'

# Learning rate/eta0
lr=4e-2
lr2=1e-5 # squeezenet 
lr_step=1
lr_step_factor=0.9

# Early stopping patience
patience = 4

batch_size=128
max_epochs=1

# "resnet18",
# "resnet50",
# "resnet152",
# "mobilenetv3-small",
# "efficientnet-b0",
# "squeezenet",
model_arch="efficientnet-b0"

# development
is_dev_run=False

# Number of GPUs to use for training; -1 uses all available GPUs
num_gpus = -1
num_dataloader_workers = 2
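The parameters above suggest a step decay schedule and early stopping. Assuming `lr_step` and `lr_step_factor` act like a step size and decay factor (so the rate after epoch $e$ is `lr * lr_step_factor ** (e // lr_step)`), and that `patience` counts epochs without improvement in validation loss, the two rules can be sketched as follows. With the values above, the rate after 2 epochs would be 4e-2 × 0.9² = 0.0324. How the experiment script actually wires these parameters is an assumption; `step_lr` and `should_stop` are hypothetical helpers.

```python
def step_lr(epoch, base_lr=4e-2, step=1, factor=0.9):
    """Learning rate after `epoch` epochs: base_lr * factor ** (epoch // step)."""
    return base_lr * factor ** (epoch // step)

def should_stop(val_losses, patience=4):
    """True once the last `patience` epochs fail to beat the best earlier validation loss."""
    if len(val_losses) <= patience:
        return False  # not enough history yet
    best_before = min(val_losses[:-patience])
    return min(val_losses[-patience:]) >= best_before

# Usage:
# for epoch in range(5):
#     print(epoch, step_lr(epoch, base_lr=lr, step=lr_step, factor=lr_step_factor))
```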

### Dataset statistics
Prints the class distribution statistics for the train and test datasets.

In [None]:
! python {EXPERIMENT_SCRIPT} train_stats test_stats --train_dataset CARS196 --test_dataset CARS196
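To make the class distribution statistics concrete, the sketch below computes per-class counts and fractions, plus the ratio between the largest and smallest class, from a plain list of labels. It does not reproduce the script's own output format; the label values in the usage line are hypothetical.

```python
from collections import Counter

def class_distribution(labels):
    """Return (label, count, fraction) tuples, most frequent class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return [(label, n, n / total) for label, n in counts.most_common()]

def imbalance_ratio(labels):
    """Largest class count divided by the smallest (1.0 means perfectly balanced)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)

# Hypothetical usage:
# for label, n, frac in class_distribution(["bmw", "audi", "bmw", "ford"]):
#     print(f"{label:10s} {n:6d} {frac:6.1%}")
```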

### Tune Learning Rate
Runs a heuristic search to find an approximate learning rate for training.

In [None]:
! python {EXPERIMENT_SCRIPT} tune --train_dataset CARS196 --model_arch {model_arch}
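The exact heuristic behind `tune` is not shown in this notebook. A common approach is the learning-rate range test: train briefly while increasing the learning rate, record the loss at each step, and pick the rate where the loss falls most steeply. The sketch below implements only that final selection step on recorded (learning rate, loss) pairs; `suggest_lr` is a hypothetical name, and whether `tune` uses this exact rule is an assumption.

```python
def suggest_lr(lrs, losses):
    """Return the learning rate at the start of the steepest loss decrease.

    `lrs` are increasing learning rates, `losses` the loss recorded at each one.
    """
    if len(lrs) != len(losses) or len(lrs) < 2:
        raise ValueError("need at least two (lr, loss) pairs of equal length")
    # Forward differences: slopes[i] is the loss change from lrs[i] to lrs[i + 1].
    slopes = [losses[i + 1] - losses[i] for i in range(len(losses) - 1)]
    steepest = min(range(len(slopes)), key=slopes.__getitem__)
    return lrs[steepest]

# Hypothetical usage with recorded values:
# suggest_lr([1e-5, 1e-4, 1e-3, 1e-2, 1e-1], [2.3, 2.29, 2.0, 1.5, 3.0])
```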

---
## Train
Train the model defined in the experiment script, using the `brand` experiment as an example.

In [None]:
! python {EXPERIMENT_SCRIPT} train \
    --model_arch {model_arch} \
    --max_epochs {max_epochs} \
    --gpus {num_gpus} \
    --train_dataset {train_dataset_type} \
    --data_dir {DATASET_ROOT}

---
## Visualize
The logs are stored in `<checkpoints>/<EXPERIMENT_NAME>/lightning_logs/`.

In [None]:
# Start tensorboard.
%load_ext tensorboard
%tensorboard --logdir checkpoints/{EXPERIMENT_NAME}/lightning_logs/

## Evaluate Predictions
We run evaluation in two stages. First, the predictions are written to an output file; then, the prediction files are used to compute the metrics. This lets us checkpoint our progress, since some of these steps can take a long time even with GPUs and might get aborted due to environment constraints (for example, a Colab runtime disconnecting).
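The two-stage idea can be illustrated with a toy example: stage 1 writes predictions to disk, and stage 2 recomputes a metric from that file without touching the model. The file format here (one `predicted,actual` pair per line) and the helper names `save_predictions` and `accuracy_from_file` are hypothetical; the project's prediction files may be laid out differently.

```python
import os

def save_predictions(path, pairs):
    """Stage 1: write (predicted, actual) label pairs, one comma-separated pair per line.

    Assumes labels contain no commas.
    """
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        for predicted, actual in pairs:
            f.write(f"{predicted},{actual}\n")

def accuracy_from_file(path):
    """Stage 2: recompute accuracy from the saved file, without re-running the model."""
    correct = total = 0
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            predicted, actual = line.rstrip("\n").split(",")
            correct += predicted == actual
            total += 1
    return correct / total if total else 0.0
```

If stage 2 is interrupted, only the cheap metric computation has to be repeated.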

### Predict
Runs predictions on the dataset's test data and saves them in the predictions folder. The `get_conf_data` method generates the persistence path and model path deterministically from the input params.

The model is loaded from the `*.ckpt` file passed with `--model_checkpoint_file`.

The predictions are written to the file given by the `--prediction_file_name` argument or, if it is omitted, to `<predictions>/<EXPERIMENT_NAME>/<test_dataset>_<model_checkpoint_file>_<time.time()>.txt`.

Note down the prediction file path that gets printed; you need it in the next step.

TIP: You can run the `predict` and `evaluate` commands in a single step and skip the prediction file argument. The prediction file is still saved to the path described above.
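The checkpoint path in the cell below is specific to one training run (`version_13`). To fill in `--model_checkpoint_file` after your own run, the most recently written checkpoint can be located by following the `lightning_logs/version_*/checkpoints/*.ckpt` layout visible in that path. `latest_checkpoint` is a hypothetical helper, and "most recent" here means the newest modification time, which may not be your best model.

```python
import glob
import os

def latest_checkpoint(experiment_name, root="checkpoints"):
    """Return the most recently modified *.ckpt under <root>/<experiment_name>/lightning_logs/, or None."""
    pattern = os.path.join(root, experiment_name, "lightning_logs",
                           "version_*", "checkpoints", "*.ckpt")
    candidates = glob.glob(pattern)
    return max(candidates, key=os.path.getmtime) if candidates else None

# Usage:
# ckpt = latest_checkpoint(EXPERIMENT_NAME)
# then pass it to the predict command as --model_checkpoint_file "{ckpt}"
```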

In [None]:
! python {EXPERIMENT_SCRIPT} predict --model_arch {model_arch} \
    --gpus {num_gpus} --test_dataset VEHICLE_ID \
    --model_checkpoint_file "checkpoints/brand/lightning_logs/version_13/checkpoints/epoch=0-step=738.ckpt"

### Evaluate
Computes various metrics from a stored prediction file and saves the results to files.
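If the prediction file used the default name, the newest one for a given test dataset can be picked from its `time.time()` suffix rather than copied by hand into `--prediction_file_path`. This sketch assumes the default naming pattern described in the Predict step, with the timestamp as the last underscore-separated field before `.txt`; `latest_prediction_file` is a hypothetical helper.

```python
import glob
import os

def latest_prediction_file(experiment_name, test_dataset, root="predictions"):
    """Newest default-named prediction file for `test_dataset`, judged by its time.time() suffix."""
    pattern = os.path.join(root, experiment_name, f"{test_dataset}_*.txt")

    def timestamp(path):
        stem = os.path.basename(path)[: -len(".txt")]
        try:
            return float(stem.rsplit("_", 1)[1])
        except (IndexError, ValueError):
            return float("-inf")  # not a default-named file; rank it last

    candidates = glob.glob(pattern)
    return max(candidates, key=timestamp) if candidates else None

# Usage:
# path = latest_prediction_file(EXPERIMENT_NAME, "VEHICLE_ID")
# then pass it to the evaluate command as --prediction_file_path "{path}"
```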


In [None]:
! python {EXPERIMENT_SCRIPT} evaluate \
    --test_dataset VEHICLE_ID \
    --prediction_file_path "predictions/brand/prediction_file.txt"