## LeNet-5 Inspired Models - JackieN
This file produces a number of LeNet-5 inspired models and predictions based on varying degrees of cleaned train data.

Based on https://medium.com/@mgazar/lenet-5-in-9-lines-of-code-using-keras-ac99294c8086 and 

https://deepai.org/publication/towards-good-practices-on-building-effective-cnn-baseline-model-for-person-re-identification#:~:text=The%20last%20key%20practice%20is%20to%20train%20CNN,based%20on%20the%20adaptive%20estimates%20of%20lower-order%20moments.

The best score, produced by the model trained on the clean data with all outliers removed, is 2.48797, placing at position 51 on the leaderboard.

![](https://i.imgur.com/BiDsWBP.jpg)
 

### Imports

Set UTILS_PATH to the location of your utils directory. This allows load_models and predict_models to be imported.
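
Before running the import cell, it can help to confirm that the utils directory really contains the modules this notebook imports. The check below is only a sketch: missing_utils is a hypothetical helper, and it assumes that load_models and predict_models are single .py files rather than packages.

```python
import os

# Module files this notebook imports from the utils directory.
# Assumption: both are plain .py files, not packages.
REQUIRED_MODULES = ("load_models.py", "predict_models.py")


def missing_utils(utils_path):
    """Return the required module files that are not present in utils_path."""
    return [
        name
        for name in REQUIRED_MODULES
        if not os.path.isfile(os.path.join(utils_path, name))
    ]
```

An empty list means both files were found, so appending the directory to sys.path should let the imports succeed.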

In [None]:
#Set the utils path to point to the utils directory locally
UTILS_PATH = "MODIFY THIS"

import os
import sys
import pickle

sys.path.append(UTILS_PATH)
from load_models import LoadTrainModels
from predict_models import PredictModels



### Helper Path Functions

Two helper functions were created for simplification:

- set_train_paths: This sets the following paths and must be called before create_model() and create_predictions()
    - model_path - location where the model files should be saved
    - train_path - location of the clean train pickle files to use for model creation

- set_test_paths: This sets the following paths and must be called after set_train_paths() and before create_predictions()
    - test_path - location of the test pickle file
    - id_lookup_path - location of the id_lookup pickle file
    - prediction_path - location where the prediction csv should be saved
 

In [None]:
def set_train_paths(model_path, train_path="C:/Data/CleanTrain/"):
    global file_path
    global trainer
    global output_model_path
    file_path = train_path
    output_model_path=model_path
    trainer = LoadTrainModels(output_model_path, file_path)
    trainer.print_paths()

def set_test_paths(test_path="../Data/test.p", id_lookup_path="../Data/id_lookup.p", prediction_path = "C:/data/Predictions/"):
    global pred_path
    global predictor
    global id_lookup
    global test

    with open(id_lookup_path, "rb") as f:
        id_lookup = pickle.load(f)
    with open(test_path, "rb") as f:
        test = pickle.load(f)
    pred_path = prediction_path
    predictor = PredictModels(output_model_path,pred_path , id_lookup)
    predictor.print_paths()


### Helper Model Functions

Three helper functions were created for simplification:

- create_model: This opens each file in a directory and passes the settings along to the utils class LoadTrainModels, which applies any augmentation, splits the data, and trains the models. The model files are stored at the specified location. You must call set_train_paths() with the appropriate paths before calling this function.

- create_predictions: This opens the model files in a directory and passes the settings along to the utils class PredictModels, which generates one predictions csv per model. You must call set_train_paths() and set_test_paths() with the appropriate paths before calling this function.

- combine_predictions: This passes two prediction directories to PredictModels so that the predictions can be combined into a single csv.

In [None]:
def create_model(aug = False, vary_layers = False, hoizontal_flip = False,brightness = 1, dim = 1, separate = False):
    files = os.listdir(file_path)
    num_layers = 6
    #For every version of a cleaned Train file in CleanTrain directory, create and save a model
    for filename in files: 
        print("Opening file: ", filename)
        clean_file = os.path.join(file_path, filename)
        with open(clean_file, "rb") as f:
            train_data = pickle.load(f)
        train_data = train_data.drop(['level_0', 'check_sum', 'index'], axis=1,errors='ignore')
        print("Train Shape:", train_data.shape)
        filename = str(filename).replace('.p', '').strip()
        

        #Setting layers:
        #layers = 2 equates to 9 model layers
        #layers = 3 equates to 11 model layers
        #layers = 4 equates to 13 model layers
        #layers >= 5 equates to 15 model layers
        if vary_layers:
            #Now for each model, let's try different layers
            for num_layers in range(2,6):
                print("Begin model and train:")
                model_name = "".join((filename,str(num_layers),"layers_Lenet5"))
                print("Model name:", model_name)
                model, history = trainer.train_model(model_name, train_data, hoizontal_flip = hoizontal_flip,aug = aug, brightness = brightness, dim = dim,layers=num_layers, separate = separate)
                print("End model and train")    
                print()
        else:
            print("Begin model and train:")
            model_name = "".join((filename,"_Lenet5"))
            print("Model name:", model_name)
            model, history = trainer.train_model(model_name, train_data, aug = aug, hoizontal_flip = hoizontal_flip,brightness = brightness, dim = dim,layers=num_layers, separate = separate)
            print("End model and train")    
            print()

        


def create_predictions(columns = "Full"):
    files = os.listdir(output_model_path)
    #For every model in output_model_path, predict using the model and save the predictions in a CSV file
    for filename in files:
        if ".h5" in filename:
            base_name = filename[:-3]
            model_json = ''.join((base_name,".json"))
            print("Working with: ", base_name)
            print("Begin Predict")
            #predict_standard makes predictions and stores them in the pred_path location specified.
            #pred_path is set via the set_test_paths function call
            Y = predictor.predict_standard(base_name, filename, model_json, test, columns=columns)
            print("End Predict")
            print()

def combine_predictions(full_path, seperate_path):
    predictor.combine_predictions(full_path, seperate_path)

## Baseline test

To begin, let's run the model against the raw train data to determine a baseline. Once we have a baseline, we can attempt to improve on it.

This cell calls set_train_paths with the model output path and the path of the train file, then creates the model. The output directory is shown below:

![](https://i.imgur.com/qT7mF5c.jpg)







In [None]:

set_train_paths("C:/data/Jackie_Lenet5_Raw", "C:/Data/RawTrain/")
#Get the files in the raw train directory and create a model for each
create_model()
    

## Baseline Prediction 

For the model created above, predict using the model and save the predictions in a CSV file for submission.

Following the cell above, it's now time to make some predictions. The following cell:

- sets the test paths: the path where the test dataset is located and the path where the predictions should be saved
- loops through the model directory and, for each model (.h5 file with its matching .json file),
    1. creates a prediction and stores it in the specified location.
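
The model-discovery step of that loop can be sketched on its own. This is a minimal illustration rather than the code the notebook runs: find_model_files is a hypothetical helper, and it matches the .h5 extension exactly, which is slightly stricter than a substring check.

```python
import os


def find_model_files(model_dir):
    """Return (base_name, weights_file, json_file) for each .h5 file in model_dir."""
    found = []
    for filename in sorted(os.listdir(model_dir)):
        base_name, ext = os.path.splitext(filename)
        if ext == ".h5":
            # The architecture file is assumed to share the weights file's base name.
            found.append((base_name, filename, base_name + ".json"))
    return found
```

Each tuple carries the three names that a prediction call needs: the model's base name, its weights file, and its JSON file.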

The baseline approach was submitted and received the score below:

![](https://i.imgur.com/JW2wJfQ.jpg)


Note: If you would like to run this cell, please update the paths accordingly. 

In [None]:
#Time to make some predictions
set_test_paths()
create_predictions()




## Improvement to Baseline

Five attempts were made to improve on the baseline, following this pipeline approach:

![](https://i.imgur.com/jlxPolW.png)

- Approach 1: all versions of the cleaned train data set (see the clean section) were used to create models and predictions
- Approach 2: all versions of the cleaned train data set were used, and the layers in the model were varied, to create models and predictions
- Approach 3: all versions of the cleaned train data set were used, with varying layers and image augmentation (brightness and dim), to create models and predictions


The following two approaches work slightly differently: they create one model with 30 keypoints and another with 8 keypoints, then combine the predictions into a single CSV.

![](https://i.imgur.com/1UUDUSy.jpg)

- Approach 4: use different cleaned versions of the train data set, flip the images, and set brightness=0 and dim=0, for 30 keypoints and 8 keypoints.
- Approach 5: use only the clean_all_outliers version of the train data set, flip the images, and set brightness=1.4 and dim=0.3, for 30 keypoints and 8 keypoints with full layers (the best result came from this test).

All five are described in the following cells. 




### Approach 1: Use different cleaned versions of train data set

Now it's time to see if we can improve on the baseline. For this attempt, we create and save a model for every version of a clean train file in a given path.

![](https://i.imgur.com/S7FhUkH.jpg)





This cell loops through the directory of clean train files, which appears below, and creates a model for each file. Please refer to the Readme file for more information on each.


![](https://i.imgur.com/bNZTV5a.jpg)



The following cell produced the prediction with the best result for the clean file named: clean_all_outliers.  This means that the train file that was cleaned by removing all outliers produced the best result with these settings. 

![](https://i.imgur.com/kbpD4Eo.jpg)

In [None]:
set_train_paths(model_path= "C:/data/Jackie_Lenet5_AllClean")
#Get the files in the clean directory and create a model for each
create_model(vary_layers = False)

#Perform the predictions
set_test_paths()
create_predictions()

### Approach 2: Use different cleaned versions of train data set and vary the layers of the model. 

The following cell is an advanced version. No transformations are applied to the data, but the model is adjusted by adding layers. This cell creates 4 models with varying layers per clean file (e.g. if you have 2 clean files you will end up with 8 models).

Setting layers:

- layers = 2 equates to 9 model layers
- layers = 3 equates to 11 model layers
- layers = 4 equates to 13 model layers
- layers >= 5 equates to 15 model layers
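
The mapping listed above can be written as a small lookup. This is only a sketch: model_layer_count is a hypothetical helper, not part of LoadTrainModels, and since these notes do not describe settings below 2, the sketch rejects them.

```python
def model_layer_count(layers):
    """Map the `layers` setting to the total number of model layers listed above."""
    if layers < 2:
        # Settings below 2 are not covered by the notes, so refuse to guess.
        raise ValueError("only layers >= 2 are described")
    # 2, 3 and 4 map to 9, 11 and 13; anything from 5 upward maps to 15.
    return {2: 9, 3: 11, 4: 13}.get(layers, 15)
```

For example, model_layer_count(3) gives 11, and any setting of 5 or more gives 15.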


In this example, I only use the clean_all_outliers clean train file since it produced the best results previously.

![](https://i.imgur.com/0UPWIj1.jpg)

Note: run at your own risk. I suggest keeping only one clean file in the directory at a time. I did run this on all 7 clean files and it worked!

![](https://i.imgur.com/NDedKbW.jpg)


In [None]:
set_train_paths(model_path= "C:/data/Jackie_Lenet5_Layers", train_path= "C:/Data/CleanTrain_1/")
#Get the files in the clean directory, try different layers to create some models
create_model(vary_layers = True)

#Perform the predictions
set_test_paths()
create_predictions()
    

### Approach 3: Use different cleaned versions of train data set and vary the layers of the model and augment the data. 

The following cell is an advanced version. The brightness and dim are adjusted on the images and the model is adjusted by adding layers. This cell creates 4 models with varying layers per clean file (e.g. if you have 2 clean files you will end up with 8 models).
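
To see where the count of 4 models per clean file comes from, the naming step can be reproduced in isolation. The sketch below mirrors the way create_model builds names when vary_layers is set; layered_model_names is a hypothetical helper, and any file name other than clean_all_outliers.p is made up for illustration.

```python
def layered_model_names(clean_files, layer_settings=range(2, 6)):
    """List the model names produced for each clean file and layer setting."""
    names = []
    for filename in clean_files:
        # Strip the pickle extension, as create_model does.
        base = filename.replace(".p", "").strip()
        for num_layers in layer_settings:
            names.append("".join((base, str(num_layers), "layers_Lenet5")))
    return names
```

With range(2, 6) there are four layer settings (2, 3, 4, 5), so one clean file yields four names and two clean files yield eight.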

In this example, I only use the clean_all_outliers clean train file since it produced the best results previously.  

The best-performing model used layers=2 (9 model layers) with brightness and dim set, but it still did not beat Approach 1:

![](https://i.imgur.com/rF9Crwr.jpg)

Note: run at your own risk. I suggest keeping only one clean file in the directory at a time. I did run this on all 7 clean files and it worked!

In [None]:
set_train_paths(model_path = "C:/data/Jackie_Lenet5_BD", train_path= "C:/Data/CleanTrain_1/")
#Get the files in the clean directory, try different layers and adjust the brightness and dim level of each image
create_model(vary_layers = True, hoizontal_flip = False,brightness = 1.4, dim = .3)

#Perform the predictions
set_test_paths()
create_predictions()

### Approach 4: Use different cleaned versions of the train data set, flip the images, and set brightness=0 and dim=0 for 30 keypoints and 8 keypoints.

The following cell is an advanced version. The images are flipped horizontally and brightness and dim are set to 0.
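
When an image is flipped horizontally, its keypoint x coordinates have to be mirrored as well. The sketch below shows the idea in plain Python. It is not the LoadTrainModels implementation, which is not shown in this notebook: both function names are hypothetical, and the sketch assumes pixel-index coordinates, so x maps to width - 1 - x.

```python
def flip_image_horizontally(image):
    """Mirror a 2-D image given as a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def flip_keypoints_horizontally(keypoints, width):
    """Mirror (x, y) keypoints across the vertical centre line of the image.

    Assumes pixel-index coordinates, where valid x values run from 0 to width - 1.
    """
    return [(width - 1 - x, y) for x, y in keypoints]
```

On a face, flipping also turns left-side features into right-side ones, so a full pipeline would likely need to swap the corresponding label columns too.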

For every model file in a given path, 

- create, train and fit model with augmentation and with 30 keypoints, predict and save predictions
- create, train and fit model with augmentation and with 8 keypoints, predict and save predictions
- combine the predictions into a single file for submission
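
The last step, combining the two sets of predictions, is handled by PredictModels.combine_predictions, whose internals are not shown here. The sketch below illustrates one plausible way to do it, under clearly labeled assumptions: each prediction CSV has RowId and Location columns, and a row from the 8-keypoint predictions replaces the row with the same RowId in the 30-keypoint predictions. Both function names are hypothetical.

```python
import csv
import io


def read_predictions(csv_text):
    """Parse CSV text with RowId and Location columns into a {RowId: Location} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["RowId"]): float(row["Location"]) for row in reader}


def combine(full, separate):
    """Overlay the 8-keypoint predictions on the 30-keypoint ones, ordered by RowId.

    Assumption: where both contain a RowId, the 8-keypoint value wins.
    """
    combined = dict(full)
    combined.update(separate)
    return dict(sorted(combined.items()))
```

Rows that appear only in the 30-keypoint predictions pass through unchanged.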

Output for 30 keypoints: 
![](https://i.imgur.com/iWpWR5C.jpg)

Output for 8 keypoints: 
![](https://i.imgur.com/I9S2rMh.jpg)


In this example, I only use the clean_all_outliers clean train file since it produced the best results previously.  

This yielded the 3rd best result:  


![](https://i.imgur.com/RrpnwyW.jpg)


Note: one less layer was tested here

In [None]:
#Use 30 keypoints
full_path = "C:/data/Jackie_Lenet5_30RawAug"

set_train_paths(full_path, "C:/Data/CleanTrain_30/")
#Get the files in the clean directory and flip the images horizontally to create some models
#Defaults: create_model(aug = False, vary_layers = False, hoizontal_flip = False,brightness = 1, dim = 1, separate = False)
create_model(aug = True, hoizontal_flip = True, brightness = 0, dim = 0)

#Perform the predictions
set_test_paths(prediction_path = "C:/data/Predictions_30/")
create_predictions(columns = "Full")




In [None]:
#Use only 8 keypoints
seperate_path = "C:/data/Jackie_Lenet5_8RawAug"
set_train_paths(seperate_path, "C:/Data/CleanTrain_8/")
create_model(aug = True, hoizontal_flip = True, brightness = 0, dim = 0, separate = True)
set_test_paths(prediction_path = "C:/data/Predictions_8/")
create_predictions(columns = "False")

combine_predictions("C:/data/Predictions_30/", "C:/data/Predictions_8/" )

### Approach 5: Use only the clean_all_outliers version of the train data set, flip the images, and set brightness=1.4 and dim=0.3 for 30 keypoints and 8 keypoints - full layers.

The following cell is an advanced version. The images are flipped horizontally, with brightness=1.4 and dim=0.3.
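
How brightness and dim are applied is defined inside LoadTrainModels and is not shown in this notebook. As a rough illustration only, the sketch below assumes both are multiplicative factors on 8-bit pixel intensities, so 1.4 brightens an image and 0.3 darkens it. scale_intensity is a hypothetical helper.

```python
def scale_intensity(image, factor, max_value=255):
    """Multiply every pixel by factor and clip the result to [0, max_value].

    Assumption: the image is a list of rows of 8-bit grayscale values.
    """
    return [
        [min(max_value, max(0, round(pixel * factor))) for pixel in row]
        for row in image
    ]
```

Under this assumption, a factor above 1 pushes bright pixels toward the clipping limit, while a factor below 1 compresses the whole range toward black.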

For every model file in a given path, 

- create, train and fit model with augmentation and with 30 keypoints, predict and save predictions
- create, train and fit model with augmentation and with 8 keypoints, predict and save predictions
- combine the predictions into a single file for submission

Output for 30 keypoints:

![](https://i.imgur.com/yWrWgCB.jpg)

Output for 8 keypoints: 

![](https://i.imgur.com/y6iDWSc.jpg)

In this example, I only use the clean_all_outliers clean train file since it produced the best results previously. 

![](https://i.imgur.com/YuhHwCL.jpg)

This yielded the best result:  

![](https://i.imgur.com/BiDsWBP.jpg)




In [None]:
#Use 30 keypoints
full_path = "C:/data/Jackie_Lenet5_30_1_layer"

set_train_paths(full_path, "C:/Data/CleanTrain_30_1/")
#Get the files in the clean directory, flip the images horizontally and adjust the brightness and dim to create some models
#Defaults: create_model(aug = False, vary_layers = False, hoizontal_flip = False,brightness = 1, dim = 1, separate = False)
create_model(aug = True, hoizontal_flip = True, brightness = 1.4, dim = 0.3)

#Perform the predictions
set_test_paths(prediction_path = "C:/data/Predictions_30_1_layer/")
create_predictions(columns = "Full")

In [None]:
#Use only 8 keypoints
seperate_path = "C:/data/Jackie_Lenet5_8_1_layer"
set_train_paths(seperate_path, "C:/Data/CleanTrain_8_1/")
create_model(aug = True,hoizontal_flip = True, brightness = 0, dim = 0, separate = True)
set_test_paths(prediction_path = "C:/data/Predictions_8_1/")
create_predictions(columns = "False")


combine_predictions("C:/data/Predictions_30_1_layer/", "C:/data/Predictions_8_1/" )