# Distracted Driver Detection Project
## Project Aim: To predict the driver's activity from an image
This notebook is used to test and implement the code for the Distracted Driver Detection project.

## Step 1: Importing the datasets

We start by importing the images from the dataset. The dataset is too large to load into memory all at once, so we load only the file names and their labels. We also split the files into training, validation and test sets here. The test set provided with the competition has no labels, so we validate and test our models on held-out portions of the labeled training set.

```python
import os

import numpy as np
from keras.utils import np_utils
from sklearn.datasets import load_files
from sklearn.model_selection import train_test_split

# TODO: change this directory in the GitHub version as well
DATA_DIR = '../input/state-farm-distracted-driver-detection/train/'

# Load the file paths and one-hot encoded labels (not the images themselves).
# load_content=False stops load_files from reading every file's bytes into memory.
def load_dataset(path):
    data = load_files(path, load_content=False)
    driver_images = np.array(data['filenames'])
    driver_activities = np_utils.to_categorical(np.array(data['target']))
    return driver_images, driver_activities

images, targets = load_dataset(DATA_DIR)

# Split into train (80%), validation (10%) and test (10%) sets
images_train, images_rest, targets_train, targets_rest = train_test_split(
    images, targets, train_size=0.8, random_state=42)
images_val, images_test, targets_val, targets_test = train_test_split(
    images_rest, targets_rest, train_size=0.5, random_state=42)

# Print the dataset statistics
print('There are %d driver images in total' % len(images))
print('There are %d training images' % len(images_train))
print('There are %d validation images' % len(images_val))
print('There are %d test images' % len(images_test))

# List the class folders (c0-c9) in the training directory
print(os.listdir(DATA_DIR))
```

```
Using TensorFlow backend.
There are 22424 driver images in total
There are 17939 training images
There are 2242 validation images
There are 2243 test images
['c9', 'c0', 'c8', 'c1', '.DS_Store', 'c7', 'c5', 'c6', 'c4', 'c2', 'c3']
```
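The split sizes printed above follow from simple arithmetic. The sketch below assumes that scikit-learn turns a float `train_size` into `floor(train_size * n)` training samples and gives the remainder to the other split; `split_sizes` is a hypothetical helper written for illustration, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, train_fraction):
    """Hypothetical helper: counts produced by a float train_size.

    Assumes the first split gets floor(train_fraction * n_samples) samples
    and the second split gets whatever remains.
    """
    n_train = math.floor(train_fraction * n_samples)
    return n_train, n_samples - n_train


n_total = 22424
n_train, n_rest = split_sizes(n_total, 0.8)  # first split: 80% for training
n_val, n_test = split_sizes(n_rest, 0.5)     # second split: halve the remaining 20%
print(n_train, n_val, n_test)                # 17939 2242 2243
```

Because the second split rounds down for validation, the odd leftover image lands in the test set, which is why the test set is one image larger than the validation set.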




## Step 2: Data Generator

We found that the RAM available on this computer/kernel is not large enough to hold the whole dataset, so we train the model in batches. We define a new class that inherits from the `Sequence` class in Keras and adapt it to our requirements. The batch size defaults to 32 images and can be increased depending on other factors, such as available memory. For preprocessing, the data generator resizes each image to 256×256 and normalizes its pixel values to [0, 1]. Doing this inside the generator removes the need for a separate preprocessing step later and means only one batch of images is held in memory at a time.
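To put the memory problem in numbers, here is a back-of-the-envelope estimate. It assumes every image is stored as a 256×256×3 float32 tensor (the size and dtype produced by the generator in this step) and uses the image count printed above; `tensor_bytes` is a hypothetical helper.

```python
BYTES_PER_FLOAT32 = 4


def tensor_bytes(n_images, height=256, width=256, channels=3):
    """Bytes needed for an (n_images, height, width, channels) float32 array."""
    return n_images * height * width * channels * BYTES_PER_FLOAT32


full_dataset = tensor_bytes(22424)  # every training image at once
one_batch = tensor_bytes(64)        # a single batch of 64 images
print(f"full dataset: {full_dataset / 2**30:.1f} GiB")  # full dataset: 16.4 GiB
print(f"one batch: {one_batch / 2**20:.0f} MiB")        # one batch: 48 MiB
```

Holding the full dataset would need over 16 GiB, while a batch of 64 images needs only 48 MiB, which is what makes batch-wise loading practical.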

```python
import numpy as np
import keras
from keras.preprocessing import image
from PIL import ImageFile

# Let PIL load images whose files are truncated instead of raising an error
ImageFile.LOAD_TRUNCATED_IMAGES = True


class DataGenerator(keras.utils.Sequence):
    """Generates batches of (normalized image tensors, one-hot labels) for Keras.

    The train/validation/test split is done beforehand; this class only loads
    and preprocesses the images of one batch at a time.
    """

    def __init__(self, list_file_paths, labels, batch_size=32, shuffle=True):
        self.list_file_paths = np.asarray(list_file_paths)
        self.labels = labels
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        # Number of batches per epoch; a final partial batch is dropped
        return int(np.floor(len(self.list_file_paths) / self.batch_size))

    def __getitem__(self, index):
        # Indexes of the samples that make up this batch
        indexes = self.indexes[index * self.batch_size:(index + 1) * self.batch_size]
        X = self.__data_generation(self.list_file_paths[indexes])
        y = self.labels[indexes]
        return X, y

    def on_epoch_end(self):
        # Reset (and optionally shuffle) the sample order after each epoch
        self.indexes = np.arange(len(self.list_file_paths))
        if self.shuffle:
            np.random.shuffle(self.indexes)

    @staticmethod
    def path_to_tensor(path):
        # Load one image resized to 256x256 and return a (1, 256, 256, 3) tensor
        img = image.load_img(path, target_size=(256, 256))
        x = image.img_to_array(img)
        return np.expand_dims(x, axis=0)

    def __data_generation(self, list_file_paths_temp):
        # Stack the batch into (batch_size, 256, 256, 3) and scale pixels to [0, 1]
        img_list = [self.path_to_tensor(img_path) for img_path in list_file_paths_temp]
        return np.vstack(img_list).astype('float32') / 255
```
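The batch bookkeeping in `DataGenerator` (how many batches there are per epoch, which indexes go into each batch, and reshuffling between epochs) can be checked without Keras or any image files. `BatchIndexer` below is a hypothetical, Keras-free stand-in that mirrors only that logic; it uses Python's `random` module in place of `np.random` and returns index lists instead of image tensors.

```python
import random


class BatchIndexer:
    """Hypothetical stand-in for DataGenerator's indexing logic (no images loaded)."""

    def __init__(self, n_samples, batch_size=32, shuffle=True, seed=None):
        self.n_samples = n_samples
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = random.Random(seed)
        self.on_epoch_end()

    def __len__(self):
        # Floor division: a final partial batch is dropped
        return self.n_samples // self.batch_size

    def __getitem__(self, index):
        # Slice this batch's sample indexes out of the (possibly shuffled) order
        start = index * self.batch_size
        return self.indexes[start:start + self.batch_size]

    def on_epoch_end(self):
        # Rebuild the sample order and reshuffle it if requested
        self.indexes = list(range(self.n_samples))
        if self.shuffle:
            self.rng.shuffle(self.indexes)


indexer = BatchIndexer(n_samples=10, batch_size=3, seed=0)
batches = [indexer[i] for i in range(len(indexer))]
seen = [i for batch in batches for i in batch]
print(len(batches))               # 3
print(len(seen), len(set(seen)))  # 9 9
```

With 10 samples and a batch size of 3, each epoch yields 3 full batches covering 9 distinct samples; the tenth sample is skipped for that epoch, and because the order is reshuffled, a different sample is usually skipped next time.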


## Step 3: Instantiate data generator objects

Here, we create data generator objects for the training, validation and test sets.

```python
from keras.models import Sequential

# Parameters for the data generators
params = {'batch_size': 64,
          'shuffle': True}

# Create the generators. The test generator is not shuffled so that its
# predictions stay in the same order as images_test / targets_test.
training_generator = DataGenerator(images_train, targets_train, **params)
validation_generator = DataGenerator(images_val, targets_val, **params)
testing_generator = DataGenerator(images_test, targets_test, batch_size=64, shuffle=False)
```
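Because `DataGenerator.__len__` rounds down, each generator drops its last partial batch in every epoch. The sketch below works out what that means for the three splits with `batch_size=64`; `batches_per_epoch` is a hypothetical helper that applies the same floor rule.

```python
def batches_per_epoch(n_images, batch_size=64):
    """Batches per epoch and images skipped, using floor(n_images / batch_size)."""
    n_batches = n_images // batch_size
    return n_batches, n_images - n_batches * batch_size


for name, n in [("train", 17939), ("validation", 2242), ("test", 2243)]:
    n_batches, skipped = batches_per_epoch(n)
    print(f"{name}: {n_batches} batches, {skipped} images skipped per epoch")
# train: 280 batches, 19 images skipped per epoch
# validation: 35 batches, 2 images skipped per epoch
# test: 35 batches, 3 images skipped per epoch
```

For a shuffled generator, a different handful of images is skipped each epoch, so nothing is permanently lost. For a generator created with `shuffle=False`, the same trailing images are always skipped, so only 2240 of the 2243 test images would be scored.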




## Step 4: Construct & Train the Vanilla model

Here, we construct a vanilla CNN model. It should contain a basic convolutional layer followed by ReLU, max-pooling and softmax layers, in that order.
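As a sanity check on the architecture described above, the sketch below computes output shapes and parameter counts by hand. The hyperparameters (16 filters, a 3×3 kernel with stride 1 and 'valid' padding, 2×2 max-pooling) are assumptions, not values from this notebook. The 10 classes correspond to the folders c0 to c9, and the softmax layer is assumed to be a dense layer with a softmax activation placed after flattening, as in Keras's `Flatten()` followed by `Dense(10, activation='softmax')`.

```python
def conv2d_valid(h, w, kernel):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return h - kernel + 1, w - kernel + 1


def maxpool(h, w, pool):
    """Spatial size after non-overlapping pooling (stride equal to pool size)."""
    return h // pool, w // pool


# Assumed hyperparameters (hypothetical, not taken from the notebook)
H, W, C = 256, 256, 3  # input size produced by the data generator
FILTERS, KERNEL, POOL, N_CLASSES = 16, 3, 2, 10

h, w = conv2d_valid(H, W, KERNEL)                 # 254 x 254 after the conv layer
conv_params = (KERNEL * KERNEL * C + 1) * FILTERS  # weights + one bias per filter
# ReLU is element-wise, so it changes neither the shape nor the parameter count
h, w = maxpool(h, w, POOL)                         # 127 x 127 after max-pooling
flat = h * w * FILTERS                             # features after flattening
dense_params = flat * N_CLASSES + N_CLASSES        # dense layer feeding the softmax

print(h, w, flat)                 # 127 127 258064
print(conv_params, dense_params)  # 448 2580650
```

Under these assumptions almost all of the roughly 2.6 million parameters sit in the final dense layer; replacing `Flatten` with a global pooling layer such as Keras's `GlobalAveragePooling2D` is one common way to shrink it.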

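```python
# Load every test image into memory at once, normalized to [0, 1].
# Note: this bypasses the data generator and may exceed the available RAM.
test_tensors = np.vstack([DataGenerator.path_to_tensor(p) for p in images_test]).astype('float32') / 255
```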

## Step 5: Test the Vanilla Model

Here, we test the vanilla model and compute its accuracy. We also calculate the multiclass log loss in this step.
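A minimal, dependency-free sketch of the two metrics. The multiclass log loss is the mean negative log of the probability the model assigns to the true class. Following the convention commonly used for Kaggle's multiclass log loss, probabilities are clipped to [1e-15, 1 - 1e-15] and each row is renormalized to sum to 1; treat that clipping as an assumption and check the competition's evaluation page. Both function names are hypothetical.

```python
import math


def multiclass_log_loss(y_true, y_prob, eps=1e-15):
    """Mean multiclass log loss for one-hot labels and predicted probability rows.

    Assumed convention: clip probabilities to [eps, 1 - eps], then renormalize
    each row so it sums to 1.
    """
    total = 0.0
    for truth, probs in zip(y_true, y_prob):
        clipped = [min(max(p, eps), 1 - eps) for p in probs]
        row_sum = sum(clipped)
        true_class = truth.index(1)
        total += -math.log(clipped[true_class] / row_sum)
    return total / len(y_true)


def accuracy(y_true, y_prob):
    """Fraction of rows whose highest-probability class is the true class."""
    hits = sum(
        probs.index(max(probs)) == truth.index(1)
        for truth, probs in zip(y_true, y_prob)
    )
    return hits / len(y_true)


y_true = [[1, 0, 0], [0, 1, 0]]
y_prob = [[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]]
print(accuracy(y_true, y_prob))                       # 1.0
print(round(multiclass_log_loss(y_true, y_prob), 4))  # 0.6365
```

Both predictions are correct, so accuracy is perfect, yet the log loss is still well above zero because the second prediction is only 40% confident. This is why log loss is reported alongside accuracy.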


## Step 6: Load the ResNet model

## Step 7: Transfer Learning using the ResNet model

## Step 8: Test the new model

## Step 9: Show some image results with the new model

## Step 10: Conclusion