# Introduction
Feature extraction is time-intensive, so we have isolated all feature extraction steps in this notebook. Once feature extraction is done, we use the [W281_Fall_2022_Final_Report_Driver_Behavior_Detection](W281_Fall_2022_Final_Report_Driver_Behavior_Detection.ipynb) notebook for feature analysis, model training, and model evaluation.
## Prerequisites For This Notebook
This notebook requires all dependencies to be installed. Please follow the steps in the [Readme](../README.md) to set up the environment and install the dependencies.

## Step 0: Notebook Initialization

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [6]:
import time
import os
from tqdm.notebook import tqdm
from collections import defaultdict

import numpy as np
import pandas as pd

import seaborn as sns
from matplotlib import pyplot as plt
import matplotlib.ticker as ticker
import matplotlib

import cv2 as cv
import torch
from torchvision import transforms

import transformers
import eda_helpers
import feature_helpers
import viz
import configuration
import customdataset
import enums

device = 'cpu'
config = configuration.Configuration()
face_config = configuration.FaceConfig(config)
pose_config = configuration.PoseConfig(config)
vizualizer = viz.Vizualizer(config, face_config, pose_config, tqdm=tqdm)
feature_extractor = feature_helpers.FeatureExtractor(config, face_config, pose_config, tqdm)

IMAGE_TYPES = [enums.ImageTypes.ORIGINAL, enums.ImageTypes.POSE, enums.ImageTypes.FACE]

## Step 1: Face Extraction
We use a pre-trained Multi-task Cascaded Convolutional Networks (MTCNN) model[<sup>1</sup>](#cite_mtcnn) to extract faces from our images. The MTCNN model has 3 stages:
* Stage 1: The Proposal Network (P-Net), a fully convolutional network that proposes candidate bounding boxes around faces in the image
* Stage 2: The Refine Network (R-Net), a CNN with a dense layer that predicts whether each box proposed by P-Net contains a face
* Stage 3: The Output Network (O-Net), another CNN that outputs the 5 facial keypoints.
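
Once a detector returns bounding boxes and confidence scores, the face still has to be cut out of the image. The following is a minimal sketch of that cropping step, not the code in `eda_helpers.FaceExtractor`. It assumes boxes arrive as `(x1, y1, x2, y2)` pixel coordinates with one score per box; the names `crop_face` and `best_face` and the `min_score=0.9` threshold are hypothetical.

```python
import numpy as np


def crop_face(image, box, margin=0):
    """Crop a face region from an H x W x C image.

    `box` is assumed to be (x1, y1, x2, y2) in pixel coordinates. Detector
    outputs can be fractional or extend past the image border, so the box is
    rounded and clamped before slicing.
    """
    height, width = image.shape[:2]
    x1, y1, x2, y2 = (int(round(float(v))) for v in box)
    x1 = max(x1 - margin, 0)
    y1 = max(y1 - margin, 0)
    x2 = min(x2 + margin, width)
    y2 = min(y2 + margin, height)
    if x2 <= x1 or y2 <= y1:
        return None  # Degenerate box: nothing to crop.
    return image[y1:y2, x1:x2].copy()


def best_face(image, boxes, scores, min_score=0.9):
    """Return the crop for the highest-scoring box, or None if none qualifies."""
    if len(boxes) == 0:
        return None
    best = int(np.argmax(scores))
    if scores[best] < min_score:  # min_score is an illustrative threshold.
        return None
    return crop_face(image, boxes[best])
```

Returning `None` when no box passes the threshold makes it easy to drop images without a detected face, which is how the dataset is narrowed down in the next step.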


In [3]:
%%time
# Extract the face from each image. This is a long-running step.
def extract_faces():
    eda_helpers.FaceExtractor(config, tqdm).extract_faces(
        face_config.FEATURES_FOLDER, face_config.FACE_SUMMARY_NAME, config.ANNOTATION_FILE)

extract_faces()  # Long-running on our 20,000+ images. Comment this call out once the faces are extracted.

  0%|          | 0/22424 [00:00<?, ?files/s]

CPU times: user 4h 51min 18s, sys: 17min 23s, total: 5h 8min 41s
Wall time: 52min 57s


## Step 2: Train-Test-Validation Split
Our training dataset has 22,000+ color images of size 640x480x3. To iterate faster, we limited the dataset to images in which the MTCNN model was able to identify a face. This brought our dataset to 600 images per class. We applied an 80-10-10 split (480, 60, and 60 images per class) to create the train, validation, and test sets. This also made our next step, pose detection, more manageable.
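
The per-class split can be illustrated with a short NumPy sketch. This is not the implementation of `eda_helpers.SampleSplitter`; the function name `split_per_class`, the fixed seed, and the toy labels are assumptions made for illustration. Only the 480/60/60 counts come from this notebook.

```python
import numpy as np


def split_per_class(labels, n_train=480, n_val=60, n_test=60, seed=0):
    """Return index arrays (train, val, test) with fixed counts per class."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    needed = n_train + n_val + n_test
    train, val, test = [], [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then slice off each subset.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        if len(idx) < needed:
            raise ValueError(f"class {cls!r} has {len(idx)} samples, need {needed}")
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:needed])
    return np.array(train), np.array(val), np.array(test)


# Toy labels: 10 classes with 700 candidate images each.
labels = np.repeat(np.arange(10), 700)
train, val, test = split_per_class(labels)
print(len(train), len(val), len(test))  # prints: 4800 600 600
```

Sampling a fixed count per class keeps the three subsets balanced, and any samples beyond the 600 per class simply remain unused.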

In [4]:
def split_dataset():
    splitter = eda_helpers.SampleSplitter(config, face_config, pose_config, tqdm=tqdm)
    splitter.sample(config.class_dict.keys(), samples_per_class=[600, 480, 60, 60], out_file=config.ANNOTATION_FILE)

split_dataset()  # One-time task. Comment this call out once the split has been created.

Total samples: 22424


  0%|          | 0/6000 [00:00<?, ?file/s]

Validating and saving the split-up...
Created 4800 training samples
Created 600 validation samples
Created 600 testing samples
Left with 16424 unused samples


## Step 3: Pose Detection For Train-Test-Validation Splits
Our intuition is that the pose of the driver can be used to differentiate the classes. To build pose-based features, we extract the human pose from each image using Google's [MoveNet model](https://www.tensorflow.org/hub/tutorials/movenet), which is based on MobileNetV2[<sup>2</sup>](#cite_mobilenet).

Each image is padded and resized to 192x192 pixels, as required by the Lightning version of the [TensorFlow MoveNet model](https://www.tensorflow.org/hub/tutorials/movenet). Inference yields the coordinates and the associated confidence score of each detected keypoint.
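
The pad-and-resize step, and the mapping of keypoints back to the original image, can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code in `eda_helpers.PoseExtractor`: the helper names are hypothetical, the nearest-neighbour resize stands in for a proper library resize, and keypoints are assumed to be rows of normalized `(y, x, score)` values relative to the padded square.

```python
import numpy as np


def pad_to_square(image):
    """Zero-pad an H x W x C image to a centered square; return (padded, top, left)."""
    height, width = image.shape[:2]
    side = max(height, width)
    top = (side - height) // 2
    left = (side - width) // 2
    padded = np.zeros((side, side) + image.shape[2:], dtype=image.dtype)
    padded[top:top + height, left:left + width] = image
    return padded, top, left


def resize_nearest(image, size=192):
    """Nearest-neighbour resize of a square image to size x size."""
    side = image.shape[0]
    idx = np.arange(size) * side // size  # Source row/column for each output pixel.
    return image[idx][:, idx]


def keypoints_to_pixels(keypoints, side, top, left):
    """Map normalized (y, x, score) rows on the padded square to original pixels."""
    keypoints = np.asarray(keypoints, dtype=float)
    ys = keypoints[:, 0] * side - top
    xs = keypoints[:, 1] * side - left
    return np.stack([ys, xs, keypoints[:, 2]], axis=1)


image = np.ones((480, 640, 3), dtype=np.uint8)      # Same size as our images.
padded, top, left = pad_to_square(image)             # 640 x 640, 80 rows of padding top and bottom.
model_input = resize_nearest(padded, size=192)       # 192 x 192 x 3 input for the model.
center = keypoints_to_pixels([[0.5, 0.5, 0.9]], padded.shape[0], top, left)
```

Padding before resizing preserves the aspect ratio of the driver, at the cost of having to subtract the padding offsets when converting keypoints back to image coordinates.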

In [11]:
%%time
def extract_pose():
    original_backend = matplotlib.get_backend()
    print(f'Switching MatPlotLib backend from {original_backend} to Agg')
    # Pose extraction uses plt's canvas. We need a non-interactive backend to avoid memory leaks.
    matplotlib.use('Agg')
    try:
        pose_extractor = eda_helpers.PoseExtractor(config, face_config, pose_config, tqdm)
        pose_extractor.extract_poses(pose_config.FEATURES_FOLDER, pose_config.SUMMARY_NAME)
    finally:
        # Restore the original backend even if pose extraction fails.
        matplotlib.use(original_backend)

extract_pose()  # Takes a long time.

Switching MatPlotLib backend from agg to Agg
Loading model...


  0%|          | 0/6000 [00:00<?, ?it/s]

CPU times: user 14.1 s, sys: 2.12 s, total: 16.2 s
Wall time: 18 s


In [12]:
# Switch matplotlib back to the default inline backend in case the last cell left it changed.
%matplotlib inline

## Step 4: Generate Feature Vectors

In [14]:
%%time  
# faces, original_images, poses, y, filenames = load_data(30, other_types=[enums.ImageTypes.ORIGINAL, enums.ImageTypes.POSE, enums.ImageTypes.FACE], included_labels=config.included_labels)
feature_extractor = feature_helpers.FeatureExtractor(config, face_config, pose_config, tqdm)
data = feature_extractor.load_data(image_types=IMAGE_TYPES, sample_type=enums.SampleType.TRAIN_TEST_VALIDATION, shuffle=True)

Loading 6000 samples:   0%|          | 0/6000 [00:00<?, ?samples/s]

CPU times: user 7min 44s, sys: 1min 21s, total: 9min 6s
Wall time: 1min 35s


In [15]:
%%time
hog_features, hogs = feature_extractor.get_hog_features(data[enums.ImageTypes.FACE.name.lower()])

Building HOG Features:   0%|          | 0/6000 [00:00<?, ?images/s]

CPU times: user 2min 44s, sys: 3.19 s, total: 2min 47s
Wall time: 2min 50s


In [16]:
%%time
pixel_features = feature_extractor.get_pixel_features(data[enums.ImageTypes.FACE.name.lower()])

Building Pixel Features:   0%|          | 0/6000 [00:00<?, ?images/s]

CPU times: user 583 ms, sys: 631 ms, total: 1.21 s
Wall time: 1.27 s


In [17]:
%%time
# A GPU does not seem to help much here.
cnn_features = feature_extractor.get_cnn_features(data[enums.ImageTypes.ORIGINAL.name.lower()], device='cpu')

Building ResNet152 Features:   0%|          | 0/6000 [00:00<?, ?images/s]

CPU times: user 1h 50min 8s, sys: 21min 21s, total: 2h 11min 30s
Wall time: 22min 50s


In [18]:
%%time
canny_features, cannies = feature_extractor.get_canny_features(data[enums.ImageTypes.FACE.name.lower()])

Building Edge Features:   0%|          | 0/6000 [00:00<?, ?images/s]

CPU times: user 4.38 s, sys: 2.82 s, total: 7.2 s
Wall time: 4.53 s


In [19]:
%%time
pose_features = feature_extractor.get_pixel_features(data[enums.ImageTypes.POSE.name.lower()])

Building Pixel Features:   0%|          | 0/6000 [00:00<?, ?images/s]

CPU times: user 1.68 s, sys: 3.35 s, total: 5.04 s
Wall time: 6.89 s


In [20]:
%%time
# Generate the arm position and orientation features.
_ = feature_extractor.get_body_part_features(out_csv='body_parts_feat.csv')

Loading 6000 samples:   0%|          | 0/6000 [00:00<?, ?samples/s]

Building Body-Part Features:   0%|          | 0/6000 [00:00<?, ?images/s]

  left_arm_feature, l_exists = self._get_bp_feature_for([255, 115, 75], channel_1) # (255, 115, 75) is color for left arm

CPU times: user 1h 22min 34s, sys: 7min 17s, total: 1h 29min 51s
Wall time: 15min 28s


In [22]:
eye_count = feature_extractor.detect_eyes(config.TRAIN_DATA, data[enums.DataColumn.LABEL.name.lower()], data[enums.DataColumn.FILENAME.name.lower()])

Getting eye counts:   0%|          | 0/6000 [00:00<?, ?images/s]

In [23]:
def eye_summary(eye_count):
    df = pd.DataFrame(eye_count, columns=['count'])
    print(f'0: {df[df["count"] == 0].shape[0]}, 1: {df[df["count"] == 1].shape[0]}, 2: {df[df["count"] == 2].shape[0]}')
          
eye_summary(eye_count)

0: 4711, 1: 1148, 2: 131


In [24]:
%%time
# Save the generated feature vectors.
features_list = [pixel_features, hog_features, cnn_features, canny_features, pose_features, None]
feature_extractor.save_feature_vectors(config.FEATURE_VECTORS_FOLDER, data['filename'], data['label'], features_list)

Saving feature vectors:   0%|          | 0/6000 [00:00<?, ?images/s]

CPU times: user 10.7 s, sys: 30.6 s, total: 41.3 s
Wall time: 48.5 s


In [25]:
print(f'Loaded {data.shape[0]} samples.')
print(f'hog_features:{hog_features.shape}, hog_features.min:{np.min(hog_features)}, hog_features.max:{np.max(hog_features)}')
print(f'pixel_features:{pixel_features.shape}, pixel_features.min:{np.min(pixel_features)}, pixel_features.max:{np.max(pixel_features)}')
print(f'cnn_features:{cnn_features.shape}, cnn_features.min:{np.min(cnn_features)}, cnn_features.max:{np.max(cnn_features)}')
print(f'canny_features:{canny_features.shape}, canny_features.min:{np.min(canny_features)}, canny_features.max:{np.max(canny_features)}')
print(f'pose_features:{pose_features.shape}, pose_features.min:{np.min(pose_features)}, pose_features.max:{np.max(pose_features)}')
# print(f'body_parts_features.min:{np.min(body_parts_features)}, body_parts_features.max:{np.max(body_parts_features)}')
print()


Loaded 6000 samples.
hog_features:(6000, 5776), hog_features.min:0.0, hog_features.max:1.0
pixel_features:(6000, 25600), pixel_features.min:0.0, pixel_features.max:0.9999000430107117
cnn_features:(6000, 2048), cnn_features.min:0.0, cnn_features.max:1.6112374067306519
canny_features:(6000, 25600), canny_features.min:0, canny_features.max:255
pose_features:(6000, 65536), pose_features.min:0.0, pose_features.max:0.6640035510063171



## References
<span id="cite_mtcnn">[1] X. He, P. Wang, Z. Zhao, Y. Zhao and F. Su, "MTCNN with Weighted Loss Penalty and Adaptive Threshold Learning for Facial Attribute Prediction," 2019 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2019, pp. 180-185, doi: 10.1109/ICMEW.2019.00-90.</span>

<span id="cite_mobilenet">[2] M. Sandler, A. Howard, M. Zhu, et al., "MobileNetV2: Inverted Residuals and Linear Bottlenecks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, 18-23 June 2018, pp. 4510-4520, doi: 10.1109/CVPR.2018.00474.</span>