# Table of Contents
- [Objective](#objective)
- [Image preparation](#img_prep)
- [Build and train CNN](#build_and_train_cnn)
- [Predict using model](#predict_cnn)

In [55]:
import numpy as np
import tensorflow as tf
import pandas as pd
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Activation, Dense, Flatten, BatchNormalization, Conv2D, MaxPool2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from sklearn.metrics import confusion_matrix
import itertools
import os
import shutil
import random
import glob
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
%matplotlib inline

In [56]:
# check if tensorflow is identifying gpu, uncomment if using gpu
#physical_devices = tf.config.experimental.list_physical_devices('GPU')
#print("Num GPUs Available: " , len(physical_devices))
#tf.config.experimental.set_memory_growth(physical_devices[0], True)

In [3]:
# original working directory
owd = os.getcwd()
print(owd)

/Users/mherarabian/Desktop/ml_projects/face_mask_detection


<a id='objective'></a> 
# Objective
Build and train a **Convolutional Neural Network** with TensorFlow's Keras API that classifies images of people by detecting face masks. The goal is a model that takes an image and identifies whether the individual(s) in it are not wearing a face mask or are wearing one incorrectly.

The model should classify each image as either "mask" (1) or "no-mask" (0).
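One detail worth checking against this labeling: if the images are later loaded with `ImageDataGenerator.flow_from_directory` (imported above), Keras infers class indices from the subdirectory names in alphanumeric order unless a `classes` list is passed. With subdirectories named `mask` and `no-mask`, that gives `mask` = 0 and `no-mask` = 1, the reverse of the encoding stated above. The sketch below only mimics that ordering rule in plain Python; the helper `infer_class_indices` is hypothetical and not part of Keras.

```python
# Hypothetical helper that mimics how Keras' flow_from_directory assigns class
# indices: without a `classes` argument, subdirectory names are sorted
# alphanumerically and numbered from 0; with one, the list order is used.
def infer_class_indices(subdir_names, classes=None):
    ordered = list(classes) if classes is not None else sorted(subdir_names)
    return {name: index for index, name in enumerate(ordered)}

subdirs = ['no-mask', 'mask']
default_indices = infer_class_indices(subdirs)
explicit_indices = infer_class_indices(subdirs, classes=['no-mask', 'mask'])
print(default_indices)   # {'mask': 0, 'no-mask': 1}
print(explicit_indices)  # {'no-mask': 0, 'mask': 1}
```

Passing `classes=['no-mask', 'mask']` to `flow_from_directory` should therefore yield the 1/0 encoding above; the generator's `class_indices` attribute is the place to confirm the mapping actually used.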

<a id='img_prep'></a> 
# Image Preparation
We will go through all the **image preparation and processing** steps needed to train our **convolutional neural network**.

The first thing we need to do is obtain and prepare the data set on which we will train our model.
The data set we’ll use is from the Kaggle [Face Mask Detection Dataset](https://www.kaggle.com/wobotintelligence/face-mask-detection-dataset).

## Data preparation (including data wrangling, and cleaning)

Read more about data wrangling and data cleaning [here](https://mc.ai/data-cleaning-vs-data-wrangling-2/) and [here](https://theappsolutions.com/blog/development/data-wrangling-guide-to-data-preparation/). In short, *data wrangling* is about transforming the data into the right format, while *data cleaning* is about finding incorrect or inaccurate records in a data source and then correcting or deleting them.

First let's organize data into train, valid, test directories.

Our raw data set contains 6,024 images (in .png, .jpg, and .jpeg formats) under the directory *data/medical_masks/images*. The first 1,800 images are not labeled and should be used for the **test set**. The remaining 4,224 can be used for the **training** and **validation sets**.

For now, let's use only a small subset of this data so that training is faster: 1,000 images in the training set, 200 in the validation set, and 100 in the test set. Each set will be split evenly between **mask** and **no-mask**.
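An even split means 500 / 100 / 50 images per class in the training, validation, and test sets. A minimal sketch of drawing disjoint random subsets for one class, assuming we already have a list of that class's image names; the helper `sample_splits` and the example file names are hypothetical.

```python
import random

# Images per class in each split (totals across both classes: 1,000 / 200 / 100).
SPLIT_SIZES = {'train': 500, 'valid': 100, 'test': 50}

def sample_splits(image_names, split_sizes, seed=0):
    """Draw disjoint random subsets of image_names, one per split."""
    needed = sum(split_sizes.values())
    if len(image_names) < needed:
        raise ValueError(f'need {needed} images, got {len(image_names)}')
    rng = random.Random(seed)                   # fixed seed for a reproducible split
    shuffled = rng.sample(image_names, needed)  # sampling without replacement
    splits, start = {}, 0
    for split, size in split_sizes.items():
        splits[split] = shuffled[start:start + size]
        start += size
    return splits

# Hypothetical image names for one class.
mask_images = [f'{i}.jpg' for i in range(2000, 2800)]
mask_splits = sample_splits(mask_images, SPLIT_SIZES)
print({split: len(names) for split, names in mask_splits.items()})
# {'train': 500, 'valid': 100, 'test': 50}
```

Slicing one shuffled sample, rather than calling `random.sample` once per split, is what guarantees that no image lands in two sets.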

Let's start by loading the labels. We first **load** *data/train.csv* into a Pandas DataFrame; it contains the labels for the images in our data set (excluding the first 1,800). Each row has an image filename **name** (an image with several annotated objects appears in several rows) and a **classname**, which is one of the classes defined in *data/medical_masks/meta.json* (e.g. face_with_mask, face_no_mask, mask_surgical, mask_colorful, face_shield, etc.).

### Load a csv file of data using Pandas

In [90]:
df = pd.read_csv('data/train.csv')

In the DataFrame below, each sample has:

- **name** - these are the names of the images in our dataset.
- **x1, x2, y1, y2** - [bounding box](https://medium.com/anolytics/the-use-of-bounding-boxes-in-image-annotation-for-object-detection-6371711eabba) coordinates, we will ignore these.
- **classname** - one of 20 possible classes: hijab_niqab, mask_colorful, mask_surgical, face_no_mask, face_with_mask_incorrect, face_with_mask, face_other_covering, scarf_bandana, balaclava_ski_mask, face_shield, other, gas_mask, turban, helmet, sunglasses, eyeglasses, hair_net, hat, goggles, hood.

These are our *column labels*. Let's check a few samples (rows) in our DataFrame.

In [87]:
# Check out a few rows of the dataframe
print(df.shape)
df.head()

(15412, 6)


| | name | x1 | x2 | y1 | y2 | classname |
|---|---|---|---|---|---|---|
| 0 | 2756.png | 69 | 126 | 294 | 392 | face_with_mask |
| 1 | 2756.png | 505 | 10 | 723 | 283 | face_with_mask |
| 2 | 2756.png | 75 | 252 | 264 | 390 | mask_colorful |
| 3 | 2756.png | 521 | 136 | 711 | 277 | mask_colorful |
| 4 | 6098.jpg | 360 | 85 | 728 | 653 | face_no_mask |


### Drop columns we don't need
* Keep only a subset of the columns of the larger DataFrame

In [88]:
# drop the bounding-box columns we don't need
# axis=1 drops columns (axis=0 would drop rows)
df = df.drop(labels=['x1', 'x2', 'y1', 'y2'], axis=1)
print(df.shape)

(15412, 2)


### Filtering Data (slicing)
* Keep only a subset (slice) of the rows of the larger DataFrame

In [99]:
#interested_classnames = ['mask_colorful', 'mask_surgical', 'face_no_mask', 'face_with_mask_incorrect', 'face_with_mask', 'other', 'turban', 'helmet', 'sunglasses', 'eyeglasses', 'hair_net', 'hat', 'goggles', 'hood']
# Select only classnames that we are interested in
#df = df[df['classname'] == 'mask_colorful']


(1876, 6)
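Comparing with `==` keeps a single class; to keep several classes at once, such as a list like `interested_classnames`, boolean indexing with `Series.isin` is the usual pandas idiom. A small sketch on a toy DataFrame whose rows are made up for illustration:

```python
import pandas as pd

# Toy DataFrame with the same two columns df has after dropping the box coordinates.
toy = pd.DataFrame({
    'name': ['a.png', 'b.jpg', 'c.png', 'd.jpg'],
    'classname': ['face_with_mask', 'hat', 'face_no_mask', 'sunglasses'],
})

interested_classnames = ['face_with_mask', 'face_no_mask']
# isin returns a boolean Series; indexing with it keeps only the matching rows.
filtered = toy[toy['classname'].isin(interested_classnames)]
print(filtered.shape)  # (2, 2)
```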

### Drop duplicates

In [85]:
# drop duplicates on specific column
df = df.drop_duplicates(subset=['name'])
print(df.shape)

(4326, 2)
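`drop_duplicates(subset=['name'])` keeps only the first row per image (the default `keep='first'`), so an image with several annotations is labeled by whichever row happens to come first; *2756.png*, for example, keeps `face_with_mask` and loses its `mask_colorful` rows. The sketch below shows that behaviour on made-up rows (`9999.jpg` is hypothetical) and then one possible alternative that this notebook does not use: labeling an image "no-mask" if any of its rows is an unmasked or incorrectly masked face. Which classes count as "unmasked" there is an assumption.

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['2756.png', '2756.png', '2756.png', '9999.jpg', '9999.jpg'],
    'classname': ['face_with_mask', 'face_with_mask', 'mask_colorful',
                  'face_with_mask', 'face_no_mask'],
})

# Default behaviour: the first row for each image wins.
first_rows = toy.drop_duplicates(subset=['name'])
print(dict(zip(first_rows['name'], first_rows['classname'])))
# {'2756.png': 'face_with_mask', '9999.jpg': 'face_with_mask'}

# Alternative (assumption): an image is 'no-mask' if ANY row is an unmasked face.
NO_MASK_FACES = {'face_no_mask', 'face_with_mask_incorrect'}
any_unmasked = toy['classname'].isin(NO_MASK_FACES).groupby(toy['name']).any()
image_labels = any_unmasked.map({True: 'no-mask', False: 'mask'})
print(image_labels.to_dict())  # {'2756.png': 'mask', '9999.jpg': 'no-mask'}
```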


### Check for missing and null values

In [89]:
# hasnans is True if the column contains any missing (NaN/None) values
print(df['classname'].hasnans)
# isnull() flags the same NaN/None values; count them
print(df['classname'].isnull().sum())

False
0


### Reset indices in dataframe

In [79]:
df = df.reset_index(drop=True)
df.head()

| | name | classname |
|---|---|---|
| 0 | 2756.png | face_with_mask |
| 1 | 6098.jpg | face_no_mask |
| 2 | 6427.png | face_with_mask_incorrect |
| 3 | 4591.png | face_with_mask |
| 4 | 5392.jpg | face_other_covering |


**Get entry for image 1801**

In [80]:
df.loc[df['name'] == '1801.jpg']

| | name | classname |
|---|---|---|
| 3742 | 1801.jpg | face_no_mask |


### Change all classnames to mask or no-mask

In [95]:
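# get classname Pandas Series as Python list
classnames = list(df['classname'])

# takes in a list and returns a new list where each entry is either 'mask' or 'no-mask'
def mask_or_no_mask(classnames):
    mask = ['mask_colorful', 'mask_surgical', 'face_with_mask']
    no_mask = ['hijab_niqab', 'face_no_mask', 'face_with_mask_incorrect', 'face_other_covering',
               'scarf_bandana', 'balaclava_ski_mask', 'face_shield', 'other', 'gas_mask', 'turban',
               'helmet', 'sunglasses', 'eyeglasses', 'hair_net', 'hat', 'goggles', 'hood']
    new_list = []
    for classname in classnames:
        if classname in mask:
            new_list.append('mask')
        elif classname in no_mask:
            new_list.append('no-mask')
        else:  # fail loudly on a class missing from both lists
            raise ValueError(f'unexpected classname: {classname}')
    return new_list

df['classname'] = mask_or_no_mask(classnames)  # replace classnames with binary labels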
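Before sampling an even split, it helps to check how many images fall into each binary class. A short sketch with `collections.Counter`, on a made-up list standing in for the output of `mask_or_no_mask`, which also converts the labels to the 1/0 encoding from the Objective (the `LABEL_TO_INT` name is hypothetical):

```python
from collections import Counter

# Made-up binary labels, for illustration only.
binary_labels = ['mask', 'no-mask', 'mask', 'mask', 'no-mask']

# Count images per class to see how balanced the data set is.
counts = Counter(binary_labels)
print(counts)  # Counter({'mask': 3, 'no-mask': 2})

# Numeric encoding from the Objective: mask -> 1, no-mask -> 0.
LABEL_TO_INT = {'mask': 1, 'no-mask': 0}
numeric_labels = [LABEL_TO_INT[label] for label in binary_labels]
print(numeric_labels)  # [1, 0, 1, 1, 0]
```

On the real DataFrame, `df['classname'].value_counts()` gives the same per-class counts directly.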

In [24]:
# change directory to medical_masks dir
os.chdir('data/medical_masks')

# if the train/valid/test directories do not exist yet, create them
if not os.path.isdir('train/mask'):
    os.makedirs('train/mask')
    os.makedirs('train/no-mask')
    os.makedirs('valid/mask')
    os.makedirs('valid/no-mask')
    os.makedirs('test/mask')
    os.makedirs('test/no-mask')
    
    # place images in training set
    
    
    # place images in validation set

    
    # place images in test set
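The placement steps above are left empty. A minimal sketch of filling them, assuming each split's image names have already been chosen and that the source images sit in `images/` relative to the current directory (*data/medical_masks* after the `os.chdir` above); the helper `copy_split` and its parameters are hypothetical.

```python
import os
import shutil

def copy_split(image_names, label, split, src_dir='images', dst_root='.'):
    """Copy each image into <dst_root>/<split>/<label>/, e.g. train/mask."""
    dst_dir = os.path.join(dst_root, split, label)
    os.makedirs(dst_dir, exist_ok=True)  # no error if the directory already exists
    for name in image_names:
        # shutil.copy leaves the original in src_dir; shutil.move would remove it.
        shutil.copy(os.path.join(src_dir, name), dst_dir)
    return dst_dir
```

A call such as `copy_split(train_mask_names, 'mask', 'train')` would then fill *train/mask*. Copying rather than moving keeps *images/* intact, so the split can be redrawn with a different subset later.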

<a id='build_and_train_cnn'></a>
# Build and Train a Convolutional Neural Network with TensorFlow's Keras API

<a id='predict_cnn'></a>
# Convolutional Neural Network Predictions With TensorFlow's Keras API