# CarScan Task 2.1 - Damage Classifier
<b>Note:</b> This notebook was executed on the Kaggle playground, so it does not include code for downloading the dataset separately. Some of the code may be specific to the Kaggle environment.

This notebook trains a model that classifies images of vehicles as damaged or not damaged.

In [1]:
import tensorflow as tf
from tensorflow import keras
import cv2
import matplotlib.pyplot as plt

In [2]:
# change current working directory as kaggle is weird
# import os

# os.chdir('../input/car-damage-detection')
# os.getcwd()

In [3]:
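img = cv2.imread('/kaggle/input/car-damage-detection/data1a/training/00-damage/0001.JPEG')
# OpenCV loads images in BGR channel order; convert to RGB so matplotlib shows true colors
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
plt.imshow(img)
plt.show()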

In [4]:
img.shape

### 1. Preprocessing and Augmentation
Data augmentation creates additional training samples from existing ones by applying transformations to them. The transformations applied here are:
- Rotation
- Horizontal flip
- Width and Height shifting
<br/><br/>
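
To make these transformations concrete, here is a minimal pure-Python sketch of a horizontal flip and a width shift on a tiny grayscale "image" stored as nested lists. The helper names `horizontal_flip` and `shift_width` are made up for illustration. Keras applies randomized versions of these internally and, by default, fills vacated pixels with its `'nearest'` fill mode, whereas this sketch pads with a constant for simplicity.

```python
def horizontal_flip(image):
    """Mirror every row left-to-right."""
    return [row[::-1] for row in image]


def shift_width(image, pixels, fill=0):
    """Shift every row right by `pixels` columns (left if negative), padding with `fill`."""
    width = len(image[0])
    # Clamp the shift so it never exceeds the image width
    pixels = max(-width, min(width, pixels))
    shifted = []
    for row in image:
        if pixels >= 0:
            new_row = [fill] * pixels + row[:width - pixels]
        else:
            new_row = row[-pixels:] + [fill] * (-pixels)
        shifted.append(new_row)
    return shifted


# Hypothetical 2x3 image, used only for illustration
image = [
    [1, 2, 3],
    [4, 5, 6],
]

flipped = horizontal_flip(image)   # [[3, 2, 1], [6, 5, 4]]
shifted = shift_width(image, 1)    # [[0, 1, 2], [0, 4, 5]]
```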

The following preprocessing is applied to the data:
- Rescaling
- Feature-wise centering
- Sample-wise centering

This is done because the model used here (MobileNetV2) expects input values in the range [-1, 1].
An ImageDataGenerator is defined and pointed at the training and validation directories; the resulting generators can be fed to the model for training.
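
As a rough illustration of the target range, the sketch below maps 8-bit pixel values to [-1, 1] by multiplying by 2/255 and subtracting 1, and then shows what sample-wise centering does to values that were only rescaled. The function names are hypothetical, and this is plain Python rather than the Keras pipeline itself. Multiplying by 2/255 alone yields values in [0, 2]; it is the subtraction step that moves them towards a zero-centered range.

```python
def scale_to_unit_range(pixel):
    """Map a pixel value from [0, 255] to [-1, 1]."""
    return pixel * 2 / 255.0 - 1.0


def samplewise_center(pixels):
    """Subtract the sample's own mean from every pixel value."""
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]


raw = [0, 51, 255]  # hypothetical pixel values

# Explicit mapping to [-1, 1]: 0 maps to -1.0 and 255 maps to 1.0
scaled = [scale_to_unit_range(p) for p in raw]

# Rescaling alone gives [0, 2]; centering then shifts the values so they average to zero
rescaled_only = [p * 2 / 255.0 for p in raw]
centered = samplewise_center(rescaled_only)
```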

In [5]:
datagen = keras.preprocessing.image.ImageDataGenerator(
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
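    rescale=2/255.0,          # scales pixel values from [0, 255] to [0, 2]
    featurewise_center=True,  # only takes effect once datagen.fit(...) has computed the dataset mean
    samplewise_center=True    # subtracts each image's own mean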
)

batch_size = 16

train_datagen = datagen.flow_from_directory(
    directory='/kaggle/input/car-damage-detection/data1a/training',
    target_size=(256, 256),
    batch_size=batch_size,
    subset='training',
    color_mode='rgb',
    class_mode='categorical',
    shuffle=True,
    seed=22
)

validation_generator = datagen.flow_from_directory(
    directory='/kaggle/input/car-damage-detection/data1a/validation',
    target_size=(256, 256),
    batch_size=batch_size,
    subset='training',  # no validation_split is set, so this selects every image in the directory
    color_mode='rgb',
    class_mode='categorical',
    shuffle=True,
    seed=22
)

### 2. Model Definition and Training
The model is a MobileNetV2 pre-trained on ImageNet and fine-tuned via transfer learning. This seemed like a good way to approach the problem, since reusing pre-trained features typically gives better accuracy than training from scratch when the dataset is limited.

In [6]:
w, h, c = (256, 256, 3)

mobilenet = keras.applications.MobileNetV2(
    input_shape=(w,h,c),
    pooling='avg',
    include_top=False,
    weights='imagenet',
    classes=2  # ignored when include_top=False; the classification head is added separately
)

In [7]:
# mobilenet.summary()

In [8]:
len(mobilenet.layers)

The first layers of the model are frozen (made non-trainable). This has benefits such as:
- Reducing training complexity/time
- Reducing number of trainable parameters
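
The effect of freezing on the trainable parameter count can be sketched without Keras. The `Layer` class, the helper functions, and the parameter counts below are invented for illustration and do not correspond to MobileNetV2's real layers.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for a Keras layer."""
    name: str
    num_params: int
    trainable: bool = True


def freeze_first(layers, n):
    """Mark the first `n` layers as non-trainable and the rest as trainable."""
    for index, layer in enumerate(layers):
        layer.trainable = index >= n


def count_trainable(layers):
    """Sum the parameters of all layers that are still trainable."""
    return sum(layer.num_params for layer in layers if layer.trainable)


layers = [
    Layer("conv_1", 900),
    Layer("conv_2", 1800),
    Layer("conv_3", 3600),
    Layer("head", 500),
]

total = count_trainable(layers)            # 6800, everything is trainable
freeze_first(layers, 2)
trainable_after = count_trainable(layers)  # 4100, only conv_3 and head remain
```

Only the unfrozen layers receive gradient updates, which is where the savings in training time come from.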

In [9]:
# Freeze the first 110 layers; the remaining layers stay trainable
for layer in mobilenet.layers[:110]:
    layer.trainable = False

for layer in mobilenet.layers[110:]:
    layer.trainable = True
        
mobilenet.summary()

In [10]:
model = keras.models.Sequential([
    mobilenet,
    keras.layers.Dense(512, activation=keras.activations.relu),
    keras.layers.Dense(128, activation=keras.activations.relu),
    keras.layers.Dense(2, activation=keras.activations.softmax)
])

In [11]:
model.summary()

In [13]:
import os
print(os.getcwd())

The following callbacks are defined:
- Model checkpointing, to save the model after each epoch
- Early stopping with patience, to stop training once validation accuracy stops improving and so limit overfitting
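
The patience logic can be sketched in a few lines of plain Python. This approximates how early stopping behaves when a metric is maximized; the function name and the accuracy values are hypothetical, and the real Keras callback has more options (such as `min_delta`) that are left out here.

```python
def early_stopping_epoch(val_accuracies, patience):
    """Return the 1-based epoch at which training would stop, or None if it never stops early."""
    best = float("-inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, accuracy in enumerate(val_accuracies, start=1):
        if accuracy > best:
            best = accuracy
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical validation accuracies, one per epoch
history = [0.80, 0.85, 0.90, 0.89, 0.90, 0.88, 0.87, 0.86, 0.91]

# The best value (0.90) is reached at epoch 3 and is not beaten in epochs 4 to 8
stop = early_stopping_epoch(history, patience=5)  # 8
```

Note that in this example training stops before the later improvement at epoch 9 is ever seen, which is the trade-off that a larger patience value relaxes.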

In [14]:
base_learning_rate = 0.0001
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate),
              loss=tf.keras.losses.binary_crossentropy,
              metrics=['accuracy'])

ckpt_path = os.path.join(os.getcwd(), 'checkpoint')
checkpoint = keras.callbacks.ModelCheckpoint(
    filepath=ckpt_path,
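    monitor='val_accuracy',  # monitor and mode only matter when save_best_only=True
    mode='max',
    save_freq='epoch'        # as configured, the checkpoint is overwritten every epoch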
)

earlystopping = keras.callbacks.EarlyStopping(
    monitor='val_accuracy',
    patience=5,
    mode='max'
)

callbacks = [checkpoint, earlystopping]
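
The compile call above pairs a two-unit softmax output with binary cross-entropy. For one-hot labels and two class probabilities that sum to 1, the element-wise binary cross-entropy averaged over both outputs equals the categorical cross-entropy. The plain-Python sketch below checks this for a single sample. The function names are illustrative and are not the Keras implementations; Keras also clips probabilities for numerical stability, which is omitted here.

```python
import math


def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy over a one-hot label and a probability vector."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))


def binary_crossentropy(y_true, y_pred):
    """Element-wise binary cross-entropy, averaged over the outputs."""
    terms = [
        -(t * math.log(p) + (1 - t) * math.log(1 - p))
        for t, p in zip(y_true, y_pred)
    ]
    return sum(terms) / len(terms)


# Hypothetical sample: one-hot label and a softmax output that sums to 1
y_true = [1.0, 0.0]
y_pred = [0.8, 0.2]

cce = categorical_crossentropy(y_true, y_pred)
bce = binary_crossentropy(y_true, y_pred)
losses_match = abs(cce - bce) < 1e-9
```

With more than two classes the two losses no longer coincide, so `categorical_crossentropy` is the more conventional choice alongside `class_mode='categorical'`.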

In [15]:
EPOCHS = 50

history = model.fit(
    train_datagen,
    validation_data=validation_generator,
    steps_per_epoch=train_datagen.n//train_datagen.batch_size,
    validation_steps=validation_generator.n//validation_generator.batch_size,
    epochs=EPOCHS,
    callbacks=callbacks
)
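
The step counts passed to `fit` use integer division, so a final partial batch is not covered within a single epoch's steps. A small sketch with a made-up dataset size (in the notebook the real value comes from `train_datagen.n`):

```python
def steps_for(num_samples, batch_size):
    """Number of full batches that fit into the dataset."""
    return num_samples // batch_size


num_samples = 1000  # hypothetical dataset size, not the real one
batch_size = 16

steps = steps_for(num_samples, batch_size)    # 62 full batches
leftover = num_samples - steps * batch_size   # 8 images beyond the last full batch
```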

Typically, the model would be fine-tuned further by unfreezing the previously frozen layers and lowering the learning rate. Since this model consistently achieved validation accuracies above 90% without that step, it was skipped.
<br/><br/>
After training, the model weights are saved in a directory ('./checkpoint'), which is then compressed into a zip file. A download link is generated; clicking it downloads the file.

In [16]:
!zip -r file.zip {os.path.join(os.getcwd(), 'checkpoint')}

In [17]:
from IPython.display import FileLink
FileLink('./file.zip')
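
Outside a notebook, where the `!zip` shell escape is unavailable, a comparable archive could be produced with the standard library. This is a sketch and not what the notebook ran: it uses a temporary directory and a dummy file as stand-ins for the real checkpoint directory.

```python
import os
import shutil
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for the real checkpoint directory (hypothetical contents)
    ckpt_dir = os.path.join(workdir, "checkpoint")
    os.makedirs(ckpt_dir)
    with open(os.path.join(ckpt_dir, "dummy.txt"), "w") as handle:
        handle.write("placeholder")

    # Create <workdir>/file.zip containing the checkpoint directory
    archive_path = shutil.make_archive(
        base_name=os.path.join(workdir, "file"),
        format="zip",
        root_dir=workdir,
        base_dir="checkpoint",
    )

    # List the archive contents to confirm what was packed
    with zipfile.ZipFile(archive_path) as archive:
        archived_names = archive.namelist()
```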

For convenience, the trained model's weights have been uploaded to Google Drive; the link is provided below:
https://drive.google.com/drive/folders/15PDR5tNwqAQ6TobLVU6LRNhOVSN5RcCA?usp=sharing