**Problem Statement:** Company X owns a movie application and repository that streams movies to millions of users on a subscription
basis. The company wants to automate the display of cast and crew information for each scene of a movie, so that when a user pauses
the movie and clicks the cast information button, the app shows details of the actors in the scene. The company has in-house computer
vision and multimedia experts who need to detect faces in screenshots of movie scenes.
The data labelling is already done. Since there higher time complexity is involved in the

In [None]:
# Import necessary libraries
import numpy as np
import cv2
import matplotlib.pyplot as plt  # used later for plt.imshow
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate
from google.colab import drive

In [None]:
drive.mount('/content/drive')

In [None]:
file_path = '/content/drive/My Drive/images.npy'

In [None]:
data = np.load(file_path, allow_pickle=True)

In [None]:
# X_train and masks are only built in a later cell, so inspect the raw data here
print('Shape of data: ', data.shape)

In [None]:
data[70][1]

In [None]:
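from tensorflow.keras.applications.mobilenet import preprocess_input

# One binary mask and one preprocessed RGB image per sample
masks = np.zeros((int(data.shape[0]), 224, 224))
X_train = np.zeros((int(data.shape[0]), 224, 224, 3))
for index in range(data.shape[0]):
    img = data[index][0]
    img = cv2.resize(img, dsize = (224, 224), interpolation = cv2.INTER_CUBIC)
    try:
        img = img[:, :, :3]  # keep RGB only, dropping an alpha channel if present
    except IndexError:
        # Grayscale image (no channel axis): this sample is left as all zeros
        continue
    X_train[index] = preprocess_input(np.array(img, dtype = np.float32))
    # The code scales by 224, so the points are taken to be normalized coordinates
    for i in data[index][1]:
        x1 = int(i["points"][0]['x'] * 224)
        x2 = int(i["points"][1]['x'] * 224)
        y1 = int(i["points"][0]['y'] * 224)
        y2 = int(i["points"][1]['y'] * 224)
        masks[index][y1:y2, x1:x2] = 1

print('Shape of X_train: ', X_train.shape)
print('Shape of mask array: ', masks.shape)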
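
To make the mask construction concrete, here is a small sketch on a hypothetical 8 x 8 grid instead of 224 x 224. The annotation dictionary is made up for illustration; it only mimics the `points` layout that the loop reads, with coordinates assumed to be fractions of the image width and height.

```python
import numpy as np

SIZE = 8  # hypothetical small grid; the notebook uses 224

# Made-up annotation mimicking the layout read by the loop:
# two corner points with coordinates normalized to [0, 1].
annotation = {"points": [{"x": 0.25, "y": 0.5}, {"x": 0.75, "y": 1.0}]}

mask = np.zeros((SIZE, SIZE))
x1 = int(annotation["points"][0]["x"] * SIZE)
x2 = int(annotation["points"][1]["x"] * SIZE)
y1 = int(annotation["points"][0]["y"] * SIZE)
y2 = int(annotation["points"][1]["y"] * SIZE)

# Rows are indexed by y and columns by x, hence [y1:y2, x1:x2]
mask[y1:y2, x1:x2] = 1

print(mask)
print("face pixels:", int(mask.sum()))
```

The slice order matters: swapping it to `[x1:x2, y1:y2]` would transpose every box.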

In [None]:
n = 12
print(X_train[n])
# preprocess_input scales pixels to [-1, 1]; map them back to [0, 1] for display
plt.imshow((X_train[n] + 1) / 2)

In [None]:
plt.imshow(masks[n])

In [None]:
import tensorflow as tf

In [None]:
from tensorflow.keras.applications import MobileNet

In [None]:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Concatenate, Conv2D, Reshape, UpSampling2D

In [None]:
def create_model(trainable = True):
    IMG_SHAPE = (224, 224, 3)
    model = MobileNet(input_shape = IMG_SHAPE, alpha = 1.0, include_top = False, weights = 'imagenet')
    for layer in model.layers:
        layer.trainable = trainable

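    # Skip connections taken from the encoder at decreasing spatial resolution
    block0 = model.get_layer('conv_pw_1_relu').output   # 112 x 112
    block1 = model.get_layer('conv_pw_3_relu').output   # 56 x 56
    block2 = model.get_layer('conv_pw_5_relu').output   # 28 x 28
    block3 = model.get_layer('conv_pw_11_relu').output  # 14 x 14
    block4 = model.get_layer('conv_pw_13_relu').output  # 7 x 7

    # Decoder: upsample by 2, then concatenate with the matching encoder block
    x = Concatenate()([UpSampling2D()(block4), block3])
    x = Concatenate()([UpSampling2D()(x), block2])
    x = Concatenate()([UpSampling2D()(x), block1])
    x = Concatenate()([UpSampling2D()(x), block0])
    x = UpSampling2D()(x)  # back to 224 x 224
    # 1x1 convolution with sigmoid gives a per-pixel face probability
    x = Conv2D(1, kernel_size = 1, activation = "sigmoid")(x)

    # Drop the channel axis so the output matches the (224, 224) masks
    x = Reshape((224, 224))(x)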

    return Model(inputs = model.input, outputs = x)
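
The decoder only works if each upsampled tensor has the same height and width as the skip connection it is concatenated with. The sketch below checks that arithmetic in plain Python. The sizes and channel counts listed are what I would expect for Keras MobileNet with a 224 x 224 input and `alpha = 1.0`; treat them as assumptions to confirm against `model.summary()`.

```python
# (layer name, spatial size, channels), deepest block first.
# Assumed values for a 224 x 224 input with alpha = 1.0.
skips = [
    ("conv_pw_13_relu", 7, 1024),
    ("conv_pw_11_relu", 14, 512),
    ("conv_pw_5_relu", 28, 256),
    ("conv_pw_3_relu", 56, 128),
    ("conv_pw_1_relu", 112, 64),
]

name, size, channels = skips[0]
for skip_name, skip_size, skip_channels in skips[1:]:
    size *= 2  # UpSampling2D() doubles height and width by default
    assert size == skip_size, "shapes must match before Concatenate"
    channels += skip_channels  # Concatenate stacks along the channel axis
    print(f"after concat with {skip_name}: {size} x {size} x {channels}")

size *= 2  # final UpSampling2D back to the input resolution
print(f"before the 1x1 convolution: {size} x {size} x {channels}")
```

Because no convolutions sit between the concatenations, the channel count simply accumulates until the final 1x1 convolution reduces it to one.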

In [None]:
# Pass trainable=False to freeze the pretrained MobileNet layers for faster training (usually at some cost in accuracy)
model = create_model()

# Print summary
model.summary()

**Design your own Dice Coefficient and Loss function:**

In [None]:
from tensorflow.keras.losses import binary_crossentropy
from tensorflow.keras.backend import log, epsilon

def dice_coefficient(y_true, y_pred):
    numerator = 2 * tf.reduce_sum(y_true * y_pred)
    denominator = tf.reduce_sum(y_true + y_pred)

    return numerator / (denominator + tf.keras.backend.epsilon())
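
As a quick sanity check, the same formula can be evaluated by hand on a tiny mask. The sketch below is a NumPy re-implementation for illustration only (the name `dice_np` and the 2 x 2 arrays are made up); it is not the function Keras calls during training.

```python
import numpy as np

def dice_np(y_true, y_pred, eps=1e-7):
    """NumPy version of the Dice coefficient: 2*|A.B| / (|A| + |B|)."""
    numerator = 2 * np.sum(y_true * y_pred)
    denominator = np.sum(y_true + y_pred)
    return numerator / (denominator + eps)

y_true = np.array([[1.0, 1.0], [0.0, 0.0]])  # two face pixels
half = np.array([[1.0, 0.0], [0.0, 0.0]])    # finds one of them
exact = y_true.copy()                        # finds both
empty = np.zeros_like(y_true)                # finds none

print(dice_np(y_true, half))   # 2*1 / (2 + 1)
print(dice_np(y_true, exact))  # close to 1
print(dice_np(y_true, empty))  # 0
```

The score ranges from 0 (no overlap) to about 1 (perfect overlap), which is why it can serve as a metric to maximize.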

In [None]:
def loss(y_true, y_pred):
    return binary_crossentropy(y_true, y_pred) - log(dice_coefficient(y_true, y_pred) + tf.keras.backend.epsilon())
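
The loss adds the negative log of the Dice coefficient to binary cross-entropy, so it falls as either the per-pixel probabilities or the overall overlap improve. The NumPy sketch below illustrates this on made-up 2 x 2 predictions. It is a simplification: it takes one mean over all pixels, whereas Keras averages cross-entropy along the last axis first.

```python
import numpy as np

def combined_loss_np(y_true, y_pred, eps=1e-7):
    """Simplified NumPy illustration of BCE - log(Dice)."""
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    bce = -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    dice = 2 * np.sum(y_true * y_pred) / (np.sum(y_true + y_pred) + eps)
    return bce - np.log(dice + eps)

y_true = np.array([[1.0, 1.0], [0.0, 0.0]])
good = np.array([[0.9, 0.8], [0.1, 0.2]])  # confident and mostly right
poor = np.array([[0.3, 0.2], [0.6, 0.7]])  # mostly wrong

loss_good = combined_loss_np(y_true, good)
loss_poor = combined_loss_np(y_true, poor)
print(loss_good, loss_poor)
```

Since the Dice coefficient is at most about 1, the `-log` term is non-negative and shrinks toward zero as the predicted mask approaches the true one.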

In [None]:
from tensorflow.keras.optimizers import Adam
# epsilon is left at its default; passing None can fail in recent Keras versions
optimizer = Adam(learning_rate=1e-4, beta_1=0.9, beta_2=0.999, amsgrad=False)
model.compile(loss=loss, optimizer=optimizer, metrics=[dice_coefficient])

In [None]:
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

# save_freq="epoch" checkpoints once per epoch; an integer would count batches instead
checkpoint = ModelCheckpoint("model-{loss:.2f}.h5", monitor="loss", verbose=1, save_best_only=True,
                             save_weights_only=True, mode="min", save_freq="epoch")
stop = EarlyStopping(monitor="loss", patience=5, mode="min")
reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.2, patience=5, min_lr=1e-6, verbose=1, mode="min")
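
Both `EarlyStopping` and `ReduceLROnPlateau` rely on a patience counter: they act once the monitored value has failed to improve for `patience` consecutive epochs. The function below is a simplified model of that counter, written for illustration; it ignores `min_delta`, cooldown, and other details of the real Keras callbacks, and the loss curve is made up.

```python
def epochs_until_patience(losses, patience):
    """Return the 1-based epoch at which `patience` consecutive epochs
    without improvement have accumulated, or None if that never happens."""
    best = float("inf")
    wait = 0
    for epoch, loss_value in enumerate(losses, start=1):
        if loss_value < best:
            best = loss_value
            wait = 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up loss curve: improves for 3 epochs, then plateaus
losses = [0.9, 0.7, 0.6, 0.6, 0.61, 0.6, 0.62, 0.6, 0.6, 0.6]
trigger = epochs_until_patience(losses, patience=5)
print(trigger)
```

Under this simplified model, two callbacks that share the same monitor and the same `patience=5` reach their threshold at the same epoch, so a reduced learning rate would have little chance to take effect before training stops. Giving `ReduceLROnPlateau` a smaller patience than `EarlyStopping` is one option worth considering.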

In [None]:
# validation_split = 0.1 holds out 10% of the data as a validation set
model.fit(X_train, masks, epochs = 10, batch_size = 1, validation_split = 0.1,
          callbacks = [checkpoint, reduce_lr, stop],
          verbose = 1)
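
According to the Keras documentation, `validation_split` takes the validation samples from the end of the arrays, before any shuffling. The sketch below mimics that behavior with NumPy on a hypothetical dataset of 100 samples; the variable names are illustrative and the real dataset size is whatever `data.shape[0]` reports.

```python
import math
import numpy as np

n_samples = 100           # hypothetical dataset size
validation_split = 0.1

X = np.arange(n_samples)  # stand-in for X_train; each value is a sample index

# The first 90% is used for fitting, the last 10% for validation
split_at = int(math.floor(n_samples * (1.0 - validation_split)))
X_fit, X_val = X[:split_at], X[split_at:]

print(len(X_fit), len(X_val))
print(X_val)
```

If the samples are stored in a meaningful order (for example, grouped by movie), the last 10% may not be representative, in which case shuffling the arrays before calling `fit` is a reasonable precaution.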

In [None]:
n = 10
sample_image = X_train[n]
final_image = sample_image
print(sample_image.shape)
# Map the preprocessed [-1, 1] pixels back to [0, 1] for display
plt.imshow((sample_image + 1) / 2)

**Conclusion:**

This project shows how a pretrained MobileNet (transfer learning) can serve as the encoder of a U-Net-style network: upsampling layers with skip connections are added on top of it, and the model is trained to predict a mask covering the faces in a given image.

    The model was compiled with a loss that combines binary cross-entropy with the negative log of the Dice coefficient, the Adam optimizer, and the Dice coefficient as the metric.
    
    Model checkpointing, early stopping, and learning-rate reduction were used as callbacks.
    The data was split into training and validation sets in a 90/10 ratio. The best loss obtained was 0.4323, with a dice_coefficient of 0.7652 on the training data, after just 10 epochs.
    
    The weights from this run were then used to predict masks on the validation data.
    
    The prediction was also checked on a sample image by imposing the mask on the image.
    As the images above show, the model does a good job of predicting the masks.
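
The step of imposing a predicted mask on an image can be sketched as follows. This is a minimal NumPy illustration, not the code used to produce the results: the function name `overlay_mask`, the 0.5 threshold, the red tint, and the 4 x 4 stand-in arrays are all assumptions.

```python
import numpy as np

def overlay_mask(image, prob_mask, threshold=0.5, alpha=0.5):
    """Blend a red tint into `image` wherever prob_mask exceeds threshold.

    image:     float array (H, W, 3) in [-1, 1], as produced by preprocess_input
    prob_mask: float array (H, W) of per-pixel probabilities in [0, 1]
    """
    rgb = (image + 1.0) / 2.0          # back to [0, 1] for display
    binary = prob_mask > threshold     # boolean face mask
    tint = np.array([1.0, 0.0, 0.0])   # red
    out = rgb.copy()
    out[binary] = (1 - alpha) * rgb[binary] + alpha * tint
    return out, binary

# Stand-ins for one preprocessed image and one predicted mask
image = np.zeros((4, 4, 3))            # mid-gray after rescaling
prob_mask = np.zeros((4, 4))
prob_mask[1:3, 1:3] = 0.9              # a confident 2 x 2 "face"

blended, binary = overlay_mask(image, prob_mask)
print(int(binary.sum()))
print(blended[1, 1], blended[0, 0])
```

In the notebook, the stand-ins would be replaced by something like `X_train[n]` and `model.predict(X_train[n:n+1])[0]`, and the blended result passed to `plt.imshow`.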
