### `list_attr_celeba` Dataset
Face analysis is a popular area of computer vision and deep learning, with applications ranging from unlocking your phone with your face to searching surveillance images for a particular suspect. This dataset is well suited to training and testing models for face detection and, in particular, for recognising facial attributes, such as finding people who have brown hair, are smiling, or are wearing glasses. The images cover large pose variations, background clutter, and a diverse set of people, and they come in large quantity with rich annotations. The data was originally collected by researchers at MMLAB, The Chinese University of Hong Kong (specific reference in Acknowledgment section).



- 202,599 face images of various celebrities
- 10,177 unique identities, but names of identities are not given
- 40 binary attribute annotations per image

You can obtain the dataset from https://www.kaggle.com/jessicali9530/celeba-dataset

In [None]:
import numpy as np
import pandas as pd
import os
import keras
import tensorflow as tf
import matplotlib.pyplot as plt
import random
from keras import optimizers
from keras.utils import to_categorical
from keras.models import Model
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img  # Import load_img from TensorFlow
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.models import Sequential
from keras.layers import Dense, Dropout, GlobalAveragePooling2D, BatchNormalization
from keras.callbacks import EarlyStopping, ModelCheckpoint
from sklearn.model_selection import train_test_split



Using TensorFlow backend.


In [None]:
mypath='/Users/gceran/Google Drive/Courses/MagniMind/Mentorship Program/DL-MentorProgram/DL_Mentor_Week2/Class 1'
print(os.listdir(mypath))

df=pd.read_csv(mypath+'/list_attr_celeba.csv')

df.head()
df.columns.values

['Facial_keypoints.ipynb', 'GenderIDex.JPG', 'Transfer_learning.ipynb', 'img_align_celeba.zip', 'Week2-Class1-St.zip', 'weights.best.hdf5', '.DS_Store', 'Week2-Class1-St', 'test', 'weights-improvement-17-0.66.hdf5', 'weights-improvement-76-0.74.hdf5', 'weights-improvement-71-0.73.hdf5', 'input', 'small_c_d', 'weights-improvement-43-0.69.hdf5', 'weights-improvement-109-0.76.hdf5', 'celeb_small', 'weights-improvement-51-0.69.hdf5', 'weights-improvement-28-0.67.hdf5', 'weights-improvement-140-0.76.hdf5', 'Celeb_sets', 'P1_Facial_Keypoints', 'Gender_ID_VGG16.py', 'weights-improvement-57-0.71.hdf5', 'get-start-image-classification.ipynb', 'weights-improvement-79-0.75.hdf5', 'cats_dogs', 'list_attr_celeba.csv', 'test1.zip', 'train', 'weights-improvement-01-0.66.hdf5', 'model.h5', 'vgg16_1.h5', 'Gender_ID_VGG16_v02.py', '.ipynb_checkpoints', 'weights-improvement-41-0.69.hdf5', 'Gender_ID_Inception.py', 'diabetes.csv', 'Week2-Class1-Inst', 'weights-improvement-29-0.68.hdf5', 'weights-improveme

array(['image_id', '5_o_Clock_Shadow', 'Arched_Eyebrows', 'Attractive',
       'Bags_Under_Eyes', 'Bald', 'Bangs', 'Big_Lips', 'Big_Nose',
       'Black_Hair', 'Blond_Hair', 'Blurry', 'Brown_Hair',
       'Bushy_Eyebrows', 'Chubby', 'Double_Chin', 'Eyeglasses', 'Goatee',
       'Gray_Hair', 'Heavy_Makeup', 'High_Cheekbones', 'Male',
       'Mouth_Slightly_Open', 'Mustache', 'Narrow_Eyes', 'No_Beard',
       'Oval_Face', 'Pale_Skin', 'Pointy_Nose', 'Receding_Hairline',
       'Rosy_Cheeks', 'Sideburns', 'Smiling', 'Straight_Hair',
       'Wavy_Hair', 'Wearing_Earrings', 'Wearing_Hat', 'Wearing_Lipstick',
       'Wearing_Necklace', 'Wearing_Necktie', 'Young'], dtype=object)
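
Once the attribute table is loaded, picking out images with a given combination of attributes is a boolean filter over these columns. The sketch below uses a tiny hand-made table in place of the real CSV (the file names and values are made up for illustration) and assumes the attributes are encoded as `1` for present and `-1` for absent, as in the Kaggle copy of the file. The helper `images_with` is hypothetical.

```python
import pandas as pd

# Toy stand-in for list_attr_celeba.csv; rows and values are illustrative only
toy_df = pd.DataFrame({
    "image_id": ["a.jpg", "b.jpg", "c.jpg", "d.jpg"],
    "Smiling": [1, -1, 1, 1],
    "Eyeglasses": [1, 1, -1, 1],
    "Brown_Hair": [-1, 1, 1, 1],
})

def images_with(frame, *attributes):
    """Return the image ids for which every listed attribute is present (== 1)."""
    mask = (frame[list(attributes)] == 1).all(axis=1)
    return frame.loc[mask, "image_id"].tolist()

smiling_with_glasses = images_with(toy_df, "Smiling", "Eyeglasses")
print(smiling_with_glasses)
```

The same call on the real `df` would work with any of the 40 column names listed above.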

#### See sample image

In [None]:
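# Folder with the aligned face images. This path is an assumption (the unzipped
# 'img_align_celeba.zip' next to the CSV); adjust it to your own setup.
img_directory = os.path.join(mypath, 'img_align_celeba')

# Function to load and preprocess images
def load_preprocess_image(filename):
    # load_img expects target_size as (height, width)
    img = load_img(os.path.join(img_directory, filename), target_size=(178, 218))
    # VGG16 preprocessing: RGB -> BGR, then subtract the ImageNet channel means
    img = preprocess_input(np.array(img, dtype='float32'))
    return img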

In [None]:
# Display a sample image
sample_image_id = df['image_id'].iloc[0]  # Take the first image from the dataset as a sample
sample_image = load_preprocess_image(sample_image_id)

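# Undo the VGG16 preprocessing to display the image correctly:
# add back the ImageNet channel means (BGR order), then flip BGR -> RGB
imagenet_bgr_mean = np.array([103.939, 116.779, 123.68])
denormalized_image = (sample_image + imagenet_bgr_mean)[..., ::-1]
denormalized_image = np.clip(denormalized_image / 255.0, 0, 1)  # scale to [0, 1] for imshow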

# Display the image
plt.imshow(denormalized_image)
plt.axis('off')
plt.show()
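
For reference, `preprocess_input` for VGG16 does not rescale pixels to `[-1, 1]`; in its default mode it reorders the channels from RGB to BGR and subtracts the ImageNet mean of each channel. The NumPy-only sketch below mimics that transform and its inverse on a random array, so the round trip can be checked without TensorFlow. The helper names `vgg_style_preprocess` and `vgg_style_deprocess` are hypothetical and not part of Keras.

```python
import numpy as np

# ImageNet channel means in BGR order, as used by the VGG16 preprocessing
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def vgg_style_preprocess(rgb_image):
    """RGB uint8 image -> float32 BGR image with the channel means subtracted."""
    bgr = rgb_image.astype(np.float32)[..., ::-1]
    return bgr - IMAGENET_BGR_MEAN

def vgg_style_deprocess(preprocessed):
    """Inverse of vgg_style_preprocess: back to an RGB uint8 image."""
    bgr = preprocessed + IMAGENET_BGR_MEAN
    rgb = bgr[..., ::-1]
    # Round before casting so float error cannot push a value down by one
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(178, 218, 3), dtype=np.uint8)
restored = vgg_style_deprocess(vgg_style_preprocess(fake_image))
print(np.array_equal(fake_image, restored))
```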

## 4. Build Model

- First, copy VGG16 without the dense layers and use the weights from `imagenet`. Set the input shape to `(178,218,3)`.
- Freeze all layers except the last two and print each layer to check whether it is trainable.
- Build your sequential model (you are free to use the functional API as a further exercise). Include all the VGG layers in your model. Add a Dense layer with 128 units and `relu` activation. Add a batch normalization layer, then a dense layer as the output layer.
- Create an early stopping criterion that monitors the loss on the validation set. Stop training if the loss does not improve for two consecutive epochs.
- Compile the model.
- Save the best model automatically based on performance on the validation set.
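
To make the early stopping rule concrete, here is a plain-Python sketch of how a patience of two behaves. It is a simplified model of the idea, not the Keras implementation, and the function name `early_stop_epoch` and the loss values are made up for illustration.

```python
def early_stop_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the (0-based) epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Illustrative validation losses, one per epoch
losses = [0.70, 0.65, 0.66, 0.64, 0.65, 0.66, 0.60]
stop_at = early_stop_epoch(losses, patience=2)
print(stop_at)
```

Note that a single bad epoch (epoch 2 here) resets as soon as the loss improves again; training only stops after two bad epochs in a row.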

In [None]:
# Build the Model
vgg_base = VGG16(weights='imagenet', include_top=False, input_shape=(178, 218, 3))

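# Freeze layers except the last two
for layer in vgg_base.layers[:-2]:
    layer.trainable = False

# Print each layer's trainable flag to verify the freeze
for layer in vgg_base.layers:
    print(layer.name, layer.trainable)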

# Create the sequential model and add the layers
model = Sequential()
model.add(vgg_base)
model.add(GlobalAveragePooling2D())
model.add(Dense(128, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(40, activation='sigmoid'))  # Assuming there are 40 binary attributes

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
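
With one sigmoid output per attribute and `binary_crossentropy` as the loss, each of the 40 attributes is treated as its own yes/no problem and the per-attribute losses are averaged. The NumPy sketch below shows that computation on a toy example. It assumes the labels are `0`/`1` (not `-1`/`1`), and `multilabel_bce` is a hypothetical helper, not a Keras function.

```python
import numpy as np

def multilabel_bce(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over all samples and labels (labels must be 0/1)."""
    y_true = np.asarray(y_true, dtype=np.float64)
    # Clip probabilities so log(0) never occurs
    y_prob = np.clip(np.asarray(y_prob, dtype=np.float64), eps, 1 - eps)
    per_label = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return per_label.mean()

# One image, three attributes: present, absent, present
y_true = [[1, 0, 1]]
confident = multilabel_bce(y_true, [[0.9, 0.1, 0.9]])
unsure = multilabel_bce(y_true, [[0.5, 0.5, 0.5]])
print(confident, unsure)
```

Predicting 0.5 everywhere gives a loss of ln 2, which is a useful baseline when reading the training log.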



In [None]:
# Early stopping and ModelCheckpoint callbacks and save the best model automatically
early_stopping = EarlyStopping(monitor='val_loss', patience=2, verbose=1, mode='auto')
checkpoint = ModelCheckpoint(filepath='best_model.h5', monitor='val_accuracy', save_best_only=True, verbose=1)



## 5. Data Preparation

- Create a validation set with 20% of the data. Check the number of data points per class from both the train and validation sets.
- Set your batch size to 20.
- Create the data generator and set the `preprocessing_function` to `preprocess_input` of VGG16.
- Create train and validation data generators (batches will be picked up from the dataframe). Set target size to (178,218) (you can try something else, but you need to do the corresponding change in the model).
- Set your validation and epoch step sizes (`validation_steps` and `steps_per_epoch`).
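
The step sizes follow from the split and the batch size: one step is one batch, so an epoch needs the number of images divided by the batch size, rounded up. The sketch below works this out for the full dataset, assuming all 202,599 rows have a matching image file and that the 20% validation split is rounded up to a whole number of rows (which is how scikit-learn's `train_test_split` handles a fractional `test_size`).

```python
import math

n_images = 202_599   # total images in the dataset
test_size = 0.2      # validation fraction
batch_size = 20

# Validation rows are rounded up; the rest go to training
n_val = math.ceil(test_size * n_images)
n_train = n_images - n_val

# One step = one batch; the last batch may be smaller than batch_size
steps_per_epoch = math.ceil(n_train / batch_size)
validation_steps = math.ceil(n_val / batch_size)
print(n_train, n_val, steps_per_epoch, validation_steps)
```

Calling `len()` on a Keras data generator returns the same rounded-up batch count, so the two approaches should agree.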

In [None]:
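# Data Preparation
# The attribute columns in the Kaggle CSV are encoded as 1 / -1. Map -1 to 0 so the
# labels match the sigmoid outputs and the binary cross-entropy loss
# (this is a no-op if your copy already uses 0 / 1).
attr_cols = df.columns[1:]
df[attr_cols] = df[attr_cols].replace(-1, 0)

# Create a validation set with 20% of the data
train_df, val_df = train_test_split(df, test_size=0.2, random_state=42)

# Count the positive examples per attribute in both train and validation sets
train_counts = train_df[attr_cols].sum(axis=0)
val_counts = val_df[attr_cols].sum(axis=0)
print("Train Data Class Distribution:")
print(train_counts)
print("\nValidation Data Class Distribution:")
print(val_counts)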



In [None]:
# Set batch size
batch_size = 20

# Create data generators
train_data_gen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    horizontal_flip=True,
    vertical_flip=True,
)

val_data_gen = ImageDataGenerator(
    preprocessing_function=preprocess_input
)



## 6. Train the Model

- Fit the model.
- Save the model.

In [None]:
# Train and validation data generators
train_generator = train_data_gen.flow_from_dataframe(
    dataframe=train_df,
    directory=img_directory,
    x_col='image_id',
    y_col=list(train_df.columns[1:]),  # pass the label columns as a plain list
    batch_size=batch_size,
    shuffle=True,
    target_size=(178, 218),
    class_mode='raw'
)

val_generator = val_data_gen.flow_from_dataframe(
    dataframe=val_df,
    directory=img_directory,
    x_col='image_id',
    y_col=list(val_df.columns[1:]),  # pass the label columns as a plain list
    batch_size=batch_size,
    shuffle=False,
    target_size=(178, 218),
    class_mode='raw'
)



In [None]:
# Set the validation and epoch step size
validation_steps = len(val_generator)
steps_per_epoch = len(train_generator)

In [None]:
# Train the model
history = model.fit(
    train_generator,
    validation_data=val_generator,
    validation_steps=validation_steps,
    steps_per_epoch=steps_per_epoch,
    epochs=20,  # You can adjust the number of epochs as needed
    callbacks=[early_stopping, checkpoint]
)

In [None]:
# Save the final model
model.save('final_model.h5')
