<a href="https://colab.research.google.com/github/soohyunnie/Human-Age-Detection/blob/main/pretrained_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import os, shutil, random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import matplotlib.image as mpimg
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
from keras import models
from keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten, Activation, BatchNormalization
from keras.regularizers import l2
from sklearn.utils import class_weight 
from keras.applications import vgg16
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import seaborn as sns 
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In this notebook, we model our data with the pretrained VGG16 network.

This notebook was run in Google Colab, so the paths to the images and folders differ from those in the other notebook.

First, we unzip our split folder, which contains the train, validation, and test folders.

In [2]:
!unzip "/content/drive/My Drive/split2.zip"

Streaming output truncated to the last 5000 lines.
  inflating: split2/validation/age_0_20_imgs/1_0_0_20161219202455708.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20161219203657925.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20161220220239129.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20170103210905939.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20170109191432590.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20170109191725028.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20170109192836519.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20170109192948605.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20170109193440113.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20170109193511684.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20170109193826712.jpg  
  inflating: split2/validation/age_0_20_imgs/1_0_0_20170109194400094.jpg  
  inflating: split2/validation/age_
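
The unzip step above prints one line per extracted file. As an alternative, the archive could be extracted from Python with the standard `zipfile` module, which stays quiet. This is only a sketch: the helper name `extract_archive` is hypothetical and not part of the original notebook, and the path in the commented usage line is the one used in the cell above.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir="."):
    """Extract zip_path into dest_dir and return the number of files extracted."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Directory entries end with "/", so count regular files only
        return sum(1 for name in archive.namelist() if not name.endswith("/"))


# Usage in Colab (requires the Drive mount from the first cell):
# n_files = extract_archive("/content/drive/My Drive/split2.zip")
```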

In [3]:
train_dir = 'split2/train'
validation_dir = 'split2/validation'
test_dir = 'split2/test'

Now, we create the ImageDataGenerator objects and load the images from each folder.

In [4]:
data_gen_aug = ImageDataGenerator(rescale=1./255, 
                                        rotation_range=30, 
                                        horizontal_flip=True)
                                        
train_generator_aug = data_gen_aug.flow_from_directory(train_dir, target_size=(256, 256), batch_size=128)

validation_generator_aug = data_gen_aug.flow_from_directory(validation_dir, target_size=(256, 256), batch_size=128)

test_data_gen= ImageDataGenerator(rescale=1./255)

test_generator = test_data_gen.flow_from_directory(test_dir, target_size=(256, 256), batch_size=32)

Found 16000 images belonging to 5 classes.
Found 5000 images belonging to 5 classes.
Found 3108 images belonging to 5 classes.
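
The image counts and batch sizes above determine how many batches make up one full pass over each split. The arithmetic below is a small sketch of that calculation. It also compares a full pass with the `steps_per_epoch=40` and `validation_steps=25` values used in the training cells that follow. The helper name `batches_per_pass` is made up for illustration.

```python
import math


def batches_per_pass(num_images, batch_size):
    """Number of batches needed to see every image once (last batch may be partial)."""
    return math.ceil(num_images / batch_size)


train_batches = batches_per_pass(16000, 128)  # 125
val_batches = batches_per_pass(5000, 128)     # 40
test_batches = batches_per_pass(3108, 32)     # 98

# Images drawn per epoch with the step counts used in the fit calls
train_images_per_epoch = 40 * 128                    # 5120
val_images_per_epoch = 25 * 128                      # 3200
train_fraction = train_images_per_epoch / 16000      # 0.32
val_fraction = val_images_per_epoch / 5000           # 0.64

print(train_batches, val_batches, test_batches)
print(train_fraction, val_fraction)
```

Under these settings, each epoch draws about a third of the training images and about two thirds of the validation images.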


For the first model, we use the pretrained VGG16 base with just one additional Dense layer of 512 neurons before the output layer.

In [5]:
# Create a VGG16 base model
cnn_base = vgg16.VGG16(weights='imagenet',
                 include_top=False,
                 input_shape=(256, 256, 3))

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
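
To get a feel for the size of the head placed on top of this base, the sketch below works out the feature-map shape and the parameter counts by hand. It relies on two facts about VGG16: it has five 2x2 max-pooling stages, and its last convolutional block has 512 channels. The function names are illustrative and do not come from Keras.

```python
def vgg16_feature_shape(height, width):
    """Output shape of VGG16 with include_top=False for a given input size."""
    # Each of the five 2x2 max-pooling stages halves the spatial dimensions
    for _ in range(5):
        height //= 2
        width //= 2
    return (height, width, 512)


def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


h, w, c = vgg16_feature_shape(256, 256)   # (8, 8, 512)
flat = h * w * c                          # 32768 values after Flatten
hidden = dense_params(flat, 512)          # 16777728
output = dense_params(512, 5)             # 2565
head_params = hidden + output             # 16780293
print((h, w, c), flat, head_params)
```

Almost all of the trainable parameters sit in the first Dense layer, because it is connected to every value of the flattened 8x8x512 feature map.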


In [7]:
# Freeze the pretrained VGG16 layers so only the newly added layers are trained
for layer in cnn_base.layers:
  layer.trainable = False

model = models.Sequential()
model.add(cnn_base)
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(5, activation='softmax'))

model.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

result = model.fit(train_generator_aug,
                    steps_per_epoch=40,
                    epochs=30,
                    validation_data=(validation_generator_aug),
                    validation_steps=25)

Epoch 1/30 through Epoch 30/30 (per-epoch metrics not included in this output)


In [16]:
models.save_model(model, '/content/drive/My Drive/pretrained_model1.h5')

In [8]:
cnn_result_train = model.evaluate(train_generator_aug)
cnn_result_validation = model.evaluate(validation_generator_aug)

print(cnn_result_train, cnn_result_validation)

[0.8675124049186707, 0.6230000257492065] [0.9841499924659729, 0.5619999766349792]


The first model, built on the pretrained VGG16 base, reached ~62% accuracy on the training data and ~56% on the validation data.

To increase the accuracy, let's try adding more dense layers to the model.

In [6]:
for layers in cnn_base.layers:
  layers.trainable=False

model2 = models.Sequential()
model2.add(cnn_base)
model2.add(Flatten())
model2.add(Dense(512, activation='relu'))
model2.add(Dense(64, activation='relu'))
model2.add(Dense(128, activation='relu'))
model2.add(Dense(256, activation='relu'))
model2.add(Dense(128, activation='relu'))
model2.add(Dense(64, activation='relu'))
model2.add(Dense(64, activation='relu'))
model2.add(Dense(5, activation='softmax'))


model2.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

result2 = model2.fit(train_generator_aug,
                    steps_per_epoch=40,
                    epochs=30,
                    validation_data=(validation_generator_aug),
                    validation_steps=25)

Epoch 1/30 through Epoch 30/30 (per-epoch metrics not included in this output)


In [7]:
models.save_model(model2, '/content/drive/My Drive/pretrained_model2.h5')

In [8]:
cnn_result_train2 = model2.evaluate(train_generator_aug)
cnn_result_validation2 = model2.evaluate(validation_generator_aug)

print(cnn_result_train2, cnn_result_validation2)

[0.8693507313728333, 0.6220625042915344] [0.9482882022857666, 0.5932000279426575]


Adding more Dense layers raised the validation accuracy by ~3 percentage points (from ~56% to ~59%), while the training accuracy stayed at ~62%.

Let's see if adding convolutional layers will increase the accuracy.

In [6]:
for layers in cnn_base.layers:
  layers.trainable=False

model3 = models.Sequential()
model3.add(cnn_base)
model3.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model3.add(MaxPooling2D((2, 2)))
model3.add(Conv2D(64, (3, 3), activation='relu', padding='same'))
model3.add(MaxPooling2D((2, 2)))
model3.add(Flatten())
model3.add(Dense(512, activation='relu'))
model3.add(Dense(64, activation='relu'))
model3.add(Dense(128, activation='relu'))
model3.add(Dense(256, activation='relu'))
model3.add(Dense(128, activation='relu'))
model3.add(Dense(64, activation='relu'))
model3.add(Dense(64, activation='relu'))
model3.add(Dense(5, activation='softmax'))

model3.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

result3 = model3.fit(train_generator_aug,
                    steps_per_epoch=40,
                    epochs=30,
                    validation_data=(validation_generator_aug),
                    validation_steps=25)

Epoch 1/30 through Epoch 30/30 (per-epoch metrics not included in this output)


In [7]:
models.save_model(model3, '/content/drive/My Drive/pretrained_model3.h5')

In [8]:
cnn_result_train3 = model3.evaluate(train_generator_aug)
cnn_result_validation3 = model3.evaluate(validation_generator_aug)

print(cnn_result_train3, cnn_result_validation3)

[0.8750633597373962, 0.6124374866485596] [0.9527990221977234, 0.5835999846458435]
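
The effect of the two Conv2D/MaxPooling2D pairs in model3 on the feature map can be traced by hand. The sketch below assumes the VGG16 base outputs an 8x8x512 feature map for a 256x256 input. It also relies on `padding='same'` with stride 1 preserving height and width. The helper names are made up for illustration.

```python
def conv_same(shape, filters):
    """A stride-1 convolution with 'same' padding keeps height and width."""
    height, width, _ = shape
    return (height, width, filters)


def max_pool_2x2(shape):
    """A 2x2 max pooling halves height and width."""
    height, width, channels = shape
    return (height // 2, width // 2, channels)


shape = (8, 8, 512)                 # assumed output of the VGG16 base
flat_without_block = 8 * 8 * 512    # 32768, what model2 feeds to its first Dense layer

shape = conv_same(shape, 64)        # (8, 8, 64)
shape = max_pool_2x2(shape)         # (4, 4, 64)
shape = conv_same(shape, 64)        # (4, 4, 64)
shape = max_pool_2x2(shape)         # (2, 2, 64)

flat_with_block = shape[0] * shape[1] * shape[2]   # 256
print(shape, flat_without_block, flat_with_block)
```

Under that assumption, the extra block shrinks the input of the first Dense layer from 32,768 values to 256, so model3 has far fewer trainable parameters in its head than model2.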


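Adding convolutional layers did not help: the validation accuracy (~58%) is slightly lower than that of the second model (~59%).

Let's see if adding BatchNormalization and regularization will increase the accuracy.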

In [14]:
for layers in cnn_base.layers:
  layers.trainable=False

model4 = models.Sequential()
model4.add(cnn_base)
model4.add(Conv2D(64, (3, 3), activation='relu', padding='same', kernel_regularizer=l2(l2=0.001)))
model4.add(MaxPooling2D((2, 2)))
model4.add(Conv2D(64, (3, 3), padding='same', kernel_regularizer=l2(l2=0.001)))
model4.add(MaxPooling2D((2, 2)))
model4.add(BatchNormalization())
model4.add(Activation('relu'))
model4.add(Dropout(0.2))
model4.add(Flatten())
model4.add(Dense(512, activation='relu', kernel_regularizer=l2(l2=0.001)))
model4.add(Dense(64, activation='relu', kernel_regularizer=l2(l2=0.001)))
model4.add(Dense(128, kernel_regularizer=l2(l2=0.001)))
model4.add(BatchNormalization())
model4.add(Activation('relu'))
model4.add(Dense(256, activation='relu', kernel_regularizer=l2(l2=0.001)))
model4.add(Dense(128, activation='relu', kernel_regularizer=l2(l2=0.001)))
model4.add(Dropout(0.2))
model4.add(Dense(64, activation='relu', kernel_regularizer=l2(l2=0.001)))
model4.add(Dense(64, kernel_regularizer=l2(l2=0.001)))
model4.add(BatchNormalization())
model4.add(Activation('relu'))
model4.add(Dropout(0.2))
model4.add(Dense(5, activation='softmax'))

model4.compile(optimizer='Adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

result4 = model4.fit(train_generator_aug,
                    steps_per_epoch=40,
                    epochs=30,
                    validation_data=(validation_generator_aug),
                    validation_steps=25)

Epoch 1/30 through Epoch 30/30 (per-epoch metrics not included in this output)


In [19]:
models.save_model(model4, '/content/drive/My Drive/pretrained_model4.h5')

In [15]:
cnn_result_train4 = model4.evaluate(train_generator_aug)
cnn_result_validation4 = model4.evaluate(validation_generator_aug)

print(cnn_result_train4, cnn_result_validation4)

[1.2455511093139648, 0.5223749876022339] [1.2912092208862305, 0.5012000203132629]
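
One caveat when comparing these loss values with the earlier models: with `kernel_regularizer=l2(...)`, Keras adds a penalty of `factor * sum(w ** 2)` for each regularized kernel to the loss, and the loss reported by `evaluate` should include it. Part of the higher loss may therefore come from the penalty rather than from worse predictions. The sketch below shows the penalty for a single kernel. The kernel values are random and purely hypothetical, not weights from model4.

```python
import numpy as np


def l2_penalty(weights, factor=0.001):
    """L2 penalty as used by Keras' l2 regularizer: factor * sum of squared weights."""
    return factor * float(np.sum(np.square(weights)))


# Hypothetical kernel with the shape of a 256 -> 512 Dense layer
rng = np.random.default_rng(0)
kernel = rng.normal(loc=0.0, scale=0.05, size=(256, 512))

penalty = l2_penalty(kernel, factor=0.001)
print(f"L2 penalty for this kernel: {penalty:.4f}")
```

The accuracy values are not affected by this penalty, so they remain directly comparable across the four models.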


Adding BatchNormalization, Dropout, and L2 regularization decreased the accuracy to ~52% on the training data and ~50% on the validation data.
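
The four evaluation results reported above can be collected in one place for comparison. In the sketch below, each pair is `[loss, accuracy]`, copied from the `evaluate` outputs and rounded to four decimals. The dictionary layout and the `summarize` helper are illustrative choices and not part of the original notebook.

```python
# [loss, accuracy] pairs from the evaluate outputs above, rounded to four decimals
results = {
    "model1": {"train": [0.8675, 0.6230], "validation": [0.9841, 0.5620]},
    "model2": {"train": [0.8694, 0.6221], "validation": [0.9483, 0.5932]},
    "model3": {"train": [0.8751, 0.6124], "validation": [0.9528, 0.5836]},
    "model4": {"train": [1.2456, 0.5224], "validation": [1.2912, 0.5012]},
}


def summarize(results):
    """Return (name, train accuracy, validation accuracy, gap) for each model."""
    rows = []
    for name, result in results.items():
        _, train_acc = result["train"]
        _, val_acc = result["validation"]
        rows.append((name, train_acc, val_acc, train_acc - val_acc))
    return rows


rows = summarize(results)
best = max(rows, key=lambda row: row[2])  # highest validation accuracy

for name, train_acc, val_acc, gap in rows:
    print(f"{name}: train={train_acc:.3f} val={val_acc:.3f} gap={gap:.3f}")
print("Best validation accuracy:", best[0])
```

Of the four variants, model2 (the VGG16 base with the deeper Dense head) has the highest validation accuracy.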