<img style="float:left;" src="https://img.icons8.com/carbon-copy/100/000000/futurama-bender.png"/>
<h1 style="text-align: center; font-size:50px;"> AGE, GENDER and ETHNICITY </h1> 
<hr>
<hr>

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.models import Model
from tensorflow.keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.preprocessing import text
from sklearn.metrics import classification_report
from tokenizers import Tokenizer, models, pre_tokenizers, decoders, processors
from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping
import plotly.express as px
import eli5
import matplotlib.pyplot as plt
import seaborn as sns

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

### Content

This dataset includes a CSV of facial images that are labeled on the basis of age, gender, and ethnicity.
The dataset includes 27305 rows and 5 columns.

***To see the preprocessing please unhide the following cell***

In [None]:
data = pd.read_csv('/kaggle/input/age-gender-and-ethnicity-face-data-csv/age_gender.csv')
data['pixels'] = data.pixels.apply(lambda x: x.split(' '))
data['pixels'] = data.pixels.apply(lambda x: np.array([int(v) for v in x]))
data['pixels'] = data.pixels.apply(lambda x: x.reshape(48,48))

## Table of contents

- [Exploring the data](#a)
- [SE-RESNET block](#b)
- [SENET model](#c)
- [Learning rate](#d)
- [Training step](#e)

<h2 style="text-align: center;">Exploring the data</h2> <a id=a></a>
<hr>

In [None]:
plt.figure(figsize=[16,16])
for i in range(1500,1520):
    plt.subplot(5,5,(i%25)+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(data['pixels'].iloc[i], cmap='gray')  # pixels are grayscale intensities
    plt.xlabel(
        "Age:"+str(data['age'].iloc[i])+
        "  Ethnicity:"+str(data['ethnicity'].iloc[i])+
        "  Gender:"+ str(data['gender'].iloc[i])
    )
plt.show()

**We hold out 25% of the data as the validation set.**

We then add a channel dimension to the predictors (X_train and X_val). This step is essential because convolutional layers expect a channel axis.

**Reminder:**
- The input to a Conv2D layer has shape (height, width, channels).
- In our case the number of channels is 1 because the images are grayscale (a single 2D array).
- The height and width are both 48, the square root of the flattened array's length (48 × 48 = 2304).
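
The shape bookkeeping above can be sketched on a single image. This is only an illustration: the `pixel_string` below is a made-up stand-in for one entry of the `pixels` column, and the variable names are hypothetical.

```python
import math
import numpy as np

# Hypothetical stand-in for one 'pixels' entry: 2304 space-separated
# grayscale values, all zero except a single bright pixel.
values = ["0"] * (48 * 48)
values[49] = "255"  # flat index 49 is row 1, column 1 of a 48x48 image
pixel_string = " ".join(values)

flat = np.array([int(v) for v in pixel_string.split(" ")])  # shape (2304,)
side = math.isqrt(flat.size)                                # 48
image = flat.reshape(side, side)                            # shape (48, 48)
image_with_channel = image[..., np.newaxis]                 # shape (48, 48, 1)

# A dataset of n images is stacked into an array of shape (n, 48, 48, 1).
batch = np.stack([image_with_channel, image_with_channel])
print(flat.shape, image.shape, image_with_channel.shape, batch.shape)
```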

In [None]:
X_train, X_val, y_train, y_val = train_test_split(data.drop(['age','ethnicity','gender','img_name'], axis=1),
                                                  data[['age','ethnicity','gender']], random_state=0, test_size=0.25)


def preprocess(df, column):
    """Stack the 48x48 arrays stored in `column` into one (n_samples, 48, 48, 1) array."""
    X = np.zeros((len(df), 48, 48, 1))
    for idx, array in enumerate(df[column]):
        X[idx, :, :, 0] = array
    return X

# Add the channel dimension expected by the CNN inputs
Xtrain = preprocess(X_train, 'pixels')
Xval = preprocess(X_val, 'pixels')

# We predict age only, but the same approach works for the other targets (ethnicity, gender)
ytrain = y_train.age.values
yval = y_val.age.values

<h2 style="text-align: center;">SE-RESNET Block</h2><a id=b></a>
<hr>

The following cell is probably the most important one in the notebook.
It implements a residual block and an SE (squeeze-and-excitation) block, which work well when combined.

An SE block does not look for spatial patterns the way convolutional layers do; it learns which features tend to be active together. For example, a nose and a mouth usually appear on a face together with eyes, so when the network sees a nose and a mouth it expects to see eyes as well. If it observes strong activations in the nose and mouth feature maps but only a medium activation in the eyes feature map, the block boosts (excites) the latter.

An SE block has only three layers and outputs a vector that rescales the feature maps of the preceding residual block.
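
To make the three steps (squeeze, excitation, rescaling) concrete, here is a minimal NumPy sketch of the idea. It is not the Keras layer used in this notebook: the function `se_block`, the random untrained weights and the 11×11 spatial size are assumptions chosen for illustration, while the 64 channels and the 5-unit bottleneck match the `Dense` sizes used in the notebook's SE block.

```python
import numpy as np

def se_block(feature_maps, w1, b1, w2, b2):
    """Squeeze-and-excitation on one (height, width, channels) array."""
    # Squeeze: global average, one scalar per feature map.
    squeezed = feature_maps.mean(axis=(0, 1))               # (channels,)
    # Excitation, part 1: small bottleneck dense layer with ReLU.
    hidden = np.maximum(0.0, squeezed @ w1 + b1)            # (bottleneck,)
    # Excitation, part 2: dense layer with sigmoid, one weight in (0, 1) per channel.
    weights = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))     # (channels,)
    # Rescale: the weights broadcast over the spatial dimensions.
    return feature_maps * weights, weights

rng = np.random.default_rng(0)
channels, bottleneck = 64, 5
feature_maps = rng.random((11, 11, channels))  # hypothetical residual-block output

# Random, untrained parameters, used only to show the shapes involved.
w1 = rng.normal(scale=0.1, size=(channels, bottleneck))
b1 = np.zeros(bottleneck)
w2 = rng.normal(scale=0.1, size=(bottleneck, channels))
b2 = np.zeros(channels)

scaled, weights = se_block(feature_maps, w1, b1, w2, b2)
print(scaled.shape, weights.shape)
```

Each feature map is multiplied by a single learned scalar, so the block can amplify or damp whole channels without looking at where in the image a feature occurs.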

<img style="float:left;" src="https://pic1.zhimg.com/v2-8515d83936b780c200f62caf9ee37212_r.jpg"/> 

In [None]:
class ResidualUnit(keras.layers.Layer):
    def __init__(self, filters, strides=1, activation="relu", **kwargs):
        super().__init__(**kwargs)
        self.activation = keras.activations.get(activation)
        self.main_layers = [keras.layers.Conv2D(filters, 3, strides=strides, padding="same",use_bias=False),
                            keras.layers.BatchNormalization(), # Normalize the outputs
                            self.activation,
                            keras.layers.Conv2D(filters, 3, strides=1, padding='same', use_bias=False),
                            keras.layers.BatchNormalization()]
        self.skip_layers = [
            keras.layers.Conv2D(filters, 1, strides=strides,padding="same",use_bias=False),
            keras.layers.BatchNormalization()
        ]
    
    # call() defines the forward pass; it is used during both training and prediction
    def call(self, inputs):
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
        skip_Z = inputs
        for layer in self.skip_layers:
            skip_Z = layer(skip_Z)
        return self.activation(Z + skip_Z)
    
class SEBloc(keras.layers.Layer):
    def __init__(self, pool, **kwargs):
        super().__init__(**kwargs)
        self.main_layers = [keras.layers.AveragePooling2D(
                                pool_size=pool, strides=1, padding="valid"), # squeeze: pool over the whole map ("valid" padding) to get one scalar per feature map
                            keras.layers.Dense(5, activation='relu'), # bottleneck embedding
                            keras.layers.Dense(64, activation='sigmoid')] # one weight in (0, 1) per feature map
    
    def call(self, inputs):
        Z = inputs
        for layer in self.main_layers:
            Z = layer(Z)
        return Z

<h2 style="text-align: center;">SENET model</h2><a id=c></a>
<hr>

This is not a full SENet, since it contains only one SE block and one residual block, but it illustrates the principle.

In [None]:
EPOCHS = 50
BATCH_SIZE = 32

inputs = tf.keras.Input(shape=(48,48,1), dtype="float32")
x = keras.layers.Conv2D(64, 5, strides=2, input_shape=[48,48,1])(inputs)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Activation("relu")(x)
x = keras.layers.MaxPool2D(pool_size=2, strides=2, padding='same')(x)

x_res = ResidualUnit(64, strides=1)(x) # RES

x_se = SEBloc(x.shape[1])(x) #SE

x_res_se = keras.layers.Multiply()([x_res, x_se]) # Multiply outputs of SE and RES
x = keras.layers.Add()([x_res_se, x])

x = keras.layers.Dropout(0.5)(x)
x = keras.layers.Flatten()(x)
output = keras.layers.Dense(1, activation='relu')(x) # One output with relu for the regression
model = tf.keras.Model(inputs, output)

<h2 style="text-align: center;">Learning rate</h2><a id=d></a>
<hr>

The learning rate is an important hyperparameter, and choosing it well can save a lot of training time. The following cell shows how to estimate a good value: the learning rate is increased by a constant factor after every batch while the loss is recorded.
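
As a rough guide to how fast such a search sweeps through learning rates, the rate after `n` batches is `initial_lr * factor ** n`. The sketch below uses the starting rate (0.001) and factor (1.0003) from the search cell in this notebook; the helper `lr_after` is a hypothetical name introduced only for this illustration.

```python
import math

initial_lr = 0.001  # starting learning rate of the search
factor = 1.0003     # multiplicative increase applied after every batch

def lr_after(n_batches, lr0=initial_lr, growth=factor):
    """Learning rate after n_batches multiplicative updates."""
    return lr0 * growth ** n_batches

# Number of batches needed to multiply the learning rate by 10.
batches_per_decade = math.log(10) / math.log(factor)

print(lr_after(0))                # 0.001
print(lr_after(1000))
print(round(batches_per_decade))
```

A factor this close to 1 means several thousand batches are needed to cover one order of magnitude, which is why the search runs for multiple epochs.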

In [None]:
# The learning rate is one of the most important hyperparameters; this cell is a way to find a good value
class ExponentialLearningRate(tf.keras.callbacks.Callback):
    """Multiply the learning rate by `factor` after each batch and record the loss."""

    def __init__(self, K, factor):
        super().__init__()
        self.factor = factor
        self.rates = []
        self.losses = []
        self.K = K

    def on_batch_end(self, batch, logs=None):
        # Record the current rate and loss, then increase the rate for the next batch
        self.rates.append(self.K.get_value(self.model.optimizer.learning_rate))
        self.losses.append(logs["loss"])
        self.K.set_value(self.model.optimizer.learning_rate, self.model.optimizer.learning_rate * self.factor)


def bestLearningRate():

    print("\n\n********************** Best learning rate calculation ******************\n\n")
    K = tf.keras.backend
    model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), metrics=[tf.keras.metrics.MeanAbsoluteError()])
    expon_lr = ExponentialLearningRate(K, factor=1.0003)
    model.fit(Xtrain, ytrain, validation_data=(Xval, yval), epochs=20, callbacks=[expon_lr])
    print("*************************************************************************\n\n")

    print("********************** Loss as function of learning rate plot displayed ********************\n\n")

    # With explicit x and y arrays, plotly names the axes 'x' and 'y'
    fig = px.line(
        x=expon_lr.rates, y=expon_lr.losses,
        labels={'x': 'learning rate', 'y': 'loss'},
        title='Loss as a function of learning rate')
    fig.show()

    # Return the learning rate at which the recorded loss was lowest
    id_min = np.argmin(expon_lr.losses)
    return expon_lr.rates[id_min]

lr = bestLearningRate()
print('the best learning rate is: ', lr)

<h2 style="text-align: center;">Training step</h2><a id=e></a>
<hr>

I use two callbacks. The first stops training once the validation loss drops below a threshold. The second reduces the learning rate when the validation loss reaches a plateau.
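
To make the second callback's behaviour concrete, here is a simplified sketch of a reduce-on-plateau rule in plain Python. It is not Keras's actual implementation, which also supports options such as `cooldown` and `min_delta`. The function name `reduce_on_plateau` and the loss values are made up for illustration, while `factor`, `patience`, `min_lr` and the initial rate 0.0035 mirror the values used in the training cell.

```python
def reduce_on_plateau(val_losses, lr, factor=0.2, patience=5, min_lr=0.001):
    """Simplified plateau rule: return the learning rate used at each epoch."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        history.append(lr)
        if loss < best:
            # Improvement: remember it and reset the patience counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: shrink the rate,
                # but never go below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
    return history

# Hypothetical validation losses: two improvements, then a five-epoch plateau.
val_losses = [100, 95, 95, 95, 95, 95, 95, 94]
history = reduce_on_plateau(val_losses, lr=0.0035)
print(history)
```

With these settings, 0.0035 × 0.2 = 0.0007 falls below `min_lr`, so the rate is clipped to 0.001 and any later plateau leaves it unchanged.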

In [None]:
class myCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Stop as soon as the validation loss (MSE) drops below the threshold
        val_loss = (logs or {}).get('val_loss')
        if val_loss is not None and val_loss < 91:
            print("\nReached a val_loss below 91, so cancelling training!")
            self.model.stop_training = True

callback = myCallback()
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=5, min_lr=0.001)

optimizer = tf.keras.optimizers.Adam(learning_rate=0.0035)
model.compile(loss='mse', optimizer=optimizer, metrics=[tf.keras.metrics.MeanAbsoluteError()])

history = model.fit(Xtrain, ytrain, batch_size=BATCH_SIZE, epochs=EPOCHS, validation_data=(Xval, yval), callbacks=[callback, reduce_lr], verbose=1)

Finally, let's plot the training and validation losses.

In [None]:
fig = px.line(
    history.history, y=['loss', 'val_loss'],
    labels={'index': 'epoch', 'value': 'loss'}, 
    title='Training History')
fig.show()