Since the "Other" race group is rather small, removing all men of race Other (about 3.5% of the total population) from the training data should not affect gender classification performance as a whole, but it should have a significant negative effect on gender classification performance for men of race Other.

In [1]:
#helper functions:

#imports
import sys
import os
import pandas as pd
#!{sys.executable} -m pip install tensorflow==2.4
import tensorflow as tf
import PIL # Python Image Library
from IPython.display import Image, display # To show images
import numpy as np
#!{sys.executable} -m pip install plotly # Used to create charts in analysis section
import plotly.express as px # For histograms
import plotly.graph_objects as go # For pie charts
#!{sys.executable} -m pip install scikit-learn
from sklearn.metrics import precision_score, recall_score # For performance metrics

#define constants:

# Relative path to the image files
IMG_PATH = "UTKFace/"
FILENAME_MATCH = "[0-9]*_[0-9]*_[0-9]*_[0-9]*.jpg" # This is using "glob", not regex, but the same idea

# Label locations in the filename
AGE_INDEX = 0
GENDER_INDEX = 1
RACE_INDEX = 2

# Class labels
GENDER_MALE = 0
GENDER_FEMALE = 1
RACE_WHITE = 0
RACE_BLACK = 1
RACE_ASIAN = 2
RACE_INDIAN = 3
RACE_OTHER = 4

# Text labels for gender and race classes
GENDER_LABELS = {
    0: "Male",
    1: "Female"
}

RACE_LABELS = {
    0: "White",
    1: "Black",
    2: "Asian",
    3: "Indian",
    4: "Other"
}


### FUNCTIONS FIRST USED IN THIS SECTION ###

def get_label(file_path, attribute_to_classify):
    '''
        Takes the file path of a UTKFace image and an attribute (the integer index of the 
        attribute in the file path) and returns the image's label for that attribute.
    '''
    # Split the path into components using the current OS's file path separator
    parts = tf.strings.split(file_path, os.path.sep)
    # The last part is the filename
    file_name = parts[-1]
    # Split the filename at underscores to get each label
    labels = tf.strings.split(file_name, "_")
    return int(labels[attribute_to_classify])


def convert_file_path_to_df_row_dictionary(file_path, columns):
    '''
        Takes the file path of a UTKFace image and converts it to a dictionary with the
        format "feature_name e.g. gender": label_value. This dictionary can be added to
        a Pandas dataframe
    '''
    row = {}
    # Split the path into components using the current OS's file path separator
    parts = tf.strings.split(file_path, os.path.sep)
    # The last part is the filename
    file_name = parts[-1]
    # Split the filename at underscores to get each label
    labels = tf.strings.split(file_name, "_")
    
    for i, col_name in enumerate(columns):
        row[col_name] = int(labels[i])
    return row 


def create_dataframe_from_dataset(ds):
    '''
        Takes a dataset of UTKFace images (that have not yet been converted into image:label pairs)
        and returns a dataframe containing one row for each image and a column for each label.
        This function is SLOW.
    '''
    COLS=["age", "gender", "race"]
    rows = [] # Collect one dictionary per image, then build the dataframe in one step
    for image in ds:
        image_path = str(image.numpy().decode("utf8"))
        rows.append(convert_file_path_to_df_row_dictionary(image_path, COLS[0:3]))
    # DataFrame.append was removed in pandas 2.0; building from a list of rows also avoids repeated copying
    return pd.DataFrame(rows, columns=COLS)


### VISUALIZATION FUNCTIONS ###


def draw_hist(df, col, nbins, title="Distribution"):
    '''
        Draws a Plotly histogram from the given data.
    '''
    fig = px.histogram(df, x=col, nbins=nbins)
    fig.update_layout(title_text=title)
    fig.show()
    

def draw_pie(df, col, title="Distribution", text_labels=None):
    '''
        Draws a Plotly pie chart from the given data.
    '''
    labels = df[col].value_counts().index.tolist()
    if text_labels is not None:
        labels = [text_labels[val] for val in labels]
    counts = df[col].value_counts().values.tolist()

    plot = go.Pie(labels=labels, values=counts)
    fig = go.Figure(data=[plot])
    fig.update_layout(title_text=title)
    fig.show()
    


### FUNCTIONS FIRST USED IN THIS SECTION ###


def process_path(file_path, attribute_to_classify):
    '''
        Takes the file path of an image and the index of an attribute in the filename
        and returns the image data and the attribute label
    '''
    label = get_label(file_path, attribute_to_classify)
    # load the image from the file
    img_string = tf.io.read_file(file_path)
    # Get the raw data from the image
    img_data = tf.image.decode_jpeg(img_string, channels=3)
    return img_data, label


def process_path_age(file_path):
    '''
        Convenience function to get the age label from an image file path.
    '''
    return process_path(file_path, AGE_INDEX)


def process_path_gender(file_path):
    '''
        Convenience function to get the gender label from an image file path.
    '''
    return process_path(file_path, GENDER_INDEX)


def process_path_race(file_path):
    '''
        Convenience function to get the race label from an image file path.
    '''
    return process_path(file_path, RACE_INDEX)


def configure_for_performance(ds):
    BATCH_SIZE = 32
    ds = ds.cache()
    ds = ds.shuffle(buffer_size=1000)
    ds = ds.batch(BATCH_SIZE)
    ds = ds.prefetch(buffer_size=tf.data.AUTOTUNE)
    return ds




def prepare_model(predicted_attribute):
    num_classes = 5 if predicted_attribute == RACE_INDEX else 1
    last_layer_activation = "linear" if predicted_attribute == AGE_INDEX \
                                     else "sigmoid" if predicted_attribute == GENDER_INDEX \
                                     else "softmax"
    
    layers = tf.keras.layers # Shorthand so the layer names below need fewer dotted prefixes
    
    return tf.keras.Sequential([
        layers.experimental.preprocessing.Rescaling(1./255, input_shape=(200, 200, 3)),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation=last_layer_activation)
    ])


def compile_model(model, predicted_attribute):
    loss_func=tf.losses.BinaryCrossentropy() if predicted_attribute == GENDER_INDEX \
              else tf.losses.SparseCategoricalCrossentropy() if predicted_attribute == RACE_INDEX \
              else tf.losses.MSE
    metric="mae" if predicted_attribute == AGE_INDEX else "accuracy"
    
    model.compile(
        optimizer="adam",
        loss=loss_func,
        metrics=[metric]
    )


def is_gender_and_race(file_path, gender_label, race_label):
    parts = tf.strings.split(file_path, os.path.sep)
    # The last part is the filename
    file_name = parts[-1]
    # Split the filename at underscores to get each label
    labels = tf.strings.split(file_name, "_")
    return int(labels[GENDER_INDEX]) == gender_label and int(labels[RACE_INDEX]) == race_label

def is_not_race(file_path, race_label):
    parts = tf.strings.split(file_path, os.path.sep)
    # The last part is the filename
    file_name = parts[-1]
    # Split the filename at underscores to get each label
    labels = tf.strings.split(file_name, "_")
    return int(labels[RACE_INDEX]) != race_label

def is_race(file_path, race_label):
    parts = tf.strings.split(file_path, os.path.sep)
    # The last part is the filename
    file_name = parts[-1]
    # Split the filename at underscores to get each label
    labels = tf.strings.split(file_name, "_")
    return int(labels[RACE_INDEX]) == race_label


def get_gender_race(file_path):
    parts = tf.strings.split(file_path, os.path.sep)
    # The last part is the filename
    file_name = parts[-1]
    # Split the filename at underscores to get each label
    labels = tf.strings.split(file_name, "_")
    return int(labels[GENDER_INDEX]), int(labels[RACE_INDEX])
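
All of the helpers above rely on the same UTKFace naming convention, where the file name starts with `age_gender_race_`. As a minimal sketch of that parsing step in plain Python (no TensorFlow), assuming the same label order as `AGE_INDEX`, `GENDER_INDEX` and `RACE_INDEX`; the name `parse_utkface_labels` is hypothetical and not part of the notebook:

```python
import os

# Order of the labels at the start of a UTKFace file name (assumed to match
# AGE_INDEX = 0, GENDER_INDEX = 1, RACE_INDEX = 2 from the notebook).
LABEL_NAMES = ("age", "gender", "race")


def parse_utkface_labels(file_path):
    """Hypothetical helper: return the age, gender and race labels as integers."""
    # Keep only the file name, e.g. "26_1_3_20170119193111002.jpg.chip.jpg"
    file_name = os.path.basename(file_path)
    # The first three underscore-separated fields are the labels
    fields = file_name.split("_")
    return {name: int(fields[i]) for i, name in enumerate(LABEL_NAMES)}


labels = parse_utkface_labels("UTKFace/26_1_3_20170119193111002.jpg.chip.jpg")
print(labels)
```

The trailing timestamp and the `.jpg.chip.jpg` suffix are never converted, so only the first three fields need to be numeric.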

In [2]:
def is_not_gender_and_race(file_path, gender_label, race_label):
    parts = tf.strings.split(file_path, os.path.sep)
    # The last part is the filename
    file_name = parts[-1]
    # Split the filename at underscores to get each label
    labels = tf.strings.split(file_name, "_")
    return int(labels[GENDER_INDEX]) != gender_label or int(labels[RACE_INDEX]) != race_label
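
The predicate above is meant to be the exact complement of `is_gender_and_race` (De Morgan: "not (A and B)" equals "not A or not B"). A small sketch that checks this on every gender/race combination, using simplified stand-in functions that take the labels directly instead of a file path (`matches` and `does_not_match` are hypothetical names):

```python
GENDER_MALE, RACE_OTHER = 0, 4


def matches(gender, race, gender_label, race_label):
    """Stand-in for is_gender_and_race, operating on already-parsed labels."""
    return gender == gender_label and race == race_label


def does_not_match(gender, race, gender_label, race_label):
    """Stand-in for is_not_gender_and_race, operating on already-parsed labels."""
    return gender != gender_label or race != race_label


# Every (gender, race) combination that can occur in the dataset
all_pairs = [(g, r) for g in (0, 1) for r in range(5)]

# The two predicates must disagree on every pair
is_complement = all(
    matches(g, r, GENDER_MALE, RACE_OTHER) != does_not_match(g, r, GENDER_MALE, RACE_OTHER)
    for g, r in all_pairs
)

# Pairs that survive the filter: everything except men of race Other
kept = [p for p in all_pairs if does_not_match(*p, GENDER_MALE, RACE_OTHER)]
print(is_complement, len(kept))
```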


In [3]:
full_ds = tf.data.Dataset.list_files(IMG_PATH + FILENAME_MATCH, shuffle=False)
image_count = tf.data.experimental.cardinality(full_ds).numpy()
full_ds = full_ds.shuffle(image_count, seed=32, reshuffle_each_iteration=False)
print("Dataset size:", tf.data.experimental.cardinality(full_ds).numpy())
# Print out a handful of items in the dataset to see what's in there
for f in full_ds.take(5):
    print(f.numpy())

Dataset size: 23705
b'UTKFace/26_1_3_20170119193111002.jpg.chip.jpg'
b'UTKFace/28_1_1_20170113012805025.jpg.chip.jpg'
b'UTKFace/16_1_0_20170109214621700.jpg.chip.jpg'
b'UTKFace/70_1_1_20170119205140215.jpg.chip.jpg'
b'UTKFace/21_0_0_20170116201127126.jpg.chip.jpg'


In [4]:
full_df = create_dataframe_from_dataset(full_ds)
print("Created dataframe with", full_df.shape[0], "rows")
full_df.head()

Created dataframe with 23705 rows


Unnamed: 0,age,gender,race
0,26,1,3
1,28,1,1
2,16,1,0
3,70,1,1
4,21,0,0


In [5]:
#split dataset to training, val and test set:
image_count = tf.data.experimental.cardinality(full_ds).numpy()

# Calculate the number of examples that should be in the test set (20% of the full dataset)
test_size = int(image_count * 0.2)

# Create a temporary dataset of everything EXCEPT the first 20%
train_val_ds = full_ds.skip(test_size)

# Create the test set by taking the first 20% of the full dataset
test_ds = full_ds.take(test_size)

# Split the train_val dataset into train (80% of the images) and validation (20%)
val_size = int(tf.data.experimental.cardinality(train_val_ds).numpy() * 0.2)
train_ds = train_val_ds.skip(val_size)
val_ds = train_val_ds.take(val_size)

In [6]:
#check sum:
# Print the length of each dataset, just to make sure all looks right
train_length = tf.data.experimental.cardinality(train_ds).numpy()
val_length = tf.data.experimental.cardinality(val_ds).numpy()
test_length = tf.data.experimental.cardinality(test_ds).numpy()

print("Train size =", train_length)
print("Validation size =", val_length)
print("Test size =", test_length)
print("Total =", train_length + val_length + test_length)

Train size = 15172
Validation size = 3792
Test size = 4741
Total = 23705
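
The sizes printed above follow directly from the `take`/`skip` arithmetic. A minimal sketch that reproduces them with plain list slicing, where slicing stands in for `Dataset.take` and `Dataset.skip` (an illustration only, not how `tf.data` works internally):

```python
image_count = 23705
indices = list(range(image_count))

# Test set: the first 20% of the shuffled data
test_size = int(image_count * 0.2)
test_idx = indices[:test_size]          # like full_ds.take(test_size)
train_val_idx = indices[test_size:]     # like full_ds.skip(test_size)

# Validation set: the first 20% of what remains; the rest is training data
val_size = int(len(train_val_idx) * 0.2)
val_idx = train_val_idx[:val_size]
train_idx = train_val_idx[val_size:]

print(len(train_idx), len(val_idx), len(test_idx))
```

Because each subset is a contiguous slice of one fixed ordering, the three sets cannot overlap. This only holds for the notebook because `full_ds` was shuffled with `reshuffle_each_iteration=False`.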


## Hypothesis verification:


Since the "Other" race group is rather small, removing all men of race Other (about 3.5% of the total population) from the training data should not affect gender classification performance as a whole, but it should have a significant negative effect on gender classification performance for men of race Other.

In [7]:
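# NOTE: assigning to the loop variable does not modify train_ds or val_ds, so this loop has no effect.
# Both datasets keep the order fixed by the seeded shuffle of full_ds above.
for each in [train_ds, val_ds]:
    image_count = tf.data.experimental.cardinality(each).numpy()
    each = each.shuffle(image_count, seed=32, reshuffle_each_iteration=False)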


no_men_of_other_race_train_ds = train_ds.filter(lambda x: is_not_gender_and_race(x, GENDER_MALE, RACE_OTHER))
no_men_of_other_race_val_ds = val_ds.filter(lambda x: is_not_gender_and_race(x, GENDER_MALE, RACE_OTHER))

no_men_of_other_race_train_val_ds = no_men_of_other_race_train_ds.concatenate(no_men_of_other_race_val_ds)

print("Training and validation data without men of other race should be around 80% of (100% - 3.6%) of the 23705 images, or around:", int(23705*(1-0.036)*0.8))
print("Actual: ", len(list(no_men_of_other_race_train_val_ds.as_numpy_iterator())))

no_men_of_other_race_train_ds_gender = no_men_of_other_race_train_ds.map(process_path_gender, num_parallel_calls=tf.data.AUTOTUNE)
no_men_of_other_race_val_ds_gender = no_men_of_other_race_val_ds.map(process_path_gender, num_parallel_calls=tf.data.AUTOTUNE)


#configure for performance:
no_men_of_other_race_train_ds_gender = configure_for_performance(no_men_of_other_race_train_ds_gender)
no_men_of_other_race_val_ds_gender = configure_for_performance(no_men_of_other_race_val_ds_gender)

Training and validation data without men of other race should be around 80% of (100% - 3.6%) of the 23705 images, or around: 18281
Actual:  18358
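
The estimate and the actual count above can be reconciled with a little arithmetic. A sketch using only numbers already printed in this notebook (the 3.6% share is the notebook's own assumption; the variable names are illustrative):

```python
total_images = 23705
train_val_fraction = 0.8      # train + validation together are 80% of the data
assumed_share = 0.036         # assumed share of men of race Other

# Estimate used in the cell above
estimate = int(total_images * (1 - assumed_share) * train_val_fraction)

# Actual numbers reported by the notebook
train_val_size = 15172 + 3792     # train size + validation size
actual_kept = 18358               # rows left after filtering

# Number of men of race Other that were removed, and their share of train + validation
removed = train_val_size - actual_kept
removed_share = removed / train_val_size

print(estimate, removed, round(removed_share * 100, 1))
```

In this particular split the removed group is slightly smaller than the assumed 3.6%, which is why the actual count is a little above the estimate.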


In [8]:
no_other_men_model = prepare_model(GENDER_INDEX)
compile_model(no_other_men_model, GENDER_INDEX)
no_other_men_model.fit(
    no_men_of_other_race_train_ds_gender, 
    validation_data=no_men_of_other_race_val_ds_gender,
    epochs=3
)

Epoch 1/3
Epoch 2/3
Epoch 3/3


<tensorflow.python.keras.callbacks.History at 0x161674820>

In [9]:
test_ds_gender = test_ds.map(process_path_gender, num_parallel_calls=tf.data.AUTOTUNE)
test_ds_gender = configure_for_performance(test_ds_gender)

In [10]:
#general performance of the model trained without men of other race:
no_other_men_test_results = no_other_men_model.evaluate(test_ds_gender)



In [11]:
print(no_other_men_test_results)

[0.30439841747283936, 0.8707023859024048]


In [12]:
#helper function to use the trained model to predict outcome of learned data; ds to df:
def get_prediction_binary(input_example, model_ = no_other_men_model):
    '''
        Converts the probability returned by the binary classifier to a class label
    '''
    predictions = (model_.predict(input_example) > 0.5).astype("int32")
    return predictions[0][0]


def get_prediction_multiclass(input_example, model_ = no_other_men_model):
    '''
        Returns the predicted class (multi-class classification)
    '''
    prediction = np.argmax(model_.predict(input_example), axis=-1)
    return prediction[0]


def get_prediction_linear(input_example, model_ = no_other_men_model):
    '''
        Returns the predicted value (linear regression)
    '''
    prediction = model_.predict(input_example)
    return prediction[0][0]


def is_correct_categorical(predicted, actual):
    '''
        Checks if the predicted label matches the actual label (for categorical classification, either
        binary or multi-class)
    '''
    return predicted == actual


def is_correct_linear(predicted, actual, tolerance=0.5):
    '''
        Checks if the predicted value is within the actual value plus or minus the
        given tolerance. Default tolerance is 0.5.
    '''
    return predicted >= actual - tolerance and predicted <= actual + tolerance


def get_detailed_results(ds, predicted_attribute, model_ = no_other_men_model, tolerance=0.5):
    '''
        Gets predictions for each image in a dataset of file paths. predicted_attribute
        is a string describing the class the model is predicting e.g. gender. Returns a dataframe 
        containing all class labels and the outcome of the prediction (correct or not).
    '''
    COLS = ["age", "gender", "race", "image_path", "is_correct"]
    rows = [] # Collect one dictionary per image, then build the dataframe in one step
    for image in ds:
        img_string = tf.io.read_file(image)
        img_data = tf.image.decode_jpeg(img_string, channels=3)
        img_expanded = np.expand_dims(img_data, axis=0)
        prediction = get_prediction_binary(img_expanded, model_) if predicted_attribute == "gender" \
                     else get_prediction_multiclass(img_expanded, model_) if predicted_attribute == "race" \
                     else get_prediction_linear(img_expanded, model_)
        row = convert_file_path_to_df_row_dictionary(image.numpy(), COLS[0:3])
        row["image_path"] = image.numpy().decode("utf8")
        row["is_correct"] = is_correct_linear(prediction, row[predicted_attribute], tolerance) if predicted_attribute == "age" \
                            else is_correct_categorical(prediction, row[predicted_attribute])
        rows.append(row)
    # DataFrame.append was removed in pandas 2.0, so the dataframe is built from the list of rows
    return pd.DataFrame(rows, columns=COLS)
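
The two correctness rules used above can be illustrated without a model. A minimal sketch, assuming a sigmoid output in [0, 1] for gender and a real-valued output for age; `to_binary_label` and `within_tolerance` are hypothetical stand-ins for `get_prediction_binary` and `is_correct_linear`:

```python
def to_binary_label(probability, threshold=0.5):
    """Map a sigmoid output to a class label: 1 (Female) if above the threshold, else 0 (Male)."""
    # Strictly greater than, matching the `> 0.5` comparison in the notebook
    return int(probability > threshold)


def within_tolerance(predicted, actual, tolerance=0.5):
    """True if the predicted value lies in [actual - tolerance, actual + tolerance]."""
    return actual - tolerance <= predicted <= actual + tolerance


print(to_binary_label(0.73), to_binary_label(0.5))
print(within_tolerance(26.4, 26), within_tolerance(26.6, 26))
```

Note that a probability of exactly 0.5 is labelled 0 under this rule.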

In [13]:
#create a dataframe to validate test and visualize performance
no_other_men_gender_results = get_detailed_results(test_ds, "gender", no_other_men_model)
display(no_other_men_gender_results.head())
# Split the results into the rows the classifier predicted correctly and the rows it got wrong
no_other_men_correct = no_other_men_gender_results[no_other_men_gender_results["is_correct"] == True]
no_other_men_incorrect = no_other_men_gender_results[no_other_men_gender_results["is_correct"] == False]
num_correct = float(no_other_men_correct.shape[0])
total = float(no_other_men_gender_results.shape[0])
print("Overall accuracy", num_correct / total)

Unnamed: 0,age,gender,race,image_path,is_correct
0,26,1,3,UTKFace/26_1_3_20170119193111002.jpg.chip.jpg,True
1,28,1,1,UTKFace/28_1_1_20170113012805025.jpg.chip.jpg,True
2,16,1,0,UTKFace/16_1_0_20170109214621700.jpg.chip.jpg,True
3,70,1,1,UTKFace/70_1_1_20170119205140215.jpg.chip.jpg,False
4,21,0,0,UTKFace/21_0_0_20170116201127126.jpg.chip.jpg,True


Overall accuracy 0.8707023834634043
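
The accuracy printed here differs from the value returned by `evaluate` (0.8707023859024048) only from the ninth decimal place on. One plausible explanation, offered as an assumption rather than a verified fact about Keras internals, is that `evaluate` reports the metric in 32-bit floating point while the dataframe calculation uses 64-bit floats. A sketch that checks whether the numbers are consistent with that:

```python
import struct

test_size = 4741
dataframe_accuracy = 0.8707023834634043   # printed by the cell above
keras_accuracy = 0.8707023859024048       # second value returned by evaluate()

# Recover the number of correct predictions from the dataframe accuracy
num_correct = round(dataframe_accuracy * test_size)
num_incorrect = test_size - num_correct


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


float32_accuracy = as_float32(num_correct / test_size)
print(num_correct, num_incorrect, float32_accuracy)
```

If the assumption holds, `float32_accuracy` should agree with `keras_accuracy` to within floating-point noise, which would mean both cells counted the same correct predictions.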


In [16]:
only_other_race = test_ds.filter(lambda x: is_race(x, RACE_OTHER))
only_other_gender_results = get_detailed_results(only_other_race, "gender", no_other_men_model)

only_other_race_correct = only_other_gender_results[only_other_gender_results["is_correct"] == True]
only_other_race_incorrect = only_other_gender_results[only_other_gender_results["is_correct"] == False]

Compared to the baseline accuracy of 0.8726, this model reaches 0.8707, a minuscule dip in performance, as expected.

## Visualize performance

In [17]:
draw_pie(no_other_men_correct, "gender", "Gender - Global correct predictions", GENDER_LABELS)
draw_pie(no_other_men_incorrect, "gender", "Gender - Global incorrect predictions", GENDER_LABELS)

draw_pie(only_other_race_correct, "gender", "Gender - Other race correct predictions", GENDER_LABELS)
draw_pie(only_other_race_incorrect, "gender", "Gender - Other race incorrect predictions", GENDER_LABELS)

We found something very interesting here. Compared to the starter-code model (the baseline, trained on the whole population), the overall result is similar and dips only slightly, as the hypothesis predicted. What we did not expect is that, although men of race Other make up only about 3.5% of the population (within the roughly 50% who are men), removing them had a significant negative effect on how men are classified in the global test set. In the baseline, men accounted for 36.8% of the incorrect predictions (222 men misclassified), whereas with this model they account for 58.1% (356 men misclassified).
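
The percentages quoted above can be cross-checked against the reported accuracies. A sketch, assuming the baseline was evaluated on the same 4741-image test set (the baseline run is not shown in this notebook, so that is an assumption):

```python
test_size = 4741


def incorrect_count(accuracy):
    """Number of misclassified test images implied by an overall accuracy."""
    return test_size - round(accuracy * test_size)


# Misclassified images implied by the two reported accuracies
baseline_incorrect = incorrect_count(0.8726)   # baseline model (assumed same test set)
this_incorrect = incorrect_count(0.8707)       # model trained without men of race Other

# Share of the incorrect predictions that are men, in percent
baseline_male_share = round(100 * 222 / baseline_incorrect, 1)
this_male_share = round(100 * 356 / this_incorrect, 1)

print(baseline_incorrect, baseline_male_share)
print(this_incorrect, this_male_share)
```

The total number of errors barely changes between the two models; what changes is how those errors are distributed between men and women.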

What we did expect is that the model is clearly worse at telling men and women of race Other apart. Men make up a smaller share of its correct predictions for this group, and men of race Other are also misclassified more often.
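
The pie charts show how errors are split by gender, but a per-group accuracy makes the comparison more direct. A minimal sketch of such a breakdown over rows shaped like the output of `get_detailed_results`; the function name `accuracy_by_group` is hypothetical and the rows below are made-up illustration data, not results from this notebook:

```python
from collections import defaultdict


def accuracy_by_group(rows, keys=("gender", "race")):
    """Return {group: fraction of correct predictions} for each combination of keys."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        totals[group] += 1
        correct[group] += bool(row["is_correct"])
    return {group: correct[group] / totals[group] for group in totals}


# Illustration data only: four invented rows for race Other (label 4)
toy_rows = [
    {"gender": 0, "race": 4, "is_correct": True},
    {"gender": 0, "race": 4, "is_correct": False},
    {"gender": 1, "race": 4, "is_correct": True},
    {"gender": 1, "race": 4, "is_correct": True},
]

per_group = accuracy_by_group(toy_rows)
print(per_group)
```

Applied to the real results, the rows could come from something like `only_other_gender_results.to_dict("records")`.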

Thus, we can conclude that the suspicion in our hypothesis was correct.