# Bias Buccaneers Image Recognition Challenge: Quickstart

This notebook will introduce you to the data and describe a workflow to train and evaluate a baseline model on it.

## Initial Setup

We start by installing and loading the required packages.

In [10]:
!pip install tensorflow nb-black


In [11]:
%load_ext lab_black

In [44]:
import numpy as np
import pandas as pd
import io
import boto3
import json
import os

from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

In [24]:
import tensorflow as tf
import tensorflow.keras as K
from tensorflow.keras.preprocessing import image
from tensorflow.keras.applications.resnet50 import ResNet50

## Load the Data

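Make sure to download and uncompress the data (`data_bb1_img_recognition.zip`) into the folder you're working in. The cells below read the training data from an S3 bucket; change `bucket` and `LOADPATH` to point at your own copy, or read the files from local disk instead.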

We first load the file containing the labels, keep only the labeled rows, one-hot encode the labels for each of the three attributes (skin tone, gender, age) as numpy arrays, and store the three arrays in a list.

In [31]:
# Define data location
bucket = "gmml-test"
LOADPATH = "00_raw_bb1/train/"
data_key = LOADPATH + "labels.csv"

In [32]:
# Load data
s3 = boto3.client("s3")
obj = s3.get_object(Bucket=bucket, Key=data_key)
df = pd.read_csv(obj["Body"])
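If you downloaded the zip file instead of using S3, the labels can be read from disk. A minimal sketch, assuming the archive unpacks into a `train/` folder containing `labels.csv` (mirroring `LOADPATH`); it also applies the same `notna` filter used below. The helper name `load_labels` and the demo rows are hypothetical.

```python
import os
import tempfile

import pandas as pd


def load_labels(data_dir):
    """Read labels.csv from a local folder and keep only rows with a skin_tone label."""
    df = pd.read_csv(os.path.join(data_dir, "labels.csv"))
    return df[df["skin_tone"].notna()]


# Demo on a throwaway folder with made-up rows; in practice pass e.g. "./train/".
with tempfile.TemporaryDirectory() as data_dir:
    pd.DataFrame(
        {
            "name": ["img_0.png", "img_1.png", "img_2.png"],
            "skin_tone": [3.0, None, 7.0],  # None is written as an empty cell, read back as NaN
            "gender": ["male", "female", "female"],
            "age": ["18_30", "31_60", "0_17"],
        }
    ).to_csv(os.path.join(data_dir, "labels.csv"), index=False)
    df_local = load_labels(data_dir)

print(df_local["name"].tolist())  # ['img_0.png', 'img_2.png']
```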

In [33]:
# Extract data
df_labeled = df[df["skin_tone"].notna()]  # take only labeled data

# Converting labels to np array
cat = ["skin_tone", "gender", "age"]
lbs = [LabelBinarizer() for i in range(3)]
Y = []
for i in range(3):
    lab = lbs[i].fit_transform(df_labeled[cat[i]])
    if lab.shape[1] == 1:
        Y.append(np.hstack((1 - lab, lab)))
    else:
        Y.append(lab)
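The binary case needs special handling because `LabelBinarizer` returns a single 0/1 column when there are only two classes, but one column per class otherwise. The sketch below, using made-up label values, shows that stacking `1 - lab` in front of `lab` yields two columns whose order matches `classes_`, so an `argmax` over the columns later maps back to the right class name. The helper `one_hot` is hypothetical.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer


def one_hot(labels):
    """One-hot encode labels, always returning one column per class."""
    lb = LabelBinarizer()
    lab = lb.fit_transform(labels)
    if lab.shape[1] == 1:
        # binary case: a single column that is 1 for classes_[1];
        # prepend its complement so column 0 corresponds to classes_[0]
        lab = np.hstack((1 - lab, lab))
    return lb, lab


lb_gender, y_gender = one_hot(["female", "male", "male"])
lb_age, y_age = one_hot(["0_17", "18_30", "31_60", "18_30"])
print(lb_gender.classes_, y_gender.tolist())  # ['female' 'male'] [[1, 0], [0, 1], [0, 1]]
print(lb_age.classes_, y_age.shape)  # ['0_17' '18_30' '31_60'] (4, 3)
```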

We then load the training images and convert them to numpy arrays. This may take a while.

In [50]:
# loading and converting data into np array
print("Loading images")
s3 = boto3.resource("s3")
length = width = 64  # size for each input image, increase if you want
nn = df_labeled.shape[0]
all_imgs = [
    image.load_img(
        io.BytesIO(
            s3.Object(bucket, LOADPATH + df_labeled.iloc[i]["name"])
            .get()["Body"]
            .read()
        ),
        target_size=(length, width),
    )
    for i in range(nn)
]

print("Converting images to np array")
X = np.empty([nn, length, width, 3], dtype=float)
for i in range(nn):
    X[i, :] = image.img_to_array(all_imgs[i])
X = K.applications.resnet50.preprocess_input(X)
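The last line applies ResNet50's input preprocessing, which the test images must go through as well. As we understand the Keras implementation, `resnet50.preprocess_input` uses Caffe-style preprocessing: it flips the channels from RGB to BGR and subtracts the per-channel ImageNet mean, without rescaling to [0, 1]. The NumPy sketch below reproduces that for illustration only; keep calling the Keras function in the real pipeline. The name `caffe_style_preprocess` is ours.

```python
import numpy as np

# ImageNet per-channel means in BGR order (assumed to match Keras' "caffe" mode)
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68])


def caffe_style_preprocess(x):
    """Approximate resnet50.preprocess_input: RGB -> BGR, then subtract channel means."""
    x = np.asarray(x, dtype=float)
    bgr = x[..., ::-1]  # reverse the last (channel) axis
    return bgr - IMAGENET_BGR_MEAN


batch = np.zeros((2, 64, 64, 3))
batch[..., 0] = 255.0  # pure red in RGB
out = caffe_style_preprocess(batch)
print(out[0, 0, 0])  # approximately [-103.939 -116.779  131.32]
```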

Loading images
Converting images to np array


## Specify the Model

We define a single model class that can train on the data in `X` and `Y` and predict outcomes for all three attributes (skin tone, gender, age). Internally it holds one Keras model per attribute.

In [53]:
class PredictionModel:
    def __init__(self, X, Y, idx):
        self.X = X
        self.Y = Y
        self.idx = idx
        self.trainX, self.testX = X[idx[0], :], X[idx[1], :]
        self.trainY, self.testY = [Y[i][idx[0], :] for i in range(3)], [
            Y[i][idx[1], :] for i in range(3)
        ]
        self.cat = ["skin_tone", "gender", "age"]
        self.loss = ["categorical_crossentropy" for i in range(3)]
        self.metrics = [["accuracy"] for i in range(3)]
        self.models = [None] * 3

    # train a model for the attribute at position `index` in self.cat
    def fit(
        self,
        index,
        model,
        epochs=5,
        batch_size=32,
        save=False,
        save_location=None,
        verbose=1,
    ):

        if verbose:
            print("Training model for " + self.cat[index])
        model.add(K.layers.Dense(self.trainY[index].shape[1], activation="softmax"))
        model.compile(
            loss=self.loss[index], optimizer="Adam", metrics=self.metrics[index]
        )
        model.fit(
            self.trainX,
            self.trainY[index],
            validation_data=(self.testX, self.testY[index]),
            batch_size=batch_size,
            epochs=epochs,
            verbose=verbose,
        )
        if save:
            if not os.path.exists(save_location):
                print("save location " + save_location + " did not exist. creating")
                os.makedirs(save_location)
            SAVE_LOCATION = os.path.join(save_location, "model_" + self.cat[index] + ".h5")
            print("saving model at " + SAVE_LOCATION)
            model.save(SAVE_LOCATION)
        self.models[index] = model

    def predict(self, newX):
        predictions = [model.predict(newX) for model in self.models]
        return predictions
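Each per-attribute model ends in a softmax layer trained with categorical cross-entropy against the one-hot rows in `Y`. As an illustration in plain NumPy (not the Keras implementation), the loss for one example is the negative log of the probability the model assigns to the true class:

```python
import numpy as np


def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    z = np.exp(logits - np.max(logits))
    return z / z.sum()


def categorical_crossentropy(y_onehot, probs, eps=1e-7):
    # clip to avoid log(0); only the true class's probability contributes
    probs = np.clip(probs, eps, 1.0)
    return -np.sum(y_onehot * np.log(probs))


probs = softmax(np.array([2.0, 1.0, 0.0]))
loss = categorical_crossentropy(np.array([1.0, 0.0, 0.0]), probs)
print(round(loss, 4))  # 0.4076
```

Confident, correct predictions push the true-class probability toward 1 and the loss toward 0.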

## Initialize and Train a Model

We now train a `PredictionModel` to predict the likely skin tone, gender, and age of an input image, training one network per attribute. This baseline model is initialized with ImageNet weights and uses the ResNet50 architecture. We strongly recommend using a GPU to reduce training time.

In [55]:
# function to initialize a model
def initializeModel():
    res_model = ResNet50(
        include_top=False,
        weights="imagenet",
        input_tensor=K.Input(shape=[length, width, 3]),
    )

    # freeze the first 143 ResNet50 layers; the later layers remain trainable
    for layer in res_model.layers[:143]:
        layer.trainable = False
    model = K.models.Sequential()
    model.add(res_model)
    model.add(K.layers.Flatten())
    model.add(K.layers.BatchNormalization())
    model.add(K.layers.Dense(256, activation="relu"))
    model.add(K.layers.Dropout(0.5))
    model.add(K.layers.BatchNormalization())
    model.add(K.layers.Dense(128, activation="relu"))
    model.add(K.layers.Dropout(0.5))
    model.add(K.layers.BatchNormalization())
    model.add(K.layers.Dense(64, activation="relu"))
    model.add(K.layers.Dropout(0.5))
    model.add(K.layers.BatchNormalization())
    return model

In [None]:
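SAVEPATH = "./models/"  # example output folder for the saved models; change as needed
nntrain = int(0.7 * nn)  # 70% train / 30% validation split
np.random.seed(42)
indices = np.random.permutation(nn)
train_idx, test_idx = indices[:nntrain], indices[nntrain:]
mymodel = PredictionModel(X=X, Y=Y, idx=[train_idx, test_idx])

# train one model per attribute (skin tone, gender, age)
for i in range(3):
    mymodel.fit(
        index=i, model=initializeModel(), epochs=5, save=True, save_location=SAVEPATH
    )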
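`train_test_split` is imported above but not used. If you want the 70/30 split to preserve group proportions (useful when some skin tones are rare), one option is a stratified split over row indices. A sketch with made-up labels standing in for `df_labeled["skin_tone"]`; the variable names are hypothetical and chosen so they don't overwrite `nn`, `train_idx`, or `test_idx`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# made-up group labels: 50 / 30 / 20 rows
skin_tone = np.array(["1"] * 50 + ["5"] * 30 + ["10"] * 20)
n_rows = len(skin_tone)
n_train = int(0.7 * n_rows)

strat_train_idx, strat_test_idx = train_test_split(
    np.arange(n_rows), train_size=n_train, random_state=42, stratify=skin_tone
)

# each group keeps its share in the validation split: 15 / 9 / 6 rows
for tone in ["1", "5", "10"]:
    print(tone, np.sum(skin_tone[strat_test_idx] == tone))
```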

## Evaluate the Model

We now evaluate the model on the test data. To do this, let's first load that data and preprocess it the same way as the training data.

In [None]:
# load labels data
TESTPATH = "./test/"
df_test = pd.read_csv(TESTPATH + "labels.csv")

# Convert labels to np array, reusing the binarizers fitted on the training labels
# so that the column order matches lbs[i].classes_
print("Converting test labels to np array")
testY = []
for i in range(3):
    lab = lbs[i].transform(df_test[cat[i]])
    if lab.shape[1] == 1:
        testY.append(np.hstack((1 - lab, lab)))
    else:
        testY.append(lab)

# load and convert images into np array
print("Loading test images")
nt = df_test.shape[0]
all_imgs = [
    image.load_img(TESTPATH + df_test.iloc[i]["name"], target_size=(length, width))
    for i in range(nt)
]

print("Converting test images to np array")
testX = np.empty([nt, length, width, 3], dtype=float)
for i in range(nt):
    testX[i, :] = image.img_to_array(all_imgs[i])
testX = K.applications.resnet50.preprocess_input(testX)
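Why `transform` rather than `fit_transform` for the test labels: refitting rebuilds the class list from whichever labels happen to appear in the test set, so the columns (and `classes_`, which is used to decode predictions) may no longer match the training encoding. A small illustration with made-up age labels:

```python
from sklearn.preprocessing import LabelBinarizer

train_age = ["0_17", "18_30", "31_60", "60+", "18_30"]
test_age = ["18_30", "31_60"]  # only two of the four training classes appear

lb = LabelBinarizer().fit(train_age)
kept = lb.transform(test_age)
print(kept.shape)  # (2, 4): columns still follow lb.classes_

refit = LabelBinarizer().fit(test_age)
collapsed = refit.transform(test_age)
print(collapsed.shape)  # (2, 1): refitting collapses to the binary case
```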

We then obtain predicted labels for skin tone, gender, and age as a list of lists.

In [34]:
pred = mymodel.predict(testX)
predY = [[np.argmax(pred[i][j,:]) for j in range(nt)] for i in range(3)]
predLabels = [[lbs[i].classes_[j] for j in predY[i]] for i in range(3)]
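The list comprehensions above take the index of the highest probability in each row and look up the corresponding class name. The same mapping can be written with vectorized NumPy indexing; the probabilities and class names below are made up (in the notebook they would come from `pred[i]` and `lbs[i].classes_`):

```python
import numpy as np

classes = np.array(["female", "male"])  # stands in for lbs[i].classes_
probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.4, 0.6]])  # stands in for pred[i]

# argmax over the class axis, then index the class names with the result
pred_labels = classes[np.argmax(probs, axis=1)]
print(pred_labels.tolist())  # ['female', 'male', 'male']
```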

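Finally, we calculate the accuracy and disparity for each attribute. Disparity is the difference between the highest and lowest per-class accuracy (the diagonal of the row-normalized confusion matrix), so a value of 0 means the model is equally accurate for every group.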

In [35]:
# calculate accuracy
acc = {}
for i in range(3):
    acc[cat[i]] = accuracy_score(df_test[cat[i]], predLabels[i])


# calculate disparity: gap between the best and worst per-class accuracy
def disparity_score(ytrue, ypred):
    cm = confusion_matrix(ytrue, ypred)
    cm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
    all_acc = list(cm.diagonal())
    return max(all_acc) - min(all_acc)


disp = {}
for i in range(3):
    disp[cat[i]] = disparity_score(df_test[cat[i]], predLabels[i])

results = {"accuracy": acc, "disparity": disp}
results

{'accuracy': {'skin_tone': 0.25766666666666665,
  'gender': 0.7966666666666666,
  'age': 0.5783333333333334},
 'disparity': {'skin_tone': 0.5514223194748359,
  'gender': 0.1493182689960596,
  'age': 0.7802908824936}}
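To make the disparity metric concrete, here is a self-contained copy of `disparity_score` applied to a toy example with three made-up groups. Group `a` is always predicted correctly and groups `b` and `c` are right half the time, so overall accuracy is 0.7 and disparity is 1.0 - 0.5 = 0.5.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def disparity_score(ytrue, ypred):
    # row-normalize so each diagonal entry is the accuracy within one true class
    cm = confusion_matrix(ytrue, ypred)
    cm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis]
    all_acc = list(cm.diagonal())
    return max(all_acc) - min(all_acc)


ytrue = ["a", "a", "a", "a", "b", "b", "b", "b", "c", "c"]
ypred = ["a", "a", "a", "a", "b", "b", "a", "a", "c", "b"]

acc_toy = accuracy_score(ytrue, ypred)
disp_toy = disparity_score(ytrue, ypred)
print(acc_toy)  # 0.7
print(disp_toy)  # 0.5
```

A model can therefore have decent overall accuracy while still being much less accurate for some groups, which is exactly what the disparity term penalizes.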

## Score Model and Prepare Submission

Based on the metrics above, we now calculate the score used to evaluate your submission. This score will be displayed on the public leaderboard.
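Written out, `getScore` combines the accuracy and disparity of each attribute as follows:

$$
\text{score} = 2\,\mathrm{acc}_{\text{gender}}\left(1 - \mathrm{disp}_{\text{gender}}\right) + 4\,\mathrm{acc}_{\text{age}}\left(1 - \mathrm{disp}_{\text{age}}^{2}\right) + 10\,\mathrm{acc}_{\text{skin}}\left(1 - \mathrm{disp}_{\text{skin}}^{5}\right)
$$

Skin tone accuracy carries the largest weight. Since disparity lies in $[0, 1]$, the higher exponents on the age and skin tone terms make their disparity penalties smaller for a given disparity value. Plugging in the baseline metrics from the evaluation output gives approximately $1.355 + 0.905 + 2.445 \approx 4.706$.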

In [89]:
def getScore(results):
    acc = results["accuracy"]
    disp = results["disparity"]
    ad = (
        2 * acc["gender"] * (1 - disp["gender"])
        + 4 * acc["age"] * (1 - disp["age"] ** 2)
        + 10 * acc["skin_tone"] * (1 - disp["skin_tone"] ** 5)
    )
    return ad


title = "8-Bit Bias Bounty Baseline"

submission = {
    "submission_name": title,
    "score": getScore(results),
    "metrics": results,
}
submission

{'submission_name': '8-Bit Bias Bounty Baseline',
 'score': 4.705572543130653,
 'metrics': {'accuracy': {'skin_tone': 0.25766666666666665,
   'gender': 0.7966666666666666,
   'age': 0.5783333333333334},
  'disparity': {'skin_tone': 0.5514223194748359,
   'gender': 0.1493182689960596,
   'age': 0.7802908824936}}}

Finally, let's export this as a JSON file to upload as part of filling out your [submission form](https://docs.google.com/forms/d/e/1FAIpQLSfwqtVkJBVRP6TnFp7vHbbH8SlwKZJFIjvGQy7TyYFc8HR1hw/viewform).

In [6]:
with open("baseline_score.json", "w") as f:
    json.dump(submission, f, indent=4)
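Optionally, read the file back before uploading to confirm that it parses and contains the expected fields. A minimal check, written against a temporary file and a placeholder dictionary (`example_submission`, hypothetical values) so it runs on its own without overwriting `submission`:

```python
import json
import os
import tempfile

# placeholder with the same structure as `submission` above
example_submission = {
    "submission_name": "example",
    "score": 1.0,
    "metrics": {
        "accuracy": {"skin_tone": 0.5, "gender": 0.5, "age": 0.5},
        "disparity": {"skin_tone": 0.1, "gender": 0.1, "age": 0.1},
    },
}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "baseline_score.json")
    with open(path, "w") as f:
        json.dump(example_submission, f, indent=4)
    with open(path) as f:
        loaded = json.load(f)

# check that the top-level and per-metric keys survived the round trip
attributes = {"skin_tone", "gender", "age"}
assert set(loaded) == {"submission_name", "score", "metrics"}
assert set(loaded["metrics"]["accuracy"]) == attributes
assert set(loaded["metrics"]["disparity"]) == attributes
print("submission file looks well-formed")
```

To check the real file, replace `path` with `"baseline_score.json"` and skip the write step.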