---
# FIRST NOTES
- Sample weights for each group of users with more than X members (create all possible groups)
- Try using the Inception-ResNet backbone
- Final layer changes
- Evaluate on validation

- Add sample weights to each experiment training\
Exp 1: Resnet weights imagenet + regression layer - 1 training step\
Exp 2: InceptionResnet weights imagenet + regression layer - 1 training step\
Exp 3: Take the best backbone and add some layers at top - 1 training step\
Exp 4: Repeat 3 using two training steps (freeze backbone + full)\
Exp 5: Repeat 3 using two training steps (full + freeze backbone)\
Exp 6: Repeat 3 using three training steps (freeze backbone + freeze top + full)\
Exp 7: Based on the results get the best and try to set a bias initializer\
Exp 8: Change optimizer, loss, learning rate, batch size\
Exp 9: Change the sample weights, use fewer groups (e.g. exclude gender, which is already balanced)
---

# **The Problem: Automatic Apparent Age Estimation**

# Pre-requisites
Install `tensorflow-gpu` (the GPU build of TensorFlow), OpenCV and `h5py`.
Check GPU usage instructions [here](https://research.google.com/colaboratory/faq.html#gpu-availability)

In [1]:
!pip install tensorflow-gpu==2.4.0
!pip install opencv-python
!pip install h5py


---
# Downloading and decompressing the Appa-Real Age Dataset [(source)](http://chalearnlap.cvc.uab.es/challenge/13/track/13/description/)
- By default, RGB images (cropped faces) are in the range [0, 255], and labels range from ~0.9 to ~90 (years old).
- The data is divided into train, validation and test sets.
- Metadata is also provided:
  - gender: male / female
  - ethnicity: asian / afroamerican / caucasian
  - facial expression: neutral / slightlyhappy / happy / other
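Before training, it can help to see how balanced each metadata category is. The sketch below assumes, as the bias functions further down do, that column 0 of `M_*` holds gender, column 1 ethnicity and column 2 facial expression; `category_counts` is a hypothetical helper and the toy rows are made up, not taken from the dataset.

```python
import numpy as np

# Assumed column order (the bias functions below index metadata[i][0], [1], [2])
COLUMNS = ['gender', 'ethnicity', 'facial expression']

def category_counts(metadata):
    """Return {column name: {category: count}} for an (N, 3) array of strings."""
    counts = {}
    for col, name in enumerate(COLUMNS):
        values, freq = np.unique(metadata[:, col], return_counts=True)
        counts[name] = dict(zip(values.tolist(), freq.tolist()))
    return counts

# Hypothetical toy rows, for illustration only (not taken from the dataset)
toy_metadata = np.array([
    ['female', 'caucasian', 'happy'],
    ['male', 'asian', 'neutral'],
    ['female', 'afroamerican', 'slightlyhappy'],
    ['male', 'caucasian', 'other'],
])
print(category_counts(toy_metadata))
```

On the real data the same call would be `category_counts(M_train)`.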

In [2]:
from zipfile import ZipFile

In [3]:
# downloading the data
!wget https://data.chalearnlap.cvc.uab.cat/Colab_2021/app_data.zip

with ZipFile('app_data.zip','r') as zipp:
   zipp.extractall()
   print('Data decompressed successfully')

# removing the .zip file after extraction to clean space
!rm app_data.zip

--2022-03-26 11:16:29--  https://data.chalearnlap.cvc.uab.cat/Colab_2021/app_data.zip
Resolving data.chalearnlap.cvc.uab.cat (data.chalearnlap.cvc.uab.cat)... 158.109.8.102
Connecting to data.chalearnlap.cvc.uab.cat (data.chalearnlap.cvc.uab.cat)|158.109.8.102|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 799021037 (762M) [application/zip]
Saving to: ‘app_data.zip’


2022-03-26 11:17:38 (11.4 MB/s) - ‘app_data.zip’ saved [799021037/799021037]

Data decompressed successfully


# Loading the train/validation/test data, and re-scaling the train/validation labels to [0..1]
- X_[train,valid,test] = Face images
- Y_[train,valid,test] = Ground truth (apparent age)
- M_[train,valid,test] = Metadata (gender, ethnicity, facial expression)

In [2]:
import numpy as np

# loading the train data
X_train = np.load('./data/data_train.npy')
Y_train = np.load('./data/labels_train.npy')
M_train = np.load('./data/meta_data_train.npy')

# loading the validation data
X_valid = np.load('./data/data_valid.npy')
Y_valid = np.load('./data/labels_valid.npy')
M_valid = np.load('./data/meta_data_valid.npy')

# loading the test data
X_test = np.load('./data/data_test.npy')
Y_test = np.load('./data/labels_test.npy')
M_test = np.load('./data/meta_data_test.npy')

# train labels are real numbers, ranging from ~0.9 to ~89 (years old);
# we will re-scale the labels to [0,1] by using a normalization factor of 100,
# assuming there is no sample with age > 100.
Y_train = Y_train/100
Y_valid = Y_valid/100
# Y_test = Y_test/100 # -> we don't normalize the test labels as we will evaluate 
                      # them using the raw data, i.e., the apparent age values

print('Train data size and shape', X_train.shape)
print('Train labels size and shape', Y_train.shape)
print('Train metadata size and shape', M_train.shape)
print('----')
print('Valid data size and shape', X_valid.shape)
print('Valid labels size and shape', Y_valid.shape)
print('Valid metadata size and shape', M_valid.shape)
print('----')
print('Test data size and shape', X_test.shape)
print('Test labels size and shape', Y_test.shape)
print('Test metadata size and shape', M_test.shape)

Train data size and shape (4065, 224, 224, 3)
Train labels size and shape (4065,)
Train metadata size and shape (4065, 3)
----
Valid data size and shape (1482, 224, 224, 3)
Valid labels size and shape (1482,)
Valid metadata size and shape (1482, 3)
----
Test data size and shape (1978, 224, 224, 3)
Test labels size and shape (1978,)
Test metadata size and shape (1978, 3)


---
# Download a pretrained model

In [3]:
import tensorflow as tf
from tensorflow.keras.models import Model, load_model
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.optimizers import Adam



## Download ResNet50 model pre-trained on faces

In [4]:
from zipfile import ZipFile

# downloading the pre-trained model
!wget https://data.chalearnlap.cvc.uab.cat/Colab_2021/model.zip

# decompressing the model
with ZipFile('model.zip','r') as zipp:
    zipp.extractall()
    print('Model decompressed successfully')

# removing the .zip file after extraction to clean space
!rm model.zip

--2022-03-26 18:36:59--  https://data.chalearnlap.cvc.uab.cat/Colab_2021/model.zip
Resolving data.chalearnlap.cvc.uab.cat (data.chalearnlap.cvc.uab.cat)... 158.109.8.102
Connecting to data.chalearnlap.cvc.uab.cat (data.chalearnlap.cvc.uab.cat)|158.109.8.102|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 107893665 (103M) [application/zip]
Saving to: ‘model.zip’


## Load the pre-trained ResNet

In [6]:
from tensorflow.keras.applications.resnet50 import preprocess_input

MODEL_NAME = 'resnet'

# loading the pretrained model
model = tf.keras.models.load_model('./model/weights.h5')

# print the model summary
model.summary()

Model: "model_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
base_input (InputLayer)         [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
conv1/7x7_s2 (Conv2D)           (None, 112, 112, 64) 9408        base_input[0][0]                 
__________________________________________________________________________________________________
conv1/7x7_s2/bn (BatchNormaliza (None, 112, 112, 64) 256         conv1/7x7_s2[0][0]               
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 112, 112, 64) 0           conv1/7x7_s2/bn[0][0]            
____________________________________________________________________________________________

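## Alternatively, load InceptionResNetV2 from Keras, pre-trained on ImageNet
Run either this cell or the two ResNet cells above, not both: each one sets `model`, `MODEL_NAME` and `preprocess_input`.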

In [3]:
from tensorflow.keras.applications.inception_resnet_v2 import InceptionResNetV2
from tensorflow.keras.applications.inception_resnet_v2 import preprocess_input

MODEL_NAME = 'inception_resnet'

# loading the pretrained model
model = InceptionResNetV2(weights='imagenet', include_top=True)

# print the model summary
model.summary()


Model: "inception_resnet_v2"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 299, 299, 3) 0                                            
__________________________________________________________________________________________________
conv2d (Conv2D)                 (None, 149, 149, 32) 864         input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 149, 149, 32) 96          conv2d[0][0]                     
__________________________________________________________________________________________________
activation (Activation)         (None, 149, 149, 32) 0           batch_normalization[0][0]        
________________________________________________________________________________

# Adapting the model to our needs
- In summary, we drop the backbone's last (classification) layer (`classifier_low_dim` for the face ResNet50, `predictions` for InceptionResNetV2) and add a few other layers on top of the backbone. Here we also define the activation function of the last FC layer: a sigmoid, since the labels were re-scaled to [0, 1].

In [7]:
# Number of layers in the backbone, excluding the classification layer we drop
BACKBONE_NUM_LAYERS = len(model.layers)-1

In [8]:
# Output of the last layer before the classification one
last_layer = model.layers[-2].output

# adding a dropout layer to reduce overfitting
dp_layer = Dropout(0.5)(last_layer)

# adding a few hidden FC layers to learn hidden representations
fc_128 = Dense(128, activation='relu', name='f_128')(dp_layer)
fc_32 = Dense(32, activation='relu', name='f_32')(fc_128)

# single sigmoid unit: predicts the age re-scaled to [0, 1]
output = Dense(1, activation='sigmoid', name='predict')(fc_32)

# building and printing the final model
model = Model(inputs=model.input, outputs=output)
model.summary()

Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
base_input (InputLayer)         [(None, 224, 224, 3) 0                                            
__________________________________________________________________________________________________
conv1/7x7_s2 (Conv2D)           (None, 112, 112, 64) 9408        base_input[0][0]                 
__________________________________________________________________________________________________
conv1/7x7_s2/bn (BatchNormaliza (None, 112, 112, 64) 256         conv1/7x7_s2[0][0]               
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 112, 112, 64) 0           conv1/7x7_s2/bn[0][0]            
______________________________________________________________________________________________

---
# Define a function to freeze part of the network

In [9]:
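def freezing_function(model, freeze):
    """Set layer.trainable according to `freeze`.

    freeze='backbone': freeze the first BACKBONE_NUM_LAYERS layers, train the head.
    freeze='head': freeze the added head layers, train the backbone.
    freeze=None: train every layer.
    The model must be (re)compiled afterwards for the change to take effect.
    """
    for idx, layer in enumerate(model.layers):
        is_backbone = idx < BACKBONE_NUM_LAYERS
        if freeze == 'backbone':
            layer.trainable = not is_backbone
        elif freeze == 'head':
            layer.trainable = is_backbone
        else:
            layer.trainable = True
    return model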
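The freezing rule can be checked without building a network. The hypothetical helper `trainable_mask` below mirrors the logic of `freezing_function` and returns the `trainable` flag each layer index would get, assuming the first `backbone_num_layers` layers belong to the backbone.

```python
def trainable_mask(num_layers, backbone_num_layers, freeze):
    """Hypothetical helper: the trainable flag freezing_function gives each layer."""
    mask = []
    for idx in range(num_layers):
        is_backbone = idx < backbone_num_layers
        if freeze == 'backbone':
            mask.append(not is_backbone)  # only the added head is trained
        elif freeze == 'head':
            mask.append(is_backbone)      # only the backbone is trained
        else:
            mask.append(True)             # None: the whole network is trained
    return mask

# 5 backbone layers followed by a 4-layer head (Dropout + three Dense layers)
for freeze in ['backbone', 'head', None]:
    print(freeze, trainable_mask(9, 5, freeze))
```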

---
# Given the class weights, define a function that returns the sample weights

In [10]:
def compute_sample_weights(class_weights, metadata_train):
    sample_weights = []

    for i in range(0, len(metadata_train)):
        sample_weights.append(class_weights['_'.join(metadata_train[i])])

    return np.array(sample_weights)

---
# Training function

In [11]:
import os
import pickle

from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

def training(model, X_train, y_train, metadata_train, X_valid, y_valid, config):
    # checkpoints and training histories are written to temp/, so make sure it exists
    os.makedirs('temp', exist_ok=True)

    # the sample weights only depend on the metadata, so compute them once
    sample_weights = compute_sample_weights(config['class_weights'], metadata_train)

    # Train the model (once per training step)
    for train_step in range(config['num_training_steps']):
        # Apply the freezing function for this step
        model = freezing_function(model, config['freeze'][train_step])

        # defining the early stop criteria; if it triggers, roll back to the
        # weights of the best epoch instead of keeping the last epoch's weights
        es = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=5,
                           restore_best_weights=True)
        # saving the best model of this step based on val_loss
        mc = ModelCheckpoint('temp/best_model_'+str(train_step)+'.h5', monitor='val_loss', mode='min', save_best_only=True)

        # compiling with this step's optimizer, loss and metrics
        # (also needed so that the new trainable flags take effect)
        model.compile(
            optimizer=config['optimizer'][train_step],
            loss=config['loss'][train_step],
            metrics=config['metrics'][train_step]
        )

        # training the model
        history = model.fit(
            X_train, y_train,
            validation_data=(X_valid, y_valid),
            batch_size=config['batch_size'][train_step], epochs=50,
            shuffle=True, verbose=1, callbacks=[es, mc],
            sample_weight=sample_weights,
        )

        # saving training history (for future visualization)
        with open('temp/train_history_'+str(train_step)+'.pkl', 'wb') as handle:
            pickle.dump(history.history, handle, protocol=pickle.HIGHEST_PROTOCOL)

    return model

---
# Bias analysis functions

# Age Bias ($B_a$) 

- Evaluates (on the test set) how accurate the model is with respect to different age ranges.
  - group 1: age < 20
  - group 2: 20 <= age < 40
  - group 3: 40 <= age < 60
  - group 4: 60 <= age
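Written out, the age bias computed below is the mean absolute difference between the per-group MAEs over all $\binom{4}{2} = 6$ pairs of age groups, where $\mathrm{MAE}_k$ is the mean absolute error (in years) on group $k$:

$$
B_a = \frac{1}{6} \sum_{1 \le j < k \le 4} \left| \mathrm{MAE}_j - \mathrm{MAE}_k \right|
$$

The gender, ethnicity and facial expression biases below follow the same pattern with 2, 3 and 4 groups (1, 3 and 6 pairs, respectively).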

In [12]:
def age_bias(predictions, gt):
  error_g1 = []
  error_g2 = []
  error_g3 = []
  error_g4 = []
  for i in range(0,len(gt)):
    if(gt[i]<20):
      error_g1.append(abs(predictions[i]-gt[i]))
    if(gt[i]>=20 and gt[i]<40):
      error_g2.append(abs(predictions[i]-gt[i]))
    if(gt[i]>=40 and gt[i]<60):
      error_g3.append(abs(predictions[i]-gt[i]))
    if(gt[i]>=60):
      error_g4.append(abs(predictions[i]-gt[i]))

  print('=============================')
  print('Age analysis:')
  print('Size group 1 = %d, MAE = %f' %(len(error_g1), np.mean(error_g1)))
  print('Size group 2 = %d, MAE = %f' %(len(error_g2), np.mean(error_g2)))
  print('Size group 3 = %d, MAE = %f' %(len(error_g3), np.mean(error_g3)))
  print('Size group 4 = %d, MAE = %f' %(len(error_g4), np.mean(error_g4)))

  age_bias = (abs(np.mean(error_g1)-np.mean(error_g2)) +
            abs(np.mean(error_g1)-np.mean(error_g3)) +
            abs(np.mean(error_g1)-np.mean(error_g4)) +
            abs(np.mean(error_g2)-np.mean(error_g3)) +
            abs(np.mean(error_g2)-np.mean(error_g4)) +
            abs(np.mean(error_g3)-np.mean(error_g4)))/6

  print('---------')
  print('Age bias (Ba) = ', age_bias)

# Gender Bias ($B_g$) 
- Evaluates (on the test set) how accurate the model is with respect to gender.
  - group 1: male
  - group 2: female

In [13]:
def gender_bias(predictions, gt, metadata):
  error_m = []
  error_f = []
  for i in range(0,len(gt)):
    if(metadata[i][0] == 'female'):
      error_f.append(abs(predictions[i]-gt[i]))
    else:
      error_m.append(abs(predictions[i]-gt[i]))

  print('=============================')
  print('Gender analysis:')
  print('Size group female = %d, MAE = %f' %(len(error_f), np.mean(error_f)))
  print('Size group male = %d, MAE = %f' %(len(error_m), np.mean(error_m)))

  gender_bias = abs(np.mean(error_f)-np.mean(error_m))

  print('---------')
  print('Gender bias (Bg) = ', gender_bias)

# Ethnicity Bias ($B_e$)
- Evaluates (on the test set) how accurate the model is with respect to different ethnicity categories.
  - group 1: asian
  - group 2: afroamerican
  - group 3: caucasian


In [14]:
def ethnicity_bias(predictions, gt, metadata):
  error_as = []
  error_af = []
  error_ca = []
  for i in range(0,len(gt)):
    if(metadata[i][1] == 'asian'):
      error_as.append(abs(predictions[i]-gt[i]))
    if(metadata[i][1] == 'afroamerican'):
      error_af.append(abs(predictions[i]-gt[i]))
    if(metadata[i][1] == 'caucasian'):
      error_ca.append(abs(predictions[i]-gt[i]))

  print('=============================')
  print('Ethnicity Analysis:')
  print('Size group asian = %d, MAE = %f' %(len(error_as), np.mean(error_as)))
  print('Size group afroamerican = %d, MAE = %f' %(len(error_af), np.mean(error_af)))
  print('Size group caucasian = %d, MAE = %f' %(len(error_ca), np.mean(error_ca)))
  
  ethnicity_bias = (abs(np.mean(error_as)-np.mean(error_af)) +
                   abs(np.mean(error_as)-np.mean(error_ca)) +
                   abs(np.mean(error_af)-np.mean(error_ca)))/3

  print('---------')
  print('Ethnicity bias (Be) = ', ethnicity_bias)

# Face expression bias ($B_f$)
- Evaluates (on the test set) how accurate the model is with respect to different face expression categories.
  - group 1: neutral
  - group 2: slightlyhappy
  - group 3: happy
  - group 4: other

In [15]:
def face_expression_bias(predictions, gt, metadata):
  error_h = []
  error_s = []
  error_n = []
  error_o = []
  for i in range(0,len(gt)):
    if(metadata[i][2]=='happy'):
      error_h.append(abs(predictions[i]-gt[i]))
    if(metadata[i][2]=='slightlyhappy'):
      error_s.append(abs(predictions[i]-gt[i]))
    if(metadata[i][2]=='neutral'):
      error_n.append(abs(predictions[i]-gt[i]))
    if(metadata[i][2]=='other'):
      error_o.append(abs(predictions[i]-gt[i]))

  print('=============================')
  print('Face expression Analysis:')
  print('Size group happy = %d, MAE = %f' %(len(error_h), np.mean(error_h)))
  print('Size group slightlyhappy = %d, MAE = %f' %(len(error_s), np.mean(error_s)))
  print('Size group neutral = %d, MAE = %f' %(len(error_n), np.mean(error_n)))
  print('Size group other = %d, MAE = %f' %(len(error_o), np.mean(error_o)))

  face_bias = (abs(np.mean(error_h)-np.mean(error_s)) +
              abs(np.mean(error_h)-np.mean(error_n)) +
              abs(np.mean(error_h)-np.mean(error_o)) +
              abs(np.mean(error_s)-np.mean(error_n)) +
              abs(np.mean(error_s)-np.mean(error_o)) +
              abs(np.mean(error_n)-np.mean(error_o)))/6

  print('---------')
  print('Face Expression bias (Bf) = ', face_bias)
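The four bias functions share one computation: group the absolute errors, take each group's MAE, and average the absolute pairwise differences. A generic, vectorized sketch of that computation is below; the name `group_bias` is hypothetical, not part of the original notebook. Like the originals, an empty group yields `nan` (NumPy emits a "Mean of empty slice" warning).

```python
from itertools import combinations

import numpy as np

def group_bias(predictions, gt, group_labels, groups):
    """Mean absolute difference between per-group MAEs over all pairs of groups.

    predictions, gt: ages in years; predictions may have shape (N,) or (N, 1).
    group_labels: one label per sample, e.g. metadata[:, 0] for gender.
    groups: the labels to compare, e.g. ['female', 'male'].
    """
    errors = np.abs(np.ravel(predictions) - np.ravel(gt))
    labels = np.asarray(group_labels)
    maes = {g: errors[labels == g].mean() for g in groups}
    pairs = list(combinations(groups, 2))
    return sum(abs(maes[a] - maes[b]) for a, b in pairs) / len(pairs)

# toy example: absolute errors are 2, 0, 5, 0 -> MAE_f = 3.5, MAE_m = 0.0
preds = np.array([[10.0], [30.0], [50.0], [70.0]])
gt = np.array([12.0, 30.0, 45.0, 70.0])
print(group_bias(preds, gt, np.array(['f', 'm', 'f', 'm']), ['f', 'm']))  # 3.5
```

With the notebook's variables, `group_bias(predictions, Y_test, M_test[:, 0], ['female', 'male'])` should reproduce $B_g$, and passing `np.digitize(Y_test, [20, 40, 60])` as labels with groups `[0, 1, 2, 3]` should reproduce $B_a$.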

---
# Evaluate function

In [16]:
def evaluate(model, X_test, y_test):
    # Make predictions and re-scale them from [0, 1] back to years
    predictions = model.predict(X_test, batch_size=32, verbose=1)*100

    # absolute error per test sample (predictions has shape (N, 1))
    error = np.abs(predictions[:, 0] - y_test)

    print('=============================')
    print('MAE = %.8f' %(np.mean(error)))

    # bias analysis on the test set (uses the global test metadata M_test)
    age_bias(predictions, y_test)
    gender_bias(predictions, y_test, M_test)
    ethnicity_bias(predictions, y_test, M_test)
    face_expression_bias(predictions, y_test, M_test)

    return predictions

---
# Running the experiments

## Preprocessing the data (face images)
- Use the `preprocess_input` function imported for the specific backbone model
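For reference, the two `preprocess_input` functions used here behave differently: the Keras ResNet50 version uses "caffe" mode (RGB to BGR, then subtract the per-channel ImageNet mean, no scaling), while the InceptionResNetV2 version uses "tf" mode (scale pixels to [-1, 1]). The NumPy functions below only illustrate those two modes and are not Keras' own code; they also show why the output needs a float array, since caffe mode produces negative values.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by Keras' 'caffe' mode
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_style(images_rgb):
    """Illustration of 'caffe' mode: RGB -> BGR, subtract the channel means."""
    x = np.asarray(images_rgb, dtype=np.float32)[..., ::-1]
    return x - IMAGENET_BGR_MEAN

def tf_style(images_rgb):
    """Illustration of 'tf' mode: scale [0, 255] pixels to [-1, 1]."""
    x = np.asarray(images_rgb, dtype=np.float32)
    return x / 127.5 - 1.0

# one 1x1 RGB "image" with pixel (255, 128, 0), batch shape (1, 1, 1, 3)
pixel = np.array([[[[255, 128, 0]]]], dtype=np.uint8)
print(caffe_style(pixel))  # contains negative values
print(tf_style(pixel))
```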

In [17]:
# Apply the backbone's preprocess_input to all images (it accepts whole batches).
# Convert to float32 first: preprocess_input produces negative values (ResNet50)
# or values in [-1, 1] (InceptionResNetV2), which would be corrupted if they were
# written back into an integer (e.g. uint8) array.
X_train = preprocess_input(X_train.astype('float32', copy=False))
X_valid = preprocess_input(X_valid.astype('float32', copy=False))
X_test = preprocess_input(X_test.astype('float32', copy=False))

In [18]:
if MODEL_NAME == 'inception_resnet':
    # Resize the images to the required size (299, 299, 3)
    X_train = tf.image.resize(X_train, (299, 299))
    X_valid = tf.image.resize(X_valid, (299, 299))
    X_test = tf.image.resize(X_test, (299, 299))

## Compute class weights and define the training metadata

In [19]:
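# assign each training sample to one of the age groups used in the age-bias analysis
# (0: <20, 1: [20, 40), 2: [40, 60), 3: >=60), working in years
y_train_bin = (Y_train.copy()*100).astype(int)
y_train_bin[y_train_bin < 20] = 0
y_train_bin[(y_train_bin >= 20) & (y_train_bin < 40)] = 1
y_train_bin[(y_train_bin >= 40) & (y_train_bin < 60)] = 2
y_train_bin[y_train_bin >= 60] = 3

# a group = (gender, ethnicity, facial expression, age bin); the bins are converted
# to strings so they can be stacked with the string metadata and joined later
metadata_train = np.concatenate((M_train, y_train_bin[:, np.newaxis].astype(str)), axis=1)
groups, group_counts = np.unique(metadata_train, axis=0, return_counts=True)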
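The chain of boolean masks above assigns the age bins 0 to 3. An equivalent, more compact sketch uses `np.digitize` with the bin edges 20, 40 and 60 (the same edges as the age-bias groups). `age_bins` is a hypothetical helper, shown here on made-up ages in years rather than on `Y_train`.

```python
import numpy as np

AGE_EDGES = [20, 40, 60]  # bins: <20 -> 0, [20, 40) -> 1, [40, 60) -> 2, >=60 -> 3

def age_bins(ages_years):
    """Age group index per sample; truncates to whole years like .astype(int)."""
    return np.digitize(np.asarray(ages_years).astype(int), AGE_EDGES)

# made-up ages in years (not from the dataset)
ages = np.array([0.9, 19.99, 20.0, 39.5, 40.0, 59.9, 60.0, 88.7])
print(age_bins(ages))
```

In the notebook this would read `y_train_bin = age_bins(Y_train * 100)`.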

In [20]:
# weight of each group = (1 / group size) * (number of training samples / 2)
class_weights = {
    '_'.join(group): (1/g_count) * (group_counts.sum() / 2.0) for group, g_count in zip(groups, group_counts)
}
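With this formula each group's weight is N / (2·n_g), where N is the number of training samples and n_g the size of group g. Summed over its members, every group therefore carries the same total weight N / 2, however small it is; with N = 4065, a group of 10 images gets a per-sample weight of 4065 / 20 = 203.25. The toy check below uses a hypothetical helper and made-up group keys to verify that property.

```python
import numpy as np

def balanced_group_weights(group_keys):
    """Per-group weight (1 / n_g) * (N / 2), the formula used for class_weights."""
    keys, counts = np.unique(group_keys, return_counts=True)
    total = counts.sum()
    return {k: (1 / c) * (total / 2.0) for k, c in zip(keys.tolist(), counts.tolist())}

# hypothetical group keys for 10 samples: groups of size 6, 3 and 1
keys = np.array(['female_asian_happy_1'] * 6
                + ['male_caucasian_neutral_2'] * 3
                + ['male_afroamerican_other_3'])
weights = balanced_group_weights(keys)
sample_weights = np.array([weights[k] for k in keys])

# every group contributes the same total weight, N / 2 = 5.0
for k in weights:
    print(k, round(weights[k], 4), sample_weights[keys == k].sum())
```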

## Run experiments

In [21]:
config = { # Step hyperparameters are arrays of length num_training_steps
    'num_training_steps': 2,
    'freeze': ['backbone', None],
    'optimizer': [
        tf.keras.optimizers.Adam(learning_rate=1e-4),
        tf.keras.optimizers.Adam(learning_rate=1e-5)
    ],
    'loss': [
        tf.keras.losses.MeanSquaredError(),
        tf.keras.losses.MeanSquaredError()
    ],
    'metrics': ['mae', 'mae'],
    'batch_size': [32, 32],
    'class_weights': class_weights,
}
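Every per-step entry of `config` must contain exactly `num_training_steps` elements: if a list is too short, `training` fails with an `IndexError` partway through the run, and extra entries are silently ignored. A small check like the hypothetical `check_step_config` below can catch this before a long run. The example encodes Exp 6 from the notes (freeze backbone, then freeze head, then train everything), using Keras' string shortcuts `'adam'` and `'mse'` as stand-ins; a real run would use `tf.keras.optimizers.Adam(...)` instances with the desired learning rates.

```python
STEP_KEYS = ['freeze', 'optimizer', 'loss', 'metrics', 'batch_size']

def check_step_config(config):
    """Raise ValueError if a per-step list does not have num_training_steps entries."""
    n_steps = config['num_training_steps']
    for key in STEP_KEYS:
        if len(config[key]) != n_steps:
            raise ValueError(
                f"config['{key}'] has {len(config[key])} entries, expected {n_steps}"
            )
    return config

# Exp 6 schedule: freeze backbone -> freeze head -> train the full network
exp6_config = {
    'num_training_steps': 3,
    'freeze': ['backbone', 'head', None],
    'optimizer': ['adam', 'adam', 'adam'],  # stand-ins for Adam(...) objects
    'loss': ['mse', 'mse', 'mse'],
    'metrics': ['mae', 'mae', 'mae'],
    'batch_size': [32, 32, 32],
    'class_weights': {},  # would be the class_weights dict computed above
}
check_step_config(exp6_config)
```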

In [22]:
# Train the model with the given config
model = training(model, X_train, Y_train, metadata_train, X_valid, Y_valid, config)

predictions = evaluate(model, X_test, Y_test)



Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/50
Epoch 36/50
Epoch 37/50
Epoch 38/50
Epoch 39/50
Epoch 40/50
Epoch 41/50
Epoch 42/50
Epoch 43/50
Epoch 44/50
Epoch 45/50
Epoch 46/50
Epoch 47/50
Epoch 48/50
Epoch 49/50
Epoch 50/50
Epoch 1/50
Epoch 2/50
Epoch 3/50
Epoch 4/50
Epoch 5/50
Epoch 6/50
Epoch 7/50
Epoch 8/50
Epoch 9/50
Epoch 10/50
Epoch 11/50
Epoch 12/50
Epoch 13/50
Epoch 14/50
Epoch 15/50
Epoch 16/50
Epoch 17/50
Epoch 18/50
Epoch 19/50
Epoch 20/50
Epoch 21/50
Epoch 22/50
Epoch 23/50
Epoch 24/50
Epoch 25/50
Epoch 26/50
Epoch 27/50
Epoch 28/50
Epoch 29/50
Epoch 30/50
Epoch 31/50
Epoch 32/50
Epoch 33/50
Epoch 34/50
Epoch 35/5

---
# Saving the test-set predictions for upload to the CodaLab competition

In [None]:
import csv

# save the predictions to a csv file
with open('predictions.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(predictions)

# compressing the csv file (to be submitted to codalab as prediction)
! zip predictions.zip predictions.csv