# DATASET DESCRIPTION

The data consists of 48x48 pixel grayscale images of faces. The faces have been automatically registered so that the face is more or less centered and occupies about the same amount of space in each image. The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral).

train.csv contains two columns, "emotion" and "pixels". The "emotion" column contains a numeric code ranging from 0 to 6, inclusive, for the emotion that is present in the image. The "pixels" column contains a quoted string for each image. The contents of this string are space-separated pixel values in row-major order. test.csv contains only the "pixels" column, and your task is to predict the "emotion" column.
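
As a minimal sketch of the "pixels" format described above, the snippet below parses one space-separated string into a 48x48 array. The sample string is synthetic (it is not taken from the dataset), and `parse_pixels` is a hypothetical helper name used only for illustration.

```python
import numpy as np

def parse_pixels(pixel_string, width=48, height=48):
    """Turn a space-separated pixel string into a (height, width) uint8 array."""
    values = np.array([int(v) for v in pixel_string.split()], dtype=np.uint8)
    if values.size != width * height:
        raise ValueError(f"expected {width * height} values, got {values.size}")
    # Row-major order: the first `width` values form the first image row.
    return values.reshape(height, width)

# Synthetic example string, for illustration only (values 0..255 repeated).
sample = " ".join(str(i % 256) for i in range(48 * 48))
image = parse_pixels(sample)
print(image.shape)   # (48, 48)
print(image[1, 0])   # 48: the first value of the second row
```

The size check is a design choice of this sketch: a malformed row fails loudly instead of being reshaped into the wrong image.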

The training set consists of 28,709 examples. The public test set used for the leaderboard consists of 3,589 examples. The final test set, which was used to determine the winner of the competition, consists of another 3,589 examples.

This dataset was prepared by Pierre-Luc Carrier and Aaron Courville, as part of an ongoing research project. They have graciously provided the workshop organizers with a preliminary version of their dataset to use for this contest.

In [36]:
# Counting the number of lines in the CSV file
# (the count includes the header row)

with open("fer2013.csv") as f:
    content = f.readlines()

num_of_instances = len(content)
print("Number of instances: ",num_of_instances)

Number of instances:  35888
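
The printed count is one higher than the sum of the three splits given in the dataset description, which is consistent with `readlines()` also counting the CSV header row. A quick check of the arithmetic, using only numbers stated in this notebook:

```python
# Split sizes as stated in the dataset description
train, public_test, final_test = 28709, 3589, 3589
total_examples = train + public_test + final_test

lines_in_file = 35888  # value printed by the cell above

print(total_examples)                  # 35887
print(lines_in_file - total_examples)  # 1, presumably the header line
```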


In [37]:
# Importing necessary packages

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import cv2

dataset_path = 'fer2013.csv'
image_size=(48,48)




In [38]:
# Converting the data to arrays that can be fed into neural networks

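def load_fer2013():
    data = pd.read_csv(dataset_path)
    pixels = data['pixels'].tolist()
    width, height = 48, 48
    faces = []
    for pixel_sequence in pixels:
        # Each row is a string of 48*48 space-separated pixel values
        face = [int(pixel) for pixel in pixel_sequence.split(' ')]
        face = np.asarray(face).reshape(height, width)
        face = cv2.resize(face.astype('uint8'), image_size)
        faces.append(face.astype('float32'))
    faces = np.asarray(faces)
    # Add a trailing channel axis: (N, 48, 48) -> (N, 48, 48, 1)
    faces = np.expand_dims(faces, -1)
    # One-hot encode the labels; DataFrame.as_matrix() was removed in
    # pandas 1.0, so .to_numpy() is used instead
    emotions = pd.get_dummies(data['emotion'], dtype='uint8').to_numpy()
    return faces, emotions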
 


In [39]:
faces, emotions = load_fer2013()

In [40]:
print(faces)

[[[[ 70.]
   [ 80.]
   [ 82.]
   ...
   [ 52.]
   [ 43.]
   [ 41.]]

  [[ 65.]
   [ 61.]
   [ 58.]
   ...
   [ 56.]
   [ 52.]
   [ 44.]]

  [[ 50.]
   [ 43.]
   [ 54.]
   ...
   [ 49.]
   [ 56.]
   [ 47.]]

  ...

  [[ 91.]
   [ 65.]
   [ 42.]
   ...
   [ 72.]
   [ 56.]
   [ 43.]]

  [[ 77.]
   [ 82.]
   [ 79.]
   ...
   [105.]
   [ 70.]
   [ 46.]]

  [[ 77.]
   [ 72.]
   [ 84.]
   ...
   [106.]
   [109.]
   [ 82.]]]


 [[[151.]
   [150.]
   [147.]
   ...
   [129.]
   [140.]
   [120.]]

  [[151.]
   [149.]
   [149.]
   ...
   [122.]
   [141.]
   [137.]]

  [[151.]
   [151.]
   [156.]
   ...
   [109.]
   [123.]
   [146.]]

  ...

  [[188.]
   [188.]
   [121.]
   ...
   [185.]
   [185.]
   [186.]]

  [[188.]
   [187.]
   [196.]
   ...
   [186.]
   [182.]
   [187.]]

  [[186.]
   [184.]
   [185.]
   ...
   [193.]
   [183.]
   [184.]]]


 [[[231.]
   [212.]
   [156.]
   ...
   [ 44.]
   [ 27.]
   [ 16.]]

  [[229.]
   [175.]
   [148.]
   ...
   [ 27.]
   [ 35.]
   [ 27.]]

  [[214.]
   ...

In [41]:
print(emotions)

[[1 0 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]
 ...
 [1 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 1 ... 0 0 0]]
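
To read the one-hot rows back as emotion names, one option is to take the `argmax` of each row and look it up in the label list from the dataset description. This is a sketch under the assumption that the one-hot columns are ordered by label code 0 to 6; `EMOTION_NAMES` and `decode_emotions` are hypothetical names, and the sample rows are synthetic.

```python
import numpy as np

# Label order taken from the dataset description (0=Angry ... 6=Neutral)
EMOTION_NAMES = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]

def decode_emotions(one_hot):
    """Map one-hot rows of shape (N, 7) to a list of emotion names."""
    codes = np.argmax(one_hot, axis=1)  # index of the 1 in each row
    return [EMOTION_NAMES[code] for code in codes]

# Synthetic rows, for illustration only
sample = np.array([
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],
])
print(decode_emotions(sample))  # ['Angry', 'Fear', 'Neutral']
```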


In [42]:
# A common way to pre-process images is to scale them to the range [-1, 1].
# Images are first scaled to [0, 1] by dividing by 255; subtracting 0.5 and multiplying by 2 then changes the range to [-1, 1].
# A zero-centered range such as [-1, 1] is often reported to work well for neural network models in computer vision problems.

def preprocess_input(x, v2=True):
    x = x.astype('float32')
    x = x / 255.0
    if v2:
        x = x - 0.5
        x = x * 2.0
    return x
 

faces = preprocess_input(faces)
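
The scaling can be checked on a few known values: 0 should map to -1, 255 to 1, and 70 (the first pixel printed earlier) to about -0.451. The sketch below repeats `preprocess_input` so that it runs on its own; `undo_preprocess` is a hypothetical inverse added only for illustration, for example to view images again after scaling.

```python
import numpy as np

def preprocess_input(x, v2=True):
    x = x.astype('float32')
    x = x / 255.0
    if v2:
        x = x - 0.5
        x = x * 2.0
    return x

def undo_preprocess(x, v2=True):
    """Hypothetical inverse: map scaled values back to 0..255 uint8."""
    x = x.astype('float32')
    if v2:
        x = x / 2.0 + 0.5
    return np.rint(x * 255.0).astype('uint8')

raw = np.array([0, 70, 255], dtype='uint8')
scaled = preprocess_input(raw)
print(scaled)                 # roughly [-1.0, -0.451, 1.0]
restored = undo_preprocess(scaled)
print(restored)               # [  0  70 255]
```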


In [43]:
faces

array([[[[-0.45098037],
         [-0.372549  ],
         [-0.35686272],
         ...,
         [-0.5921569 ],
         [-0.6627451 ],
         [-0.6784314 ]],

        [[-0.49019605],
         [-0.52156866],
         [-0.54509807],
         ...,
         [-0.56078434],
         [-0.5921569 ],
         [-0.654902  ]],

        [[-0.60784316],
         [-0.6627451 ],
         [-0.5764706 ],
         ...,
         [-0.6156863 ],
         [-0.56078434],
         [-0.6313726 ]],

        ...,

        [[-0.2862745 ],
         [-0.49019605],
         [-0.67058825],
         ...,
         [-0.4352941 ],
         [-0.56078434],
         [-0.6627451 ]],

        [[-0.3960784 ],
         [-0.35686272],
         [-0.38039213],
         ...,
         [-0.17647058],
         [-0.45098037],
         [-0.6392157 ]],

        [[-0.3960784 ],
         [-0.4352941 ],
         [-0.34117645],
         ...,
         [-0.16862744],
         [-0.14509803],
         [-0.35686272]]],


       [[[ 0.18431377],


In [44]:
# Storing them using numpy

np.save('fdataX1', faces)
np.save('flabels1', emotions)
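
Note that `np.save` appends the `.npy` extension when the file name does not already have it, so the arrays are read back with `np.load('fdataX1.npy')` and `np.load('flabels1.npy')`. A small round-trip sketch, using synthetic stand-in arrays and a temporary directory so that it does not touch the real files:

```python
import os
import tempfile
import numpy as np

# Synthetic stand-ins with the same layout as faces and emotions
faces_demo = np.zeros((2, 48, 48, 1), dtype='float32')
labels_demo = np.array([[1, 0, 0, 0, 0, 0, 0],
                        [0, 0, 0, 1, 0, 0, 0]], dtype='uint8')

with tempfile.TemporaryDirectory() as tmp:
    x_path = os.path.join(tmp, 'fdataX1')
    y_path = os.path.join(tmp, 'flabels1')
    np.save(x_path, faces_demo)    # written as fdataX1.npy
    np.save(y_path, labels_demo)   # written as flabels1.npy
    faces_loaded = np.load(x_path + '.npy')
    labels_loaded = np.load(y_path + '.npy')

print(faces_loaded.shape)    # (2, 48, 48, 1)
print(labels_loaded.shape)   # (2, 7)
```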

In [45]:
print("Preprocessing Done")
# Note: len(faces[0]) is the image height (48 rows); each image holds 48*48 = 2304 pixel values
print("Number of Features: "+str(len(faces[0])))
print("Number of Labels: "+ str(len(emotions[0])))

print("faces, emotions stored in fdataX1.npy and flabels1.npy respectively")

Preprocessing Done
Number of Features: 48
Number of Labels: 7
faces, emotions stored in fdataX1.npy and flabels1.npy respectively
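
The value 48 reported as "Number of Features" is the length of the first axis of one image, that is, its height. If the per-image feature count is wanted instead, it can be read from the array shape. A sketch with a synthetic array of the same layout (`faces_demo` is a stand-in, not the real data):

```python
import numpy as np

# Synthetic stand-in with the same layout as faces: (N, height, width, channels)
faces_demo = np.zeros((3, 48, 48, 1), dtype='float32')

rows_per_image = len(faces_demo[0])
image_shape = faces_demo.shape[1:]
values_per_image = int(np.prod(image_shape))

print(rows_per_image)     # 48
print(image_shape)        # (48, 48, 1)
print(values_per_image)   # 2304
```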
