# Deep Learning - Tutorial Facial Keypoints Detection
### Magister en Data Science

#### Instituto Data Science, Universidad del Desarrollo

By Hernan Rivera 

* This tutorial is based on the tutorial by Daniel Nouri, founder of Natural Vision.
* Original: http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial

Tutorial Contents

* Prerequisites
* The data
* First model: a single hidden layer
* Testing it out
* Second model: convolutions
* Data augmentation
* Changing the learning rate and momentum over time
* Dropout
* Training specialists
* Supervised pre-training
* Conclusion

### 1. Prerequisites

* The Theano and Lasagne libraries must be installed.

In [2]:
#!pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt
# Already installed in my case, so the command is commented out.

### 2. The data

* We create a function to inspect the data available for the challenge.

In [4]:
import os
import numpy as np
from pandas.io.parsers import read_csv
from sklearn.utils import shuffle
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet

* The data is downloaded from the Kaggle website (https://www.kaggle.com/c/facial-keypoints-detection/data)

In [7]:
FTRAIN = './data/training.csv'
FTEST = './data/test.csv'

In [8]:
def load(test=False, cols=None):
    """Loads data from FTEST if *test* is True, otherwise from FTRAIN.
    Pass a list of *cols* if you're only interested in a subset of the
    target columns.
    """
    fname = FTEST if test else FTRAIN
    df = read_csv(os.path.expanduser(fname))  # load pandas dataframe

    # The Image column has pixel values separated by space; convert
    # the values to numpy arrays:
    df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))

    if cols:  # get a subset of columns
        df = df[list(cols) + ['Image']]

    print(df.count())  # prints the number of values for each column
    df = df.dropna()  # drop all rows that have missing values in them

    X = np.vstack(df['Image'].values) / 255.  # scale pixel values to [0, 1]
    X = X.astype(np.float32)

    if not test:  # only FTRAIN has any target columns
        y = df[df.columns[:-1]].values
        y = (y - 48) / 48  # scale target coordinates to [-1, 1]
        X, y = shuffle(X, y, random_state=42)  # shuffle train data
        y = y.astype(np.float32)
    else:
        y = None

    return X, y
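
To make the `Image` parsing step in `load` more concrete, here is a minimal sketch on a made-up four-pixel string (a real row holds 96 x 96 = 9216 values). The helper name `parse_pixels` is hypothetical, and it uses `str.split` instead of `np.fromstring` purely for illustration.

```python
import numpy as np

def parse_pixels(image_string):
    """Turn a space-separated pixel string into a float32 vector scaled to [0, 1]."""
    pixels = np.array([float(p) for p in image_string.split()], dtype=np.float32)
    return pixels / 255.0

# Made-up example row, not taken from the dataset.
example = parse_pixels("0 51 102 255")
print(example.shape, example.dtype)
```

Stacking one such vector per row (as `np.vstack` does in `load`) gives the `X` matrix with one flattened image per row.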

In [9]:
X, y = load()
print("X.shape == {}; X.min == {:.3f}; X.max == {:.3f}".format(X.shape, X.min(), X.max()))
print("y.shape == {}; y.min == {:.3f}; y.max == {:.3f}".format(y.shape, y.min(), y.max()))

left_eye_center_x            7039
left_eye_center_y            7039
right_eye_center_x           7036
right_eye_center_y           7036
left_eye_inner_corner_x      2271
left_eye_inner_corner_y      2271
left_eye_outer_corner_x      2267
left_eye_outer_corner_y      2267
right_eye_inner_corner_x     2268
right_eye_inner_corner_y     2268
right_eye_outer_corner_x     2268
right_eye_outer_corner_y     2268
left_eyebrow_inner_end_x     2270
left_eyebrow_inner_end_y     2270
left_eyebrow_outer_end_x     2225
left_eyebrow_outer_end_y     2225
right_eyebrow_inner_end_x    2270
right_eyebrow_inner_end_y    2270
right_eyebrow_outer_end_x    2236
right_eyebrow_outer_end_y    2236
nose_tip_x                   7049
nose_tip_y                   7049
mouth_left_corner_x          2269
mouth_left_corner_y          2269
mouth_right_corner_x         2270
mouth_right_corner_y         2270
mouth_center_top_lip_x       2275
mouth_center_top_lip_y       2275
mouth_center_bottom_lip_x    7016
mouth_center_b

* We can see the main characteristics of the dataset, most notably:
 * The training data consists of 7049 grayscale images of 96 x 96 pixels.
 * There are 30 target values, corresponding to the (x, y) pairs of the 15 facial keypoints.
 * For some keypoints we only have about 2000 labels, while for others we have close to 7000.
 * **y.shape == (2140, 30)** tells us that only 2140 images in the dataset have all target values present.
 * The pixel values are scaled to [0, 1] instead of [0, 255].
 * The target x, y coordinates are scaled to [-1, 1]; in the raw data they are pixel coordinates in [0, 95].
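
The target scaling described above can be sketched as follows. The helper names `scale_targets` and `unscale_targets` are hypothetical; only the `(y - 48) / 48` formula comes from `load`. The inverse mapping is what one would presumably need to draw predictions back onto a 96 x 96 image.

```python
import numpy as np

IMAGE_SIZE = 96        # images are 96 x 96 pixels
HALF = IMAGE_SIZE / 2  # 48, the constant used in load()

def scale_targets(y_pixels):
    """Map pixel coordinates to roughly [-1, 1], as load() does."""
    return (np.asarray(y_pixels, dtype=np.float32) - HALF) / HALF

def unscale_targets(y_scaled):
    """Inverse mapping: back to pixel coordinates."""
    return np.asarray(y_scaled, dtype=np.float32) * HALF + HALF

# Illustrative coordinates: left edge, centre, right edge of the image.
coords = np.array([0.0, 48.0, 95.0])
scaled = scale_targets(coords)
restored = unscale_targets(scaled)
print(scaled, restored)
```

Note that a coordinate of 95 maps to slightly less than 1, so the scaled range is not exactly symmetric.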

### 3. First model: a single hidden layer

* We create a Lasagne network with a single hidden layer:
 * First we initialize the network with 3 layers (input, hidden, and output). At this point the name and order of each layer are specified.
 * Then the parameters of each layer are specified.
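
As a rough illustration of what a three-layer network of this kind computes, the sketch below runs one forward pass in plain NumPy. It is not how Lasagne implements it internally: the random weights, the `forward` and `mse` helpers, the fake inputs, and the ReLU hidden activation (assumed here to match the `DenseLayer` default) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden, n_outputs = 9216, 100, 30

# Randomly initialised weights, for illustration only (not trained values).
W1 = rng.normal(scale=0.01, size=(n_inputs, n_hidden)).astype(np.float32)
b1 = np.zeros(n_hidden, dtype=np.float32)
W2 = rng.normal(scale=0.01, size=(n_hidden, n_outputs)).astype(np.float32)
b2 = np.zeros(n_outputs, dtype=np.float32)

def forward(X):
    """Input -> hidden (ReLU, assumed) -> output (identity)."""
    hidden = np.maximum(0, X @ W1 + b1)
    return hidden @ W2 + b2

def mse(y_pred, y_true):
    """Mean squared error, the usual loss for a regression target."""
    return float(np.mean((y_pred - y_true) ** 2))

# Fake batch of 4 flattened images and 4 target vectors.
X_fake = rng.random((4, n_inputs), dtype=np.float32)
y_fake = rng.uniform(-1, 1, size=(4, n_outputs)).astype(np.float32)

y_pred = forward(X_fake)
print(y_pred.shape, mse(y_pred, y_fake))
```

The output layer has no nonlinearity because the targets are continuous coordinates rather than class probabilities.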

In [14]:
net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
        ],
    # layer parameters:
    input_shape=(None, 9216),  # 96x96 input pixels per batch
    hidden_num_units=100,  # number of units in hidden layer
    output_nonlinearity=None,  # output layer uses identity function
    output_num_units=30,  # 30 target values

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=True,  # flag to indicate we're dealing with regression problem
    max_epochs=400,  # we want to train this many epochs
    verbose=1,  # 1 prints training progress, 0 is silent
    )
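
With the layer sizes configured above, the number of learnable parameters can be checked by hand: each dense layer has one weight per input-output pair plus one bias per output unit. The helper `dense_params` is a hypothetical name used only for this check.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of one dense layer."""
    return n_in * n_out + n_out

layer_sizes = [9216, 100, 30]  # input, hidden, output

total = sum(dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:]))
print(total)  # 924730
```

This agrees with the count that the training log reports when `fit` is called.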

In [15]:
net1.fit(X, y)

# Neural Network with 924730 learnable parameters

## Layer information

  #  name      size
---  ------  ------
  0  input     9216
  1  hidden     100
  2  output      30

  epoch    trn loss    val loss    trn/val  dur
-------  ----------  ----------  ---------  -----
      1     0.10887     0.09718    1.12033  0.32s
      2     0.04194     0.01798    2.33336  0.29s
      3     0.01648     0.01645    1.00192  0.30s
      4     0.01476     0.01476    1.00007  0.30s
      5     0.01354     0.01382    0.97974  0.30s
      6     0.01276     0.01314    0.97139  0.29s
      7     0.01214     0.01257    0.96633  0.29s
      8     0.01163     0.01209    0.96236  0.28s
      9     0.01121     0.01168    0.95928  0.29s
     10     0.01084     0.01134    0.95530  0.29s
     11     0.01052     0.01105

    118     0.00430     0.00517    0.83257  0.28s
    119     0.00428     0.00515    0.83171  0.29s
    120     0.00426     0.00513    0.83089  0.27s
    121     0.00425     0.00512    0.83005  0.26s
    122     0.00423     0.00510    0.82916  0.26s
    123     0.00421     0.00508    0.82833  0.27s
    124     0.00419     0.00507    0.82744  0.27s
    125     0.00418     0.00505    0.82667  0.26s
    126     0.00416     0.00504    0.82573  0.26s
    127     0.00414     0.00502    0.82495  0.26s
    128     0.00412     0.00500    0.82412  0.25s
    129     0.00411     0.00499    0.82318  0.28s
    130     0.00409     0.00497    0.82237  0.28s
    131     0.00407     0.00496    0.82152  0.28s
    132     0.00406     0.00494

    239     0.00287     0.00389    0.73919  0.27s
    240     0.00286     0.00388    0.73832  0.25s
    241     0.00286     0.00387    0.73760  0.26s
    242     0.00285     0.00387    0.73701  0.26s
    243     0.00284     0.00386    0.73625  0.26s
    244     0.00284     0.00385    0.73561  0.25s
    245     0.00283     0.00385    0.73485  0.26s
    246     0.00282     0.00384    0.73420  0.25s
    247     0.00281     0.00384    0.73345  0.26s
    248     0.00281     0.00383    0.73284  0.26s
    249     0.00280     0.00382    0.73206  0.26s
    250     0.00279     0.00382    0.73131  0.26s
    251     0.00279     0.00381    0.73071  0.26s
    252     0.00278     0.00381    0.72989  0.26s
    253     0.00277     0.00380

    370     0.00222     0.00335    0.66268  0.35s
    371     0.00221     0.00334    0.66085  0.30s
    372     0.00220     0.00334    0.65948  0.31s
    373     0.00220     0.00333    0.65920  0.42s
    374     0.00220     0.00333    0.65933  0.33s
    375     0.00219     0.00333    0.65854  0.30s
    376     0.00219     0.00333    0.65755  0.30s
    377     0.00218     0.00333    0.65670  0.29s
    378     0.00218     0.00333    0.65619  0.32s
    379     0.00218     0.00332    0.65590  0.30s
    380     0.00217     0.00332    0.65537  0.30s
    381     0.00217     0.00331    0.65485  0.30s
    382     0.00217     0.00331    0.65422  0.34s
    383     0.00216     0.00331    0.65385  0.34s
    384     0.00216     0.00331    0.65

NeuralNet(X_tensor_type=None,
     batch_iterator_test=<nolearn.lasagne.base.BatchIterator object at 0x1114c1ef0>,
     batch_iterator_train=<nolearn.lasagne.base.BatchIterator object at 0x1114c1eb8>,
     check_input=True, custom_scores=None, hidden_num_units=100,
     input_shape=(None, 9216),
     layers=[('input', <class 'lasagne.layers.input.InputLayer'>), ('hidden', <class 'lasagne.layers.dense.DenseLayer'>), ('output', <class 'lasagne.layers.dense.DenseLayer'>)],
     loss=None, max_epochs=400, more_params={},
     objective=<function objective at 0x1116a31e0>,
     objective_loss_function=<function squared_error at 0x1113da1e0>,
     on_batch_finished=[],
     on_epoch_finished=[<nolearn.lasagne.handlers.PrintLog object at 0x1c29f02908>],
     on_training_finished=[],
     on_training_started=[<nolearn.lasagne.handlers.PrintLayerInfo object at 0x1c29f02940>],
     output_nonlinearity=None, output_num_units=30, regression=True,
     scores_train=[], scores_valid=[],
     train_s

* Comparison of optimization methods (by Alec Radford). The star marks the global minimum on the error surface. In this tutorial we use Nesterov's Accelerated Gradient Descent (NAG).

<img src="img/methods.gif" />
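
To give a feel for how NAG behaves, here is a minimal NumPy sketch of one common formulation of the Nesterov momentum update, applied to a toy quadratic. The function names, the toy objective, and the starting point are assumptions for illustration; the learning rate and momentum reuse the values passed to `net1` (0.01 and 0.9). This is a sketch of the idea, not a copy of Lasagne's `nesterov_momentum`.

```python
import numpy as np

def nesterov_step(w, velocity, grad, learning_rate=0.01, momentum=0.9):
    """One Nesterov momentum update (one common formulation)."""
    # Update the velocity with the current gradient.
    velocity = momentum * velocity - learning_rate * grad
    # Take the gradient step plus a look-ahead along the new velocity.
    w = w + momentum * velocity - learning_rate * grad
    return w, velocity

def grad_fn(w):
    """Gradient of the toy objective f(w) = 0.5 * ||w||^2."""
    return w

w = np.array([3.0, -2.0])      # arbitrary starting point
velocity = np.zeros_like(w)

for _ in range(500):
    w, velocity = nesterov_step(w, velocity, grad_fn(w))

print(np.linalg.norm(w))  # distance to the minimum at the origin
```

On this toy problem the iterate spirals in towards the minimum at the origin; real loss surfaces are far less well behaved.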