# Keras

http://keras.io/

In [1]:
from keras.utils import np_utils
from keras.optimizers import SGD
from keras.layers.core import Dense
from keras.models import Sequential

Using Theano backend.


Prepare the toy dataset. Note that the iris data is sorted by label. The `validation_split` argument that we later pass to `model.fit` holds out the last fraction of the samples without shuffling them first, so on sorted data the validation set would contain only one class. We therefore shuffle the data before training.

https://github.com/fchollet/keras/issues/68

In [2]:
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
# note: in scikit-learn 0.18 and later this function lives in sklearn.model_selection
from sklearn.cross_validation import train_test_split

iris = datasets.load_iris()
shuffle = np.arange(len(iris.data))
np.random.shuffle(shuffle)
X = iris.data[shuffle]
y = iris.target[shuffle]
X_train, X_test, y_train, y_test = train_test_split( X, y, test_size = 0.3, random_state = 0 )

sc = StandardScaler()
X_train_sd = sc.fit_transform(X_train)
X_test_sd  = sc.transform(X_test)
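
To see why the sorted labels matter, here is a minimal sketch. It assumes that `validation_split` simply holds out the last fraction of the rows without shuffling them first. The `tail_split` helper and the synthetic label vector are illustrative only; they are not part of Keras or of the iris loader.

```python
import numpy as np

# Synthetic stand-in for the iris targets: 50 samples per class, sorted by label.
y_sorted = np.repeat([0, 1, 2], 50)

def tail_split(y, validation_split):
    """Hypothetical helper: hold out the last fraction of y, without shuffling."""
    n_val = int(round(len(y) * validation_split))
    return y[:len(y) - n_val], y[len(y) - n_val:]

# Sorted labels: the held-out tail contains a single class.
train_sorted, val_sorted = tail_split(y_sorted, 0.1)
print(np.unique(val_sorted))

# Shuffled labels: the tail is a random sample of the whole dataset.
rng = np.random.RandomState(0)
y_shuffled = y_sorted[rng.permutation(len(y_sorted))]
train_shuffled, val_shuffled = tail_split(y_shuffled, 0.1)
print(np.bincount(val_shuffled, minlength=3))
```

With sorted labels the validation tail only ever sees the last class, so the validation loss says nothing about the other two.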

We have to convert the class labels to one-hot format.

In [3]:
y_train_ohe = np_utils.to_categorical(y_train)
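
For readers unfamiliar with the one-hot format, the following NumPy-only sketch shows the kind of array the conversion produces. The `one_hot` function is a hypothetical stand-in written for illustration, not the Keras implementation.

```python
import numpy as np

def one_hot(labels, n_classes=None):
    """Return a (n_samples, n_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    if n_classes is None:
        n_classes = labels.max() + 1   # assume labels are 0, 1, ..., n_classes - 1
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(n_classes)[labels]

encoded = one_hot([0, 2, 1, 2])
print(encoded)
```

Each row has exactly one nonzero entry, and its column index is the original integer label.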

- The core data structure of Keras is a model, e.g. `Sequential()`, a linear stack of layers.
- Stacking layers is as easy as calling `.add()`.
    - Since the first layer that we add is the input layer, its `input_dim` argument has to match the number of features (columns) in the training set. The number of output units (`output_dim`) of one layer also has to match the number of input units (`input_dim`) of the next layer. Finally, the number of units in the output layer should equal the number of unique class labels.
- Once your model looks good, configure its learning process with `.compile()`.
    - We can define our own optimizer to train the model. Here we use `SGD()`, stochastic gradient descent optimization. For SGD we can set `lr`, the learning rate; `decay`, the learning rate decay applied over the updates; and `momentum`, which adds a fraction m of the previous weight update to the current one.
    - We also set the cost (loss) function. Here we use `categorical_crossentropy`, the cost function of logistic regression generalized to multiple classes via the softmax activation in the final layer.
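
The roles of `lr`, `decay` and `momentum` can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function `sgd_momentum_step` is hypothetical, and the time-based schedule `lr / (1 + decay * iteration)` is our assumption about how the decay is applied; the exact formula in a given Keras version may differ.

```python
def sgd_momentum_step(w, grad, velocity, lr, momentum, decay, iteration):
    """One SGD update with momentum and (assumed) time-based learning rate decay."""
    lr_t = lr / (1.0 + decay * iteration)          # assumption: time-based decay schedule
    # Keep a fraction `momentum` of the previous update and add the new gradient step.
    velocity = momentum * velocity - lr_t * grad
    return w + velocity, velocity

# Toy problem: minimize f(w) = w**2, whose gradient is 2 * w, starting from w = 1.0.
w, velocity = 1.0, 0.0
for iteration in range(200):
    w, velocity = sgd_momentum_step(w, 2.0 * w, velocity,
                                    lr=0.1, momentum=0.9, decay=1e-7,
                                    iteration=iteration)
print(abs(w))
```

With `momentum = 0` the update reduces to plain gradient descent; a nonzero value lets earlier steps keep influencing the current one.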

In [4]:
model = Sequential()

# two hidden layers with 50 units each, followed by a softmax output layer
model.add( Dense( input_dim = X_train_sd.shape[1], output_dim = 50,
                  init = 'uniform', activation = 'sigmoid' ) )
model.add( Dense( input_dim = 50, output_dim = 50,
                  init = 'uniform', activation = 'sigmoid' ) )
model.add( Dense( input_dim = 50, output_dim = y_train_ohe.shape[1],
                  init = 'uniform', activation = 'softmax' ) )
sgd = SGD( lr = 0.001, decay = 1e-7, momentum = .9 )
model.compile( loss = 'categorical_crossentropy', optimizer = sgd )
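
To illustrate why consecutive layer sizes have to match, here is a NumPy sketch of the forward pass through a network with the same shape (4 features, two hidden layers of 50 units, 3 classes). The weights are random, the range of the uniform initialization is an arbitrary choice, and the five input rows are made up; nothing here is taken from the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
n_features, n_hidden, n_classes = 4, 50, 3

# Each weight matrix maps the previous layer's output size to the next layer's size.
W1 = rng.uniform(-0.05, 0.05, size=(n_features, n_hidden))
W2 = rng.uniform(-0.05, 0.05, size=(n_hidden, n_hidden))
W3 = rng.uniform(-0.05, 0.05, size=(n_hidden, n_classes))
b1, b2, b3 = np.zeros(n_hidden), np.zeros(n_hidden), np.zeros(n_classes)

X = rng.normal(size=(5, n_features))    # 5 hypothetical standardized samples
h1 = sigmoid(X.dot(W1) + b1)            # shape (5, 50)
h2 = sigmoid(h1.dot(W2) + b2)           # shape (5, 50)
proba = softmax(h2.dot(W3) + b3)        # shape (5, 3), each row sums to 1
print(proba.shape, proba.sum(axis=1))
```

If the second dimension of one weight matrix did not equal the first dimension of the next, the matrix products would fail, which is the constraint on `output_dim` and `input_dim` in matrix form.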

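After compiling the model, we can train it by calling `.fit()`. For the small iris data we first replace the network above with a simpler single-layer model. Note that this version is compiled with `mean_squared_error` and the built-in `'sgd'` optimizer rather than the settings above.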

In [5]:
# A simpler version for the small iris data: a single softmax layer
# that maps the input features directly to class probabilities
model = Sequential()
model.add( Dense( input_dim = X_train_sd.shape[1], output_dim = y_train_ohe.shape[1],
                  init = 'uniform', activation = 'softmax' ) )
model.compile( loss = 'mean_squared_error', optimizer = 'sgd' )
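
The two loss functions used in this notebook behave differently on confident mistakes. The following plain-Python sketch compares them on a single one-hot target. The two helper functions and the probability vectors are made up for illustration and are not the Keras implementations.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

def mean_squared_error(y_true, y_pred):
    """Average squared difference between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [0.0, 0.0, 1.0]               # one-hot target for class 2
confident_right = [0.05, 0.05, 0.90]   # most probability on the correct class
confident_wrong = [0.90, 0.05, 0.05]   # most probability on a wrong class

for y_pred in (confident_right, confident_wrong):
    print(categorical_crossentropy(y_true, y_pred),
          mean_squared_error(y_true, y_pred))
```

Both losses are larger for the wrong prediction, but the squared error is bounded, while the cross-entropy grows without limit as the probability assigned to the true class approaches zero.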

In [6]:
model.fit(
    X_train_sd, 
    y_train_ohe,
    nb_epoch = 30,
    batch_size = 1, # one sample per weight update (online SGD); larger values give minibatch training
    verbose = 0, # set to 1 to print the cost function during training
    validation_split = 0.1 # hold out the last 10 percent of the data and evaluate on it after each epoch
    # show_accuracy = True
)

<keras.callbacks.History at 0x110d9b550>
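
To make the effect of `batch_size` and `validation_split` concrete, the sketch below counts how many samples are trained on and how many weight updates happen per epoch. It assumes the training portion is `int(n_samples * (1 - validation_split))`, which is our understanding of the split rather than something verified here, and `updates_per_epoch` is a hypothetical helper.

```python
import math

def updates_per_epoch(n_samples, validation_split, batch_size):
    """Return (n_train, n_val, number of weight updates in one epoch)."""
    n_train = int(n_samples * (1.0 - validation_split))   # assumed split rule
    n_val = n_samples - n_train
    n_updates = int(math.ceil(n_train / float(batch_size)))
    return n_train, n_val, n_updates

# 105 rows: the 70 percent training share of the 150 iris samples.
print(updates_per_epoch(105, 0.1, batch_size=1))
print(updates_per_epoch(105, 0.1, batch_size=32))
```

With `batch_size = 1` every training sample triggers its own update, which is slow per epoch but workable for a dataset this small.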

Predict the class labels and print out the accuracy.

In [7]:
# model.evaluate( X_test_sd, np_utils.to_categorical(y_test), batch_size = 30 )
y_train_pred = model.predict_classes( X_train_sd, verbose = 0 )
y_train_pred

array([2, 2, 0, 2, 0, 1, 2, 2, 2, 2, 0, 2, 0, 1, 0, 0, 2, 1, 2, 1, 0, 1, 2,
       0, 2, 2, 2, 2, 1, 1, 0, 1, 2, 2, 2, 0, 2, 0, 0, 2, 1, 2, 0, 0, 2, 2,
       2, 0, 1, 0, 0, 2, 0, 1, 2, 1, 1, 2, 1, 2, 0, 1, 0, 0, 1, 2, 0, 0, 0,
       0, 2, 2, 0, 0, 1, 0, 1, 2, 0, 2, 0, 2, 1, 0, 0, 0, 2, 0, 1, 2, 1, 2,
       1, 2, 2, 2, 0, 2, 1, 1, 0, 2, 2, 2, 1])
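
`predict_classes` returns integer labels rather than probabilities. A minimal sketch of that reduction, assuming it amounts to taking the argmax over the softmax outputs; the probability values below are made up.

```python
import numpy as np

# Hypothetical softmax outputs: one row per sample, one column per class.
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3]])

# The predicted label is the column index of the largest probability in each row.
labels = proba.argmax(axis=1)
print(labels)
```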

In [8]:
train_acc = float( np.sum( y_train == y_train_pred ) ) / X_train.shape[0]
print( 'Training accuracy: %.2f%%' % ( train_acc * 100 ) )

Training accuracy: 89.52%
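
The accuracy computation above can be wrapped in a small helper, and a confusion matrix shows which classes get mixed up. Both functions below are illustrative sketches, and the labels are made up rather than taken from the run above.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2]   # made-up labels, not the notebook's results
y_pred = [0, 0, 1, 2, 2, 1]
print('Accuracy: %.2f%%' % (accuracy(y_true, y_pred) * 100))
print(confusion_matrix(y_true, y_pred, n_classes=3))
```

The same helpers could be applied to predictions on `X_test_sd` to check how well the model generalizes beyond the training set.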


Future work: work through the following Kaggle example.

https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/13632/achieve-0-48-in-5-min-with-a-deep-net-feat-batchnorm-prelu