# Predicting Images on CIFAR-10

* Author: Jay Huang
* E-mail: askjayhuang at gmail dot com
* GitHub: https://github.com/jayhuang1
* Created: 2018-01-01

This workshop predicts the class of an image using image recognition on the CIFAR-10 dataset. The dataset consists of 60,000 32x32 color images, each belonging to one of 10 object classes, with 6,000 images per class. The training set contains 50,000 images and the test set contains 10,000 images.

Conventional classification algorithms (a support vector classifier, Random Forest, and Gaussian Naive Bayes) will be used first. We will then use a neural network to see if we can get better results.

## Data Ingestion

The CIFAR-10 data was ingested from the Keras dataset module:

In [10]:
from keras.datasets import cifar10

(X_train, y_train), (X_test, y_test) = cifar10.load_data()

## Data Exploration
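
A first look at the data usually covers array shapes, pixel value range, and class balance. The sketch below is one way to do that, not necessarily what this workshop originally did. It uses small random stand-in arrays (an assumption, so the example runs without downloading the dataset), and the helper name `summarize` is hypothetical. With the real data, `X_train` and `y_train` from the ingestion step would be passed in instead.

```python
import numpy as np

def summarize(X, y, num_classes=10):
    """Return basic facts about an image array X and its label array y."""
    # Keras returns labels with shape (n, 1); flatten before counting
    counts = np.bincount(y.ravel(), minlength=num_classes)
    return {
        "image_shape": X.shape[1:],
        "dtype": str(X.dtype),
        "pixel_range": (int(X.min()), int(X.max())),
        "class_counts": counts.tolist(),
    }

# Stand-in data with the same layout as CIFAR-10 (assumption: not the real images)
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
y_demo = (np.arange(100) % 10).reshape(-1, 1)

summary = summarize(X_demo, y_demo)
print(summary)
```

On the real training set, each entry of `class_counts` would be expected to match the per-class figures quoted in the introduction.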

## Data Wrangling

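For our classification implementation, the fit function in scikit-learn expects a 2D feature array of shape (n_samples, n_features). The image data we downloaded from Keras is a 4D array (samples, height, width, channels), so we flatten each image into a single row of 32 * 32 * 3 = 3,072 values: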

In [20]:
nsamples, nx, ny, nz = X_train.shape
X_train_cl = X_train.reshape((nsamples, nx * ny * nz))
nsamples, nx, ny, nz = X_test.shape
X_test_cl = X_test.reshape((nsamples, nx * ny * nz))

# Keras returns labels with shape (n, 1); scikit-learn expects a 1D array
y_train_cl = y_train.ravel()
y_test_cl = y_test.ravel()
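
To make the reshape concrete, here is a minimal sketch on a tiny stand-in batch (the arrays `X_demo` and `y_demo` are assumptions for illustration, not CIFAR-10 data). It shows that flattening keeps every pixel value, only the layout changes, and that a label column of shape (n, 1) can be flattened to the 1D shape scikit-learn estimators expect.

```python
import numpy as np

# Stand-in batch: 5 images of 32x32 pixels with 3 color channels (assumption)
X_demo = np.zeros((5, 32, 32, 3), dtype=np.uint8)
X_demo[2, 0, 0] = [10, 20, 30]  # mark the top-left pixel of image 2
y_demo = np.array([[3], [1], [4], [1], [5]])  # labels as a column, like Keras returns

# -1 lets NumPy infer the feature count: 32 * 32 * 3 = 3072
X_flat = X_demo.reshape(len(X_demo), -1)
y_flat = y_demo.ravel()

print(X_flat.shape, y_flat.shape)

# The flattening is lossless: reshaping back recovers the original images
X_back = X_flat.reshape(X_demo.shape)
```

Using `-1` instead of spelling out `nx * ny * nz` is a stylistic choice; both give the same result.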

For our neural network implementation, the integer class labels need to be converted into a one-hot (binary class) matrix with one column per class:

In [19]:
import keras

X_train_nn = X_train
X_test_nn = X_test
y_train_nn = keras.utils.to_categorical(y_train, 10)
y_test_nn = keras.utils.to_categorical(y_test, 10)
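
To illustrate what this conversion produces, the following is a NumPy-only sketch of one-hot encoding. The helper `one_hot` is hypothetical and only mimics the idea; it is not the Keras implementation.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Return an (n, num_classes) matrix with a single 1 per row."""
    labels = np.asarray(labels).ravel()
    # Row i of the identity matrix is the one-hot vector for class i
    return np.eye(num_classes, dtype="float32")[labels]

y_demo = np.array([[0], [3], [9]])  # stand-in labels shaped (n, 1)
encoded = one_hot(y_demo, 10)
print(encoded)

# argmax along each row recovers the original integer labels
decoded = encoded.argmax(axis=1)
```

The same `argmax` step is how predicted class probabilities are typically turned back into class labels.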

## Model Building

Let's build our model by first using conventional classification algorithms:

In [23]:
from sklearn.metrics import classification_report
import time

# CIFAR-10 class names, in label order (0-9)
LABEL_NAMES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

def fit_model(model):
    # Train model and time the fit
    start = time.time()
    model.fit(X_train_cl, y_train_cl)
    duration = time.time() - start

    print("{:25} fit in: {:0.2f} seconds".format(model.__class__.__name__, duration))

    # Test model on the held-out test set
    y_pred = model.predict(X_test_cl)

    print(classification_report(y_test_cl, y_pred, target_names=LABEL_NAMES))

In [None]:
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

models = (SVC(),
          RandomForestClassifier(),
          GaussianNB())

for model in models:
    fit_model(model)
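
Fitting a kernel SVM on all 50,000 flattened images can take a long time. One option, which is an assumption here and not something this workshop is shown to do, is to experiment on a class-balanced subsample first. The sketch below uses a hypothetical helper `stratified_subsample` and random stand-in data.

```python
import numpy as np

def stratified_subsample(X, y, per_class, seed=0):
    """Pick `per_class` random rows for every label value in y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).ravel()
    picked = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)  # row indices of this class
        picked.append(rng.choice(idx, size=per_class, replace=False))
    picked = np.sort(np.concatenate(picked))
    return X[picked], y[picked]

# Stand-in data (assumption): 200 flattened "images" with 10 balanced classes
X_demo = np.random.default_rng(1).random((200, 3072))
y_demo = np.arange(200) % 10

X_small, y_small = stratified_subsample(X_demo, y_demo, per_class=5)
print(X_small.shape, np.bincount(y_small))
```

Results obtained on a subsample are only indicative; the full training set should be used for any final comparison.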


Let's then build a basic neural network model using Keras:

In [None]:
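from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Dense, Activation, Dropout, Flatten
from keras.optimizers import SGD
from keras.callbacks import EarlyStopping

NUM_CLASSES = 10  # CIFAR-10 has 10 object classes
# The values originally used for the settings below are not shown; these are placeholders
BATCH_SIZE = 32
EPOCHS = 100
PATIENCE = 5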

model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape=X_train.shape[1:]))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(NUM_CLASSES))
model.add(Activation('softmax'))

# Note: newer Keras releases name the first argument learning_rate instead of lr
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)

model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy']
              )

model.summary()

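# Stop training once the validation loss has not improved for PATIENCE epochs
early_stopping_monitor = EarlyStopping(patience=PATIENCE)

# Train on the one-hot encoded labels, as required by categorical_crossentropy
mh = model.fit(X_train_nn, y_train_nn, validation_data=(X_test_nn, y_test_nn),
               batch_size=BATCH_SIZE, epochs=EPOCHS, callbacks=[early_stopping_monitor])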
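
To see how the image shrinks as it passes through the network, the spatial sizes can be worked out by hand. The sketch below assumes stride-1 convolutions and non-overlapping 2x2 pooling, which are the Keras defaults for the layers as written above. The helper names `conv_out` and `pool_out` are hypothetical.

```python
def conv_out(size, kernel, padding):
    """Spatial output size of a stride-1 convolution."""
    # 'same' pads so the size is unchanged; 'valid' loses kernel - 1 pixels
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool):
    """Spatial output size of non-overlapping max pooling (floor division)."""
    return size // pool

size = 32                           # CIFAR-10 images are 32x32
size = conv_out(size, 3, "same")    # first Conv2D(32), padding='same'
size = conv_out(size, 3, "valid")   # second Conv2D(32), default padding
size = pool_out(size, 2)            # first MaxPooling2D
size = conv_out(size, 3, "same")    # first Conv2D(64), padding='same'
size = conv_out(size, 3, "valid")   # second Conv2D(64), default padding
size = pool_out(size, 2)            # second MaxPooling2D

flat_features = size * size * 64            # 64 filters in the last conv layer
dense_params = flat_features * 512 + 512    # weights plus biases of Dense(512)
print(size, flat_features, dense_params)
```

These figures can be compared against the output of `model.summary()` as a sanity check on the architecture.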

## Model Evaluation
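
One common way to evaluate a classifier is to compare predicted and true labels through overall accuracy and a confusion matrix. The sketch below is an illustration on made-up stand-in values, not results from this workshop. The helper `confusion_matrix` is a hypothetical NumPy-only version; in practice the probabilities could come from `model.predict(X_test_nn)`.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # add.at accumulates correctly even when (true, pred) pairs repeat
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

# Stand-in predicted probabilities for 4 samples and 3 classes (assumption)
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.2, 0.6]])
y_true = np.array([0, 1, 2, 2])

y_pred = probs.argmax(axis=1)            # most probable class per sample
accuracy = (y_pred == y_true).mean()     # fraction of correct predictions
cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(accuracy)
print(cm)
```

Off-diagonal entries of the matrix show which classes are confused with each other, which a single accuracy number hides.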

## Conclusion