[![mlpack-lab Image](https://img.shields.io/endpoint?url=https%3A%2F%2Flab.kurg.org%2Fstatus%2Fstatus.json)](https://lab.mlpack.org)

In [44]:
/**
 * @file mnist-cnn-cpp.ipynb
 *
 * An example of using a Convolutional Neural Network (CNN) to
 * solve the Digit Recognizer problem from Kaggle.
 *
 * The full description of the problem, as well as the training and
 * test datasets, is available at https://www.kaggle.com/c/digit-recognizer.
 */

In [41]:
!wget -c https://lab.mlpack.org/data/mnist.tar.gz -O - | tar -xz

tar: mnist-train.csv: Cannot utime: Operation not permitted
tar: Exiting with failure status due to previous errors


In [1]:
#include <mlpack/core.hpp>
#include <mlpack/core/data/split_data.hpp>
#include <mlpack/methods/ann/layer/layer.hpp>
#include <mlpack/methods/ann/ffn.hpp>
#include <ensmallen.hpp>

In [2]:
using namespace mlpack;

In [3]:
using namespace mlpack::ann;

In [25]:
/**
 * Returns labels based on the predicted probability (or log-probability)
 * of each class.
 *
 * @param predOut matrix containing the probabilities (or log-probabilities)
 *     of the classes. Each row corresponds to a class, each column to a
 *     data point.
 * @return a row vector with the class of each data point. Classes are
 *     numbered from 1 to the number of rows in the input matrix.
 */
arma::Row<size_t> getLabels(const arma::mat& predOut)
{
  arma::Row<size_t> pred(predOut.n_cols);

  // The class of the j-th data point is the index of the maximum value
  // in the j-th column, plus 1 (since element indices start from 0).
  for (size_t j = 0; j < predOut.n_cols; ++j)
  {
    pred(j) = arma::as_scalar(arma::find(
        arma::max(predOut.col(j)) == predOut.col(j), 1)) + 1;
  }

  return pred;
}

In [26]:
/**
 * Returns the accuracy (percentage of correct answers).
 *
 * @param predLabels predicted labels of the data points.
 * @param realY real labels (stored as doubles because they are usually read
 *     from a CSV file that contains many other double values).
 * @return percentage of correct answers.
 */
double accuracy(const arma::Row<size_t>& predLabels, const arma::mat& realY)
{
  // Count how many predicted classes coincide with the real labels.
  size_t success = 0;
  for (size_t j = 0; j < realY.n_cols; j++) {
    if (predLabels(j) == std::round(realY(j))) {
      ++success;
    }
  }

  // Calculating percentage of correctly classified data points.
  return (double)success / (double)realY.n_cols * 100.0;
}

In [34]:
/**
 * Saves predictions into a specifically formatted CSV file, suitable for
 * most Kaggle competitions.
 *
 * @param filename the name of the file.
 * @param header the header of the CSV file.
 * @param predLabels predicted labels of the data points. Classes are
 *     expected to start from 1, while the classes written to the file
 *     start from 0 (as Kaggle usually expects).
 */
void save(const std::string& filename, const std::string& header,
  const arma::Row<size_t>& predLabels)
{
  std::ofstream out(filename);
  out << header << std::endl;
  for (size_t j = 0; j < predLabels.n_cols; ++j)
  {
    // j + 1 because Kaggle row IDs start from 1.
    // predLabels(j) - 1 because the 1st class is 0, the 2nd class is 1,
    // and so on.
    out << j + 1 << "," << std::round(predLabels(j)) - 1;
    // Avoid an empty line at the end of the file.
    if (j < predLabels.n_cols - 1)
    {
      out << std::endl;
    }
  }
  out.close();
}

In [6]:
// Fraction of the dataset held out for validation. The dataset is split
// randomly, and the remainder is used for training.
constexpr double RATIO = 0.1;

// Number of iterations per cycle.
constexpr int ITERATIONS_PER_CYCLE = 10000;

// Number of cycles.
constexpr int CYCLES = 40;

// Step size of the optimizer.
constexpr double STEP_SIZE = 1.2e-3;

// Number of data points in each iteration of SGD.
constexpr int BATCH_SIZE = 50;

In [7]:
// Load the labeled training dataset from the CSV file. In the loaded matrix,
// rows represent features and columns represent data points.
arma::mat tempDataset;

// The original file can be downloaded from https://www.kaggle.com/c/digit-recognizer/data
data::Load("mnist-train.csv", tempDataset, true);

In [8]:
// The original Kaggle dataset CSV file has headings for each column, so it's necessary to get rid of the first row. In Armadillo representation, this corresponds to the first column of our data matrix.
arma::mat dataset = tempDataset.submat(0, 1, tempDataset.n_rows - 1, tempDataset.n_cols - 1);

In [9]:
// Split the dataset into training and validation sets.
arma::mat train, valid;
data::Split(dataset, train, valid, RATIO);

In [10]:
// The train and valid datasets contain both the features and the
// class labels. Split these into separate matrices.
const arma::mat trainX = train.submat(1, 0, train.n_rows - 1, train.n_cols - 1);
const arma::mat validX = valid.submat(1, 0, valid.n_rows - 1, valid.n_cols - 1);

In [11]:
// The NegativeLogLikelihood output layer expects each label to specify the
// class of a data point and to lie in the interval from 1 to the number of
// classes (in this case, from 1 to 10).

// Create labels for the training and validation datasets.
const arma::mat trainY = train.row(0) + 1;
const arma::mat validY = valid.row(0) + 1;

In [12]:
// Specify the NN model. NegativeLogLikelihood is the output layer that
// is used for classification problems. RandomInitialization means that
// the initial weights are generated randomly in the interval from -1 to 1.
FFN<NegativeLogLikelihood<>, RandomInitialization> model;

In [13]:
// Specify the model architecture.
// In this example, the CNN architecture is chosen to be similar to LeNet-5.
// The architecture follows a Conv-ReLU-Pool-Conv-ReLU-Pool-Dense schema. We
// use leaky ReLU activations instead of vanilla ReLU, and standard
// max-pooling for pooling. The first convolution uses 6 filters
// of size 5x5 (and a stride of 1). The second convolution uses 16 filters of
// size 5x5 (stride = 1). The final dense layer is followed by a log-softmax,
// so the outputs are log-probabilities over the output classes.

// Layers schema.
// 28x28x1 --- conv (6 filters of size 5x5. stride = 1) ---> 24x24x6
// 24x24x6 --------------- Leaky ReLU ---------------------> 24x24x6
// 24x24x6 --- max pooling (over 2x2 fields. stride = 2) --> 12x12x6
// 12x12x6 --- conv (16 filters of size 5x5. stride = 1) --> 8x8x16
// 8x8x16  --------------- Leaky ReLU ---------------------> 8x8x16
// 8x8x16  --- max pooling (over 2x2 fields. stride = 2) --> 4x4x16
// 4x4x16  ------------------- Dense ----------------------> 10

// Add the first convolution layer.
model.Add<Convolution<> >(
  1,  // Number of input activation maps.
  6,  // Number of output activation maps.
  5,  // Filter width.
  5,  // Filter height.
  1,  // Stride along width.
  1,  // Stride along height.
  0,  // Padding width.
  0,  // Padding height.
  28, // Input width.
  28  // Input height.
  );

// Add the first leaky ReLU.
model.Add<LeakyReLU<> >();

// Add the first pooling layer. Pools over 2x2 fields in the input.
model.Add<MaxPooling<> >(
  2,  // Width of field.
  2,  // Height of field.
  2,  // Stride along width.
  2,  // Stride along height.
  true // Round the output size down (floor).
  );

// Add the second convolution layer.
model.Add<Convolution<> >(
  6,  // Number of input activation maps.
  16, // Number of output activation maps.
  5,  // Filter width.
  5,  // Filter height.
  1,  // Stride along width.
  1,  // Stride along height.
  0,  // Padding width.
  0,  // Padding height.
  12, // Input width.
  12  // Input height.
  );

// Add the second leaky ReLU.
model.Add<LeakyReLU<> >();

// Add the second pooling layer.
model.Add<MaxPooling<> >(2, 2, 2, 2, true);

// Add the final dense layer.
model.Add<Linear<> >(16*4*4, 10);
model.Add<LogSoftMax<> >();

In [14]:
// Set the parameters of the Stochastic Gradient Descent (SGD) optimizer,
// which uses the Adam update rule here.
ens::SGD<ens::AdamUpdate> optimizer(
    // Step size of the optimizer.
    STEP_SIZE,
    // Batch size. Number of data points that are used in each iteration.
    BATCH_SIZE,
    // Max number of iterations.
    ITERATIONS_PER_CYCLE,
    // Tolerance, used as a stopping condition. Such a small value
    // means we almost never stop by this condition, and continue gradient
    // descent until the maximum number of iterations is reached.
    1e-8,
    // Shuffle. Whether the optimizer should visit the data points of the
    // dataset in random order.
    true,
    // Adam update policy (epsilon, beta1, beta2).
    ens::AdamUpdate(1e-8, 0.9, 0.999));

In [15]:
for (int i = 0; i < CYCLES; i++)
{
  // Train the CNN model. If this is the first iteration, weights are
  // randomly initialized between -1 and 1. Otherwise, the values of weights
  // from the previous iteration are used.
  model.Train(trainX, trainY, optimizer);

  // Don't reset the optimizer's parameters between cycles.
  optimizer.ResetPolicy() = false;

  // Matrix to store the predictions on train and validation datasets.
  arma::mat predOut;
  // Get predictions on training data points.
  model.Predict(trainX, predOut);
  // Calculate accuracy on training data points.
  arma::Row<size_t> predLabels = getLabels(predOut);
  double trainAccuracy = accuracy(predLabels, trainY);
  // Get predictions on validation data points.
  model.Predict(validX, predOut);
  // Calculate accuracy on validation data points.
  predLabels = getLabels(predOut);
  double validAccuracy = accuracy(predLabels, validY);

  std::cout << "Epoch " << i
            << ":\tTraining Accuracy = " << trainAccuracy << "%,"
            << "\tValidation Accuracy = " << validAccuracy << "%" << std::endl;
}

[WARN ] The optimizer's maximum number of iterations is less than the size of the dataset; the optimizer will not pass over the entire dataset. To fix this, modify the maximum number of iterations to be at least equal to the number of points of your dataset (37800).
Epoch 0:	Training Accuracy = 42.7672%,	Validation Accuracy = 42.8571%
[WARN ] The optimizer's maximum number of iterations is less than the size of the dataset; the optimizer will not pass over the entire dataset. To fix this, modify the maximum number of iterations to be at least equal to the number of points of your dataset (37800).
Epoch 1:	Training Accuracy = 60.9418%,	Validation Accuracy = 60.7857%
[WARN ] The optimizer's maximum number of iterations is less than the size of the dataset; the optimizer will not pass over the entire dataset. To fix this, modify the maximum number of iterations to be at least equal to the number of points of your dataset (37800).
Epoch 2:	Training Accuracy

In [27]:
std::cout << "Predicting ..." << std::endl;
// Load the test dataset. The original file can be downloaded from
// https://www.kaggle.com/c/digit-recognizer/data.
data::Load("mnist-test.csv", tempDataset, true);

Predicting ...


In [28]:
// As before, it's necessary to get rid of the column headings.
arma::mat testX = tempDataset.submat(0, 1, tempDataset.n_rows - 1, tempDataset.n_cols - 1);
// Matrix to store the predictions on test dataset.
arma::mat testPredOut;

// Get predictions on test data points.
model.Predict(testX, testPredOut);
// Generate labels for the test dataset.
arma::Row<size_t> testPred = getLabels(testPredOut);
std::cout << "Saving predicted labels to results.csv."<< std::endl;

Saving predicted labels to results.csv.


In [42]:
// Save the results into a Kaggle-compatible CSV file.
save("kaggel-results.csv", "ImageId,Label", testPred);
std::cout << "Results were saved to kaggel-results.csv. This file can be uploaded to "
    << "https://www.kaggle.com/c/digit-recognizer/submissions." << std::endl;

Results were saved to kaggel-results.csv. This file can be uploaded to https://www.kaggle.com/c/digit-recognizer/submissions.
