# DL4J Neural Net Computer Vision Example

Neural networks are machine learning algorithms used for classification and prediction, and they work well with high-dimensional data. This notebook provides sample code showing how to structure, run and save a neural net using Deeplearning4j (DL4J) for a simplified computer vision problem. The example uses animal images, and the goal is to correctly identify the animal in each picture.

<img src="nn_diagram.jpg">

Neural nets work especially well with image and text datasets that have many examples of each class. The data is converted to a numerical representation and fed into the model, where each node in the net applies a linear transformation followed by a non-linear one.

>***linear equation***<br>
>$\mathbf{z_k}= \sum_{j=1}^{n} \mathbf{w_{k,j}}\mathbf{x_j} + \mathbf{b_k}$, where $n$ is the number of inputs to node $k$


>***non-linear equation - sigmoid***<br>
>$\mathbf{y_k = \dfrac{1}{1+\mathrm{e}^{-z_k}}}$
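The two equations can be traced by hand for a single node. The sketch below is plain Scala with no DL4J dependency; the inputs, weights and bias are made-up numbers chosen only for illustration.

```scala
// Forward pass of one node k: a weighted sum (linear step) followed by a sigmoid (non-linear step).
object NodeForwardPass {
  // Linear step: z_k = sum over j of w_{k,j} * x_j, plus the bias b_k
  def linear(weights: Array[Double], inputs: Array[Double], bias: Double): Double = {
    require(weights.length == inputs.length, "expected one weight per input")
    weights.zip(inputs).map { case (w, x) => w * x }.sum + bias
  }

  // Non-linear step: sigmoid y = 1 / (1 + e^(-z)), which squashes z into (0, 1)
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def main(args: Array[String]): Unit = {
    val x = Array(0.5, -1.0, 2.0) // illustrative inputs
    val w = Array(0.4, 0.3, 0.1)  // illustrative weights for node k
    val b = 0.1                   // illustrative bias for node k
    val z = linear(w, x, b)       // 0.2 - 0.3 + 0.2 + 0.1
    println(f"z = $z%.4f, y = ${sigmoid(z)}%.4f") // prints z = 0.2000, y = 0.5498
  }
}
```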

The input ($\mathbf{x}$) is the data fed into the net. Each node multiplies each input by a weight ($\mathbf{w}$), sums the products and then adds a bias ($\mathbf{b}$). The network's weights and biases, also known as its parameters ($\mathbf{\theta}$), are adjusted to fit the model to its objective (goal). To accomplish this, gradient descent optimization techniques are used to ***find the optimal weights*** that lead to correct classification. Gradient descent computes the derivative (gradient) of the model's loss (prediction error) with respect to each weight, and shifts each weight in the direction that reduces the error.
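To make the update rule concrete, here is a sketch of gradient descent on a single sigmoid node. It assumes binary cross-entropy as the loss, for which the gradient with respect to each weight simplifies to $(y - t)\,x_j$, with $t$ the 0/1 target. The training example, starting weights and learning rate are illustrative, and this is not how DL4J organizes its own training code.

```scala
object SingleNodeGradientDescent {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Binary cross-entropy for one example: -(t*log(y) + (1-t)*log(1-y))
  def loss(y: Double, t: Double): Double = -(t * math.log(y) + (1 - t) * math.log(1 - y))

  def predict(w: Array[Double], b: Double, x: Array[Double]): Double =
    sigmoid(w.zip(x).map { case (wj, xj) => wj * xj }.sum + b)

  // One gradient-descent step. For sigmoid + cross-entropy, dLoss/dz = y - t,
  // so dLoss/dw_j = (y - t) * x_j and dLoss/db = (y - t).
  def step(w: Array[Double], b: Double, x: Array[Double], t: Double, alpha: Double): (Array[Double], Double) = {
    val error = predict(w, b, x) - t
    val newW = w.zip(x).map { case (wj, xj) => wj - alpha * error * xj }
    (newW, b - alpha * error)
  }

  def main(args: Array[String]): Unit = {
    val x = Array(1.0, 2.0) // one illustrative training example
    val t = 1.0             // its label
    var w = Array(0.0, 0.0)
    var b = 0.0
    for (i <- 1 to 5) {
      val (nw, nb) = step(w, b, x, t, alpha = 0.5)
      w = nw
      b = nb
      println(f"step $i: loss = ${loss(predict(w, b, x), t)}%.4f") // loss shrinks each step
    }
  }
}
```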

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/1e/Extrema_example.svg/600px-Extrema_example.svg.png">

<center>- Wikipedia</center>

More information on DL4J and how neural nets function can be found in the links below and the References section:
- DL4J http://deeplearning4j.org/documentation.html
- Neural Nets for Newbies https://youtu.be/Cu6A96TUy_o

## Requirements

- [Java 7+](http://nd4j.org/getstarted.html#java)
- [Maven 3.3.9](http://nd4j.org/getstarted.html#maven)

## Setting Dependencies

In [1]:
// The line below is for the Jupyter-Scala kernel. If iScala is used instead, change it to iScala's way of loading dependencies
load.resolver("DefaultMavenRepository" at "https://repo1.maven.org/maven2")



In [2]:
val dl4jVersion = "0.4-rc3.10"
val nd4jVersion = "0.4-rc3.10"
val canovaVersion = "0.0.0.16"


In [None]:
load.ivy("org.deeplearning4j" % "deeplearning4j-core" % dl4jVersion)
load.ivy("org.deeplearning4j" % "deeplearning4j-nlp" % dl4jVersion)
load.ivy("org.deeplearning4j" % "deeplearning4j-ui" % dl4jVersion)
load.ivy("org.nd4j" % "nd4j-x86" % nd4jVersion)
load.ivy("org.nd4j" % "canova-spark" % canovaVersion)
load.ivy("org.nd4j" % "canova-nd4j-codec" % canovaVersion)
load.ivy("org.nd4j" % "canova-nd4j-image" % canovaVersion)

In [None]:
import java.io.{File, IOException}
import java.util
import java.util.Random

import org.apache.commons.io.{FilenameUtils}
import org.canova.api.io.filters.BalancedPathFilter
import org.canova.api.io.labels.ParentPathLabelGenerator
import org.canova.api.records.reader.RecordReader
import org.canova.api.split.{FileSplit, InputSplit}
import org.canova.image.loader.BaseImageLoader
import org.canova.image.recordreader.ImageRecordReader
import org.deeplearning4j.datasets.canova.RecordReaderDataSetIterator
import org.deeplearning4j.datasets.iterator.{DataSetIterator, MultipleEpochsIterator}
import org.deeplearning4j.nn.api.OptimizationAlgorithm
import org.deeplearning4j.nn.conf.layers.{ConvolutionLayer, DenseLayer, LocalResponseNormalization, OutputLayer, SubsamplingLayer}
import org.deeplearning4j.nn.conf.{GradientNormalization, MultiLayerConfiguration, NeuralNetConfiguration, Updater}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.nn.weights.WeightInit
import org.deeplearning4j.optimize.listeners.ScoreIterationListener
import org.deeplearning4j.util.NetSaverLoaderUtils
import org.nd4j.linalg.dataset.DataSet
import org.nd4j.linalg.lossfunctions.LossFunctions

## Loading Data

The first step is to clean up and load the data for training and testing.
- store the data in a folder that the model can load from
- confirm the formats are the same (e.g. pictures exist and have similar sizes)
- convert data to a DataSet structure (numerical feature format and labels)
- set up an iterator that loads the data in batches

Be aware of the difference between supervised and unsupervised learning: supervised learning uses labeled data, while unsupervised learning uses unlabeled data. The images in this example are labeled, so this is a supervised problem.

### *Data*

Images provided in this example are from the U.S. Fish and Wildlife Service and are in the public domain. The dataset has four categories with about 20 images each:

- bear
- deer
- duck
- turtle

The images vary in pixel size, and they are all RGB encoded, which means they have 3 color channels.

<center>***Image Example***</center>

<img src="animals/turtle/Blandings_Turtle.jpg">
<center> - U.S. Fish and Wildlife Service</center>

The code below loads the data into a DataSetIterator that can be fed into the network. ImageRecordReader loads and vectorizes the images, and RecordReaderDataSetIterator converts them into DataSet batches. The iterator is lazy: it only loads a batch of data when next is called.

In [None]:
// Load images and labels
// seed and epochs are also set in the Variables section; they are defined here
// as well because this cell runs first and uses both values
val seed = 42
val epochs = 5
val height = 50
val width = 50
val channels = 3
val numExamples = 80
val numLabels = 4
val batchSize = 20
val splitTrainTest = 0.8
val rng = new Random(seed)
val labelPosition = 1

val mainPath: File = new File("animals")
// Label each image with the name of its parent folder (bear, deer, duck, turtle)
val labelMaker = new ParentPathLabelGenerator()

// Define how to filter and load data into batches
val recordReader: RecordReader = new ImageRecordReader(width, height, channels, labelMaker)
val fileSplit: FileSplit = new FileSplit(mainPath, BaseImageLoader.ALLOWED_FORMATS, rng)
// Balance the number of images sampled per label
val pathFilter: BalancedPathFilter = new BalancedPathFilter(rng, BaseImageLoader.ALLOWED_FORMATS, labelMaker, numExamples, numLabels, 0, batchSize)

// Define train and test split; the weights are relative proportions (80% train / 20% test)
val inputSplit: Array[InputSplit] = fileSplit.sample(pathFilter, numExamples * splitTrainTest, numExamples * (1 - splitTrainTest))
val trainData: InputSplit = inputSplit(0)
val testData: InputSplit = inputSplit(1)

// Define how to load data into network
try {
  recordReader.initialize(trainData)
} catch {
  case ioe: IOException => ioe.printStackTrace()
  case e: InterruptedException => e.printStackTrace()
}
var dataIter: DataSetIterator = new RecordReaderDataSetIterator(recordReader, batchSize, labelPosition, numLabels)
val multDataIter: MultipleEpochsIterator = new MultipleEpochsIterator(epochs, dataIter)

When working with computer vision models, you will want many more examples to run through your model for it to build a solid representation of the different animals. The sample set is too small to achieve high accuracy scores. Some approaches to expand the dataset when you have sparse examples are the following:
- rotate images by various degrees, or flip them
- change the color saturation (including change to grey scale)
- change image contrast or brightness
- crop the image in different positions
- search and download more examples
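Several of these transformations are simple array operations. The sketch below assumes an image stored as a plain `Array[Array[Array[Double]]]` indexed `[channel][row][column]`, with RGB channels in that order and pixel values in 0 to 255; it does not use Canova or DL4J image classes. The grayscale weights 0.299, 0.587 and 0.114 are the common ITU-R BT.601 luma coefficients.

```scala
object Augment {
  type Image = Array[Array[Array[Double]]] // [channel][row][column]

  // Mirror every row left-to-right in every channel.
  def flipHorizontal(img: Image): Image = img.map(channel => channel.map(_.reverse))

  // Change brightness by adding a constant, clamping each pixel to [0, 255].
  def brighten(img: Image, delta: Double): Image =
    img.map(_.map(_.map(p => math.min(255.0, math.max(0.0, p + delta)))))

  // Convert RGB to one luma channel, then repeat it 3 times so the result
  // still has the 3 channels the network expects.
  def toGrayscale(img: Image): Image = {
    require(img.length == 3, "expects RGB input")
    val (r, g, b) = (img(0), img(1), img(2))
    val gray = Array.tabulate(r.length, r(0).length) { (i, j) =>
      0.299 * r(i)(j) + 0.587 * g(i)(j) + 0.114 * b(i)(j)
    }
    Array.fill(3)(gray.map(_.clone))
  }

  // Crop a size x size window whose top-left corner is at (top, left).
  def crop(img: Image, top: Int, left: Int, size: Int): Image =
    img.map(_.slice(top, top + size).map(_.slice(left, left + size)))
}
```

Each function returns a new image, so one source photo can yield several training examples (for instance the original, its mirror image and a brighter crop).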

## Configuring

Model configuration takes experimentation to get familiar with all the options. Key attributes you can define in your configuration include:

- ***weightInit*** = how to initialize weights which is typically a variation on random unless you load weights defined from a previous model
- ***seed*** = locks random weight initialization for consistent results when checking and adjusting hyperparameter impact
- ***activation*** = non-linear function applied to the output of each node in the layer
- ***iterations*** = number of parameter updates (optimization steps) to perform on each mini-batch
- ***optimizationAlgo*** = optimization algorithm that uses the loss-function gradients to compute parameter updates
- ***updater*** = rule that turns gradients into parameter updates (e.g. Nesterovs applies Nesterov momentum to each update)
- ***learningRate ($\alpha$)*** = step size of each parameter update; too large and training can diverge, too small and it converges slowly
- ***regularization*** = whether to penalize large weights (weight decay) to help prevent overfitting (e.g. $\ell_1$ pushes many weights to exactly zero, producing sparse weights, while $\ell_2$ shrinks all weights toward zero)
- ***gradientNormalization*** = rescales or clips gradients before they are applied, keeping them in a stable range and guarding against exploding gradients
- ***layer*** = construct that defines each layer; requires a layer index when there is more than one layer
- ***backprop*** = true or false signals whether to apply backprop to the model for weight updates

Note that most of these can be defined globally or inside the construct of each layer.
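To see how learningRate, momentum (the NESTEROVS updater) and l2 regularization fit together, here is a sketch of one common way to write a Nesterov momentum update with L2 weight decay, following the formulation in the Convolutional Neural Networks for Visual Recognition (CS231n) notes listed in the References. It illustrates the math only; it is not DL4J's internal code, which may differ in details such as how the L2 term is scaled. The toy loss is an assumption, and the hyperparameter values are borrowed from the Tiny ImageNet configuration below.

```scala
object NesterovL2 {
  final case class State(w: Vector[Double], v: Vector[Double]) // weights and velocity

  // One update given the loss gradient `grad` at the current weights:
  //   g  = grad + l2 * w              (L2 weight decay adds l2 * w to the gradient)
  //   v' = mu * v - lr * g            (momentum accumulates past updates)
  //   w' = w - mu * v + (1 + mu) * v' (Nesterov "look-ahead" form)
  def step(s: State, grad: Vector[Double], lr: Double, mu: Double, l2: Double): State = {
    val g = grad.zip(s.w).map { case (gi, wi) => gi + l2 * wi }
    val vNew = s.v.zip(g).map { case (vi, gi) => mu * vi - lr * gi }
    val wNew = s.w.indices.map(i => s.w(i) - mu * s.v(i) + (1 + mu) * vNew(i)).toVector
    State(wNew, vNew)
  }

  def main(args: Array[String]): Unit = {
    // Toy loss f(w) = sum of w_i^2, whose gradient is 2w; the minimum is at w = 0
    var s = State(Vector(1.0, -2.0), Vector(0.0, 0.0))
    for (_ <- 1 to 200) s = step(s, s.w.map(2 * _), lr = 0.01, mu = 0.9, l2 = 0.04)
    println(s.w) // both weights end up close to 0
  }
}
```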

### *Variables*

In [None]:
val seed = 42
val iterations = 1
val epochs = 5
val listenerFreq = 1
val weightInit = WeightInit.XAVIER
val activation = "relu"
val optimizer = OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT

***Computer Vision Common Configuration***<br>
It's good to start with common configuration approaches like the ones below, then use training and tuning to adjust the hyperparameters. More information on this topic is covered in the Tuning section.

- ***WeightInit.XAVIER*** = initializes weights randomly and scales them by the number of inputs ($n_{in}$) and outputs ($n_{out}$) of the layer, so that the variance of the signal stays roughly constant from layer to layer; Glorot and Bengio's formulation targets a weight variance of
> $\mathbf{Var(w) = \dfrac{2}{n_{in}+ n_{out}}}$
- ***"relu"*** = rectified linear unit, an activation function that helps prevent vanishing gradients because its gradient is 1 for every positive input, so it does not saturate the way sigmoid does
> $\mathbf{f(x)=max(0,x)}$
- ***LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD*** *(aka cross-entropy)* = evaluates and scores model error in the output layer, where $y'$ is the true (one-hot) label and $y$ is the predicted probability
> $\mathbf{H_{y'}(y) = -\sum_{i} y'_i \log(y_i)}$
- ***OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT*** = updates weights and bias using the error gradient computed on each mini-batch (rather than the full training set), with the step shaped by the updater
> $\mathbf{w = w - \alpha \dfrac{\partial H_{y'}(y)}{\partial w}}$ <br>
> $\mathbf{b = b - \alpha \dfrac{\partial H_{y'}(y)}{\partial b}}$
- ***ConvolutionLayer*** = layer in which each node is connected to a small, overlapping region of the input; the kernel size, stride, padding and number of filters (feature maps) determine how the filters are convolved over the input and the shape of the output
- ***SubsamplingLayer*** = layer type that downsamples the spatial dimensions of its input (e.g. max pooling keeps the largest value in each window), typically with a kernel size of 2 or 3 and a stride of 2
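Kernel size, stride and padding together determine each layer's output size: $\lfloor (n - k + 2p)/s \rfloor + 1$ for an $n \times n$ input, a $k \times k$ kernel, stride $s$ and padding $p$. The sketch below applies this formula to the AlexNet configuration further down with this notebook's 50x50 input. It assumes the pooling layers, which set no padding, use a padding of 0. By this arithmetic the image has shrunk to 2x2 before pool3, which is smaller than its 3x3 window, suggesting that configuration is sized for larger images than the ones used here.

```scala
object ConvShapes {
  // Output width (or height) of a convolution or pooling window:
  // floor((in - kernel + 2 * padding) / stride) + 1
  def outputSize(in: Int, kernel: Int, stride: Int, padding: Int = 0): Int = {
    require(in + 2 * padding >= kernel, s"kernel $kernel does not fit input $in with padding $padding")
    (in - kernel + 2 * padding) / stride + 1
  }

  def main(args: Array[String]): Unit = {
    // (name, kernel, stride, padding) for the spatial layers of the AlexNet configuration
    val alexNet = Seq(
      ("cnn1", 11, 4, 3), ("pool1", 3, 2, 0),
      ("cnn2", 5, 1, 2), ("pool2", 3, 2, 0),
      ("cnn3", 3, 1, 1), ("cnn4", 3, 1, 1), ("cnn5", 3, 1, 1),
      ("pool3", 3, 2, 0))
    var size = 50 // height = width = 50 in this notebook
    for ((name, k, st, p) <- alexNet) {
      if (size + 2 * p >= k) {
        size = outputSize(size, k, st, p)
        println(s"$name -> ${size}x$size")
      } else {
        println(s"$name: ${k}x$k window does not fit a ${size}x$size input")
      }
    }
  }
}
```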

***Tiny ImageNet Example***<br>
Below are two example configurations. The first is adapted from the Tiny ImageNet report (see References), which shows how to build a deep model that is as compact as possible while still being effective at image classification.

In [None]:
// Tiny ImageNet Example
val confTiny: MultiLayerConfiguration = new NeuralNetConfiguration.Builder()
  .weightInit(weightInit)
  .seed(seed)
  .activation(activation)
  .iterations(iterations)
  .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer)
  .optimizationAlgo(optimizer)
  .updater(Updater.NESTEROVS)
  .learningRate(0.01)
  .momentum(0.9)
  .regularization(true)
  .l2(0.04)
  .list()
  .layer(0, new ConvolutionLayer.Builder(5, 5)
    .name("cnn1")
    .nIn(channels)
    .stride(1, 1)
    .padding(2, 2)
    .nOut(32)
    .build())
  .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
    .kernelSize(3, 3)
    .name("pool1")
    .build())
  .layer(2, new LocalResponseNormalization.Builder(3, 5e-05, 0.75).build())
  .layer(3, new ConvolutionLayer.Builder(5, 5)
    .name("cnn2")
    .stride(1, 1)
    .padding(2, 2)
    .nOut(32)
    .build())
  .layer(4, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
    .kernelSize(3, 3)
    .name("pool2")
    .build())
  .layer(5, new LocalResponseNormalization.Builder(3, 5e-05, 0.75).build())
  .layer(6, new ConvolutionLayer.Builder(5, 5)
    .name("cnn3")
    .stride(1, 1)
    .padding(2, 2)
    .nOut(64)
    .build())
  .layer(7, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
    .kernelSize(3, 3)
    .name("pool3")
    .build())
  .layer(8, new DenseLayer.Builder()
    .name("ffn1")
    .nOut(250)
    .dropOut(0.5)
    .build())
  .layer(9, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
    .nOut(numLabels)
    .activation("softmax")
    .build())
  .backprop(true).pretrain(false)
  .cnnInputSize(height, width, channels).build()
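The output layer above turns the final layer's values into class probabilities with softmax and scores them with negative log-likelihood. With a one-hot label, the cross-entropy formula reduces to minus the log of the probability assigned to the correct class. A plain-Scala sketch of that computation, using made-up raw scores for the four animal classes:

```scala
object SoftmaxNll {
  // Softmax: exponentiate each score and normalise so the outputs sum to 1.
  // Subtracting the max first avoids overflow without changing the result.
  def softmax(scores: Array[Double]): Array[Double] = {
    val m = scores.max
    val exps = scores.map(x => math.exp(x - m))
    val total = exps.sum
    exps.map(_ / total)
  }

  // H_{y'}(y) = -sum_i y'_i * log(y_i), with y' the one-hot label and y the predictions
  def crossEntropy(label: Array[Double], predicted: Array[Double]): Double =
    -label.zip(predicted).map { case (t, p) => t * math.log(p) }.sum

  def main(args: Array[String]): Unit = {
    val classes = Array("bear", "deer", "duck", "turtle")
    val scores = Array(2.0, 1.0, 0.1, -1.0) // illustrative raw outputs
    val probs = softmax(scores)
    val label = Array(1.0, 0.0, 0.0, 0.0)   // the image is a bear
    classes.zip(probs).foreach { case (c, p) => println(f"$c%-7s $p%.3f") }
    println(f"loss = ${crossEntropy(label, probs)}%.3f")
  }
}
```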

***AlexNet Example***<br>
The second configuration is a slight variant of AlexNet, which won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) for image classification in 2012.

In [None]:
// AlexNet Example
val nonZeroBias = 1
val dropOut = 0.5
val poolingType: SubsamplingLayer.PoolingType = SubsamplingLayer.PoolingType.MAX

val confAlexNet: MultiLayerConfiguration = new NeuralNetConfiguration.Builder()
    .weightInit(weightInit)
    .seed(seed)
    .activation(activation)
    .iterations(iterations)
    // normalize to prevent vanishing or exploding gradients
    .gradientNormalization(GradientNormalization.RenormalizeL2PerLayer) 
    .optimizationAlgo(optimizer)
    .updater(Updater.NESTEROVS)
    .learningRate(1e-3)
    .momentum(0.9)
    .regularization(true)
    .l2(5 * 1e-4)
    .miniBatch(false)
    .list()
    // ConvolutionLayer.Builder(kernelSize, stride, padding); SubsamplingLayer.Builder(poolingType, kernelSize, stride)
    .layer(0, new ConvolutionLayer.Builder(Array(11, 11), Array(4, 4), Array(3, 3))
            .name("cnn1")
            .nIn(channels)
            .nOut(96)
            .build())
    .layer(1, new LocalResponseNormalization.Builder()
            .name("lrn1")
            .build())
    .layer(2, new SubsamplingLayer.Builder(poolingType, Array(3, 3), Array(2, 2))
            .name("pool1")
            .build())
    .layer(3, new ConvolutionLayer.Builder(Array(5, 5), Array(1, 1), Array(2, 2))
            .name("cnn2")
            .nOut(256)
            .biasInit(nonZeroBias)
            .build())
    .layer(4, new LocalResponseNormalization.Builder()
            .name("lrn2")
            .k(2).n(5).alpha(1e-4).beta(0.75)
            .build())
    .layer(5, new SubsamplingLayer.Builder(poolingType, Array(3, 3), Array(2, 2))
            .name("pool2")
            .build())
    .layer(6, new ConvolutionLayer.Builder(Array(3, 3), Array(1, 1), Array(1, 1))
            .name("cnn3")
            .nOut(384)
            .build())
    .layer(7, new ConvolutionLayer.Builder(Array(3, 3), Array(1, 1), Array(1, 1))
            .name("cnn4")
            .nOut(384)
            .biasInit(nonZeroBias)
            .build())
    .layer(8, new ConvolutionLayer.Builder(Array(3, 3), Array(1, 1), Array(1, 1))
            .name("cnn5")
            .nOut(256)
            .biasInit(nonZeroBias)
            .build())
    .layer(9, new SubsamplingLayer.Builder(poolingType, Array(3, 3), Array(2, 2))
            .name("pool3")
            .build())
    .layer(10, new DenseLayer.Builder()
            .name("ffn1")
            .nOut(4096)
            .biasInit(nonZeroBias)
            .dropOut(dropOut)
            .build())
    .layer(11, new DenseLayer.Builder()
            .name("ffn2")
            .nOut(4096)
            .biasInit(nonZeroBias)
            .dropOut(dropOut)
            .build())
    .layer(12, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
            .name("output")
            .nOut(numLabels)
            .activation("softmax")
            .build())
    .backprop(true)
    .pretrain(false)
    .cnnInputSize(height,width,channels).build()

In [None]:
// Initialize the network; pass confTiny or confAlexNet to MultiLayerNetwork to switch configurations
val network: MultiLayerNetwork = new MultiLayerNetwork(confTiny)
network.init()

***Listeners***

Call setListeners on the network to get information on how the model is performing. ScoreIterationListener is the simplest option: every listenerFreq iterations it logs the network's score, which is the value of the loss function on the current training mini-batch. A score that keeps falling shows the model is converging on the training data. Lower is better, so you typically work to bring the score as close to zero as possible.

In [None]:
network.setListeners(new ScoreIterationListener(listenerFreq))

***Gradients***

Backpropagation computes the gradient of the loss with respect to every weight by passing the error backwards from the output layer through each earlier layer; stochastic gradient descent then uses those gradients to update the weights. Sometimes the score comes out as NaN or 0 because the gradients explode or vanish. In deep nets, the gradient reaching an early layer is a product of terms from all the layers after it. If those terms are small, the gradient can shrink toward zero (vanish), so neurons in the early layers learn much more slowly than neurons in the later layers. If they are large, the gradient can grow until it explodes and the updates no longer carry a useful signal. More information on how to address these issues is in the References section. Just be aware that this is common and requires tuning.
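The vanishing-gradient effect follows from the chain rule. A rough sketch, assuming every layer uses the sigmoid from the start of this notebook (whose derivative is at most 0.25) and ignoring the weights, shows how fast the product of those per-layer factors shrinks with depth. It is a simplification for intuition, not a model of this notebook's ReLU network.

```scala
object VanishingGradient {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)); its largest value is 0.25, at z = 0
  def sigmoidPrime(z: Double): Double = sigmoid(z) * (1 - sigmoid(z))

  // Largest possible product of sigmoid derivatives after passing back through `depth` layers
  def sigmoidFactor(depth: Int): Double = math.pow(sigmoidPrime(0.0), depth)

  def main(args: Array[String]): Unit =
    for (depth <- Seq(1, 5, 10, 20))
      println(f"$depth%2d layers: ${sigmoidFactor(depth)}%.2e")
}
```

ReLU avoids this particular shrinkage because its derivative is exactly 1 for positive inputs, which is one reason the configurations above use it.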

## Training

Once you've loaded the data and initialized the model configuration, train the model by calling fit on the configured network and passing in the dataset. The goal of training is to find weights and bias that will help the model classify with high accuracy while generalizing enough to perform well on new data the model hasn't seen yet.

In [None]:
network.fit(multDataIter)

## Tuning

Along with loading data and training time, tuning is one of the key challenges in producing effective neural nets. Tuning is the process of selecting hyperparameters (such as the learning rate) to obtain good performance. If these hyperparameters are poorly chosen, the network may learn very slowly or not at all.

To get a good sense of how to tune, spend time running different models and reading academic papers that outline various approaches. Below are a few pointers to get you started:

***General***

Start with as few hyperparameters as possible in the configuration and focus on improving scores with those first. Try tuning one hyperparameter at a time and keep the others fixed. When it seems you can no longer improve the scores on it, change to a new one and be willing to go back to the first after you've made adjustments to other hyperparameters. 

***Learning Rate***

The learning rate is a good hyperparameter to start with. Watch how the score changes: if it decreases smoothly until the final epoch, that is a good value to work with. If the score decreases only very slowly, increase the learning rate. If the score oscillates heavily or climbs, decrease the learning rate. Initially, try shifting the rate by an order of magnitude (a factor of 10), then make smaller adjustments as you get closer to a smooth decrease in score.
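The effect of the learning rate can be reproduced on the simplest possible loss, $f(w) = w^2$, whose gradient is $2w$. Each gradient-descent step multiplies $w$ by $(1 - 2\alpha)$, so the weight converges smoothly for small $\alpha$, overshoots and bounces around the minimum for $\alpha$ between 0.5 and 1, and diverges (the loss climbs) once $\alpha$ exceeds 1. This is a toy illustration, not a DL4J run.

```scala
object LearningRateDemo {
  // Run `steps` gradient-descent updates on f(w) = w^2 (gradient 2w)
  // and return the loss after each step.
  def losses(alpha: Double, w0: Double = 1.0, steps: Int = 6): Seq[Double] =
    Iterator.iterate(w0)(w => w - alpha * 2 * w).drop(1).take(steps).map(w => w * w).toList

  def main(args: Array[String]): Unit =
    for (alpha <- Seq(0.1, 0.9, 1.1)) {
      val ls = losses(alpha).map(l => f"$l%.3f").mkString(" ")
      println(s"alpha = $alpha: $ls")
    }
}
```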

***Mini-batch Size***

Mini-batch size also matters when tuning. If it's too small, you don't get the full benefit of the matrix library's optimizations; if it's too large, the weights are updated fewer times per epoch. The mini-batch size is largely independent of the other hyperparameters, so you don't need them tuned before finding a good mini-batch size. Compare how quickly accuracy improves with different sizes to determine which works best. Common mini-batch sizes range from 30 to 120.
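The trade-off is easiest to see by counting updates. Each epoch performs one parameter update per mini-batch (assuming iterations = 1, as in this notebook), so larger batches mean fewer, though less noisy, updates per epoch. A quick calculation, assuming the 64 training images that an 80/20 split of the 80 examples would give:

```scala
object BatchMath {
  // Number of mini-batches in one epoch; the last batch may be smaller than batchSize.
  def updatesPerEpoch(numTrain: Int, batchSize: Int): Int = (numTrain + batchSize - 1) / batchSize

  def main(args: Array[String]): Unit = {
    val numTrain = 64 // assumed: 80% of the 80 example images
    for (batch <- Seq(8, 20, 32, 64))
      println(s"batch $batch -> ${updatesPerEpoch(numTrain, batch)} updates per epoch")
  }
}
```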

***Batch Normalization***

Batch normalization has become a popular technique for training deep neural nets because it leads to faster learning and higher overall accuracy. It lets you use higher learning rates and can reduce the need for regularization techniques like dropout. When passing in input, it is common to scale it to zero mean and unit variance, but during training the parameters of each layer keep changing, so the distribution of the inputs to later layers keeps shifting; the batch normalization paper calls this "internal covariate shift". Applying batch normalization between layers, using the statistics of each mini-batch, re-normalizes each layer's inputs.
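The core of batch normalization is a per-feature standardization over the mini-batch, followed by a learned scale ($\gamma$) and shift ($\beta$). Below is a sketch of the training-time forward pass for one feature, following the batch normalization paper in the References. The epsilon value and sample activations are illustrative, and the running averages used at inference time are omitted.

```scala
object BatchNormSketch {
  // Normalize one feature across a mini-batch, then scale by gamma and shift by beta:
  //   x_hat = (x - mean) / sqrt(variance + eps);  y = gamma * x_hat + beta
  def forward(batch: Array[Double], gamma: Double, beta: Double, eps: Double = 1e-5): Array[Double] = {
    val mean = batch.sum / batch.length
    val variance = batch.map(x => (x - mean) * (x - mean)).sum / batch.length
    batch.map(x => gamma * (x - mean) / math.sqrt(variance + eps) + beta)
  }

  def main(args: Array[String]): Unit = {
    val activations = Array(4.0, 8.0, 6.0, 2.0) // one feature across a mini-batch of 4
    val out = forward(activations, gamma = 1.0, beta = 0.0)
    println(out.map(v => f"$v%.3f").mkString(", ")) // roughly zero mean, unit variance
  }
}
```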

***Automated Tuning***

Manual tuning is a great way to get a feel for how hyperparameters behave, but when you want quick results, automated tuning techniques can cut down the time spent searching. There are many different approaches to try, such as grid search, random search and Bayesian hyperparameter optimization.
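Random search is the simplest of these to set up: sample each hyperparameter from a range, score a model for each sample, and keep the best. The sketch below samples the learning rate and l2 strength on a log scale. The ranges are assumptions, and `toyScore` is a hypothetical stand-in for training the network and returning a validation error; in practice it would build a configuration, call fit and evaluate on held-out data.

```scala
import scala.util.Random

object RandomSearch {
  final case class Trial(learningRate: Double, l2: Double)

  // Log-uniform sample between 10^lowExp and 10^highExp
  def logUniform(rng: Random, lowExp: Double, highExp: Double): Double =
    math.pow(10, lowExp + rng.nextDouble() * (highExp - lowExp))

  // Evaluate `numTrials` random configurations and return the one with the lowest score.
  def search(numTrials: Int, rng: Random)(score: Trial => Double): (Trial, Double) =
    (1 to numTrials)
      .map(_ => Trial(logUniform(rng, -4, -1), logUniform(rng, -5, -2)))
      .map(t => (t, score(t)))
      .minBy(_._2)

  def main(args: Array[String]): Unit = {
    // Hypothetical toy score, smallest near learningRate = 0.01 and l2 = 0.0005;
    // replace with real training and validation.
    def toyScore(t: Trial): Double =
      math.pow(math.log10(t.learningRate) + 2, 2) + math.pow(math.log10(t.l2) - math.log10(5e-4), 2)
    val (best, bestScore) = search(50, new Random(42))(toyScore)
    println(f"best learningRate = ${best.learningRate}%.2e, l2 = ${best.l2}%.2e, score = $bestScore%.3f")
  }
}
```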


For more information on tuning, check out the references below.

## Evaluating

After the model converges with regard to its loss function, you can run new test data through the model to see how well it generalizes its predictions. The test data should be a dataset that was not seen during training.

Example performance indicators:
- ***accuracy*** = number of correct predictions divided by the total number of test examples
- ***precision*** = true positives (correct predictions of a class) divided by all predictions of that class
- ***recall*** = true positives divided by all test examples that actually belong to that class
- ***f1-score*** = harmonic mean of precision and recall, balancing the two
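For a multi-class problem like this one, precision, recall and F1 are computed per class by treating that class as "positive" and every other class as "negative". A plain-Scala sketch computing them from actual and predicted class indices; the labels and predictions are made up for illustration.

```scala
object Metrics {
  // Accuracy: fraction of predictions that match the actual class.
  def accuracy(actual: Seq[Int], predicted: Seq[Int]): Double =
    actual.zip(predicted).count { case (a, p) => a == p }.toDouble / actual.size

  // Precision, recall and F1 for one class, treating it as the positive class.
  def perClass(actual: Seq[Int], predicted: Seq[Int], cls: Int): (Double, Double, Double) = {
    val tp = actual.zip(predicted).count { case (a, p) => a == cls && p == cls }
    val predictedPos = predicted.count(_ == cls)
    val actualPos = actual.count(_ == cls)
    val precision = if (predictedPos == 0) 0.0 else tp.toDouble / predictedPos
    val recall = if (actualPos == 0) 0.0 else tp.toDouble / actualPos
    val f1 = if (precision + recall == 0) 0.0 else 2 * precision * recall / (precision + recall)
    (precision, recall, f1)
  }

  def main(args: Array[String]): Unit = {
    val classes = Seq("bear", "deer", "duck", "turtle")
    val actual = Seq(0, 0, 1, 1, 2, 2, 3, 3)    // illustrative test labels
    val predicted = Seq(0, 1, 1, 1, 2, 0, 3, 3) // illustrative predictions
    println(f"accuracy = ${accuracy(actual, predicted)}%.3f")
    for ((name, i) <- classes.zipWithIndex) {
      val (prec, rec, f1) = perClass(actual, predicted, i)
      println(f"$name%-7s precision = $prec%.3f recall = $rec%.3f f1 = $f1%.3f")
    }
  }
}
```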

In [None]:
recordReader.initialize(testData)
dataIter = new RecordReaderDataSetIterator(recordReader, batchSize, labelPosition, numLabels)
val eval = network.evaluate(dataIter, dataIter.getLabels())
print(eval.stats(true))

## Saving

Save the model configuration and parameters (weights and bias) when you are satisfied with its evaluation scores, or when you need to take a break from training and don't want to lose the progress you've made.

In [None]:
// Save into the notebook's working directory
val basePath = System.getProperty("user.dir") + File.separator

In [None]:
NetSaverLoaderUtils.saveNetworkAndParameters(network, basePath)
NetSaverLoaderUtils.saveUpdators(network, basePath)

## Final Note

Now that you have trained, tuned, evaluated and saved your network and parameters, you can work on applying it to new image datasets and other problems. Go forth and have fun classifying images.

## References

For more information on how to develop neural nets as well as the datasets used here, below are additional resources to explore.

- Skymind: http://www.skymind.io/
- Tiny ImageNet Classification with CNN: http://cs231n.stanford.edu/reports/leonyao_final.pdf
- AlexNet: http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf & https://github.com/BVLC/caffe/blob/master/models/bvlc_alexnet/train_val.prototxt
- Neural Networks and Deep Learning: http://neuralnetworksanddeeplearning.com/chap3.html
- Neural Networks: http://nbviewer.jupyter.org/github/masinoa/machine_learning/blob/master/04_Neural_Networks.ipynb
- Visual Information Theory: https://colah.github.io/posts/2015-09-Visual-Information/
- Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift: http://jmlr.org/proceedings/papers/v37/ioffe15.pdf
- Deep Learning Book: http://www.deeplearningbook.org/
- Neural Networks for Machine Learning: https://www.coursera.org/course/neuralnets
- Convolutional Neural Networks for Visual Recognition: http://cs231n.github.io/
- ImageNet: http://image-net.org/
- U.S. Fish and Wildlife Service (animal sample dataset): http://digitalmedia.fws.gov/cdm/