# Learn Caffe
- Neo Xing, 2016/10
- **Incomplete...**
- Reference
    - [Caffe Tutorial](http://caffe.berkeleyvision.org/tutorial/)
    - [Brewing Deep Networks With Caffe](https://docs.google.com/presentation/d/1AuiPxUy7-Dgb36Q_SN8ToRrq6Nk0X-sOFmr7UnbAAHI/edit?usp=sharing)


## I Tutorial

### 1. Nets, Layers, and Blobs
- **the anatomy of a Caffe model**
- Caffe defines a net layer-by-layer in its own model schema. The network defines the entire model bottom-to-top from input data to loss.

- As data and derivatives flow through the network in the forward and backward passes, Caffe stores, communicates, and manipulates the information as blobs: the blob is the standard array and unified memory interface for the framework.
- The layer comes next as the foundation of both model and computation. 
- The net follows as the collection and connection of layers. 

#### Blobs
- Caffe stores and communicates data using blobs. Blobs provide a unified memory interface holding data; e.g., batches of images, model parameters, and derivatives for optimization.

- The conventional blob dimensions for batches of image data are number N x channel K x height H x width W. Blob memory is row-major in layout, so the last / rightmost dimension changes fastest. For example, in a 4D blob, the value at index (n, k, h, w) is physically located at index ((n * K + k) * H + h) * W + w.
    - Number N is the batch size
    - Channel K is the feature dimension, for RGB images K = 3
    
- A blob stores two chunks of memory, data and diff. The former is the normal data that we pass along, and the latter is the gradient computed by the network.
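
The layout described above can be sketched in plain Python. `ToyBlob` is a hypothetical stand-in, not Caffe's `Blob` class; it only illustrates the two buffers (data and diff) and the row-major offset formula.

```python
class ToyBlob:
    """Toy stand-in for a Caffe blob: two flat buffers plus a shape."""

    def __init__(self, num, channels, height, width):
        self.shape = (num, channels, height, width)
        count = num * channels * height * width
        self.data = [0.0] * count  # values passed along in the forward pass
        self.diff = [0.0] * count  # gradients computed in the backward pass

    def offset(self, n, k, h, w):
        """Row-major flat index of element (n, k, h, w)."""
        _, K, H, W = self.shape
        return ((n * K + k) * H + h) * W + w


blob = ToyBlob(2, 3, 4, 5)        # batch of 2 three-channel 4x5 images
blob.data[blob.offset(1, 2, 3, 4)] = 1.0
print(blob.offset(0, 0, 0, 1))    # 1: the width index changes fastest
print(blob.offset(1, 2, 3, 4))    # 119: last slot of the 120-entry buffer
```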

#### Layer computation and connections
- The layer is the essence of a model and the fundamental unit of computation. Caffe already provides most of the layer types needed for state-of-the-art deep learning tasks.

- A layer takes input through bottom connections and makes output through top connections.
- Each layer type defines three critical computations: setup, forward, and backward.
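
As a rough illustration of the three computations, here is a toy element-wise layer in plain Python. The class name and method signatures are hypothetical and much simpler than Caffe's C++ layer interface; bottoms and tops are plain lists rather than blobs.

```python
class ToyReLULayer:
    """Hypothetical element-wise layer showing setup, forward, and backward."""

    def setup(self, bottom):
        # One-time initialization: size the top to match the bottom.
        self.count = len(bottom)
        return [0.0] * self.count

    def forward(self, bottom, top):
        # Compute the output (top) from the input (bottom).
        for i in range(self.count):
            top[i] = max(bottom[i], 0.0)

    def backward(self, top_diff, bottom, bottom_diff):
        # Given the gradient w.r.t. the top, compute the gradient w.r.t. the bottom.
        for i in range(self.count):
            bottom_diff[i] = top_diff[i] if bottom[i] > 0 else 0.0


layer = ToyReLULayer()
bottom = [-1.0, 0.5, 2.0]
top = layer.setup(bottom)
layer.forward(bottom, top)
bottom_diff = [0.0] * len(bottom)
layer.backward([1.0, 1.0, 1.0], bottom, bottom_diff)
print(top)          # [0.0, 0.5, 2.0]
print(bottom_diff)  # [0.0, 1.0, 1.0]
```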

#### Net definition and operation
- The net jointly defines a function and its gradient by composition and auto-differentiation.
- The net is a set of layers connected in a computation graph – a directed acyclic graph (DAG) to be exact.
- The net is defined as a set of layers and their connections in a plaintext modeling language. 
- The models are defined in plaintext protocol buffer ([Google Protocol Buffer](https://code.google.com/p/protobuf/)) schema (`prototxt`) while the learned models are serialized as binary protocol buffer (binaryproto) `.caffemodel` files.

- A simple logistic regression classifier is defined by the following prototxt.

``` json
name: "LogReg"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "input_leveldb"
    batch_size: 64
  }
}
layer {
  name: "ip"
  type: "InnerProduct"
  bottom: "data"
  top: "ip"
  inner_product_param {
    num_output: 2
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
```
<img src="http://caffe.berkeleyvision.org/tutorial/fig/logreg.jpg" width="320px">


### 2. Forward / Backward
- **the essential computations of layered compositional models called by `Solver`**

#### Forward
- The forward pass computes the output given the input for inference, from bottom to top.

<img src="http://caffe.berkeleyvision.org/tutorial/fig/forward.jpg" width="320px" alt="the forward pass">


#### Backward
- The backward pass computes the gradient given the loss for learning, from top to bottom.
- The gradient with respect to the rest of the model is computed layer-by-layer through the chain rule. 
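
To make the two passes concrete, the sketch below runs a forward and a backward pass for a tiny inner product plus softmax loss model on a single example, then checks one gradient by finite differences. It is plain Python, not Caffe code; the input, weights, and label are made-up values.

```python
import math


def forward(x, W, b, label):
    """Bottom-to-top: inner product -> softmax -> negative log-likelihood."""
    scores = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
              for row, b_j in zip(W, b)]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[label])
    return loss, probs


def backward(x, probs, label):
    """Top-to-bottom: chain rule from the loss back to the parameters."""
    # d(loss)/d(scores) for softmax + log loss is (probs - one_hot(label)).
    dscores = [p - (1.0 if j == label else 0.0) for j, p in enumerate(probs)]
    # scores = W x + b, so d(loss)/dW[j][i] = dscores[j] * x[i].
    dW = [[d * x_i for x_i in x] for d in dscores]
    db = dscores
    return dW, db


x = [1.0, -2.0, 0.5]                      # one made-up input example
W = [[0.1, 0.2, -0.1], [-0.3, 0.0, 0.4]]  # two outputs, three inputs
b = [0.0, 0.1]
loss, probs = forward(x, W, b, label=1)
dW, db = backward(x, probs, label=1)

# Finite-difference check of one weight gradient.
eps = 1e-6
W[0][1] += eps
loss_plus, _ = forward(x, W, b, label=1)
W[0][1] -= eps
numeric = (loss_plus - loss) / eps
print(abs(numeric - dW[0][1]) < 1e-4)
```

The analytic gradient from `backward` and the numeric estimate should agree closely, which is the property the chain rule guarantees layer by layer.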

<img src="http://caffe.berkeleyvision.org/tutorial/fig/backward.jpg" width="320px" alt="the backward pass">

### 3. Loss
- **the task to be learned is defined by the loss**
- The loss in Caffe is computed by the Forward pass of the network. 

#### Loss weights
- For nets with multiple layers producing a loss, loss weights can be used to specify their relative importance.
- Any layer able to backpropagate may be given a non-zero loss_weight.
- The final loss in Caffe, then, is computed by summing the total weighted loss over the network.
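- Layer types with the suffix `Loss` implicitly get `loss_weight: 1` for their first top, so a SoftmaxWithLoss layer (here with its prediction bottom named `pred`) can be equivalently written as: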

``` json
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "pred"
  bottom: "label"
  top: "loss"
  loss_weight: 1
}
```
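
The weighted sum can be sketched in a few lines of Python. The loss values and weights below are made up for illustration.

```python
def total_loss(layer_losses):
    """Sum each layer's loss scaled by its loss_weight."""
    return sum(weight * value for value, weight in layer_losses)


# (loss value, loss_weight) pairs for two hypothetical loss layers
losses = [(0.8, 1.0), (2.0, 0.25)]
print(total_loss(losses))  # 0.8 * 1.0 + 2.0 * 0.25 = 1.3
```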

### 4. Solver
- **the solver coordinates model optimization**

- The solver orchestrates model optimization by coordinating the network’s forward inference and backward gradients to form parameter updates that attempt to improve the loss. 
- The responsibilities of learning are divided between the Solver for overseeing the optimization and generating parameter updates and the Net for yielding loss and gradients.

#### Methods, ...
- Stochastic Gradient Descent (type: "SGD")
- AdaDelta (type: "AdaDelta")
- Adaptive Gradient (type: "AdaGrad")
- Adam (type: "Adam")
- Nesterov’s Accelerated Gradient (type: "Nesterov")
- RMSprop (type: "RMSProp")

- To use a step learning rate policy, which drops the learning rate at fixed intervals, put the following lines in your solver prototxt file:

``` bash
base_lr: 0.01     # begin training at a learning rate of 0.01 = 1e-2
lr_policy: "step" # learning rate policy: drop the learning rate in "steps"
                  # by a factor of gamma every stepsize iterations
gamma: 0.1        # drop the learning rate by a factor of 10
                  # (i.e., multiply it by a factor of gamma = 0.1)
stepsize: 100000  # drop the learning rate every 100K iterations
max_iter: 350000  # train for 350K iterations total
momentum: 0.9
```
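
The schedule this configuration produces can be computed directly. The helper below is a small sketch of the `"step"` policy, `base_lr * gamma ^ floor(iter / stepsize)`, with defaults copied from the settings above; the function name is hypothetical.

```python
def step_lr(iteration, base_lr=0.01, gamma=0.1, stepsize=100000):
    """Learning rate under the "step" policy at a given iteration."""
    return base_lr * gamma ** (iteration // stepsize)


for it in (0, 99999, 100000, 200000, 349999):
    print(it, step_lr(it))
```

With these settings the rate is 1e-2 for the first 100K iterations, then 1e-3, 1e-4, and finally 1e-5 for the last 50K iterations (up to floating-point rounding).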

#### Scaffolding
- The solver scaffolding prepares the optimization method and initializes the model to be learned

#### Updating Parameters
- The solver computes the actual weight update and then applies it to the net parameters. The update incorporates any weight decay r(W) into the weight gradients, which are then scaled by the learning rate α.
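
A minimal sketch of one such update, assuming plain SGD with momentum and L2 weight decay (so the decay term contributes `weight_decay * w` to the gradient). The function name and the `weight_decay` default are illustrative assumptions; this is not Caffe's solver code.

```python
def sgd_update(weights, grads, history, lr, momentum=0.9, weight_decay=0.0005):
    """One in-place SGD step with momentum and L2 weight decay."""
    for i in range(len(weights)):
        # Weight decay adds the regularization gradient to the loss gradient.
        g = grads[i] + weight_decay * weights[i]
        # The history blends the previous update with the new scaled gradient.
        history[i] = momentum * history[i] + lr * g
        # The stored update is subtracted from the weights.
        weights[i] -= history[i]


w = [1.0, -2.0]
v = [0.0, 0.0]  # momentum history, one entry per weight
sgd_update(w, [0.5, -0.5], v, lr=0.01, momentum=0.9, weight_decay=0.0)
print(w)  # approximately [0.995, -1.995]
```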

#### Snapshotting and Resuming
- During training, the solver snapshots the weights and its own state.
- The weight snapshots export the learned model while the solver snapshots allow training to be resumed from a given point.
- Snapshotting is configured in the solver definition prototxt.

``` bash
# The snapshot interval in iterations.
snapshot: 5000
# File path prefix for snapshotting model weights and solver state.
# Note: this is relative to the invocation of the `caffe` utility, not the
# solver definition file.
snapshot_prefix: "/path/to/model"
# Snapshot the diff along with the weights. This can help debugging training
# but takes more storage.
snapshot_diff: false
# A final snapshot is saved at the end of training unless
# this flag is set to false. The default is true.
snapshot_after_train: true
```

### 5. Layer Catalogue
- **the layer is the fundamental unit of modeling and computation**
#### Vision Layers
- Vision layers usually take images as input and produce other images as output.

##### Convolution
- The Convolution layer convolves the input image with a set of learnable filters, each producing one feature map in the output image.
- Sample (as seen in ./models/bvlc_reference_caffenet/train_val.prototxt)
``` json
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  # learning rate and decay multipliers for the filters
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96     # learn 96 filters
    kernel_size: 11    # each filter is 11x11
    stride: 4          # step 4 pixels between each filter application
    weight_filler {
      type: "gaussian" # initialize the filters from a Gaussian
      std: 0.01        # distribution with stdev 0.01 (default mean: 0)
    }
    bias_filler {
      type: "constant" # initialize the biases to zero (0)
      value: 0
    }
  }
}
```
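
The spatial size of the output follows from the kernel size, stride, and padding. The sketch below assumes a 227x227 input crop, which is not stated in the layer definition above; `kernel_size`, `stride`, and `num_output` are taken from it.

```python
def conv_output_size(input_size, kernel_size, stride=1, pad=0):
    """Spatial output size of a convolution along one dimension."""
    return (input_size + 2 * pad - kernel_size) // stride + 1


# Assumed 227x227 input; 11x11 filters applied every 4 pixels.
side = conv_output_size(227, kernel_size=11, stride=4)
print((96, side, side))  # (96, 55, 55): one 55x55 feature map per filter
```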

##### Pooling
- Sample (as seen in ./models/bvlc_reference_caffenet/train_val.prototxt)
``` json
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3 # pool over a 3x3 region
    stride: 2      # step two pixels (in the bottom blob) between pooling regions
  }
}
```
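
Max pooling itself is simple to write out. The sketch below works on a plain 2D list and rounds the output size up, so a partial window at the border still produces an output, which is how Caffe sizes pooled outputs. It assumes no padding and `stride <= kernel_size <= image size`.

```python
import math


def max_pool2d(image, kernel_size, stride):
    """Max-pool a 2D list of numbers; windows are clipped at the border."""
    height, width = len(image), len(image[0])
    # Round the pooled size up so a partial last window still counts.
    out_h = math.ceil((height - kernel_size) / stride) + 1
    out_w = math.ceil((width - kernel_size) / stride) + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            h0, w0 = i * stride, j * stride
            window = [image[h][w]
                      for h in range(h0, min(h0 + kernel_size, height))
                      for w in range(w0, min(w0 + kernel_size, width))]
            row.append(max(window))
        pooled.append(row)
    return pooled


image = [[r * 5 + c for c in range(5)] for r in range(5)]  # 5x5 ramp, values 0..24
print(max_pool2d(image, kernel_size=3, stride=2))  # [[12, 14], [22, 24]]
```

Applied to a 55x55 feature map with the same 3x3 kernel and stride 2, this gives a 27x27 output.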

##### Local Response Normalization (LRN)
- The local response normalization layer performs a kind of “lateral inhibition” by normalizing over local input regions.

#### Loss Layers
- Loss drives learning by comparing an output to a target and assigning cost to minimize. The loss itself is computed by the forward pass and the gradient w.r.t. the loss is computed by the backward pass.

- Softmax, Layer type: SoftmaxWithLoss
- Sum-of-Squares/ Euclidean, Layer type: EuclideanLoss
- Hinge/ Margin, Layer type: HingeLoss
- Sigmoid Cross-Entropy, Layer type: SigmoidCrossEntropyLoss
- Infogain, Layer type: InfogainLoss

#### Activation / Neuron Layers
- In general, activation / Neuron layers are element-wise operators, taking one bottom blob and producing one top blob of the same size.

- ReLU/Rectified-Linear and Leaky-ReLU, Layer type: ReLU
- Sigmoid, Layer type: Sigmoid
- TanH / Hyperbolic Tangent, Layer type: TanH
- Absolute Value, Layer type: AbsVal
- Power, Layer type: Power
- BNLL, Layer type: BNLL
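
The functions behind several of these layers are short enough to write out. This is a plain-Python sketch operating on lists rather than blobs; the function names are hypothetical, while the `negative_slope`, `power`, `scale`, and `shift` parameters mirror the corresponding layer parameters.

```python
import math


def relu(x, negative_slope=0.0):
    """ReLU; a non-zero negative_slope gives the leaky variant."""
    return max(x, 0.0) + negative_slope * min(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def power_fn(x, power=1.0, scale=1.0, shift=0.0):
    """Power layer: (shift + scale * x) ** power."""
    return (shift + scale * x) ** power


def bnll(x):
    """Binomial normal log likelihood: log(1 + exp(x))."""
    return math.log(1.0 + math.exp(x))


def apply_elementwise(fn, bottom):
    """Map each bottom value to one top value, keeping the size unchanged."""
    return [fn(v) for v in bottom]


bottom = [-2.0, 0.0, 3.0]
print(apply_elementwise(relu, bottom))                     # [0.0, 0.0, 3.0]
print(apply_elementwise(lambda v: relu(v, 0.1), bottom))   # [-0.2, 0.0, 3.0]
print(apply_elementwise(abs, bottom))                      # [2.0, 0.0, 3.0]
print(apply_elementwise(math.tanh, bottom))
```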

#### Data Layers
- Data enters Caffe through data layers: they lie at the bottom of nets.
- Common input preprocessing (mean subtraction, scaling, random cropping, and mirroring) is available by specifying `TransformationParameters`.

- Database, Layer type: Data
- In-Memory, Layer type: MemoryData
- HDF5 Input, Layer type: HDF5Data
- HDF5 Output, Layer type: HDF5Output
- Images, Layer type: ImageData

#### Common Layers
- Inner Product (Fully Connected layer), Layer type: InnerProduct
    - Sample
``` json
layer {
  name: "fc8"
  type: "InnerProduct"
  # learning rate and decay multipliers for the weights
  param { lr_mult: 1 decay_mult: 1 }
  # learning rate and decay multipliers for the biases
  param { lr_mult: 2 decay_mult: 0 }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
  bottom: "fc7"
  top: "fc8"
}
```

- Reshape, Layer type: Reshape
    - The Reshape layer can be used to change the dimensions of its input, without changing its data.
    - Sample
``` json
  layer {
    name: "reshape"
    type: "Reshape"
    bottom: "input"
    top: "output"
    reshape_param {
      shape {
        dim: 0  # copy the dimension from below
        dim: 2
        dim: 3
        dim: -1 # infer it from the other dimensions
      }
    }
  }
```
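
How the special values `0` and `-1` resolve can be worked through with a small helper. The function name and the 64 x 6 x 4 x 5 input shape are assumptions for illustration.

```python
def reshape_output_shape(input_shape, dims):
    """Resolve Reshape dims: 0 copies the input dimension, -1 is inferred."""
    total = 1
    for d in input_shape:
        total *= d
    # Copy dimensions marked 0 from the same position in the input.
    out = [input_shape[i] if d == 0 else d for i, d in enumerate(dims)]
    if out.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    if -1 in out:
        known = 1
        for d in out:
            if d != -1:
                known *= d
        if total % known != 0:
            raise ValueError("input count is not divisible by the known dims")
        # The inferred dimension keeps the total element count unchanged.
        out[out.index(-1)] = total // known
    return tuple(out)


# Hypothetical 64 x 6 x 4 x 5 input with the reshape_param shown above
print(reshape_output_shape((64, 6, 4, 5), [0, 2, 3, -1]))  # (64, 2, 3, 20)
```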

- Splitting
    - The Split layer is a utility layer that splits an input blob to multiple output blobs. This is used when a blob is fed into multiple output layers.

- Flattening
    - The Flatten layer is a utility layer that flattens an input of shape n * c * h * w to a simple vector output of shape n * (c*h*w)

- Concatenation, Layer type: Concat
    - The Concat layer is a utility layer that concatenates its multiple input blobs to one single output blob.
    - Sample
``` json
layer {
  name: "concat"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  type: "Concat"
  concat_param {
    axis: 1
  }
}
```
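
With `axis: 1`, the inputs must agree on every dimension except the channel dimension, and the output channel count is the sum of the input channel counts. A small sketch of that shape rule, with hypothetical shapes for `in1` and `in2`:

```python
def concat_shape(shapes, axis=1):
    """Output shape of concatenating blobs that agree on every other axis."""
    out = list(shapes[0])
    for shape in shapes[1:]:
        for i, (a, b) in enumerate(zip(out, shape)):
            if i != axis and a != b:
                raise ValueError("shapes differ outside the concat axis")
        out[axis] += shape[axis]
    return tuple(out)


# Hypothetical shapes for "in1" and "in2"
print(concat_shape([(64, 3, 28, 28), (64, 5, 28, 28)], axis=1))  # (64, 8, 28, 28)
```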

- Slicing
    - The Slice layer is a utility layer that slices an input layer to multiple output layers along a given dimension (currently num or channel only) with given slice indices.
    - Sample
``` json
layer {
  name: "slicer_label"
  type: "Slice"
  bottom: "label"
  ## Example of label with a shape N x 3 x 1 x 1
  top: "label1"
  top: "label2"
  top: "label3"
  slice_param {
    axis: 1
    slice_point: 1
    slice_point: 2
  }
}
```
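
The two slice points cut the channel axis of the N x 3 x 1 x 1 label blob into three pieces, one per top. The helper below sketches the resulting shapes; the function name and the batch size N = 64 are assumptions.

```python
def slice_shapes(input_shape, axis, slice_points):
    """Shapes of the tops produced by cutting `axis` at each slice point."""
    bounds = [0] + list(slice_points) + [input_shape[axis]]
    shapes = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        if end <= start:
            raise ValueError("slice points must be strictly increasing")
        shape = list(input_shape)
        shape[axis] = end - start
        shapes.append(tuple(shape))
    return shapes


# N = 64 is an assumed batch size; the label blob is N x 3 x 1 x 1
print(slice_shapes((64, 3, 1, 1), axis=1, slice_points=[1, 2]))
# [(64, 1, 1, 1), (64, 1, 1, 1), (64, 1, 1, 1)]
```

Note that k slice points yield k + 1 tops, which is why the layer above lists three tops for two slice points.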

- Elementwise Operations
    - Eltwise
    - Argmax
    - Softmax
    - Mean-Variance Normalization


### 6. Data
- **how to caffeinate data for model input**

- Data layers load input and save output by converting between Blobs and other formats. 
- Common transformations like mean-subtraction and feature-scaling are done by data layer configuration. 
- Data and Label
    - A data layer has at least one top, canonically named `data`. For ground truth, a second top can be defined that is canonically named `label`.
- Transformation
    - Data preprocessing is parametrized by transformation messages within the data layer definition.
- Multiple Inputs
    - A Net can have multiple inputs of any number and type. 
    - Define as many data layers as needed giving each a unique name and top. Multiple inputs are useful for non-trivial ground truth: one data layer loads the actual data and the other data layer loads the ground truth in lock-step. In this arrangement both data and label can be any 4D array. 

- This data layer definition loads the MNIST digits.
``` json
layer {
  name: "mnist"
  # Data layer loads leveldb or lmdb storage DBs for high-throughput.
  type: "Data"
  # the 1st top is the data itself: the name is only convention
  top: "data"
  # the 2nd top is the ground truth: the name is only convention
  top: "label"
  # the Data layer configuration
  data_param {
    # path to the DB
    source: "examples/mnist/mnist_train_lmdb"
    # type of DB: LEVELDB or LMDB (LMDB supports concurrent reads)
    backend: LMDB
    # batch processing improves efficiency.
    batch_size: 64
  }
  # common data transformations
  transform_param {
    # feature scaling coefficient: this maps the [0, 255] MNIST data to [0, 1]
    scale: 0.00390625
  }
}
```
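
The `scale` value is exactly 1/256. A short sketch of the transformation, assuming the mean is subtracted before scaling (the function name and the `mean_value` argument are illustrative; the layer above sets only `scale`):

```python
def transform(pixels, scale=1.0, mean_value=0.0):
    """Subtract the mean, then multiply by the scale."""
    return [(p - mean_value) * scale for p in pixels]


scale = 0.00390625  # exactly 1 / 256
print(transform([0, 128, 255], scale=scale))  # [0.0, 0.5, 0.99609375]
```

Strictly speaking, the largest pixel value 255 maps to 255/256, just below 1.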


### 7. Interfaces
- **command line, Python, and MATLAB Caffe**
- Caffe has command line, Python, and MATLAB interfaces for day-to-day usage, interfacing with research code, and rapid prototyping. 

#### Command Line
- The command line interface – cmdcaffe – is the caffe tool for model training, scoring, and diagnostics.
- This tool and others are found in caffe/build/tools. 

- Training: caffe train learns models from scratch, resumes learning from saved snapshots, and fine-tunes models to new data and tasks:
    - All training requires a solver configuration through the -solver solver.prototxt argument.
    - Resuming requires the -snapshot model_iter_1000.solverstate argument to load the solver snapshot.
    - Fine-tuning requires the -weights model.caffemodel argument for the model initialization.
    - For example, you can run:
``` bash
# train LeNet
caffe train -solver examples/mnist/lenet_solver.prototxt
# train on GPU 2
caffe train -solver examples/mnist/lenet_solver.prototxt -gpu 2
# resume training from the half-way point snapshot
caffe train -solver examples/mnist/lenet_solver.prototxt -snapshot examples/mnist/lenet_iter_5000.solverstate
```
- Testing: caffe test scores models by running them in the test phase and reports the net output as its score. 
    - The net architecture must be properly defined to output an accuracy measure or loss as its output. 
    - The per-batch score is reported and then the grand average is reported last.
``` bash
# score the learned LeNet model on the validation set as defined in the
# model architecture lenet_train_test.prototxt
caffe test -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel -gpu 0 -iterations 100
```

- Benchmarking: caffe time benchmarks model execution layer-by-layer through timing and synchronization. 
``` bash
# (These example calls require you complete the LeNet / MNIST example first.)
# time LeNet training on CPU for 10 iterations
caffe time -model examples/mnist/lenet_train_test.prototxt -iterations 10
# time LeNet training on GPU for the default 50 iterations
caffe time -model examples/mnist/lenet_train_test.prototxt -gpu 0
# time a model architecture with the given weights on the first GPU for 10 iterations
caffe time -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel -gpu 0 -iterations 10
```

#### Python
- The Python interface – `pycaffe` – is the caffe module and its scripts in caffe/python. 
    - caffe.Net is the central interface for loading, configuring, and running models. caffe.Classifier and caffe.Detector provide convenience interfaces for common tasks.
    - caffe.SGDSolver exposes the solving interface.
    - caffe.io handles input / output with preprocessing and protocol buffers.
    - caffe.draw visualizes network architectures.
    - Caffe blobs are exposed as numpy ndarrays for ease-of-use and efficiency.
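
A minimal sketch of how these pieces fit together, assuming Caffe was built with its Python bindings and the module is importable. The helper names `load_and_run` and `train_steps` are hypothetical; `caffe` is imported inside each function so the definitions load even where Caffe is not installed.

```python
def load_and_run(model_path, weights_path):
    """Load a trained net in the test phase and run one forward pass on CPU."""
    import caffe  # imported lazily so this file loads without Caffe installed

    caffe.set_mode_cpu()
    net = caffe.Net(model_path, weights_path, caffe.TEST)
    outputs = net.forward()  # dict mapping output blob names to ndarrays
    # Every blob exposes its contents as a numpy array through .data.
    shapes = {name: blob.data.shape for name, blob in net.blobs.items()}
    return outputs, shapes


def train_steps(solver_path, steps):
    """Advance an SGD solver by `steps` iterations and return its training net."""
    import caffe

    solver = caffe.SGDSolver(solver_path)
    solver.step(steps)
    return solver.net
```

For example, `train_steps("examples/mnist/lenet_solver.prototxt", 100)` would run 100 training iterations of LeNet, provided the LeNet / MNIST example data has been prepared first.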

#### MATLAB, ...
- The MATLAB interface – matcaffe – is the caffe package in caffe/matlab, which lets you integrate Caffe into your MATLAB code.

## Examples
### Image Classification and Filter Visualization
- [Classification: Instant Recognition with Caffe](http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb)

### Learning LeNet
- [Solving in Python with LeNet](http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/01-learning-lenet.ipynb)

### Editing model parameters
- [Net Surgery](http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/net_surgery.ipynb)

### CIFAR tutorial
- [Alex’s CIFAR-10 tutorial, Caffe style](http://caffe.berkeleyvision.org/gathered/examples/cifar10.html)

## Others, ...
### Install
### Config
### Tools
### [Caffe Model Zoo](https://github.com/BVLC/caffe/wiki/Model-Zoo)
- [Doc](http://caffe.berkeleyvision.org/model_zoo.html)

