
Mojo's Layers

This library approaches machine learning a little differently than others. Some ideas are borrowed directly from other papers or implementations, but liberties are taken with others, so we end up with some non-standard layers.

Concatenation (or Resize) Layer

network.push_back("name", "concatenate <feature map size> <pad_type>")

network.push_back("name", "resize <feature map size> <pad_type>")

Since mojo allows branching, the concatenation layer concatenates output maps from multiple branches into a single set. It can also be used to pad or crop output maps, since padding is not a property of the other layers.
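
A minimal sketch of using resize to pad before a convolution, assuming a mojo::network called net has already been constructed; the layer names, sizes, elu activation, and the "pad" token for <pad_type> are illustrative assumptions, not taken from this page:

net.push_back("I1", "input 28 28 1");
net.push_back("R1", "resize 32 pad");           // hypothetical: pad 28x28 maps out to 32x32; "pad" stands in for a valid <pad_type>
net.push_back("C1", "convolution 5 16 1 elu");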

Convolution Layer

network.push_back("name", "convolution <kernel size> <out channels> <stride/step size> <activation>")

This is a traditional convolution layer. Only square kernels with step/stride = 1 are supported for now. Optimized and SSE implementations exist for 2x2, 3x3, and 5x5 convolutions. This layer is a mess to get your head around since there are several nested loops, plus tricks like unwrapping 2D kernels to speed up the dot product. GPU speedups were tried through OpenCV and ArrayFire, but since everything else in the network stayed on the CPU, the cost of moving data back and forth to the GPU made the hybrid GPU/CPU approach too slow. It will be revisited.
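
As a quick usage sketch (the layer names, sizes, and elu activation below are illustrative assumptions, and a mojo::network called net is assumed to exist), a small convolutional stack might be declared like this:

net.push_back("I1", "input 28 28 1");
net.push_back("C1", "convolution 5 8 1 elu");   // 5x5 kernels, 8 output channels, stride 1
net.push_back("P1", "max_pool 2 2");
net.push_back("C2", "convolution 3 16 1 elu");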

DeepCNet

network.push_back("name", "deepcnet <output channels> <activation>")

This layer holds 2 layers: a 2x2 convolution followed by a 2x2 max pool.
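
So a single deepcnet layer should behave roughly like the two layers it wraps (the name and elu activation here are just examples):

net.push_back("D1", "deepcnet 32 elu");
// roughly equivalent to:
// net.push_back("C1", "convolution 2 32 1 elu");
// net.push_back("P1", "max_pool 2 2");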

Dropout

network.push_back("name", "dropout <fraction to drop>")

Dropout is implemented as a layer type so you can place the dropout wherever you want. The dropout layer randomly cuts out nodes during training; it does nothing (but waste time) on forward-only passes. As the literature says, this helps prevent over-fitting the training data and builds redundant connections in the network.
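
For example, dropping half the nodes between two fully connected layers might look like this sketch (names and sizes are illustrative, and a mojo::network called net is assumed):

net.push_back("FC1", "fully_connected 128 elu");
net.push_back("D1", "dropout 0.5");             // randomly drops 50% of nodes during training only
net.push_back("FC2", "softmax 10");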

Fractional Max Pooling

(currently not maintained)

network.push_back("name", "fractional_max_pool <out size>")

This is meant to be like a stochastic pooling layer, but with a non-integer step/stride size. If the step size is sqrt(2) with a pool size of 2, the idea is to randomize each step to 1 or 2 so that it averages out to sqrt(2) = 1.4. This implementation is not finished, but the basic ability to use a non-integer step size is there.

Fully Connected

network.push_back("name", "fully_connected <nodes> <activation>")

This is just the traditional fully connected layer you use at the end of your network or when constructing an MLP.
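
A minimal MLP sketch built from these layers (the layer names, node counts, and elu activation are assumptions, and a mojo::network called net is assumed):

net.push_back("I1", "input 28 28 1");
net.push_back("FC1", "fully_connected 100 elu");
net.push_back("FC2", "softmax 10");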

Grouped Convolution Layer

network.push_back("name", "group_convolution <kernel size> <out channels> <stride/step size> <groups> <activation>")

Like the convolution layer, but adds a parameter to specify the number of groups. Typically this is small (<10) or equal to the number of input and output channels, in which case it is a depth-wise convolution. If groups=1, this is the same as the normal convolution layer.
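
For example, a depth-wise style layer can be sketched by setting groups equal to the channel count (all values below are illustrative, and a mojo::network called net is assumed):

// 3x3 kernels, 32 output channels, stride 1, 32 groups => depth-wise convolution
net.push_back("GC1", "group_convolution 3 32 1 32 elu");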

Input

network.push_back("name", "input <width> <height> <channels>")

This needs to be the first layer in the network and must match your input dimensions. Currently only square inputs are supported if you intend to do convolutions.

Max Feature Map (MFM)

network.push_back("name", "mfm <maps to pool>")

MFM is inspired by maxout networks. It pools feature maps and acts as an activation function. This implementation of MFM pools across output maps.
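
Conceptually, with 2 maps to pool, every output map is an element-wise max over a pair of input maps, so the channel count is halved. A rough illustration of the idea; the half-split pairing below follows the original MFM paper and is not necessarily mojo's exact indexing:

#include <algorithm>

// toy sketch (not mojo's internal code): 2N input maps -> N output maps
// by taking the element-wise max over pairs of maps (map j paired with map j+N)
void mfm_pool(const float* in, float* out, int out_channels, int map_size)
{
    for (int j = 0; j < out_channels; j++)
        for (int i = 0; i < map_size; i++)
            out[j * map_size + i] = std::max(in[j * map_size + i],
                                             in[(j + out_channels) * map_size + i]);
}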

Max Pooling

network.push_back("name", "max_pool <size> <stride>")

Nothing special here. Pretty standard implementation with unrolled loops for 2x, 3x, and 4x pooling. The step/stride size does not need to equal the pool size. No additional padding is performed, so if you pool a 15x15 layer by a factor of 2, you end up with a 7x7 layer.
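
With no padding, the output size follows the usual relation out = floor((in - pool_size) / stride) + 1, so pooling 15x15 with a 2x2 pool and stride 2 gives floor((15 - 2) / 2) + 1 = 7.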

Semi-Stochastic Pooling

network.push_back("name", "semi_stochastic_pool <size> <stride>")

The problem with max pooling is that if you pool by 2x2, you completely ignore 3 of the 4 signals, even if those other signals have fairly high amplitude. Stochastic pooling addresses this by picking the signal with a probability determined by signal strength; if all signals in a 2x2 pool are similar, it is essentially a random sample. Semi-stochastic pooling is a short-cut implementation of the same concept where only the top 2 signals in the pool are considered.
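
A rough sketch of the top-2 idea (an illustration of the concept, not mojo's internal code): find the two largest activations in the pooling window and pick between them with probability proportional to their values.

#include <vector>
#include <cstdlib>

// toy sketch: semi-stochastic pooling over one pooling window w
float semi_stochastic_pick(const std::vector<float>& w)
{
    // find the indices of the largest (i1) and second largest (i2) activations
    int i1 = 0, i2 = -1;
    for (int i = 1; i < (int)w.size(); i++)
    {
        if (w[i] > w[i1]) { i2 = i1; i1 = i; }
        else if (i2 < 0 || w[i] > w[i2]) i2 = i;
    }
    float a1 = w[i1], a2 = (i2 >= 0) ? w[i2] : w[i1];
    float sum = a1 + a2;
    if (sum <= 0.f) return a1;                      // degenerate case: just take the max
    // choose a1 with probability a1/sum, otherwise a2
    float r = sum * (float)rand() / RAND_MAX;
    return (r <= a1) ? a1 : a2;
}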

Shuffle Layer

network.push_back("name", "shuffle <groups>")

Shuffles channels (ShuffleNet). For instance, if you have 9 channels and are using 3 grouped convolutions, the channel indices can be represented as [0 1 2][3 4 5][6 7 8]. After a shuffle with groups=3, the channel order becomes [0 3 6][1 4 7][2 5 8].
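
The reordering is just a transpose of a groups-by-(channels/groups) index grid; a rough sketch of the index math (an illustration, not mojo's code):

// toy sketch: new position of channel c after shuffling n channels into g groups
int shuffled_index(int c, int n, int g)
{
    int per_group = n / g;                          // assumes n is divisible by g
    return (c % per_group) * g + c / per_group;
}
// with n=9, g=3 the order [0 1 2][3 4 5][6 7 8] becomes [0 3 6][1 4 7][2 5 8]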

Softmax

network.push_back("name", "softmax <nodes>")

This is the same as a fully connected layer with a softmax activation.

Missing Layers

Average Pooling: To reduce code clutter, there is no average pooling. I'd sometimes use average pooling before the FC layers, but you can just as easily skip that pooling and connect the FC directly to the last conv layer. Since averaging over a whole map is now becoming common, it may need to be added.

Abs Max Pooling: This worked when I added it to tiny-cnn, but I could not get the idea to work in mojo-cnn.