Neural networks in C++
This library provides components used for building advanced neural networks. It is built on Eigen (for speed and ease of development) and tries to be as flexible as possible. Some of its main features are :
- Very flexible architecture: neural networks consist of a graph of nodes (cycles allowed), which allows an easy implementation of feed-forward deep networks, recurrent networks, autoencoders, complex networks that do several things in parallel and then merge results, etc.
- Built on Eigen, known for its speed
- Time-series can be trained in an efficient and incremental fashion (you can predict t(0)...t(n), then train t(n), then predict t(n+1) without having to re-predict t(0)...t(n), then train t(n+1), etc)
- Fully unit-tested
The library consists of a
Network class, that manages training and prediction, and several
AbstractNode subclasses. Examples of usages of these classes can be found in the
tests/ directory, but here are some more detailed examples.
Creating a neural network starts by instantiating a
Network object, that will keep track of nodes :
Network *net = new Network(num_inputs);
Now that the
Network object has been created,
AbstractNode subclasses can be instantiated. The most important subclasses are
Dense (a fully-connected weighted dense layer, that learns its weights) and the different kinds of
Activation nodes (
Dense *layer1 = new Dense(num_hidden, 0.005); SigmoidActivation *layer1_act = new SigmoidActivation; Dense *layer2 = new Dense(num_outputs, 0.005); SigmoidActivation *layer2_act = new SigmoidActivation;
The above code snippet creates the nodes required for the implementation of a single-hidden-layer feed-forward perceptron. The first dense node connects the
num_inputs inputs of the network to the
num_hidden neurons of the hidden layer. The second dense node connects the
num_hidden neurons of the hidden layer to the
num_outputs output neurons. The constructor of
Dense takes as parameter the number of output neurons of the dense node. The number of input neurons will be inferred when the nodes are connected to each other (when
Dense knows from what to take its input).
Those two dense layers don't have any activation function by themselves (they use a linear activation), so two
SigmoidActivation nodes will be used in order to add activations to them.
Now that the nodes are created, they can be wired together. Each
AbstractNode subclass exposes an output port (producing values and consuming error signals), and can have one or several input ports. In this simple example, all the nodes used have only one input port.
layer1->setInput(net->inputPort()); layer1_act->setInput(layer1->output()); layer2->setInput(layer1_act->output()); layer2_act->setInput(layer2->output());
The last step, that is simple when building feed-forward neural networks but can become tricky when building recurrent networks, consists of adding the layers to the network. The layers must be added in the order in which they have to forward their values. In this simple example, we will add
layer2 and finally
layer2_act. If there are loops in the network graph, then an order has to be decided. It is usually the breadth-first order that is used.
net->addNode(layer1); net->addNode(layer1_act); net->addNode(layer2); net->addNode(layer2_act);
The first node added to
Network is the input (it receives input vectors). The last one is the output and produces the value returned by
This neural network is now complete and can be trained using
Network::train (performing a single gradient update on a single example). The
tests/utils.h file contains some utility functions that can be used in order to train a network on a batch of input/output samples.
Merge nodes are special types of nodes that take as many inputs as one wants, and merge them. Merging can be done either by adding the inputs (component-wise adding, so the first output neuron is the sum of the first neuron of all the inputs), or by multiplying them. Those nodes can be used to implement network that have gates: the output of a
Dense node is multiplied by another one, that serves as a gate, which allows to design gated recurrent networks that can learn to forget, copy, or anything else. Those merge nodes are used internally by the GRU node.
The following snippet shows to to create a network that learns a function like
(ax + b)*(cx + d), with the a, b, c and d parameters learned by the
Network *net = new Network(1); Dense *dense1 = new Dense(1, 0.05); Dense *dense2 = new Dense(1, 0.05); MergeSum *product = new MergeProduct; dense1->setInput(net->inputPort()); dense2->setInput(net->inputPort()); product->addInput(dense1->output()); product->addInput(dense2->output()); net->addNode(dense1); net->addNode(dense2); net->addNode(product);
Using recurrend nodes (
GRU for instance, which works a bit like the famous
LSTM networks but are sometimes a bit more efficient and stable) is easy. They are added to a network exactly like other nodes.
Building recurrent networks piece by piece, so by assembling
Activation and merge nodes, is a bit more complicated. The topology of the network first has to be designed (which can be done by looking at formulas or graphs explaining how the recurrent network works), then all the nodes have to be instantiated and wired in the correct way.
Once the wiring is correct, they have to be added to a
Network in the right order. The main principle is that, during the forward pass, no node can be forwarded until all its dependencies have been forwarded. If all your nodes depend on
h(t-1), then ensure that you set the output of the node reprenting
h to zero before starting to train a sequence, and add the networks so that the node (usually a
h to the input is forwarded first. The other nodes are usually easier to connect. Here is how the GRU unit has been connected :
// Wire h(t-1) to what depends on it (but not on other things) _nodes.push_back(loop_output_to_updates); _nodes.push_back(loop_output_to_resets); // resets merges the h(t-1) (wired in the above step) and user-specified reset signals _nodes.push_back(resets); _nodes.push_back(reset_activation); _nodes.push_back(reset_times_output); // depends on h(t-1) and reset_activation, okay to add here // Now that information can flow from resets to its activation to reset_times_output, reset_times_output can be used _nodes.push_back(loop_reset_times_output_to_inputs); // inputs depend on loop_reset_times_output_to_inputs, which has now been added to the network, and user-specific inputs. _nodes.push_back(inputs); _nodes.push_back(input_activation); // updates depends on loop_output_to_updates (h(t-1)) and user-specific updates. _nodes.push_back(updates); _nodes.push_back(update_activation); _nodes.push_back(oneminus_update_activation); _nodes.push_back(update_times_output); _nodes.push_back(oneminus_update_times_input); // This node also depends on input, hence its addition near the end of this code snippet. // Now that everything has been computed using h(t-1), the new values can be forwarded to the output, h(t) _nodes.push_back(output);