Sophia (Greek for "wisdom") is a real-time recurrent neural network (RNN) agent based on Theano. It focuses on training and evaluating experimental RNN architectures for regression tasks on long and noisy inputs. Once trained, the RNN can be used as a plugin to an existing project (written in any language) to perform real-time estimation by communicating input/output data via interprocess communication (IPC) or over a network (TCP).
- BPTT(h; h') for more efficient training (see the sketch after this list)
- On-line, minibatch training on sequences of unequal lengths
- Inference on ensemble of heterogeneous models
- Scheduled learning rate annealing with patience
- LSTM/GRU with various unit options (weight normalization, layer normalization, batch normalization, residual gate, initial state learning, etc.), loss functions (L2, L1, Huber), initializations (uniform, orthogonal, Glorot), and optimizers (Nesterov, Adadelta, RMSProp, Adam)
- Options for learning time-dimension (e.g., Phased LSTM) or batch-dimension (e.g., sequence ID) parameters in addition to the standard hidden-dimension parameters
- Option for faster training by unrolling Theano scans (increases compile time)
- Self-documenting, object-oriented code
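
As a rough illustration of what BPTT(h; h') windowing means, here is a minimal sketch (not code from this repo), with h corresponding to `window_size` and h' to `step_size` in the options below:

```python
# Illustrative only: how BPTT(h; h') could slice a length-T sequence into
# overlapping training windows of h frames, advancing h' frames at a time,
# so gradients are truncated to h steps while state carries forward.
T, h, hp = 10, 4, 2
windows = [(start, start + h) for start in range(0, T - h + 1, hp)]
print(windows)   # [(0, 4), (2, 6), (4, 8), (6, 10)]
```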
The project was developed on Ubuntu 16.04.1 LTS; the version numbers below are not hard requirements.
- Python 2.7 or 3.5+
- Theano (0.9.0b1), along with its dependencies (CUDA 8.0.44, cuDNN 5.1.5, libgpuarray 0.6.0)
- ZeroMQ (pyzmq 16.0.2) : used for communicating real-time input/output data with an external process
- Input and target data preprocessed and saved as separate binary files (see below for format)
- Sibyl: `sophia.py` may be used as the real-time RNN engine for this project
  - `master` branch: for anything other than batch normalization
  - `bn` branch: if using batch normalization (removes certain incompatible features)
- There is no application-specific data processing in this project; the code assumes that the input has already been preprocessed, and that any post-processing of the output will be done by the receiving side
- A dataset is contained in a directory with
  - `train.list`: list of training set sequences
  - `dev.list`: list of development set sequences
  - For each sequence, a pair of `sequence_ID.input`/`sequence_ID.target` in arbitrary subdirectories (in whichever way that makes sense for the data)
- `.list` files are `'\n'`-delimited plain-text lists of relative paths of `.input`/`.target` files minus the extensions (i.e., one entry of `relative/path/to/sequence_ID` for each sequence)
- Multiple sequences may share the same `sequence_ID` in different subdirectories if that makes sense for the data; `sequence_ID` can be used as an input feature in this case
- `.input`/`.target` files are binary files, each containing a flat array of little-endian 4-byte floats in [`time`][`dimension`] order (stride of 1 in the `dimension` axis, stride of `dimension` in the `time` axis); a writer sketch follows this list
- Time lengths of `.input` & `.target` must match for the same sequence
- Time lengths for different sequences do not need to match, unless using batch normalization (where all sequences in a minibatch must be synchronized)
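
A minimal sketch of writing one sequence pair in this format; the 100-frame length, 8-/2-dimensional shapes, and file names are made-up examples:

```python
# Hypothetical sequence pair plus its .list entry; shapes/names are examples.
import numpy as np

inp = np.random.rand(100, 8).astype("<f4")   # [time][dimension], little-endian float32
tgt = np.random.rand(100, 2).astype("<f4")   # same time length as the input

inp.tofile("sequence_ID.input")              # flat row-major binary, no header
tgt.tofile("sequence_ID.target")

with open("train.list", "w") as f:
    f.write("sequence_ID\n")                 # relative path, extension stripped

# Reading back requires knowing the dimension (here 8) to restore the shape
x = np.fromfile("sequence_ID.input", dtype="<f4").reshape(-1, 8)
```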
- Options are configured in `train.py`:
option name | explanation |
---|---|
`input_dim`/`target_dim` | integers |
`unit_type` | `'fc'`/`'lstm'`/`'gru'` |
`loss_type` | `'l2'`/`'l1'`/`'huber'` |
`net_width`/`net_depth` | # of params ~ W^2 D |
`batch_size` | minibatch size |
`window_size`/`step_size` | BPTT(window_size; step_size) |
`*_norm` | `False`/`True` (weight/layer/batch norm) |
`residual_gate` | `False`/`True` |
`learn_init_states` | `False`/`True` |
`learn_id_embedding` | `False`/`True` |
`learn_clock_params` | `False`/`True` |
`update_type` | `'sgd'`/`'momentum'`/`'nesterov'` |
`force_type` | `'vanilla'`/`'adadelta'`/`'rmsprop'`/`'adam'` |
`frames_per_epoch` | time_indices * batch_size per epoch |
`lr_*`, `max_retry` | for learning rate annealing with patience |
`unroll_scan` | trades higher memory use & longer compile time for faster training |
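
For orientation, a hypothetical set of values for some of the options above; how `train.py` actually stores them (dict, argparse defaults, etc.) is not prescribed here:

```python
# Hypothetical example values only; consult train.py for the real mechanism.
options = {
    'input_dim'   : 8,
    'target_dim'  : 2,
    'unit_type'   : 'lstm',
    'loss_type'   : 'huber',
    'net_width'   : 512,         # params ~ 512^2 * 3
    'net_depth'   : 3,
    'batch_size'  : 64,
    'window_size' : 128,         # BPTT(128; 64)
    'step_size'   : 64,
    'layer_norm'  : True,
    'update_type' : 'nesterov',
    'force_type'  : 'adam',
}
```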
- Instructions for launching a training instance are provided in the heading of `train.py`
- Multiple instances may be launched if needed
- By providing a `--load_from` flag, the RNN can be trained starting from an already trained RNN; this may help with getting out of saddle points on some tasks
- Sample `train.py` output:
- `sophia.py` provides an interface for real-time communication with an external process, which is not necessarily written in Python; see `sophia.py` for the communication protocol (handshake + data exchange); a client sketch follows this list
- It is assumed that the same pre-/post-processing is applied to the input/output of the RNN as was used during training
- If the use case is simple, it may be implemented directly in Python in a manner similar to `sophia.py`
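
A minimal sketch of what such an external client could look like; the REQ/REP pattern, endpoint, handshake messages, and frame encoding below are assumptions for illustration, and the authoritative protocol lives in `sophia.py`:

```python
# Hypothetical client sketch; sophia.py defines the real handshake and framing.
import numpy as np
import zmq

ctx  = zmq.Context()
sock = ctx.socket(zmq.REQ)               # REQ/REP pattern is an assumption
sock.connect("tcp://127.0.0.1:5555")     # assumed endpoint

sock.send(b"hello")                      # placeholder handshake message
assert sock.recv() == b"ready"           # placeholder handshake reply

x = np.zeros(8, dtype="<f4")             # one input frame, little-endian float32
sock.send(x.tobytes())                   # data exchange: raw input bytes out
y = np.frombuffer(sock.recv(), dtype="<f4")  # RNN's output frame back
```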
- np = numpy, th = theano, tt = theano.tensor
prefix | type | explanation | how it's created |
---|---|---|---|
(none) | built-in/NumPy | variable on CPU memory | the usual way |
`v_` | `th.SharedVariable` | variable on GPU memory; has init value | `th.shared` |
`s_` | symbolic node | graph node; has no substance until compiled | tensor literals (`tt.alloc`/`zeros`/`ones`, etc.), calculations on `v_`/`s_` |
`p_` | symbolic node | same as above, but marked for use as input/output ports for `th.function` | inputs: `tt.scalar`/`vector`/`matrix`/`tensor3`, etc.; outputs: calculations on `v_`/`s_` |
`f_` | callable object | links built-in/NumPy to input/output ports; all `v_` updates take place via `f_` | `th.function` |
- Prefixes are not used inside `Layer.setup_graph`, as there is no need for the distinction there
suffix | explanation |
---|---|
`<tensor_name>_[t][b][i/j/k]` | [time][batch][hidden] dimensions |
`<individual_name>s` | collections (list, dict, etc.) |
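
To make the conventions concrete, a minimal illustration (not code from this repo) of how the prefixes and suffixes combine:

```python
# Minimal illustration of the naming conventions above; shapes are arbitrary.
import numpy as np
import theano as th
import theano.tensor as tt

v_W     = th.shared(np.zeros((4, 4), dtype="float32"))  # v_: shared (GPU) variable
p_x_tbi = tt.tensor3(dtype="float32")   # p_: input port; _tbi = [time][batch][hidden]
s_h_tbi = tt.dot(p_x_tbi, v_W)          # s_: symbolic node, no value until compiled
p_y_tbi = tt.tanh(s_h_tbi)              # p_: output port
f_fwd   = th.function([p_x_tbi], p_y_tbi)   # f_: compiled callable (th.function)

outs = [f_fwd(np.ones((5, 2, 4), dtype="float32"))]  # trailing s: a collection
```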
[optional], <required>, {default}, (miscellaneous)