# ChangeLog
## v2.4.0 (2017-11-27):
* Enabled pip install through PyPI
* Updated MKLML to version 20171007, with performance improvements of ~3X for the MNIST datalayer/nondatalayer and ~1.6X for the DCGAN/WGAN datalayer
* Updated the ResNet model to optimize performance with MKLML 20171007
* Updated the AlexNet weight file and fixed a bug in the Deep Dream example
* Fixed a Faster-RCNN inference model loading issue
* Added data_loading time measurement and enabled benchmarking of GAN networks
* Updated to Aeon version 1.2.0
* Enabled neon build with mklEngine on Windows systems
## v2.3.0 (2017-10-27):
* Optimized DeepSpeech2 MKL backend performance (~7X improvement over the CPU backend)
* Fused the convolution and bias layers, which significantly boosts AlexNet and VGG performance on Intel architectures with the MKL backend
* Made SSD and Faster-RCNN use VGG weight files in the new format
* Fixed use of reset_cells hyperparameter
* Fixed MKL backend bug for GAN and Faster-RCNN models
## v2.2.0 (2017-09-27):
* Update MKLML to version 20170908, which fixes a bug related to data conversions
* Add SSD example for bounding-box object detection that works with both the GPU and MKL backends
* Add DeepSpeech2 MKL backend optimization featuring a ~3X improvement
* Update aeon to 1.0.0, including a new manifest version (doc/source/loading_data.rst#aeon-dataloader)
* Add CHWD support for Batch Normalization in the MKL backend
* Modify ResNet-50 model's last layer to match the original ResNet-50 model paper
* Enable Seq2Seq testing and benchmarking
## v2.1.0 (2017-08-02):
* Set the MKL backend (-b mkl) as the default CPU backend on Linux (use -b cpu to select the original CPU backend); see the sketch after this list
* Update MKLML to version 20170720 (AVX512 code paths enabled by default, plus conversion optimizations)
* Simplify ResNet example
* Makefiles now check for virtualenv and pkg-config (NervanaSystems/neon#383)
* Fix Deep Speech2 model on MKL backend
* Fix MKL installation for "make sysinstall"
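For reference, the same backend selection can be made programmatically instead of via the -b flag. A minimal sketch, assuming the standard neon `gen_backend` API (the `batch_size` value here is illustrative):

```python
# Programmatic equivalent of the -b mkl / -b cpu command line flags.
from neon.backends import gen_backend

# batch_size is illustrative; use whatever your model expects.
be = gen_backend(backend='mkl', batch_size=128)    # default on Linux as of v2.1.0
# be = gen_backend(backend='cpu', batch_size=128)  # original CPU backend
```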
## v2.0.0 (2017-06-27):
* Added support for MKL backend (-b mkl) on Linux, which boosts neon CPU performance significantly
* Added WGAN model examples for LSUN and MNIST data
* Enabled WGAN and DCGAN model examples for Python3
* Added fix (using file locking) to prevent race conditions running multiple jobs on the same machine with multiple GPUs
* Added functionality to display some information about hardware, OS and model used
* Updated appdirs to 1.4.3 to be compatible with CentOS 7.3 for the appliance
## v1.9.0 (2017-05-03):
* Add support for 3D deconvolution
* Generative Adversarial Network (GAN) implementation and MNIST DCGAN example, following Goodfellow et al. 2014 (http://arXiv.org/abs/1406.2661)
* Implement Wasserstein GAN cost function and make associated API changes for GAN models (see the sketch at the end of this list)
* Add a new benchmarking script with per-layer timings
* Add weight clipping for GDM, RMSProp, Adagrad, Adadelta and Adam optimizers
* Make multicost an explicit choice in mnist_branch.py example
* Enable NMS kernels to work with normalized boxes and offset
* Fix missing links in api.rst [#366]
* Fix docstring for --datatype option to neon [#367]
* Fix perl shebang in maxas.py and allow for build with numpy 1.12 [#356]
* Replace os.path.join for Windows interoperability [#351]
* Update aeon to 0.2.7 to fix a seg fault on termination
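To make the WGAN cost and optimizer weight-clipping entries above concrete, here is a minimal NumPy sketch of the critic objective and the per-update clipping step from the original WGAN recipe; the function names and the clip threshold are illustrative, not neon's API:

```python
import numpy as np

def wgan_critic_cost(critic_real, critic_fake):
    """The WGAN critic maximizes E[D(x_real)] - E[D(x_fake)];
    we return the negation so it can be minimized like any other cost."""
    return -(np.mean(critic_real) - np.mean(critic_fake))

def clip_weights(params, threshold=0.01):
    """After each optimizer update, clip critic weights to [-c, c]
    to (crudely) enforce the Lipschitz constraint WGAN requires."""
    return [np.clip(p, -threshold, threshold) for p in params]
```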
## v1.8.2 (2017-02-23):
* Make the whale calls example stable and shuffle dataset before splitting into subsets
* Reduce default depth in cifar_msra example to 2
* Fix the formatting of the conv layer description
* Fix documentation error in the video-c3d example
* Support greyscale videos
## v1.8.1 (2017-01-17):
* Bug fix: Add dilation to object dict and assign defaults to dil_w = dil_h = 1 [#335, #336]
* Bug fix: Prevent GPU backend from ignoring non-zero slope in Rectlinclip and change default slope to 0
* Bug fix: Nesterov momentum was updating velocities incorrectly
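For context on the Nesterov fix, one common reformulation of the Nesterov momentum update (a textbook sketch, not necessarily neon's exact kernel code) keeps the velocity and parameter updates coupled as follows:

```python
# One common formulation of Nesterov accelerated gradient:
#   v <- mu * v - lr * grad        (velocity update)
#   w <- w + mu * v - lr * grad    (step uses the *updated* velocity)
def nesterov_step(w, v, grad, lr=0.01, mu=0.9):
    v = mu * v - lr * grad
    w = w + mu * v - lr * grad
    return w, v
```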
## v1.8.0 (2016-12-28):
* Skip Thought Vectors (http://arxiv.org/abs/1506.06726) example
* Dilated convolution support
* Nesterov Accelerated Gradient option to SGD optimizer
* MultiMetric class to allow wrapping Metric classes
* Support for serializing and deserializing encoder-decoder models
* Allow specifying the number of time steps to evaluate during beam search
* A new community-contributed Docker image
* Improved error messages when a tensor is created with an invalid shape or reshaped to an incompatible size
* Fix bugs in MultiCost support
* Documentation fixes [#331]
## v1.7.0 (2016-11-21):
* Update data loader to aeon (https://github.com/NervanaSystems/aeon) for flexible,
multi-threaded data loading and transformations
* Add Neural Machine Translation model
* Remove Fast RCNN model (use Faster RCNN model instead)
* Remove music_genres example
* Fix super blocking for small N with 1D conv
* Fix update-direct conv kernel for small N
* Add gradient clipping to Adam optimizer
* Documentation updates and bug fixes
## v1.6.0 (2016-09-21):
* Faster RCNN model
* Sequence to Sequence container and char_rae recurrent autoencoder model
* Reshape Layer that reshapes the input [#221]
* Pip requirements in requirements.txt updated to latest versions [#289]
* Remove deprecated data loaders and update docs
* Use NEON_DATA_CACHE_DIR envvar as archive dir to store DataLoader ingested data
* Eliminate type conversion for FP16 for CUDA compute capability >= 5.2
* Use GEMV kernels for batch size 1
* Alter delta buffers for nesting of merge-broadcast layers
* Support for ncloud real-time logging
* Add fast_style Makefile target
* Fix Python 3 builds on Ubuntu 16.04
* Run setup.py for sysinstall to generate version.py [#282]
* Fix broken link in mnist docs
* Fix conv/deconv tests for CPU execution and fix i32 data type
* Fix for average pooling with batch size 1
* Change default scale_min to allow random cropping if omitted
* Fix yaml loading
* Fix bug with image resize during ingest
* Update references to the ModelZoo and neon examples to their new locations
## v1.5.4 (2016-07-15):
* Implement Binarized Neural Networks from http://arxiv.org/pdf/1602.02830v3.pdf
* Bug fixes [#268]
## v1.5.3 (2016-07-07):
* Bug fixes [#267]
## v1.5.2 (2016-07-06):
* Bug fixes to audio loader
## v1.5.1 (2016-06-30):
* Bug fixes
## v1.5.0 (2016-06-29):
### Modifications
* Python2/Python3 compatibility [#191]
* Support for Pascal GPUs
* Persistent RNN kernels [#262]
* Dataloader enhancements (audio loader with examples)
* HDF5 file data iterator
* Convolution kernel improvements
* Winograd kernel for fprop/bprop and 5x5 stride 1 filters
* API documentation improvements [#234, #244, #263]
* Cache directory cleanup
* Reorganization of all unit tests
* Check for compatible shapes before doing a memcpy [#182, #183]
* Bug fixes [#231, #241, #253, #257, #259]
## v1.4.0 (2016-04-29):
### Modifications
* VGG16 based Fast R-CNN model using winograd kernels
* new, backward compatible, generic data loader
* C3D video loader model trained on UCF101 dataset
* Deep Dream example
* make conv layer printout more informative [#222]
* fix some examples to use new arg override capability
* improve performance for relu for small N
* better support for arbitrary batch norm layer placement
* documentation updates [#210, #213, #236]
## v1.3.0 (2016-03-03):
### Modifications
* winograd kernels and associated autotuning routines
* benchmarking scripts
* deprecation of deterministic argument for backend constructor
* improve batch norm stability with fp16 backend
* allow strided support for dimshuffle kernel
* speed up zero momentum gradient descent
## v1.2.2 (2016-02-24):
### Modifications
* benchmarking enhancements
* fast dimshuffle, transpose, other kernel speedups and refactoring
* batch norm states fix, deterministic updates
* example fixes for fast rcnn and conv_autoencoder
* image decoding rescaling method fix
* deserialization fixes for RNNs, refactoring
* caffe compatibility fixes
* documentation updates
## v1.2.1 (2016-02-05):
### Modifications
* New MergeSum, Colornoise layers
* support for aspect_ratio scaling augmentation
* updated IMDB sentiment analysis example
* generic CSV batchwriter
* various build and deserialization bugfixes, doc updates
## v1.2.0 (2016-01-29):
### Modifications
* Kepler GPU kernel support [#80]
* new dataloader format, updated docs [#115, #170]
* new serialization format
* FastRCNN implementation, ROI pooling support [#135]
* deep residual nets implementation and example
* expanded model zoo
* Ticker dataset and copy, repeat copy tasks
* autodiff transpose support [#173]
* numerous bug fixes and documentation updates.
## v1.1.5 (2016-01-13):
### Modifications
* CUDA kernels for lookuptable layer (up to 4x speedup)
* support for deterministic Conv layer updates
* LRN layer support
* custom dataset walkthrough utilizing bAbI data
* reduced number of threads in deep reduction EW kernels [#171]
* additional (de)serialization routines [#106]
* CPU tensor slicing fix
* corrections for PrecisionRecall, MultiLabelStats [#148]
* explicitly specify python2.7 for virtualenv [#155]
* default to SM50 when no working GPU found [#186]
* Add alpha to ELU activation [#164]
* deconv callback fix [#162]
* various documentation updates [#151, #152]
## v1.1.4 (2015-12-14):
### Modifications
* Add support for bidirectional RNNs and LSTMs
* added ELU, leaky ReLU activations
* significantly faster GPU kernel builds (using ptx instead of cuda-c)
* data shuffling enhancements, removal of old data loader code.
* caffe conv, pool, dropout layer matching and compatibility flags
* add scheduling support for RMSProp
* callback enhancements, additional unit tests
* documentation auditing, added links to introductory video tutorials
## v1.1.3 (2015-12-01):
### Modifications
* deconvolution and weight histogram visualization examples and documentation
* CPU convolution and pooling layer speedups (~2x faster)
* bAbI question and answer interactive demo, dataset support.
* various ImageLoader enhancements.
* interactive usage improvements (shortcut Callback import, multiple Callbacks
init, doc updates, single item batch size support)
* set default verbosity level to warning
* CIFAR10 example normalization updates
* CUDA detection enhancements [#132]
* only parse batch_writer arguments when used as a script, allow undefined
global_mean [#137, #140]
## v1.1.2 (2015-11-17):
### Modifications
* completely re-written C++ multithreaded dataloader
* new weight initialization options for recurrent layers
* Added deconvolution visualization support (guided backprop)
* new bAbI question answering example network
* Improved performance of cifar10_allcnn, word_lstm examples
* new CUDA-C max and avg pooling kernels
* Additional bugfixes and documentation updates
## v1.1.1 (2015-11-06):
### Modifications
* Callback initialization bug fix [#127]
* IMDB LSTM example bug fix [#130]
* Added cuda-convnet2 style binary dropout variant
* Added benchmark function to model (separate fprop, bprop, update timings)
* Remove h_buffer references in favor of outputs for recurrent layers
* Multi-cost output buffer bugfix for inference [#131]
* New timeseries prediction and generation example
* Change Callback initialization to re-support named arguments. Separate out
these arguments in argparser. [#128]
## v1.1.0 (2015-10-30):
### Modifications
* Sentiment analysis support (LSTM lookupTable based), new IMDB example
* Support for merge and branch layer stacks via LayerContainers
* Sequential, Tree, MergeBroadcast, MergeMultiStream
* Support for freezing layer stacks
* Adagrad optimizer support
* new GPU kernels for fast compounding batch norm, conv and pooling engine
updates, new kernel build system and flags.
* Modifications for Caffe support
* conv, pooling, P/Q updates, dropout layer normalization more in-line with
Caffe approach. NOTE: this breaks backwards compatibility with some
strided conv/pool related models serialized using older versions of neon
as the output sizes may now be different. See the FAQ for more info.
* serialization enhancements to make caffe model import/export easier
* use per-channel mean subtraction instead of single global. NOTE: this
breaks backwards compatibility with ImgMaster saved datasets prior to this
revision. To correct, please use the included `update_dataset_cache.py`
script in the util directory.
* Default training cost display during progress bar is now calculated on a
rolling window basis rather than from the beginning of each epoch (see the sketch after this list)
* Separate Layer configuration and initialization steps
* YAML based alexnet example
* Callback enhancements.
* now pass args instead of having to spell out callbacks in each example
* Changed validation callback to loss callback, validation_frequency now
evaluation_frequency
* Generic metric callback.
* Various bug fixes
* non-contiguous array get for GPUTensors
* 1D slicing returns 2D matrices
* bin/neon serialization fixes for RNNs
* 3D conv fixes for fprop, bprop
* batch norm inference fix
* bias layer size fix
* Documentation updates and improvements
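The rolling-window cost display noted above reduces to averaging over the last few minibatches; a minimal sketch of the idea (names are illustrative, not neon's internals):

```python
from collections import deque

class RollingCost(object):
    """Track the mean cost over the last `window` minibatches, so the
    progress bar reflects recent behavior rather than the whole epoch."""
    def __init__(self, window=64):
        self.costs = deque(maxlen=window)

    def update(self, minibatch_cost):
        self.costs.append(minibatch_cost)
        return sum(self.costs) / len(self.costs)
```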
## v1.0.0 (2015-09-15):
### Modifications
Primarily bug fixes:
* Ensure root logging handler setup [#82]
* C++ utility for CUDA compatibility checking [#83]
* Add predict function to models [#86]
* Fix bug in learning rate schedule impacting deserialization
* Speed up batch norm computation
* Average gradients in OpTree, fix tests
* Use inference mode for fprop during validation
* Add top-k misclassification metric (see the sketch after this list)
* Simplify maxas install, make vis requirements optional, doc updates.
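As a reference for the top-k metric above, a small NumPy sketch (illustrative, not neon's Metric API): a sample counts as misclassified when its true label is absent from the k highest-scoring classes.

```python
import numpy as np

def topk_misclassification(scores, labels, k=5):
    """scores: (nclasses, batch) class scores; labels: (batch,) true labels.
    Returns the fraction of samples whose true label is not in the top k."""
    topk = np.argsort(scores, axis=0)[-k:, :]           # k best classes per sample
    hits = (topk == labels[np.newaxis, :]).any(axis=0)  # true label among them?
    return 1.0 - hits.mean()
```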
## v1.0.0rc1 (2015-09-08):
### Modifications
* RNN/LSTM
* Code is cleaner and achieves state-of-the-art results on the Penn Treebank
dataset using RNN/LSTM/GRU
* Fast image captioning model (~200x faster than CPU-based NeuralTalk) on
flickr8k dataset
* Basic automatic differentiation support
* Framework for visualizations (supported via callbacks)
* Top-down refactoring & redesign to enable quicker iteration while keeping the
speedups offered by our nervanagpu kernels
* Datasets are easier to specify
* Backend now uses OpTrees (similar to nervanagpu) to support autodiff
* nervanagpu merged in as a neon backend to simplify development and use
* YAML syntax is simplified (but not backwards compatible)
* Better documentation and wider test coverage
The following features will be added in upcoming releases:
* Advanced automatic differentiation & computational graph support
* Support for Kepler and older generation GPUs
* Multi-GPU support & hyperparameter optimization
This release was made possible thanks to the heroic efforts of the following
contributors:
* Yinyin Liu
* Yixing Lao
* Alex Park
* Evren Tumer
* Gabriel Pereyra
* JD Co-Reyes
* Will Constable
* Scott Leishman
* Angel Zhang
* Hunter Lang
* Arjun Bansal
* Anil Thomas
* Augustus Odena
* Urs Koster
* Scott Gray
* Jenkins
## v0.9.0 (2015-07-20)
### Modifications
* Version bump for the v0.9.0 release. [Scott Leishman]
* Merge branch 'MGPUMerge' into 'master' [Scott Leishman]
Mgpu merge
Cleanup on top of Multi-GPU branch, addresses small bugs and makes some readability improvements.
See merge request !31
* Merge branch 'master' of gitlab.localdomain:algorithms/neon into MGPUMerge. [Scott Leishman]
Incorporate validation_pct changes.
* Merge branch 'ZebTech-val_split'. [Scott Leishman]
Adds ability to partition training set cases into validation set.
Closes #58
* Add more documentation, remove duplicated test. [Scott Leishman]
* Implemented validation set splitting. [seba-1511]
* Set default device list in gen_backend. [Alex Park]
* Ignore mgpu tensor tests that require more than 1 device when only 1 device is present. [Alex Park]
* Remove references to DIST, MGPU tests with < 2 devices. [Scott Leishman]
* Small fixes and code cleanup for MultiGPU. [Urs Koster]
* Implement multi-GPU processing using the "weird trick" (data parallel for local layers, model parallel for fully-connected layers); remove compatibility with the cudanet backend for imageset-based models; remove MPI-based parallelism; remove dependency on NoPar structures; fix check_grad to initialize with the backend; remove the need to link and initialize layers separately; add documentation for multiple-GPU usage and device id specification. [Alex Park]
## v0.8.2 (2015-07-08)
### Modifications
* Version bump for the v0.8.2 release. [Scott Leishman]
* Merge branch 'more_misc_fixes' into 'master' [Urs Koster]
Various bug fixes
Collection of fixes to address issues with 1D CPU tensor handling, Leaky ReLU backprop, better GPU / CUDA checking, liveness indication for Tensor values, and a new dataset to highlight building regression models.
See merge request !27
* Merge branch 'master' into more_misc_fixes. [Scott Leishman]
* Merge branch 'evren/refactor_tox' into 'master' [Scott Leishman]
Evren/refactor tox
The Jenkins job for neon uses tox to run tests with Python 2.7 and 3.4, but the xUnit output from nosetests was getting overwritten since nosetests tried to write to the same file for both test runs (2.7 and 3.4). This fix gives the two files different names.
Instead of changing the Makefile, I put the fix in tox.ini.
Scott, I thought you would be best to look at this.
See merge request !28
* Try another way without hardwiring test attr filters. [Evren Tumer]
* Changed tox.ini to stop py34 test from overwriting py27 nosetests.xml file. [Evren Tumer]
* Make octal numbers python2 and python3 compliant. [Scott Leishman]
* Use python3 compatible pickle. [Scott Leishman]
* Initial inroads into flatfile dataset. [Scott Leishman]
Used for streaming (live) inference currently. Other use cases do not yet work.
* Port of multi learning rule param serialization fix. [Scott Leishman]
See #40162164201505
* Add missing rectlin_derivative from NervanaGPU backend. [Scott Leishman]
* Fix bug in RectLeaky bprop_func. [Scott Leishman]
Added unit tests, closes #39973152922213
* Merge branch 'master' into more_misc_fixes. [Scott Leishman]
* Merge branch 'evren/myl-250/serialize-error' into 'master' [Scott Leishman]
Evren/myl 250/serialize error
Generated new test for serialization.
Added feature for retaining a subset of the checkpoint files.
Added test for checkpoint files.
See merge request !22
* Python3 compatible print. [Scott Leishman]
* Remove no longer relevant comment. [Scott Leishman]
* Fix small style issue. [Evren Tumer]
* Merge branch 'evren/myl-250/serialize-error' of http://gitlab.localdomain/algorithms/neon into evren/myl-250/serialize-error. [Evren Tumer]
rebased on master
* Added some docs for new checkpoint saving feature, added skip test for broken serialization tests. [Evren Tumer]
* Have make serialize run test_serialize.py. [Scott Leishman]
* Fixed style errors, remove i1k because it is freezing the test, added 'slow' label to the serialize test. [Evren Tumer]
* Exclude serialization test from make test. [Evren Tumer]
* Added checkpoint test. [Evren Tumer]
* Added test generator, added code to keep some checkpoint files, added comments. [Evren Tumer]
* Clean up hack. [Evren Tumer]
* Added cpu->gpu handoff and gpu->cpu handoff tests. [Evren Tumer]
* Fixed issue with python not expanding ~ in paths. [Evren Tumer]
* Added yml and fixed name bug. [Evren Tumer]
* Initial code for new serialization test. [Evren Tumer]
* Merge branch 'fully_connected_layer_unit_tests' into 'master' [Scott Leishman]
Fully connected layer unit tests
CPU unittests of fprop/brop for FCLayer that check if output/delta buffers are set to the right size.
See merge request !24
* Minor style fix. [Scott Leishman]
* Tensor comparison fix for val_init test. [Scott Leishman]
* Added numerical checks. [GabrielPereyra]
* Added identity initialization for deterministic testing (doesn't import) [GabrielPereyra]
* Added some GPU (cudanet) tests. [Scott Leishman]
* CPU unit test of fprop/bprop for FCLayer. [GabrielPereyra]
* Restored ANNOTATED_EXAMPLE. [Urs Koster]
* Fixed CPU 1D indexing inconsistency. [Scott Leishman]
Closes MYL-221, #38758848739023
* Fix CPU backend update when batch_size 1. [Scott Leishman]
Closes #47, MYL-260, #38750574965487
* Add posix_ipc to dev requirements. [Scott Leishman]
* Remove nvidia-smi dependency for cudanet GPU checking. [Scott Leishman]
Closes #51
* Added housing data, simple regression network. [Scott Leishman]
Still to tune example, closes MYL-258, closes #33
* Mark non-persistent tensor values. Closes MYL-251. [Scott Leishman]
* Merge branch 'NvgpuCompat' into 'master' [Scott Leishman]
Nvgpu compat
This is a pretty minor change but makes it easier to keep up to date with changes in nervanagpu because it uses the ng tensor constructors rather than the GPUTensor constructors directly. (Recent changes to nervanagpu have changed the way the tensors are constructed)
See merge request !21
* Merge branch 'master' into NvgpuCompat. [Scott Leishman]
* Quick patch for RNN docs. [Scott Leishman]
* Minor fixes. [Scott Leishman]
Remove now un-needed import, extraneous function calls during fullset
prediction.
* Change array creation routines to use nervanagpu calls rather than instantiating GPUTensor directly. [Alex Park]
* Merge branch 'MYL261-RNN2' into 'master' [Scott Leishman]
RNN and LSTM updates
Fixes issue with prediction using GPU backend. Closes #16
- Minor cleanup to numerical gradient code, removing hardcoded element indices.
- Mobydick dataset changed to use only the 96 printable ASCII characters and to remove line breaks from text.
- Updated dtype handling so fp64 with CPU backend is supported. Used for numerical gradients.
- Some additional documentation for LSTM layer.
See merge request !20
* Update default logging level for rnn, lstm examples. [Scott Leishman]
* RNN cleanup: added documentation to the annotated_example yaml; restored functionality to generate strings from predictions; cleaned up dtype checking. [Urs Koster]
* Restored ability to run numerical gradients in fp64, in a clean way using the backend_type in the yaml. [Urs Koster]
* Fixes issue with prediction using the GPU backend. Minor cleanup to numerical gradient code, removing hardcoded element indices. Mobydick dataset changed to use only the 96 printable ASCII characters and to remove line breaks from text. [Urs Koster]
* Merge branch 'bnormfix2' into 'master' [Urs Koster]
Bnormfix2
Corrects calculation of global means and variances used during batch norm inference
- Uses exponential moving average to keep a running estimate of the global average mean and variance
- Added some helper functions to ensure efficient computation of moving average without allocating extra space
- Requires latest version of cuda-convnet2 to ensure correct computation for cc2 backend
- May make things slower for nervanagpu during training due to extra overhead of computing global stats that wasn't happening before
See merge request !18
* Correct calculation of global means and variances used during batch norm inference. [Urs Koster]
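The exponential-moving-average scheme described above, in sketch form (the decay value and names are illustrative, not neon's internals):

```python
def update_global_stats(running_mean, running_var, batch_mean, batch_var,
                        decay=0.9):
    """Exponential moving average of per-batch statistics; the running
    estimates are what batch norm uses at inference time."""
    running_mean = decay * running_mean + (1.0 - decay) * batch_mean
    running_var = decay * running_var + (1.0 - decay) * batch_var
    return running_mean, running_var
```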
* Merge branch 'misc_fixes' into 'master' [Anil Thomas]
Miscellaneous fixes and updates
Collection of various small fixes including:
* MOP updates to express value persistence across backend begin and end calls
* Removal of extraneous backend clip calls where appropriate
* python 3 compatibility fixes
* Revamped metrics comparison
* training error notation updates
* serialization testing fixes
* make develop target fixes
See merge request !17
* Merge branch 'master' into misc_fixes. [Scott Leishman]
* Merge pull request #46 from itsb/master. [Scott Leishman]
fix broken link in README
* Fix broken link in README. [itsb]
* Merge branch 'rmsprop2' into 'master' [Scott Leishman]
Rmsprop2
Implement RMSprop, inheriting from GradientDescentMomentum
- Simplify calling of compounded kernels in nervanagpu for learning rules
- Change default behavior of gradient descent with momentum if momentum params are not included to behave as if momentum_coef is 0
- Change default settings of adadelta to use suggested values for rho and epsilon if not provided
- Update documentation for optimizers
- Include example of rmsprop in ANNOTATED_EXAMPLE.yaml
- Include example of rmsprop in mnist-tuned.yaml
closes MYL-118, #43
See merge request !15
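For reference, the core RMSprop update this merge adds (a textbook sketch; the hyperparameter defaults here are illustrative, not necessarily the ones neon chose):

```python
import numpy as np

def rmsprop_step(w, grad, state, lr=0.001, decay=0.9, eps=1e-6):
    """Keep a running average of squared gradients and scale the step by
    its square root, so each weight gets an adapted learning rate."""
    state = decay * state + (1.0 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(state) + eps)
    return w, state
```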
* Doc and style updates, plus rms_update fix for cc2. [Scott Leishman]
* Merge branch 'master' into rmsprop2. [Scott Leishman]
* Merge branch 'clients' into 'master' [Anil Thomas]
Shared-memory based IPC mechanism - this is to support third party
applications that want to interface with neon for live inference.
* Shared-memory based IPC mechanism. [Anil Thomas]
This is to support third party applications that want to interface
with neon for live inference.
* Merge branch 'notebook' into 'master' [Anil Thomas]
iPython notebook example
Added an iPython notebook example using neon to train an MNIST MLP and visualize results.
See merge request !13
* Changed default backend to CPU. [Arjun Bansal]
* Added iPython notebook example. [Arjun Bansal]
* Implement RMSprop, inheriting from GradientDescentMomentum; simplify calling of compounded kernels in nervanagpu for learning rules; change the default behavior of gradient descent with momentum to act as if momentum_coef is 0 when momentum params are not included; change the default settings of adadelta to use suggested values for rho and epsilon if not provided; update documentation for optimizers; include examples of rmsprop in ANNOTATED_EXAMPLE.yaml and mnist-tuned.yaml. [Alex Park]
Add init for RMSProp from WeightLayer, Inherit from GradientDescentMomentum because of common characteristics
Update documentation, make default values for optimizer params, change default behavior of gradient descent with momentum if momentum params are not included to behave as if momentum_coef is 0
Revert mnist-tuned back to using gradient descent momentum
* Ensure pip utilizes newest cudanet version. [Scott Leishman]
* Merge branch 'BatchNormReshape2' into 'master' [Urs Koster]
Batch norm reshape
- Change how reshaping is done for local layers in batch norm and shared biases.
- Reduce chance of memory leak in nervanagpu calls by reducing creation of reshape references.
- Use new behavior of cudanet to return reshape views rather than reshape underlying tensor
See merge request !11
* Improved handling of tensor allocations by using views: clean up unnecessary tensor allocations; rather than reshaping repeatedly, store a reusable reshaped view; moved reshaping for batch norm; update Makefile dependency for cudanet to 0.2.6. [Urs Koster]
* Fix minor formatting issue in serialize check. [Scott Leishman]
* Added persist_values to tensors and their creation. [Scott Leishman]
Closes MYL-246. Note that no tensors have been initialized as not being
persistent as of yet (deferring to default in all cases).
* Work around backend clip calls where possible. Closes MYL-247. [Scott Leishman]
* Remove unused pre-commit hook. [Scott Leishman]
Prevented make develop based install where users didn't have appropriate pep8
version already in place.
* Python3 compatibility fixes. Closes #35. [Scott Leishman]
* Revamped metrics comparison. [Scott Leishman]
* Now compare across like backends only (by default).
* Make output more tabular, and easier to see at a glance.
* Revert leading zero to colon based notation. [Scott Leishman]
* Leading zero epoch and partial mini-batch display. [Scott Leishman]
Closes #40.
* Merge branch 'RectleakyGPU' into 'master' [Scott Leishman]
Rectleaky gpu
Add RectLeaky to gpu backend to address github issue #39
See merge request !10
* Restore nervanagpu based sanity checks. [Scott Leishman]
* Add RectLeaky activation for gpu backend. [Alex Park]
* Merge branch 'SerializeSnapshots' into 'master' [Scott Leishman]
Serialize snapshots
Add option to yaml allowing model snapshots to be serialized on a schedule. Snapshots will be serialized to provided `serialize_path` and the schedule can be either:
- explicitly specified using a list of ints, `[N1, N2, N3, ...]`, indicating that serialization will occur after epoch `N1`, epoch `N2`, epoch `N3`, etc., or
- specified using an integer interval, `N`, indicating that serialization will occur every `N` epochs.
See merge request !8
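The two schedule forms described above reduce to a simple predicate; a sketch (the function name is illustrative, not neon's API):

```python
def should_serialize(epoch, schedule):
    """schedule is either a list of epochs [N1, N2, ...] or an int N
    meaning 'every N epochs', per the yaml schedule described above."""
    if isinstance(schedule, list):
        return epoch in schedule
    return epoch % schedule == 0
```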
* Expanded docs. [Scott Leishman]
Also fixed ANNOTATED_EXAMPLE
* Rewrite save_snapshot to show control flow a little more clearly. [Alex Park]
* Documentation for serialize schedule. [Alex Park]
* Implement snapshot saving according to an epoch list or epoch interval for mlp and rnn models. [Alex Park]
* Merge branch 'ZebTech-cifar100' into 'master' [Scott Leishman]
Addition of CIFAR100 dataset
* Minor docstring format updates. [Scott Leishman]
* Fixed MNIST and CIFAR100 testing. [seba-1511]
* Merge remote-tracking branch 'neon/master' into cifar100. [seba-1511]
* Support prediction generation for RNNs and LSTMs. [Scott Leishman]
This fixes #23.
* Merge branch 'cifar100' of https://github.com/ZebTech/neon into ZebTech-cifar100. [Scott Leishman]
* Merge branch 'Kaixhin-docker_readme' [Scott Leishman]
Added Docker image links to install docs and README. Fixes #24.
* Move docker image links into source install docs. [Scott Leishman]
Provided reference links from README and quick-start page.
* Update readme with Docker images. [Kaixhin]
* Removed debugging lines. [seba-1511]
* Added CIFAR100 to the ANNOTATED_EXAMPLE. [seba-1511]
* Added CIFAR100 to the documentation. [seba-1511]
* Moved coarse-labels to be part of kwargs. [seba-1511]
* Fixed number of classes in docstring. [seba-1511]
* Added CIFAR100 loader. Closes issue #28. [seba-1511]
* Merge branch 'rnn-docs' into 'master' [Scott Leishman]
Rnn docs
Added doc-strings describing the dataset format expected for Recurrent Neural Networks (RNNs).
See merge request !7
* Fix section headings, other typos. [Scott Leishman]
Also fix minor doc path issue to ensure docstrings are parsed from local
build area.
* Updated documentation for recurrent networks and datasets. [Urs Koster]
* Merge branch 'bn-compound2' into 'master' [Scott Leishman]
Bn compound2
Added gpu backend calls for fprop and bprop pass of Batch Normalization, which results in a 10% overall speedup on Alexnet. Also deletes minibatch provider at the end of training to free up device DDR for inference.
See merge request !6
* Removed extraneous RNN train_mode false call. [Scott Leishman]
* Delete minibatch provider after training to save memory. [Urs Koster]
* Added gpu backend calls for fprop and bprop pass of Batch Normalization, which results in a 10% overall speedup. [Urs Koster]
* Merge branch 'noyaml' into 'master' [Scott Leishman]
Noyaml
Add example code to create networks without .yaml.
See merge request !4
* Added noyaml example to docs. [Scott Leishman]
* Merge branch 'master' into noyaml. [Scott Leishman]
* Merge branch 'IntegrationTest' into 'master' [Scott Leishman]
Added Integration tests
* Added integration tests based on our current example networks and backends.
* Requires Maxwell GPU with nervanagpu and cudanet backends installed, as well as imagenet dataset.
* New command line parameter `--integration` that cleans up YAML files to make them more easily
reproducible.
* Currently requires manual inspection of results relative to prior runs on the same machine to
determine if outputs are feasible.
* Added tolerances to the serialization tests.
See merge request !2
* Allow CL params for outfile and logfile to integration tests. [Scott Leishman]
* Serialize Makefile cleanup. [Scott Leishman]
* Merge branch 'master' into IntegrationTest. [Scott Leishman]
* Merge pull request #20 from Kaixhin/change_cuda_check. [Scott Leishman]
Change GPU check to CUDA SDK check. Closes issue #19
* Change GPU check to CUDA SDK check. Closes issue #19. [Kaixhin]
* Cleanup prior integration approach files. [Scott Leishman]
* Ensure integration tests run from all directories. [Scott Leishman]
* Swap branch with non-branch yaml again. [Alex Park]
* Allow reuse of example yamls for integration testing: add integration command line option to the neon executable; adjust yaml options in integration mode (among other things, drop the serialization path and remove sources of randomization); add is_random attribute to layers to identify drop-able layers (just Dropout for now); make dotransforms have the correct behavior in Imageset; change batch norm warnings to static to avoid proliferation of messages across many layers; switch branch and small layers for cifar mlp since they were incorrectly labeled; create integration test script in the examples directory that uses existing metrics code; correct i1k-alexnet-fp16 to use fp16 as the backend data type. [Alex Park]
* Added checks for nervanagpu and cudanet in dev tests. [Arjun Bansal]
* Style fixes. [Arjun Bansal]
* Deleted commented layers in tests yamls. [Arjun Bansal]
* Moved test yamls to a separate subdirectory. [Arjun Bansal]
* Deleted the cpu/gpu command line parameters for serialize and integration since they aren't used right now. [Arjun Bansal]
* Added tolerances for serialization tests. [Arjun Bansal]
* Added integration tests. [Arjun Bansal]
* Add example code to create networks without .yaml. [Anil Thomas]
It is possible to create and use networks without using a .yaml file.
This is illustrated by examples/mlp/mnist-small-noyaml.py.
* Simplify the expression for computing output dims. [Anil Thomas]
This is for computing dimensions of an output feature map in a
convolutional or pooling layer.
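The expression being simplified is the standard output-size computation for convolutional and pooling layers; for reference, a sketch of the textbook formula along one spatial dimension:

```python
def output_dim(in_size, kernel, padding, stride):
    """Standard output size of a convolution or pooling layer along
    one spatial dimension: (W + 2P - K) // S + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1
```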
* Let datasets have access to the experiment object. [Anil Thomas]
Custom plug-ins may use this functionality instead of defining their own
experiment classes.
* Documentation link updates. [Scott Leishman]
* Merge pull request #13 from kellyp/master. [Scott Leishman]
Update readme with the correct "using neon" link
* Update readme with the correct "using neon" link. [Kelly Plummer]
* Fix broken links in docs, remove i1K. [Scott Leishman]
* Convnet/i1k-alexnet-fp16.yaml was using float32 & mb=64; fixed. [Arjun Bansal]
* Change the value of the sumWidth parameter. [Anil Thomas]
This parameter affects the performance of the weight gradient computation
in the cudanet backend.
* Fix computation of the number of samples. [Anil Thomas]
This issue was causing neon to compute the number of samples
incorrectly when "predictions" is specified in the .yaml file
and the number of samples in the validation set is different
from that in the training set.
## v0.8.1 (2015-05-04)
### Modifications
* Initial public release of neon. [Scott Leishman]