# ChangeLog
## v2.4.0 (2017-11-27):
* Enabled pip install through PyPI
* Updated MKLML to version 20171007, with performance improvements of ~3X for the MNIST datalayer/nondatalayer and ~1.6X for the DCGAN/WGAN datalayer
* Updated the ResNet model to optimize performance with MKLML 20171007
* Updated the AlexNet weight file and fixed a bug in the Deep Dream example
* Fixed a Faster-RCNN inference model loading issue
* Added data_loading time measurement and enabled benchmarking of GAN networks
* Updated to Aeon version 1.2.0
* Enabled neon build with mklEngine on Windows systems
## v2.3.0 (2017-10-27):
* Optimized DeepSpeech2 MKL backend performance (~7X improvement over the CPU backend)
* Fused the convolution and bias layers, which significantly boosts AlexNet and VGG performance on Intel architectures with the MKL backend
* Made SSD and Faster-RCNN use VGG weight files in the new format
* Fixed use of reset_cells hyperparameter
* Fixed MKL backend bug for GAN and Faster-RCNN models
## v2.2.0 (2017-09-27):
* Update MKLML to version 20170908, which fixes a bug related to data conversions
* Add SSD example for bounding-box object detection that works with both the GPU and MKL backends
* Add DeepSpeech2 MKL backend optimization featuring a ~3X improvement
* Update aeon to 1.0.0, including a new manifest version (doc/source/loading_data.rst#aeon-dataloader)
* Add CHWD support for Batch Normalization in the MKL backend
* Modify ResNet-50 model's last layer to match the original ResNet-50 model paper
* Enable Seq2Seq testing and benchmarking
## v2.1.0 (2017-08-02):
* Set the MKL backend (-b mkl) as the default CPU backend on Linux (use -b cpu to select the original CPU backend); see the sketch after this list
* Update MKLML to version 20170720 (AVX512 code paths enabled by default, plus conversion optimizations)
* Simplify ResNet example
* Makefiles now check for virtualenv and pkg-config (NervanaSystems/neon#383)
* Fix Deep Speech2 model on MKL backend
* Fix MKL installation for "make sysinstall"
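For reference, the same backend selection can be made programmatically instead of via the -b flag. A minimal sketch, assuming the standard neon `gen_backend` API (the `batch_size` value here is illustrative):

```python
# Programmatic equivalent of the -b mkl / -b cpu command line flags.
from neon.backends import gen_backend

# batch_size is illustrative; use whatever your model expects.
be = gen_backend(backend='mkl', batch_size=128)    # default on Linux as of v2.1.0
# be = gen_backend(backend='cpu', batch_size=128)  # original CPU backend
```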
## v2.0.0 (2017-06-27):
* Added support for MKL backend (-b mkl) on Linux, which boosts neon CPU performance significantly
* Added WGAN model examples for LSUN and MNIST data
* Enabled WGAN and DCGAN model examples for Python3
* Added fix (using file locking) to prevent race conditions running multiple jobs on the same machine with multiple GPUs
* Added functionality to display some information about hardware, OS and model used
* Updated appdirs to 1.4.3 to be compatible with CentOS 7.3 for the appliance
## v1.9.0 (2017-05-03):
* Add support for 3D deconvolution
* Generative Adversarial Network (GAN) implementation and MNIST DCGAN example, following Goodfellow et al. 2014 (http://arXiv.org/abs/1406.2661)
* Implement Wasserstein GAN cost function and make associated API changes for GAN models (see the sketch at the end of this list)
* Add a new benchmarking script with per-layer timings
* Add weight clipping for GDM, RMSProp, Adagrad, Adadelta and Adam optimizers
* Make multicost an explicit choice in mnist_branch.py example
* Enable NMS kernels to work with normalized boxes and offset
* Fix missing links in api.rst [#366]
* Fix docstring for --datatype option to neon [#367]
* Fix perl shebang in maxas.py and allow for build with numpy 1.12 [#356]
* Replace os.path.join for Windows interoperability [#351]
* Update aeon to 0.2.7 to fix a seg fault on termination
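To make the WGAN cost and optimizer weight-clipping entries above concrete, here is a minimal NumPy sketch of the critic objective and the per-update clipping step from the original WGAN recipe; the function names and the clip threshold are illustrative, not neon's API:

```python
import numpy as np

def wgan_critic_cost(critic_real, critic_fake):
    """The WGAN critic maximizes E[D(x_real)] - E[D(x_fake)];
    we return the negation so it can be minimized like any other cost."""
    return -(np.mean(critic_real) - np.mean(critic_fake))

def clip_weights(params, threshold=0.01):
    """After each optimizer update, clip critic weights to [-c, c]
    to (crudely) enforce the Lipschitz constraint WGAN requires."""
    return [np.clip(p, -threshold, threshold) for p in params]
```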
## v1.8.2 (2017-02-23):
* Make the whale calls example stable and shuffle dataset before splitting into subsets
* Reduce default depth in cifar_msra example to 2
* Fix the formatting of the conv layer description
* Fix documentation error in the video-c3d example
* Support greyscale videos
## v1.8.1 (2017-01-17):
* Bug fix: Add dilation to object dict and assign defaults to dil_w = dil_h = 1 [#335, #336]
* Bug fix: Prevent GPU backend from ignoring non-zero slope in Rectlinclip and change default slope to 0
* Bug fix: Nesterov momentum was updating velocities incorrectly
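For context on the Nesterov fix, one common reformulation of the Nesterov momentum update (a textbook sketch, not necessarily neon's exact kernel code) keeps the velocity and parameter updates coupled as follows:

```python
# One common formulation of Nesterov accelerated gradient:
#   v <- mu * v - lr * grad        (velocity update)
#   w <- w + mu * v - lr * grad    (step uses the *updated* velocity)
def nesterov_step(w, v, grad, lr=0.01, mu=0.9):
    v = mu * v - lr * grad
    w = w + mu * v - lr * grad
    return w, v
```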
## v1.8.0 (2016-12-28):
* Skip Thought Vectors (http://arxiv.org/abs/1506.06726) example
* Dilated convolution support
* Nesterov Accelerated Gradient option to SGD optimizer
* MultiMetric class to allow wrapping Metric classes
* Support for serializing and deserializing encoder-decoder models
* Allow specifying the number of time steps to evaluate during beam search
* A new community-contributed Docker image
* Improved error messages when a tensor is created with an invalid shape or reshaped to an incompatible size
* Fix bugs in MultiCost support
* Documentation fixes [#331]
## v1.7.0 (2016-11-21):
* Update data loader to aeon (https://github.com/NervanaSystems/aeon) for flexible,
multi-threaded data loading and transformations
* Add Neural Machine Translation model
* Remove Fast RCNN model (use Faster RCNN model instead)
* Remove music_genres example
* Fix super blocking for small N with 1D conv
* Fix update-direct conv kernel for small N
* Add gradient clipping to Adam optimizer
* Documentation updates and bug fixes
## v1.6.0 (2016-09-21):
* Faster RCNN model
* Sequence to Sequence container and char_rae recurrent autoencoder model
* Reshape Layer that reshapes the input [#221]
* Pip requirements in requirements.txt updated to latest versions [#289]
* Remove deprecated data loaders and update docs
* Use NEON_DATA_CACHE_DIR envvar as archive dir to store DataLoader ingested data
* Eliminate type conversion for FP16 for CUDA compute capability >= 5.2
* Use GEMV kernels for batch size 1
* Alter delta buffers for nesting of merge-broadcast layers
* Support for ncloud real-time logging
* Add fast_style Makefile target
* Fix Python 3 builds on Ubuntu 16.04
* Run setup.py for sysinstall to generate version.py [#282]
* Fix broken link in mnist docs
* Fix conv/deconv tests for CPU execution and fix i32 data type
* Fix for average pooling with batch size 1
* Change default scale_min to allow random cropping if omitted
* Fix yaml loading
* Fix bug with image resize during ingest
* Update references to the ModelZoo and neon examples to their new locations
## v1.5.4 (2016-07-15):
* Implement Binarized Neural Networks from http://arxiv.org/pdf/1602.02830v3.pdf
* Bug fixes [#268]
## v1.5.3 (2016-07-07):
* Bug fixes [#267]
## v1.5.2 (2016-07-06):
* Bug fixes to audio loader
## v1.5.1 (2016-06-30):
* Bug fixes
## v1.5.0 (2016-06-29):
### Modifications
* Python2/Python3 compatibility [#191]
* Support for Pascal GPUs
* Persistent RNN kernels [#262]
* Dataloader enhancements (audio loader with examples)
* HDF5 file data iterator
* Convolution kernel improvements
* Winograd kernel for fprop/bprop and 5x5 stride 1 filters
* API documentation improvements [#234, #244, #263]
* Cache directory cleanup
* Reorganization of all unit tests
* Check for compatible shapes before doing a memcpy [#182, #183]
* Bug fixes [#231, #241, #253, #257, #259]
## v1.4.0 (2016-04-29):
### Modifications
* VGG16 based Fast R-CNN model using winograd kernels
* new, backward compatible, generic data loader
* C3D video loader model trained on UCF101 dataset
* Deep Dream example
* make conv layer printout more informative [#222]
* fix some examples to use new arg override capability
* improve performance for relu for small N
* better support for arbitrary batch norm layer placement
* documentation updates [#210, #213, #236]
## v1.3.0 (2016-03-03):
### Modifications
* winograd kernels and associated autotuning routines
* benchmarking scripts
* deprecation of deterministic argument for backend constructor
* improve batch norm stability with fp16 backend
* allow strided support for dimshuffle kernel
* speed up zero momentum gradient descent
## v1.2.2 (2016-02-24):
### Modifications
* benchmarking enhancements
* fast dimshuffle, transpose, other kernel speedups and refactoring
* batch norm states fix, deterministic updates
* example fixes for fast rcnn and conv_autoencoder
* image decoding rescaling method fix
* deserialization fixes for RNNs, refactoring
* caffe compatibility fixes
* documentation updates
## v1.2.1 (2016-02-05):
### Modifications
* New MergeSum, Colornoise layers
* support for aspect_ratio scaling augmentation
* updated IMDB sentiment analysis example
* generic CSV batchwriter
* various build and deserialization bugfixes, doc updates
## v1.2.0 (2016-01-29):
### Modifications
* Kepler GPU kernel support [#80]
* new dataloader format, updated docs [#115, #170]
* new serialization format
* FastRCNN implementation, ROI pooling support [#135]
* deep residual nets implementation and example
* expanded model zoo
* Ticker dataset and copy, repeat copy tasks
* autodiff transpose support [#173]
* numerous bug fixes and documentation updates.
## v1.1.5 (2016-01-13):
### Modifications
* CUDA kernels for lookuptable layer (up to 4x speedup)
* support for deterministic Conv layer updates
* LRN layer support
* custom dataset walkthrough utilizing bAbI data
* reduced number of threads in deep reduction EW kernels [#171]
* additional (de)serialization routines [#106]
* CPU tensor slicing fix
* corrections for PrecisionRecall, MultiLabelStats [#148]
* explicitly specify python2.7 for virtualenv [#155]
* default to SM50 when no working GPU found [#186]
* Add alpha to ELU activation [#164]
* deconv callback fix [#162]
* various documentation updates [#151, #152]
## v1.1.4 (2015-12-14):
### Modifications
* Add support for bidirectional RNNs and LSTMs
* added ELU, leaky ReLU activations
* significantly faster GPU kernel builds (using ptx instead of cuda-c)
* data shuffling enhancements, removal of old data loader code.
* caffe conv, pool, dropout layer matching and compatibility flags
* add scheduling support for RMSProp
* callback enhancements, additional unit tests
* documentation auditing, added links to introductory video tutorials
## v1.1.3 (2015-12-01):
### Modifications
* deconvolution and weight histogram visualization examples and documentation
* CPU convolution and pooling layer speedups (~2x faster)
* bAbI question and answer interactive demo, dataset support.
* various ImageLoader enhancements.
* interactive usage improvements (shortcut Callback import, multiple Callbacks
init, doc updates, single item batch size support)
* set default verbosity level to warning
* CIFAR10 example normalization updates
* CUDA detection enhancements [#132]
* only parse batch_writer arguments when used as a script, allow undefined
global_mean [#137, #140]
## v1.1.2 (2015-11-17):
### Modifications
* completely re-written C++ multithreaded dataloader
* new weight initialization options for recurrent layers
* Added deconvolution visualization support (guided backprop)
* new bAbI question answering example network
* Improved performance of cifar10_allcnn, word_lstm examples
* new CUDA-C max and avg pooling kernels
* Additional bugfixes and documentation updates
## v1.1.1 (2015-11-06):
### Modifications
* Callback initialization bug fix [#127]
* IMDB LSTM example bug fix [#130]
* Added cuda-convnet2 style binary dropout variant
* Added benchmark function to model (separate fprop, bprop, update timings)
* Remove h_buffer references in favor of outputs for recurrent layers
* Multi-cost output buffer bugfix for inference [#131]
* New timeseries prediction and generation example
* Change Callback initialization to re-support named arguments. Separate out
these arguments in argparser. [#128]
## v1.1.0 (2015-10-30):
### Modifications
* Sentiment analysis support (LSTM lookupTable based), new IMDB example
* Support for merge and branch layer stacks via LayerContainers
* Sequential, Tree, MergeBroadcast, MergeMultiStream
* Support for freezing layer stacks
* Adagrad optimizer support
* new GPU kernels for fast compounding batch norm, conv and pooling engine
updates, new kernel build system and flags.
* Modifications for Caffe support
* conv, pooling, P/Q updates, dropout layer normalization more in-line with
Caffe approach. NOTE: this breaks backwards compatibility with some
strided conv/pool related models serialized using older versions of neon
as the output sizes may now be different. See the FAQ for more info.
* serialization enhancements to make caffe model import/export easier
* use per-channel mean subtraction instead of single global. NOTE: this
breaks backwards compatibility with ImgMaster saved datasets prior to this
revision. To correct, please use the included `update_dataset_cache.py`
script in the util directory.
* Default training cost display during progress bar is now calculated on a
rolling window basis rather than from the beginning of each epoch (see the sketch after this list)
* Separate Layer configuration and initialization steps
* YAML based alexnet example
* Callback enhancements.
* now pass args instead of having to spell out callbacks in each example
* Changed validation callback to loss callback, validation_frequency now
evaluation_frequency
* Generic metric callback.
* Various bug fixes
* non-contiguous array get for GPUTensors
* 1D slicing returns 2D matrices
* bin/neon serialization fixes for RNNs
* 3D conv fixes for fprop, bprop
* batch norm inference fix
* bias layer size fix
* Documentation updates and improvements
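The rolling-window cost display noted above reduces to averaging over the last few minibatches; a minimal sketch of the idea (names are illustrative, not neon's internals):

```python
from collections import deque

class RollingCost(object):
    """Track the mean cost over the last `window` minibatches, so the
    progress bar reflects recent behavior rather than the whole epoch."""
    def __init__(self, window=64):
        self.costs = deque(maxlen=window)

    def update(self, minibatch_cost):
        self.costs.append(minibatch_cost)
        return sum(self.costs) / len(self.costs)
```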
## v1.0.0 (2015-09-15):
### Modifications
Primarily bug fixes:
* Ensure root logging handler setup [#82]
* C++ utility for CUDA compatibility checking [#83]
* Add predict function to models [#86]
* Fix bug in learning rate schedule impacting deserialization
* Speed up batch norm computation
* Average gradients in OpTree, fix tests
* Use inference mode for fprop during validation
* Add top-k misclassification metric (see the sketch after this list)
* Simplify maxas install, make vis requirements optional, doc updates.
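As a reference for the top-k metric above, a small NumPy sketch (illustrative, not neon's Metric API): a sample counts as misclassified when its true label is absent from the k highest-scoring classes.

```python
import numpy as np

def topk_misclassification(scores, labels, k=5):
    """scores: (nclasses, batch) class scores; labels: (batch,) true labels.
    Returns the fraction of samples whose true label is not in the top k."""
    topk = np.argsort(scores, axis=0)[-k:, :]           # k best classes per sample
    hits = (topk == labels[np.newaxis, :]).any(axis=0)  # true label among them?
    return 1.0 - hits.mean()
```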
## v1.0.0rc1 (2015-09-08):
### Modifications
* RNN/LSTM
* Code is cleaner and achieves state-of-the-art results on the Penn Treebank
dataset using RNN/LSTM/GRU
* Fast image captioning model (~200x faster than CPU-based NeuralTalk) on
flickr8k dataset
* Basic automatic differentiation support
* Framework for visualizations (supported via callbacks)
* Top-down refactoring & redesign to enable quicker iteration while keeping the
speedups offered by our nervanagpu kernels
* Datasets are easier to specify
* Backend now uses OpTrees (similar to nervanagpu) to support autodiff
* nervanagpu merged in as a neon backend to simplify development and use
* YAML syntax is simplified (but not backwards compatible)
* Better documentation and wider test coverage
The following features will be added in upcoming releases:
* Advanced automatic differentiation & computational graph support
* Support for Kepler and older generation GPUs
* Multi-GPU support & hyperparameter optimization
This release was made possible thanks to the heroic efforts of the following
contributors:
* Yinyin Liu
* Yixing Lao
* Alex Park
* Evren Tumer
* Gabriel Pereyra
* JD Co-Reyes
* Will Constable
* Scott Leishman
* Angel Zhang
* Hunter Lang
* Arjun Bansal
* Anil Thomas
* Augustus Odena
* Urs Koster
* Scott Gray
* Jenkins
## v0.9.0 (2015-07-20)
### Modifications
* Version bump for the v0.9.0 release. [Scott Leishman]
* Merge branch 'MGPUMerge' into 'master' [Scott Leishman]
Mgpu merge
Cleanup on top of Multi-GPU branch, addresses small bugs and makes some readability improvements.
See merge request !31
* Merge branch 'master' of gitlab.localdomain:algorithms/neon into MGPUMerge. [Scott Leishman]
Incorporate validation_pct changes.
* Merge branch 'ZebTech-val_split'. [Scott Leishman]
Adds ability to partition training set cases into validation set.
Closes #58
* Add more documentation, remove duplicated test. [Scott Leishman]
* Implemented validation set splitting. [seba-1511]
* Set default device list in gen_backend. [Alex Park]
* Ignore mgpu tensor tests that require more than 1 device when only 1 device is present. [Alex Park]
* Remove references to DIST, MGPU tests with < 2 devices. [Scott Leishman]
* Small fixes and code cleanup for MultiGPU. [Urs Koster]
* Implement multi-GPU processing using the "weird trick" (data parallel for local layers, model parallel for fully-connected layers); remove compatibility with the cudanet backend for imageset-based models; remove MPI-based parallelism; remove dependency on NoPar structures; fix check_grad to initialize with the backend; remove the need to link and initialize layers separately; add documentation for multiple-GPU usage and device id specification. [Alex Park]
## v0.8.2 (2015-07-08)
### Modifications
* Version bump for the v0.8.2 release. [Scott Leishman]
* Merge branch 'more_misc_fixes' into 'master' [Urs Koster]
Various bug fixes
Collection of fixes to address issues with 1D CPU tensor handling, Leaky ReLU backprop, better GPU / CUDA checking, liveness indication for Tensor values, and a new dataset to highlight building regression models.
See merge request !27
* Merge branch 'master' into more_misc_fixes. [Scott Leishman]
* Merge branch 'evren/refactor_tox' into 'master' [Scott Leishman]
Evren/refactor tox
The Jenkins job for neon uses tox to run tests with Python 2.7 and 3.4, but the xUnit output from nosetests was getting overwritten since nosetests tried to write to the same file for both test runs (2.7 and 3.4). This fix gives the two files different names.
Instead of changing the Makefile, I put the fix in tox.ini.
Scott, I thought you would be best to look at this.
See merge request !28
* Try another way without hardwiring test attr filters. [Evren Tumer]
* Changed tox.ini to stop py34 test from overwriting py27 nosetests.xml file. [Evren Tumer]
* Make octal numbers python2 and python3 compliant. [Scott Leishman]
* Use python3 compatible pickle. [Scott Leishman]
* Initial inroads into flatfile dataset. [Scott Leishman]
Used for streaming (live) inference currently. Other use cases do not yet work.
* Port of multi learning rule param serialization fix. [Scott Leishman]
See #40162164201505
* Add missing rectlin_derivative from NervanaGPU backend. [Scott Leishman]
* Fix bug in RectLeaky bprop_func. [Scott Leishman]
Added unit tests, closes #39973152922213
* Merge branch 'master' into more_misc_fixes. [Scott Leishman]
* Merge branch 'evren/myl-250/serialize-error' into 'master' [Scott Leishman]
Evren/myl 250/serialize error
Generated new test for serialization.
Added feature for retaining a subset of the checkpoint files.
Added test for checkpoint files.
See merge request !22
* Python3 compatible print. [Scott Leishman]
* Remove no longer relevant comment. [Scott Leishman]
* Fix small style issue. [Evren Tumer]
* Merge branch 'evren/myl-250/serialize-error' of http://gitlab.localdomain/algorithms/neon into evren/myl-250/serialize-error. [Evren Tumer]
rebased on master
* Added some docs for new checkpoint saving feature, added skip test for broken serialization tests. [Evren Tumer]
* Have make serialize run test_serialize.py. [Scott Leishman]
* Fixed style errors, remove i1k because it is freezing the test, added 'slow' label to the serialize test. [Evren Tumer]
* Exclude serialization test from make test. [Evren Tumer]
* Added checkpoint test. [Evren Tumer]
* Added test generator, added code to keep some checkpoint files, added comments. [Evren Tumer]
* Clean up hack. [Evren Tumer]
* Added cpu->gpu handoff and gpu->cpu handoff tests. [Evren Tumer]
* Fixed issue with python not expanding ~ in paths. [Evren Tumer]
* Added yml and fixed name bug. [Evren Tumer]
* Initial code for new serialization test. [Evren Tumer]
* Merge branch 'fully_connected_layer_unit_tests' into 'master' [Scott Leishman]
Fully connected layer unit tests
CPU unittests of fprop/brop for FCLayer that check if output/delta buffers are set to the right size.
See merge request !24
* Minor style fix. [Scott Leishman]
* Tensor comparison fix for val_init test. [Scott Leishman]
* Added numerical checks. [GabrielPereyra]
* Added identity initialization for deterministic testing (doesn't import) [GabrielPereyra]
* Added some GPU (cudanet) tests. [Scott Leishman]
* CPU unit test of fprop/bprop for FCLayer. [GabrielPereyra]
* Restored ANNOTATED_EXAMPLE. [Urs Koster]
* Fixed CPU 1D indexing inconsistency. [Scott Leishman]
Closes MYL-221, #38758848739023
* Fix CPU backend update when batch_size 1. [Scott Leishman]
Closes #47, MYL-260, #38750574965487
* Add posix_ipc to dev requirements. [Scott Leishman]
* Remove nvidia-smi dependency for cudanet GPU checking. [Scott Leishman]
Closes #51
* Added housing data, simple regression network. [Scott Leishman]
Still to tune example, closes MYL-258, closes #33
* Mark non-persistent tensor values. Closes MYL-251. [Scott Leishman]
* Merge branch 'NvgpuCompat' into 'master' [Scott Leishman]
Nvgpu compat
This is a pretty minor change but makes it easier to keep up to date with changes in nervanagpu because it uses the ng tensor constructors rather than the GPUTensor constructors directly. (Recent changes to nervanagpu have changed the way the tensors are constructed)
See merge request !21
* Merge branch 'master' into NvgpuCompat. [Scott Leishman]
* Quick patch for RNN docs. [Scott Leishman]
* Minor fixes. [Scott Leishman]
Remove now un-needed import, extraneous function calls during fullset
prediction.
* Change array creation routines to use nervanagpu calls rather than instantiating GPUTensor directly. [Alex Park]
* Merge branch 'MYL261-RNN2' into 'master' [Scott Leishman]
RNN and LSTM updates
Fixes issue with prediction using GPU backend. Closes #16
- Minor cleanup to numerical gradient code, removing hardcoded element indices.
- Mobydick dataset changed to use only the 96 printable ASCII characters and to remove line breaks from text.
- Updated dtype handling so fp64 with CPU backend is supported. Used for numerical gradients.
- Some additional documentation for LSTM layer.
See merge request !20
* Update default logging level for rnn, lstm examples. [Scott Leishman]
* RNN cleanup: added documentation to the annotated_example yaml; restored functionality to generate strings from predictions; cleaned up dtype checking. [Urs Koster]
* Restored ability to run numerical gradients in fp64, in a clean way using the backend_type in the yaml. [Urs Koster]
* Fixes issue with prediction using the GPU backend. Minor cleanup to numerical gradient code, removing hardcoded element indices. Mobydick dataset changed to use only the 96 printable ASCII characters and to remove line breaks from text. [Urs Koster]
* Merge branch 'bnormfix2' into 'master' [Urs Koster]
Bnormfix2
Corrects calculation of global means and variances used during batch norm inference
- Uses exponential moving average to keep a running estimate of the global average mean and variance
- Added some helper functions to ensure efficient computation of moving average without allocating extra space
- Requires latest version of cuda-convnet2 to ensure correct computation for cc2 backend
- May make things slower for nervanagpu during training due to extra overhead of computing global stats that wasn't happening before
See merge request !18
* Correct calculation of global means and variances used during batch norm inference. [Urs Koster]
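The exponential-moving-average scheme described above, in sketch form (the decay value and names are illustrative, not neon's internals):

```python
def update_global_stats(running_mean, running_var, batch_mean, batch_var,
                        decay=0.9):
    """Exponential moving average of per-batch statistics; the running
    estimates are what batch norm uses at inference time."""
    running_mean = decay * running_mean + (1.0 - decay) * batch_mean
    running_var = decay * running_var + (1.0 - decay) * batch_var
    return running_mean, running_var
```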
* Merge branch 'misc_fixes' into 'master' [Anil Thomas]
Miscellaneous fixes and updates
Collection of various small fixes including:
* MOP updates to express value persistence across backend begin and end calls
* Removal of extraneous backend clip calls where appropriate
* python 3 compatibility fixes
* Revamped metrics comparison
* training error notation updates
* serialization testing fixes
* make develop target fixes
See merge request !17
* Merge branch 'master' into misc_fixes. [Scott Leishman]
* Merge pull request #46 from itsb/master. [Scott Leishman]
fix broken link in README
* Fix broken link in README. [itsb]
* Merge branch 'rmsprop2' into 'master' [Scott Leishman]
Rmsprop2
Implement RMSprop, inheriting from GradientDescentMomentum
- Simplify calling of compounded kernels in nervanagpu for learning rules
- Change default behavior of gradient descent with momentum if momentum params are not included to behave as if momentum_coef is 0
- Change default settings of adadelta to use suggested values for rho and epsilon if not provided
- Update documentation for optimizers
- Include example of rmsprop in ANNOTATED_EXAMPLE.yaml
- Include example of rmsprop in mnist-tuned.yaml
closes MYL-118, #43
See merge request !15
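For reference, the core RMSprop update this merge adds (a textbook sketch; the hyperparameter defaults here are illustrative, not necessarily the ones neon chose):

```python
import numpy as np

def rmsprop_step(w, grad, state, lr=0.001, decay=0.9, eps=1e-6):
    """Keep a running average of squared gradients and scale the step by
    its square root, so each weight gets an adapted learning rate."""
    state = decay * state + (1.0 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(state) + eps)
    return w, state
```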
* Doc and style updates, plus rms_update fix for cc2. [Scott Leishman]
* Merge branch 'master' into rmsprop2. [Scott Leishman]
* Merge branch 'clients' into 'master' [Anil Thomas]
Shared-memory based IPC mechanism - this is to support third party
applications that want to interface with neon for live inference.
* Shared-memory based IPC mechanism. [Anil Thomas]
This is to support third party applications that want to interface
with neon for live inference.
* Merge branch 'notebook' into 'master' [Anil Thomas]
iPython notebook example
Added an iPython notebook example using neon to train an MNIST MLP and visualize results.
See merge request !13
* Changed default backend to CPU. [Arjun Bansal]
* Added iPython notebook example. [Arjun Bansal]
* Implement RMSprop, inheriting from GradientDescentMomentum; simplify calling of compounded kernels in nervanagpu for learning rules; change the default behavior of gradient descent with momentum to act as if momentum_coef is 0 when momentum params are not included; change the default settings of adadelta to use suggested values for rho and epsilon if not provided; update documentation for optimizers; include examples of rmsprop in ANNOTATED_EXAMPLE.yaml and mnist-tuned.yaml. [Alex Park]
Add init for RMSProp from WeightLayer, Inherit from GradientDescentMomentum because of common characteristics
Update documentation, make default values for optimizer params, change default behavior of gradient descent with momentum if momentum params are not included to behave as if momentum_coef is 0
Revert mnist-tuned back to using gradient descent momentum
* Ensure pip utilizes newest cudanet version. [Scott Leishman]
* Merge branch 'BatchNormReshape2' into 'master' [Urs Koster]
Batch norm reshape
- Change how reshaping is done for local layers in batch norm and shared biases.
- Reduce chance of memory leak in nervanagpu calls by reducing creation of reshape references.
- Use new behavior of cudanet to return reshape views rather than reshape underlying tensor
See merge request !11
* Improved handling of tensor allocations by using views: clean up unnecessary tensor allocations; rather than reshaping repeatedly, store a reusable reshaped view; moved reshaping for batch norm; update Makefile dependency for cudanet to 0.2.6. [Urs Koster]
* Fix minor formatting issue in serialize check. [Scott Leishman]
* Added persist_values to tensors and their creation. [Scott Leishman]
Closes MYL-246. Note that no tensors have been initialized as not being
persistent as of yet (deferring to default in all cases).
* Work around backend clip calls where possible. Closes MYL-247. [Scott Leishman]
* Remove unused pre-commit hook. [Scott Leishman]
Prevented make develop based install where users didn't have appropriate pep8
version already in place.
* Python3 compatibility fixes. Closes #35. [Scott Leishman]
* Revamped metrics comparison. [Scott Leishman]
* Now compare across like backends only (by default).
* Make output more tabular, and easier to see at a glance.
* Revert leading zero to colon based notation. [Scott Leishman]
* Leading zero epoch and partial mini-batch display. [Scott Leishman]
Closes #40.
* Merge branch 'RectleakyGPU' into 'master' [Scott Leishman]
Rectleaky gpu
Add RectLeaky to gpu backend to address github issue #39
See merge request !10
* Restore nervanagpu based sanity checks. [Scott Leishman]
* Add RectLeaky activation for gpu backend. [Alex Park]
* Merge branch 'SerializeSnapshots' into 'master' [Scott Leishman]
Serialize snapshots
Add option to yaml allowing model snapshots to be serialized on a schedule. Snapshots will be serialized to provided `serialize_path` and the schedule can be either:
- explicitly specified using a list of ints, `[N1, N2, N3, ...]`, indicating that serialization will occur after epoch `N1`, epoch `N2`, epoch `N3`, etc., or
- specified using an integer interval, `N`, indicating that serialization will occur every `N` epochs.
See merge request !8
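The two schedule forms described above reduce to a simple predicate; a sketch (the function name is illustrative, not neon's API):

```python
def should_serialize(epoch, schedule):
    """schedule is either a list of epochs [N1, N2, ...] or an int N
    meaning 'every N epochs', per the yaml schedule described above."""
    if isinstance(schedule, list):
        return epoch in schedule
    return epoch % schedule == 0
```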
* Expanded docs. [Scott Leishman]
Also fixed ANNOTATED_EXAMPLE
* Rewrite save_snapshot to show control flow a little more clearly. [Alex Park]
* Documentation for serialize schedule. [Alex Park]
* Implement snapshot saving according to an epoch list or epoch interval for mlp and rnn models. [Alex Park]
* Merge branch 'ZebTech-cifar100' into 'master' [Scott Leishman]
Addition of CIFAR100 dataset
* Minor docstring format updates. [Scott Leishman]
* Fixed MNIST and CIFAR100 testing. [seba-1511]
* Merge remote-tracking branch 'neon/master' into cifar100. [seba-1511]
* Support prediction generation for RNNs and LSTMs. [Scott Leishman]
This fixes #23.
* Merge branch 'cifar100' of https://github.com/ZebTech/neon into ZebTech-cifar100. [Scott Leishman]
* Merge branch 'Kaixhin-docker_readme' [Scott Leishman]
Added Docker image links to install docs and README. Fixes #24.
* Move docker image links into source install docs. [Scott Leishman]
Provided reference links from README and quick-start page.
* Update readme with Docker images. [Kaixhin]
* Removed debugging lines. [seba-1511]
* Added CIFAR100 to the ANNOTATED_EXAMPLE. [seba-1511]
* Added CIFAR100 to the documentation. [seba-1511]
* Moved coarse-labels to be part of kwargs. [seba-1511]
* Fixed number of classes in docstring. [seba-1511]
* Added CIFAR100 loader. Closes issue #28. [seba-1511]
* Merge branch 'rnn-docs' into 'master' [Scott Leishman]
Rnn docs
Added doc-strings describing the dataset format expected for Recurrent Neural Networks (RNNs).
See merge request !7
* Fix section headings, other typos. [Scott Leishman]
Also fix minor doc path issue to ensure docstrings are parsed from local
build area.
* Updated documentation for recurrent networks and datasets. [Urs Koster]
* Merge branch 'bn-compound2' into 'master' [Scott Leishman]
Bn compound2
Added gpu backend calls for fprop and bprop pass of Batch Normalization, which results in a 10% overall speedup on Alexnet. Also deletes minibatch provider at the end of training to free up device DDR for inference.
See merge request !6
* Removed extraneous RNN train_mode false call. [Scott Leishman]
* Delete minibatch provider after training to save memory. [Urs Koster]
* Added gpu backend calls for fprop and bprop pass of Batch Normalization, which results in a 10% overall speedup. [Urs Koster]
* Merge branch 'noyaml' into 'master' [Scott Leishman]
Noyaml
Add example code to create networks without .yaml.
See merge request !4
* Added noyaml example to docs. [Scott Leishman]
* Merge branch 'master' into noyaml. [Scott Leishman]
* Merge branch 'IntegrationTest' into 'master' [Scott Leishman]
Added Integration tests
* Added integration tests based on our current example networks and backends.
* Requires Maxwell GPU with nervanagpu and cudanet backends installed, as well as imagenet dataset.
* New command line parameter `--integration` that cleans up YAML files to make them more easily
reproducible.
* Currently requires manual inspection of results relative to prior runs on the same machine to
determine if outputs are feasible.
* Added tolerances to the serialization tests.
See merge request !2
* Allow CL params for outfile and logfile to integration tests. [Scott Leishman]
* Serialize Makefile cleanup. [Scott Leishman]
* Merge branch 'master' into IntegrationTest. [Scott Leishman]
* Merge pull request #20 from Kaixhin/change_cuda_check. [Scott Leishman]
Change GPU check to CUDA SDK check. Closes issue #19
* Change GPU check to CUDA SDK check. Closes issue #19. [Kaixhin]
* Cleanup prior integration approach files. [Scott Leishman]
* Ensure integration tests run from all directories. [Scott Leishman]
* Swap branch with non-branch yaml again. [Alex Park]
* Allow reuse of example yamls for integration testing: add integration command line option to the neon executable; adjust yaml options in integration mode (among other things, drop the serialization path and remove sources of randomization); add is_random attribute to layers to identify drop-able layers (just Dropout for now); make dotransforms have the correct behavior in Imageset; change batch norm warnings to static to avoid proliferation of messages across many layers; switch branch and small layers for cifar mlp since they were incorrectly labeled; create integration test script in the examples directory that uses existing metrics code; correct i1k-alexnet-fp16 to use fp16 as the backend data type. [Alex Park]
* Added checks for nervanagpu and cudanet in dev tests. [Arjun Bansal]
* Style fixes. [Arjun Bansal]
* Deleted commented layers in tests yamls. [Arjun Bansal]
* Moved test yamls to a separate subdirectory. [Arjun Bansal]
* Deleted the cpu/gpu command line parameters for serialize and integration since they aren't used right now. [Arjun Bansal]
* Added tolerances for serialization tests. [Arjun Bansal]
* Added integration tests. [Arjun Bansal]
* Add example code to create networks without .yaml. [Anil Thomas]
It is possible to create and use networks without using a .yaml file.
This is illustrated by examples/mlp/mnist-small-noyaml.py.
* Simplify the expression for computing output dims. [Anil Thomas]
This is for computing dimensions of an output feature map in a
convolutional or pooling layer.
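The expression being simplified is the standard output-size computation for convolutional and pooling layers; for reference, a sketch of the textbook formula along one spatial dimension:

```python
def output_dim(in_size, kernel, padding, stride):
    """Standard output size of a convolution or pooling layer along
    one spatial dimension: (W + 2P - K) // S + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1
```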
* Let datasets have access to the experiment object. [Anil Thomas]
Custom plug-ins may use this functionality instead of defining their own
experiment classes.
* Documentation link updates. [Scott Leishman]
* Merge pull request #13 from kellyp/master. [Scott Leishman]
Update readme with the correct "using neon" link
* Update readme with the correct "using neon" link. [Kelly Plummer]
* Fix broken links in docs, remove i1K. [Scott Leishman]
* Convnet/i1k-alexnet-fp16.yaml was using float32 & mb=64; fixed. [Arjun Bansal]
* Change the value of the sumWidth parameter. [Anil Thomas]
This parameter affects the performance of the weight gradient computation
in the cudanet backend.
* Fix computation of the number of samples. [Anil Thomas]
This issue was causing neon to compute the number of samples
incorrectly when "predictions" is specified in the .yaml file
and the number of samples in the validation set is different
from that in the training set.
## v0.8.1 (2015-05-04)
### Modifications
* Initial public release of neon. [Scott Leishman]