Nadam optimizer and test for it added #2764

Merged
merged 3 commits into keras-team:master on Jun 11, 2016

Conversation

4 participants
@uralik
Contributor

uralik commented May 19, 2016

Hello @fchollet, this is my first ever PR, so apologies for any mistakes.
In my experiments I tried the combination of Adam and NAG from here, and it performed better than the pure Adam optimizer. Today I added it to the Keras optimizers module and created this PR. I hope it will be helpful for someone =)

Here are some small experiments that I ran; the authors of the report did larger ones.

The test passes on the Theano backend. I don't have TF installed, but if necessary I can install it and check.

Below is a comparison of results on the default CIFAR CNN demo from the examples folder (mini-batch = 128, everything else at defaults).

SGD + NAG:
Using Theano backend.
Using gpu device 0: GeForce GTX 970 (CNMeM is disabled, cuDNN 5004)
X_train shape: (50000, 3, 32, 32)
50000 train samples
10000 test samples
Using real-time data augmentation.
Epoch 1/10
50000/50000 - 22s - loss: 1.8430 - acc: 0.3180 - val_loss: 1.4731 - val_acc: 0.4604
Epoch 2/10
50000/50000 - 19s - loss: 1.4929 - acc: 0.4531 - val_loss: 1.2950 - val_acc: 0.5366
Epoch 3/10
50000/50000 - 19s - loss: 1.3450 - acc: 0.5133 - val_loss: 1.1525 - val_acc: 0.5932
Epoch 4/10
50000/50000 - 19s - loss: 1.2316 - acc: 0.5584 - val_loss: 1.0146 - val_acc: 0.6413
Epoch 5/10
50000/50000 - 15s - loss: 1.1343 - acc: 0.5966 - val_loss: 0.9389 - val_acc: 0.6674
Epoch 6/10
50000/50000 - 12s - loss: 1.0642 - acc: 0.6212 - val_loss: 0.8569 - val_acc: 0.7000
Epoch 7/10
50000/50000 - 12s - loss: 1.0075 - acc: 0.6422 - val_loss: 0.8149 - val_acc: 0.7147
Epoch 8/10
50000/50000 - 12s - loss: 0.9612 - acc: 0.6606 - val_loss: 0.7856 - val_acc: 0.7289
Epoch 9/10
50000/50000 - 12s - loss: 0.9298 - acc: 0.6715 - val_loss: 0.7820 - val_acc: 0.7272
Epoch 10/10
50000/50000 - 12s - loss: 0.8945 - acc: 0.6837 - val_loss: 0.7276 - val_acc: 0.7493

Adam:
Using Theano backend.
Using gpu device 0: GeForce GTX 970 (CNMeM is disabled, cuDNN 5004)
X_train shape: (50000, 3, 32, 32)
50000 train samples
10000 test samples
Using real-time data augmentation.
Epoch 1/10
50000/50000 - 13s - loss: 1.7587 - acc: 0.3508 - val_loss: 1.3012 - val_acc: 0.5314
Epoch 2/10
50000/50000 - 13s - loss: 1.3630 - acc: 0.5087 - val_loss: 1.1107 - val_acc: 0.6094
Epoch 3/10
50000/50000 - 13s - loss: 1.1711 - acc: 0.5831 - val_loss: 0.9536 - val_acc: 0.6626
Epoch 4/10
50000/50000 - 13s - loss: 1.0511 - acc: 0.6275 - val_loss: 0.8502 - val_acc: 0.7025
Epoch 5/10
50000/50000 - 13s - loss: 0.9661 - acc: 0.6589 - val_loss: 0.8260 - val_acc: 0.7083
Epoch 6/10
50000/50000 - 13s - loss: 0.9174 - acc: 0.6780 - val_loss: 0.7416 - val_acc: 0.7409
Epoch 7/10
50000/50000 - 13s - loss: 0.8853 - acc: 0.6883 - val_loss: 0.7118 - val_acc: 0.7529
Epoch 8/10
50000/50000 - 13s - loss: 0.8412 - acc: 0.7022 - val_loss: 0.7058 - val_acc: 0.7527
Epoch 9/10
50000/50000 - 13s - loss: 0.8131 - acc: 0.7156 - val_loss: 0.6746 - val_acc: 0.7623
Epoch 10/10
50000/50000 - 13s - loss: 0.7941 - acc: 0.7191 - val_loss: 0.6801 - val_acc: 0.7622

NAG + Adam (this PR):
Using Theano backend.
Using gpu device 0: GeForce GTX 970 (CNMeM is disabled, cuDNN 5004)
X_train shape: (50000, 3, 32, 32)
50000 train samples
10000 test samples
Using real-time data augmentation.
Epoch 1/10
50000/50000 - 13s - loss: 1.7984 - acc: 0.3485 - val_loss: 1.2916 - val_acc: 0.5318
Epoch 2/10
50000/50000 - 13s - loss: 1.3098 - acc: 0.5289 - val_loss: 1.0351 - val_acc: 0.6300
Epoch 3/10
50000/50000 - 13s - loss: 1.1152 - acc: 0.6049 - val_loss: 0.8808 - val_acc: 0.6840
Epoch 4/10
50000/50000 - 13s - loss: 0.9938 - acc: 0.6498 - val_loss: 0.8003 - val_acc: 0.7159
Epoch 5/10
50000/50000 - 13s - loss: 0.9211 - acc: 0.6755 - val_loss: 0.7352 - val_acc: 0.7422
Epoch 6/10
50000/50000 - 14s - loss: 0.8693 - acc: 0.6952 - val_loss: 0.7321 - val_acc: 0.7469
Epoch 7/10
50000/50000 - 15s - loss: 0.8414 - acc: 0.7066 - val_loss: 0.7250 - val_acc: 0.7552
Epoch 8/10
50000/50000 - 13s - loss: 0.8122 - acc: 0.7186 - val_loss: 0.6938 - val_acc: 0.7627
Epoch 9/10
50000/50000 - 13s - loss: 0.7910 - acc: 0.7261 - val_loss: 0.6349 - val_acc: 0.7838
Epoch 10/10
50000/50000 - 15s - loss: 0.7747 - acc: 0.7319 - val_loss: 0.6148 - val_acc: 0.7902

@xingdi-eric-yuan
Contributor

xingdi-eric-yuan commented May 19, 2016

this looks nice

@the-moliver
Contributor

the-moliver commented May 20, 2016

Nice work! I would suggest you expose the three constants you have hardcoded in these lines so they can be modified by the user:

schedule_decay = 0.004  # Exactly given in [1] and [2]
momentum_cache_t = self.beta_1 * (1. - 0.5 * (K.pow(0.96, t * schedule_decay)))
momentum_cache_t_1 = self.beta_1 * (1. - 0.5 * (K.pow(0.96, (t + 1) * schedule_decay)))
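A minimal sketch of what exposing those constants could look like, with the formula pulled out into a plain function so the constants become keyword arguments (the parameter names here are hypothetical, not the PR's actual API):

```python
def momentum_cache(t, beta_1=0.9, decay_scale=0.5, decay_base=0.96,
                   schedule_decay=0.004):
    # Same formula as momentum_cache_t above, but with the hardcoded
    # constants (0.5, 0.96 and schedule_decay=0.004) exposed as
    # keyword arguments so a user can override them.
    return beta_1 * (1. - decay_scale * decay_base ** (t * schedule_decay))
```

With the defaults this reproduces the hardcoded schedule: the momentum coefficient warms up from beta_1/2 toward beta_1 as t grows.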
@uralik
Contributor

uralik commented May 21, 2016

@the-moliver thanks! I thought about that too. When I ran more experiments some time ago (not with Keras), I tried tweaking these values but saw no improvement; usually I got worse performance.

These constants come from here, page 4, equation 5. The author says they come from earlier Nesterov papers (one of them from 1983), and he also motivates the values with some deeper analysis.

If @fchollet asks to expose these variables too, then I will do that. But I think the best option is to add a meaningful comment about these values to the documentation. (I will actually do that in the docstring, I guess.)
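For readers who want the update rule in one place, here is a rough NumPy sketch of a single Nadam step following the report's equations, using the schedule constants discussed above. This is an illustrative approximation written from the report, not the PR's actual Keras code:

```python
import numpy as np

def nadam_step(p, g, m, v, t, m_schedule, lr=0.002, beta_1=0.9,
               beta_2=0.999, epsilon=1e-8, schedule_decay=0.004):
    """One Nadam update for parameters p given gradient g (t starts at 1)."""
    # Momentum warmup schedule with the constants discussed above.
    mu_t = beta_1 * (1. - 0.5 * 0.96 ** (t * schedule_decay))
    mu_t_1 = beta_1 * (1. - 0.5 * 0.96 ** ((t + 1) * schedule_decay))
    m_schedule_new = m_schedule * mu_t
    m_schedule_next = m_schedule * mu_t * mu_t_1

    # Bias-corrected gradient and first/second moment estimates.
    g_prime = g / (1. - m_schedule_new)
    m = beta_1 * m + (1. - beta_1) * g
    m_prime = m / (1. - m_schedule_next)
    v = beta_2 * v + (1. - beta_2) * g ** 2
    v_prime = v / (1. - beta_2 ** t)

    # Nesterov-style combination of current gradient and momentum.
    m_bar = (1. - mu_t) * g_prime + mu_t_1 * m_prime
    p = p - lr * m_bar / (np.sqrt(v_prime) + epsilon)
    return p, m, v, m_schedule_new
```

On a toy quadratic, repeatedly applying nadam_step drives the parameter toward zero, which is a quick sanity check of the bookkeeping.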

@fchollet
Collaborator

fchollet commented May 21, 2016

In general we won't merge into Keras algorithms that aren't widely accepted or haven't been covered in a peer-reviewed paper. At the same time, we try to stay on top of things and incorporate the latest advances -- as soon as we are confident in their viability.

@fchollet
Collaborator

fchollet commented Jun 11, 2016

After external feedback, it seems like this is pretty cool and we'll merge it. Thanks!

@fchollet fchollet merged commit c4c2d8b into keras-team:master Jun 11, 2016

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed

bryan-lunt added a commit to bryan-lunt/keras that referenced this pull request Jul 6, 2016

Merge original repo into personal repo. (#1)
* Fix generators methods when passing data as dicts

* Callback style fix

* Fix callback issue with Sequential model

* Allow 'tf' ordering in ImageDataGenerator (#2291)

* Update preprocessing/image documentation

* Fix validation_split

* Fix siamese example

* Fix "trainable" argument

* Expose max_q_size and other generator_queue args (#2300)

* [#2287] expose generator_queue args

* [#2287] only expose max_q_size

* Added learning phase to callbacks (#2297) (#2303)

* added learning phase to callbacks (#2297)

* cleaned imports

* replaced tabs by spaces

* added case where uses_learning_phase is False

* fixed pep8 blank line bug

* Fix PEP8

* Fix Graph generator methods

* Fix case where output_shape in Merge is tuple

* Add set_learning_phase in TF backend.

* Max Over Time in imdb_cnn.py (#2320)

* Max Over Time in imdb_cnn.py

Following issue keras-team#2296, I propose this PR.
The major optimizations, apart from the Max Over Time, are:

- Dropout in the Embedding layer.
- Longer input sequences (400 instead of 100), made possible from the speedup of the Max Over Time.
- Adam optimizer.

Overall it takes 90 to 100 sec per epoch on my laptop CPU, and in two epochs it reaches 0.885 accuracy, a 5-point improvement over the previous implementation. Moreover, it requires less memory (300k parameters vs 3M+), since the number of parameters no longer depends on the length of the input sequence.

* Update imdb_cnn.py

* Fix test_image unit test

* Style fixes in preprocessing/image

* Shape inference fix for Embedding

* Change error message in standardize_input_data (#2338)

* Fix typo in docs. loss_weight should be loss_weights (#2343)

* Fix support for custom metrics functions (#2351)

* Update model.md (#2348)

* Add TF/TH kernel conversion util

* Add batch_set_value for faster TF weight loading

* Fix Dropout in RNNs

* fixed TensorBoard callback (#2363)

* 1.0.1 release

* Update topology.py (#2373)

* Fix stateful unrolled RNNs in Theano

* Fix wrapper learning phase

* Add reset function to ImageDataGenerator

* Add inception v3 example

* Fixed typo. (#2401)

* set input_length before reshape (#2410)

* Update imagedatagenerator

* add `eye` to backened (#2407)

* Fix loss compatibility validation

* Make merge work with pure TF/TH tensors

* Add scikit_learn wrapper example (#2388)

* Add scikit_learn wrapper example

* Extract and evaluate best model in examples/mnist_sklearn_wrapper.py

* adding built check inside TimeDistributed (#2426)

* Add additional input data validation check

* Fix Travis concurrent directory creation issue

* DOC: models should be compiled upon loading (#2428)

* fixing the constants thing in theano rnn (#2429)

* fix layer/node topo sort problem (#2433)

* fix layer/node topo sort problem

* fix to only iterate over valid layer/node keys

* clarified usage of sparse_categorical_crossentropy (#2450)

- addresses #2444

* Update merge tests

* allows python3.5 to build alongside < 3.5 (#2457)

* correct inception_v3 network (#2472)

* fix accuracy with sparse_categorical_crossentropy (#2471)

* fix a benign but wrong range number in GRU's get_constants (#2475)

* fixed Merge Layer functional API (#2460)

* fixed Merge Layer functional API

* moved test to layers/test_core

* add weights for SGD optimizer (#2478)

* Fix PEP8

* Update antirectifier.py (#2485)

* Remove outdated comment

* Update regularizer tests

* Add new metrics and metrics tests

* Add model_from_config in models.py

* Add cos and sin to backend (#2493)

* Fix build

* Fixed minor typo in getting-started/sequential-model-guide (#2499)

* Added simple support for returning a multitarget loss

* Fix plot with show_shapes and multiple inputs/outputs. (#2421)

* Fix PEP8

* Improve TF session & variable management

* Fix typo in README

* Add root imports

* Add TF graph management warning

* adding a disable_b boolean to Dense (#2512)

* adding a disable_b boolean to Dense

* changing 'disable_b' to 'bias' 

Changing the name of the boolean & flipping its behavior so that the default is True and when set to False the bias is not used.

* integrating bias flag fully

changed the bias flag to affect the creation of the self.b variable as well as the output calculation

* fixing a blank line to appease pep8

* Rewriting image augmenter   (#2446)

* Much better image data augmentor

* removed unnecessary functions

* shift origin to centre of the image for homographies

* init commit

* change to zoom_range

* Added scikit-image to extras_require in setup.py

* add zoom_range test, exception for invalid zoom_range

* add scikit-image to dependency

* fix fit and retain old functions for unit test

* use ndi insteadskimage in random_transform

* removed buggy code in random_rotations, shears etc  and replaced it with todos.

* remove sci-image, implement ndimage based methods, refactor random_transform

* random_zoom, array_to_img consider dim_ordering

* add random_channel_shift, support fill_mode and cval

* image doc, update test_image, PEP8

* fix channel shift clip

* fix doc, refine code

* detail explain of zoom range

* check coding style

* Style fixes in preprocessing/image

* Fix docstring

* Style touch-ups

* Fix test_image path non-exist error in ci-travis (#2531)

* correct inception_v3 network

* store test images in class attribute

* PEP8

* Minor UX fix

* Re-raise exceptions to preserve stack trace (#2350)

* Prepare 1.0.2 PyPI release

* Make bias optional everywhere

* Improved docs of ImageDataGenerator (#2565)

* Misc fixes

* "total_loss" -> "loss"

* Added softsign activation function (#2097)

* fix activity regularizer so it can deal with multiple inbound nodes as well (#2573)

* Add doc page about writing custom layers.

* updated for list check bug in predict/predict_on_batch (#2585)

* updated for list check bug in predict/predict_on_batch

* pep fix

I think that's going to be the only PEP complaint.

* Fix typo in documentation

* one line fix for TensorBoard callback issue (#2574)

* one line fix for TensorBoard callback issue

Ref: keras-team#2570

* handle SummaryWriter based on tensorflow version

code contributed by @bnaul

bnaul@e04ce5e
88286d

* Fix typos in layer writing guide

* Improve optimizer configuration

* Add `batch_get_value` to backends (#2615)

* Add function to get multiple values at once

* Change to match existing batch_set_value

* Fix typo

* Allow use of predict without compilation

* Faster LSTM (#2523)

* Faster LSTM

* PEP8

* RNN dropout fix

* PEP

* PEP

* Less code duplication

* LSTM benchmark example

* PEP

* Test implementation modes

* Go through Keras backend

* Style fixes

* fix soft sign deprecation warning (#2623)

and backward compatible

* fixed docs for `Sequential.get_config`, and added a more helpful (#2635)

exception to `model_from_config`.

* remove unused import statement in keras dir (#2638)

* remove unused import statement in keras dir

* rewrite import graph statement

* Revert "remove unused import statement in keras dir" (#2641)

* Faster GRU (#2633)

* add a simple named entity recognition example

add a simple named entity recognition example

* add fast version of GRU

add fast version of GRU

* remove useless stuff

* Revert "Revert "remove unused import statement in keras dir"" (#2647)

* Fix initialization of index_array (#2590)

index_array should be initialized when self.batch_index is zero.

* Fix weight saving issue

* Style touch-up

* fixed shape typo (#2679)

* fixed shape typo

* pep8

* functional API intermediate output doc in faq (#2682)

* Residual connection should have the same dimension in case of no projection matrix (#2688)

* Update documentation docstring Embedding (#2693)

From the documentation it is not entirely clear that if mask_zero is set
to True, the input_dim argument should be equal to the size of the
vocabulary + 2, as index 0 cannot be used anymore.

(This behaviour seems a bit strange, as it has as a consequence that the
first column of the weights of the embeddings will never be used or
updated. The resulting network thus has a redundant set of parameters).

* Fix shape inference issue with TF.resize_images

* Update RMSprop, Adagrad, Adadelta

* Fix flaky test

* Normalize layer imports in examples

* Update RMSprop

* Update the reference of Batch Normalization (#2700)

We should refer the paper accepted in ICML 2015, instead of arXiv.

* Fix common LaTeX encoding issue

* Remove references to "join" merge mode

* Add K.tile test

* Add VAE example

* Prepare 1.0.3 release

* Input: proper error message for missing "shape" argument (#2727)

* Fix zero division in merge mode='cos' (#2725)

* fix cos zero division

* use backend epsilon

* save keras version & compile args when serializing models (#2690)

* save keras version & compile args when serializing models

* renamed prepare_config -> _updated_config + cleaner implementation

* rename z_log_sigma to z_log_std to match z_mean (which is not z_mu) (#2729)

* Update bibtex entry

* Fix TB callback with non-standard TF version nums

* Add download error suggestion for babi_rnn.py and babi_memnn.py. (#2752)

* changeable print_summary (#2761)

* use changeable print_summary

* minor

* Correction to fan_out initializaiton (#2252)

* account for receptive field size in fan_out

* added test for conv layer initializations

* removed old reference to kernel_size

* Fixed typo (#2770)

Fixed the year from "7 Apr 201" to "7 Apr 2015".

* Add FAQ entry about layer freezing

* Fix ActivityReg layer

* Fix first axis dim validation in multi-input model

* Clarify error message

* Fix serialization issue with nested Sequential

* Simplify imports in README

* correctly serialize loss function (#2806)

* Change way node depth is computed for shared layer

* Add stateless batchnorm mode

* Default values corrected for featurewise_std_normalization and featurewise_center (#2831)

For ImageDataGenerator, False is the default value for featurewise_std_normalization and featurewise_center.

* Small changes in mask caching

* BN only uses learning phase in mode 0

* added required import line (#2839)

* s/TimeDistributedDense/TimeDistribute(Dense(.../g (#2843)

* Fix typo in doc

* Fix json serialization in merge layer (#2854)

Fix #2818

* Make Merge output_shape consistent with lambda

* Fix JSON deserialization issue

* fix typo (#2881)

* fix typo

* Update scikit-learn-api.md

* Fix YAML serialization when using Regularizers (#2883)

Fix #2871

* Added objective: Kullback Leibler Divergence (#2872)

* Added objective: Kullback Leibler Divergence

* KLD: Clip at 1

* fix bug: change seed range for RandomStreams in Theano (#2865)

* bug fixed; numpy randint only outputs positive numbers ranging from 1 to 10e6

* Update theano_backend.py

changed style and numpy randint range

* Update theano_backend.py

removed extra spaces

* limit progress bar update rate (#2860)

* limit progress bar update rate

Limit the progress bar update rate in verbose=1 mode. This patch reduces
terminal I/O throughput while keeping a reasonably high visual update
rate (defaulting to 100 refreshes per second). It helps greatly when
working with large but simple data sets with small batches, which
otherwise lead to millions of relatively useless screen updates per
second. It also helps keep network traffic at reasonable rates, which is
exceptionally useful under laggy networking conditions when using Keras
over telnet/ssh, and improves web browser responsiveness when using
Keras within a Jupyter Notebook.

* add docstrings for 'interval' and 'force' arguments

* fixed formatting error in the docstring (#2797)

* fixed formatting error in the docstring

* fixed formatting error in TimeDistributedDense of core.py

* Make dim_ordering a global default

* Remove bit of deprecated code

* MaxoutDense no activation; incorrect docs (#2895)

Since MaxoutDense does not have activation it might be misleading to include "activation" as one of the arguments in the function docs.

* Tiny fixes in Sequential methods

* Refactor ImageDataGenerator, add directory support

* Improve docstring in preprocessing/image

* Update image preprocessing docs

* Fix some py3 generator issue

* Allow absence of labels in flow()

* Allow no layer names in plot()

* Fix PEP8 BS

* Docs adjustment

* Prepare 1.0.4 PyPI release

* Cleanup docs autogen script

* Fix typos in image preprocessing docs (#2906)

* Spellcheck source files (#2907)

* Fix predict_proba method of KerasClassifier to return probabilites for both classes in case of binary classification.  issue:2864 (#2924)

* Fix typo in docs

* fix 2852 (#2927)

* Add mode=2 option to the docstring in BatchNormalization (#2919)

Fix a tiny typo.

* Fix description about parameter `output_shape` for function `merge` (#2933)

* Make DirectoryIterator case insensitive (#2932)

* make DirectoryIterator case insensitive

* Also need to make filename case insensitive while appending it into self.filenames

* fix bug: rename duplicated loss name (#2842)

* rename duplicated loss name

* make python3 happy

* rewritten code to make it easy to read

* Small style fixes

* Eigenvalue Decay regularization (#2846)

* Update regularizers.py

I added a new regularizer, named Eigenvalue Decay, that aims at maximum-margin learning. This version approximates the dominant eigenvalue by a soft function given by the power method. For details, see:
Oswaldo Ludwig. "Deep learning with Eigenvalue Decay regularizer." ArXiv eprint arXiv:1604.06985 [cs.LG], (2016). https://www.researchgate.net/publication/301648136_Deep_Learning_with_Eigenvalue_Decay_Regularizer

The syntax for Eigenvalue Decay is similar to the other Keras weight regularizers, e.g.:

 model.add(Dense(100, W_regularizer=EigenvalueRegularizer(0.0005)))

* Example with Eigenvalue Decay regularization.

An example from Keras including regularization with Eigenvalue Decay. After training, you have to save the trained weights, create/compile a similar model without Eigenvalue Decay, and save that model. Then you can use your trained weights with this model; see lines 123-153 of CIFAR10_with_Eigenvalue_Decay.py (this is still an open issue).
This example yields a 2.71% gain in accuracy from the use of Eigenvalue Decay (averaged over 10 runs).

* Update CIFAR10_with_Eigenvalue_Decay.py

* Update CIFAR10_with_Eigenvalue_Decay.py

* Update CIFAR10_with_Eigenvalue_Decay.py

* Update regularizers.py

* Update regularizers.py

* Delete CIFAR10_with_Eigenvalue_Decay.py

* Update test_regularizers.py

* Update regularizers.py

* Update test_regularizers.py

* Update regularizers.py

* Update regularizers.py

I needed another reading in Keras backend...

* Issue to get shape of a tensor.

Issue to get shape of a tensor in the class EigenvalueRegularizer: the type returned for shape is different for Theano backend (Theano tensor type) and TF backend (TF TensorShape).

* Update regularizers.py

* Update regularizers.py

* Update regularizers.py

* Update regularizers.py

* Update regularizers.py

* Update regularizers.py

* Update regularizers.py

* Fix 1D convolution layers under Theano backend (#2938)

This issue is due to an unexpected loss of dimensionality when
composing the backend tensor operations "reshape" and "squeeze"
when there are dimensions of length 1.

For example, using a Theano backend the following fails with a
complaint about dimension mismatch:

UpSampling1D(2)(MaxPooling1D(2)(Reshape((2,1))(Input(shape=(2,)))))

The issue arises due to the conflict of two behaviors specific
to the Theano backend:

-   Reshape uses Theano's reshape function. Theano's reshape
    automatically makes dimensions with length 1 "broadcastable"

-   MaxPooling1D's implementation class _Pooling1D has a call method
    which uses a dummy dimension which it has to remove. The manner
    in which this dummy dimension is removed is to call "squeeze(x, axis)"
    from the backend. The squeeze implementation tells Theano to make
    the dummy dimension broadcastable, and then calls Theano's "squeeze",
    which removes ALL the broadcastable dimensions; not just the dummy
    dimension, but also the length 1 dimension flagged as broadcastable
    by reshape. This causes the problem observed above. This behavior
    is distinct from the behavior of the TensorFlow backend, which
    removes only the requested dimension.

This PR addresses this issue in two ways:

First, it introduces a test which checks the composition of "reshape"
and "squeeze" to make sure we get the same result using both Theano
and TensorFlow backends.

Second, it changes the implementation of squeeze(x,axis) so that the
Theano backend should behave similarly to the TensorFlow backend. With
this change the introduced test passes and the above example works.
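The squeeze behavior difference described above can be illustrated with NumPy as a stand-in for the two backends (this is only an analogy; the fix itself is in the Theano backend):

```python
import numpy as np

x = np.zeros((3, 1, 2, 1))

# TensorFlow-style squeeze: remove only the requested axis.
assert np.squeeze(x, axis=1).shape == (3, 2, 1)

# The old Theano-backend behavior was analogous to squeezing with no
# axis argument, which removes ALL length-1 dimensions at once.
assert np.squeeze(x).shape == (3, 2)
```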

* Update visualization.md (#2942)

* Update visualization.md

Added show_layer_names argument and its default value to docs

* Update visualization.md

* Convolution1D: apply activation after reshape

* Nadam optimizer and test for it added (#2764)

* Nadam optimizer and test for it added

* pep8 fix

* add comment in docstring and one more pep8 fix

* Nadam optimizer style fixes

* Fix issue with Sequential deserialization

* Fix initial variable in Evaluator. (#2955)

* Resolve #2960 (#2961)

* Resolve #2960

Introduce `K.var` so that the standard deviation computation can
be made numerically stable. Instead of

	K.std(x)

the user is able to write

	K.sqrt(K.var(x) + self.epsilon)

avoiding a division by zero in the gradient computation of `sqrt`.

* Fix typos

* Fix issue with cascade of Merge layers

* Fix tf-idf (#2980)

Fix #2974

* Clarify use of two-branch models

* Allow arbitrary output shapes for custom losses

* Fix get_word_index (#2981)

* Fix tf-idf again (#2986)

Fix 53aaa84
Fix #2974

* Fix TF-IDF in Python 2 (#2992)

Fix #2974

* Fix typo in docs

* fix wrong calls of __init__ in callbacks (#2999)

* Fix json serialization in merge layer with lamda output shape (#3011)

Fix #3008

* Fix json serialization in Lambda layer (#3012)

Fix #2582
Fix #3001

* Fix typo in training (#3014)

* Allow re-use of EarlyStopping callback objects. (#3000)

An EarlyStopping callback object has internal state variables to tell it
when it has reached its stopping point.  These were initialized in __init__(),
so attempting to re-use the same object resulted in immediate stopping. This
prevents (for example) performing early stopping during cross-validation with
the scikit-learn wrapper.

This patch initializes the variables in on_train_begin(), so they are re-set
for each training fold.  Tests included.

* doc: fix example for recurrent layer (#3022)

* Avoid double key lookup on callback.py (#3018)

In the on_epoch_end method, to add new keys to the history dict, the code
first checks whether a key is absent from the history dict and, if so,
creates the key with an empty list as its value.

However, this operation searches the dict for the key twice. The same
behavior can be achieved in a single step using the dict setdefault method.
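The single-lookup pattern the commit describes looks like this (history and logs are illustrative stand-ins for the callback's attributes):

```python
history = {}
logs = {'loss': 0.25, 'acc': 0.91}

for k, v in logs.items():
    # setdefault returns the existing list, or inserts an empty one and
    # returns it -- one lookup instead of an "if k not in history" check.
    history.setdefault(k, []).append(v)
```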

* Add comment for a note of caution (#3024)

* Moved epoch_logs = {} before batch loop to avoid UnboundLocalError. (#3019)

* fix: Sort subdirs before mapping them to classes. (#3052)

The documentation says that [1]: 

> If [classes are] not provided, the list of classes will be automatically inferred (and the order of the classes, which will map to the label indices, will be alphanumeric).

However, the code was adding classes in the order `os.listdir` returned them. This commit alphanumerically sorts the sub-directories before mapping them to label indices.

[1] http://keras.io/preprocessing/image/

* Support for masking in merged layers (#2413)

* added masking to merge layer (#2413)

* added documentation, fixed stylistic issues

* removed casting

* changed to using K.all

* Fix flaky test

* Small fixes in text gen example

* Remove unnecessary space

* A small typo (#3067)

* Fix typo (#3070)

* Fix flaky test

* Fix duplicated updates issue

* Add attribute caching for flattened_layers

* Prepare 1.0.5 PyPI release

* Fix flaky test

* model should use binary accuracy for binary crossentropy loss (#3098)

* Fix issue with multi-io + BatchNorm mask computing

* Remove unnecessary assert

* Fix masking test

* Style fix in test

* Added optional path argument (#3118)

* Validate dot_axes argument in cos mode and fix output shape (#3116)

* Validate dot_axes argument in cos mode

* Update topology.py

* Update topology.py

* Prevent image_dim_ordering from being overwritten

* TimeDistributedDense -> TimeDistributed(Dense()) in doc example

* Lambda should not support masking implicitly

* New conv ops (#3134)

* New function signature for conv2d in backend

* Clean up stuff

* Touch-up TF deconv op

* More cleanup

* Support for TF 3D conv/pool

* Move pooling layers to their own file

* Update TF version in Travis config

* Fix conv3d tests

* locally-connected layer

add unittest, fix output shape

PEP8

flatten weight, improve example

update docstring, remove cifar10 Alex exmaple

improve docstring, remove duplicate func

parallel by batch_dot

fix theano batch_dot

dim_ordering unit test, theano only use dot

dim_ordering unit test

Update locally connected layers

* Add tests for locally connected layers

* Add MIT license badge to README

* Add multiprocessing for fit generator (#3049)

* Add multiprocessing for fit generator

* Change maxproc to nb_worker and update documentation

* Simplify multiprocessing test, clarify doc replace maxproc by nb_worker

* Replace maxproc by nb_worker in test

* Replace maxproc by nb_worker in test

* Update the doc: specify non picklable arguments should not be used with multiprocessing

* Add multiprocessing as an option with the pickle_safe argument

* Lambda output shape (#2680)

* updating the info for lambda

* updated lambda doc a bit more

made it more readable and stuff

* fix docs bugs (#3142)

* fix docs bugs

* fix docs bugs

* Added 'max' operation to Merge layer (#3128)

* Added 'max' operation to Merge layer. It allows implementing convolutional maxout with two (or more) convolution layers and one Merge.

* Added 'max' to merge test

* Use defaultdict for _UID_PREFIXES (#3087)

The get_uid method in common.py first checks if a prefix is in the _UID_PREFIXES dict
and, if it is not, adds an entry to the dict.

However, using a defaultdict, this check is no longer necessary.

* Less frequent dataset tests

* Style touch-ups in TF backend

* fix get_output_shape_for in Merge, when mode is callable (#3144)

* Added optional field name argument to RemoteMonitor callback (#3157)

* Added optional path argument

* Added optional field name argument

* Create initial_state tensor filled with zeros without use of K.zeros (#3123)

* Create initial_state tensor filled with zeros without use of K.zeros

* minor PEP8 fix