
glorot_normal init should be glorot_uniform? #52

Closed
capybaralet opened this issue Apr 16, 2015 · 9 comments
@capybaralet (Contributor)

I'm assuming this is meant to implement the novel initialization proposed in this paper: http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf (at the bottom of page 253), but that is a uniform initialization, and the numerator is 6, not 2.
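
(For concreteness, here is a minimal numpy sketch of the two forms under discussion; this is my own illustration, not the Keras code: the uniform rule from eq. 16 of the paper versus the normal variant currently implemented.)

```python
import numpy as np

def glorot_uniform_ref(fan_in, fan_out, rng=np.random):
    """Uniform rule from eq. 16 of Glorot & Bengio (2010)."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))  # numerator is 6, not 2
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def glorot_normal_ref(fan_in, fan_out, rng=np.random):
    """Normal variant with the same variance, 2 / (fan_in + fan_out)."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))
```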

@fchollet (Member)

Maybe we need both. I've seen this scheme around in various forms (sometimes normal, sometimes uniform) and under various names (Caffe seems to call it xavier?). And it's been around since well before Glorot 2010, tbh.

@untom thoughts on this?

@capybaralet (Contributor, Author)

Where was it used before that paper?


@fchollet (Member)

scale / sqrt(nb_units) has been around since the late 80s, with scale values generally between 1 and 2. Sometimes you'd use nb_units = fan_in, sometimes nb_units = fan_out, sometimes the sum of the two. For the latter you'd use higher scale values, for balance. It's not rocket science.
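
(A hedged sketch of that classic recipe; the function and parameter names here are mine, not any particular library's:)

```python
import numpy as np

def classic_init(fan_in, fan_out, scale=1.0, mode="fan_in", rng=np.random):
    # Old-school scale / sqrt(nb_units) initialization; depending on the
    # convention, nb_units is the fan-in, the fan-out, or their sum
    # (with a higher scale to compensate in the latter case).
    nb_units = {"fan_in": fan_in,
                "fan_out": fan_out,
                "fan_sum": fan_in + fan_out}[mode]
    return rng.normal(0.0, scale / np.sqrt(nb_units), size=(fan_in, fan_out))
```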

It hasn't necessarily been written down very often (earliest might have been LeCun98), but it has been part of the old toolbox of NN "tricks" for a long time.

@untom (Contributor) commented Apr 17, 2015

The main assumption behind the Glorot initialization is that the variance of the gradients should be the same in each layer. In eq 12 of the paper you can see that to achieve this, the variance of the weights should be 2 / (fan-in + fan-out). To achieve this, you could initialize your weights either directly from a normal with sigma^2 = 2 / (fan-in + fan-out), or from the uniform distribution given in eq 16 of the paper (i.e., the one with sqrt(6) in the numerator). If you calculate the variance of the latter, you will see that it is again 2/(fi+fo).

So in terms of the Glorot paper, Normal(0, 2/(fi+fo)) achieves the same thing as Uniform(-sqrt(6)/sqrt(fi+fo), sqrt(6)/sqrt(fi+fo)), namely that the variance of the gradients is initially approximately the same in each layer.
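
(The variance match is easy to verify numerically; a quick sketch of that check, mine rather than the thread's:)

```python
import numpy as np

fi, fo = 784, 256
rng = np.random.RandomState(0)

limit = np.sqrt(6.0 / (fi + fo))
w_uni = rng.uniform(-limit, limit, size=1000000)
w_nrm = rng.normal(0.0, np.sqrt(2.0 / (fi + fo)), size=1000000)

# Var(U(-a, a)) = a^2 / 3 = (6 / (fi + fo)) / 3 = 2 / (fi + fo),
# so both empirical variances should approach 2 / (fi + fo).
print(w_uni.var(), w_nrm.var(), 2.0 / (fi + fo))
```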

So the only question remaining is whether one should use a normal or a uniform distribution. This is a personal choice. People around Bengio's group have always preferred initializing from a uniform distribution (hence they used that in the paper), while e.g. Hinton always advocates using a normal distribution. I personally think that using a normal is the more natural thing, since at the end of training, the weight distribution always looks approximately Gaussian anyway (unless you use e.g. L1 regularization), no matter what you started with. So my reasoning is that with a normal, you at least have the correct prior. Which is why my patch to keras used the normal, as well.

To be honest, I do not think it makes much of a difference. It certainly does not make a difference in terms of how Glorot derived his initialization.

@capybaralet (Contributor, Author)

So the reasoning I have heard for initializing from a uniform instead of a normal is that you don't have outliers. Of course, a normal distribution shouldn't have extreme outliers, but it still has outliers. I'm not sure why that would matter, but it seems somewhat intuitive to me.

The argument that it ends up being Gaussian doesn't seem strong to me... you don't know which weights should be large before training, and if it ends up Gaussian anyways, then why does it matter?

Can you give a reference for Hinton advocating normals?

wrt the code, I just think it would be good to actually use the initialization from the paper you reference, so people aren't surprised.


@untom (Contributor) commented Apr 17, 2015

That's an interesting reason. But thinking about it, are outliers in the weight sizes necessarily a bad thing? As long as the gradients don't explode or collapse overall (which is what Glorot's variance derivation is about), you're still okay, aren't you? And as long as learning can proceed, your weights will be adjusted.

> The argument that it ends up being Gaussian doesn't seem strong to me... you don't know which weights should be large before training, and if it ends up Gaussian anyways, then why does it matter?

The whole point of a "good initialization" is having one that somehow aids learning. You could always argue "if you end up with the same solution in the end, what does it matter how I initialize?". Like I said, picking a prior that matches the posterior just makes sense to me, but I'll admit the argument isn't strong. However, having each unit initialized by a Gaussian also allows each unit to focus more strongly on a given combination of inputs (since fewer of the weights will be large, and the combination of large weights will likely be different for each unit). Of course you'll end up with units that have nonsensical combinations, and those will take longer to recover than if they had been initialized uniformly.

Every paper from Hinton's group that mentions how they initialized weights uses Gaussians. Off the top of my head, a specific example from Hinton himself comes from his RBM training guide (https://www.cs.toronto.edu/~hinton/absps/guideTR.pdf, section 8). But if you go through papers from his group, several mention Gaussian initialization; none mention uniform distributions.
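
(For reference, the recipe in section 8 of that guide is a small zero-mean Gaussian; a one-line rendering of it, with example layer sizes of my own choosing:)

```python
import numpy as np

n_visible, n_hidden = 784, 500  # example RBM sizes, not from the guide
# Small zero-mean Gaussian weights, std around 0.01, per section 8:
W = np.random.normal(0.0, 0.01, size=(n_visible, n_hidden))
```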

In any case, like I said, I don't think it will make much difference in the end. So if you think uniform is more user-friendly, then maybe that's the best thing to do. (Personally, I found the normal easier to understand because it doesn't hide the 2/(fi+fo), which was Glorot's main result.) But yeah, if people expect a uniform, then of course it's confusing.

@fchollet (Member)

After some research, uniform seems to be more common across other libraries.

I don't really buy the Gaussian argument. A random normal array is less likely to be a good approximation of another random normal array than a constant array or a small-scale random uniform array. Yes, the distribution of values will be the same in aggregate, but the mean absolute error per weight will be larger.

The point of a good initialization is one where your weights 1) make learning possible (avoid pathological cases with no gradient or exploding gradients), and 2) are the closest possible to the final learned weights. Normal distributions seem less likely to fit 2) compared to uniform distributions.

I will add glorot_uniform and he_uniform, to match what other DL libraries are doing. I think we should also make glorot_uniform the default initialization for the layers in which uniform is currently used as default.
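
(A sketch of what those two initializers could look like, consistent with the formulas discussed above; the helper and signature are my assumptions, and the actual Keras code may differ:)

```python
import numpy as np

def _uniform(shape, scale):
    # Hypothetical helper: sample from U(-scale, scale).
    return np.random.uniform(-scale, scale, size=shape)

def glorot_uniform(shape):
    fan_in, fan_out = shape[0], shape[1]
    # Variance 2/(fan_in + fan_out) => limit sqrt(6/(fan_in + fan_out)).
    return _uniform(shape, np.sqrt(6.0 / (fan_in + fan_out)))

def he_uniform(shape):
    fan_in = shape[0]
    # He et al. 2015: variance 2/fan_in => limit sqrt(6/fan_in).
    return _uniform(shape, np.sqrt(6.0 / fan_in))
```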

@fchollet (Member)

Benchmarking on MNIST gives me better results with glorot_uniform compared to glorot_normal. glorot_uniform also appears to perform about as well as lecun_uniform.

glorot_uniform:

Train on 37800 samples, validate on 4200 samples
Epoch 0
loss: 0.0257 - acc.: 0.7500 - val. loss: 0.0123 - val. acc.: 0.9348
Epoch 1
loss: 0.0092 - acc.: 1.0000 - val. loss: 0.0081 - val. acc.: 0.9512
Epoch 2
loss: 0.0112 - acc.: 0.8750 - val. loss: 0.0070 - val. acc.: 0.9590
Epoch 3
loss: 0.0031 - acc.: 1.0000 - val. loss: 0.0061 - val. acc.: 0.9631
Epoch 4
loss: 0.0029 - acc.: 1.0000 - val. loss: 0.0054 - val. acc.: 0.9664
Epoch 5
loss: 0.0027 - acc.: 1.0000 - val. loss: 0.0051 - val. acc.: 0.9674
Epoch 6
loss: 0.0047 - acc.: 1.0000 - val. loss: 0.0050 - val. acc.: 0.9657
Epoch 7
loss: 0.0012 - acc.: 1.0000 - val. loss: 0.0050 - val. acc.: 0.9679
Epoch 8
loss: 0.0119 - acc.: 0.8750 - val. loss: 0.0048 - val. acc.: 0.9700
Epoch 9
loss: 0.0011 - acc.: 1.0000 - val. loss: 0.0045 - val. acc.: 0.9712

glorot_normal:

Train on 37800 samples, validate on 4200 samples
Epoch 0
loss: 0.0208 - acc.: 0.8750 - val. loss: 0.0127 - val. acc.: 0.9367
Epoch 1
loss: 0.0113 - acc.: 1.0000 - val. loss: 0.0088 - val. acc.: 0.9490
Epoch 2
loss: 0.0045 - acc.: 1.0000 - val. loss: 0.0076 - val. acc.: 0.9548
Epoch 3
loss: 0.0245 - acc.: 0.7500 - val. loss: 0.0070 - val. acc.: 0.9598
Epoch 4
loss: 0.0090 - acc.: 0.8750 - val. loss: 0.0062 - val. acc.: 0.9643
Epoch 5
loss: 0.0032 - acc.: 1.0000 - val. loss: 0.0057 - val. acc.: 0.9660
Epoch 6
loss: 0.0009 - acc.: 1.0000 - val. loss: 0.0058 - val. acc.: 0.9650
Epoch 7
loss: 0.0032 - acc.: 1.0000 - val. loss: 0.0057 - val. acc.: 0.9643
Epoch 8
loss: 0.0155 - acc.: 0.8750 - val. loss: 0.0053 - val. acc.: 0.9679
Epoch 9
loss: 0.0053 - acc.: 1.0000 - val. loss: 0.0052 - val. acc.: 0.9679

Code is at https://www.kaggle.com/users/123235/fchollet/digit-recognizer/simple-deep-mlp-with-keras

@untom (Contributor) commented Apr 17, 2015

Interesting, thanks for the tests! :)
