Potential bug #7
For example, the second sequence gives different gradients:

But the first sequence gives identical gradients (which means the implementation reverts to the underlying non-SAM optimizer).
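The two sequences can be illustrated with a minimal sketch (hypothetical scalar example, not the notebook's actual code): if both gradients are evaluated before the perturbation is applied, they are identical, whereas perturbing the weight first makes the second gradient differ.

```python
# Toy loss L(w) = w^2, so dL/dw = 2w. Illustrative only; the notebook
# itself uses TensorFlow GradientTapes, but the ordering logic is the same.

def grad(w):
    return 2.0 * w

w = 3.0
rho = 0.05  # SAM neighborhood radius (hypothetical value)

# First sequence: both gradients taken before the perturbation is applied.
g1 = grad(w)
g2 = grad(w)               # identical to g1 -> SAM degenerates to base optimizer
assert g1 == g2

# Second sequence: weight is perturbed before the second evaluation.
g1 = grad(w)
eps = rho * g1 / abs(g1)   # normalized ascent step
g2 = grad(w + eps)         # evaluated at the perturbed point -> differs from g1
assert g1 != g2
```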
Thank you so much for pointing this out. Lesson learned, indeed! Would you be interested in sending a PR reflecting this change in the notebook?
Sure. Thanks for confirming! I initially thought this might indicate another function/eager inconsistency in TensorFlow. We've been chasing wildly after such corner cases ;) I'm not particularly good at notebook PRs. Any suggested process other than editing the JSON directly?
The structure of the train_step in cell 8 of the notebook is very unconventional. For the parameter update to affect the gradient from the second tape, the update must occur before the second model evaluation.
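The correct ordering can be sketched as follows. This is a hypothetical pure-Python SAM-style step on a scalar parameter (names like `sam_step`, `rho`, and `lr` are illustrative); the real notebook wraps the two gradient evaluations in two `tf.GradientTape` blocks, but the key point is the same: the perturbation is applied between the first and second gradient computations, and the second gradient drives the descent step.

```python
# Toy loss L(w) = (w - 1)^2 with gradient dL/dw = 2(w - 1).
def loss(w):
    return (w - 1.0) ** 2

def grad(w):
    return 2.0 * (w - 1.0)

def sam_step(w, rho=0.05, lr=0.1):
    """One SAM-style update (sketch): perturb, re-evaluate, descend."""
    g1 = grad(w)              # first gradient (first tape)
    eps = rho * g1 / abs(g1)  # ascent perturbation, applied BEFORE...
    g2 = grad(w + eps)        # ...the second evaluation (second tape)
    return w - lr * g2        # descend using the perturbed-point gradient

w_new = sam_step(3.0)
```

If the perturbation were applied after the second evaluation instead, `g2` would equal `g1` and the step would be an ordinary gradient-descent update.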