
Potential bug #7

Closed
rainwoodman opened this issue Nov 5, 2021 · 5 comments · Fixed by #8

Comments

@rainwoodman
Contributor

The structure of the train_step in cell 8 of the notebook is very unconventional:

def train_step(self, data):
... first model evaluation
... first tape gradient
... second model evaluation
... update parameters
... second tape gradient      

For the parameter update to affect the second tape's gradient, the update has to come before the second model evaluation:

def train_step(self, data):
... first model evaluation
... first tape gradient
... update parameters
... second model evaluation
... second tape gradient      
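
For concreteness, a minimal sketch of the corrected ordering as a Keras custom train_step (the SAMModel name, rho, and the epsilon computation here are illustrative, not the notebook's exact code):

import tensorflow as tf

class SAMModel(tf.keras.Model):
  def __init__(self, inner_model, rho=0.05):
    super().__init__()
    self.inner_model = inner_model
    self.rho = rho

  def train_step(self, data):
    x, y = data
    trainable = self.inner_model.trainable_variables

    # First model evaluation and first tape gradient.
    with tf.GradientTape() as tape:
      loss = self.compiled_loss(y, self.inner_model(x, training=True))
    grads = tape.gradient(loss, trainable)

    # Perturb the parameters BEFORE the second evaluation, so the
    # second tape actually records the perturbed weights.
    norm = tf.linalg.global_norm(grads)
    eps = [g * self.rho / (norm + 1e-12) for g in grads]
    for v, e in zip(trainable, eps):
      v.assign_add(e)

    # Second model evaluation and second tape gradient, taken at the
    # perturbed parameters.
    with tf.GradientTape() as tape:
      loss = self.compiled_loss(y, self.inner_model(x, training=True))
    sam_grads = tape.gradient(loss, trainable)

    # Restore the original parameters, then apply the SAM gradients.
    for v, e in zip(trainable, eps):
      v.assign_sub(e)
    self.optimizer.apply_gradients(zip(sam_grads, trainable))
    return {"loss": loss}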

@rainwoodman
Contributor Author

For example, the second sequence gives different gradients:

import tensorflow as tf

x = tf.Variable(tf.constant(3.))

@tf.function
def epsilon_before_eval():
  # First evaluation and gradient at x = 3.
  with tf.GradientTape() as tape:
    y = x * x
  g1 = tape.gradient(y, x)  # 2 * 3 = 6

  # Update x BEFORE the second evaluation, so the tape records x = 4.
  x.assign_add(1.0)
  with tf.GradientTape() as tape:
    y = x * x
  g = tape.gradient(y, x)  # 2 * 4 = 8
  x.assign_sub(1.0)
  return g1, g

g1, g = epsilon_before_eval()
assert x == 3.0
assert g1 - g == -2

@rainwoodman
Contributor Author

But the first sequence gives identical gradients, which means the implementation falls back to the underlying non-SAM optimizer: tape.gradient differentiates the computation recorded inside the with block, so an assignment made after the block closes has no effect on the result.

import tensorflow as tf

x = tf.Variable(tf.constant(3.))

@tf.function
def epsilon_after_eval():
  # First evaluation and gradient at x = 3.
  with tf.GradientTape() as tape:
    y = x * x
  g1 = tape.gradient(y, x)  # 2 * 3 = 6

  # Second evaluation, still at x = 3.
  with tf.GradientTape() as tape:
    y = x * x

  # Updating x AFTER the tape has recorded the forward pass does not
  # change the gradient: the tape differentiates the recorded values.
  x.assign_add(1.0)
  g = tape.gradient(y, x)  # still 2 * 3 = 6
  x.assign_sub(1.0)
  return g1, g

g1, g = epsilon_after_eval()
assert x == 3.0
assert g1 - g == -2  # fails because g1 == g

@sayakpaul
Owner

Thank you so much for pointing this out. Lesson learned, indeed!

Would you be interested in sending a PR reflecting this change in the notebook?

@rainwoodman
Contributor Author

Sure. Thanks for confirming! I initially thought this might indicate another function / eager inconsistency in TensorFlow. We've been chasing wildly after such corner cases ;)
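
For reference, the same behavior reproduces in plain eager mode, outside tf.function, so it is not a function/eager inconsistency (a minimal check):

import tensorflow as tf

x = tf.Variable(3.0)

with tf.GradientTape() as tape:
  y = x * x
# Assigning after the tape has closed does not affect the gradient.
x.assign_add(1.0)
g = tape.gradient(y, x)  # 6.0: the tape differentiates the recorded forward pass
x.assign_sub(1.0)
assert g == 6.0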

I am not particularly good at notebook PRs. Is there any suggested process other than editing the JSON directly?

@sayakpaul
Owner

Yeah so, you could first clone this repository.

Then you could open the notebook in Colab directly, make your changes, and commit them to your repository from Colab itself. Then you could raise the PR.

[two screenshots showing how to commit a notebook to GitHub from Colab]

Let me know if anything is unclear.

@rainwoodman rainwoodman mentioned this issue Nov 10, 2021
sayakpaul added a commit that referenced this issue Nov 11, 2021