Our manual backprop weight averaging is missing #173
Comments
We just need to divide by the batch size:

# X is a placeholder, so its batch dimension is dynamic
N = tf.cast(tf.shape(X)[0], dtype=tf.float32)
d_W1 = tf.matmul(tf.transpose(X), d_l1) / N
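The matmul here sums the per-sample outer products over the batch, which is why the result scales with the batch size. A small NumPy sketch of that identity (my illustration, not code from this issue):

import numpy as np

X = np.random.randn(4, 2)     # batch of 4 samples, 2 features
d_l1 = np.random.randn(4, 2)  # per-sample gradients arriving at layer 1

# X^T @ d_l1 is the SUM of the per-sample outer products ...
summed = X.T @ d_l1
per_sample = sum(np.outer(X[i], d_l1[i]) for i in range(4))
assert np.allclose(summed, per_sample)

# ... so dividing by the batch size turns the sum into a mean
d_W1_mean = summed / X.shape[0]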
How about something like this?

W1 = tf.Variable(tf.random_normal([2, 2]), name='weight1')
b1 = tf.Variable(tf.random_normal([2]), name='bias1')
layer1 = tf.sigmoid(tf.matmul(X, W1) + b1)
W2 = tf.Variable(tf.random_normal([2, 1]), name='weight2')
b2 = tf.Variable(tf.random_normal([1]), name='bias2')
Y_pred = tf.sigmoid(tf.matmul(layer1, W2) + b2)
# cost/loss function
cost = -tf.reduce_mean(Y * tf.log(Y_pred) + (1 - Y) *
                       tf.log(1 - Y_pred))

# Loss derivative
d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7)
# Layer 2
d_sigma = Y_pred * (1 - Y_pred)
# sigma' is applied once
d_l2 = d_Y_pred * d_sigma
d_b2 = d_l2
d_W2 = tf.matmul(tf.transpose(layer1), d_l2)
# Mean
d_b2_mean = tf.reduce_mean(d_b2, axis=[0])
d_W2_mean = d_W2 / tf.cast(tf.shape(layer1)[0], dtype=tf.float32)
# Layer 1
d_o1 = layer1 * (1-layer1)
d_l1 = tf.multiply(tf.matmul(d_l2, tf.transpose(W2)), d_o1)
d_b1 = d_l1
d_W1 = tf.matmul(tf.transpose(X), d_l1)
# Mean
d_W1_mean = d_W1 / tf.cast(tf.shape(X)[0], dtype=tf.float32)
d_b1_mean = tf.reduce_mean(d_b1, axis=[0])
# Weight update
step = [
    tf.assign(W2, W2 - learning_rate * d_W2_mean),
    tf.assign(b2, b2 - learning_rate * d_b2_mean),
    tf.assign(W1, W1 - learning_rate * d_W1_mean),
    tf.assign(b1, b1 - learning_rate * d_b1_mean)
]
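A quick sanity check for these hand-derived gradients (my sketch, assuming the XOR arrays x_data/y_data and the placeholders X, Y from the lab script): compare them against TensorFlow's autodiff, which already folds the 1/N from tf.reduce_mean into its result.

# Autodiff gradient for comparison (includes the 1/N from reduce_mean)
check_W1 = tf.gradients(cost, W1)[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    manual, auto = sess.run([d_W1_mean, check_W1],
                            feed_dict={X: x_data, Y: y_data})
    print(abs(manual - auto).max())  # ~0, up to the 1e-7 epsilon in d_Y_pred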
I don't have a machine to run a test right now, but I guess it will work.
@kkweon Do you need a machine to run it? :-) I think your brain is enough. The previous code block starts with a diff. Let me know if you have any refactoring suggestions.
@hunkim
As you remember, when TensorFlow turned 1.0, all the basic operations were renamed.
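For example (a few of the 1.0 renames, from memory):

import tensorflow as tf

a, b = tf.constant(2.0), tf.constant(3.0)
c = tf.multiply(a, b)  # was tf.mul(a, b) before 1.0
d = tf.subtract(a, b)  # was tf.sub(a, b)
e = tf.negative(a)     # was tf.neg(a)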
Refactored:

# cost/loss function
cost = -tf.reduce_mean(Y * tf.log(Y_pred) + (1 - Y) *
                       tf.log(1 - Y_pred))
# Loss derivative
d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7)
# Layer 2
d_sigma2 = Y_pred * (1 - Y_pred)
d_l2 = d_Y_pred * d_sigma2
d_b2 = d_l2
d_W2 = tf.matmul(tf.transpose(layer1), d_l2)
# Mean
d_b2_mean = tf.reduce_mean(d_b2, axis=[0])
d_W2_mean = d_W2 / tf.cast(tf.shape(layer1)[0], dtype=tf.float32)
# Layer 1
d_sigma1 = layer1 * (1 - layer1)
# backprop through W2 before applying sigma'
d_l1 = tf.matmul(d_l2, tf.transpose(W2)) * d_sigma1
d_b1 = d_l1
d_W1 = tf.matmul(tf.transpose(X), d_l1)
# Mean
d_W1_mean = d_W1 / tf.cast(tf.shape(X)[0], dtype=tf.float32)
d_b1_mean = tf.reduce_mean(d_b1, axis=[0])
# Weight update
step = [
    tf.assign(W2, W2 - learning_rate * d_W2_mean),
    tf.assign(b2, b2 - learning_rate * d_b2_mean),
    tf.assign(W1, W1 - learning_rate * d_W1_mean),
    tf.assign(b1, b1 - learning_rate * d_b1_mean)
]

Make sense?
Looks good. autopep8 will do the rest.
@kkweon This is the right version:

# Network
#          p1     a1           l1     p2     a2           l2 (y_pred)
# X -> (*) -> (+) -> (sigmoid) -> (*) -> (+) -> (sigmoid) -> (loss)
#       ^      ^                   ^      ^
#       |      |                   |      |
#       W1     b1                  W2     b2
# Loss derivative
d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7)
# Layer 2
d_sigma2 = Y_pred * (1 - Y_pred)
d_a2 = d_Y_pred * d_sigma2
d_p2 = d_a2
d_b2 = d_a2
d_W2 = tf.matmul(tf.transpose(l1), d_p2)
# Mean
d_b2_mean = tf.reduce_mean(d_b2, axis=[0])
d_W2_mean = d_W2 / tf.cast(tf.shape(l1)[0], dtype=tf.float32)
# Layer 1
d_l1 = tf.matmul(d_p2, tf.transpose(W2))
d_sigma1 = l1 * (1 - l1)
d_a1 = d_l1 * d_sigma1
d_b1 = d_a1
d_p1 = d_a1
d_W1 = tf.matmul(tf.transpose(X), d_a1)
# Mean
d_W1_mean = d_W1 / tf.cast(tf.shape(X)[0], dtype=tf.float32)
d_b1_mean = tf.reduce_mean(d_b1, axis=[0])
# Weight update
step = [
    tf.assign(W2, W2 - learning_rate * d_W2_mean),
    tf.assign(b2, b2 - learning_rate * d_b2_mean),
    tf.assign(W1, W1 - learning_rate * d_W1_mean),
    tf.assign(b1, b1 - learning_rate * d_b1_mean)
]

Can you run it in your brain?
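For completeness, a minimal driver loop for this graph (my sketch, assuming import tensorflow as tf, the placeholders X and Y, and learning_rate are already defined as in the lab script):

import numpy as np

x_data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y_data = np.array([[0], [1], [1], [0]], dtype=np.float32)  # XOR targets

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(10001):
        sess.run(step, feed_dict={X: x_data, Y: y_data})
        if i % 1000 == 0:
            print(i, sess.run(cost, feed_dict={X: x_data, Y: y_data}))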
Yes, the comment really helped. It looks great.
@kkweon Do you like the naming?
@hunkim It should be fine with the comment. Honestly, I thought it was the name of an activation layer, but I was able to figure it out by reading the comment.
@kkweon Still, I don't like the names. Let me know if you have any suggestions.
For example, in
https://github.com/hunkim/DeepLearningZeroToAll/blob/master/lab-09-x-xor-nn-back_prop.py

d_W1 = tf.matmul(tf.transpose(X), d_l1)

X's shape is (?, 2), so X^T's shape is (2, ?), and d_l1's shape is (?, 2). The shape of d_W1 is therefore (2, 2), but its values are proportional to the sample size, because the matmul sums over the batch dimension. We need to average these values.
The current sample size is only 4, so it's OK, but when the sample size gets big, it does not work.
FYI, the d_b values are already averaged:

tf.reduce_mean(d_b1, axis=[0])

To reproduce, add code along the lines of the sketch below.
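A hypothetical reproduction sketch (my illustration in plain NumPy, not the snippet originally attached to this issue): tiling the 4 XOR samples k times scales the unaveraged d_W1 by k, while the averaged version stays put.

import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
d_l1 = np.random.randn(4, 2).astype(np.float32)

for k in (1, 10, 100):
    Xk = np.tile(X, (k, 1))     # same data, k times the samples
    dk = np.tile(d_l1, (k, 1))
    raw = Xk.T @ dk             # grows linearly with the sample size
    mean = raw / Xk.shape[0]    # invariant to the sample size
    print(k, np.abs(raw).sum(), np.abs(mean).sum())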