Lab 10. Back Propagation Implementation. #25
MSE will still work, just not very well, since it's less suitable for a multinomial output distribution. I'm guessing the choice was made for simplicity, as cross-entropy + softmax would make the example harder to follow. For pedagogical reasons, not introducing extra concepts that are extensions of the base concept being taught (in this case, backprop) is probably better for the audience. (That said, TensorFlow's notation decreases the readability significantly with little gain, so there is that...)
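For reference, the cross-entropy + softmax alternative mentioned above might look roughly like the sketch below in TF 1.x; the placeholder shapes and the names logits and Y are illustrative assumptions, not the lab's code.

```python
import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 10])  # raw (pre-softmax) network output
Y = tf.placeholder(tf.float32, [None, 10])       # one-hot labels

# Softmax and cross-entropy combined in one numerically stable op, averaged over the batch.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=Y, logits=logits))
```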
I agree it can look tedious, but I thought I'd bring it up for the following reasons:
# Forward
...
loss = tf.reduce_sum(tf.square(y_pred - Y)) / 2  # or: loss = tf.nn.l2_loss(y_pred - Y), which already includes the 1/2
diff = (y_pred - Y)
...
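To make the point concrete, a rough sketch of the whole idea (explicit loss first, then the hand-written backward pass) is below. The layer sizes and variable names here are my own illustrative assumptions, not the exact contents of lab-10-X1:

```python
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 784])
Y = tf.placeholder(tf.float32, [None, 10])
w1 = tf.Variable(tf.truncated_normal([784, 30]))
b1 = tf.Variable(tf.truncated_normal([1, 30]))
w2 = tf.Variable(tf.truncated_normal([30, 10]))
b2 = tf.Variable(tf.truncated_normal([1, 10]))

def sigma(x):
    return 1. / (1. + tf.exp(-x))

# Forward
l1 = tf.matmul(X, w1) + b1
a1 = sigma(l1)
l2 = tf.matmul(a1, w2) + b2
y_pred = sigma(l2)

# Loss stated explicitly, so the starting gradient below is justified
loss = tf.reduce_sum(tf.square(y_pred - Y)) / 2

# Backward by hand, starting from d(loss)/d(y_pred) = y_pred - Y
diff = y_pred - Y
d_l2 = diff * y_pred * (1. - y_pred)        # through the output sigmoid
d_w2 = tf.matmul(tf.transpose(a1), d_l2)    # same shape as w2
d_b2 = tf.reduce_sum(d_l2, axis=0, keep_dims=True)
d_a1 = tf.matmul(d_l2, tf.transpose(w2))
d_l1 = d_a1 * a1 * (1. - a1)                # through the hidden sigmoid
d_w1 = tf.matmul(tf.transpose(X), d_l1)
d_b1 = tf.reduce_sum(d_l1, axis=0, keep_dims=True)
```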
@kkweon: "I think it's still better to explicitly define what loss function we are using in every file (even if we stick to MSE and do the backprops by hand)" — I think it's a very good idea.
I second the point that it would be useful for readers if the method used was noted, probably with an inline comment mentioning "you wouldn't do this in a real-world environment" about the inadequacy of the pieces used. (And deal with better tools for this later.) I understand that TensorFlow is the cool thing to do, but I'm a bit curious whether it would have been better to do this in raw numpy for beginners. (TF's lazy evaluation and un-Pythonic notation can be confusing even for seasoned Python programmers.)
It seems good, but some dimensions are wrong, which I suppose you are aware of. I think it's also worth mentioning how to do a quick dimension check. We know the derivative with respect to W2 must have the same shape as W2 itself, so as long as that holds we can just focus on ordinary calculus (without worrying about the matrix details).
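For illustration, a quick shape check along those lines in numpy might look like this (the sizes are toy numbers made up for the example):

```python
import numpy as np

N, h, d_out = 4, 5, 2
W2   = np.zeros((h, d_out))
a1   = np.zeros((N, h))       # activations feeding W2
d_l2 = np.zeros((N, d_out))   # gradient arriving at W2's layer

dW2 = a1.T @ d_l2             # (h, N) @ (N, d_out) -> (h, d_out)
assert dW2.shape == W2.shape  # the gradient must match the parameter's shape
```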
@kkweon Thanks for the comments. Figure 1 is for single (scalar) values, so there is no need to worry about dimensions there; I just wanted to show how forward and back prop work with the simple chain rule. I added Figure 2 for the matrix case, and I believe the dimensions there are all correct. Basically, we can write code directly from these rules. Could you do a quick check? Cheers! For easy commenting, I shared the slides + LaTeX source at https://docs.google.com/presentation/d/1_ZmtfEjLmhbuM_PqbDYMXXLAqeWN0HwuhcSKnUQZ6MM/edit?usp=sharing.
@cynthia I agree. Using TF to write backprop is not the best idea. However, I don't want to introduce new numpy functions such as np.dot, etc. Do you think we can simplify this code as much as possible? For example, l1 = tf.add(tf.matmul(X, w1), b1) -> l1 = tf.matmul(X, w1) + b1. It's just my thought; feel free to add yours.
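For what it's worth, the two spellings build the same graph; a tiny sketch (the placeholder and variable shapes are assumptions):

```python
import tensorflow as tf

X = tf.placeholder(tf.float32, [None, 784])
w1 = tf.Variable(tf.zeros([784, 30]))
b1 = tf.Variable(tf.zeros([30]))

l1_explicit   = tf.add(tf.matmul(X, w1), b1)  # explicit op calls
l1_overloaded = tf.matmul(X, w1) + b1         # same computation via operator overloading
```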
There were two typos (I left comments in the Google Slides). Everything else looks good to me.
So, here are my two cents: consistency-wise, I'm not sure I fully agree that TF notation is easier to understand than numpy notation. As for the slides, content-wise I don't think I can add more than what has been mentioned above, but who is your audience? If you want your audience to be at graduate-school (or at least CS-undergrad) level, the slides are fine. If you want to make the material accessible to everyone, using mathematical notation is not a great idea. (Even the most "obvious" Greek characters are enough to scare away most programmers.)
@cynthia I see. Perhaps you could make a simple numpy version of lab 10-X1? I'd really appreciate it. As for the slides, I guess they are for more advanced students/developers; they are certainly not for beginners.
Sure, that's probably a separate issue; I'll send in a numpy PR when I have time. As for the remark about this being for advanced students, I think advanced students deserve better datasets. I'm personally a bit uncomfortable with the data used ([1 2 3] -> [1 2 3]), as it's not the best data for demonstrating the characteristics of the underlying algorithms involved. Obviously, this is a subjective remark from one person, so feel free to ignore it. Aside from that nit, LGTM. (The LGTM is not for the slides; I haven't looked at them carefully, so I don't have any remarks there.)
@cynthia "I'll send in a numpy PR when I have time." +1 "data used ([1 2 3] -> [1 2 3]) as it's not the best" , agree. However, I used that in my theory lecture part, so it's hard to change in the lab. When I remake the theory video, I'll change it. Thanks for your comments. |
In lab-10-X1-mnist_back_prop.py, back propagation is defined by starting directly from the difference between the prediction and the target (diff = y_pred - Y) and chaining backward from there, without any loss tensor being defined first.
Problem
This backpropagation is only true when the loss function is the halved sum of squared errors, loss = 1/2 * sum((y_pred - Y)^2).
Proof
Current Forward Step: the forward pass computes y_pred from X through the layers (l1 = tf.matmul(X, w1) + b1, and so on up to y_pred).
If we assume the loss function above, its gradient with respect to the prediction is y_pred - Y, which is represented as diff = (y_pred - Y) in the code. We can continue for the other variables (w2, b2, w1, b1) in the same way via the chain rule; a reconstruction of the key step is sketched below.
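As a reconstruction of the key step (assuming the halved sum-of-squares loss above; this is my own write-up of the argument, not copied from the lab):

```latex
\[
  L = \frac{1}{2}\sum_i \bigl(\hat{y}_i - y_i\bigr)^2
  \quad\Longrightarrow\quad
  \frac{\partial L}{\partial \hat{y}_i} = \hat{y}_i - y_i .
\]
% Hence starting the backward pass from diff = y_pred - Y is only valid for this loss.
% With L = \sum_i (\hat{y}_i - y_i)^2 (no 1/2 factor) the starting gradient is
% 2(\hat{y}_i - y_i), and with softmax cross-entropy it is different again.
```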
Conclusion
The loss function should be clearly defined before going into any back propagation. This follows Andrej Karpathy's approach as well.