Hands on gradients derivations for common supervised machine learning and deep learning loss functions.

jaaack-wang/hands-on-gradients-derivation-for-ml-dl-loss-func

Step-by-step, detailed gradient derivations for common supervised machine learning and deep learning loss functions, suitable for people who have just started learning machine learning and deep learning. Notation in the field can vary widely from person to person or change over time, but the notation here is consistent and simple.

Currently, I have derived the gradients of the loss functions for linear regression and logistic regression using the set of notations given below. To see how gradients can be derived in a deep learning neural network setting, you can check my Notes-for-Stanford-CS224N-NLP-with-Deep-Learning, although the notations there differ from those used here.

In the future, I will use the same notations to derive the gradients for neural networks too, so that this becomes a unified, hands-on tutorial for common supervised machine learning and deep learning loss functions. Keep learning!

Notations

  • $x$: the input variables (features).
  • $y$: the true output variables that we want to predict (observations).
  • $\hat{y}$: the predicted values.
  • $w$: weights for the input variables.
  • $b$: bias term for the input variables. In some machine learning courses and tutorials, people may use the theta sign $\theta$ to represent both the weights and the bias term, which seems to be the more popular notation for deriving basic machine learning loss function gradients, such as those of linear regression and logistic regression. But in deep learning, the $w$ and $b$ separation seems to be more common.
  • Bold font denotes a vector: $\mathbf{x}$ is a vector of $x$, $\mathbf{w}$ is a vector of $w$, and so on. But please note that $\mathbf{x}$ contains all the input variables of any given single training example, whereas $\mathbf{y}$ contains the observations of all the training examples. This is because the observation to predict is always a single fixed value, regardless of whether it is a discrete class (for classification) or a continuous value (for regression).
  • An uppercase letter in bold font denotes a matrix, such as $\mathbf{X}$, etc.
  • $m$ always denotes the number of training examples, so $i \in \{1, \dots, m\}$. The subscript $i$ is always related to $m$.
  • $n$ always denotes the number of input variables for any given training example, so $j \in \{1, \dots, n\}$. The subscript $j$ is always related to $n$. In classification problems, the subscript $j$ instead stands for the index of the true class.
  • When two subscripts are used together, $i$ always comes before $j$, so that $x_{ij}$ means the $j$th input variable in the $i$th training example. Thus, $\mathbf{x}_i$ stands for the $i$th training example and $x_{ij}$ for the $j$th input variable in it.
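To give a sense of how these symbols combine, here is a minimal worked example in this notation, assuming the usual mean-squared-error loss for linear regression with prediction $\hat{y}_i = \mathbf{w}^\top \mathbf{x}_i + b$ (the exact form and scaling constant used in the derivation notes may differ):

$$J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left(\hat{y}_i - y_i\right)^2$$

$$\frac{\partial J}{\partial w_j} = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}_i - y_i\right) x_{ij}, \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left(\hat{y}_i - y_i\right)$$

The logistic regression derivation follows the same pattern, with a sigmoid applied to $\mathbf{w}^\top \mathbf{x}_i + b$ and the cross-entropy loss in place of the squared error.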
