Step-by-step, detailed gradient derivations for common supervised machine learning and deep learning loss functions, suitable for people who have just started learning machine learning and deep learning. Notation in the field varies widely from person to person and changes over time, but the notation here is consistent and easy to follow.
So far, I have derived the gradients of the loss functions for linear regression and logistic regression using the notation below. To see how gradients can be derived in a deep learning neural network setting, you can check my Notes-for-Stanford-CS224N-NLP-with-Deep-Learning, although the notation there differs from the notation used here:
- [Jack's Notes] 1-Intro and Word Vectors.ipynb provides gradient derivations for the average negative log-likelihood loss function used in the word2vec algorithm, a shallow neural network architecture.
- [Jack's Notes] 3-Neural Networks.ipynb provides the general way to derive gradients in multilayer neural networks using the chain rule and Jacobian matrices.
- [Jack's notes] 4-Backpropagation.ipynb provides an easy and unified way to derive gradients for both the sigmoid function and the softmax function, the two most common output-layer activation functions used in neural networks.
In the future, I will use the notation below for the gradient derivations for neural networks as well, so that there is one unified, hands-on tutorial for common supervised machine learning and deep learning loss functions. Keep learning!
- $x$: the input variables (features).
- $y$: the true output variables that we want to predict (observations).
- $\hat{y}$: the predicted values.
- $w$: weights for the input variables.
- $b$: bias term for the input variables. In some machine learning courses and tutorials, people may use $\theta$ to represent both the weights and the bias term, which seems to be the more popular notation for deriving basic machine learning loss function gradients, such as those of linear regression and logistic regression. But in deep learning, the $w$ and $b$ separation seems to be more common.
- Bold font denotes a vector: $\mathbf{x}$ is a vector of $x_j$, $\mathbf{y}$ is a vector of $y_i$, and so on. But please note that $\mathbf{x}$ holds all the input variables of any given single training example, while $\mathbf{y}$ holds the observations of all the training examples. This is because the observation to predict is always a single fixed value, regardless of whether it is a discrete class (for classification) or a continuous value (for regression).
- Uppercase bold font denotes a matrix, such as $\mathbf{X}$, etc.
- $m$ always denotes the number of training examples, so $y_i$ for $i = 1, \ldots, m$. The subscript $i$ is always related to $m$.
- $n$ always denotes the number of input variables for any given training example, so $x_j$ for $j = 1, \ldots, n$. The subscript $j$ is always related to $n$. In classification problems, the subscript $k$ instead stands for the index of the true class.
- When two subscripts are used together, $i$ always comes before $j$, such that $x_{ij}$ means the $j$-th input variable in the $i$-th training example. Thus, $i = 1, \ldots, m$ and $j = 1, \ldots, n$.
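To make the notation above concrete, here is a minimal NumPy sketch (not from the derivations themselves; the variable names are illustrative) that lays out the shapes implied by the list, with $\mathbf{X}$ of shape $(m, n)$, and computes the mean-squared-error loss of linear regression together with its gradients with respect to $w$ and $b$:

```python
import numpy as np

m, n = 4, 3                       # m training examples, n input variables
rng = np.random.default_rng(0)
X = rng.normal(size=(m, n))       # X[i, j]: the j-th input variable of the i-th example
y = rng.normal(size=m)            # y[i]: the observation for the i-th example
w = np.zeros(n)                   # one weight per input variable
b = 0.0                           # bias term, kept separate from w as in the notation above

y_hat = X @ w + b                 # predicted values, one per training example

# Mean squared error loss for linear regression, averaged over the m examples,
# and its gradients with respect to w (shape (n,)) and the scalar b.
loss = np.mean((y_hat - y) ** 2)
grad_w = (2.0 / m) * X.T @ (y_hat - y)
grad_b = (2.0 / m) * np.sum(y_hat - y)

print(loss, grad_w.shape, grad_b)
```

The same shape conventions carry over to logistic regression; only the loss and the resulting gradient expressions change.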