# Chain Rule

---

## Table of contents
1. **Single-variable functions**  
1.1. First order  
1.2. Second order  
1.3. Higher derivatives
2. **Multi-variable functions**  
2.1. Multi-functions of a single variable  
2.2. Multi-functions of multi-variables  
2.3. Chain Rule and Jacobian

---

## 1. Single-variable functions
### 1.1. First order

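Given a composite function $f(g(x))$, the derivative of $f$ with respect to $x$ is the derivative of $f$ with respect to $g$ times the derivative of $g$ with respect to $x$.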

$\frac{df}{dx} = \frac{df}{dg}\frac{dg}{dx}$

**Example**

$y = (\log x)^2$

Let : $g(x) = \log x, \, f(g) = g^2$

$y' = \frac{df}{dg}\frac{dg}{dx} = 2g\cdot \frac{1}{x} = \frac{2\log x}{x}$
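The result above can be sanity-checked numerically. The sketch below is only an illustration: it assumes $\log$ means the natural logarithm (consistent with $\frac{dg}{dx} = \frac{1}{x}$), and the helper names `dy_chain` and `central_diff` are made up for this example.

```python
import math

def y(x):
    # y = (log x)^2, natural logarithm assumed
    return math.log(x) ** 2

def dy_chain(x):
    # Chain rule: df/dg * dg/dx with g = log x and f = g^2
    g = math.log(x)
    return 2 * g * (1 / x)

def central_diff(func, x, h=1e-6):
    # Symmetric finite-difference approximation of func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 3.0
print(dy_chain(x0), central_diff(y, x0))
```

The two printed values should agree to several decimal places; the finite difference is only an approximation, so exact equality is not expected.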

**Generalisation**

Given : $f_1(f_2(f_3(\cdots f_n(x))))$,

$\frac{df_1}{dx} = \frac{df_1}{df_2}\frac{df_2}{df_3}\cdots\frac{df_n}{dx}$
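One way to read the generalised formula is as a running product of local derivatives, evaluated from the innermost function outwards. The following is a minimal sketch under assumptions of my own: the composition $\sin(e^{x^2})$ and the names `layers` and `chain_derivative` are chosen for illustration and do not come from the text above.

```python
import math

# Each entry is a pair (function, its derivative).
# The composition is f1(f2(f3(x))) = sin(exp(x^2)).
layers = [
    (math.sin, math.cos),                 # f1 and df1/df2
    (math.exp, math.exp),                 # f2 and df2/df3
    (lambda t: t ** 2, lambda t: 2 * t),  # f3 and df3/dx
]

def chain_derivative(layers, x):
    # Start at the innermost function and work outwards,
    # multiplying the local derivatives along the way.
    value, deriv = x, 1.0
    for func, dfunc in reversed(layers):
        deriv *= dfunc(value)  # local derivative at the current input
        value = func(value)    # this output feeds the next layer
    return value, deriv

value, deriv = chain_derivative(layers, 0.5)
print(value, deriv)
```

Each factor is evaluated at the output of the layer inside it, which is what the notation $\frac{df_i}{df_{i+1}}$ implies.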

---

### 1.2. Second order

Given : $f(g(x))$

$\frac{d^2f}{dx^2} = \frac{d^2f}{dg^2}\cdot(\frac{dg}{dx})^2 + \frac{df}{dg}\cdot\frac{d^2g}{dx^2}$

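This follows from differentiating the first-order result once more:

$\frac{d^2f}{dx^2} = \frac{d}{dx}\{\frac{df}{dg}\frac{dg}{dx}\}$

Note : by the product rule, $\frac{d}{dx}\{u(x)\cdot v(x)\} = \frac{d}{dx}u(x)\cdot v(x) + u(x)\cdot \frac{d}{dx}v(x)$

$= \frac{d}{dx}\{\frac{df}{dg}\}\cdot\frac{dg}{dx} + \frac{df}{dg}\cdot \frac{d}{dx}\{\frac{dg}{dx}\}$

Note : $\frac{df}{dg}$ is itself a function of $g(x)$, so by the chain rule $\frac{d}{dx}\{\frac{df}{dg}\} = \frac{d^2f}{dg^2}\cdot\frac{dg}{dx}$

$= \frac{d^2f}{dg^2}\cdot\frac{dg}{dx} \cdot\frac{dg}{dx} + \frac{df}{dg}\cdot \frac{d^2g}{dx^2}$

$= \frac{d^2f}{dg^2}\cdot(\frac{dg}{dx})^2 + \frac{df}{dg}\cdot\frac{d^2g}{dx^2}$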
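**Example**

A worked instance may make the formula concrete. It reuses $y = (\log x)^2$ from section 1.1 and assumes the natural logarithm.

Let : $g(x) = \log x, \, f(g) = g^2$, so that $\frac{df}{dg} = 2g, \, \frac{d^2f}{dg^2} = 2, \, \frac{dg}{dx} = \frac{1}{x}, \, \frac{d^2g}{dx^2} = -\frac{1}{x^2}$

$y'' = \frac{d^2f}{dg^2}\cdot(\frac{dg}{dx})^2 + \frac{df}{dg}\cdot\frac{d^2g}{dx^2} = 2\cdot\frac{1}{x^2} + 2\log x\cdot(-\frac{1}{x^2}) = \frac{2(1-\log x)}{x^2}$

Differentiating $y' = \frac{2\log x}{x}$ directly with the quotient rule gives the same result.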

---

### 1.3. Higher derivatives 

See [Wikipedia - Chain_rule#Higher_derivatives](https://en.wikipedia.org/wiki/Chain_rule#Higher_derivatives) and [Wikipedia - Faà di Bruno's formula](https://en.wikipedia.org/wiki/Fa%C3%A0_di_Bruno%27s_formula)

---

## 2. Multi-variable functions
### 2.1. Multi-functions of a single variable

Given : $f(g_1(x),g_2(x),...,g_n(x))$

$\frac{df}{dx} = \sum^n_{i=1} \frac{\partial f}{\partial g_i}\frac{dg_i}{dx}$

**Example**

$y = x^2e^x\sin x$

Let : $g_1(x) = x^2, \, g_2(x) = e^x, \, g_3(x) = \sin x, \, f(g_1,g_2,g_3) = g_1g_2g_3$

$y' = \frac{\partial f}{\partial g_1}\frac{dg_1}{dx} + \frac{\partial f}{\partial g_2}\frac{dg_2}{dx} + \frac{\partial f}{\partial g_3}\frac{dg_3}{dx}$

$=g_2g_3\cdot 2x + g_1g_3\cdot e^x + g_1g_2\cdot \cos x$

$=2xe^x\sin x + x^2e^x\sin x + x^2e^x\cos x = xe^x(2\sin x+x\sin x+x\cos x)$
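As a rough check of this example, the sum over the three intermediate functions can be compared with the factored closed form and with a finite-difference estimate. The function names below are illustrative assumptions, not part of the original notes.

```python
import math

def y(x):
    return x ** 2 * math.exp(x) * math.sin(x)

def dy_sum_rule(x):
    # Intermediate functions and their derivatives with respect to x
    g1, g2, g3 = x ** 2, math.exp(x), math.sin(x)
    dg1, dg2, dg3 = 2 * x, math.exp(x), math.cos(x)
    # For f = g1*g2*g3, the partial with respect to g_i is the product of the other two
    return g2 * g3 * dg1 + g1 * g3 * dg2 + g1 * g2 * dg3

def dy_closed_form(x):
    # Factored result from the worked example
    return x * math.exp(x) * (2 * math.sin(x) + x * math.sin(x) + x * math.cos(x))

def dy_numeric(x, h=1e-6):
    # Symmetric finite-difference approximation of y'(x)
    return (y(x + h) - y(x - h)) / (2 * h)

x0 = 1.3
print(dy_sum_rule(x0), dy_closed_form(x0), dy_numeric(x0))
```

The first two values should match up to floating-point rounding, and the third up to the finite-difference error.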

---

### 2.2. Multi-functions of multi-variables

Given : $f(g_1(x_1,x_2,...,x_m),g_2(x_1,x_2,...,x_m),...,g_n(x_1,x_2,...,x_m))$

$\frac{\partial f}{\partial x_j} = \sum^n_{i=1} \frac{\partial f}{\partial g_i}\frac{\partial g_i}{\partial x_j}, \,\, j = 1,2,...,m$

**Example**

$y = (x_1^2+x_2^2)\sin(x_1x_2)$

Let : $g_1(x_1,x_2) = x_1^2+x_2^2, \, g_2(x_1,x_2) = \sin(x_1x_2), \, f(g_1,g_2) = g_1g_2$

$\frac{\partial f}{\partial x_1} = \frac{\partial f}{\partial g_1}\frac{\partial g_1}{\partial x_1} + \frac{\partial f}{\partial g_2}\frac{\partial g_2}{\partial x_1}$

$= g_2\cdot 2x_1+g_1\cdot x_2 \cos(x_1x_2) = 2x_1\sin(x_1x_2) + (x_1^2x_2+x_2^3)\cos(x_1x_2)$

$\frac{\partial f}{\partial x_2} = \frac{\partial f}{\partial g_1}\frac{\partial g_1}{\partial x_2} + \frac{\partial f}{\partial g_2}\frac{\partial g_2}{\partial x_2}$

$= g_2\cdot 2x_2+g_1\cdot x_1 \cos(x_1x_2) = 2x_2\sin(x_1x_2) + (x_1^3+x_1x_2^2)\cos(x_1x_2)$
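The two partial derivatives can be checked numerically by perturbing one variable at a time. This is only a sketch; `grad_chain` and `grad_numeric` are hypothetical helper names, and the evaluation point is arbitrary.

```python
import math

def y(x1, x2):
    return (x1 ** 2 + x2 ** 2) * math.sin(x1 * x2)

def grad_chain(x1, x2):
    g1 = x1 ** 2 + x2 ** 2
    g2 = math.sin(x1 * x2)
    # f = g1 * g2, so df/dg1 = g2 and df/dg2 = g1
    dy_dx1 = g2 * 2 * x1 + g1 * x2 * math.cos(x1 * x2)
    dy_dx2 = g2 * 2 * x2 + g1 * x1 * math.cos(x1 * x2)
    return dy_dx1, dy_dx2

def grad_numeric(x1, x2, h=1e-6):
    # Vary one variable while holding the other fixed
    d1 = (y(x1 + h, x2) - y(x1 - h, x2)) / (2 * h)
    d2 = (y(x1, x2 + h) - y(x1, x2 - h)) / (2 * h)
    return d1, d2

print(grad_chain(0.7, 1.2))
print(grad_numeric(0.7, 1.2))
```

Note that $\frac{\partial g_2}{\partial x_1}$ carries a factor $x_2$ while $\frac{\partial g_2}{\partial x_2}$ carries a factor $x_1$, which is easy to mix up by hand.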

---

### 2.3. Chain Rule and Jacobian

Given: $g: (x_1,x_2,...,x_n) \mapsto (u_1,u_2,...,u_m)$ and $f: (u_1,u_2,...,u_m) \mapsto (y_1,y_2,...,y_k)$, the composition is $f \circ g : (x_1,x_2,...,x_n) \mapsto (y_1,y_2,...,y_k)$

Let $J_g$ be the Jacobian of the mapping $g$ and $J_f$ the Jacobian of the mapping $f$, evaluated at $u = g(x)$. Then the Jacobian of the mapping $f \circ g$ is $J = J_fJ_g$,

where $J$ is a $k \times n$ matrix, $J_f$ is a $k \times m$ matrix and $J_g$ is an $m \times n$ matrix.

$
J = 
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & ... & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & ... & \frac{\partial y_2}{\partial x_n} \\
...  & ...  & ... & ...   \\
\frac{\partial y_k}{\partial x_1} & \frac{\partial y_k}{\partial x_2} & ... & \frac{\partial y_k}{\partial x_n} \\
\end{bmatrix}$
$\,\,\,\,\,\,\,\,\,\,
J_f = 
\begin{bmatrix}
\frac{\partial y_1}{\partial u_1} & \frac{\partial y_1}{\partial u_2} & ... & \frac{\partial y_1}{\partial u_m} \\
\frac{\partial y_2}{\partial u_1} & \frac{\partial y_2}{\partial u_2} & ... & \frac{\partial y_2}{\partial u_m} \\
...  & ...  & ... & ...   \\
\frac{\partial y_k}{\partial u_1} & \frac{\partial y_k}{\partial u_2} & ... & \frac{\partial y_k}{\partial u_m} \\
\end{bmatrix}$
$\,\,\,\,\,\,\,\,\,\,
J_g = 
\begin{bmatrix}
\frac{\partial u_1}{\partial x_1} & \frac{\partial u_1}{\partial x_2} & ... & \frac{\partial u_1}{\partial x_n} \\
\frac{\partial u_2}{\partial x_1} & \frac{\partial u_2}{\partial x_2} & ... & \frac{\partial u_2}{\partial x_n} \\
...  & ...  & ... & ...   \\
\frac{\partial u_m}{\partial x_1} & \frac{\partial u_m}{\partial x_2} & ... & \frac{\partial u_m}{\partial x_n} \\
\end{bmatrix}$

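Here, $J = J_fJ_g$, i.e. each entry is $\frac{\partial y_i}{\partial x_j} = \sum^m_{l=1} \frac{\partial y_i}{\partial u_l}\frac{\partial u_l}{\partial x_j}$, which is the chain rule of section 2.2 applied to every output $y_i$: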

$
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & ... & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & ... & \frac{\partial y_2}{\partial x_n} \\
...  & ...  & ... & ...   \\
\frac{\partial y_k}{\partial x_1} & \frac{\partial y_k}{\partial x_2} & ... & \frac{\partial y_k}{\partial x_n} \\
\end{bmatrix}
= \begin{bmatrix}
\frac{\partial y_1}{\partial u_1} & \frac{\partial y_1}{\partial u_2} & ... & \frac{\partial y_1}{\partial u_m} \\
\frac{\partial y_2}{\partial u_1} & \frac{\partial y_2}{\partial u_2} & ... & \frac{\partial y_2}{\partial u_m} \\
...  & ...  & ... & ...   \\
\frac{\partial y_k}{\partial u_1} & \frac{\partial y_k}{\partial u_2} & ... & \frac{\partial y_k}{\partial u_m} \\
\end{bmatrix}
\begin{bmatrix}
\frac{\partial u_1}{\partial x_1} & \frac{\partial u_1}{\partial x_2} & ... & \frac{\partial u_1}{\partial x_n} \\
\frac{\partial u_2}{\partial x_1} & \frac{\partial u_2}{\partial x_2} & ... & \frac{\partial u_2}{\partial x_n} \\
...  & ...  & ... & ...   \\
\frac{\partial u_m}{\partial x_1} & \frac{\partial u_m}{\partial x_2} & ... & \frac{\partial u_m}{\partial x_n} \\
\end{bmatrix}$
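To see the matrix form in action, here is a small sketch with $n = 2$, $m = 3$, $k = 2$. The mappings `g` and `f` are invented for illustration, and the Jacobians are written out by hand rather than computed by a library.

```python
def g(x1, x2):
    # g: (x1, x2) -> (u1, u2, u3)
    return (x1 * x2, x1 + x2, x1 ** 2)

def f(u1, u2, u3):
    # f: (u1, u2, u3) -> (y1, y2)
    return (u1 * u2, u2 + u3)

def jacobian_g(x1, x2):
    # 3 x 2 matrix of du_i/dx_j
    return [[x2, x1], [1.0, 1.0], [2 * x1, 0.0]]

def jacobian_f(u1, u2, u3):
    # 2 x 3 matrix of dy_i/du_j
    return [[u2, u1, 0.0], [0.0, 1.0, 1.0]]

def matmul(a, b):
    # Plain nested-list matrix product
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def jacobian_composed(x1, x2):
    # J = J_f J_g, with J_f evaluated at u = g(x)
    return matmul(jacobian_f(*g(x1, x2)), jacobian_g(x1, x2))

def jacobian_direct(x1, x2):
    # Differentiating y1 = x1^2 x2 + x1 x2^2 and y2 = x1 + x2 + x1^2 by hand
    return [[2 * x1 * x2 + x2 ** 2, x1 ** 2 + 2 * x1 * x2],
            [1 + 2 * x1, 1.0]]

print(jacobian_composed(1.5, -0.5))
print(jacobian_direct(1.5, -0.5))
```

Both calls should print the same $2 \times 2$ matrix, since substituting $u = g(x)$ into $f$ and differentiating directly must agree with the product $J_fJ_g$.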

