# Derivative

```{epigraph}
*"Do the difficult things while they are easy and do the great things while they are small. A journey of a thousand miles must begin with a single step."*

-- Lao Tzu
```

```{seealso}
- [Real number (Encyclopaedia Britannica)](https://www.britannica.com/science/real-number)
- [Real number (Wikipedia)](https://en.wikipedia.org/wiki/Real_number)
```

The derivative of a single-variable function $f$ at a point $a$ is defined as $f'(a)=\lim_{h\rightarrow 0}\frac{f(a+h)-f(a)}{h}$. This definition does not carry over directly to the multivariable case, because when $h$ is a vector the quotient $\frac{f(a+h)-f(a)}{h}$ is not defined.

````{prf:definition} differentiable function
:label: differentiable_function

A continuous mapping $f: A \rightarrow \mathbb{R}^m$ from $A\subset \mathbb{R}^n$ to $\mathbb{R}^m$ is differentiable at an interior point $a\in A^{\mathrm{o}}$ if there exists a linear mapping $T_a: \mathbb{R}^n \rightarrow \mathbb{R}^m$ satisfying the condition

$$
f(a+h)-f(a)-T_a(h) = o(h)
$$
````
Here $o(h)$ denotes a quantity whose norm, divided by $|h|$, tends to $0$ as $h \rightarrow 0_n$. The linear map $T_a$ is called the **derivative** of $f$ at $a$, written $D f_a$ or $(D f)_a$. The matrix of the linear map $D f_a$ is written $f^{\prime}(a)$ and is called the **Jacobian matrix** of $f$ at $a$.

The derivative of a multivariable function $f(\vec{x})$ at a point $a$ is a linear map that approximates the change $f(a+h)-f(a)$ for small $h$. The entries of the matrix of this linear map are the partial derivatives $D_j f_i(a)$, which are defined below.

- The linear map $T_a$, if it exists, is unique.
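
The defining condition can be checked numerically. The following is a minimal sketch, assuming an example map $f(x, y)=(x^2 y,\ x+y^2)$ and a hand-computed candidate $T_a$ (both chosen here for illustration, not taken from the text). It evaluates the ratio $|f(a+h)-f(a)-T_a(h)| / |h|$ for shrinking $h$, which should tend to $0$.

```python
import math

def f(x, y):
    # Example map f: R^2 -> R^2 (an assumption for illustration).
    return (x * x * y, x + y * y)

def T(a, h):
    # Candidate derivative at a = (a1, a2): the hand-computed Jacobian applied to h.
    a1, a2 = a
    h1, h2 = h
    return (2 * a1 * a2 * h1 + a1 * a1 * h2, h1 + 2 * a2 * h2)

def remainder_ratio(a, h):
    # |f(a+h) - f(a) - T_a(h)| / |h|, which should tend to 0 as h -> 0.
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    th = T(a, h)
    r = (fah[0] - fa[0] - th[0], fah[1] - fa[1] - th[1])
    return math.hypot(*r) / math.hypot(*h)

a = (1.0, 2.0)
# Shrink h along the fixed direction (1, -1).
ratios = [remainder_ratio(a, (s, -s)) for s in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The printed ratios shrink roughly in proportion to $|h|$, which is what the $o(h)$ condition requires. A single direction of approach is only a sanity check, not a proof of differentiability.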

````{prf:definition} partial derivative
:label: partial_derivative

Let $A$ be a subset of $\mathbb{R}^n$, let $f: A \rightarrow \mathbb{R}$ be a function, and let $a\in A^{\mathrm{o}}$ be an interior point. Fix $j \in\{1, \ldots, n\}$. Define

$$
\varphi(t)=f\left(a_1, \ldots, a_{j-1}, t, a_{j+1}, \ldots, a_n\right) \text { for } t \text { near } a_j .
$$
Then the $j$th partial derivative of $f$ at $a$, provided that $\varphi$ is differentiable at $a_j$, is defined as

$$
D_j f(a)=\varphi^{\prime}\left(a_j\right)
$$
````
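
The definition translates almost literally into code: freeze every coordinate except the $j$th and differentiate the resulting one-variable function $\varphi$. The sketch below is illustrative only; the helper name `partial`, the example function, and the central-difference step `eps` are assumptions, and the index `j` is 0-based, unlike in the text.

```python
def partial(f, a, j, eps=1e-6):
    # Central-difference estimate of D_j f(a) = phi'(a_j), where
    # phi(t) = f(a_1, ..., a_{j-1}, t, a_{j+1}, ..., a_n).
    # Note: j is 0-based here.
    def phi(t):
        x = list(a)
        x[j] = t
        return f(*x)
    return (phi(a[j] + eps) - phi(a[j] - eps)) / (2 * eps)

def f(x, y, z):
    # Example function (an assumption): f(x, y, z) = x y^2 + z^3.
    return x * y * y + z ** 3

a = (1.0, 2.0, 3.0)
partials = [partial(f, a, j) for j in range(3)]
print(partials)  # approximately [y^2, 2xy, 3z^2] evaluated at a
```

By hand, the partial derivatives at $a=(1,2,3)$ are $y^2=4$, $2xy=4$, and $3z^2=27$, which the numerical estimates should match closely.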

If the function $f$ is differentiable at the point $a$, then all partial derivatives of $f$ at $a$ exist. The converse fails: even if all partial derivatives exist at $a$, the function may not be differentiable there. However, if all partial derivatives exist near $a$ and are continuous at $a$, then the function is differentiable at $a$.
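
A standard counterexample (supplied here for illustration, not taken from the text above) shows that existence of the partial derivatives alone is not enough. Consider

$$
f(x, y)=\begin{cases}\dfrac{x y}{x^2+y^2}, & (x, y) \neq(0,0), \\ 0, & (x, y)=(0,0) .\end{cases}
$$

Along the axes, $f(t, 0)=f(0, t)=0$ for all $t$, so $D_1 f(0,0)=D_2 f(0,0)=0$. Along the diagonal, however, $f(t, t)=\tfrac{1}{2}$ for every $t \neq 0$, so $f$ is not even continuous at the origin and therefore cannot be differentiable there.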

With this definition, we can check if a function is differentiable, and if a function is differentiable we can calculate its derivative. We begin with two examples to demonstrate how to find the derivative of a differentiable function.

- Let $C: A \rightarrow \mathbb{R}^m$ (where $A \subset \mathbb{R}^n$) be the constant mapping $C(x)=c$ for all $x \in A$, where $c$ is some fixed value in $\mathbb{R}^m$. Then the derivative of $C$ at any interior point $a$ of $A$ is the zero mapping.
- The derivative of a linear mapping $T: \mathbb{R}^n \rightarrow \mathbb{R}^m$ at any point $a \in \mathbb{R}^n$ is again $T$.
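
Both claims can be checked directly against the definition. As a short worked verification, take $T_a=0$ (the zero mapping) for the constant mapping $C$ and $T_a=T$ for the linear mapping $T$:

$$
\begin{aligned}
C(a+h)-C(a)-0(h) &= c-c-0_m = 0_m, \\
T(a+h)-T(a)-T(h) &= T(a)+T(h)-T(a)-T(h) = 0_m .
\end{aligned}
$$

In both cases the remainder is identically zero, which is certainly $o(h)$.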

Differentiation itself is a linear operation: $D(f+g)_a = Df_a+Dg_a$ and $D(cf)_a=cDf_a$, as described in the following theorem.

````{prf:theorem} linearity of the derivative
:label: linearity_derivative_theorem 

Let $f: A \rightarrow \mathbb{R}^m$ (where $A \subset \mathbb{R}^n$) and $g: B \rightarrow \mathbb{R}^m$ (where $B \subset \mathbb{R}^n$) be mappings. Suppose that $f$ and $g$ are differentiable at an interior point $a$ of $A\cap B$ with derivatives $D f_a$ and $D g_a$. Then

1. The sum $f+g: A \cap B \longrightarrow \mathbb{R}^m$ is differentiable at $a$ with derivative $D(f+g)_a=D f_a+D g_a$.
2. For any $\alpha \in \mathbb{R}$, the scalar multiple $\alpha f: A \longrightarrow \mathbb{R}^m$ is differentiable at $a$ with derivative $D(\alpha f)_a=\alpha D f_a$.
````
In univariable calculus, the derivative of the product of two functions is $(fg)'=f'g+g'f$. Here is the multivariable version.

````{prf:theorem} derivatives of the product 
:label: product derivative 
If two functions $f: A \rightarrow \mathbb{R}$ (where $A \subset \mathbb{R}^n$) and $g: B \rightarrow \mathbb{R}$ (where $B \subset \mathbb{R}^n$) are differentiable at $a$, then $fg$ is differentiable at $a$ with derivative

$$
D(f g)_a=f(a) D g_a+g(a) D f_a
$$

If $g(a) \neq 0$, then $f / g$ is differentiable at $a$ with derivative

$$
D\left(\frac{f}{g}\right)_a=\frac{g(a) D f_a-f(a) D g_a}{g(a)^2}
$$
````

The most important theorem for calculating multivariable derivatives is the chain rule.

````{prf:theorem} multivariable chain rule
:label: multivariable_derivative_chain_rule

If a continuous map $f: A \longrightarrow \mathbb{R}^m$ (where $A \subset \mathbb{R}^n$) is differentiable at an interior point $a$ of $A$, and another continuous mapping $g: B \rightarrow \mathbb{R}^{\ell}$ (where $B \subset \mathbb{R}^m$) is differentiable at the point $f(a) \in B$, then the composition $g \circ f$ is differentiable at the point $a$, and its derivative is

$$D(g \circ f)_a=D g_{f(a)} \circ D f_a .$$

In terms of Jacobian matrices, since the matrix of a composition is the product of the matrices, the Chain Rule is

$$
(g \circ f)^{\prime}(a)=g^{\prime}(f(a)) f^{\prime}(a) .
$$

````
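
The matrix form of the chain rule can be sanity-checked numerically. The sketch below assumes two example maps $f: \mathbb{R}^2 \rightarrow \mathbb{R}^3$ and $g: \mathbb{R}^3 \rightarrow \mathbb{R}^2$ (chosen for illustration) and a hypothetical helper `jacobian` based on central differences. It compares the Jacobian of $g \circ f$ with the product $g^{\prime}(f(a)) f^{\prime}(a)$.

```python
import math

def jacobian(F, a, eps=1e-6):
    # Central-difference Jacobian of F: R^n -> R^m at a (rows index outputs).
    n = len(a)
    cols = []
    for j in range(n):
        plus = list(a)
        plus[j] += eps
        minus = list(a)
        minus[j] -= eps
        Fp, Fm = F(plus), F(minus)
        cols.append([(p - q) / (2 * eps) for p, q in zip(Fp, Fm)])
    m = len(cols[0])
    return [[cols[j][i] for j in range(n)] for i in range(m)]

def matmul(A, B):
    # Plain matrix product of two nested lists.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def f(x):  # example map R^2 -> R^3
    return [x[0] * x[1], x[0] + x[1], x[0] ** 2]

def g(y):  # example map R^3 -> R^2
    return [math.sin(y[0]) + y[2], y[1] * y[2]]

def g_of_f(x):
    return g(f(x))

a = [0.5, -1.0]
lhs = jacobian(g_of_f, a)                          # (g o f)'(a)
rhs = matmul(jacobian(g, f(a)), jacobian(f, a))    # g'(f(a)) f'(a)
print(lhs)
print(rhs)
```

The two matrices agree up to finite-difference error. Note the order of the product: the $2 \times 3$ matrix $g^{\prime}(f(a))$ multiplies the $3 \times 2$ matrix $f^{\prime}(a)$ from the left.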

## Extreme values
````{prf:theorem} Multivariable Critical Point Theorem
Suppose that the function $f: A \rightarrow \mathbb{R}$ (where $A\subset \mathbb{R}^n$) takes an extreme value at the point $a\in A$. If $f$ is differentiable at $a$, then $f^{\prime}(a)=0_n$.
````

````{prf:definition} second derivative matrix
:label: Second Derivative Matrix 
Let $f: A \longrightarrow \mathbb{R}$ (where $A \subset \mathbb{R}^n$) be a function and let $a$ be an interior point of $A$. The second derivative matrix of $f$ at $a$ is the $n$-by-$n$ matrix whose $(i, j)$th entry is the second order partial derivative $D_{i j} f(a)$. Thus

$$
f^{\prime \prime}(a)=\left[\begin{array}{ccc}
D_{11} f(a) & \cdots & D_{1 n} f(a) \\
\vdots & \ddots & \vdots \\
D_{n 1} f(a) & \cdots & D_{n n} f(a)
\end{array}\right]
$$
````

````{prf:proposition} Two-variable $\operatorname{Max} / \min$ Test
Let $f: A \longrightarrow \mathbb{R}$ (where $A \subset \mathbb{R}^2$) be $\mathcal{C}^2$ on its interior points. Let $(a, b)$ be an interior point of $A$, and suppose that $f^{\prime}(a, b)=0_2$. Let $f^{\prime \prime}(a, b)=\left[\begin{array}{ll}\alpha & \beta \\ \beta & \delta\end{array}\right]$. Then

1. If $\alpha>0$ and $\alpha \delta-\beta^2>0$ then $f(a, b)$ is a local minimum.
2. If $\alpha<0$ and $\alpha \delta-\beta^2>0$ then $f(a, b)$ is a local maximum.
3. If $\alpha \delta-\beta^2<0$ then $f(a, b)$ is a saddle point.
````
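
The test is easy to mechanize. The following sketch assumes that $\alpha, \beta, \delta$ denote the entries of the second derivative matrix $\left[\begin{array}{ll}\alpha & \beta \\ \beta & \delta\end{array}\right]$ at a critical point; the function name `classify_2d` and the `"inconclusive"` label for the case $\alpha \delta-\beta^2=0$ are choices made here, not part of the proposition.

```python
def classify_2d(alpha, beta, delta):
    # Entries of the second derivative matrix [[alpha, beta], [beta, delta]]
    # at a critical point of f.
    det = alpha * delta - beta * beta
    if det > 0 and alpha > 0:
        return "local minimum"
    if det > 0 and alpha < 0:
        return "local maximum"
    if det < 0:
        return "saddle point"
    return "inconclusive"  # det == 0: the test gives no information

# Example functions, each with a critical point at the origin:
print(classify_2d(2, 0, 2))    # f = x^2 + y^2
print(classify_2d(-2, 0, -2))  # f = -x^2 - y^2
print(classify_2d(2, 0, -2))   # f = x^2 - y^2
print(classify_2d(0, 0, 0))    # f = x^4 + y^4
```

The last example shows why the inconclusive case matters: $x^4+y^4$ has a genuine minimum at the origin, but its second derivative matrix there is zero, so the test cannot detect it.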

````{prf:definition} characteristic polynomial
:label: characteristic_polynomial
Let $M$ be an $n$-by-$n$ matrix. Its characteristic polynomial is $p_M(\lambda)=\operatorname{det}(M-\lambda I)$.
````

```{note}
The characteristic polynomial of $M$ is a polynomial of degree $n$ in the scalar variable $\lambda$.
```

````{prf:theorem} Definite/Indefinite
Let $M$ be a symmetric matrix in $\mathrm{M}_n(\mathbb{R})$. Then
1. $M$ is positive definite if and only if all the roots of $p_M(\lambda)$ are positive.
2. $M$ is negative definite if and only if all the roots of $p_M(\lambda)$ are negative.
3. $M$ is indefinite if and only if $p_M(\lambda)$ has positive roots and negative roots.
````

````{prf:proposition} General Max/min Test
Let $f: A \longrightarrow \mathbb{R}$ (where $A \subset \mathbb{R}^n$) be $\mathcal{C}^2$ on its interior points. Let $a$ be an interior point of $A$, and suppose that $f^{\prime}(a)=0_n$. Let the second derivative matrix $f^{\prime \prime}(a)$ have characteristic polynomial $p(\lambda)$.
1. If all roots of $p(\lambda)$ are positive then $f(a)$ is a local minimum.
2. If all roots of $p(\lambda)$ are negative then $f(a)$ is a local maximum.
3. If $p(\lambda)$ has positive and negative roots then $f(a)$ is a saddle point.
````
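
As a worked illustration (the function is chosen here for the example), take $f(x, y, z)=x^2+y^2-z^2$ and $a=(0,0,0)$. Then $f^{\prime}(x, y, z)=[2 x, 2 y,-2 z]$ vanishes at $a$, and

$$
f^{\prime \prime}(a)=\left[\begin{array}{ccc}
2 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & -2
\end{array}\right], \qquad
p(\lambda)=\operatorname{det}\left(f^{\prime \prime}(a)-\lambda I\right)=(2-\lambda)^2(-2-\lambda) .
$$

The roots are $2, 2, -2$. Since $p(\lambda)$ has both positive and negative roots, $f(a)=0$ is a saddle point: $f$ increases along the $x$- and $y$-axes and decreases along the $z$-axis.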


## Directional derivatives
````{prf:definition} directional derivative
:label: directional_derivative
Let $f: A \longrightarrow \mathbb{R}$ (where $A \subset \mathbb{R}^n$) be a function, let $a$ be an interior point of $A$, and let $d \in \mathbb{R}^n$ be a unit vector. The directional derivative of $f$ at $a$ in the $d$ direction is

$$
D_d f(a)=\lim _{t \rightarrow 0} \frac{f(a+t d)-f(a)}{t},
$$
if this limit exists.
````


The derivative matrix $f^{\prime}(a)$ of a scalar-valued function $f$ at $a$ is often called the gradient of $f$ at $a$ and written $\nabla f(a)$. That is,

$$
\nabla f(a)=f^{\prime}(a)=\left[D_1 f(a), \ldots, D_n f(a)\right]
$$

````{prf:theorem} Directional Derivative and Gradient
Let the function $f: A \longrightarrow \mathbb{R}$ (where $A \subset \mathbb{R}^n$) be differentiable at $a$, and let $d \in \mathbb{R}^n$ be a unit vector. Then the directional derivative of $f$ at $a$ in the $d$ direction exists, and it is equal to

$$
\begin{aligned}
D_d f(a) &=\sum_{j=1}^n D_j f(a) d_j \\
&=\langle\nabla f(a), d\rangle \\
&=|\nabla f(a)| \cos \theta_{\nabla f(a), d} .
\end{aligned}
$$
````

- The rate of increase of $f$ at $a$ in the $d$ direction varies with $d$, from $-|\nabla f(a)|$ when $d$ points in the direction opposite to $\nabla f(a)$, to $|\nabla f(a)|$ when $d$ points in the same direction as $\nabla f(a)$.
- In particular, the vector $\nabla f(a)$ points in the direction of greatest increase of $f$ at $a$, and its modulus $|\nabla f(a)|$ is precisely this greatest rate.
- Also, the directions orthogonal to $\nabla f(a)$ are the directions in which $f$ neither increases nor decreases at $a$.
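
These relations can be illustrated numerically. The sketch below assumes an example function $f(x, y)=x^2+3 x y$ with a hand-computed gradient (both chosen for illustration). It computes the directional derivative once as the inner product $\langle\nabla f(a), d\rangle$ and once from a difference quotient, and also evaluates the rate of increase along the gradient direction.

```python
import math

def f(x, y):
    # Example function (an assumption for illustration).
    return x * x + 3 * x * y

def grad_f(x, y):
    # Gradient computed by hand: [D_1 f, D_2 f].
    return (2 * x + 3 * y, 3 * x)

def directional_quotient(a, d, t=1e-6):
    # Symmetric difference quotient approximating the limit in the definition.
    fp = f(a[0] + t * d[0], a[1] + t * d[1])
    fm = f(a[0] - t * d[0], a[1] - t * d[1])
    return (fp - fm) / (2 * t)

a = (1.0, 2.0)
g = grad_f(*a)
norm_g = math.hypot(*g)

d = (3 / 5, 4 / 5)                              # a unit vector
via_gradient = g[0] * d[0] + g[1] * d[1]        # <grad f(a), d>
via_limit = directional_quotient(a, d)

steepest = (g[0] / norm_g, g[1] / norm_g)       # unit vector along the gradient
rate_steepest = g[0] * steepest[0] + g[1] * steepest[1]
print(via_gradient, via_limit, rate_steepest, norm_g)
```

The two values of the directional derivative agree, and the rate along `steepest` equals $|\nabla f(a)|$, the greatest rate of increase.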

## Integral curves

If a function $f: \mathbb{R}^n \longrightarrow \mathbb{R}$ has a continuous gradient, then from any starting point $a \in \mathbb{R}^n$ where the gradient $\nabla f(a)$ is nonzero, there is a path of steepest ascent of $f$ (called an **integral curve** of $\nabla f$ ) starting at $a$. 

If $n=2$ and the graph of $f$ is seen as a surface in 3-space, then the integral curve from the point $(a, b) \in \mathbb{R}^2$ is the shadow of the path followed by a particle climbing the graph, starting at $(a, b, f(a, b))$. If $n=2$ or $n=3$ and $f$ is viewed as temperature, then the integral curve is the path followed by a heat-seeking bug.

To find the integral curve, we set up an equation that describes it. The idea is to treat the gradient vector as a divining rod and follow it starting at $a$. Doing so produces a path in $\mathbb{R}^n$ that describes time-dependent motion, always in the direction of the gradient, and always with speed equal to the modulus of the gradient. Computing the path amounts to finding an interval $I \subset \mathbb{R}$ containing $0$ and a mapping

$$
\gamma: I \rightarrow \mathbb{R}^n
$$

that satisfies the differential equation with initial condition $\gamma^{\prime}(t)=\nabla f(\gamma(t)), \quad \gamma(0)=a$.
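
When no closed-form solution is available, the differential equation can be integrated numerically. The sketch below uses the forward Euler method, which is one simple choice among many, and assumes the example $f(x, y)=-\left(x^2+y^2\right) / 2$, picked because its integral curve is known exactly: $\gamma(t)=a e^{-t}$. The function names are hypothetical.

```python
import math

def grad_f(p):
    # Gradient of the example f(x, y) = -(x^2 + y^2) / 2, chosen so that
    # the exact integral curve is known: gamma(t) = a * exp(-t).
    return (-p[0], -p[1])

def integral_curve(a, t_end, steps):
    # Forward Euler for gamma'(t) = grad f(gamma(t)), gamma(0) = a.
    dt = t_end / steps
    p = tuple(a)
    for _ in range(steps):
        g = grad_f(p)
        p = (p[0] + dt * g[0], p[1] + dt * g[1])
    return p

a = (1.0, 2.0)
approx = integral_curve(a, t_end=1.0, steps=10000)
exact = (a[0] * math.exp(-1.0), a[1] * math.exp(-1.0))
print(approx, exact)
```

The Euler endpoint approaches the exact value $\gamma(1)=a e^{-1}$ as the number of steps grows. The curve climbs toward the maximum of $f$ at the origin, slowing down as the gradient shrinks.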

## Inverse function theorem

````{prf:theorem} inverse function theorem
:label: inverse_function_theorem

Let $f: A \rightarrow \mathbb{R}^n$ (where $A \subset \mathbb{R}^n$) be a mapping, let $a$ be an interior point of $A$, and let $f$ be continuously differentiable on some $\varepsilon$-ball about $a$. (Continuous differentiability on a ball means first that the derivative mapping $D f_x$ exists for each $x$ in the ball, and second that the entries of the derivative matrix $f^{\prime}(x)$, i.e., the partial derivatives $D_j f_i(x)$, are continuous functions of $x$ on the ball.) Suppose that $\operatorname{det} f^{\prime}(a) \neq 0$. Then there is an open set $V \subset A$ containing $a$ and an open set $W \subset \mathbb{R}^n$ containing $f(a)$ such that $f: V \rightarrow W$ has a continuously differentiable inverse $f^{-1}: W \rightarrow V$. For each $y=f(x) \in W$, the derivative of the inverse is the inverse of the derivative,

$$
D\left(f^{-1}\right)_y=\left(D f_x\right)^{-1} .
$$

````
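
The formula for the derivative of the inverse can be checked on a familiar example. The sketch below assumes the polar coordinate map $f(r, \theta)=(r \cos \theta, r \sin \theta)$, whose Jacobian determinant is $r$, together with a local inverse built from `math.atan2`. The helper names are hypothetical, and the Jacobian of the inverse is estimated by central differences.

```python
import math

def polar(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def polar_jacobian(r, theta):
    # f'(r, theta), computed by hand; its determinant is r.
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def inverse_2x2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def polar_inverse(x, y):
    # A local inverse of `polar`, valid away from the origin
    # and the negative x-axis.
    return (math.hypot(x, y), math.atan2(y, x))

def jacobian_fd(F, p, eps=1e-6):
    # Central-difference Jacobian of F: R^2 -> R^2 at p.
    J = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        plus = list(p)
        plus[j] += eps
        minus = list(p)
        minus[j] -= eps
        Fp, Fm = F(*plus), F(*minus)
        for i in range(2):
            J[i][j] = (Fp[i] - Fm[i]) / (2 * eps)
    return J

r, theta = 2.0, math.pi / 6
x, y = polar(r, theta)
lhs = jacobian_fd(polar_inverse, (x, y))        # derivative of the inverse at f(r, theta)
rhs = inverse_2x2(polar_jacobian(r, theta))     # inverse of the derivative at (r, theta)
print(lhs)
print(rhs)
```

The two matrices agree up to finite-difference error. At $r=0$ the determinant vanishes and the theorem does not apply, which matches the fact that polar coordinates are not invertible near the origin.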

## Implicit function theorem

````{prf:theorem} implicit function theorem
:label: implicit_function_theorem
Let $c$ and $n$ be positive integers with $n>c$, and let $r=n-c$. Let $A$ be an open subset of $\mathbb{R}^n$, and let $g: A \longrightarrow \mathbb{R}^c$ have continuous partial derivatives at every point of $A$. Consider the level set

$$
L=\left\{v \in A: g(v)=0_c\right\} .
$$
Let $p$ be a point of $L$, i.e., let $g(p)=0_c$. Let $p=(a, b)$ where $a \in \mathbb{R}^r$ and $b \in \mathbb{R}^c$, and let $g^{\prime}(p)=\left[\begin{array}{ll}M & N\end{array}\right]$ where $M$ is the left $c$-by-$r$ submatrix and $N$ is the remaining right square $c$-by-$c$ submatrix.

If $\operatorname{det} N \neq 0$ then the level set $L$ is locally a graph near $p$. That is, the condition $g(x, y)=0_c$ for $(x, y)$ near $(a, b)$ implicitly defines $y$ as a function $y=\varphi(x)$ where $\varphi$ takes $r$-vectors near $a$ to $c$-vectors near $b$, and in particular $\varphi(a)=b$. The function $\varphi$ is differentiable at $a$ with derivative matrix

$$
\varphi^{\prime}(a)=-N^{-1} M .
$$
Hence $\varphi$ is well approximated near $a$ by its affine approximation,

$$
\varphi(a+h) \approx b-N^{-1} M h .
$$
````
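
A small example makes the theorem concrete. The sketch below assumes the unit circle $g(x, y)=x^2+y^2-1$ (so $n=2$, $c=1$, $r=1$) and the point $p=(a, b)=(0.6, 0.8)$, both chosen for illustration. Here $M=[2 a]$ and $N=[2 b]$ are $1 \times 1$ matrices, and the implicit function is known explicitly as $\varphi(x)=\sqrt{1-x^2}$, so the affine approximation can be compared with the exact value.

```python
import math

def g(x, y):
    # Constraint whose zero level set is the unit circle.
    return x * x + y * y - 1.0

a, b = 0.6, 0.8                 # a point on the level set: g(a, b) = 0
M = 2 * a                       # D_1 g(a, b), the 1-by-1 block M
N = 2 * b                       # D_2 g(a, b), the 1-by-1 block N (nonzero)
phi_prime = -M / N              # phi'(a) = -N^{-1} M

def phi(x):
    # Explicit local solution of g(x, y) = 0 for y near b > 0.
    return math.sqrt(1.0 - x * x)

h = 0.01
affine = b + phi_prime * h      # b - N^{-1} M h
exact = phi(a + h)
print(phi_prime, affine, exact)
```

The slope $-a / b=-0.75$ matches the derivative of $\sqrt{1-x^2}$ at $x=0.6$, and the affine approximation differs from the exact value only at second order in $h$. At the points $(\pm 1,0)$, where $N=0$, the circle is not locally a graph over the $x$-axis.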

## Lagrange multiplier
We aim to find the maxima and minima of a function $f(x, y)$ subject to a constraint $g(x, y)=0$. A necessary condition for a constrained extremum is that the gradients of $f$ and $g$ are parallel, because otherwise we could move along the curve $g(x, y)=0$ and increase $f$. Indeed, the directional derivative of $f$ in the direction tangent to the constraint curve is zero if and only if the tangent vector to the curve is perpendicular to the gradient of $f$, or if there is no tangent vector.

````{prf:definition} Lagrange multiplier
:label: Lagrange_multiplier

The system of equations $\nabla f(x, y)=\lambda \nabla g(x, y)$, $g(x, y)=0$ for the three unknowns $x, y, \lambda$ is called the system of **Lagrange equations**. The variable $\lambda$ is called a **Lagrange multiplier**.
````

Lagrange theorem: Extrema of $f(x, y)$ on the curve $g(x, y)=0$ are either solutions of the Lagrange equations or critical points of $g$, that is, points where $\nabla g(x, y)=0$.
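
As a worked illustration (the particular $f$ and $g$ are chosen here for the example), find the extrema of $f(x, y)=x+y$ on the unit circle $g(x, y)=x^2+y^2-1=0$. Since $\nabla f=(1,1)$ and $\nabla g=(2 x, 2 y)$, the Lagrange equations are

$$
1=2 \lambda x, \qquad 1=2 \lambda y, \qquad x^2+y^2=1 .
$$

The first two equations give $x=y=\frac{1}{2 \lambda}$. Substituting into the third gives $\frac{2}{4 \lambda^2}=1$, so $\lambda=\pm \frac{1}{\sqrt{2}}$ and $(x, y)=\pm\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$. Because $\nabla g$ is nonzero everywhere on the circle, there are no critical points of $g$ to consider. Hence the maximum is $f=\sqrt{2}$ at $\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$ and the minimum is $f=-\sqrt{2}$ at $\left(-\frac{1}{\sqrt{2}},-\frac{1}{\sqrt{2}}\right)$.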