# Table of Contents

1. [Partial Derivative](#Partial-Derivative)
2. [Second Partial Derivative](#Second-Partial-Derivative)
3. [Gradient](#Gradient)
4. [Directional Derivatives](#Directional-Derivatives)
5. [Derivatives of Vector-Valued Functions](#Derivatives-of-Vector-Valued-Functions)
6. [Curvature](#Curvature)
7. [Multivariable Chain Rule](#Multivariable-Chain-Rule)
8. [Partial Derivatives of Parametric Surfaces](#Partial-Derivatives-of-Parametric-Surfaces)
9. [Divergence](#Divergence)
10. [2D-Curl](#2D-Curl)
11. [Curl](#Curl)
12. [Laplacian](#Laplacian)
13. [Jacobian](#Jacobian)
14. [Tangent Planes](#Tangent-Planes)
15. [Local Linearization](#Local-Linearization)
16. [Hessian Matrix](#Hessian-Matrix)
17. [Quadratic Approximation](#Quadratic-Approximation)
18. [Hyperplane](#Hyperplane)
19. [Second Partial Derivative Test](#Second-Partial-Derivative-Test)
20. [Constrained Optimization](#Constrained-Optimization)

# Partial Derivative

Given a function $f$:
> $\displaystyle f(x,y) = x^2y$

The partial derivatives of $f$ are:
> $\displaystyle \frac{\partial\;{f}}{\partial\;{x}} = 2xy$
>
> $\displaystyle \frac{\partial\;{f}}{\partial\;{y}} = x^2$
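
As a quick numerical sanity check, the partial derivatives above can be approximated with central differences. Each one nudges a single input while the other stays fixed. This is a minimal Python sketch: the helper names `partial_x` and `partial_y` and the step size `h` are illustrative choices, not part of any library.

```python
def f(x, y):
    return x**2 * y

def partial_x(func, x, y, h=1e-6):
    # Nudge x only; y is held constant.
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def partial_y(func, x, y, h=1e-6):
    # Nudge y only; x is held constant.
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

x0, y0 = 3.0, 2.0
print(partial_x(f, x0, y0))  # close to 2*x0*y0 = 12
print(partial_y(f, x0, y0))  # close to x0**2 = 9
```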

$\partial$ is called **del** or **partial**.

Alternate notation:

> $\displaystyle f_x \leftrightarrow \frac{\partial\;f}{\partial\;x}$
>
> $\displaystyle f_y \leftrightarrow \frac{\partial\;f}{\partial\;y}$

# Second Partial Derivative

Given a function $f$:
> $\displaystyle f(x,y) = x^2 y^3$

The partial derivatives of $f$ are:
> $\displaystyle \frac{\partial\;f}{\partial\;x} = 2xy^3$
>
> $\displaystyle \frac{\partial\;f}{\partial\;y} = 3x^{2}y^{2}$

The second partial derivatives of $f$ are:
> $\displaystyle \frac{\partial}{\partial{x}}\left(\frac{\partial{f}}{\partial{x}}\right) = \frac{\partial^{2}f}{\partial{x^2}} = 2y^3$
>
> $\displaystyle \frac{\partial}{\partial{x}}\left(\frac{\partial{f}}{\partial{y}}\right) = \frac{\partial^{2}f}{\partial{x}\partial{y}} = 6xy^2$
>
> $\displaystyle \frac{\partial}{\partial{y}}\left(\frac{\partial{f}}{\partial{x}}\right) = \frac{\partial^{2}f}{\partial{y}\partial{x}} = 6xy^2$
>
> $\displaystyle \frac{\partial}{\partial{y}}\left(\frac{\partial{f}}{\partial{y}}\right) = \frac{\partial^{2}f}{\partial{y^2}} = 6x^{2}y$
>
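> In this notation the order of differentiation reads **right to left**: $\frac{\partial^{2}f}{\partial{x}\partial{y}}$ differentiates with respect to $y$ first, then $x$.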

Alternate notation:
> $\displaystyle (f_x)_x = f_{xx} \leftrightarrow \frac{\partial^{2}f}{\partial{x^2}}$
>
> $\displaystyle (f_y)_x = f_{yx} \leftrightarrow \frac{\partial^{2}f}{\partial{x}\partial{y}}$
>
> $\displaystyle (f_x)_y = f_{xy} \leftrightarrow \frac{\partial^{2}f}{\partial{y}\partial{x}}$
>
> $\displaystyle (f_y)_y = f_{yy} \leftrightarrow \frac{\partial^{2}f}{\partial{y^2}}$
>
> In subscript notation the order of differentiation reads **left to right**: $f_{yx}$ differentiates with respect to $y$ first, then $x$.

$f_{xy}$ and $f_{yx}$ are called **mixed partial derivatives**.

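If the second partial derivatives are continuous in a neighborhood of a point, the mixed partial derivatives are equal there (**symmetry of second derivatives**, known as Schwarz's or Clairaut's theorem):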
> $\displaystyle \frac{\partial^{2}f}{\partial{x}\partial{y}} = \frac{\partial^{2}f}{\partial{y}\partial{x}}$
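
The sketch below checks the symmetry numerically for $f(x,y) = x^2y^3$. It differentiates in both possible orders using nested central differences. The helpers `d_dx` and `d_dy` are hypothetical names for this example. The step size is an assumption chosen to balance truncation error against rounding error.

```python
def f(x, y):
    return x**2 * y**3

def d_dx(g, h=1e-4):
    # Returns a new function approximating the x-partial of g.
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, h=1e-4):
    # Returns a new function approximating the y-partial of g.
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

f_xy = d_dy(d_dx(f))   # (f_x)_y: differentiate in x first, then y
f_yx = d_dx(d_dy(f))   # (f_y)_x: differentiate in y first, then x

x0, y0 = 1.5, -2.0
print(f_xy(x0, y0), f_yx(x0, y0))  # both close to 6*x0*y0**2 = 36
```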

# Gradient

The gradient of a **scalar-valued multivariable function $f(x,y,\dots)$**, denoted $\nabla{f}$, packages all of its partial derivative information into a **vector**:
> $\displaystyle \nabla{f} = \begin{bmatrix}
\frac{\partial f}{\partial x} \\
\frac{\partial f}{\partial y} \\
\vdots \\
\end{bmatrix}$
>
> $\nabla{f}$ is a **vector-valued** function.

$\nabla$ is called **nabla** or **del**.

The gradient of $f$, evaluated at an input $(x_0,y_0)$, points in the direction of **steepest ascent**.

The **magnitude** of $\nabla{f(x_0,y_0)}$ tells you what the **slope** of the hill is in that direction.

These gradient vectors $\nabla{f(x_0,y_0)}$ are **perpendicular to the contour lines** of $f$.
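
This minimal sketch makes the steepest-ascent property concrete. It estimates $\nabla f$ for $f(x,y) = x^2y$ at $(1, 2)$ with central differences, then brute-forces the slope over many unit directions. If the finite-difference steps are small enough, the best direction should line up with $\nabla f$. The best slope should match $\lVert\nabla f\rVert = \sqrt{17}$. The function names are illustrative.

```python
import math

def f(x, y):
    return x**2 * y

def gradient(func, x, y, h=1e-6):
    # Pack the two central-difference partials into a vector.
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (fx, fy)

def steepest_direction(func, x, y, n=3600, h=1e-6):
    # Brute force: try n unit directions, keep the one with the largest slope.
    best_angle, best_slope = 0.0, -math.inf
    for k in range(n):
        a = 2 * math.pi * k / n
        ux, uy = math.cos(a), math.sin(a)
        slope = (func(x + h * ux, y + h * uy) - func(x - h * ux, y - h * uy)) / (2 * h)
        if slope > best_slope:
            best_angle, best_slope = a, slope
    return best_angle, best_slope

gx, gy = gradient(f, 1.0, 2.0)                 # close to (4, 1)
angle, slope = steepest_direction(f, 1.0, 2.0)
print(math.atan2(gy, gx), angle)               # nearly the same direction
print(math.hypot(gx, gy), slope)               # both close to sqrt(17)
```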

# Directional Derivatives

If you have some **multivariable function $f(x, y, \dots)$** and some **vector** $\vec{\mathbf{v}}$ in the function's input space,
the directional derivative of $f$ along $\vec{\mathbf{v}}$ tells you the rate at which $f$ will change while the input moves with velocity vector $\vec{\mathbf{v}}$.

The notation here is $\nabla_{\vec{\mathbf{v}}}\;f$, and it is computed by taking the dot product between the gradient of $f$ and the vector $\vec{\mathbf{v}}$, that is $\nabla{f}\cdot\vec{\mathbf{v}}$:
> $\displaystyle \vec{\mathbf{v}} = \begin{bmatrix}
2 \\
3 \\
-1 \end{bmatrix}$
>
> $\displaystyle \nabla f = \begin{bmatrix}
\frac{\partial f}{\partial x} \\
\frac{\partial f}{\partial y} \\
\frac{\partial f}{\partial z} \end{bmatrix}$
>
> $\displaystyle \nabla_{\vec{\mathbf{v}}}\;f = \nabla{f}\cdot\vec{\mathbf{v}} = 2\frac{\partial f}{\partial x} + 3\frac{\partial f}{\partial y} + (-1)\frac{\partial f}{\partial z}$

When the directional derivative is used to compute the **slope**, be sure to **normalize** the vector $\vec{\mathbf{v}}$ first:
> $\displaystyle \frac{\nabla{f}\cdot\vec{\mathbf{v}}}{\lVert\vec{\mathbf{v}}\rVert}$
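
The worked example above can be reproduced in a few lines of Python. For illustration, this sketch assumes the function $f(x,y,z) = xy + z^2$, the point $(1,2,3)$ and a hand-computed gradient. A finite difference then checks that the dot product really is the rate of change along $\vec{\mathbf{v}}$.

```python
import math

def f(x, y, z):
    return x * y + z**2

def grad_f(x, y, z):
    # Partial derivatives of f, computed by hand: (y, x, 2z).
    return (y, x, 2 * z)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

p = (1.0, 2.0, 3.0)
v = (2.0, 3.0, -1.0)

d_v = dot(grad_f(*p), v)              # 2*2 + 3*1 + (-1)*6 = 1
slope = d_v / math.sqrt(dot(v, v))    # normalized by |v| = sqrt(14)

def along(t):
    # Value of f at the input p + t*v.
    return f(*(pi + t * vi for pi, vi in zip(p, v)))

h = 1e-6
numeric = (along(h) - along(-h)) / (2 * h)
print(d_v, numeric, slope)
```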

Alternate notations:
- $\nabla_{\vec{\mathbf{v}}}\;f$
- $\displaystyle \frac{\partial\;{f}}{\partial\;{\vec{\mathbf{v}}}}$
- $f'_{\vec{\mathbf{v}}}$
- $D_{\vec{\mathbf{v}}}\;f$
- $\partial_{\vec{\mathbf{v}}}\;f$

# Derivatives of Vector-Valued Functions

To take the derivative of a **vector-valued** function, take the derivative of each component:
> $\displaystyle \vec{\mathbf{s}}\;(t) = \begin{bmatrix}
x(t) \\
y(t) \end{bmatrix}$
>
> $\displaystyle \vec{\mathbf{s}}\;'\;(t) = \begin{bmatrix}
x\,'(t) \\
y\,'(t) \end{bmatrix}$
>
> $\displaystyle \frac{d}{dt}\vec{\mathbf{s}} = \begin{bmatrix}
\frac{d}{dt}x(t) \\
\frac{d}{dt}y(t) \end{bmatrix}$

If you interpret the initial function as giving the position of a particle as a function of time, the derivative gives the **velocity vector** of that particle as a function of time.
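
Component-wise differentiation is easy to mimic numerically. In this sketch the particle's path (a circle of radius 2) is an illustrative assumption. `derivative` is a hypothetical helper that applies a central difference to each component.

```python
import math

def position(t):
    # Particle moving on a circle of radius 2 (illustrative choice).
    return (2 * math.cos(t), 2 * math.sin(t))

def derivative(vec_func, t, h=1e-6):
    # Differentiate each component separately.
    ahead, behind = vec_func(t + h), vec_func(t - h)
    return tuple((a - b) / (2 * h) for a, b in zip(ahead, behind))

t0 = math.pi / 3
velocity = derivative(position, t0)
print(velocity)   # close to (-2*sin(t0), 2*cos(t0))
```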

# Curvature

The radius of curvature at a point on a curve is, loosely speaking, the radius of a circle which fits the curve most snugly at that point.

The **curvature**, denoted $\kappa$, is one divided by the radius of curvature:
> $\displaystyle \kappa = \frac{1}{R}$

To find the curvature given the **parametric function $\vec{\mathbf{s}}$ defining a curve**:

- Find the **unit tangent vector** $T$ by normalizing the derivative of $\vec{\mathbf{s}}$:
> $\displaystyle T(t) = \frac{\vec{\mathbf{s}}\,'(t)}{\lVert\vec{\mathbf{s}}\,'(t)\rVert}$
- Curvature is defined as the magnitude of the derivative of this unit tangent vector with respect to arc length $s$. You can compute it as follows:
> $\displaystyle \kappa = \left\|\frac{dT}{ds}\right\| = \frac{\left\|\frac{dT}{dt}\right\|}{\left\|\frac{d\vec{\mathbf{s}}}{dt}\right\|}$

The intuition here is that the unit tangent vector tells you which direction you are moving, and the rate at which it changes with respect to small steps $ds$ along the curve is a good indication of how quickly you are turning.
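
The two-step recipe can be checked on a curve whose curvature is already known. This sketch assumes a circle of radius $R = 2$, so the computed curvature should come out close to $\kappa = 1/R = 0.5$. All derivatives are central-difference approximations, so expect a small numerical error.

```python
import math

def s(t):
    # Circle of radius R = 2, so the curvature should be 1/R = 0.5.
    return (2 * math.cos(t), 2 * math.sin(t))

def d_dt(vec_func, t, h=1e-5):
    # Component-wise central difference.
    a, b = vec_func(t + h), vec_func(t - h)
    return tuple((ai - bi) / (2 * h) for ai, bi in zip(a, b))

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def unit_tangent(t):
    # T(t) = s'(t) / |s'(t)|
    sp = d_dt(s, t)
    return tuple(c / norm(sp) for c in sp)

def curvature(t):
    # kappa = |dT/dt| / |ds/dt|
    return norm(d_dt(unit_tangent, t)) / norm(d_dt(s, t))

print(curvature(0.7))   # close to 0.5
```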

# Multivariable Chain Rule

Given a multivariable function $f(x, y)$ and two single variable functions $x(t)$ and $y(t)$, here's what the multivariable chain rule says:
> $\displaystyle \frac{d}{dt}\,f(x(t),y(t)) = \frac{\partial\,f}{\partial\,x}\frac{dx}{dt} + \frac{\partial\,f}{\partial\,y}\frac{dy}{dt}$

Written with vector notation, where $\displaystyle \vec{\mathbf{v}}(t) = \begin{bmatrix}x(t)\\y(t)\end{bmatrix}$, this rule has a very elegant form in terms of the **gradient** of $f$ and the **vector-derivative** of $\vec{\mathbf{v}}(t)$:
> $\displaystyle \frac{d}{dt}\,f(\vec{\mathbf{v}}(t)) = \nabla{f}(\vec{\mathbf{v}}(t))\cdot\vec{\mathbf{v}}\,'(t)$
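
Here is a quick numerical check of the chain rule, using the illustrative choices $f(x,y) = x^2y$ and the path $(x(t), y(t)) = (\cos t, \sin t)$. The left side differentiates the composition directly. The right side dots the gradient, evaluated on the path, with the path's velocity.

```python
import math

def f(x, y):
    return x**2 * y

def grad_f(x, y):
    # Partials of f, computed by hand.
    return (2 * x * y, x**2)

def path(t):
    return (math.cos(t), math.sin(t))

def path_velocity(t):
    return (-math.sin(t), math.cos(t))

t0, h = 0.4, 1e-6
# Left side: differentiate the composition f(path(t)) directly.
lhs = (f(*path(t0 + h)) - f(*path(t0 - h))) / (2 * h)
# Right side: gradient at path(t0) dotted with the velocity path'(t0).
g, v = grad_f(*path(t0)), path_velocity(t0)
rhs = g[0] * v[0] + g[1] * v[1]
print(lhs, rhs)   # agree closely
```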

# Partial Derivatives of Parametric Surfaces

As setup, we have some vector-valued function with a two-dimensional input and a three-dimensional output:
> $\displaystyle \vec{\mathbf{v}}(s,t) = \begin{bmatrix}
x(s,t)\\
y(s,t)\\
z(s,t)
\end{bmatrix}$

Its partial derivatives are computed by taking the partial derivative of each component:
> $\displaystyle \frac{\partial\,\vec{\mathbf{v}}(s,t)}{\partial{t}} = \begin{bmatrix}
\frac{\partial}{\partial{t}}x(s,t)\\
\frac{\partial}{\partial{t}}y(s,t)\\
\frac{\partial}{\partial{t}}z(s,t)
\end{bmatrix}$
>
> $\displaystyle \frac{\partial\,\vec{\mathbf{v}}(s,t)}{\partial{s}} = \begin{bmatrix}
\frac{\partial}{\partial{s}}x(s,t)\\
\frac{\partial}{\partial{s}}y(s,t)\\
\frac{\partial}{\partial{s}}z(s,t)
\end{bmatrix}$

You can interpret these partial derivatives as giving vectors tangent to the parametric surface defined by $\vec{\mathbf{v}}$.
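
As an illustration, consider the unit sphere parametrized by a polar angle $s$ and an azimuth $t$. This parametrization is an assumption for the example and is not used elsewhere in these notes. On a unit sphere the position vector is normal to the surface. So if the partial derivatives are tangent vectors, their dot products with the position should be about zero.

```python
import math

def v(s, t):
    # Unit sphere: s is the polar angle, t the azimuth (illustrative choice).
    return (math.sin(s) * math.cos(t), math.sin(s) * math.sin(t), math.cos(s))

def partial_s(s, t, h=1e-6):
    a, b = v(s + h, t), v(s - h, t)
    return tuple((ai - bi) / (2 * h) for ai, bi in zip(a, b))

def partial_t(s, t, h=1e-6):
    a, b = v(s, t + h), v(s, t - h)
    return tuple((ai - bi) / (2 * h) for ai, bi in zip(a, b))

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

s0, t0 = 1.0, 0.5
p = v(s0, t0)
# The position vector is normal to a unit sphere, so tangent vectors
# must be perpendicular to it.
print(dot(partial_s(s0, t0), p), dot(partial_t(s0, t0), p))  # both about 0
```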

# Divergence

Interpret a **vector field** as representing a fluid flow.

The divergence is an operator, which takes in the **vector-valued** function defining this vector field, and outputs a **scalar-valued** function measuring the **change in density** of the fluid at each point:
> $\displaystyle \vec{\mathbf{v}}(x_1,\dots,x_n) = \begin{bmatrix}
v_1(x_1,\dots,x_n) \\
\vdots \\
v_n(x_1,\dots,x_n) \\
\end{bmatrix}$
>
> $\displaystyle \text{div}\;\vec{\mathbf{v}} = \nabla\cdot\vec{\mathbf{v}} = \begin{bmatrix}
\frac{\partial}{\partial\,x_1} \\
\vdots \\
\frac{\partial}{\partial\,x_n} \\
\end{bmatrix} \cdot \begin{bmatrix}
v_1 \\
\vdots \\
v_n \\
\end{bmatrix} = \frac{\partial\,v_1}{\partial\,x_1} + \cdots + \frac{\partial\,v_n}{\partial\,x_n}$

If the divergence at a point is **positive**, more fluid flows out of a small region around that point than into it, so a fluid flowing along the vector field would tend to become **less dense** there:
> $\displaystyle \nabla\cdot\vec{\mathbf{v}}(x_1,\dots,x_n) > 0$

If the divergence at a point is **negative**, more fluid flows in than out, so the fluid would tend to become **more dense** there:
> $\displaystyle \nabla\cdot\vec{\mathbf{v}}(x_1,\dots,x_n) < 0$

If the divergence at a point is **zero**, inflow and outflow balance, so the fluid keeps a **constant density** at that point:
> $\displaystyle \nabla\cdot\vec{\mathbf{v}}(x_1,\dots,x_n) = 0$

A vector field whose divergence is zero at every point is called **divergence-free**: a fluid flowing along it keeps a constant density everywhere.
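
Here is a minimal numerical sketch of the divergence formula, summing each $\partial v_i/\partial x_i$ estimated by central differences. The two sample fields are assumptions for illustration. The outward field $(x, y)$ is a source with divergence 2, and the pure rotation $(-y, x)$ is divergence-free.

```python
def divergence(field, point, h=1e-6):
    # Sum of the partials dv_i/dx_i, each estimated by a central difference.
    total = 0.0
    for i in range(len(point)):
        ahead = list(point)
        ahead[i] += h
        behind = list(point)
        behind[i] -= h
        total += (field(ahead)[i] - field(behind)[i]) / (2 * h)
    return total

def outward(p):
    # Flows straight away from the origin (a source).
    return (p[0], p[1])

def swirl(p):
    # Pure rotation around the origin.
    return (-p[1], p[0])

print(divergence(outward, (0.3, -1.2)))  # about 2: positive, net outflow
print(divergence(swirl, (0.3, -1.2)))    # 0: divergence-free
```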

# 2D-Curl


Curl measures the **rotation** in a **vector field**.

In two dimensions, if a **vector field** is given by a function $\displaystyle \vec{\mathbf{v}}(x,y) = \begin{bmatrix}v_1(x,y)\\v_2(x,y)\end{bmatrix}$, this rotation is given by the formula:
> $\displaystyle \text{2d-curl}\;\vec{\mathbf{v}} = \frac{\partial\,v_2}{\partial\,x} - \frac{\partial\,v_1}{\partial\,y}$

If the curl at a point is **positive** it indicates a general tendency to rotate **counterclockwise** around that point:
> $\displaystyle \text{2d-curl}\;\vec{\mathbf{v}} > 0$

If the curl at a point is **negative** it indicates a general tendency to rotate **clockwise** around that point:
> $\displaystyle \text{2d-curl}\;\vec{\mathbf{v}} < 0$

If the curl at a point is **zero**, there is no rotation in the fluid around that point:
> $\displaystyle \text{2d-curl}\;\vec{\mathbf{v}} = 0$

If an object is rotating in **two dimensions**, you can describe the rotation completely with a single number: the **angular velocity**. A positive angular velocity indicates a counterclockwise rotation, while a negative one indicates a clockwise rotation. The absolute value of the angular velocity gives the speed of rotation, typically in radians per second.

The 2d-curl formula gives precisely **twice the angular velocity** of the fluid near a point.
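
To see the factor of two, this sketch applies the 2d-curl formula to a rigid counterclockwise rotation $\vec{\mathbf{v}}(x,y) = (-\omega y,\ \omega x)$, whose angular velocity is $\omega$. The value $\omega = 1.5$ and the helper name `curl_2d` are illustrative.

```python
def curl_2d(field, x, y, h=1e-6):
    # dv2/dx - dv1/dy via central differences.
    dv2_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dv1_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    return dv2_dx - dv1_dy

omega = 1.5   # angular velocity of a rigid counterclockwise rotation

def rotation(x, y):
    return (-omega * y, omega * x)

print(curl_2d(rotation, 0.4, -0.9))   # about 3.0 = 2 * omega
```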

# Curl

Curl is an operator which takes in a function representing a **three-dimensional vector field**, and gives another function representing a different **three-dimensional vector field**.

**Curl itself only applies to three-dimensional vector fields**.

If a fluid flows in **three-dimensional space** along a vector field, the **rotation of that fluid around each point**, represented as a **vector**, is given by the curl of the original vector field evaluated at that point. The curl vector field **should be scaled by a half** if you want the magnitude of curl vectors to equal the **rotational speed** of the fluid.

If a **three-dimensional vector-valued** function $\vec{\mathbf{v}}(x,y,z)$ has component functions $v_1(x,y,z)$, $v_2(x,y,z)$ and $v_3(x,y,z)$, the curl is computed as follows:
> $\displaystyle \begin{align*}
\text{curl}\;\vec{\mathbf{v}} &= \begin{bmatrix}
\frac{\partial}{\partial\,x} \\
\frac{\partial}{\partial\,y} \\
\frac{\partial}{\partial\,z} \\
\end{bmatrix} \times \begin{bmatrix}
v_1(x,y,z) \\
v_2(x,y,z) \\
v_3(x,y,z) \\
\end{bmatrix} \\
&= \det\left(\begin{bmatrix}
\vec{\mathbf{i}} & \vec{\mathbf{j}} & \vec{\mathbf{k}} \\
\frac{\partial}{\partial\,x} & \frac{\partial}{\partial\,y} & \frac{\partial}{\partial\,z} \\
v_1 & v_2 & v_3 \\
\end{bmatrix}\right) \\
&= \nabla\times\vec{\mathbf{v}} = \left(\frac{\partial\,v_3}{\partial\,y} - \frac{\partial\,v_2}{\partial\,z}\right)\vec{\mathbf{i}} + \left(\frac{\partial\,v_1}{\partial\,z} - \frac{\partial\,v_3}{\partial\,x}\right)\vec{\mathbf{j}} + \left(\frac{\partial\,v_2}{\partial\,x} - \frac{\partial\,v_1}{\partial\,y}\right)\vec{\mathbf{k}}\\
\end{align*}$

Rotation in **three dimensions** is typically described using a single vector. The magnitude of the vector indicates the angular speed, and the **direction** is determined by a super-important convention called the **right-hand rule**.

The **magnitude** of the curl vector at a point is equal to **twice the angular speed** of the fluid near that point.
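
The component formula above can be sketched numerically. First, estimate every $\partial v_i/\partial x_j$ by central differences. Then combine the estimates as in the determinant expansion. The test field is an assumption: a rigid rotation about the $z$-axis with angular speed $\omega = 2$. By the right-hand rule, its curl should point along $+z$ with magnitude $2\omega$.

```python
def curl(field, p, h=1e-6):
    # J[i][j] approximates dv_i/dx_j at p (0-based: x, y, z).
    J = [[0.0] * 3 for _ in range(3)]
    for j in range(3):
        ahead = list(p)
        ahead[j] += h
        behind = list(p)
        behind[j] -= h
        fa, fb = field(ahead), field(behind)
        for i in range(3):
            J[i][j] = (fa[i] - fb[i]) / (2 * h)
    return (J[2][1] - J[1][2],    # dv3/dy - dv2/dz
            J[0][2] - J[2][0],    # dv1/dz - dv3/dx
            J[1][0] - J[0][1])    # dv2/dx - dv1/dy

omega = 2.0

def spin_about_z(p):
    # Rigid rotation about the z-axis with angular speed omega.
    x, y, z = p
    return (-omega * y, omega * x, 0.0)

print(curl(spin_about_z, (0.5, 1.0, -0.3)))   # about (0, 0, 4) = (0, 0, 2*omega)
```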


# Laplacian

Given a **scalar-valued multivariable** function $f(x_1,\dots,x_N)$, the Laplacian is a new **scalar-valued** function computing the **divergence of the gradient** of $f$:
> $\displaystyle \begin{align*}
\triangle\,f(x_1,\dots,x_N) &= \text{div}\,\nabla\,f = \nabla\cdot\nabla\,f \\
&= \begin{bmatrix}
\frac{\partial}{\partial\,x_1} \\
\vdots \\
\frac{\partial}{\partial\,x_N}
\end{bmatrix} \cdot \begin{bmatrix}
\frac{\partial\,f}{\partial\,x_1} \\
\vdots \\
\frac{\partial\,f}{\partial\,x_N}
\end{bmatrix} \\
&= \frac{\partial^{2}\,f}{\partial\,x_1^2} + \dots + \frac{\partial^{2}\,f}{\partial\,x_N^2}\\
&= \sum_{i=1}^{N}\frac{\partial^{2}\,f}{\partial\,x_i^2}
\end{align*}$

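The Laplacian tends to be **positive** at **minimum points** (at a local minimum of a twice-differentiable function it cannot be negative):
> $\displaystyle \triangle\,f(x,y) > 0$

The Laplacian tends to be **negative** at **maximum points** (at a local maximum it cannot be positive):
> $\displaystyle \triangle\,f(x,y) < 0$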

The Laplacian is **analogous to the second derivative** for scalar-valued multivariable functions.

A **harmonic function** $f$ is one whose Laplacian is zero at every point $(x,y)$:
> $\triangle\,f(x,y) = 0$.
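
Here is a minimal sketch of the Laplacian in two dimensions, built from second central differences. The sample functions are illustrative. $x^2 + y^2$ has a minimum at the origin, where its Laplacian is positive, and $x^2 - y^2$ is harmonic.

```python
def laplacian(f, x, y, h=1e-4):
    # Sum of the pure second partials, each from a second central difference.
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    return fxx + fyy

def bowl(x, y):
    # Minimum at the origin.
    return x**2 + y**2

def saddle(x, y):
    # Harmonic: f_xx + f_yy = 2 - 2 = 0 everywhere.
    return x**2 - y**2

print(laplacian(bowl, 0.0, 0.0))     # about 4 (positive at the minimum)
print(laplacian(saddle, 0.7, -0.2))  # about 0
```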

# Jacobian

The **Jacobian matrix** is the matrix of all **first-order partial derivatives** of a **vector-valued function** $\mathbf{f}$ with $n$ inputs and $m$ outputs; it has $m$ rows and $n$ columns:
> $\displaystyle \mathbf{f}(x_1,\dots,x_n) = \begin{bmatrix}
f_1(x_1,\dots,x_n) \\
\vdots \\
f_m(x_1,\dots,x_n) \\
\end{bmatrix}$
>
> $\displaystyle \mathbf{J} = \begin{bmatrix}
\frac{\partial\,\mathbf{f}}{\partial\,x_1} & \cdots & \frac{\partial\,\mathbf{f}}{\partial\,x_n}\\
\end{bmatrix} = \begin{bmatrix}
\frac{\partial\,f_1}{\partial\,x_1} & \cdots & \frac{\partial\,f_1}{\partial\,x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial\,f_m}{\partial\,x_1} & \cdots & \frac{\partial\,f_m}{\partial\,x_n} \\
\end{bmatrix}$

When $m = n$ the Jacobian matrix is square, and the **Jacobian determinant** at a given point gives important information about the behavior of $\mathbf{f}$ near that point.

If the Jacobian determinant at a point is **positive**, then $\mathbf{f}$ **preserves orientation** near that point:
> $\det \mathbf{J} > 0$

If the Jacobian determinant at a point is **negative**, $\mathbf{f}$ **reverses orientation**:
> $\det \mathbf{J} < 0$

The **absolute value** of the Jacobian determinant at a point gives us the factor by which the function $\mathbf{f}$ **expands or shrinks** volumes near that point. 
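
The polar-to-Cartesian map $(r, \theta) \mapsto (r\cos\theta,\ r\sin\theta)$ is a standard example. Its Jacobian determinant equals $r$, which is positive, so orientation is preserved. It also means small areas near radius $r$ are stretched by a factor of $r$. The sketch below approximates the matrix column by column with central differences, using a hypothetical `jacobian` helper.

```python
import math

def polar_to_cartesian(p):
    r, theta = p
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian(f, p, h=1e-6):
    # Column j holds the partials with respect to input j;
    # row i holds the partials of output component f_i.
    m, n = len(f(p)), len(p)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        ahead = list(p)
        ahead[j] += h
        behind = list(p)
        behind[j] -= h
        fa, fb = f(ahead), f(behind)
        for i in range(m):
            J[i][j] = (fa[i] - fb[i]) / (2 * h)
    return J

J = jacobian(polar_to_cartesian, (3.0, 0.8))
det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
print(det)   # about 3.0: areas near r = 3 are scaled by r
```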

# Tangent Planes

The equation for the tangent plane of the graph of a **two-variable** **scalar-valued** function $f(x, y)$ at a particular point $(x_0,y_0)$ looks like this:
> $\displaystyle T(x,y) = f(x_0,y_0) + \frac{\partial\,f(x_0,y_0)}{\partial\,x}(x - x_0) + \frac{\partial\,f(x_0,y_0)}{\partial\,y}(y - y_0)$
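
This minimal sketch builds the tangent plane from the value and the two partial derivatives at a point. The function $f(x,y) = x^2y$ and the point $(1, 2)$ are illustrative. Near that point, the plane and the function should agree closely.

```python
def f(x, y):
    return x**2 * y

def f_x(x, y):
    return 2 * x * y

def f_y(x, y):
    return x**2

def tangent_plane(x0, y0):
    # Returns T(x, y) built from the value and partials at (x0, y0).
    c, a, b = f(x0, y0), f_x(x0, y0), f_y(x0, y0)
    return lambda x, y: c + a * (x - x0) + b * (y - y0)

T = tangent_plane(1.0, 2.0)        # T(x, y) = 2 + 4(x - 1) + 1(y - 2)
print(T(1.1, 2.1), f(1.1, 2.1))    # about 2.5 vs about 2.541
```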

# Local Linearization

Local linearization generalizes the idea of tangent planes to any multivariable function.

The idea is to **approximate a function** near one of its inputs with a **simpler affine function** that has:
- the **same value at that input**.
- the **same partial derivative values**.

Written with **vectors**, here's what the approximation function looks like:
> $\displaystyle \mathbf{x} = \begin{bmatrix}x \\ y \\ z \\ \vdots\end{bmatrix}$
>
> $\displaystyle \mathbf{x}_0 = \begin{bmatrix}x_0 \\ y_0 \\ z_0 \\ \vdots\end{bmatrix}$
>
> $\displaystyle L_f(\mathbf{x}) = \underbrace{f(\mathbf{x}_0)}_{\text{Constant}} + \underbrace{\nabla\,f(\mathbf{x}_0)}_{\text{Constant vector}}\cdot\overbrace{(\mathbf{x} - \mathbf{x}_0)}^{\mathbf{x}\text{ is the variable}}$
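
The vector form translates directly into code. In this sketch the three-variable function $f(x,y,z) = e^x\sin y + z^2$ and the base point are assumptions, and the gradient is worked out by hand. The printed errors should shrink roughly like the square of the step. That fast shrinkage is what makes $L_f$ a good approximation near $\mathbf{x}_0$.

```python
import math

def f(p):
    x, y, z = p
    return math.exp(x) * math.sin(y) + z**2

def grad_f(p):
    # Partial derivatives of f, worked out by hand.
    x, y, z = p
    return (math.exp(x) * math.sin(y), math.exp(x) * math.cos(y), 2 * z)

def linearization(p0):
    # L_f(x) = f(x0) + grad f(x0) . (x - x0)
    f0, g0 = f(p0), grad_f(p0)
    return lambda p: f0 + sum(gi * (pi - qi) for gi, pi, qi in zip(g0, p, p0))

p0 = (0.0, 1.0, 2.0)
L = linearization(p0)
for step in (1e-1, 1e-2, 1e-3):
    p = tuple(c + step for c in p0)
    # The error shrinks roughly like step**2, faster than the step itself.
    print(step, abs(f(p) - L(p)))
```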

# Hessian Matrix

The **Hessian matrix** of a **multivariable function** $f(x,y,z,\dots)$, which different authors write as $\textbf{H}\,(f)$, $\textbf{H}\,f$ or $\textbf{H}_f$, organizes **all second partial derivatives** into a matrix:
> $\displaystyle \textbf{H}\,f = \begin{bmatrix}
\frac{\partial^2\,f}{\partial{x^2}} & \frac{\partial^2\,f}{\partial{x}\partial{y}} & \frac{\partial^2\,f}{\partial{x}\partial{z}} & \cdots \\
\frac{\partial^2\,f}{\partial{y}\partial{x}} & \frac{\partial^2\,f}{\partial{y^2}} & \frac{\partial^2\,f}{\partial{y}\partial{z}} & \cdots \\
\frac{\partial^2\,f}{\partial{z}\partial{x}} & \frac{\partial^2\,f}{\partial{z}\partial{y}} & \frac{\partial^2\,f}{\partial{z^2}} & \cdots \\
\vdots & \vdots & \vdots & \ddots \\
\end{bmatrix}$
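
For a concrete check, this sketch approximates the Hessian of $f(x,y) = x^2y^3$ at $(1,1)$ with finite-difference stencils. This is the same function used in the second-partial-derivative example. The sketch assumes the second partials are continuous, so a single mixed estimate fills both off-diagonal entries. The result should be close to $\begin{bmatrix}2 & 6\\6 & 6\end{bmatrix}$.

```python
def f(x, y):
    return x**2 * y**3

def hessian(func, x, y, h=1e-4):
    # Second central differences for the pure partials,
    # a four-point stencil for the mixed partial.
    fxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    fyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4 * h**2)
    return [[fxx, fxy], [fxy, fyy]]

H = hessian(f, 1.0, 1.0)
print(H)   # about [[2, 6], [6, 6]]: 2y^3, 6xy^2, 6x^2y at (1, 1)
```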

# Quadratic Approximation

The goal, as with a local linearization, is to **approximate** a potentially complicated **multivariable** function $f$ near some input, which I'll write as the vector $\textbf{x}_0$. A quadratic approximation does this more tightly than a local linearization, using the information given by **second partial derivatives**:
> $\displaystyle \mathbf{x} = \begin{bmatrix}x \\ y \\ z \\ \vdots\end{bmatrix}$
>
> $\displaystyle \mathbf{x}_0 = \begin{bmatrix}x_0 \\ y_0 \\ z_0 \\ \vdots\end{bmatrix}$
>
> $\displaystyle Q_f(\mathbf{x}) = \underbrace{f(\mathbf{x}_0)}_{\textbf{Constant}} + \underbrace{\nabla f(\mathbf{x}_0)\cdot(\mathbf{x} - \mathbf{x}_0)}_{\textbf{Linear term}} + 
\underbrace{\frac{1}{2}(\mathbf{x} - \mathbf{x}_0)^\top\textbf{H}_f(\mathbf{x}_0)(\mathbf{x} -\mathbf{x}_0)}_{\textbf{Quadratic term}}$
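
To compare the two approximations, this sketch uses the illustrative function $f(x,y) = e^x\cos y$ around $\mathbf{x}_0 = (0,0)$. At that point, computed by hand, $f = 1$, $\nabla f = (1, 0)$ and $\textbf{H}_f = \begin{bmatrix}1 & 0\\0 & -1\end{bmatrix}$. Near $\mathbf{x}_0$, the quadratic error should be much smaller than the linear one.

```python
import math

def f(x, y):
    return math.exp(x) * math.cos(y)

# Value, gradient and Hessian of f at x0 = (0, 0), computed by hand:
# f = 1, grad = (1, 0), H = [[1, 0], [0, -1]].
def linear(x, y):
    return 1 + x

def quadratic(x, y):
    # Q_f = f + grad . d + 1/2 d^T H d, with d = (x, y)
    return 1 + x + 0.5 * (x**2 - y**2)

for (x, y) in [(0.1, 0.05), (0.05, -0.02)]:
    exact = f(x, y)
    print(abs(exact - linear(x, y)), abs(exact - quadratic(x, y)))
```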


# Hyperplane

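The formal term for a flat subspace (possibly shifted away from the origin) that has **one dimension less** than its ambient space is **hyperplane**. So, formally, the graph of the local linearization of a function of $n$ variables, which is determined by the function's value and **gradient** at a point, is a **tangent hyperplane** in $(n+1)$-dimensional space.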

# Second Partial Derivative Test

Once you find a point where the **gradient of a multivariable function** is the **zero vector** ($\nabla\,f(x_0,y_0) = \mathbf{0}$), meaning the tangent plane of the graph is flat at this point, the second partial derivative test is a way to tell if that **stable point** (also called a **critical** or **stationary point**) is a **local maximum**, **local minimum**, or **saddle point**.
The key term of the second partial derivative test is this:
> $\displaystyle \begin{align*}
H &= \text{det}\,(\textbf{H}\,f(x_0,y_0)) = \text{det}\left(\begin{bmatrix}
f_{xx}(x_0,y_0) & f_{yx}(x_0,y_0) \\
f_{xy}(x_0,y_0) & f_{yy}(x_0,y_0)
\end{bmatrix}\right)\\
&= f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}(x_0, y_0)f_{yx}(x_0,y_0) \\
&= f_{xx}(x_0, y_0)f_{yy}(x_0, y_0) - f_{xy}(x_0, y_0)^2
\end{align*}$

If $H > 0$ the function definitely has a **local maximum/minimum** at the point $(x_0, y_0)$.

- If $f_{xx}(x_0, y_0) > 0$, it is a **minimum**.
- If $f_{xx}(x_0, y_0) < 0$, it is a **maximum**.

If $H < 0$, the function definitely has a **saddle point** at $(x_0, y_0)$.

If $H = 0$, there is not enough information to tell.
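
The test is a small decision procedure. This sketch assumes the second partials at a point with zero gradient are already known. The example functions in the comments are illustrative.

```python
def classify(fxx, fyy, fxy):
    # Inputs: second partials at a point where the gradient is zero.
    H = fxx * fyy - fxy**2
    if H > 0:
        return "local minimum" if fxx > 0 else "local maximum"
    if H < 0:
        return "saddle point"
    return "inconclusive"

# Second partials at the origin, worked out by hand:
print(classify(2, 2, 0))    # f = x^2 + y^2  -> local minimum
print(classify(-2, -2, 0))  # f = -x^2 - y^2 -> local maximum
print(classify(2, -2, 0))   # f = x^2 - y^2  -> saddle point
print(classify(0, 0, 0))    # f = x^4 + y^4  -> inconclusive (H = 0), though it is a minimum
```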



# Constrained Optimization

When you want to **maximize (or minimize) a multivariable function** $f(x, y, \dots)$ subject to the **constraint** that **another multivariable function equals a constant**, $g(x, y, \dots) = c$, follow these steps:

1. Introduce a new variable $\lambda$ and define a new function $\mathcal{L}$ as follows:
> $\mathcal{L}(x,y,\dots,\lambda) = f(x,y,\dots) - \lambda(g(x,y,\dots) - c)$
>
> This function $\mathcal{L}$ is called the **Lagrangian**, and the new variable $\lambda$ is referred to as a **Lagrange multiplier**.
2. Set the gradient of $\mathcal{L}$ equal to the zero vector:
> $\nabla\,\mathcal{L}(x,y,\dots,\lambda) = \mathbf{0}$
>
> In other words, find the **critical points** (or **stable points**) of $\mathcal{L}$.
3. Consider each solution, which will look something like $(x_0,y_0,\dots,\lambda_0)$. Drop $\lambda_0$ and plug the remaining coordinates $(x_0,y_0,\dots)$ into $f$, since $f$ does not take $\lambda$ as an input. Whichever solution gives the greatest (or smallest) value is the maximum (or minimum) point you are seeking.
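
The three steps can be sketched in Python on an illustrative problem, which is an assumption and not taken from the text above: maximize $f(x,y) = x + y$ subject to $x^2 + y^2 = 1$. Here step 2 is solved with Newton's method on $\nabla\mathcal{L} = \mathbf{0}$. That is only one possible solver, and it needs a starting guess near each critical point. A tiny Gaussian-elimination routine keeps the example free of dependencies.

```python
# Step 1: L(x, y, lam) = x + y - lam * (x^2 + y^2 - 1)

def grad_L(x, y, lam):
    return [1 - 2 * lam * x, 1 - 2 * lam * y, -(x**2 + y**2 - 1)]

def hess_L(x, y, lam):
    # Jacobian of grad_L, i.e. the Hessian of L.
    return [[-2 * lam, 0.0, -2 * x],
            [0.0, -2 * lam, -2 * y],
            [-2 * x, -2 * y, 0.0]]

def solve3(A, b):
    # Gaussian elimination with partial pivoting on a 3x3 system.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            factor = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def critical_point(guess, steps=20):
    # Step 2: Newton's method on grad L = 0.
    p = list(guess)
    for _ in range(steps):
        delta = solve3(hess_L(*p), [-gi for gi in grad_L(*p)])
        p = [pi + di for pi, di in zip(p, delta)]
    return p

# Step 3: drop lam, evaluate f = x + y at each candidate, keep the best.
candidates = [critical_point(g) for g in [(0.7, 0.7, 0.7), (-0.7, -0.7, -0.7)]]
best = max(candidates, key=lambda p: p[0] + p[1])
print(best)   # about [0.7071, 0.7071, 0.7071]: x* = y* = lam* = 1/sqrt(2)
```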

It's common to write the **maximizing critical point**, which solves the constrained optimization problem, as $(x^*,y^*,\dots,\lambda^*)$.

Let $M^*(c)$ be the **constrained maximum value** of $f$, viewed as a function of $c$. The Lagrange multiplier $\lambda^*(c)$ is then the derivative of $M^*$:
> $\displaystyle \frac{d\,M^*}{d\,c}(c) = \lambda^*(c)$

This says that the Lagrange multiplier $\lambda^*$ gives **the rate of change of the maximum value** of the constrained problem **as the constraint value $c$ varies**.
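
This is a numerical check of the sensitivity interpretation, on the illustrative problem of maximizing $f(x,y) = x + y$ subject to $x^2 + y^2 = c$. Solving $\nabla\mathcal{L} = \mathbf{0}$ by hand gives $x^* = y^* = \sqrt{c/2}$ and $\lambda^* = 1/\sqrt{2c}$. The sketch finds $M^*(c)$ by brute force around the circle. It then compares a finite-difference estimate of $dM^*/dc$ with $\lambda^*$.

```python
import math

# Illustrative problem: maximize f(x, y) = x + y subject to x^2 + y^2 = c.
# By hand: x* = y* = sqrt(c/2), lam* = 1 / sqrt(2c), M*(c) = sqrt(2c).

def lam_star(c):
    return 1 / math.sqrt(2 * c)

def M_star(c, n=20000):
    # Brute-force constrained maximum: walk around the circle of radius sqrt(c).
    r = math.sqrt(c)
    return max(r * (math.cos(a) + math.sin(a))
               for a in (2 * math.pi * k / n for k in range(n)))

c, h = 3.0, 1e-3
dM_dc = (M_star(c + h) - M_star(c - h)) / (2 * h)
print(dM_dc, lam_star(c))   # both about 0.408
```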