\documentclass{article}
\usepackage{amsmath,amssymb,amsthm, graphicx, float, subfig, hyperref}
\begin{document}
\renewcommand{\arraystretch}{1.9}
\centerline{\sc \large Particle \& Collision Simulation}
\centerline{Course Notes from Fundamentals of Computer Animation Comp559 with Paul Kry.}
\section{Mathematical background}
\subsection{Multivariate/Vector Calculus Basics} First of all, we consider a
real-valued multivariate function $f$ defined as follows: $f:\mathbb{R}^n
\longrightarrow \mathbb{R}$. This function, which we can call a scalar field,
maps $n$ independent variables to one real-valued dependent variable. We say it is real-valued because elements in its range belong to $\mathbb{R}$, and multivariate (or ``of a vector variable'') because elements in its domain belong to $\mathbb{R}^n$. Here is one example:
\[
f(\mathbf{x}) = f(x_1, x_2, \dots, x_n) = \sum_{i=1}^n x_i
\]
Now consider the vector-valued multivariate function $\mathbf{f}:\mathbb{R}^n \longrightarrow
\mathbb{R}^m$. This function maps $n$ independent variables to a vector of
$m$ dimensions.
\[
\mathbf{f}(\mathbf{x}) =
\begin{bmatrix}
f_1(\mathbf{x}) \\
f_2(\mathbf{x}) \\
\vdots \\
f_m(\mathbf{x})
\end{bmatrix}
=
\begin{bmatrix}
f_1(x_1, x_2, \dots, x_n) \\
f_2(x_1, x_2, \dots, x_n) \\
\vdots \\
f_m(x_1, x_2, \dots, x_n)
\end{bmatrix}
\]
This type of function is typically called a vector field, most commonly when $n=m=3$.
By the end of this section, our hope is that you will
understand this statement: \textit{``The Hessian of a scalar field is the
Jacobian of its gradient while the Laplacian of a scalar field is the divergence
of its gradient.''}
\subsection{Functions defined over scalar fields}
For simplicity, we will assume in the following sections that $m=n$.
Furthermore, we will be using the \textit{numerator-layout notation}.
\footnote{\url{http://en.wikipedia.org/wiki/Matrix\_calculus\#Layout\_conventions}}
\subsubsection{Gradient}
What is the gradient of $f$? It is the derivative of a scalar field; essentially
a derivative in higher dimensions. We first treat the nabla symbol $\nabla$ as
if it were a row-vector differential operator:
\[
\nabla =
\begin{bmatrix}
\dfrac{\partial }{\partial x_1} &
\dfrac{\partial }{\partial x_2} &
\hdots &
\dfrac{\partial }{\partial x_n}
\end{bmatrix}
\]
Therefore, the gradient $\nabla f$ of $f$ is the row vector of its partial derivatives:
\[
\nabla f =
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1} &
\dfrac{\partial f}{\partial x_2} &
\hdots &
\dfrac{\partial f}{\partial x_n}
\end{bmatrix}
\]
Applied over the whole domain of $f$, it yields a vector field where the vectors
point in the direction of the greatest rate of increase of the scalar field. As
such we can view $\nabla : (\mathbb{R}^n \rightarrow \mathbb{R})
\longrightarrow (\mathbb{R}^n \rightarrow \mathbb{R}^n) $ since it transforms a
scalar field into a vector field.
%TODO: FIELD NOTATION IS SKETCHY. C^k functions maybe? See wikipedia laplacian
%article.
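As a small running example, take $f(x_1, x_2) = x_1^2 x_2$. Its gradient is the row vector
\[
\nabla f =
\begin{bmatrix}
2 x_1 x_2 & x_1^2
\end{bmatrix},
\]
which at every point indicates the direction of steepest increase of $f$.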
\subsubsection{Hessian}
The Hessian $\nabla \otimes \nabla f$ or
$H(f)$ of a function $ f: \mathbb{R}^n \longrightarrow \mathbb{R} $ is the
square matrix of its second partial derivatives.\footnote{$\nabla \otimes \nabla$ is a non-standard notation. The Hessian is usually denoted $\nabla^2 f$ but I find this
very confusing since it would be indistinguishable from the Laplacian.}
\[
\nabla \otimes \nabla f =
\begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} &
\frac{\partial^2 f}{\partial x_1\,\partial x_2} &
\cdots & \frac{\partial^2 f}{\partial x_1\,\partial x_n} \\
\frac{\partial^2 f}{\partial x_2\,\partial x_1}
& \frac{\partial^2 f}{\partial x_2^2} & &
\frac{\partial^2 f}{\partial x_2\,\partial x_n} \\
\vdots & & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n\,\partial x_1}
& \frac{\partial^2 f}{\partial x_n\,\partial x_2} &
\cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
\]
While the gradient mapped a scalar field to a vector field, the Hessian maps a
scalar field to a ``matrix field'': $\nabla \otimes \nabla : (\mathbb{R}^n
\rightarrow \mathbb{R}) \longrightarrow (\mathbb{R}^n \rightarrow
\mathbb{R}^{n^2})$. Just like a second derivative, the matrix associated to a
point in the scalar field describes the local curvature of the field at that
point.
We can derive the Hessian by taking the outer product of $\nabla$ and the
gradient of $f$. Using only two dimensions, we have:
\[
\nabla \otimes \nabla f = (\nabla)^T \nabla{f} =
\begin{bmatrix}
\dfrac{\partial }{\partial x_1} \\
\dfrac{\partial }{\partial x_2}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1} &
\dfrac{\partial f}{\partial x_2}
\end{bmatrix} =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} &
\dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} &
\dfrac{\partial^2 f}{\partial x_2^2}
\end{bmatrix}
\]
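For the running example $f(x_1, x_2) = x_1^2 x_2$, the Hessian is
\[
\nabla \otimes \nabla f =
\begin{bmatrix}
2 x_2 & 2 x_1 \\
2 x_1 & 0
\end{bmatrix},
\]
a symmetric matrix, as expected when the mixed second partial derivatives are continuous.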
\subsubsection{Laplacian}
The Laplacian $\nabla \cdot \nabla f$ of a function $ f: \mathbb{R}^n
\longrightarrow \mathbb{R}$ is the sum of all its ``unmixed'' second partial
derivatives:
\[
\nabla \cdot \nabla f = \sum_{i=1}^n \frac {\partial^2 f}{\partial x^2_i}
\]
You can also view it as the trace of the Hessian matrix. As opposed to the
Hessian, the Laplacian is the inner product/dot product of $\nabla$
and the gradient of $f$.
\[
\nabla \cdot \nabla f = \nabla (\nabla{f})^T =
\begin{bmatrix}
\dfrac{\partial }{\partial x_1} &
\dfrac{\partial }{\partial x_2}
\end{bmatrix}
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1} \\
\dfrac{\partial f}{\partial x_2}
\end{bmatrix} =
\dfrac{\partial }{\partial x_1} \dfrac{\partial f}{\partial x_1} +
\dfrac{\partial }{\partial x_2} \dfrac{\partial f}{\partial x_2} =
\dfrac{\partial^2 f}{\partial x_1^2} + \dfrac{\partial^2 f}{\partial x_2^2}
\]
It is a different kind of second derivative of a scalar field. Most notably,
it maps a scalar field to a scalar field. In general terms, it measures the rate at
which the average value of $f$ over spheres centered at a point $p$ deviates
from $f(p)$ as the radius of the sphere grows.
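For the running example $f(x_1, x_2) = x_1^2 x_2$, the Laplacian is
\[
\nabla \cdot \nabla f = \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} = 2 x_2 + 0 = 2 x_2,
\]
which is exactly the trace of the Hessian computed earlier.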
\subsection{Functions defined over vector fields}
\subsubsection{Jacobian}
The Jacobian $J(\mathbf{f})$ of a function $ \mathbf{f}: \mathbb{R}^n \longrightarrow \mathbb{R}^n $
is a matrix of its partial derivatives. It is the generalization of the gradient
to vector-valued functions.
\[
J(\mathbf{f})=
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} &
\dfrac{\partial f_1}{\partial x_2} &
\cdots &
\dfrac{\partial f_1}{\partial x_n} \\
\dfrac{\partial f_2}{\partial x_1} &
\dfrac{\partial f_2}{\partial x_2} & &
\dfrac{\partial f_2}{\partial x_n} \\
\vdots & & \ddots & \vdots \\
\dfrac{\partial f_n}{\partial x_1} &
\dfrac{\partial f_n}{\partial x_2} &
\cdots &
\dfrac{\partial f_n}{\partial x_n}
\end{bmatrix} =
\begin{bmatrix}
\nabla f_1 \\ \nabla f_2 \\ \vdots \\ \nabla f_n
\end{bmatrix}
\]
Notice that each row of $J$ is the gradient of one component of $\mathbf{f}$: $J_{(i, :)} = \nabla
f_i(\mathbf{x})$. The Jacobian maps a vector field to a ``matrix field''.
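As a small example, for $\mathbf{f}(x_1, x_2) = (x_1 x_2, \; x_2^2)$ the Jacobian is
\[
J(\mathbf{f}) =
\begin{bmatrix}
x_2 & x_1 \\
0 & 2 x_2
\end{bmatrix},
\]
whose first row is $\nabla f_1$ and whose second row is $\nabla f_2$.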
\subsubsection{Divergence}
The divergence $\nabla \cdot \mathbf{f}$ of a function $ \mathbf{f}: \mathbb{R}^n \longrightarrow
\mathbb{R}^n $ is the sum of the partial derivatives of its components.
\begin{align*}
\nabla \cdot \mathbf{f} &=
\begin{bmatrix}
\dfrac{\partial }{\partial x_1} &
\dfrac{\partial }{\partial x_2} &
    \hdots &
    \dfrac{\partial }{\partial x_n}
\end{bmatrix} \cdot
\begin{bmatrix}
f_1 \\
f_2 \\
\vdots \\
f_n
\end{bmatrix} \\ &=
\dfrac{\partial f_1}{\partial x_1} +
\dfrac{\partial f_2}{\partial x_2} +
\hdots +
\dfrac{\partial f_n}{\partial x_n}
\end{align*}
It maps a vector field to a scalar field. For a given point $p$ in a vector
field, it measures the density of the outward flux of the vector field from an
infinitesimal region around $p$.
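For the same example $\mathbf{f}(x_1, x_2) = (x_1 x_2, \; x_2^2)$, the divergence is
\[
\nabla \cdot \mathbf{f} = \frac{\partial f_1}{\partial x_1} + \frac{\partial f_2}{\partial x_2} = x_2 + 2 x_2 = 3 x_2,
\]
which is the trace of the Jacobian computed above.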
\subsection{Relationships}
As you may have guessed, if the Jacobian is applied to the vector field given by the gradient
of a particular scalar field $f$, the resulting Jacobian matrix is identical to the
Hessian of $f$: both reach the same ``matrix field''. The same principle
applies to the divergence: applied to the gradient of $f$, it
yields the same scalar field as the Laplacian of $f$.
\begin{figure}[H]
\centering
\includegraphics[width=0.6\textwidth]{img/hessian.png}
\caption{Transition from a scalar field, to a vector field, to a ``matrix
field'' using first and second-order derivatives.}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[width=0.6\textwidth]{img/laplacian.png}
\caption{Transition from a scalar field, to a vector field, back to a scalar
field using first and second-order derivatives.}
\end{figure}
\subsubsection{Example}
``The Jacobian of the force is the Hessian of the energy.'' First, let us define
an \underline{energy} function $V : \mathbb{R}^2 \rightarrow \mathbb{R}$ which
yields a potential energy value for each point in the space (a scalar field). We
also define a \underline{force} function $\mathbf{F} : \mathbb{R}^2 \rightarrow
\mathbb{R}^2$ which yields a force vector field.
We can obtain the force field by computing the gradient of the energy field
(physically the force is the \emph{negative} gradient, $\mathbf{F} = -(\nabla V)^T$; we drop the sign here to keep the operator relationship uncluttered).
\[
\mathbf{F} = (\nabla V)^T =
\begin{bmatrix}
\dfrac{\partial V}{\partial x_1} \\
\dfrac{\partial V}{\partial x_2}
\end{bmatrix} =
\begin{bmatrix} F_1 \\ F_2 \end{bmatrix}
\]
Now the Hessian of the energy:
\[
H(V) =
\begin{bmatrix}
\dfrac{\partial^2 V}{\partial x_1^2} &
\dfrac{\partial^2 V}{\partial x_1 \partial x_2} \\
\dfrac{\partial^2 V}{\partial x_1 \partial x_2} &
\dfrac{\partial^2 V}{\partial x_2^2}
\end{bmatrix}
\]
The Jacobian of the force:
\[
J(\mathbf{F}) =
\begin{bmatrix}
\dfrac{\partial F_1}{\partial x_1} & \dfrac{\partial F_1}{\partial x_2} \\
\dfrac{\partial F_2}{\partial x_1} & \dfrac{\partial F_2}{\partial x_2}
\end{bmatrix} =
\begin{bmatrix}
\dfrac{\partial}{\partial x_1} \dfrac{\partial V}{\partial x_1} &
\dfrac{\partial}{\partial x_2} \dfrac{\partial V}{\partial x_1} \\
\dfrac{\partial}{\partial x_1} \dfrac{\partial V}{\partial x_2} &
\dfrac{\partial}{\partial x_2} \dfrac{\partial V}{\partial x_2}
\end{bmatrix} =
\begin{bmatrix}
\dfrac{\partial^2 V}{\partial x_1^2} & \dfrac{\partial^2 V}{\partial x_1 \partial x_2} \\
\dfrac{\partial^2 V}{\partial x_1 \partial x_2} & \dfrac{\partial^2 V}{\partial x_2^2}
\end{bmatrix} = H(V)
\]
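A concrete instance: take $V(x_1, x_2) = \frac{1}{2}(x_1^2 + x_2^2)$. Then $\mathbf{F} = (\nabla V)^T = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$, and both $J(\mathbf{F})$ and $H(V)$ equal the $2 \times 2$ identity matrix.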
\subsection{Chain rule}
Recall the definition of the chain rule for functions of a single variable. Let $z=f(x)$ and $x=u(t)$ so that $z = (f \circ x)(t)$.
\[
\frac{dz}{dt} = f'(x) \cdot x'(t) = \frac{df}{dx} \cdot \frac{dx}{dt}
\]
Equivalently using a different notation, we have:
\[
Dz = D(f \circ x)(t) = D(f(x))D(x(t))
\]
Now we consider the chain rule in the
context of multivariate functions. For simplicity, we shall use only two
dimensions.
\subsubsection{Case 1}
Let $z=f(x)$ and $x=u(\mathbf{t})=u(s, t)$ so that $z = (f \circ x)(s,t)$.
\begin{align*}
\begin{bmatrix}
\dfrac{\partial z}{\partial s} &
\dfrac{\partial z}{\partial t}
\end{bmatrix} &=
\begin{bmatrix}
\dfrac{df}{dx} \dfrac{\partial x}{\partial s} &
\dfrac{df}{dx} \dfrac{\partial x}{\partial t}
\end{bmatrix} \\
&= \dfrac{df}{dx} \nabla x \\
\end{align*}
So we have, $Dz = D(f \circ x)(\mathbf{t}) =
D(f(x)) D(x(\mathbf{t}))$.
\subsubsection{Case 2}
Let $z= f(\mathbf{x}) =f(x_1, x_2)$ where $x_1=u(t)$ and $x_2=v(t)$.
\begin{align*}
\dfrac{dz}{dt} &= \dfrac{\partial f}{\partial x_1} \dfrac{ d x_1}{dt} +
\dfrac{\partial f}{\partial x_2} \dfrac{ d x_2}{dt} \\
&= \begin{bmatrix}
\dfrac{\partial f}{\partial x_1} & \dfrac{\partial
f}{\partial x_2}
\end{bmatrix} \cdot
\begin{bmatrix}
\dfrac{d x_1}{dt} \\ \dfrac{d x_2}{dt}
\end{bmatrix} \\
&= \nabla f \dfrac{d\mathbf{x}}{dt} \\
\end{align*}
So we have, $Dz = D(f \circ \mathbf{x})(t) =
D(f(\mathbf{x}))D(\mathbf{x}(t))$.
\subsubsection{Case 3}
Let $z= f(\mathbf{x}) = f(x_1, x_2)$ where $x_1=u(\mathbf{t})=u(s, t)$ and
$x_2= v(\mathbf{t}) = v(s, t)$.
\begin{align*}
\begin{bmatrix}
\dfrac{\partial z}{\partial s} &
\dfrac{\partial z}{\partial t}
\end{bmatrix} &=
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1} \dfrac{\partial x_1}{\partial s} +
\dfrac{\partial f}{\partial x_2} \dfrac{\partial x_2}{\partial s} &
\dfrac{\partial f}{\partial x_1} \dfrac{\partial x_1}{\partial t} +
\dfrac{\partial f}{\partial x_2} \dfrac{\partial x_2}{\partial t}
\end{bmatrix} \\
&=
\begin{bmatrix}
\dfrac{\partial f}{\partial x_1} &
\dfrac{\partial f}{\partial x_2}
\end{bmatrix} \cdot
\begin{bmatrix}
\dfrac{\partial x_1}{\partial s} &
\dfrac{\partial x_1}{\partial t} \\
\dfrac{\partial x_2}{\partial s} &
\dfrac{\partial x_2}{\partial t} \\
\end{bmatrix} \\
&= \nabla f \cdot J(\mathbf{x})
\end{align*}
So we have, $Dz = D(f \circ \mathbf{x})(\mathbf{t}) =
D(f(\mathbf{x}))D(\mathbf{x}(\mathbf{t}))$.
\subsubsection{Case 4}
At last, let's consider the case of a vector-valued function $\mathbf{z} =
\mathbf{f}(\mathbf{x}) = \mathbf{f}(x_1, x_2)$ where $x_1=u(\mathbf{t})=u(s, t)$ and
$x_2= v(\mathbf{t}) = v(s, t)$.
\[
\mathbf{z} =
\begin{bmatrix}
f_1(\mathbf{x}) \\
f_2(\mathbf{x})
\end{bmatrix}
\]
\begin{align*}
\begin{bmatrix}
\dfrac{\partial f_1}{\partial s} &
\dfrac{\partial f_1}{\partial t} \\
\dfrac{\partial f_2}{\partial s} &
\dfrac{\partial f_2}{\partial t}
\end{bmatrix}
&=
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} &
\dfrac{\partial f_1}{\partial x_2} \\
\dfrac{\partial f_2}{\partial x_1} &
\dfrac{\partial f_2}{\partial x_2}
\end{bmatrix} \cdot
\begin{bmatrix}
\dfrac{\partial x_1}{\partial s} &
\dfrac{\partial x_1}{\partial t} \\
\dfrac{\partial x_2}{\partial s} &
\dfrac{\partial x_2}{\partial t} \\
\end{bmatrix} \\
&= J(\mathbf{f}) \cdot J(\mathbf{x})
\end{align*}
So we have, $D\mathbf{z} = D(\mathbf{f} \circ \mathbf{x})(\mathbf{t}) =
D(\mathbf{f}(\mathbf{x}))D(\mathbf{x}(\mathbf{t}))$.
\subsubsection{Summary}
Using the \textit{numerator-layout notation}, we found that the chain rule can
be generalized to multivariate functions. The only difference between the cases
is how the differential operator is applied.
\begin{itemize}
\item The derivative of a scalar by a scalar $D(f(x))$ produces a scalar.
\item The derivative of a vector by a scalar $D(\mathbf{f}(x))$ produces a column vector.
\item The derivative of a scalar by a vector $D(f(\mathbf{x}))$ produces a row vector.
  \item The derivative of a vector by a vector $D(\mathbf{f}(\mathbf{x}))$ produces a
  Jacobian matrix.
\end{itemize}
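To make the last case concrete, here is a small numerical sanity check, written as a sketch in Python with NumPy (the maps $\mathbf{f}$ and $\mathbf{x}$ below are made up for illustration and are not part of the notes): it compares a finite-difference Jacobian of the composition against the product $J(\mathbf{f})\,J(\mathbf{x})$.
\begin{verbatim}
import numpy as np

def x_of_t(t):                 # x : R^2 -> R^2,  (s, t) |-> (x1, x2)
    s, u = t
    return np.array([s * u, s + u ** 2])

def f_of_x(x):                 # f : R^2 -> R^2
    x1, x2 = x
    return np.array([np.sin(x1) + x2, x1 * x2])

def num_jacobian(fun, p, eps=1e-6):
    # Forward-difference Jacobian, numerator layout: rows = outputs, cols = inputs.
    p = np.asarray(p, dtype=float)
    f0 = fun(p)
    J = np.zeros((f0.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = eps
        J[:, j] = (fun(p + dp) - f0) / eps
    return J

t0 = np.array([0.3, 0.7])
lhs = num_jacobian(lambda t: f_of_x(x_of_t(t)), t0)              # D(f o x)(t0)
rhs = num_jacobian(f_of_x, x_of_t(t0)) @ num_jacobian(x_of_t, t0)  # J(f) J(x)
print(np.allclose(lhs, rhs, atol=1e-4))                          # True, up to FD error
\end{verbatim}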
\subsection{Taylor expansion}
The Taylor series of an infinitely differentiable function $f$ at $x=a$ is given by:
\[
f(x) = \sum_{k=0}^{+\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k
\]
A finite number of terms can be used as an approximation of $f$ around $a$.
For a $k$-times differentiable function, Taylor's theorem states that:
\begin{align*}
f(x) &= f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots \\ &\qquad {} +\frac{f^{(k)}(a)}{k!}(x-a)^k + h_k(x)(x-a)^k \\
f(x) &= P_k(x) + h_k(x)(x-a)^k \\
f(x) &= P_k(x) + R_k(x)
\end{align*} where $\lim_{x\to a}h_k(x)=0$. \newline
The remainder term $R_k(x) = h_k(x)(x-a)^k = o(|x-a|^k)$ as $x \longrightarrow a$ represents the approximation error.
Thus, say we have a function $x(t+h)$. We can then approximate it around $t$ using its Taylor expansion:
\begin{align*}
x(t+h) &= x(t) + x'(t)((t+h)-t) + \frac{x''(t)}{2!}((t+h)-t)^2 + \cdots \\ &\qquad {} +\frac{x^{(k)}(t)}{k!}((t+h)-t)^k + h_k(t+h)((t+h)-t)^k \\
x(t+h) &= x(t) + h x'(t) + \frac{h^2}{2!}x''(t)+ \cdots + \frac{h^k}{k!}x^{(k)}(t) + h^k h_k(t+h)
\end{align*}
We will later use this expansion as a position update formula ($x$ will be the position and $h<1$ the timestep).
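Truncating after the second-order term gives, for instance,
\[
x(t+h) \approx x(t) + h\,\dot{x}(t) + \frac{h^2}{2}\,\ddot{x}(t),
\]
i.e., the position predicted from the current position, velocity and acceleration.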
\subsection{Taylor expansion for multivariate or vector functions}
Very often in our computations, we will use vector/matrix-form values --- position, velocity and acceleration are all vectors.
Hence, we can derive a second-order Taylor approximation for a scalar-valued function of a vector variable (for a vector-valued function, the same expansion applies componentwise):
\begin{align}
f(\mathbf{x}+\Delta\mathbf{x})\approx f(\mathbf{x}) + \nabla f(\mathbf{x})\,\Delta \mathbf{x} +\frac{1}{2} \Delta\mathbf{x}^\mathrm{T} H(f)(\mathbf{x})\, \Delta\mathbf{x}
\end{align}
\section{Particle simulation}
In physically based animation, particles can be divided into two main groups:
\begin{itemize}
\item Non-interacting particles such as dust, sparks, snow, rain, \ldots
  \item Interacting particles such as cloth, cables, elastic solids, fluids, \ldots
\end{itemize}
\subsection{Introductory example}
Say we have snow blown by a simple wind force function $f(x, t)$ which depends on both time and position:
\[\dot{x} = f(x, t)\]
In solving an ODE such as this one, we might have a boundary value problem: knowing the starting point $x_0$ and the ending point $x_n$, we look for $x(t)$. This is usually hard to compute and can have many possible solutions (e.g.\ cycles). We could also have an initial value problem: we know $x_0$ (and $\dot{x}_0$), and we want to compute $x(t)$. This is generally much faster to compute.
\begin{align*}
\mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}&, \quad
\mathbf{\dot{x}} = \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} 1 \\ -3y \end{bmatrix}, \quad
\mathbf{x} \in \mathbb{R}^2 \\
\dot{x} = 1,\quad \dot{y} = -3y \quad &\Rightarrow \quad x(t) = c_1+t, \quad y(t) = c_2 e^{-3 t} \\
\mathbf{x_0} = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \quad &\Rightarrow \quad
x(t) = t, \quad y(t) = e^{-3 t}
\end{align*}
However, computing the exact formula for a particle path would generally be unrealistic. The only viable approach is to approximate the ODE solution by discretizing time. The particle path $x(t)$ will then describe a discrete sequence of displacements --- hopefully small enough so that they remain unnoticeable. Correct appearance and expected behavior are our primary concerns.
\subsection{Forward Euler}
Using the forward Euler approximation method, we discretize time and compute an approximate solution at each time step of size $h$ (typically $0<h<1$). For all the methods we will explore, the size of $h$ will be an important factor in the stability of the approximation. Very often, time steps that are too large will cause the particle system to `blow up'.
\[\mathbf{x}_0, \mathbf{x}_1, \mathbf{x}_2, \ldots = \mathbf{x}(0),\mathbf{x}(h), \mathbf{x}(2h), \ldots \mbox{ for small } h \mbox{, so that } \mathbf{x}_{k+1} \approx \mathbf{x}(t+h) \mbox{ with } t = kh\]
\begin{align}
\mathbf{x}_{k+1} &= \mathbf{x}_k + h \mathbf{\dot{x}}_k \\
&= \mathbf{x}_k + h f(\mathbf{x}_k, t_k)
\end{align}
We can also see it as a first derivative approximation:
\begin{align}
\mathbf{\dot{x}}_k &\approx \frac{\mathbf{x}_{k+1}-\mathbf{x}_k}{h}
\end{align}
Or using the Taylor expansion:
\begin{align}
\mathbf{x}(t+h) &= \mathbf{x}(t) + h \mathbf{x}'(t) + \frac{h^2}{2!}\mathbf{x}''(t)+ \cdots + \frac{h^k}{k!}\mathbf{x}^{(k)}(t) + h^k h_k(t+h) \\
&= \mathbf{x}(t) + h \mathbf{\dot{x}}(t) + O(h^2)
\end{align}
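A minimal forward Euler sketch in Python with NumPy (the wind function, the step size and the duration are made-up illustrative values, not part of the notes):
\begin{verbatim}
import numpy as np

def wind(x, t):
    # Illustrative wind velocity field f(x, t); any function of position and time works.
    return np.array([1.0, 0.2 * np.sin(t) - 0.1 * x[1]])

def forward_euler(x0, f, h, steps):
    # Integrate x' = f(x, t) with a fixed step h, returning the discrete path.
    xs, t = [np.asarray(x0, dtype=float)], 0.0
    for _ in range(steps):
        xs.append(xs[-1] + h * f(xs[-1], t))   # x_{k+1} = x_k + h f(x_k, t_k)
        t += h
    return np.array(xs)

path = forward_euler(x0=[0.0, 1.0], f=wind, h=0.01, steps=1000)
print(path[-1])   # approximate particle position after 10 seconds of simulated time
\end{verbatim}
Halving $h$ roughly halves the global error of forward Euler, consistent with the $O(h^2)$ local truncation error above.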
\section{Collision Detection}
\subsection{Lagrange Multiplier}
Recall that a Lagrange Multiplier $\lambda$ is defined as follows:
\begin{equation}
\nabla f(\mathbf{x_0}) = \lambda \cdot \nabla g(\mathbf{x_0})
\end{equation}
where $\mathbf{x}_0$ is a maximum or a minimum of a function $f$ subject to
a constraint $g$. It is easier to gain intuition when the domain of $f$ is
$\mathbb{R}^2$. Both $f : \mathbb{R}^2 \rightarrow \mathbb{R}$ and $g :
\mathbb{R}^2 \rightarrow \mathbb{R}$ are standard scalar-valued functions
mapping a plane position to a height value, generating a surface in 3D space.
However, setting $g(x,y)=c$ effectively restricts the domain to a contour
line in the plane. Moving along the contour line $g=c$, the value of $f$ can vary. However, an
extremum of $f$ will be reached when no variation occurs (at a stationary
point). This happens only when the contour line for $g=c$ meets contour lines of
$f$ tangentially, or equivalently when the gradients at this point are parallel.
This is exactly the Lagrange multiplier equation, where $\lambda$ is simply
a scaling factor between two parallel gradient vectors.
\begin{figure}[H]
\centering
\includegraphics[scale=0.7]{img/LagrangeMultipliers3D.png}
\caption{Find x and y to maximize f(x,y) subject to a constraint (shown in red) g(x,y) = c.}
\end{figure}
\begin{figure}[H]
\centering
\includegraphics[scale=0.6]{img/LagrangeMultipliers2D.png}
\caption{Contour map. The red line shows the constraint g(x,y) = c. The blue lines are contours of f(x,y). The point where the red line tangentially touches a blue contour is our solution.}
\end{figure}
As an example, say we want to maximize the volume of a box without a cover, using 12 square units of material. We must then maximize $f(x,y,z) = xyz $ (the volume) under the constraint $g(x,y,z) = 2xz + 2yz + xy - 12 = 0$ (the surface area). Under the Lagrange multiplier method, we look for values of $x$, $y$ and $z$ such that $\nabla f = \lambda \nabla g$ and $g(x,y,z) = 0$. Thus, we have the following system of four equations with four unknowns:
\begin{equation}
\begin{pmatrix}
\dfrac{\partial f}{\partial x} \\ \dfrac{\partial f}{\partial y} \\ \dfrac{\partial f}{\partial z}
\end{pmatrix} =
\lambda \cdot
\begin{pmatrix}
\dfrac{\partial g}{\partial x} \\ \dfrac{\partial g}{\partial y} \\ \dfrac{\partial g}{\partial z}
\end{pmatrix} \implies
\begin{pmatrix}
yz \\ xz \\ xy
\end{pmatrix} =
\lambda \cdot
\begin{pmatrix}
2z+y \\ 2z+x \\ 2x + 2y
\end{pmatrix}
\end{equation}
\begin{equation}
2xz + 2yz + xy - 12 = 0
\end{equation}
After solving this system of equations with a few algebraic manipulations (sketched below), we find that $x = y = 2z \implies 4 z^2+ 4 z^2+ 4 z^2 = 12 \implies x = y = 2, z=1$. Note that finding the Lagrange multiplier $\lambda$ was unnecessary in this case.
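One way to carry out those manipulations (all quantities are positive at the maximum, so the divisions below are safe): divide the first component equation by the second, then the second by the third.
\begin{align*}
\frac{yz}{xz} = \frac{2z+y}{2z+x} &\implies y(2z+x) = x(2z+y) \implies 2yz = 2xz \implies y = x \\
\frac{xz}{xy} = \frac{2z+x}{2x+2y} &\implies z(2x+2y) = y(2z+x) \implies 2xz = xy \implies y = 2z
\end{align*}
Together these give $x = y = 2z$, as claimed, without ever computing $\lambda$.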
\subsection{Collision Constraints}
\subsubsection{Bilateral Constraints, or equality constraints}
In general, one equality constraint function removes one degree of freedom. Say we have the plane $\mathbf{x} \in \mathbb{R}^2$. Adding the constraint $g_1(\mathbf{x}) = x_1 = 0$ effectively reduces the admissible set to a line, i.e., to $\mathbb{R}$. Adding a further constraint $g_2(\mathbf{x}) = x_2 = 0$ reduces it to a single point with 0 degrees of freedom. However, if the new constraint is not independent of the existing ones, then it does not reduce the degrees of freedom (it adds no new restriction).
Let's consider a physical system where a particle is constrained to a circle of radius 1 around the origin. The constraint on the particle position would be $g(x,y) = \sqrt{x^2+y^2} - 1 = 0$, or in vector form $g(\mathbf{x}) = \sqrt{x_1^2+x_2^2} - 1 = ||\mathbf{x}|| -1 = 0$. The equation of motion for the particle is, using Newton's second law:
\[
M \mathbf{\ddot{x}} = F(\mathbf{x}, \mathbf{\dot{x}}) + F_{constraint}
\]
such that $g(\mathbf{x}) = 0$. Note that this equation has two unknowns: $\mathbf{\ddot{x}}$ and $F_{constraint}$.
The Principle of Virtual Work states that if we are going to add forces to the system in order to ensure $g(\mathbf{x})=0$, then we must not push in a direction in which it is admissible for the system to move, otherwise we do \underline{real work}! The `virtual displacements' that we apply are infinitesimal changes in the position coordinates of the system such that the constraints remain satisfied. Note that this principle is really a form of the more general principle of least action.
%FIXME: THERE IS PROBABLY AN ERROR IN THE FOLLOWING SECTION. Need to review.
Back to our circular constraint example, how do we find this force $F_{constraint}$? First, in which direction will it be? Using the principle of virtual work, the force must be perpendicular to the circle. Knowing that the gradient vector field radiates outwards from the origin, we can write $F_{constraint} = - (\nabla g)^T \lambda $, where $\lambda$ is an unknown scaling coefficient (a Lagrange multiplier) we must find.
Next, consider $g(\mathbf{x}) = 0$. At the velocity level,
\[
\frac{d{g(\mathbf{x})}}{d{t}} = \frac{\partial{g(\mathbf{x})}}{\partial{\mathbf{x}}} \cdot \frac{d{\mathbf{x}}}{d{t}} = 0
\]
At the acceleration level, differentiating once more with respect to time,
\[
\frac{d^2{g(\mathbf{x})}}{d t^2} = \frac{d}{dt}\left(\frac{\partial{g}}{\partial{\mathbf{x}}}\right) \dot{\mathbf{x}} + \frac{\partial{g}}{\partial{\mathbf{x}}}\, \ddot{\mathbf{x}} = 0
\]
Using a different notation we can write,
\begin{equation}
J = \nabla g = \frac{\partial{g}}{\partial{\mathbf{x}}}
\end{equation}
\begin{equation}
J \dot{\mathbf{x}} = 0
\end{equation}
\begin{equation}
\dot{J} \dot{\mathbf{x}} + J \ddot{\mathbf{x}} = 0
\end{equation}
where the partial derivatives in $J$ are taken with respect to $\mathbf{x}$, while the dots denote derivatives with respect to $t$.
\[ \frac{\partial g(\mathbf{x})}{\partial x_1} = \frac{\partial}{\partial x_1}((x_1^2+x_2^2)^{\frac{1}{2}} - 1) = \frac{1}{2} \cdot (x_1^2 + x_2^2)^{-\frac{1}{2}} \cdot 2 x_1 = \frac{x_1}{\sqrt{x_1^2 + x_2^2}}
\]
\[
\frac{\partial g(\mathbf{x})}{\partial x_2} = \frac{\partial}{\partial x_2}((x_1^2+x_2^2)^{\frac{1}{2}} - 1) = \frac{1}{2} \cdot (x_1^2 + x_2^2)^{-\frac{1}{2}} \cdot 2 x_2 = \frac{x_2}{\sqrt{x_1^2 + x_2^2}}
\]
Combining both in a single vector equation, we find:
\begin{align*}
J &= \nabla g = \frac{\partial{g}}{\partial{\mathbf{x}}} =
\frac{1}{||\mathbf{x}||}
\begin{pmatrix}
x_1 & x_2
\end{pmatrix} =
\frac{\mathbf{x}^T}{||\mathbf{x}||}
\end{align*}
Since $J$ depends on $t$ only through $\mathbf{x}$, the chain rule gives $\dot{J} = \dot{\mathbf{x}}^T H(g)$, where $H(g)$ is the Hessian of $g$:
\begin{align*}
\dot{J} &= \dot{\mathbf{x}}^T H(g) =
\frac{1}{||\mathbf{x}||^3}
\begin{bmatrix}
\dot{x}_1 & \dot{x}_2
\end{bmatrix}
\begin{bmatrix}
x_2^2 & -x_1 x_2 \\
-x_1 x_2 & x_1^2
\end{bmatrix} =
\frac{x_2 \dot{x}_1 - x_1 \dot{x}_2}{||\mathbf{x}||^3}
\begin{bmatrix}
x_2 & -x_1
\end{bmatrix}
\end{align*}
Using the acceleration constraint (12), we can solve the following system of equations (we multiply the first block row by $JM^{-1}$, and then use the second block row $J\ddot{\mathbf{x}} = -\dot{J}\dot{\mathbf{x}}$ to eliminate $\ddot{\mathbf{x}}$):
\begin{align*}
\begin{bmatrix}
M & J^T \\ J & 0
\end{bmatrix}
\begin{bmatrix}
\ddot{\mathbf{x}} \\ \lambda
\end{bmatrix} &=
\begin{bmatrix}
F(\mathbf{x} ,\dot{\mathbf{x}}) \\ -\dot{J} \dot{\mathbf{x}}
\end{bmatrix} \\
\begin{bmatrix}
J M^{-1} M & J M^{-1} J^T \\ J & 0
\end{bmatrix}
\begin{bmatrix}
\ddot{\mathbf{x}} \\ \lambda
\end{bmatrix} &=
\begin{bmatrix}
J M^{-1} F(\mathbf{x} ,\dot{\mathbf{x}}) \\ -\dot{J} \dot{\mathbf{x}}
\end{bmatrix} \\
\begin{bmatrix}
0 & J M^{-1} J^T \\ J & 0
\end{bmatrix}
\begin{bmatrix}
\ddot{\mathbf{x}} \\ \lambda
\end{bmatrix} &=
\begin{bmatrix}
J M^{-1} F(\mathbf{x} ,\dot{\mathbf{x}}) +\dot{J} \dot{\mathbf{x}} \\ -\dot{J} \dot{\mathbf{x}}
\end{bmatrix}
\end{align*}
From the first row, we obtain the equation
\[
J M^{-1} J^T \lambda = J M^{-1} F(\mathbf{x}, \dot{\mathbf{x}}) + \dot{J} \dot{\mathbf{x}}
\]
Solving for $\lambda$, we obtain $F_{constraint} = -J^T \lambda$ (matching the direction chosen above), which can be used to compute an acceleration update that takes our constraint into account.
Sidenote: in the derivation above, the scalar mass $M$ should be read as $M$ times the identity matrix, so that it matches the dimensions of the other blocks:
\[ M =
\begin{bmatrix}
M & 0 \\ 0 & M
\end{bmatrix}
\]
Since we have a single constraint here, $\lambda$ and $J M^{-1} J^T$ remain scalars; with $m$ constraints, $\lambda$ becomes an $m$-vector and $J M^{-1} J^T$ an $m \times m$ matrix.
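Putting the pieces together, here is a sketch in Python with NumPy of one simulation step for the circle-constrained particle (the mass, gravity, time step and the symplectic Euler update are illustrative choices, not prescribed by the notes):
\begin{verbatim}
import numpy as np

m = 1.0                                   # particle mass (illustrative value)
M = m * np.eye(2)
gravity = np.array([0.0, -9.8])

def step(x, v, h):
    F = m * gravity                       # external force F(x, xdot)
    r = np.linalg.norm(x)
    J = x / r                             # J = grad g = x^T / ||x||, kept as a flat array
    c = (x[1] * v[0] - x[0] * v[1]) / r**3
    Jdot = c * np.array([x[1], -x[0]])    # time derivative of J (see derivation above)

    # Solve the (here scalar) system  J M^{-1} J^T lambda = J M^{-1} F + Jdot xdot
    lhs = J @ np.linalg.solve(M, J)
    rhs = J @ np.linalg.solve(M, F) + Jdot @ v
    lam = rhs / lhs

    F_constraint = -J * lam               # constraint force, perpendicular to the circle
    a = np.linalg.solve(M, F + F_constraint)

    v_new = v + h * a                     # symplectic Euler update (a common choice;
    x_new = x + h * v_new                 # plain forward Euler would also work here)
    return x_new, v_new

x, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # start on the circle, tangent velocity
for _ in range(1000):
    x, v = step(x, v, h=0.001)
print(np.linalg.norm(x))   # close to 1 for small h (constraint drift grows over time)
\end{verbatim}
Because the constraint is only enforced at the acceleration level, the particle slowly drifts off the circle; in practice some form of stabilization, or a projection back onto the constraint, is usually added.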
\subsubsection{Unilateral Constraints, or inequality constraints}
\end{document}