# Unit 5: Curves and Surfaces

### A square matrix $A$ is called ***orthogonal*** if
# $$ A^T = A^{-1} $$

## Vector magnitude

### The magnitude of a vector $\vec{v} = \langle v_1, v_2, v_3 \rangle$ is given by
## $$ \lvert \vec{v} \rvert = \sqrt{v_1^2+v_2^2+v_3^2} $$

## Remark 2.3

### ***Warning***: The formula for the magnitude in the definition only works because we write the coordinates of our vectors with respect to an orthonormal basis. 
### That is
## $$ \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = v_1 \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + v_2 \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + v_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} $$
### where $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$ are all unit length and mutually orthogonal. There are ways to write vectors with respect to generic bases that are not orthonormal, and this formula would not hold!

## Dot product

### The dot product of vectors $\vec{v} = \langle v_1, v_2, v_3 \rangle$ and $\vec{w} = \langle w_1, w_2, w_3 \rangle$ is the scalar quantity
## $$ \vec{v} \cdot \vec{w} = v_1 w_1 + v_2 w_2 + v_3 w_3 $$
### In 3D, we can also interpret the dot product as
## $$ \vec{v} \cdot \vec{w} = \lvert \vec{v} \rvert \lvert \vec{w} \rvert \cos(\theta) $$
### where $\theta$ is the angle between the two vectors (as measured within the plane that contains both vectors).
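### The two formulas for the dot product can be combined to recover the angle between two vectors. The following is a minimal Python sketch; the helper names `dot`, `magnitude`, and `angle_between` and the sample vectors are illustrative assumptions, not part of the notes.

```python
import math

def dot(v, w):
    """Component-wise dot product of two vectors of equal length."""
    return sum(vi * wi for vi, wi in zip(v, w))

def magnitude(v):
    """Euclidean length, valid for coordinates in an orthonormal basis."""
    return math.sqrt(dot(v, v))

def angle_between(v, w):
    """Angle in radians recovered from v . w = |v| |w| cos(theta)."""
    cos_theta = dot(v, w) / (magnitude(v) * magnitude(w))
    # Clamp to guard against floating-point overshoot outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos_theta)))

# Hypothetical sample vectors lying in the xy-plane.
v = (1.0, 0.0, 0.0)
w = (1.0, 1.0, 0.0)
theta = angle_between(v, w)  # approximately pi / 4
```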

## Equations of planes

### The equation
## $$ 2 x + 2 y + z = 0 $$

### is the equation for a plane. This equation can be written in terms of a “hidden dot product"
## $$ \langle 2,2,1 \rangle \cdot \langle x,y,z \rangle = 0 $$

### Therefore the following three statements are equivalent:
### - A point $(x, y, z)$ lies on the plane defined by the equation $ 2 x + 2 y + z = 0 $
### - $ \langle 2,2,1 \rangle \cdot \langle x,y,z \rangle = 0 $
### - $ \langle x,y,z \rangle $ is perpendicular to $\langle 2,2,1 \rangle$

### For example, to answer the question
### Does the point $(1.5, -2, 1)$ lie on the plane $ 2 x + 2 y + z = 0 $?
### we check whether the equation
## $$ \langle 2,2,1 \rangle \cdot \langle 1.5,-2,1 \rangle = 0 $$
### is true. Here $2(1.5) + 2(-2) + 1 = 3 - 4 + 1 = 0$, so the point ***is on the plane***.

### - The equation $ax+by+cz=0$ describes the plane that is perpendicular to the vector $\langle a,b,c \rangle$ and passes through the origin $(0,0,0)$.
### - The equation $ax+by+cz=d$ describes a plane that is perpendicular to the vector $\langle a,b,c \rangle$. When $d \neq 0$, this plane does not pass through the origin.
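### The point-on-plane test is a single dot product, so it is easy to sketch in Python. The function name `on_plane`, its tolerance `tol`, and the optional right-hand side `d` are assumptions made for this illustration.

```python
def on_plane(normal, point, d=0.0, tol=1e-12):
    """Return True if normal . point equals d, up to the tolerance tol."""
    value = sum(n * p for n, p in zip(normal, point))
    return abs(value - d) <= tol

normal = (2.0, 2.0, 1.0)                          # normal vector of 2x + 2y + z = 0
is_on = on_plane(normal, (1.5, -2.0, 1.0))        # 2(1.5) + 2(-2) + 1 = 0
is_off = on_plane(normal, (1.0, 1.0, 1.0))        # 2 + 2 + 1 = 5, so not on the plane
```

### Here `is_on` is `True` and `is_off` is `False`.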

## Functions of three variables

### A function of three variables $f(x, y, z)$ depends on three independent variables.

### Definition 4.1
### The domain of $f(x,y,z)$ is the set of points $(x,y,z)$ in 3D space such that $f$ is defined.

### Examples 4.2
### - The function $f(x,y,z)=2x+2y+z$ is defined for all points $(x,y,z)$ in 3 dimensional space. We say that the domain is $\mathbb{R}^3$ (spoken as "R" "3").
### - The domain of the function $f(x,y,z)=x^2+y^2+z^2$ is $\mathbb{R}^3$.
### - The domain of the function $f(x,y,z)=\sqrt{1 - (x^2+y^2+z^2)}$ is the set of points $(x,y,z)$ such that $x^2+y^2+z^2 \leq 1$. This set describes a solid ball of radius 1.

## Visualizing functions of three variables

### We try to visualize a function $f(x,y,z)$ similarly to the way we used level curves or contour plots to understand functions of two variables.

### Definition 4.3
### For any real number $k$, we can consider the set of points $(x,y,z)$ such that the relation $f(x,y,z)=k$ holds. This set of points is known as a ***level surface*** of the function $f(x,y,z)$. For a differentiable function $f(x,y,z)$, the level surfaces will typically be smooth surfaces that sit in three-dimensional space.

### Examples 4.4

### - The level surfaces of $f(x,y,z)=2x+2y+z$ are parallel planes. The plane $2x+2y+z=k$ is the plane perpendicular to $ \langle 2,2,1 \rangle$ through the point $(0,0,k)$.

![Level Surfaces 1](img/level-surf-1.png)

### Let's find the level surfaces of the function $f(x,y,z)=x^2+y^2+z^2$.
### - The surface $x^2+y^2+z^2=1$ is described as the set of points $(x,y,z)$ such that
## $$ \langle x,y,z \rangle \cdot \langle x,y,z \rangle = \lvert \langle x,y,z \rangle \rvert^2 = 1 $$
### Therefore this is the set of points $(x,y,z)$ such that the vector $\langle x,y,z \rangle$ has unit length. This exactly describes the unit sphere about the origin.
### - The surface $x^2+y^2+z^2=2$ describes the sphere of radius $\sqrt{2}$ centered at the origin.

![Level Surfaces 2](img/level-surf-2.png)

## Partial derivatives

### Definition 5.1
### Given a continuous function $f(x,y,z)$,
### - The $x$-partial derivative is written as $\frac{\partial f}{\partial x}$ or $f_x$, and is computed by differentiating with respect to $x$ while holding all other variables fixed.
## $$ \frac{\partial f}{\partial x} = \lim_{h \to 0} \frac{f(x + h, y, z) - f(x, y, z)}{h} $$
### - The $y$-partial derivative is written as $\frac{\partial f}{\partial y}$ or $f_y$, and is computed by differentiating with respect to $y$ while holding all other variables fixed.
## $$ \frac{\partial f}{\partial y} = \lim_{h \to 0} \frac{f(x, y + h, z) - f(x, y, z)}{h} $$
### - The $z$-partial derivative is written as $\frac{\partial f}{\partial z}$ or $f_z$, and is computed by differentiating with respect to $z$ while holding all other variables fixed.
## $$ \frac{\partial f}{\partial z} = \lim_{h \to 0} \frac{f(x, y, z + h) - f(x, y, z)}{h} $$

### The partial derivatives measure how the function $f(x,y,z)$ changes as you change each variable independently. We can use these partial derivatives to get an overall approximation to the function in a small region about a point $(x_0,y_0,z_0)$.
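### The limit definitions can be approximated numerically by choosing a small $h$. The sketch below uses a central difference (a swapped-in variant of the one-sided quotient in the definition, chosen because it is usually more accurate); the helper name `partial`, the step `h`, and the sample function are assumptions for illustration.

```python
def partial(f, point, index, h=1e-6):
    """Central-difference estimate of the partial derivative of f
    with respect to the coordinate at position `index`,
    holding the other coordinates fixed."""
    forward = list(point)
    backward = list(point)
    forward[index] += h
    backward[index] -= h
    return (f(*forward) - f(*backward)) / (2 * h)

def f(x, y, z):
    # Sample function; its exact partials are 2x, 2y, 2z.
    return x**2 + y**2 + z**2

p = (1.0, 2.0, 3.0)
fx, fy, fz = (partial(f, p, i) for i in range(3))  # close to 2, 4, 6
```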

## Gradient
### Definition 5.2
### The gradient of a function $f(x,y,z)$ is the vector field
## $$ \nabla f = \langle f_x, f_y, f_z \rangle $$

## Theorem 
### The gradient of a function $f(x,y,z)$ is ***normal*** to the level surfaces $f(x,y,z)=c$.
### Example 5.4
### Let's consider the linear function $w=a_1 x+a_2 y+a_3 z$. The gradient is
## $$ \nabla w= \langle a_1,a_2,a_3 \rangle $$
### The level surfaces are the sets
## $$ a_1 x+a_2 y+a_3 z = \text{constant} $$
### These level surfaces are planes whose normal vector is $\langle a_1,a_2,a_3 \rangle$. Thus the normal vector is exactly the gradient.

### Intuitively speaking, when a vector is normal to a surface at a point $p$, the vector points “straight out of the surface" at $p$. It could also point “straight in to the surface" depending on the orientation. Imagining the surface as approximated by a tangent plane at $p$, the normal vector will be the normal vector of this tangent plane.

### Example 5.5
### Question. 
### Let $S$ be the unit sphere $x^2+y^2+z^2=1$. Find a vector $\vec{n}$  that is normal to $S$ at a point $(x,y,z)$ on the unit sphere.
### Solution
### The unit sphere is the level surface $f(x,y,z)=1$ of the function $f(x,y,z)=x^2+y^2+z^2$. The gradient of $f$ is
## $$ \nabla f(x,y,z) = \langle 2x,2y,2z \rangle $$
### The gradient is normal to the level surfaces of $f$. In particular, it is normal to the unit sphere. Thus we have
### - $ \langle 2x,2y,2z \rangle$ is normal to $S$ at $(x,y,z)$, or dividing by the scalar $2$,
### - $ \langle x,y,z \rangle $ is normal to $S$ at $(x,y,z)$
![Gradient](img/3-gradient.png)
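### One way to sanity-check this example is to take velocity vectors of curves that stay on the sphere and confirm that the gradient is perpendicular to them. The point and the two curves below are hypothetical choices made for this sketch.

```python
def grad_f(x, y, z):
    """Gradient of f(x, y, z) = x^2 + y^2 + z^2."""
    return (2 * x, 2 * y, 2 * z)

def dot(v, w):
    return sum(vi * wi for vi, wi in zip(v, w))

p = (0.6, 0.0, 0.8)   # on the unit sphere: 0.36 + 0 + 0.64 = 1
n = grad_f(*p)        # (1.2, 0.0, 1.6)

# Velocities at p of two curves lying on the sphere:
#   (0.6 cos t, 0.6 sin t, 0.8) at t = 0          -> (0, 0.6, 0)
#   (sin t, 0, cos t) where sin t = 0.6, cos t = 0.8 -> (0.8, 0, -0.6)
tangents = [(0.0, 0.6, 0.0), (0.8, 0.0, -0.6)]
dots = [dot(n, t) for t in tangents]  # both approximately 0
```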

## Linear approximation
### Definition 6.1
### The ***linear approximation*** of a differentiable function $f(x,y,z)$ near a point $(x_0,y_0,z_0)$ is given by
## $$ f(x_0 + \Delta x, y_0 + \Delta y, z_0 + \Delta z) \approx f(x_0, y_0, z_0) + f_x(x_0, y_0, z_0) \Delta x + f_y(x_0, y_0, z_0) \Delta y + f_z(x_0, y_0, z_0) \Delta z $$

### Alternatively, we can write this linear approximation in terms of $x$, $y$, and $z$ as:
## $$ f(x, y, z) \approx f(x_0, y_0, z_0) + f_x(x_0, y_0, z_0) (x - x_0) + f_y(x_0, y_0, z_0) (y - y_0) + f_z(x_0, y_0, z_0) (z - z_0) $$
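### To see how good the approximation is, one can compare it against the exact value for a small displacement. This sketch assumes the sample function $f(x,y,z)=x^2+y^2+z^2$ and a hypothetical base point and displacement; for this particular $f$ the error works out to exactly $\Delta x^2 + \Delta y^2 + \Delta z^2$.

```python
def f(x, y, z):
    return x**2 + y**2 + z**2

def grad_f(x, y, z):
    return (2 * x, 2 * y, 2 * z)

def linear_approx(p0, delta):
    """f(p0) + f_x dx + f_y dy + f_z dz, with partials evaluated at p0."""
    g = grad_f(*p0)
    return f(*p0) + sum(gi * di for gi, di in zip(g, delta))

p0 = (1.0, 2.0, 3.0)
delta = (0.01, -0.02, 0.005)
exact = f(*(a + d for a, d in zip(p0, delta)))
approx = linear_approx(p0, delta)   # 14 + 0.02 - 0.08 + 0.03 = 13.97
error = exact - approx              # about 0.000525 = dx^2 + dy^2 + dz^2
```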

### Remark 6.2
### The linear approximation can be written more succinctly using the gradient notation.
## $$ f(x_0 + \Delta x, y_0 + \Delta y, z_0 + \Delta z) \approx f(x_0, y_0, z_0) + \nabla f(x_0, y_0, z_0) \cdot \langle \Delta x, \Delta y, \Delta z \rangle$$
### Moving $f(x_0, y_0, z_0)$ to the left-hand side gives
## $$ \underbrace{f(x_0 + \Delta x, y_0 + \Delta y, z_0 + \Delta z) - f(x_0, y_0, z_0)}_{\text{change in }f, \ \Delta f} \approx \nabla f(x_0, y_0, z_0) \cdot \langle \Delta x, \Delta y, \Delta z \rangle$$
### so we can express the change in $f$ near $(x_0,y_0,z_0)$ as
## $$ \Delta f \approx \nabla f(x_0, y_0, z_0) \cdot \langle \Delta x, \Delta y, \Delta z \rangle$$



### Example 6.3
### ***Question***: Find the equation of the tangent plane to the surface
## $$ x^2 + y^2 - z^2 = 4 $$
### at the point $(2, 1, 1)$
### ***Solution***: The surface $x^2+y^2-z^2=4$ is a hyperboloid, and it is the level surface $g(x,y,z)=4$ of the function $g(x,y,z)=x^2+y^2-z^2$. We know that the gradient of $g$ is normal to this level surface.
### The gradient is:
## $$ \nabla g(x,y,z) = \langle 2x,2y,-2z \rangle $$
## $$ \nabla g(2,1,1) = \langle 4,2,-2 \rangle $$
### This vector is both the normal vector to the level surface, as well as the normal vector to the tangent plane at $(2,1,1)$. So the equation of the tangent plane is the equation of a plane normal to this vector, which can be written as
## $$ 4 x + 2 y - 2 z = c $$
### where $c$ is a constant. To solve for the constant, we plug in a point that lies on the plane, namely the one point we know: $(2,1,1)$. This gives us
## $$ c = 4(2) + 2(1) - 2(1) = 8 + 2 - 2 = 8 $$
### so the tangent plane is $4x + 2y - 2z = 8$.
### We can simplify the equation by dividing through by $2$ if we want.
## $$ 2 x + y - z = 4 $$
![Example](img/example-6-3.png)
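### The steps of this example (evaluate the gradient at the point, then solve for the constant) can be reproduced in a few lines of Python. The function names are illustrative assumptions.

```python
def g(x, y, z):
    return x**2 + y**2 - z**2

def grad_g(x, y, z):
    return (2 * x, 2 * y, -2 * z)

p0 = (2.0, 1.0, 1.0)
level = g(*p0)                               # 4.0, so p0 lies on the surface g = 4
normal = grad_g(*p0)                         # (4.0, 2.0, -2.0)
c = sum(n * p for n, p in zip(normal, p0))   # 4*2 + 2*1 - 2*1 = 8.0
```

### The tangent plane is then `normal[0]*x + normal[1]*y + normal[2]*z = c`.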

## Theorem

### The level plane of the linear approximation of a function $f(x,y,z)$ at a point $(x_0,y_0,z_0)$ is the plane $\nabla f(x_0,y_0,z_0) \cdot \langle x,y,z \rangle = \nabla f(x_0,y_0,z_0) \cdot \langle x_0,y_0,z_0 \rangle$, which is called the tangent plane at the point $(x_0,y_0,z_0)$.

### Let's look at the level surface $f(x,y,z)=c$ passing through a point $(x_0,y_0,z_0)$. If we look at points $(x_0+\Delta x,y_0+\Delta y,z_0+ \Delta z)$ that also lie on this level surface, we can use the approximation formula, which says that for small $\Delta x$, $\Delta y$, and $\Delta z$, the change in the function is approximately
## $$ \Delta f \approx \nabla f \cdot \langle \Delta x, \Delta y, \Delta z \rangle $$

### However, if we are staying within the level surface, this tells us that $\Delta f=0$. That is, the condition that we are locally in the level surface, is equivalent to saying that
## $$ 0 \approx \nabla f \cdot \langle \Delta x, \Delta y, \Delta z \rangle $$
### Writing $\langle \Delta x, \Delta y, \Delta z \rangle = \langle x - x_0, y - y_0, z - z_0 \rangle$, we can rewrite this as
## $$ 0 \approx \nabla f \cdot \langle x, y, z \rangle - \nabla f \cdot \langle x_0, y_0, z_0 \rangle $$
### Replacing the approximation with an equality gives the plane defined by
## $$ \nabla f (x_0, y_0, z_0) \cdot \langle x, y, z \rangle = \nabla f (x_0, y_0, z_0) \cdot \langle x_0, y_0, z_0 \rangle $$
### This plane is called the ***tangent plane***. It is the linear approximation to the level surface $f(x,y,z)=c$ at the point $(x_0,y_0,z_0)$.
### Observe that this is the 3 dimensional analogue of the fact that as we zoom in on any function of 2 variables $f(x,y)$, its level curves are closer and closer to the level curves of its tangent plane, which are parallel lines. That is, as we zoom in on a function of 3 variables $f(x,y,z)$, the level surfaces become indistinguishable from the parallel planes that are the level surfaces of the linear approximation.

## Definition 8.2
### The ***critical points*** of a function $w=f(x,y,z)$ are the points ***in the domain of definition*** where the gradient is zero (meaning $\nabla w = \vec{0}$ ), or the gradient is undefined.

## Constrained optimization in 3 or more variables
### How are level surfaces related to constrained optimization?
### We wish to find the maximum and minimum value of a function $f(x,y,z)$ restricted to a surface defined by $g(x,y,z)=3$.
![Constrain Optimization](img/constrained-optimization-3-var.png)

### A function $f(x,y,z)$ has level surfaces $f(x,y,z)=c$. Five level surfaces are shown in the image above. The values of $c$ defining the level surfaces increase as $x$ decreases. The gray ellipsoid is the level surface defined by $g(x,y,z)=3$ such that $g(x,y,z)<3$ inside of the level surface, and $g(x,y,z)>3$ outside of the level surface.
### Lagrange multipliers: the maximum and minimum of the function $f(x,y,z)$ restricted to the surface $g(x,y,z)=3$ are attained at points where $\nabla f = \lambda \nabla g$ for some scalar $\lambda$. That is, the normal vector to the constraint surface and the normal vector to the level surface of $f$ point in the same or opposite directions.
### In this picture, the maximum occurs at the point where $\nabla f$ points in the same direction as $\nabla g$, and the minimum occurs at the point where $\nabla f$ points in the opposite direction. At these locations, the tangent plane to the level surface of $f$ is exactly equal to the tangent plane to the surface defined by $g=3$.
![Constrain Optimization](img/constrained-optimization-3-var-2.png)
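### As a concrete illustration, take the hypothetical choices $f(x,y,z)=x+y+z$ and $g(x,y,z)=x^2+y^2+z^2$ (these are assumptions; the notes do not specify $f$ and $g$). Solving $\nabla f = \lambda \nabla g$ together with $g=3$ by hand gives the candidates $(1,1,1)$ and $(-1,-1,-1)$. The sketch below checks the condition at both points and reads off the sign of $\lambda$.

```python
def grad_f(x, y, z):
    """Gradient of the hypothetical objective f = x + y + z."""
    return (1.0, 1.0, 1.0)

def grad_g(x, y, z):
    """Gradient of the hypothetical constraint function g = x^2 + y^2 + z^2."""
    return (2 * x, 2 * y, 2 * z)

def multiplier(point, tol=1e-12):
    """Return lambda if grad f = lambda * grad g at the point, else None.
    Assumes the first component of grad g is nonzero at the point."""
    gf, gg = grad_f(*point), grad_g(*point)
    lam = gf[0] / gg[0]
    if all(abs(a - lam * b) <= tol for a, b in zip(gf, gg)):
        return lam
    return None

candidates = [(1.0, 1.0, 1.0), (-1.0, -1.0, -1.0)]   # both satisfy g = 3
lams = [multiplier(p) for p in candidates]           # [0.5, -0.5]
values = [sum(p) for p in candidates]                # f at each point: [3.0, -3.0]
```

### In this example the maximum ($f=3$) has $\lambda > 0$, so the gradients point the same way, and the minimum ($f=-3$) has $\lambda < 0$.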



## Parametric curves

### Imagine a particle moving through two-dimensional space. We can describe the motion of the particle by specifying the position of the particle at time $t$, where $t$ runs through a set of values. For example, let's imagine a particle whose position at time $t$ is given by the $x$,$y$-coordinates $(t^2,1+t^2)$, where $0 \leq t < \infty$.
### By plotting a few points, we can see that the particle moves in a straight line from the point $(0,1)$ in the north-eastern direction.
![Parametric Example](img/parametric-example.png)
### In the image above, we have plotted the particle's ***trajectory***, that is, the set of points that the particle goes through. We can imagine letting the tip of a pencil follow the particle around on a piece of paper, which creates this image of the particle's trajectory.
### Equations such as $x(t)=t^2$ and $y(t)=1+t^2$ for $0 \leq t < \infty$ are known as ***parametric equations***. The terminology comes from the fact that $x$ and $y$ each depend on the ***parameter*** $t$.

### Some questions we will be interested in answering:
### - What is the particle's velocity at time $t$?
### - What is the particle's speed at time $t$?
### - Is there a way to see that $(t^2,1+t^2)$ describes a straight line without plotting points?

### Straight-line trajectory
### Can we get a better feeling for the motion described by $(t^2,1+t^2)$? 
### One approach is to use the language of vector arithmetic. We can represent the point $(t^2,1+t^2)$ by the vector 
## $$\begin{pmatrix} t^2 \\ 1+t^2 \end{pmatrix}$$
### Then, we can separate this vector into the sum of vectors:
## $$ \begin{pmatrix} t^2 \\ 1+t^2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix} + t^2 \begin{pmatrix} 1 \\ 1 \end{pmatrix} $$
### This form gives better insight into the motion of the particle. We can see that at $t=0$, the particle will be at the point $(0,1)$, and as $t$ increases, it moves along the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Since the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ doesn't depend on $t$, the trajectory of the particle is indeed a straight line. In fact, the trajectory's line is parallel to the vector $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$.
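### The decomposition can also be checked numerically: every offset from the base point $(0,1)$ should be a multiple of $(1,1)$, which means its 2D cross product with $(1,1)$ vanishes. The helper names and sample times below are assumptions for this sketch.

```python
def r(t):
    """Position of the particle at time t."""
    return (t**2, 1 + t**2)

base = (0.0, 1.0)        # position at t = 0
direction = (1.0, 1.0)   # claimed direction of the line

def cross_2d(a, b):
    """2D cross product; zero exactly when a and b are parallel."""
    return a[0] * b[1] - a[1] * b[0]

offsets = [(r(t)[0] - base[0], r(t)[1] - base[1]) for t in (0.0, 0.5, 1.0, 2.0)]
collinear = all(abs(cross_2d(o, direction)) < 1e-12 for o in offsets)
```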

### Remark 2.1  (Parameterizing a Line) 
### In general, the parametric equation $(x(t),y(t))=\vec{v} +f(t) \vec{w}$ for vectors $\vec{v}$, $\vec{w}$ and any function $f(t)$ gives a trajectory that is contained within a straight line. The point $\vec{v}$ will be the starting point at $t=0$ provided $f(0)=0$, and the vector $\vec{w}$ will be parallel to the trajectory's line.

## A parametric equation is a vector-valued function

### It is common to think of a pair of parametric equations $(x(t),y(t))$ as a two-dimensional vector that varies with the parameter $t$. It is standard to use the letter $\vec{r}$  to represent this vector.
## $$ \vec{r}(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} $$
### or sometimes just $\vec{r}$ , if the parameter $t$ is clear from context. Technically, $\vec{r}$  is known as a vector-valued function, which means it is a function whose output is a vector. The input to the vector-valued function $\vec{r}$  is the parameter $t$.

### Remark 2.2
### The notation $\vec{r}$ is used quite often for the position vector at a time $t$, sometimes without explicit comment. It will be important to recognize this meaning when such notation is seen.

## Trajectory versus Motion
### It is important to remember that the particle's trajectory is not the same as the particle's motion. Looking only at the plot of the trajectory, we would not be able to tell the speed of the particle or the direction of motion.
### Can we figure out the speed and direction of motion just by looking at the formulas for $x(t)$ and $y(t)$? The answer is yes, but first, we need to clarify the difference between ***speed*** and ***velocity***.

## Speed versus Velocity
### ***Speed*** is a nonnegative real number that measures how fast the particle is moving. If you were to toss a tennis ball in the air, it would start with a high speed, which decreases down to zero at the peak, and then the speed would increase again as the ball accelerates towards the ground.
### On the other hand, ***velocity*** has a directional component. For a tossed tennis ball, it starts with an upwards velocity, but after reaching its peak, the velocity begins to point downwards.

## Same Speed Different Velocity
### To take another example, imagine serving a volleyball. The following figure shows its trajectory:
![Speed vs Velocity](img/speed-velocity.png)
### The two arrows represent the velocity vectors of the volleyball at two moments in time. Comparing these two points in time, the volleyball has the same speed but different velocity.

## Finding the velocity vector
### Definition 3.1   (Velocity in 2D) 
### If a particle follows a parametric equation $(x(t),y(t))$ then its velocity vector at time $t$ is the vector 
## $$ \begin{pmatrix} x'(t) \\ y'(t) \end{pmatrix} $$

### It is also common to write this vector using the differential notation:
## $$ \frac{d\vec{r}}{dt} = \begin{pmatrix} x'(t) \\ y'(t) \end{pmatrix} $$
### The differential notation should be understood in the same way as in single-variable calculus: it is the limit of the ratio $\frac{\Delta r}{\Delta t}$ as $\Delta t$ goes to zero (here $\Delta r$ is a vector and $\Delta t$ is a real number).
### Example 3.2
### Our first example was $(x(t),y(t))=(t^2,1+t^2)$. The velocity vector is therefore $\begin{pmatrix} 2t \\ 2t \end{pmatrix}$. We see that the velocity vector is parallel to $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, the direction of the particle's trajectory.
### Speed is the magnitude of the velocity vector.

### Example 3.3

### Again let's consider $(x(t),y(t))=(t^2,1+t^2)$. The velocity vector is $\begin{pmatrix} 2t \\ 2t \end{pmatrix}$. The speed is given by $\sqrt{8 t^2}$, which shows that the particle is accelerating, that is, its speed is increasing over time.

## 3D Motion

### To describe a moving point in three-dimensional space, we need formulas for $x(t)$, $y(t)$, and $z(t)$. For example, we may have
## $$ x(t) = 1 + t^3 $$
## $$ y(t) = 2 t^3 $$
## $$ z(t) = 1 - t^3 $$
### where $0 \leq t < \infty$
### One may imagine a particle whose position at time $t$ is given by $(x(t),y(t),z(t))$.
### As in two dimensions, we can plot the trajectory. In this case, we get another straight line (ray):
![3D Motion 1](img/3d-motion-1.png)
### Again, we can use vector arithmetic to see why we get this line. The vector
## $$ \vec{r} = \begin{pmatrix} x(t) \\ y(t) \\ z(t) \end{pmatrix} = \begin{pmatrix} 1 + t^3 \\ 2 t^3 \\ 1 - t^3 \end{pmatrix} $$
### can be written as a sum:
## $$ \vec{r} = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + t^3 \begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix} $$
### We can recognize this equation as one of the form $\vec{r} = \vec{u} + f(t) \vec{w}$, which will have a straight-line trajectory. The base point $\vec{u}$ in this case is $(1,0,1)$ and the direction of motion $\vec{w}$ is parallel to the vector $\begin{pmatrix} 1 \\ 2 \\ -1 \end{pmatrix}$.

### Let's look at an example where the trajectory is not a straight line.
## $$ x(t) = \cos(t) $$
## $$ y(t) = \sin(t) $$
## $$ z(t) = t $$
### where $0 \leq t < \infty$
### How to visualize this trajectory? First, let's ignore the $z$ component. Then we can recognize that, in the $x$, $y$ plane, the point is tracing out the unit circle repeatedly. Now we can think about the $z$-coordinate. Since $z(t)=t$, it means the point is moving steadily upwards. The resulting shape is known as a helix:

![3D Motion 2](img/3d-motion-2.png)

## 3D Velocity

### To compute the velocity of a particle moving in three dimensions, we again take the derivative of each component.
### Definition 6.1 (Velocity in 3D)
###  If a particle follows a parametric equation $(x(t),y(t),z(t))$ then its ***velocity vector*** at time $t$ is the vector
## $$ \vec{v} = \frac{d\vec{r}}{dt} = \begin{pmatrix} x'(t) \\ y'(t) \\ z'(t) \end{pmatrix} $$

### The velocity vector $\vec{v}$  points in the direction of motion of the particle. More precisely, $\vec{v}$ points along the straight line that best approximates the motion at the given time $t$.
### For the helix example, we have:
## $$ \vec{r} = \begin{pmatrix} \cos(t) \\ \sin(t) \\ t \end{pmatrix} $$
### and
## $$ \vec{v} = \begin{pmatrix} -\sin(t) \\ \cos(t) \\ 1 \end{pmatrix} $$
### In the visualization below, there is the plot of the helix and the velocity vector at time $t=2\pi$.
![3D Motion 3](img/3d-motion-3.png)

## 3D Speed
### As in two dimensions, the speed of a particle is given by the magnitude of the velocity vector, that is $\left|\frac{d\vec{r}}{dt}\right|$. For the helix example above, the speed is given by
## $$ \left| \begin{pmatrix} -\sin(t) \\ \cos(t) \\ 1 \end{pmatrix} \right| = \sqrt{(-\sin(t))^2 + (\cos(t))^2 + 1^2} = \sqrt{2} $$
### In this case, the particle has constant speed $\sqrt{2}$ throughout its trajectory.
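### The helix velocity and speed can be checked numerically. The sketch below compares the component-wise derivative against a central-difference estimate of $\frac{d\vec{r}}{dt}$; the step size `h` and the function names are assumptions.

```python
import math

def r(t):
    """Position on the helix."""
    return (math.cos(t), math.sin(t), t)

def v(t):
    """Velocity: the derivative of each component of r."""
    return (-math.sin(t), math.cos(t), 1.0)

def magnitude(vec):
    return math.sqrt(sum(c * c for c in vec))

def numeric_velocity(t, h=1e-6):
    """Central-difference estimate of dr/dt."""
    return tuple((a - b) / (2 * h) for a, b in zip(r(t + h), r(t - h)))

t0 = 2 * math.pi
speed = magnitude(v(t0))   # approximately sqrt(2), independent of t0
max_gap = max(abs(a - b) for a, b in zip(v(t0), numeric_velocity(t0)))
```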

## Unit tangent vector
### Sometimes we need to find the ***unit tangent vector***, meaning a vector of unit length that is tangent to a trajectory. It is common to write $\hat{T}$ for the unit tangent vector. How do we obtain this vector given the parametric equation for $\vec{r}(t)$?
### We have already seen that the velocity vector $\vec{v} = \frac{d\vec{r}}{dt}$ is tangent to the particle's trajectory. Therefore, we can obtain the unit tangent vector by rescaling $\vec{v}$ to have unit length.
### Definition 7.1 (Unit Tangent Vector)
### If $\vec{r}$  is the position of a particle at time $t$, then the unit tangent vector to the particle's trajectory at time $t$ is given by
## $$ \hat{T}(t) = \frac{\vec{v}}{\left|\vec{v}\right|} = \frac{\frac{d\vec{r}}{dt}}{\left|\frac{d\vec{r}}{dt}\right|} $$
### Example 7.2
### For example, if $x(t)=t$ and $y(t)=-5 t^2$ then we have $\vec{v}=\begin{pmatrix}1 \\ -10t \end{pmatrix}$. This vector has length $\sqrt{1^2 + (-10 t)^2} = \sqrt{1 + 100 t^2}$, so the unit tangent vector is obtained by dividing $\vec{v}$  by $\sqrt{1 + 100 t^2}$:
## $$ \hat{T}(t) = \begin{pmatrix} \frac{1}{\sqrt{1 + 100 t^2}} \\ \frac{-10 t}{\sqrt{1 + 100 t^2}} \end{pmatrix} $$
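### A short Python sketch of this computation follows, with each velocity component divided by the speed. The function name `unit_tangent` and the sample time are assumptions.

```python
import math

def unit_tangent(t):
    """Unit tangent for x(t) = t, y(t) = -5 t^2."""
    vx, vy = 1.0, -10.0 * t           # velocity components
    length = math.hypot(vx, vy)       # sqrt(1 + 100 t^2)
    return (vx / length, vy / length)

T = unit_tangent(0.5)   # v = (1, -5), so T = (1/sqrt(26), -5/sqrt(26))
```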

## Lines in 3D

### Suppose we have two points, $Q_0$ and $Q_1$, and we want to describe the motion of a particle that moves in a straight line from $Q_0$ to $Q_1$ in 3D. For example, suppose $Q_0 = (-1, 2,2)$ and $Q_1=(1,3,-1)$. Let $Q(t)$ be the position of the moving point at time $t$. If we assume $Q(0) = Q_0$, $Q(1) = Q_1$, and that the point moves at constant speed from $Q_0$ to $Q_1$, then we can say that
## $$ \vec{Q_0 Q(t)} = t \vec{Q_0 Q_1} $$
### In words, the vector from $Q_0$ to $Q(t)$ is equal to $t$ times the vector from $Q_0$ to $Q_1$.
### It follows that, using the notation $Q(t) = (x(t), y(t), z(t))$,
## $$ x(t) = -1 + 2 t $$
## $$ y(t) = 2 + t $$
## $$ z(t) = 2 - 3 t $$
### This can also be written as
## $$ Q(t) = Q_0 + t\vec{Q_0 Q_1} $$
### In this form, the straight-line trajectory is more evident.
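### The formula $Q(t) = Q_0 + t\vec{Q_0 Q_1}$ translates directly into code. In this sketch the function name `line_point` is an assumption.

```python
def line_point(q0, q1, t):
    """Q(t) = Q0 + t * (Q1 - Q0), computed component by component."""
    return tuple(a + t * (b - a) for a, b in zip(q0, q1))

Q0 = (-1.0, 2.0, 2.0)
Q1 = (1.0, 3.0, -1.0)
start = line_point(Q0, Q1, 0.0)      # equals Q0
end = line_point(Q0, Q1, 1.0)        # equals Q1
midpoint = line_point(Q0, Q1, 0.5)   # (0.0, 2.5, 0.5)
```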


## Descriptions of Curves

### We now have several ways of describing a curve. Consider the unit circle:
![Unit Circle](img/unit-circle.png)
### The image above may be called a “graphical description". But for doing calculus, we need descriptions that use equations. We have two methods available to us:
### - As a level curve
### We can describe this curve as the solution set to $x^2 + y^2 = 1$. In fact this is a level curve of the function $g(x,y) = x^2 + y^2$.
### - As a parametric trajectory
### We can also describe this curve as a parametric trajectory. In this case, we have $\vec{r}(t) = \begin{pmatrix} \cos(t) \\ \sin(t) \end{pmatrix}$. Describing a curve as a parametric trajectory is sometimes called “parameterizing the curve."

### - Comparison
### Both methods are useful descriptions of the curve. But they don't contain exactly the same information. The "level curve" description only describes a subset of the plane, whereas the "parametric trajectory" description additionally describes the motion of a moving point (which has speed, velocity, etc.). The two descriptions also give different methods for finding tangent vectors.
### ***Question 1***: 
### How do you find the tangent vector to a curve in the plane when it is given as a parametric equation?
### Answer: Suppose $C$ is the trajectory of $\vec{r}(t)$. To find the tangent vector at a point $(x_0, y_0)$, first find $t_0$ such that $\vec{r}(t_0) = (x_0, y_0)$. Then compute $\vec{v}(t) = \frac{d\vec{r}}{dt}$, and the vector $\vec{v}(t_0)$ will point in a direction tangent to $C$ at the point $(x_0, y_0)$.
### ***Question 2***: 
### How do you find the tangent vector to a curve in the plane when it is given as a level curve?
### Answer: Suppose $C$ is a curve described by the equation $g(x, y) = k$. To find the tangent vector at a point $(x_0, y_0)$, first compute the gradient $\nabla g(x, y)$. Then the vector $\nabla g(x_0, y_0)$ will be normal to the curve $C$ at the point $(x_0, y_0)$. Rotating this normal vector by $\pm \frac{\pi}{2}$ gives us the desired tangent vector.
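### For the unit circle, the two methods can be compared directly: the velocity of $(\cos t, \sin t)$ and the rotated gradient of $x^2+y^2$ should be parallel. The sample parameter value and helper names below are assumptions.

```python
import math

def grad_g(x, y):
    """Gradient of g(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)

def rotate_quarter_turn(vec):
    """Rotate a 2D vector counterclockwise by pi/2: (a, b) -> (-b, a)."""
    a, b = vec
    return (-b, a)

t0 = math.pi / 3
point = (math.cos(t0), math.sin(t0))            # a point on the unit circle
velocity = (-math.sin(t0), math.cos(t0))        # tangent from the parametric description
tangent = rotate_quarter_turn(grad_g(*point))   # tangent from the level-curve description

# The two tangents are parallel exactly when their 2D cross product vanishes.
cross = velocity[0] * tangent[1] - velocity[1] * tangent[0]
```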

### Note that, in three dimensions, the equation $g(x,y,z) = k$ describes a level surface, rather than a curve. Therefore, when working in three dimensions, we almost always use parametric equations to describe curves.

## Differential of y = f(x)
### We know from single-variable calculus that if $y=x^2$ then $\frac{dy}{dx}=2x$. It is common to think of $\frac{dy}{dx}$ as just the Leibniz notation for the derivative. But we sometimes also write $dy=2 x dx$, obtained by "multiplying by $dx$". Is this an abuse of notation, or is there some deeper meaning?

### In fact, there is a deeper meaning. The symbols "$dy$" and "$dx$" represent the "infinitesimal versions" of $\Delta y$ and $\Delta x$. In contrast to "$dy$" and "$dx$", the symbols $\Delta y$ and $\Delta x$ can be given numerical values, and they mean that if you change $x$ by $\Delta x$, it causes $y$ to change by $\Delta y$. But when we replace the $\Delta$'s by d's, we express the fact that we are imagining these quantities tending towards zero.

## $$ \frac{dy}{dx} = 2x $$
### means
## $$ \frac{\Delta y}{\Delta x} \approx 2 x $$
### for small values of $\Delta x$
### In the same way
## $$ dy = 2 x dx$$
### means
## $$ \Delta y \approx 2x \Delta x$$
### for small values of $\Delta x$
### Using the $dy$ and $dx$ symbols, we can state linear approximation as an equality ($=$ instead of $\approx$). This equality becomes approximate once we replace the $dx$ and $dy$ with $\Delta x$ and $\Delta y$.



## Difference between dy,dx and Δy,Δx

### Based on the above example, it might seem unnecessary to use $dx$, $dy$ when we could have written the solution using the more familiar $\Delta x$ and $\Delta y$. Why the need for a new notation if it expresses the same thing?

### The answer is that writing $dy = 2 x dx $ expresses a more precise statement than writing $\Delta y \approx 2 x \Delta x$.
### The $\approx$ symbol is flawed because its meaning is highly context-dependent. In some contexts it means "equal up to the first decimal point", but in other contexts, it could mean "equal up to the sixth decimal point". Neither of these is what we mean in this context. In particular, we mean that the accuracy of the approximation gets better and better as $\Delta x$ gets closer and closer to $0$.

### In other words, writing $dy = 2 x dx $ expresses something about a limit, whereas $\Delta y \approx 2 x \Delta x$ does not.
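### The limiting behavior can be observed numerically for $y = x^2$: the relative error between the actual change $\Delta y$ and the prediction $2x\Delta x$ shrinks as $\Delta x$ shrinks. The base point `x0` and the step sizes are hypothetical choices for this sketch.

```python
def y(x):
    return x ** 2

x0 = 3.0
steps = (0.1, 0.01, 0.001)
relative_errors = []
for dx in steps:
    delta_y = y(x0 + dx) - y(x0)   # actual change in y
    linear = 2 * x0 * dx           # change predicted by dy = 2x dx
    relative_errors.append(abs(delta_y - linear) / abs(delta_y))

# For this function the relative error is dx / (2*x0 + dx),
# so it falls roughly tenfold each time dx does.
```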

### Using differentials is particularly useful in multivariable calculus, because we sometimes need to keep track of multiple linear approximations at one time. The symbols $dx$ and $dy$ work as placeholders to tell us where the numerical values of $\Delta x$ and $\Delta y$ will go in the end.


## Differential of $f(x, y, z)$

### The language of differentials is particularly useful for understanding functions of several variables.

### Suppose we have a quantity $f$ that depends on $x$, $y$ and $z$, say $f = f(x, y, z)$. Then the "differential of $f$" is as follows:
## $$ df = f_x dx + f_y dy + f_z dz $$

### How to understand this notation? The equation expresses the fact that if we change $x$,$y$ and $z$ by small amounts $\Delta x$, $\Delta y$ and $\Delta z$ then it will cause a change in $f$ that is approximately equal to
## $$ \Delta f \approx f_x \Delta x + f_y \Delta y + f_z \Delta z $$

### where $f_x$, $f_y$, $f_z$ are the partial derivatives of $f$ at the starting point. Furthermore, it expresses that the approximation gets better and better as $\Delta x$, $\Delta y$ and $\Delta z$ shrink to $0$.


## Infinitesimal interpretation

### Although the notion is not completely precise, it is sometimes helpful to think of each differential as representing an “infinitesimal change." If it were possible to change $x$, $y$ and $z$ by infinitesimal amounts $dx$, $dy$, and $dz$, then, in some sense, it would cause an infinitesimal change in $f$ in the amount of $df$.



## Placeholder interpretation

### Perhaps a more useful interpretation is to think of these differentials as placeholders that track how changes in each input variable $x$, $y$ and $z$ cause changes in the output variable $f$. When we need to do an actual approximation, we will replace the differentials with numerical values, and replace the $=$ sign with an $\approx$ sign.

### When $f$ is a function of several variables, $df$ is known as a "total differential". The total differential can also be written as:
## $$ df = \frac{\partial f}{\partial x} dx + \frac{\partial f}{\partial y} dy + \frac{\partial f}{\partial z} dz $$


## Multivariable Chain Rule

### The ***multivariable chain rule*** is needed when we need to differentiate a function whose inputs are controlled by another variable. Imagine a function that depends on $x$, $y$ and $z$ such as $f = f(x, y, z)$. Now imagine that we cannot control $x$, $y$ and $z$ directly and instead they each depend on a parameter $t$. This means changing the variable $t$ will cause the function $f$ to change, and we would like to know the corresponding rate of change $\frac{df}{dt}$.

### One way of finding $\frac{df}{dt}$ is to use differentials. We know:

## $$ df = f_x dx + f_y dy + f_z dz $$

### Now “divide everything by $dt$" to obtain a formula for $\frac{df}{dt}$:

## $$ \frac{df}{dt} = f_x \frac{dx}{dt} + f_y \frac{dy}{dt} + f_z \frac{dz}{dt} $$

### There you have it: this is the multivariable chain rule (at least, one manifestation of the multivariable chain rule). It gives us a recipe for finding $\frac{df}{dt}$ in terms of the intervening rates of change.

## Example 4.1
### An example might make the idea more clear. Imagine a box of height $y$ with a square base of width $x$. Then the volume is given by $V = x^2y$. Now suppose that we cannot control the values of $x$ and $y$ directly, but they each depend on a parameter $t$. In this example, let's imagine $x = (1+t)^2$ and $y = 3 t$. Now changing the variable $t$ will cause the volume $V$ to change, and we would like to know the corresponding rate of change, that is, the value of $\frac{dV}{dt}$.

### We can use differentials. We know:
## $$ dV = V_x dx + V_y dy $$
### Now “divide everything by $dt$" to obtain a formula for $\frac{dV}{dt}$:
## $$ \frac{dV}{dt} = V_x \frac{dx}{dt} + V_y \frac{dy}{dt} $$
### We find $V_x$ and $V_y$ from $V = x^2y$. We find $\frac{dx}{dt}$ from $x = (1+t)^2$ and $\frac{dy}{dt}$ from $y = 3 t$.
## $$ \frac{dV}{dt} = \underbrace{(2xy)}_{V_x} \underbrace{(2(1+t))}_{\frac{dx}{dt}} + \underbrace{(x^2)}_{V_y} \underbrace{(3)}_{\frac{dy}{dt}} $$
### Now we write everything in terms of $t$ to obtain:
## $$ \frac{dV}{dt} = (2) (1 + t)^2 (3 t)(2(1+t))+(1+t)^4 (3) $$
### which simplifies to
## $$ \frac{dV}{dt} = 12 t (1+t)^3 + 3(1+t)^4 $$
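### As a sanity check, the chain rule result can be compared with a direct difference quotient of $V$ viewed as a function of $t$ alone. The following is a minimal Python sketch and not part of the original derivation; the function names, the test point $t = 0.5$ and the step size `h` are illustrative assumptions.

```python
def volume(t):
    """V as a function of t alone, using x = (1+t)^2 and y = 3t."""
    x = (1 + t) ** 2
    y = 3 * t
    return x ** 2 * y


def dV_dt_chain_rule(t):
    """dV/dt from the chain rule: V_x * dx/dt + V_y * dy/dt."""
    x = (1 + t) ** 2
    y = 3 * t
    V_x = 2 * x * y        # partial derivative of x^2 * y with respect to x
    V_y = x ** 2           # partial derivative of x^2 * y with respect to y
    dx_dt = 2 * (1 + t)
    dy_dt = 3
    return V_x * dx_dt + V_y * dy_dt


def dV_dt_numeric(t, h=1e-6):
    """Central difference quotient; h is an assumed small step."""
    return (volume(t + h) - volume(t - h)) / (2 * h)


t0 = 0.5  # arbitrary test point (assumption)
simplified = 12 * t0 * (1 + t0) ** 3 + 3 * (1 + t0) ** 4
print(dV_dt_chain_rule(t0), simplified, dV_dt_numeric(t0))
```

### The first two printed values should agree exactly, and the third should match them to several decimal places.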


## Chain Rule in Pictures

### The following diagram shows how changing $x$ and $y$ causes $V$ to change.
![Chain Rule 1](img/chain-rule-1.png)

### If $x$ and $y$ depend on $t$, then the change in $t$ indirectly causes a change in $V$. The following diagram shows how changing $t$ causes $V$ to change. Notice the “chain" of variables, giving rise to the “chain rule".
![Chain Rule 2](img/chain-rule-2.png)

## What is the statement of the chain rule?
### The multivariable chain rule differs from the single-variable chain rule because, rather than a single formula, it represents a general principle. In words, the theorem behind the chain rule says ***any partial derivative can be computed by looking at the chain of transformations that take the input to the output, and forming the product of the partial derivatives for each link in the chain*** (as shown in the diagrams above). It is possible to package this statement into formulas, which are given at the end of this lecture.

## Why is the chain rule true?

### On the previous page, we claimed that you could “divide everything by $dt$" to obtain the chain rule. To justify this requires clearly interpreting differentials. However, the justifications given here are not entirely rigorous, and are just meant to give you an idea of what is going on when we write equations with differentials.

## 1st attempt
### Since $x$, $y$ and $z$ each depend on $t$, their differentials are
## $$ dx = x'(t)\, dt, \quad dy = y'(t)\, dt, \quad dz = z'(t)\, dt $$
### Substituting these into $df = f_x dx + f_y dy + f_z dz$, we obtain the "chain rule" statement made on the previous page.

## $$ df = f_x x'(t) dt + f_y y'(t) dt + f_z z'(t) dt $$
## $$ df = (f_x x'(t) + f_y y'(t) + f_z z'(t)) dt $$

### Now we have an equation for $df$ in terms of only $t$ and $dt$. The coefficient on $dt$ must be the derivative of $f$ with respect to $t$, that is, $\frac{df}{dt}$.

## 2nd attempt

### Another way to convince ourselves of the chain rule is to replace the $d$'s by $\Delta$'s. This removes any ambiguity about the meaning.
### We know from linear approximation that

## $$ \Delta f \approx f_x \Delta x + f_y \Delta y + f_z \Delta z$$
### Dividing by $\Delta t$ gives
## $$ \frac{\Delta f}{\Delta t} \approx \frac{f_x \Delta x + f_y \Delta y + f_z \Delta z}{\Delta t} $$
## $$ \frac{\Delta f}{\Delta t} \approx f_x \frac{\Delta x}{\Delta t} + f_y \frac{\Delta y}{\Delta t} + f_z \frac{\Delta z}{\Delta t} $$
### Now if we imagine $\Delta t$ moving towards zero, we can replace all $\Delta$'s with $d$'s to get a correct statement.
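### The limiting process can be watched numerically. The sketch below is only an illustration under assumed choices: the function $f(x,y,z) = xy + z^2$, the path $x = \cos t$, $y = \sin t$, $z = t$, and the test point $t = 1$ are all hypothetical and do not come from the lecture. It prints the quotient $\frac{\Delta f}{\Delta t}$ for shrinking $\Delta t$ next to the chain rule value.

```python
import math


def f(x, y, z):
    """Hypothetical function of three variables."""
    return x * y + z ** 2


def path(t):
    """Hypothetical inputs x(t), y(t), z(t)."""
    return math.cos(t), math.sin(t), t


def chain_rule_value(t):
    """f_x x'(t) + f_y y'(t) + f_z z'(t), with partials computed by hand."""
    x, y, z = path(t)
    f_x, f_y, f_z = y, x, 2 * z
    dx_dt, dy_dt, dz_dt = -math.sin(t), math.cos(t), 1.0
    return f_x * dx_dt + f_y * dy_dt + f_z * dz_dt


def difference_quotient(t, delta_t):
    """Delta f / Delta t for a finite step delta_t."""
    return (f(*path(t + delta_t)) - f(*path(t))) / delta_t


t0 = 1.0  # arbitrary test point (assumption)
exact = chain_rule_value(t0)
for delta_t in (1e-1, 1e-2, 1e-3, 1e-4):
    approx = difference_quotient(t0, delta_t)
    print(f"delta_t={delta_t:g}  quotient={approx:.6f}  gap={abs(approx - exact):.2e}")
```

### The gap between the quotient and the chain rule value should shrink as $\Delta t$ gets smaller.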

## Chain rule with more variables

### The multivariable chain rule also comes up if the inputs to the function depend on more than one parameter. Imagine a quantity $w$ is given by a function of $x$ and $y$ as $w=f(x,y)$. Now suppose we cannot control $x$ and $y$ directly, but they each depend on two variables $u$ and $v$.
### ***Question***
### How can we write $\frac{\partial w}{\partial u}$ and $\frac{\partial w}{\partial v}$ in terms of the rate of change of $w$ (derivatives $\frac{\partial w}{\partial x}$ and $\frac{\partial w}{\partial y}$) and the rate of change of $x$ and $y$ (derivatives $x_u$,$x_v$ and $y_u$,$y_v$)?
### We will give a specific example later on this page. First let's solve the problem in general.


## Using differentials

### Differentials give us an indirect way to keep track of everything. First we will write the total differential of $w$, then substitute in the total differentials of $x$ and $y$, and rewrite the equation to elicit the coefficients on $du$ and $dv$.
### We start with the total differential of $w$:
## $$ dw = f_x dx+f_y dy $$
### Next we can replace the differentials $dx$ and $dy$ by their total differentials:
## $$ dw = f_x \left( \underbrace{x_u du + x_v dv}_{dx} \right) + f_y \left( \underbrace{y_u du + y_v dv}_{dy} \right) $$
### Now collecting the terms with $du$ and the terms with $dv$:
## $$ dw = (f_x x_u + f_y y_u) du + (f_x x_v + f_y y_v) dv $$
### It follows that the coefficients on $du$ and $dv$ are the unknown partial derivatives:
## $$ dw = (\underbrace{f_x x_u + f_y y_u}_{\frac{\partial w}{\partial u}}) du + (\underbrace{f_x x_v + f_y y_v}_{\frac{\partial w}{\partial v}}) dv $$
### Thus we have
## $$ \frac{\partial w}{\partial u} = f_x x_u + f_y y_u $$
## $$ \frac{\partial w}{\partial v} = f_x x_v + f_y y_v $$

### Or by rewriting the above results using $\partial$ notation:
## $$ \frac{\partial f}{\partial u} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial u} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial u} $$
## $$ \frac{\partial f}{\partial v} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial v} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial v} $$
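### The two formulas above can be checked on a small made-up case. In the Python sketch below, the choices $f(x,y) = x^2 y$, $x = u + v$, $y = uv$ and the test point $(u, v) = (1, 2)$ are assumptions picked only for illustration.

```python
def f(x, y):
    """Hypothetical outer function w = f(x, y)."""
    return x ** 2 * y


def x_of(u, v):
    return u + v


def y_of(u, v):
    return u * v


def partials_by_chain_rule(u, v):
    """Return (dw/du, dw/dv) as f_x x_u + f_y y_u and f_x x_v + f_y y_v."""
    x, y = x_of(u, v), y_of(u, v)
    f_x, f_y = 2 * x * y, x ** 2
    x_u, x_v = 1.0, 1.0
    y_u, y_v = v, u
    return f_x * x_u + f_y * y_u, f_x * x_v + f_y * y_v


def partials_numeric(u, v, h=1e-6):
    """Central differences on the composite w(u, v) = f(x(u, v), y(u, v))."""
    def w(a, b):
        return f(x_of(a, b), y_of(a, b))
    dw_du = (w(u + h, v) - w(u - h, v)) / (2 * h)
    dw_dv = (w(u, v + h) - w(u, v - h)) / (2 * h)
    return dw_du, dw_dv


print(partials_by_chain_rule(1.0, 2.0))  # chain rule values
print(partials_numeric(1.0, 2.0))        # finite-difference estimates
```

### At $(u, v) = (1, 2)$ the chain rule gives $\frac{\partial w}{\partial u} = 30$ and $\frac{\partial w}{\partial v} = 21$, and the finite-difference estimates should be close to these.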


## Polar Coordinates

### A useful application of the chain rule arises when we need to switch between rectangular and polar coordinates.
### Suppose a quantity $f$ varies in the plane with $x$ and $y$. Perhaps we already know $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$, but what we really want to know are the partial derivatives of $f$ with respect to the polar coordinates $r$ and $\theta$. The chain rule gives us a way to find these partial derivatives without writing $f$ explicitly in terms of $r$ and $\theta$.

### In particular, the chain rule tells us that
## $$ \frac{\partial f}{\partial r} = \frac{\partial f}{\partial x} \frac{\partial x}{\partial r} + \frac{\partial f}{\partial y} \frac{\partial y}{\partial r} $$
### This equation follows from the "more variables" version of the chain rule.
### Next, we have
## $$ x = r \cos(\theta) $$
## $$ y = r \sin(\theta) $$
### Therefore we have
## $$ \frac{\partial x}{\partial r} = \cos(\theta) $$
## $$ \frac{\partial y}{\partial r} = \sin(\theta) $$
### So if we know $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ we can write $\frac{\partial f}{\partial r}$ as:
## $$ \frac{\partial f}{\partial r} = \underbrace{\frac{\partial f}{\partial x}}_{f_x} \cos(\theta) + \underbrace{\frac{\partial f}{\partial y}}_{f_y} \sin(\theta) $$
### In a similar way it is possible to write $\frac{\partial f}{\partial \theta}$ in terms of $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$.
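### For the $\theta$ derivative, differentiating $x = r\cos(\theta)$ and $y = r\sin(\theta)$ gives $\frac{\partial x}{\partial \theta} = -r\sin(\theta)$ and $\frac{\partial y}{\partial \theta} = r\cos(\theta)$, so $\frac{\partial f}{\partial \theta} = -f_x\, r\sin(\theta) + f_y\, r\cos(\theta)$. The Python sketch below checks both polar formulas numerically; the sample function $f(x,y) = x^2 + 3xy$ and the test point are assumptions chosen for illustration.

```python
import math


def f(x, y):
    """Hypothetical function of the rectangular coordinates."""
    return x ** 2 + 3 * x * y


def grad_f(x, y):
    """Hand-computed partial derivatives (f_x, f_y) of the sample f."""
    return 2 * x + 3 * y, 3 * x


def polar_partials(r, theta):
    """Return (df/dr, df/dtheta) using the chain rule."""
    x, y = r * math.cos(theta), r * math.sin(theta)
    f_x, f_y = grad_f(x, y)
    df_dr = f_x * math.cos(theta) + f_y * math.sin(theta)
    df_dtheta = f_x * (-r * math.sin(theta)) + f_y * (r * math.cos(theta))
    return df_dr, df_dtheta


def polar_partials_numeric(r, theta, h=1e-6):
    """Central differences on g(r, theta) = f(r cos(theta), r sin(theta))."""
    def g(rr, tt):
        return f(rr * math.cos(tt), rr * math.sin(tt))
    dg_dr = (g(r + h, theta) - g(r - h, theta)) / (2 * h)
    dg_dtheta = (g(r, theta + h) - g(r, theta - h)) / (2 * h)
    return dg_dr, dg_dtheta


print(polar_partials(2.0, 0.7))
print(polar_partials_numeric(2.0, 0.7))
```

### Note that the chain rule version never needs $f$ written explicitly in terms of $r$ and $\theta$; only the numerical check does.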

## Statement of Chain Rule

### There are many "chain rules" in multivariable calculus, because of the many different possibilities for the number of input/output variables. Let's look at the statement in some of the most common cases that we already covered. At the end of this page we have included the most general statement of the chain rule.

## Example: From 1 variable to 2 variables to 1 variable

### Let's look at how the chain rule manifests when we have a quantity $z$ that depends on two variables, $x$ and $y$, which each depend on a single variable, $t$. The chain rule says that the (single-variable) derivative of $z$ is given by
## $$ \frac{dz}{dt} = \frac{\partial z}{\partial x} \frac{dx}{dt} + \frac{\partial z}{\partial y} \frac{dy}{dt} $$
### How can we remember this formula? First, we imagine the following diagram, which shows the dependencies between the variables.
![Chain Rule](img/chain-general-1.png)

### Then for each one of the paths from $t$ to $z$, we have a term in the formula for $\frac{dz}{dt}$. By adding up these terms, we obtain the total expression for $\frac{dz}{dt}$.
### We have seen an example of this situation, where there was an output $V=x^2y$ and $x$ and $y$ each depended on a variable $t$.
## Example: From 2 variables to 2 variables to 1 variable
### Now let's look at how the chain rule manifests when we have a quantity $z$ that depends on two variables, $x$ and $y$, which each depend on two variables $a$ and $b$. The chain rule says that the partial derivatives of $z$ with respect to $a$ and $b$ are given by:
## $$ \frac{\partial z}{\partial a} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial a} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial a} $$
## $$ \frac{\partial z}{\partial b} = \frac{\partial z}{\partial x} \frac{\partial x}{\partial b} + \frac{\partial z}{\partial y} \frac{\partial y}{\partial b} $$
### The term $\frac{\partial z}{\partial y} \frac{\partial y}{\partial a}$ in the formula for $\frac{\partial z}{\partial a}$ arises because of the highlighted path from $a$ to $z$ in the following diagram:
![Chain Rule](img/chain-general-2.png)
### In general we see the same pattern as before: to obtain the partial derivative of an output variable with respect to one of the input variables, we sum up each of the paths from input to output and multiply the corresponding partial derivatives.
### We have seen an example of this situation in the polar coordinates section, where a quantity $f$ depended on $x$ and $y$, which each depended on the two variables $r$ and $\theta$.
## More generally: From $n$ variables to $m$ variables to $1$ variable
### Both of the special cases stated above are examples of the following more general statement of the chain rule.
## Theorem (Chain Rule)
### Suppose $z$ is a quantity that depends on $m$ variables, $y_1,\dots,y_m$, and each of the $y$'s depends on the $n$ variables $x_1,\dots,x_n$. Then the derivatives of $z$ are given by:
### For $1\leq i \leq n$
## $$ \frac{\partial z}{\partial x_i} = \sum_{j=1}^{m} \frac{\partial z}{\partial y_j} \frac{\partial y_j}{\partial x_i} $$
### When the theorem is written this way, it is often not clear how to apply it to a given problem. But you can always draw a diagram such as the ones pictured above, and then the theorem just says that the derivative of the output, with respect to one of the inputs, is found by summing up each of the $m$ paths from the input to the output, with each path weighted by the product of the appropriate partial derivatives.
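### The sum over paths translates almost directly into code. The following Python sketch is a hypothetical helper, not a library function: `chain_rule_partial` assumes the partial derivatives have already been evaluated at the point of interest and are passed in as plain lists, and the numbers in the usage lines are made up.

```python
def chain_rule_partial(dz_dy, dy_dx, i):
    """Compute dz/dx_i = sum over j of (dz/dy_j) * (dy_j/dx_i).

    dz_dy: list of m numbers, the partials of z with respect to y_1..y_m.
    dy_dx: list of m rows, each with n numbers; dy_dx[j][i] is dy_j/dx_i.
    i:     0-based index of the input variable x_i.
    """
    return sum(dz_dy[j] * dy_dx[j][i] for j in range(len(dz_dy)))


# Made-up values with m = 2 intermediate variables and n = 2 inputs.
dz_dy = [12.0, 9.0]
dy_dx = [[1.0, 1.0],
         [2.0, 1.0]]
print([chain_rule_partial(dz_dy, dy_dx, i) for i in range(2)])  # [30.0, 21.0]
```

### Each term of the sum corresponds to one path in the dependency diagram: from $x_i$ through $y_j$ to $z$.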
## What about more than one output?
### In each of the above examples, there was just one "output" variable $z$. If there are two or more output variables, say $z_1$ and $z_2$, then you can use the above theorem for each variable separately. The overall rule is the same: we get the derivative of a given output with respect to a given input by summing up all the paths from the chosen input to the chosen output, weighted by the product of the appropriate partial derivatives.
## How to draw the diagrams?
### The diagrams above are sometimes known as “dependency graphs." They encapsulate how each variable depends on the other variables. Once you have a dependency graph, it is straightforward to use the chain rule, since we can look at each of the paths in the diagram and write down a corresponding term for the derivative.


## Generalized Chain Rule
### In its most general form, the Chain Rule is best stated in terms of Jacobian matrices. Namely, if we have transformations $\bf{T}$ and $\bf{W}$ taking input vectors to output vectors:
## $$ \underbrace{\vec{x}}_{n \text{ variables}} \xrightarrow[]{\mathbf{T}} \underbrace{\vec{y}}_{m \text{ variables}} \xrightarrow[]{\mathbf{W}} \underbrace{\vec{z}}_{k \text{ variables}} $$
### Then the ***generalized chain rule*** says the Jacobian matrix of the transform $\bf{W}\circ\bf{T}$ is given by the product of the Jacobian of $\bf{W}$ and the Jacobian of $\bf{T}$.
## $$ \text{(Generalized chain rule)} \qquad \text{Jacobian of } \mathbf{W}\circ\mathbf{T} = (\text{Jacobian of } \mathbf{W}) \cdot (\text{Jacobian of } \mathbf{T}) $$
### Letting $\bf{J_T}$ stand for the Jacobian matrix of the transformation $\bf{T}$ (at the appropriate point), the generalized chain rule says:
## $$ \underbrace{\mathbf{J}_{\mathbf{W}\circ \mathbf{T}}}_{k\times n \text{ matrix}} = \underbrace{\mathbf{J}_{\mathbf{W}}}_{k\times m \text{ matrix}} \cdot \underbrace{\mathbf{J}_{\mathbf{T}}}_{m\times n \text{ matrix}} $$
### Unpacking the statement slightly, the generalized chain rule just says that the linear approximation of a composite transformation can be done one transformation at a time. In other words, to approximate the value of $\vec{z}$ for a given $\vec{x}$, we can first approximate the value of $\vec{y}$ (using the Jacobian of $\bf{T}$) and then use this $\vec{y}$ to make an approximation for the resulting value of $\vec{z}$ (using the Jacobian of $\bf{W}$). The matrix multiplication shows up in the chain rule because matrix multiplication corresponds to composition (applying one transformation, and then the other).
### Another way of understanding the role of matrices is to recognize all of the formulas on this page as "hidden dot products." By scrutinizing each of these dot products, one can package them all into one formula, leading to the matrix product formulation above.
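### The matrix form can also be checked numerically. The Python sketch below is an illustration under assumptions: the transformations `T` (2 inputs to 3 outputs) and `W` (3 inputs to 2 outputs) are made-up examples, their Jacobians are written out by hand, and the helpers `matmul` and `numeric_jacobian` are hypothetical names defined here rather than library calls. It compares the product of the two Jacobians with a finite-difference Jacobian of the composite.

```python
import math


def T(x):
    """Hypothetical transformation from 2 variables to 3 variables."""
    x1, x2 = x
    return [x1 * x2, x1 + x2, math.sin(x1)]


def W(y):
    """Hypothetical transformation from 3 variables to 2 variables."""
    y1, y2, y3 = y
    return [y1 * y2, y2 + y3 ** 2]


def jacobian_T(x):
    """Hand-computed 3 x 2 Jacobian of T."""
    x1, x2 = x
    return [[x2, x1], [1.0, 1.0], [math.cos(x1), 0.0]]


def jacobian_W(y):
    """Hand-computed 2 x 3 Jacobian of W."""
    y1, y2, y3 = y
    return [[y2, y1, 0.0], [0.0, 1.0, 2 * y3]]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def numeric_jacobian(F, x, h=1e-6):
    """Central-difference Jacobian of F at x (rows: outputs, columns: inputs)."""
    columns = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        F_plus, F_minus = F(x_plus), F(x_minus)
        columns.append([(p - m) / (2 * h) for p, m in zip(F_plus, F_minus)])
    return [list(row) for row in zip(*columns)]


x0 = [0.5, 2.0]  # arbitrary test point (assumption)
product = matmul(jacobian_W(T(x0)), jacobian_T(x0))   # J_W at T(x0) times J_T at x0
direct = numeric_jacobian(lambda x: W(T(x)), x0)      # Jacobian of the composite
print(product)
print(direct)
```

### Note that the Jacobian of $\mathbf{W}$ is evaluated at the point $\mathbf{T}(\vec{x})$, which is what "at the appropriate point" means in the statement above.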