# Calculus

### Learning Objectives:
- [Chain Rule](#Chain-Rule)
- [Partial Differentiation](#Partial-Differentiation)

In [2]:
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [3]:
# Calculus warm-up: testing your knowledge!

def x_squared(x_vals):
    return x_vals**2

def deriv(input_func, x_vals, dx=0.0001):
    dy = np.array([])
    for x_val in x_vals:
        current_dy = ##
        dy = ##
    return dy


# Test Code
x_vals = np.linspace(-5, 5, 1000)
y_vals = ##
dy_vals = deriv(x_squared, x_vals)

fig = go.Figure()
fig.add_trace(go.Scatter(x=x_vals, y=y_vals, name="Original Function"))
fig.add_trace(go.Scatter(x=x_vals, y=dy_vals, name="Derivative"))

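
For reference, here is one possible way to fill in the `##` placeholders of the warm-up. It is a sketch rather than the intended solution: it assumes a simple forward difference, $(f(x + dx) - f(x))/dx$, and the `max_error` variable is added here purely as a sanity check.

```python
import numpy as np

def x_squared(x_vals):
    return x_vals**2

def deriv(input_func, x_vals, dx=0.0001):
    # Assumed approach: forward-difference slope estimate at each x value
    dy = np.array([])
    for x_val in x_vals:
        current_dy = (input_func(x_val + dx) - input_func(x_val)) / dx
        dy = np.append(dy, current_dy)
    return dy

x_vals = np.linspace(-5, 5, 1000)
y_vals = x_squared(x_vals)
dy_vals = deriv(x_squared, x_vals)

# The exact derivative of x^2 is 2x; the forward difference differs from it by about dx
max_error = np.max(np.abs(dy_vals - 2 * x_vals))
```

With these lines in place, the plotting code from the cell above can be run unchanged, and the derivative trace should look like the straight line $2x$.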

# Chain Rule

Sometimes, however, the functions we encounter are not as clean as the ones shown above. Often, they are what are known as __composite functions__, functions composed of sub-functions, such as the example below:

$$f(x) = sin(x)^{2}$$

We have a rule for calculating the derivative of $x^{2}$ and the derivative of $sin(x)$, but how do we go about this example? This is where the __chain rule__ comes in. When we don't know how a variable changes with respect to another, we can use the chain rule to determine this quantity by seeing how both quantities vary with some intermediary quantity.

Let us rewrite the equation, by allowing $u = sin(x)$, our intermediary quantity, getting the following:

$$f(u) = y = u^{2}$$

Let us consider an average slope calculation for now. We want to compute the average slope of $y$ with respect to $x$ $(\frac{\Delta y}{\Delta x})$. We can re-write it in the following way:

$$\frac{\Delta y}{\Delta x} = \frac{\Delta y}{\Delta x} \times 1 =\frac{\Delta y}{\Delta x} \times \frac{\Delta u}{\Delta u} = \frac{\Delta y}{\Delta u} \times \frac{\Delta u}{\Delta x}$$

When we make the changes $\Delta y, \Delta x \text{ and } \Delta u$ very, very small, this becomes:

$$\frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dx}$$

So in the case of our example, we get the following:


$$(1) \; \frac{dy}{du} = \frac{d(u^{2})}{du} = 2u = 2sin(x)$$

$$(2) \; \frac{du}{dx} = \frac{d(sin(x))}{dx} = cos(x)$$

$$(3) \; \frac{dy}{dx} = \frac{dy}{du} \times \frac{du}{dx} = 2sin(x)cos(x)$$

What does this mean? We are using the variable $\mathbf{u}$ as an __intermediary function__ to determine the rate of change of $y$ with respect to $x$. If we know the rate of change of $y$ with respect to $u$ $(\frac{dy}{du})$, and we know exactly how $u$ changes with respect to $x$ $(\frac{du}{dx})$, we can use both rates of change to calculate the rate of change of $y$ with respect to $x$. In general, the chain rule for one intermediary variable is given as follows:

$$\frac{d}{dx}f(g(x)) = f'(g(x))g'(x)$$

$$\text{or}$$

$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dx}$$
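
To see the rule at work on $f(x) = sin(x)^{2}$, the sketch below multiplies the two pieces $\frac{dy}{du}$ and $\frac{du}{dx}$ and compares the product with a numerical slope estimate. The function names, the step size `h` and the use of a central difference are choices made for this illustration only.

```python
import numpy as np

def f(x):
    return np.sin(x) ** 2

def chain_rule_derivative(x):
    u = np.sin(x)        # intermediary variable u = sin(x)
    dy_du = 2 * u        # derivative of u^2 with respect to u
    du_dx = np.cos(x)    # derivative of sin(x) with respect to x
    return dy_du * du_dx

x = np.linspace(-3, 3, 601)
h = 1e-5
# Central-difference estimate of df/dx, independent of the chain rule
numerical = (f(x + h) - f(x - h)) / (2 * h)
chain_gap = np.max(np.abs(numerical - chain_rule_derivative(x)))
```

If the chain rule holds, `chain_gap` should be tiny compared with the size of the derivative itself.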

This is, however, not limited to only one intermediary variable, $u$: we can have multiple intermediary variables, and simply apply the same rule as above multiple times. A common approach, informally known as __peeling the onion__, is to differentiate the outermost function first and progressively move inwards. Below we show an example with two intermediary variables:

$$y = cos(x^{2})^{4}$$

Which we can decompose as follows:
$$ y = u^{4}, \;\; u = cos(v), \;\; v = x^{2} $$

We find the respective derivatives:
$$ \frac{dy}{du} = 4u^{3}, \;\; \frac{du}{dv} = -sin(v), \;\; \frac{dv}{dx} = 2x $$

Then we can determine our desired derivative:
$$\frac{dy}{dx} = \frac{dy}{du}\frac{du}{dv}\frac{dv}{dx} \\ = 4u^{3} \times (-sin(v)) \times 2x  \\ = -4cos(v)^{3}sin(v)2x \\ = 
-8x\,cos(x^{2})^{3}sin(x^{2})    $$
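
As a quick check of the onion-peeling approach for $y = cos(x^{2})^{4}$, the sketch below multiplies the three layer derivatives, including the minus sign that comes from differentiating the cosine, and compares the product with a numerical estimate. The helper names and the step size `h` are assumptions made for this example.

```python
import numpy as np

def y(x):
    return np.cos(x ** 2) ** 4

def dy_dx(x):
    v = x ** 2               # innermost layer
    u = np.cos(v)            # middle layer
    dy_du = 4 * u ** 3       # derivative of u^4
    du_dv = -np.sin(v)       # derivative of cos(v)
    dv_dx = 2 * x            # derivative of x^2
    return dy_du * du_dv * dv_dx

x = np.linspace(-2, 2, 401)
h = 1e-6
# Central-difference estimate of dy/dx for comparison
numerical = (y(x + h) - y(x - h)) / (2 * h)
onion_gap = np.max(np.abs(numerical - dy_dx(x)))
```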

### Chain-rule real-life analogy: COVID-19

Let us consider the following scenario. You are a data scientist trying to use the data you collected during the crisis to try and model the growth of COVID-19. After using certain statistical methods, you were able to determine that, without self-isolation measures or protective face masks, you can model the growth of COVID-19 deaths in early stages with the following function:

$$D(u) = e^{u} = e^{\sqrt{t}}$$
$$u(t) = \sqrt{t}$$

Where $D(u)$ is the number of COVID-19 deaths as a function of time, $t$ is time measured in days and $u(t)$ is an intermediary function.

This means that, according to our model, COVID-19 deaths grow exponentially with respect to the square root of time. But that is not sufficient analysis to base our policies on, so we decide we want to calculate __the rate of change of COVID-19 deaths as a function of time__, or in mathematical terms, $\frac{dD(t)}{dt}$. So how do we go about this? Let us first plot in two separate plots our function $D$ in terms of our intermediary function, $u$, alongside the function $u$ with respect to time, $t$.

In [4]:
# Determining our variables/functions 
t = np.linspace(1, 20, 1000)
u = np.sqrt(t)
D = np.exp(u)

In [None]:
# Visualization Code

fig = make_subplots(rows=1, cols=2)

fig.update_layout(title="COVID Deaths vs Time & Intermediary Function")
fig.update_yaxes(title_text="D(u)", row=1, col=1)
fig.update_xaxes(title_text="u", row=1, col=1)
fig.update_yaxes(title_text="u(t)", row=1, col=2)
fig.update_xaxes(title_text="t", row=1, col=2)

fig.add_trace(go.Scatter(x=u, y=D, marker_color="orange", name="$D(u)=e^{u}$"), row=1, col=1)
fig.add_trace(go.Scatter(x=t, y=u, marker_color="black", name=r"$u(t)=\sqrt{t}$"), row=1, col=2)

fig.update_layout(showlegend=True)
fig.show()

Now, all we need to do is calculate the derivative of each of the two functions plotted above; according to the chain rule, their product gives the rate of change of COVID-19 deaths with respect to time. Using the standard derivatives of $e^{u}$ and $\sqrt{t}$:

$$\frac{dD}{du} = e^{u}$$
$$\frac{du}{dt} = \frac{1}{2\sqrt{t}}$$

$$\frac{dD}{dt} = \frac{dD}{du} \times \frac{du}{dt} = \frac{e^{u}}{2\sqrt{t}} = \frac{e^{\sqrt{t}}}{2\sqrt{t}} $$

And there we have it! With this information, you are now satisfied with your model and can help Boris & co. tackle this crisis.

We have now shown you a nice derivation, a real-life example, but some of you may still be thinking: "Does this actually work? How can I be sure of it?", and I don't blame you, since it is quite an abstract concept. So let us bring back our _deriv(  )_ function that we created earlier. Since that Python function approximates the derivative of any function without the need of these rules and identities, we can use it to test out whether our concept works! Let us work through the example below together and prove to ourselves that our understanding is solid.

In [None]:
# CODING CHALLENGE

def covid_deaths(t):
    return ##

t = np.linspace(1, 20, 1000)
dD_chain_rule = ## (applying the mathematical result we got programmatically)
dD_approx = ## (using deriv( ) function)
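
If you get stuck, the sketch below shows one possible way to complete the challenge. It is not the only valid answer: the `deriv` function here is an assumed forward-difference completion of the warm-up exercise, and `challenge_gap` is an extra variable introduced only to quantify how closely the two results agree.

```python
import numpy as np

def deriv(input_func, x_vals, dx=0.0001):
    # Assumed completion of the warm-up: forward-difference slope at each point
    return np.array([(input_func(x + dx) - input_func(x)) / dx for x in x_vals])

def covid_deaths(t):
    # D(t) = e^{sqrt(t)}
    return np.exp(np.sqrt(t))

t = np.linspace(1, 20, 1000)
# Chain-rule result: dD/dt = e^{sqrt(t)} / (2 sqrt(t))
dD_chain_rule = np.exp(np.sqrt(t)) / (2 * np.sqrt(t))
# Numerical approximation that knows nothing about the chain rule
dD_approx = deriv(covid_deaths, t)

challenge_gap = np.max(np.abs(dD_chain_rule - dD_approx))
```

The two arrays should overlap almost perfectly when plotted, which is what the visualization cell is meant to show.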

In [None]:
# Visualization Code

fig = go.Figure()

fig.update_layout(title="COVID-19 Deaths (Chain Rule vs. Approximation)")
fig.update_yaxes(title_text="dD/dt")
fig.update_xaxes(title_text="t")

fig.add_trace(go.Scatter(x=t, y=dD_chain_rule, marker_color="orange", name="Chain Rule"))
fig.add_trace(go.Scatter(x=t, y=dD_approx, marker_color="black", name="Approximation", line=dict(dash="dash")))

fig.update_layout(showlegend=True)
fig.show()

# Partial Differentiation

So far we have only considered derivatives of functions of one variable. But this is rarely realistic: real-life quantities usually depend on more than one variable. The gravitational pull on you depends on your mass _and_ how far from the Earth you are; how many episodes of Netflix you watch in one day depends on your availability, quality of shows, boredom; and even the motion of flying insects depends on much more than just 'orientation'. This leads us to the realm of __multivariable calculus__, or in more informal terms, the study of how one quantity changes with respect to two or more other quantities.

So how do we determine how one variable changes with respect to another when it depends on multiple other variables? If we hold all other variables constant except for one, we can determine the rate of change with respect to that one variable. This type of derivative is known as a __partial derivative__. Consider the function of two variables below:

$$f(x,y) = 2xy - x^{2}$$

In [None]:
# Visualization Code

def our_func(x, y):
    return 2*x*y - (x**2)

def surface_maker(x_vals, y_vals, func):
    rows = len(y_vals)
    cols = len(x_vals)
    z = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            z[i,j] = func(x_vals[j], y_vals[i])
    return z
            
x, y = np.linspace(0, 1, 100), np.linspace(0, 1, 100)
z = surface_maker(x, y, our_func)
fig = go.Figure(data=[go.Surface(z=z, x=x, y=y, surfacecolor=np.ones_like(z), showscale=False)])
fig.update_layout(title='Multivariable Function', autosize=False,
                  width=1000, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()

If we consider the function at $f(x, y=a)$, for some constant $a$:
$$f(x, y=a) = f(x) = 2ax - x^{2}$$
The derivative with respect to $x$ becomes:
$$\frac{df(x)}{dx} = 2a - 2x$$
The same applies for $f(x, y=b)$, for some constant $b$:
$$f(x, y=b) = f(x) = 2bx - x^{2} \implies \frac{df(x)}{dx} = 2b - 2x$$

We can generalize this to any point along our surface:

$$\frac{\partial f(x, y)}{\partial x} = 2y - 2x$$

This holds for any value of $y$. The '$\partial$' symbol is known as the __curly d__, and represents __partial differentiation__. The same logic can be applied to the partial derivative with respect to $y$. Let us first consider the function at $f(x=a, y)$:
$$f(x=a, y) = f(y) = 2ay - a^{2}$$
The derivative with respect to $y$ becomes:
$$\frac{df(y)}{dy} = 2a$$
Which we can generalize for any given value of $x$:
$$\frac{\partial f(x,y)}{\partial y} = 2x$$
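
The idea of holding one variable fixed translates directly into code. The sketch below estimates both partial derivatives of $f(x,y) = 2xy - x^{2}$ numerically by nudging only one input at a time; the helper names `partial_x` and `partial_y`, the step size `h` and the sample point are assumptions made for illustration.

```python
import numpy as np

def our_func(x, y):
    return 2 * x * y - x ** 2

def partial_x(func, x, y, h=1e-6):
    # Vary x only, keep y fixed
    return (func(x + h, y) - func(x - h, y)) / (2 * h)

def partial_y(func, x, y, h=1e-6):
    # Vary y only, keep x fixed
    return (func(x, y + h) - func(x, y - h)) / (2 * h)

x0, y0 = 0.5, 0.75
df_dx = partial_x(our_func, x0, y0)   # formula predicts 2*y0 - 2*x0
df_dy = partial_y(our_func, x0, y0)   # formula predicts 2*x0
```

At this sample point the formulas give $2y - 2x = 0.5$ and $2x = 1$, and the numerical estimates should agree closely with both.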




So what do partial derivatives mean? Earlier, for one variable, we saw that a derivative tells us the gradient/slope at every given point, or the steepness of the function. In the case of partial derivatives, __we are determining the steepness in the direction of the variable we differentiate the function by.__ Consider the surface plot below:

In [None]:
# Visualization Code

x, y = np.linspace(0, 1, 100), np.linspace(0, 1, 100)
z = surface_maker(x, y, our_func)
fig = go.Figure(data=[go.Surface(z=z, x=x, y=y, surfacecolor=np.ones_like(z), showscale=False)])
# Adding traces!
fig.add_trace(go.Scatter3d(z=[0.3, 0.2375], x=[0.5, 0.75], y=[0.5, 0.5], marker_color="orange", name="x-steepness"))
fig.add_trace(go.Scatter3d(z=[0.3, 0.55], x=[0.5, 0.5], y=[0.5, 0.75], marker_color="orange", line=dict(dash="dash"), name="y-steepness"))
fig.update_layout(title='Multivariable Function', autosize=False,
                  width=1000, height=500,
                  margin=dict(l=65, r=50, b=65, t=90))
fig.show()

In this case, the dashed orange trace represents the direction of $\frac{\partial f}{\partial y}$ and the solid orange trace represents the direction of $\frac{\partial f}{\partial x}$. Now that we understand the mechanisms behind partial differentiation, let us explore the chain rule in multivariable calculus!

### Multivariable Chain Rule

Now that we have covered the meaning of calculus, the chain rule and partial differentiation, we will cover our last topic: __multivariable chain rule__. As the name implies, it is an extension of the chain rule that applies to functions dependent on multiple variables. In this case, __we can have multiple intermediary variables.__ Consider the example below:

$$f(x(t), y(t)) = x + y^{2}$$

$$x(t) = sin(t), \;\;\; y(t) = cos(t)$$

We can compute $\frac{df}{dt}$ by substituting the functions for $x(t)$ and $y(t)$, which results in:

$$f(x(t), y(t)) = sin(t) + cos^{2}(t) \implies \frac{df(x(t), y(t))}{dt} = cos(t) - 2sin(t)cos(t)$$

Something worth noting, however, is that the following formula returns the same result:
$$\frac{df(x(t), y(t))}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = 1\times cos(t) + 2y\times(-sin(t)) = cos(t) - 2sin(t)cos(t)$$ 
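
Both routes to $\frac{df}{dt}$ can also be compared numerically. The sketch below evaluates the sum of the two chain-rule contributions and checks it against a central-difference estimate of the substituted function; the function names and the step size `h` are assumptions made for this example.

```python
import numpy as np

def f_of_t(t):
    # f after substituting x = sin(t) and y = cos(t)
    return np.sin(t) + np.cos(t) ** 2

def chain_rule_df_dt(t):
    y = np.cos(t)
    df_dx, df_dy = 1.0, 2 * y               # partial derivatives of f = x + y^2
    dx_dt, dy_dt = np.cos(t), -np.sin(t)    # derivatives of x(t) and y(t)
    return df_dx * dx_dt + df_dy * dy_dt

t = np.linspace(0, 2 * np.pi, 500)
h = 1e-5
# Central-difference estimate of df/dt
numerical = (f_of_t(t + h) - f_of_t(t - h)) / (2 * h)
multi_gap = np.max(np.abs(numerical - chain_rule_df_dt(t)))
```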

Why does this identity work? Well, if we do not know the rate of change of $f$ with respect to $t$ directly, and we have two intermediary variables dependent on $t$, we need to determine how changing $t$ affects each of those intermediary variables, and how each of them in turn affects $f$; the total rate of change is the sum of the two contributions. This is more clearly visualized in the diagram below:

<img src="images/multi_var_change.png" alt ="calc_tree"
     width="500px" height="500px"/>

This same logic can be applied to functions with any number of intermediary variables, that themselves can have any number of variables! Consider, for instance:

$$z = f(x, y), \;\;\; x = g(s, t), \;\;\; y = h(s, t)$$

We can compute the partial derivatives $\frac{\partial z}{\partial t}$ and $\frac{\partial z}{\partial s}$:

$$\frac{\partial z}{\partial t} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial t}$$

$$\frac{\partial z}{\partial s} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial s}$$
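
To make this concrete, the sketch below picks hypothetical functions, $f(x, y) = x^{2}y$, $g(s, t) = st$ and $h(s, t) = s + t$, none of which come from the text above, and compares the two chain-rule sums with numerical estimates at a single sample point.

```python
import numpy as np

# Hypothetical example functions, chosen only for illustration
def f(x, y):
    return x ** 2 * y

def g(s, t):
    return s * t

def h(s, t):
    return s + t

def z_of(s, t):
    return f(g(s, t), h(s, t))

def chain_rule_partials(s, t):
    x, y = g(s, t), h(s, t)
    dz_dx, dz_dy = 2 * x * y, x ** 2   # partial derivatives of f
    dx_ds, dx_dt = t, s                # partial derivatives of g
    dy_ds, dy_dt = 1.0, 1.0            # partial derivatives of h
    dz_ds = dz_dx * dx_ds + dz_dy * dy_ds
    dz_dt = dz_dx * dx_dt + dz_dy * dy_dt
    return dz_ds, dz_dt

s0, t0 = 1.5, 0.5
eps = 1e-6
# Central differences: vary one of s, t while keeping the other fixed
num_dz_ds = (z_of(s0 + eps, t0) - z_of(s0 - eps, t0)) / (2 * eps)
num_dz_dt = (z_of(s0, t0 + eps) - z_of(s0, t0 - eps)) / (2 * eps)
dz_ds, dz_dt = chain_rule_partials(s0, t0)
```

Each sum has one term per path from $z$ down to the variable of interest, which is exactly what a tree of variables makes visible.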

However, as you can see, the more intermediary variables, as well as layers of intermediary variables we have, the harder to visualize it becomes. Therefore we can use a __tree diagram__ to represent the network of variables, as shown below for our example:

<img src="images/calculus_tree.png" alt ="calc_tree"
     width="500px" height="500px"/>