# Introduction to Derivatives

## Motion and its paradoxes

### Types of quantities involved in motion

#### Scalar vs Vector quantities

Let's talk about movement. Imagine that we are riding a bicycle, and we are at the point $A$ in space. We wish to reach the point $B$ in space. 

<img src="img/derivatives-01.png" alt="drawing" width="400"/>

Two types of quantities are involved in this representation of our future movement. One is geometrical, called __displacement__, and it describes the movement from $A$ to $B$ in space (in our case this is the 2D space of the screen where $B$ is to the right of $A$). Displacement is a __vector quantity__, which is usually represented with an arrow. All the displacements represented in the image below are different from each other, as they have different directions.

<img src="img/derivatives-02.png" alt="drawing" width="400"/>

If we disregard the direction of the movement, we are talking about __distance__. Distance abstracts from the displacement by focusing only on how far you go (in any direction), or how far things are apart. Distance is a __scalar quantity__ which we describe with a number. Each vector quantity also involves a scalar quantity. The scalar quantity of a vector (of an arrow) is its length. On the image below, our vector's direction is indicated with an arrow pointing from $A$ to $B$. Its length is 300m, a scalar quantity corresponding to the __distance__ between points $A$ and $B$.

<img src="img/derivatives-03.png" alt="drawing" width="400"/>

#### Time

Getting from $A$ to $B$ cannot happen in an instant but requires some time to pass. Thus the idea of motion couples time and space. We describe it with a ratio, whose elements can be vector or scalar quantities. If we take into account the direction in which we are travelling, then we make a ratio of displacement over time. Since the displacement is a vector quantity, the ratio between displacement and time is also a vector quantity, called __velocity__. If we disregard the direction in which we are travelling, we should take a ratio of distance over time, called __speed__. Unlike velocity, speed is a scalar quantity.<br>
Let's try to describe an interaction between space and time through motion. We will be using scalar quantities: distance, time and speed.

### Average speed

If our ride from $A$ to $B$ took 20 seconds and $A$ and $B$ are 300 meters apart, the only thing we can express in terms of speed is the average speed of our ride, which is 300m per 20s. As we have no other information about our trip, the graph displaying distance over time is a straight line passing through the origin. The consequence of this linearity is that the average speed of 300 meters in 20 seconds is the same as the average speed of 150 meters in 10 seconds. Although both of these ratios are valid ways to describe speed, a different convention is usually used. We express this ratio as a __rate__, which means that the denominator of the ratio should be one unit of quantity. To express speed as a rate, we need to find the distance for which the time is 1s. So we divide 300 meters by 20 seconds and find that the average speed between $A$ and $B$ is 15 meters per (one) second, or 15m/s. We can say that the average __rate of change__ of distance over our ride is 15m/s.
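The arithmetic above can be sketched in a few lines of Python. The variable names are chosen for this illustration only.

```python
# Values taken from the ride described above.
distance_m = 300   # distance between A and B in meters
time_s = 20        # duration of the ride in seconds

# Average speed expressed as a rate: meters per one second.
average_speed = distance_m / time_s
print(average_speed)  # 15.0

# Because the distance-time graph is a straight line through the origin,
# half the distance in half the time gives the same rate.
print(150 / 10 == average_speed)  # True
```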

<img src="img/derivatives-04.png" alt="drawing" width="500"/>

If we plot the average speed in meters per second (m/s) over the time of our ride, the speed becomes a horizontal line with a constant value of 15m/s over the whole time interval.

<img src="img/derivatives-05.png" alt="drawing" width="500"/>

### Average speed in different intervals

If we had available some kind of an instrument that could help us to precisely capture the relation between time and distance of our ride, its graph would not be a straight line but rather look something more like this:

<img src="img/derivatives-06.png" alt="drawing" width="500"/>

The average speed considered on the interval between $A$ and $B$ remains the same, as we still traverse 300 meters in 20 seconds. However, the curvature tells us that if we concentrate on smaller intervals, our speed within these 20 seconds varies. Our bicycle at first does not move at all; then it speeds up, covering around 20 meters in the first 5 seconds; then it accelerates further, covering more than 100 meters in 5 seconds; finally, it slows down before stopping completely.

By considering different time intervals within our ride from $A$ to $B$, we can characterise the same ride by several different average speeds. For example, if we subdivide our ride into 4 time intervals, we will obtain 4 values of average speed, each expressible in meters per second.

<img src="img/derivatives-07.png" alt="drawing" width="500"/>

If we isolate the grey triangles from the graph above and focus on the lines that pass through their diagonals, we get another interesting insight. The greater the angle the diagonal line makes with the horizontal axis, the greater the average speed. Furthermore, the ratio between the distances and the fixed time interval gives us the __slope__ of the line, which is usually described as $m=\frac {rise}{run}$. The larger the slope, the greater the speed and vice versa.

<img src="img/derivatives-08.png" alt="drawing" width="500"/>

By dividing the distance by time for each interval (by calculating slope), we obtain an average speed for each interval. We can plot our 4 average speed values over the time of our ride and achieve a better approximation than what we had previously. 

<img src="img/derivatives-09.png" alt="drawing" width="500"/>

Here it is clear that in the first and the last 5 seconds of our ride, our speed was considerably lower than in the middle of the ride. 
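The same computation can be sketched in Python. The distance readings below are hypothetical values, chosen only to be roughly consistent with the description above (about 20 meters in the first 5 seconds, 300 meters in total); they are not read off the plots.

```python
# Hypothetical distance readings (meters) at t = 0, 5, 10, 15 and 20 seconds.
# These numbers are assumptions for illustration, not measured data.
times = [0, 5, 10, 15, 20]
distances = [0, 20, 150, 280, 300]

average_speeds = []
for i in range(len(times) - 1):
    rise = distances[i + 1] - distances[i]   # change in distance
    run = times[i + 1] - times[i]            # change in time
    average_speeds.append(rise / run)        # slope = rise / run

print(average_speeds)  # [4.0, 26.0, 26.0, 4.0]
```

The average over the whole ride is still 300/20 = 15m/s, but the four slopes show how the speed varied along the way.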

### Improving the approximation

The more we subdivide our 20-second time interval, the more average speed values we obtain. In the image below, we have subdivided it into 20 one-second intervals. Notice that all the values we obtain are expressed in m/s: the same unit is used to represent the average speed over the whole ride and the average speed over a very short interval. The more time intervals we have, the more updates of the average speed we obtain.

<img src="img/derivatives-10.png" alt="drawing" width="500"/>

Notice how the height of the triangle changes, increasing the angle of the diagonal, while the width of the base remains constant. This indicates a change in the average speed over fixed time intervals.

<img src="img/derivatives-11.png" alt="drawing" width="550"/>

With the time interval of 1 second, we are getting a much better approximation of how fast we are riding around each moment in time. Although our graph is still composed of straight lines, we can imagine that if we were to reduce the interval even further, our graph would better approximate a smooth curve.
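To experiment with finer and finer subdivisions, we need a distance value for every moment in time. The sketch below assumes a smooth S-shaped model of the ride, `ride_distance`, which is a hypothetical stand-in and not the curve shown in the plots.

```python
def ride_distance(t):
    """Hypothetical smooth model of the ride: 0 m at t=0 s, 300 m at t=20 s."""
    u = t / 20                       # fraction of the ride time elapsed, 0..1
    return 300 * (3 * u**2 - 2 * u**3)


def average_speeds(n_intervals, total_time=20):
    """Average speed (m/s) on each of n_intervals equal sub-intervals."""
    dt = total_time / n_intervals
    speeds = []
    for i in range(n_intervals):
        start, end = i * dt, (i + 1) * dt
        speeds.append((ride_distance(end) - ride_distance(start)) / dt)
    return speeds


for n in [1, 4, 20]:
    speeds = average_speeds(n)
    print(n, "intervals, fastest average speed:", round(max(speeds), 2), "m/s")
```

In this model the fastest average speed grows as the intervals shrink, because a shorter interval around the middle of the ride averages over less of the slower parts.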

<img src="img/derivatives-12.png" alt="drawing" width="500"/>

### Towards instantaneous speed

Now, things are becoming interesting. You might already have an idea that if you take a really short interval, for example, 1/100 of a second, this would be __almost__ the same as your instantaneous speed at that point in time. This is true: the more you subdivide the time (and thus obtain more intervals), the better the approximation you achieve of your so-called __instantaneous speed__.

At first, this seems to be a reasonable way of thinking. But on closer inspection, you might be confronted with a very challenging question:
> *Can there be such a thing as an instantaneous speed?*

From our current perspective of thinking about speed as an average between two points in time, the idea of an instantaneous speed introduces a paradox. 
We described speed as a rate of change, and for something to change, time needs to pass. But instantaneous, on the contrary, means that we are interested in a single point in time, not an interval, which requires at least 2 points in time! Also, a single point has no length, which is the same as zero length. So to find this so-called instantaneous speed, we would need to divide the distance by zero time, which mathematics does not allow us to do. On the other hand, our intuition tells us that if we move, there should be a speed associated with our movement at any moment in time.

## Infinitesimal calculus

Before the 17th century, this paradox was insurmountable in terms of computation. The tools mathematicians had at their disposal were not sufficient to deal with this and similar paradoxes. However, in the second half of the 17th century, Isaac Newton and Gottfried Leibniz realised that there is a different way to think about paradoxes like this. They introduced a purely symbolic quantity which stands for something larger than zero but smaller than any number that you can concretely specify. They called it an __infinitesimal__. When running into problems that involve values that eventually end up leading to zero, one could use this symbolic quantity to represent them, and thus bypass the paradox and the division by zero. The invention of this quantity opened up new ways to think about modelling the world around us, and many of our fundamental concepts, such as speed, changed dramatically. Their study became known as __infinitesimal calculus__, and its methods allow us to address concepts such as instantaneous speed. It allows us to consider a change within a symbolically constructed virtual interval which is as close as possible to zero, but not zero. Calculus allows us to consider the idea of instantaneous speed as meaningful and operable.

Let's see how we can introduce the idea of an infinitesimal to our previous example. First, we will zoom into the marked region of the graph below. Notice that the more we zoom in (which is the same as choosing a smaller interval), the more the red curve appears to straighten out. Let's pick a point in time within our interval. Instead of using a specific point like 11.6 seconds we will use a variable $t$, meaning that $t$ will serve as a placeholder for _any_ value of time on our graph, and we can slide it left and right as we wish.

For our time value $t$, there is a corresponding distance value. Since $t$ is not a specific value but a variable (its value can vary), the corresponding distance must also be a variable, one that varies in accordance with the value of time at $t$. We could name this variable $s$ or use any other symbol, but let's call it $s(t)$ for two good reasons. The first is that $s(t)$ clearly indicates that the time and distance are coupled together. The second reason is that in mathematics $s(t)$ means that the distance is represented as a __function of time__, so that for each time value, there is a corresponding distance value. This representation allows us to use any function we wish to model the relation between distance and time, so we can now put aside the imaginary instrument that allowed us to capture the time and distance of our ride.

<img src="img/derivatives-13.png" alt="drawing" width="900"/>

### Numerical approximation

Now, if we wanted to numerically approximate the speed __around__ the point $t$, we could do the following:
- choose a small time interval, for example, 0.001s (marked with a purple line on the illustration above)
- add it to the value of $t$ so that we now have two points in time: 
    - the first point in time is $t$ 
    - the second point in time is 0.001 seconds later which can be described as $t+0.001$
- find the corresponding distance value for both points in time:

    - for the time value of $t$, the corresponding distance value is $s(t)$. This will be some concrete number.
    - for the time value of $t+0.001$, the corresponding distance value is $s(t+0.001)$ This will also be some concrete number.

Now we are interested in how much the distance value changed as the time passed from the moment $t$ to the one 0.001 seconds after it. To find this, we can simply subtract the distance at the moment of time $t$ (before) from the distance at the moment of time $t+0.001$ (later):
\begin{eqnarray} \\ s(t+0.001)-s(t) 
\end{eqnarray}

This value corresponds to the green line in the drawing above.

To approximate the speed around the moment in time $t$, we need to divide the change in distance by the corresponding change in time. We have already specified that our change in time is 0.001s. Therefore, our approximate speed around $t$, given the approximation interval of 0.001 seconds, is equal to:

\begin{eqnarray} \ 
\frac{s(t+0.001)-s(t)}{0.001}
\end{eqnarray}


The value in the denominator defines the change in time, while the expression in the numerator expresses the corresponding change in distance.
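The recipe above can be sketched as a small Python function. The name `approximate_speed` and the example distance function `example_distance` are hypothetical, introduced only for this illustration.

```python
def approximate_speed(s, t, interval=0.001):
    """Approximate the speed around time t.

    s        -- a function returning the distance (m) for a given time (s)
    t        -- the moment in time we are interested in
    interval -- the small, non-zero time step (0.001 s by default)
    """
    change_in_distance = s(t + interval) - s(t)   # numerator
    change_in_time = interval                     # denominator
    return change_in_distance / change_in_time


def example_distance(t):
    """Hypothetical ride at a constant 15 m/s, used only to try the function."""
    return 15 * t


print(approximate_speed(example_distance, 11.6))
```

For a ride at constant speed the result should be very close to 15, up to floating-point rounding.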

#### A concrete example of numerical approximation

Before we go further, let's see a short example of how numerical approximation works. For the sake of simplicity, let's say that our function relating time and distance is a simple parabola described by the formula $s(t)=t^2$. This means that in 1 second, we will traverse 1 meter, in 2 seconds 4 meters, in 4 seconds 16 meters and so on. This is not very realistic but it is easy to compute and illustrate.

<img src="img/derivatives-14.png" alt="drawing" width="500"/>

Here we create a python function `s(t)` that outputs the value of $s(t)=t^2$ for any input $t$:

In [1]:
def s(t):
    return t*t

For the input value of 5, the function will output the squared value, which should be 25:

In [3]:
s(5)

25

Let's introduce a variable `interval` which will hold the current time interval we are considering. Let us first set its value to 3 seconds.

In [5]:
interval = 3 #interval in seconds

To approximate speed around the point in time $t=4$ we simply need to compute `( s(4+interval) - s(4) ) / interval` which is the same as:

\begin{eqnarray} \ 
\frac{s(4+3)-s(4)}{3}
\end{eqnarray} 

In [6]:
(s(4+interval) - s(4)) / interval

11.0

At the moment in time $t=4s$, considering the time interval of 3 seconds, we obtain the approximate speed around $t$ to be 11m/s. Now, let's make the interval smaller and smaller, and see what happens to the speed approximation.

In [7]:
for interval in [3, 2, 1, 0.1, 0.01, 0.001, 0.0000001]:
    speed = (s(4+interval)-s(4)) / interval
    print (f'for the time interval of {interval}s, the approximate speed is {speed}m/s ')

for the time interval of 3s, the approximate speed is 11.0m/s 
for the time interval of 2s, the approximate speed is 10.0m/s 
for the time interval of 1s, the approximate speed is 9.0m/s 
for the time interval of 0.1s, the approximate speed is 8.099999999999987m/s 
for the time interval of 0.01s, the approximate speed is 8.009999999999806m/s 
for the time interval of 0.001s, the approximate speed is 8.0010000000037m/s 
for the time interval of 1e-07s, the approximate speed is 8.000000129015916m/s 


Notice that as the time interval becomes smaller, the approximate speed gets closer and closer to 8m/s.
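The trailing digits in the output above (for example 8.099999999999987 instead of 8.1) come from floating-point rounding, not from the method itself. As a minimal sketch, the same loop can be repeated with exact fractions from Python's standard `fractions` module. The function `s` is redefined here so that the block is self-contained.

```python
from fractions import Fraction


def s(t):
    return t * t


for interval in [Fraction(3), Fraction(1), Fraction(1, 10), Fraction(1, 1000)]:
    speed = (s(4 + interval) - s(4)) / interval
    # With exact arithmetic the approximation is exactly 8 + interval.
    print(f'interval = {interval}s, approximate speed = {speed}m/s')
```

For this particular function the approximation overshoots 8 by exactly the size of the interval, which is why it gets closer to 8 as the interval shrinks.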

However, if we reduce our interval to zero, we cannot compute the speed anymore as it would mean dividing by zero.

\begin{eqnarray} \ 
\frac{s(t+0)-s(t)}{0} = \frac{s(t)-s(t)}{0} = \frac {0}{0}
\end{eqnarray}

If we set the variable `interval` to `0`, Python raises an error when it evaluates the expression:

In [8]:
interval = 0
(s(4+interval)-s(4)) / interval

ZeroDivisionError: division by zero

With these two cases in mind, we can finally properly address the question at stake in this case:<br>
<br>
__Considering that our time interval cannot be zero, what would be _the best possible approximation_ of the speed around the moment in time $t$?__

## Derivative

### Symbolising the idea of infinitely small

To find the best approximation, we can no longer use a concrete value, such as 0.001s or 0.0000001s, to represent the small change in time. First, we need to turn this value into a variable. For historical reasons, we will start by using the symbol $dt$ to represent a very small value that is larger than 0 but smaller than any decimal number we can concretely specify.

Now, we have two values in time $t$ and $t+dt$ with the corresponding values of distance $s(t)$ and $s(t+dt)$. <br>
To compute the speed around the point $t$, we need to divide the change in the distance by the change in time that caused it:

\begin{eqnarray} \ 
\frac{s(t+dt)-s(t)}{dt}
\end{eqnarray}

This formula is illustrated below, and it simply means dividing the length of the green line by the length of the purple line.

<img src="img/derivatives-15.png" alt="drawing" width="350"/>

This formula __almost__ allows us to compute the best approximation of speed around the point $t$, but it's not quite there yet.

### The idea of a variable approaching a value

In its historical context, the $d$ in $dt$ has a special meaning which allows us to transition from an approximation of some value towards its precise value. This transition is what calculus is all about.

The idea of this historical $dt$ is that it is a __dynamic__ variable in the sense that we can imagine it moving towards a certain value. In our case where we need the time intervals to become smaller and smaller (as this improves the approximation of the speed), we can let the value of the time interval $dt$ move towards zero (purple line below). Since $dt$ is now getting closer and closer to zero but never actually reaching it, we say that the infinitesimal __$dt$ is approaching zero.__

![SegmentLocal](img/derivatives-16.gif "segment")

Now, let's see how this dynamism changes our equations. First, let's look once again at our expression for approximating the speed at the point in time $t$:

\begin{eqnarray} \ 
\frac{s(t+dt)-s(t)}{dt}
\end{eqnarray}

Written like this, our expression still has a rather static character.

By defining $dt$ as a dynamic variable that approaches zero, the expression in the numerator $s(t+dt)-s(t)$, whose value is directly dependent on the value of $dt$, needs to become a dynamic variable too! For the same reason that we have named the "tiny" change in time $dt$, we can now define $ds$ (green line) as the corresponding "tiny" change in distance $s(t+dt)-s(t)$.

__The main point to understand here is that while $dt$ is approaching zero, $ds$ will (in most cases) also approach some value!__ The value that the ratio between the tiny change of distance $ds$ and the tiny change of time $dt$, symbolised as $\frac{ds}{dt}$, __is approaching__ as $dt$ approaches zero is called the __derivative__ of the function $s(t)$. The derivative of the function $s(t)$ gives us, in fact, _the best possible approximation of the speed_ at any single point in time $t$!

#### Caution!

If we now wished to write down mathematically what derivative is, we might be tempted to write the previous statement as:

\begin{eqnarray} \ 
\frac{ds}{dt} = \frac{s(t+dt)-s(t)}{dt}
\end{eqnarray}

but the formula above is not what the derivative is! The formula above is an equation. The equation states what the ratio $\frac{ds}{dt}$ is __equal to__. However, the derivative is not about the equality of two expressions!

### The true derivative of the function $s(t)$

__The true derivative of the function $s(t)$ is what the ratio $\frac{ds}{dt}$ is approaching while $dt$ is approaching zero.__ It is about what one thing (the ratio $\frac{ds}{dt}$) is approaching while another thing ($dt$) is approaching something else, given that the first thing depends on the second. As you can see, this idea is quite intricate. You might also now see another difficulty more clearly: so far we have no means to precisely write down the "approaching" aspect of the derivative. To take it even further: although we might have a pretty clear idea of what a variable approaching some value means, it is something entirely different to define it mathematically.

The idea of having a variable approaching some value was mathematically formalised in 1821 by Augustin-Louis Cauchy, who was followed by Karl Weierstrass. They introduced a notion of a function's __limit__ which allows us to precisely write down the derivative of any function. To see how they achieved this you can check out the notebook 'Epsilon-delta'.

For our current purposes, having an idea of a variable approaching some value is enough, but let's see how we can describe this idea rigorously by using the __limit notation__. In our case, the derivative of the function $s$ is rigorously written as:

\begin{eqnarray} \ 
\frac{ds}{dt} \; \equiv \; \lim_{dt\to 0}\frac{s(t+dt)-s(t)}{dt}
\end{eqnarray}

We read this as: "The derivative of the function $s$ in terms of $t$ is the limit of $\frac{s(t+dt)-s(t)}{dt}$ as $dt$ approaches zero." <br> This has the same meaning as: "The derivative of the function $s$ in terms of $t$ is what $\frac{s(t+dt)-s(t)}{dt}$ is approaching as $dt$ approaches zero."
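The limit does not depend on the side from which $dt$ approaches zero. As a rough numerical illustration (not a proof), the sketch below evaluates the expression under the limit for $s(t)=t^2$ at $t=4$, using both positive and negative values of $dt$. The helper name `difference_quotient` is an assumption made for this example.

```python
def s(t):
    return t * t


def difference_quotient(f, t, dt):
    """The expression under the limit: (f(t + dt) - f(t)) / dt, for dt != 0."""
    return (f(t + dt) - f(t)) / dt


for dt in [0.1, 0.01, 0.001, -0.001, -0.01, -0.1]:
    print(f'dt = {dt}: quotient = {difference_quotient(s, 4, dt)}')
```

Whether $dt$ is slightly above or slightly below zero, the printed values cluster around the same number.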

There is another notational change that we might want to adopt. The infinitesimal was historically considered to be a special kind of variable, one that, unlike the others, approaches but never reaches some value. For that reason, every time we used $dx$ or $dt$ in a concrete equation and computed with it, it would change the meaning of our typical equation into something dynamic, something of a different nature. The epsilon-delta definition of a limit and the limit notation change this. By using limit notation we state beforehand that we are no longer working with a typical static equation. The expression written below the $\lim$ (in our case $dt\to 0$) tells us which variable in our expression will be approaching what value. In this way, the dynamic aspect of the infinitesimal is taken away from the actual variable and becomes an instrument of the limit itself. For this reason, we no longer need to use a variable such as $dt$ when using limit notation. We can, in fact, use any variable! Let's rewrite the same definition by using the "normal" variable $h$:

\begin{eqnarray} \ 
\frac{ds}{dt} \; \equiv \; \lim_{h\to 0}\frac{s(t+h)-s(t)}{h}
\end{eqnarray}

By making this change, we can now think of the expression $\frac{ds}{dt}$ on the left as an _operator_. This operator indicates that we are interested in a derivative of the function $s$ in terms of $t$. The operator itself maps one function to another function which is a derivative of it. The operator itself can be written in more than one way, and they are all equivalent. 

\begin{eqnarray} \ 
\frac{ds}{dt} \; \equiv \; \frac{d(s)}{dt} \; \equiv \; \frac{d}{dt}s(t) \; \equiv \; s'(t)
\end{eqnarray}

Later on, we will see how thinking of a symbol such as $\frac{ds}{dt}$ as an operator allows us to simplify many calculations.

#### A concrete example of computing a derivative at a point

Let's say that our function that relates time and distance is again the parabola $s(t)=t^2$. Let's find its derivative at the moment $t=4$:

\begin{align*} 
\require{cancel}
\frac{ds}{dt}(4)&=\lim_{h\to 0}\frac{s(4+h)-s(4)}{h} \\\\
\frac{ds}{dt}(4)&=\lim_{h\to 0}\frac{(4+h)^2-4^2}{h} \\\\
\frac{ds}{dt}(4)&=\lim_{h\to 0}\frac{(4^2+2 \cdot 4 \cdot h+ h^2)-16}{h} \\\\
\frac{ds}{dt}(4)&=\lim_{h\to 0}\frac{\cancel{16} + 8h + h^2-\cancel{16}}{h} \\\\
\frac{ds}{dt}(4)&=\lim_{h\to 0}\frac{h(8 + h)}{h} \\\\
\end{align*}

In a normal equation, we wouldn't be able to simply cancel the two $h$'s, because when $h=0$, the expression would be undefined. However, in our case _$h$ is not and will never be equal to zero_, thus we can cancel the $h$'s as we would any other non-zero number.


\begin{align*} 
\require{cancel}
\frac{ds}{dt}(4)&=\lim_{h\to 0}\frac{\cancel{h}(8 + h)}{\cancel{h}} \\\\
\frac{ds}{dt}(4)&=\lim_{h\to 0} (8 + h) \\\\
\end{align*}

Now comes the most important part, which is characteristic of calculus. To evaluate the limit, the question we should be asking is __what will the expression $(8+h)$ approach as $h$ approaches zero?__ As $h$ is becoming smaller and smaller but not zero, $(8+h)$ is becoming closer and closer to 8. Therefore we can say that $(8+h)$ is approaching 8, or that the limit of $(8+h)$ as $h$ approaches zero __is equal to 8.__ Once again: the expression $(8+h)$ will never be equal to 8, it will only approach 8. Since the expression approaches 8, the limit is equal to 8. These two statements are equivalent.


\begin{align*} 
\require{cancel}
\frac{ds}{dt}(4)&=\lim_{h\to 0} (8 + \cancelto{0}{h}) \\\\
\frac{ds}{dt}(4)&=8
\end{align*}

By using calculus, we have computed the __instantaneous__ speed at the point $t=4$ to be 8m/s. It is equal to the _derivative_ of the function $s(t)$, $\frac{ds}{dt}$, evaluated at $t=4$.

### Computing a function's derivative

We have just used the derivative to compute the instantaneous speed at the point $t=4$ for the function defined as $s(t)=t^2$. What the derivative allows us to do is much more powerful than this. It allows us to compute a function that gives us the instantaneous speed for any value of time $t$! Therefore, our input will be a function, and the derivative will give us another function in return. Let's see how this works.

Again we are starting with the function $s(t)=t^2$:

\begin{align*} 
\require{cancel}
\frac{ds}{dt}&=\lim_{h\to 0}\frac{s(t+h)-s(t)}{h} \\\\
\frac{ds}{dt}&=\lim_{h\to 0}\frac{(t+h)^2-t^2}{h} \\\\
\frac{ds}{dt}&=\lim_{h\to 0}\frac{(t^2+2th+ h^2)-t^2}{h} \\\\
\frac{ds}{dt}&=\lim_{h\to 0}\frac{\cancel{t^2} + \cancel{h}(2t+h)-\cancel{t^2}}{\cancel{h}} \\\\
\frac{ds}{dt}&=\lim_{h\to 0}(2t+h) \\\\
\end{align*}

Again, at this stage we need to ask, __what will the expression $(2t+h)$ approach as $h$ approaches zero?__ As $h$ is becoming smaller and smaller (but not zero), $(2t+h)$ is becoming closer and closer to $2t$. Since $(2t+h)$ is approaching $2t$, the limit of $(2t+h)$ as $h$ approaches zero is equal to $2t$. Therefore, we can write our derivative expression as:

\begin{align*} 
\frac{ds}{dt}&=2t \\\\
\end{align*}

Since our output function is equal to $2t$, we can now plug any value for time, and get the instantaneous speed at that point!

In [1]:
for i in range(10):
    print (f'the instantaneous speed after {i} seconds is equal to {2*i}m/s')

the instantaneous speed after 0 seconds is equal to 0m/s
the instantaneous speed after 1 seconds is equal to 2m/s
the instantaneous speed after 2 seconds is equal to 4m/s
the instantaneous speed after 3 seconds is equal to 6m/s
the instantaneous speed after 4 seconds is equal to 8m/s
the instantaneous speed after 5 seconds is equal to 10m/s
the instantaneous speed after 6 seconds is equal to 12m/s
the instantaneous speed after 7 seconds is equal to 14m/s
the instantaneous speed after 8 seconds is equal to 16m/s
the instantaneous speed after 9 seconds is equal to 18m/s
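As a sanity check, the derivative function $2t$ can be compared with the numerical approximation used earlier. This is a minimal sketch; the names `ds_dt` and `approximate_speed` are chosen for this illustration.

```python
def s(t):
    return t * t


def ds_dt(t):
    """Derivative of s(t) = t^2, as computed above."""
    return 2 * t


def approximate_speed(f, t, interval=1e-6):
    """Difference quotient over a small, non-zero interval."""
    return (f(t + interval) - f(t)) / interval


for t in range(10):
    exact = ds_dt(t)
    approx = approximate_speed(s, t)
    print(f't = {t}s: derivative = {exact}m/s, approximation = {approx:.6f}m/s')
```

The two columns should agree to several decimal places, with the small remaining difference coming from the non-zero interval and floating-point rounding.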


But now we can conclude more generally. The function $s(t)=t^2$

<img src="img/derivatives-14.png" alt="drawing" width="500"/>

 will have a unique derivative:

\begin{align*} 
\frac{d}{dt}t^2=2t
\end{align*}

which is a linear function whose plot is a straight line passing through the origin.

<img src="img/derivatives-18.png" alt="drawing" width="500"/>

We have computed it once, and now we can reuse it in every future case that involves the derivative of $t^2$!