## Limit

- natural definition
  - given a function $f(x)$, "nudge" the input around a given value $a$
    - as a result, the function value changes
  - limit of $f(x)$ at the point $x = a$: what $f$ approaches as $x$ approaches $a$
- notation: $\lim_{x\to a} f(x) = L$
- mathematical definition
  - gives us a precise way to define "approaching a value"
  - for every positive $\varepsilon$ there exists a positive $\delta$ such that
    - if $0 < |x - a| < \delta$
    - then $|f(x) - L| < \varepsilon$
  - also called the "epsilon-delta" definition
  - what are these numbers? $\varepsilon$ is arbitrary, it only needs to be positive; $\delta$ is chosen to match it
    - it's very useful to make them really small
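
A minimal sketch of the epsilon-delta idea, assuming the example function $f(x) = 2x + 1$ at $a = 1$ with $L = 3$. The helper name `check_epsilon_delta` and the choice $\delta = \varepsilon / 2$ are illustrative assumptions. Sampling points can only support the claim, it does not prove it.

```python
def check_epsilon_delta(f, a, L, epsilon, delta, samples=1000):
    """Return True if |f(x) - L| < epsilon for every sampled x with 0 < |x - a| < delta."""
    for k in range(1, samples + 1):
        nudge = delta * k / (samples + 1)  # strictly between 0 and delta
        for x in (a - nudge, a + nudge):   # nudge to the left and to the right
            if abs(f(x) - L) >= epsilon:
                return False
    return True

f = lambda x: 2 * x + 1

for epsilon in (1.0, 0.1, 0.001):
    delta = epsilon / 2  # works here because |f(x) - 3| = 2|x - 1|
    print(epsilon, delta, check_epsilon_delta(f, 1, 3, epsilon, delta))
```

If $\delta$ is chosen too large (for example $\delta = \varepsilon$), the check fails, which is why $\delta$ has to depend on $\varepsilon$.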

## Limits in Python

- to estimate the limit of a function at a point numerically, apply the idea behind the definition
  - generate several values of $x$ around $a$
    - don't forget to include positive and negative "nudges"
  - print the function values at those points
  
```python
import numpy as np

def get_limit(f, a):
    # nudges of decreasing size: 1, 0.1, ..., 1e-10
    epsilon = np.array([10 ** p for p in np.arange(0, -11, -1, dtype=float)])

    # approach a from the left, then move away from a on the right
    x = np.append(a - epsilon, a + epsilon[::-1])
    y = f(x)

    return y

print(get_limit(lambda x: x ** 2, 3))
print(get_limit(lambda x: x ** 2 + 3 * x, 2))
print(get_limit(lambda x: np.sin(x), 0))
```

## More Limits

- some functions don't have values at certain points
  - but they are defined "around" these points
  - the limit exists even though the function value doesn't  
  $\lim_{x\to 0} \frac{\sin(x)}{x} = 1$
- some limits can be infinite: $\lim_{x\to\infty} x^2 = \infty$
- some functions "jump"
  - the limits "from the left" and "from the right" are different
    - therefore the limit is not defined
    - we say that the function is not continuous at that point
- example: 
  - in this case $f(0) = 0$ but the limit does not exist  
  $f(x) = \begin{cases} -1, & x<0 \\ 0, & x=0 \\ 1, & x>0 \end{cases}$  
  $\lim_{x\to 0^-} f(x) = -1$  
  $\lim_{x\to 0^+} f(x) = 1$
  
  ![Limits](limits.png)
  
  More information: http://xaktly.com/MathLimits.html
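
Both cases can be probed numerically. The sketch below is only an illustration: the names `sinc_ratio` and `step` are made up here, and `step` assumes the third branch of the piecewise function applies for $x > 0$.

```python
import math

def sinc_ratio(x):
    # sin(x) / x is undefined at x = 0 but defined everywhere around it
    return math.sin(x) / x

def step(x):
    # the "jump" function: -1 left of zero, 0 at zero, 1 right of zero
    if x < 0:
        return -1
    if x > 0:
        return 1
    return 0

nudges = [10 ** -p for p in range(1, 8)]  # 0.1, 0.01, ..., 1e-7

print([sinc_ratio(h) for h in nudges])  # values approach 1
print([step(-h) for h in nudges])       # from the left: always -1
print([step(h) for h in nudges])        # from the right: always 1
```

The one-sided values of `step` never agree, no matter how small the nudge, so the two-sided limit at 0 does not exist.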

## Calculus Motivation

- say you want to compute the area of a circle
  - it is $\pi R^2$ but why?
  - remember how you can divide a shape into simpler shapes and sum their areas to get the total area
    - one way: cut it like a cake
    - another way: concentric rings
  - if you "cut" and "straighten" each ring, you will get a trapezoid
    - if your ring is very, very thin, it will actually be close to a rectangle
    ![Trapezoid in Calculus](trapezoid-calculus.png)
    - set the difference to be very, very small: $r_3 - r_2 \rightarrow 0$
    - and you get calculus
- even in this simple example, there are the notions of derivatives and integrals, and even the fundamental theorem of calculus
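
The ring idea can be tried out directly. This is a rough sketch, assuming each straightened ring is treated as a rectangle of length $2\pi r$ (using the inner radius) and height $dr$; the function name `circle_area_from_rings` is hypothetical.

```python
import math

def circle_area_from_rings(R, n_rings):
    """Approximate the area of a circle of radius R by summing thin rings."""
    dr = R / n_rings                   # thickness of every ring
    total = 0.0
    for i in range(n_rings):
        r = i * dr                     # inner radius of the i-th ring
        total += 2 * math.pi * r * dr  # straightened ring ~ rectangle
    return total

for n in (10, 100, 1000, 100000):
    print(n, circle_area_from_rings(1.0, n), math.pi)
```

With more (thinner) rings the sum gets closer to $\pi R^2$.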

## Derivatives and velocity

- we all know that $v = \frac{s}{t}$
  - but it is almost useless
  - traveling is not done at a uniform velocity: it's not a fixed number but a function of time, $v = v(t)$
- instantaneous velocity: $v(t_0) = v(t)\big|_{t=t_0}$
- computing instantaneous velocity from travelled distance
  - say, $s(t)=t^2$, say that we start at $t=0$s and finish at $t=5$s
    - final distance $s(5) = 5^2 = 25$m
  - average speed: $\frac{25}{5} = 5\frac{m}{s}$
  - but we cover different distances in equal time intervals:
    - for $0 \leq t \leq 1$: $s(1) - s(0) = 1 - 0 = 1$ m
    - for $3 \leq t \leq 4$: $s(4) - s(3) = 16 - 9 = 7$ m
    - for $4 \leq t \leq 5$: $s(5) - s(4) = 25 - 16 = 9$ m
    - and none of these matches the average speed
- let's calculate the instantaneous velocity
  - fix time at $t = 3$
  - but how can we move if the time is fixed?
- let's apply the idea of the limits
  - nudge time a tiny bit and see how the distance changes
    - $t=3.01: v \approx \frac{s(3.01) - s(3)}{3.01 - 3} = \frac{3.01^2 - 3^2}{0.01} = 6.01 \frac{m}{s}$
    - $t=3.00001: v \approx \frac{s(3.00001) - s(3)}{3.00001 - 3} = \frac{3.00001^2 - 3^2}{0.00001} = 6.00001 \frac{m}{s}$
  - more generally: if we nudge the time from $t = t_0$ to $t = t_0 + \Delta t$, we will get an approximation of the instantaneous velocity:  
  $v \approx \frac{s(t_0 + \Delta t) - s(t_0)}{t_0 + \Delta t - t_0} = \frac{s(t_0 + \Delta t) - s(t_0)}{\Delta t}$
  - this approximation will get increasingly **more accurate** if $\Delta t$ becomes **smaller**
  - smaller  $\Delta t \Rightarrow $ better approximation of $v$
- how does the velocity behave as $\Delta t \rightarrow 0$?
  - note that we cannot set $\Delta t = 0$: this would freeze time (and divide by zero)
  - math notation: as $\Delta t \rightarrow 0$ we write it as $dt$:  
  $v(t) = \lim_{dt\to 0} \frac{s(t + dt) - s(t)}{dt}$
- **the rate of change of a function $f(x)$ as its argument changes is called the first derivative of $f(x)$**
- geometrically, the derivative at a given point is equal to the slope of the tangent line to the function at that point
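
The shrinking nudge can be reproduced in a few lines. A small sketch, assuming $s(t) = t^2$ as in the example; the name `average_velocity` is chosen here for illustration.

```python
def s(t):
    return t ** 2  # travelled distance

def average_velocity(s, t0, dt):
    # distance covered between t0 and t0 + dt, divided by the elapsed time
    return (s(t0 + dt) - s(t0)) / dt

for dt in (1.0, 0.1, 0.01, 0.001, 1e-5):
    print(dt, average_velocity(s, 3, dt))
```

For this particular $s$ the quotient equals $6 + \Delta t$, so the printed values approach 6 as $\Delta t$ shrinks.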
  

## Calculating Derivatives

- note that "derivative" can mean two things:
  - derivative of $f(x)$ at a fixed point $x$ (e.g. $x = 5$), this is a number
  - derivative of $f(x)$ at any point, this is another function
- calculate the derivative of $3x^2 + 5x - 8$ at $x = 3$
  - we are doing a numerical approximation
  - we can't work with an infinitesimally small $h$ but we can get away with something quite small
  ```python
    def calculate_derivatives(f, a, h = 1e-7):
        return (f(a+h) - f(a)) / h
    
    print(calculate_derivatives(lambda x: 3 * x**2 + 5*x - 8, 3)) 
    #23.00000026878024
  ```
  - we can also do this **analytically**, a fancy term for "with pen and paper"
  
  [Table of Derivatives](http://www.math.com/tables/derivatives/tableof.htm)
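
As a pen-and-paper check of the same example, here is the derivative of $3x^2 + 5x - 8$ worked out from the limit definition (a sketch of one possible route; using the table of derivatives gives the same result):

$$
\begin{aligned}
\frac{f(x+h) - f(x)}{h} &= \frac{3(x+h)^2 + 5(x+h) - 8 - (3x^2 + 5x - 8)}{h} \\
&= \frac{6xh + 3h^2 + 5h}{h} = 6x + 5 + 3h \\
f\prime(x) &= \lim_{h\to 0} (6x + 5 + 3h) = 6x + 5, \qquad f\prime(3) = 23
\end{aligned}
$$

The leftover term $3h$ also explains the numerical result: with $h = 10^{-7}$ the forward difference overshoots 23 by roughly $3 \cdot 10^{-7}$, plus a little floating-point rounding.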

## Properties of derivatives 

- the derivative of a constant ($f(x) = c$) is 0
- derivatives are linear:
  - $(f \pm g)\prime = f\prime \pm g\prime$
  - $(\lambda f)\prime = \lambda f\prime$ for a constant $\lambda$
- product rule
  - $(f.g)\prime = f\prime.g + f.g\prime$
- quotient rule
  - $(\frac{f}{g})\prime = \frac{f\prime.g - f.g\prime}{g^2}$
- derivative of function composition
  - also called the chain rule:
    - $(f(g(x)))\prime = f\prime(g(x)).g\prime(x)$
  - looks better in the other notation:
    - $\frac{df}{dx} = \frac{df}{dg} . \frac{dg}{dx}$
- we can prove these using the geometric intuition or the definition
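
The rules can be sanity-checked numerically. The sketch below assumes the example functions $f(x) = \sin x$ and $g(x) = x^2 + 1$ and a central-difference helper named `derivative`; none of these come from the notes, and a numerical match is a check, not a proof.

```python
import math

def derivative(func, x, h=1e-6):
    # central difference approximation of func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

f, df = math.sin, math.cos   # f and its known derivative
g = lambda x: x ** 2 + 1     # g and its known derivative
dg = lambda x: 2 * x
x = 0.7

# left side: numerical derivative, right side: the rule
product = derivative(lambda t: f(t) * g(t), x)
product_rule = df(x) * g(x) + f(x) * dg(x)

quotient = derivative(lambda t: f(t) / g(t), x)
quotient_rule = (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2

chain = derivative(lambda t: f(g(t)), x)
chain_rule = df(g(x)) * dg(x)

print(product, product_rule)
print(quotient, quotient_rule)
print(chain, chain_rule)
```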

## Higher Order Derivatives

- the second derivative of a function is the first derivative of its first derivative
  - interpretation: "rate of change of the rate of change"
  - ... a.k.a. acceleration
  - notation
    - $f\prime\prime(x) = (f\prime(x))\prime, \frac{d^2 f}{d x^2} = \frac{d}{dx} (\frac{df}{dx})$
- this can be applied arbitrarily many times
  - e.g. rate of change of acceleration: third derivative
  - third, fourth, etc. derivatives; n-th derivative
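
A second derivative can also be approximated numerically. This sketch uses the standard central second difference; the helper name `second_derivative` and the step `h = 1e-4` are assumptions made for the illustration.

```python
def second_derivative(f, x, h=1e-4):
    # central second difference: (f(x+h) - 2 f(x) + f(x-h)) / h^2
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

distance = lambda t: t ** 2  # s(t) = t^2, so the acceleration is 2 everywhere
cubic = lambda t: t ** 3     # second derivative is 6t

print(second_derivative(distance, 3.0))  # close to 2
print(second_derivative(cubic, 2.0))     # close to 12
```

The step is deliberately larger than for first derivatives: dividing by $h^2$ amplifies floating-point rounding when $h$ is very small.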

## Function Extrema

- even if we don't know the function, its derivatives give us useful information
- consider the drawn function
  - the smallest value of $f(x)$ is called a global minimum 
  - conversely, largest value: global maximum
- these are collectively called extrema (plural of extremum)
- smallest / largest value of $f(x)$ in a tiny range: local min / max
- more formally, we say $f(x)$ has a local maximum at, say, $x = 5$ if the function value $f(5)$ is bigger than the function values immediately to the left and right
  - the complete definition involves limits
  - the points $x$ where a min / max occurs (e.g. $x = 5$) are critical points
  
![Function Extrema](function-extrema.png)  

- notice how the tangent line behaves
  - at max / min, $f\prime = 0$
  - around max / min, $f\prime$ changes its sign
- also notice that if $f\prime(x) > 0$ in a given interval, the function increases
  - if $f\prime(x)<0$, the function decreases
- therefore, if f behaves like this:
  - increasing; stop; decreasing => local maximum
  - decreasing; stop; increasing => local minimum 

![Tangent Lines Functions](tangents-lines-functions.png)
  
- the second derivative tells us whether the function is "concave up" or "concave down"
  - more specifically, its sign does: $f\prime\prime > 0$ means concave up, $f\prime\prime < 0$ means concave down
  - these are sometimes called convex and concave functions
  
![Convex and Concave Functions](convex-concave-functions.png)
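
Putting the two observations together gives a simple way to classify a point where $f\prime = 0$. The sketch assumes the example $f(x) = x^3 - 3x$ with hand-computed derivatives; `classify_critical_point` is a hypothetical helper name.

```python
def classify_critical_point(d2f, x):
    """Classify a point where f'(x) = 0 using the sign of the second derivative."""
    curvature = d2f(x)
    if curvature > 0:
        return "local minimum"  # concave up
    if curvature < 0:
        return "local maximum"  # concave down
    return "inconclusive"       # the sign gives no information

# assumed example: f(x) = x^3 - 3x
df = lambda x: 3 * x ** 2 - 3   # f'(x), zero at x = -1 and x = 1
d2f = lambda x: 6 * x           # f''(x)

for x in (-1.0, 1.0):
    print(x, df(x), classify_critical_point(d2f, x))
```

When the second derivative is also zero (for example $x^4$ at $x = 0$), this check says nothing and the sign change of $f\prime$ has to be inspected instead.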

## Integrals
### Area under a Function

- look back to the motivating example
- how can we find the area $S$ "under" a curve given by a function?
  - what is the shaded area ($S < 0$ if $f(x) < 0$)?
- approach: approximate and zoom in
- divide the x-axis into equal intervals $\Delta x$
- approximate the area with trapezoids
  - $S = \sum\limits_i S_i$
- if the intervals in $x$ are really small, the trapezoids will look like rectangles
  - $S_i = f(x_i)\Delta x$
- smaller $\Delta x$ => better approximation 
![Integral](integral.png)
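
The rectangle approximation is easy to try out. A minimal sketch, assuming the example $f(x) = x^2$ on $[0, 3]$ (exact area 9) and taking each rectangle's height at the left end of its interval; `rectangle_sum` is a name chosen here.

```python
def rectangle_sum(f, a, b, n):
    """Approximate the area under f on [a, b] with n rectangles of width dx."""
    dx = (b - a) / n
    return sum(f(a + i * dx) * dx for i in range(n))

f = lambda x: x ** 2

for n in (10, 100, 1000, 10000):
    print(n, rectangle_sum(f, 0, 3, n))  # approaches 9 as dx shrinks
```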


## Integral of a function

- in the limit $\Delta x \rightarrow 0$ we write $dx$, and the sum becomes the definite integral $S = \int_a^b f(x)\,dx$
- indefinite integral: the same as definite, without the end points
  - like the derivative at a point, the definite integral is a number
  - the indefinite integral is a function of $x$
- calculating integrals
  - analytically - very difficult (unlike derivatives)
  - numerically - apply the trapezoidal rule
    - use a small number $dx$, like before 
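
A hand-written version of the trapezoidal rule might look like the sketch below. The name `trapezoid_integral` and the default of 1000 intervals are assumptions for the illustration; numerical libraries ship their own implementations.

```python
import math

def trapezoid_integral(f, a, b, n=1000):
    """Approximate the definite integral of f on [a, b] with n trapezoids."""
    dx = (b - a) / n
    total = 0.5 * (f(a) + f(b))  # the end points count with half weight
    for i in range(1, n):
        total += f(a + i * dx)   # interior points count fully
    return total * dx

print(trapezoid_integral(math.sin, 0, math.pi))     # exact value is 2
print(trapezoid_integral(lambda x: x ** 2, 0, 3))   # exact value is 9
```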

## Antiderivatives

- the antiderivative $F(x)$ of a function $f(x)$ is such a function that $F\prime(x) = f(x)$
  - it's also called the primitive function of $f(x)$
  - note that since the derivative of a constant is zero, there are many antiderivatives: $(F(x) + C)\prime = f(x)$
  - therefore, we can know the antiderivative only up to an arbitrary additive constant
- if we do definite integrals, the + C does not apply - we know the area exactly
- if we do indefinite integrals, we must always add the constant 
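
A short worked example, assuming $f(x) = 2x$, shows why the constant disappears for definite integrals. It uses the fact from the next section that a definite integral is a difference of antiderivative values:

$$
\begin{aligned}
f(x) = 2x \quad &\Rightarrow \quad F(x) = x^2 + C, \quad \text{since } (x^2 + C)\prime = 2x \\
\int_1^3 2x\,dx &= F(3) - F(1) = (9 + C) - (1 + C) = 8
\end{aligned}
$$

Whatever value $C$ has, it cancels in the subtraction.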

## Fundamental Theorem of Calculus

- the indefinite integral of a function is related to its antiderivative and can be reversed via differentiation
- the definite integral of a function can be computed using one of its infinitely many antiderivatives 
- simply, differentiation and integration are inverse operations
- proof: [Khan Academy](https://www.khanacademy.org/math/ap-calculus-ab/ab-integration-new/ab-integration-optional/v/proof-of-fundamental-theorem-of-calculus)
- intuition
  - the sum of infinitesimal changes in a quantity over time adds up to the net change in the quantity
  - think about distance and velocity again
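
The distance and velocity picture can be checked numerically in both directions. This sketch assumes $s(t) = t^2$ and $v(t) = 2t$ from the earlier example; `integrate` is a plain trapezoidal sum and `area_so_far` is a hypothetical helper.

```python
def s(t):
    return t ** 2  # distance

def v(t):
    return 2 * t   # velocity, the derivative of s

def integrate(f, a, b, n=10000):
    # trapezoidal rule with n intervals
    dx = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * dx)
    return total * dx

# integrating the velocity gives the net change in distance
accumulated = integrate(v, 0, 5)
net_change = s(5) - s(0)
print(accumulated, net_change)

# differentiating the accumulated area gives back the velocity
def area_so_far(t):
    return integrate(v, 0, t)

h = 1e-3
recovered_v = (area_so_far(3 + h) - area_so_far(3 - h)) / (2 * h)
print(recovered_v, v(3))
```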

## Generalization

- the notions of derivatives and integrals generalize to more dimensions
- derivatives: take the derivative w.r.t. one variable, treat the other variables as "parameters" -> **partial derivatives**
![Partial Derivatives](partial-derivatives.png)
- integrals: 1D intervals [a;b] can become curves or planes
  - apply the zooming in technique
![Multiple Integral](multiple-integral.png)  
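
Partial derivatives can be approximated the same way as ordinary ones: nudge one variable and keep the others fixed. A sketch assuming the example $f(x, y) = x^2 y + y^3$; the helper name `partial_derivative` is made up for this illustration.

```python
def partial_derivative(f, point, index, h=1e-6):
    """Central difference w.r.t. the variable at position index, others held fixed."""
    forward = list(point)
    backward = list(point)
    forward[index] += h
    backward[index] -= h
    return (f(*forward) - f(*backward)) / (2 * h)

f = lambda x, y: x ** 2 * y + y ** 3
# by hand: df/dx = 2xy and df/dy = x^2 + 3y^2

print(partial_derivative(f, (1.0, 2.0), 0))  # close to 2 * 1 * 2 = 4
print(partial_derivative(f, (1.0, 2.0), 1))  # close to 1 + 3 * 4 = 13
```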

## Gradient Descent

- optimization method
  - used for finding local extrema
- gradient: grad(f) or $\nabla f$
  - a combination of vector and derivative
    - multi-dimensional derivative
    - a vector whose components are the partial derivatives w.r.t. every variable
    - points in the direction of the steepest rise of the function
- if we follow the gradient, we'll arrive at a (local) maximum
  - conversely, following the negative gradient takes us to a (local) minimum
- iterative procedure
  - continue to apply until close enough
- not guaranteed to find global extrema
  - may get "stuck" in a local extremum 
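
In more than one dimension the procedure might be sketched as follows. Everything here is an assumption for illustration: the function name `gradient_descent`, its default parameters, and the example $f(x, y) = (x - 1)^2 + (y + 2)^2$, whose gradient is written out by hand.

```python
def gradient_descent(grad, start, step_size=0.1, precision=1e-8, max_iterations=10000):
    """Follow the negative gradient until the point stops moving noticeably."""
    point = list(start)
    for _ in range(max_iterations):
        g = grad(point)
        new_point = [p - step_size * gi for p, gi in zip(point, g)]
        shift = max(abs(new - old) for new, old in zip(new_point, point))
        point = new_point
        if shift < precision:  # close enough
            break
    return point

# gradient of f(x, y) = (x - 1)^2 + (y + 2)^2, minimum at (1, -2)
grad_f = lambda p: [2 * (p[0] - 1), 2 * (p[1] + 2)]

minimum = gradient_descent(grad_f, [5.0, 5.0])
print(minimum)
```

This example has a single minimum, so the starting point does not matter; for functions with several local minima the result depends on where the iteration starts.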

## Example Gradient Descent

- find a local minimum of a function: $f(x) = x^4 - 3x^3 + 2$
- start at $x=6$

```python
x_old = 0
x_new = 6  # starting point
step_size = 0.01
precision = 0.00001

def df(x):
    # f(x) = x^4 - 3x^3 + 2, so f'(x) = 4x^3 - 9x^2
    return 4 * x ** 3 - 9 * x ** 2

# step against the derivative until x stops changing noticeably
while abs(x_new - x_old) > precision:
    x_old = x_new
    x_new -= step_size * df(x_old)

print("The local minimum occurs at", x_new)
```