# Introduction to Mathematical Differentiation

Differentiation is the action of computing a derivative. The derivative of a function y = f(x) of a variable x is a measure of the rate at which the value y of the function changes with respect to the change of the variable x. It is called the derivative of f with respect to x. If x and y are real numbers, and if the graph of f is plotted against x, the derivative is the slope of this graph at each point.


A rough graphical interpretation of the derivative is shown below.

![The slope of the tangent line at a specified point is the first derivative of the function at that point](img/Tangentcalculus.png)



Rigorous Definition

The most common way to turn this intuitive idea into a precise definition is to define the derivative as a limit of difference quotients of real numbers.

We can interpret the derivative at a specified point as the following limit.

![limit definition of derivative](img/limitderivative.png)
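
For readers who cannot view the image, the standard limit definition can be written in symbols as follows. The symbol $h$ is the small increment in $x$, playing the same role as the variable `h` in the Python example in the next section.

$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$

For $f(x) = x^{2}$ the difference quotient is $\frac{(x+h)^{2} - x^{2}}{h} = 2x + h$, which tends to $2x$ as $h \to 0$.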

# Implementing derivative in Python

Example 1: Simple derivative

$f(x) = x^{2}$

$f'(x) = 2x$

In [4]:
def xSquared(x):
    return x**2

def getDeriv(func, x):
    # Forward-difference approximation of func'(x) using a small step h
    h = 0.0001
    return (func(x+h) - func(x)) / h


# f'(3) = 6.0001

x = 3
derivative = getDeriv(xSquared, x)
actual = 2*x

#derivative, actual = 6.0001, 6

In [7]:
print(derivative, actual)

6.000100000012054 6
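
The small excess over 6 in the output above comes from the one-sided step. As a rough sketch, and not part of the original notebook, a symmetric (central) difference usually reduces that error. The names `x_squared`, `forward_diff`, and `central_diff` are illustrative choices made here.

```python
def x_squared(x):
    return x**2

def forward_diff(func, x, h=1e-4):
    # One-sided estimate: the error shrinks roughly in proportion to h
    return (func(x + h) - func(x)) / h

def central_diff(func, x, h=1e-4):
    # Symmetric estimate: the error shrinks roughly in proportion to h**2
    return (func(x + h) - func(x - h)) / (2 * h)

# Compare both estimates against the exact derivative f'(3) = 6
forward_error = abs(forward_diff(x_squared, 3.0) - 6.0)
central_error = abs(central_diff(x_squared, 3.0) - 6.0)
print(forward_error, central_error)
```

For a quadratic the central difference is exact apart from floating-point rounding, so the second number printed should be far smaller than the first.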


Example 2: Scipy derivative

In [10]:
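# Note: scipy.misc.derivative was deprecated in SciPy 1.10 and removed in 1.12,
# so this cell requires an older SciPy release.
from scipy.misc import derivative
def f(x):
    return x**3 + x**2

derivative(f, 3.0, dx=1e-6)  # f'(3) with step size dx = 0.000001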

33.00000000550085

# Derivative in machine learning

Machine learning uses derivatives to find optimal solutions to problems. Derivatives are useful in optimization algorithms like gradient descent because they tell us whether to increase or decrease our weights in order to maximize or minimize some metric (e.g. loss).

It also helps us model nonlinear functions as linear functions (tangent lines), which have constant slopes. With a constant slope we can decide whether to move up or down the slope (increase or decrease our weights) to get closer to the target value (class label).
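
The tangent-line idea can be illustrated with a small sketch. It reuses $f(x) = x^{2}$ from Example 1, and the helper name `tangent_line` is a hypothetical name introduced here for illustration.

```python
def f(x):
    return x**2

def f_prime(x):
    return 2 * x

def tangent_line(a):
    # Line through (a, f(a)) with slope f'(a)
    return lambda x: f(a) + f_prime(a) * (x - a)

line_at_3 = tangent_line(3.0)

# Compare the curve with its tangent line at x = 3 for nearby and distant points
for x in (3.0, 3.1, 4.0):
    print(x, f(x), line_at_3(x))
```

Close to $x = 3$ the line tracks the curve well, and the gap grows as we move away. Because the slope at 3 is positive, lowering $x$ slightly lowers $f(x)$, which is the kind of decision a weight update relies on.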

# Khan Academy Video on Derivative

https://www.khanacademy.org/math/ap-calculus-ab/ab-derivative-intro/ab-derivative-intuition/v/derivative-as-a-concept

# Gradient

A gradient is a vector that stores the partial derivatives of multivariable functions. It helps us calculate the slope at a specific point on a curve for functions with multiple independent variables. In order to calculate this more complex slope, we need to isolate each variable to determine how it impacts the output on its own. To do this we iterate through each of the variables and calculate the derivative of the function after holding all other variables constant. Each iteration produces a partial derivative which we store in the gradient.

Example of gradient for multivariate function

$f(x,z) = 2z^{3}x^{2}$

$\frac{\partial f}{\partial x}(x,z) = 4z^{3}x$

$\frac{\partial f}{\partial z}(x,z) = 6z^{2}x^{2}$


The partial derivatives are then collected into a vector called the gradient:

![gradient](img/gradient.png)

# Python Implementation of gradient

In [18]:
import numpy as np

f = np.array([1,2,3,5], dtype=float)
print(np.gradient(f))
print(np.gradient(f, 2))

[1.  1.  1.5 2. ]
[0.5  0.5  0.75 1.  ]
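
Note that `np.gradient` differentiates an array of sampled values. It does not take a function of several variables as input. To connect the code back to the worked example $f(x,z) = 2z^{3}x^{2}$, here is a minimal sketch that nudges one variable at a time while holding the other constant. The helpers `analytic_gradient` and `numerical_gradient` are illustrative names, not NumPy functions.

```python
import numpy as np

def f(x, z):
    return 2 * z**3 * x**2

def analytic_gradient(x, z):
    # Partial derivatives taken from the worked example
    return np.array([4 * z**3 * x, 6 * z**2 * x**2])

def numerical_gradient(func, point, h=1e-6):
    # Nudge one variable at a time while holding the others constant
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = h
        grad[i] = (func(*(point + step)) - func(*(point - step))) / (2 * h)
    return grad

point = (1.0, 2.0)
print(analytic_gradient(*point))
print(numerical_gradient(f, point))
```

At the point $(x, z) = (1, 2)$ the two results should agree closely, with the analytic values being 32 and 24.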


# Directional Derivative

Another important concept is directional derivatives. When calculating the partial derivatives of multivariable functions we use our old technique of analyzing the impact of infinitesimally small increases to each of our independent variables. By increasing each variable we alter the function output in the direction of the slope.

But what if we want to change directions? For example, imagine we’re traveling north through mountainous terrain in three dimensions. The partial derivative with respect to the northward coordinate tells us how steep the terrain is in the direction we are heading at our current location. But what if we wanted to travel southwest? How can we determine the steepness of the hills in the southwest direction? Directional derivatives help us find the slope if we move in any chosen direction, not only along a coordinate axis or along the gradient.
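
For a differentiable function, the directional derivative along a unit vector $u$ equals the dot product $\nabla f \cdot u$. The sketch below applies this to the earlier example $f(x,z) = 2z^{3}x^{2}$. The function names and the choice of $(-1, -1)$ as a "southwest" heading are assumptions made for illustration.

```python
import numpy as np

def gradient(x, z):
    # Gradient of f(x, z) = 2 * z**3 * x**2 from the earlier example
    return np.array([4 * z**3 * x, 6 * z**2 * x**2])

def directional_derivative(grad, direction):
    # Normalize so the result is a slope per unit distance travelled
    unit = np.asarray(direction, dtype=float)
    unit = unit / np.linalg.norm(unit)
    return float(np.dot(grad, unit))

grad = gradient(1.0, 2.0)
print(directional_derivative(grad, [1, 0]))    # along the x axis
print(directional_derivative(grad, [-1, -1]))  # a "southwest" heading
print(directional_derivative(grad, grad))      # along the gradient itself
```

The slope along the gradient itself is the largest of the three, and the slope in a direction pointing away from the gradient is negative.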

# Khan Academy Article on Directional Derivative

https://www.khanacademy.org/math/multivariable-calculus/multivariable-derivatives/partial-derivative-and-gradient-articles/a/directional-derivative-introduction
    

# Gradient and Directional Derivative in Deep Learning

Gradients are used in neural networks to adjust weights and optimize cost functions.

Gradient Descent
The relevant concept in deep learning is called gradient descent. In gradient descent, neural networks use the fact that the gradient points in the direction of greatest increase to determine how to adjust the weight attached to each variable (feature) in the model. Rather than moving in that direction, neural networks move in the opposite direction to minimize a loss function, such as Log Loss. After adjusting their weights, neural networks compute the gradient again and take another step in the direction opposite to it.

Adjusting Weights
Neural networks use the concept of directional derivatives to adjust their weights. After computing the gradient of the loss with respect to the current weights, neural networks identify the direction of greatest decrease of the loss function, and then update the current weights by adding a vector (or matrix) that points in that direction, scaled by a step size. The networks can change their weights with varying magnitudes, but the changes must stay proportional to one another to maintain the direction of greatest decrease.
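
The update loop can be sketched on a toy problem. This is a minimal illustration, not how any particular deep learning library implements it. The bowl-shaped `loss`, its minimum at $(3, -1)$, and the `learning_rate` value are all assumptions chosen for the example.

```python
import numpy as np

def loss(w):
    # Hypothetical bowl-shaped loss with its minimum at w = (3, -1)
    return (w[0] - 3.0)**2 + (w[1] + 1.0)**2

def loss_gradient(w):
    # Partial derivatives of the loss with respect to each weight
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 1.0)])

def gradient_descent(w, learning_rate=0.1, steps=100):
    w = np.asarray(w, dtype=float)
    for _ in range(steps):
        # Step against the gradient, scaled by the learning rate
        w = w - learning_rate * loss_gradient(w)
    return w

w_final = gradient_descent([0.0, 0.0])
print(w_final, loss(w_final))
```

Each step changes every weight in proportion to its own partial derivative, so the overall move stays aligned with the direction of steepest decrease.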

# Chain Rule

The chain rule is a formula for calculating the derivatives of composite functions.
Composite functions are functions composed of functions inside other functions.

The relationship between a composite function and the chain rule is illustrated in the following figure.

![Composite Function and Chain Rule](img/compositechainrule.png)
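
In symbols, the chain rule states that the derivative of $f(g(x))$ is $f'(g(x)) \cdot g'(x)$. The following sketch checks this numerically for one composite function. The particular choices $g(x) = x^{2} + 1$ and $f(u) = u^{3}$ are examples picked here, not taken from the figure.

```python
def g(x):
    return x**2 + 1      # inner function

def f(u):
    return u**3          # outer function

def chain_rule_derivative(x):
    # f'(g(x)) * g'(x), with f'(u) = 3u**2 and g'(x) = 2x
    return 3 * g(x)**2 * 2 * x

def central_diff(func, x, h=1e-6):
    # Numerical derivative used as an independent check
    return (func(x + h) - func(x - h)) / (2 * h)

x = 2.0
print(chain_rule_derivative(x))
print(central_diff(lambda t: f(g(t)), x))
```

At $x = 2$ the inner function gives $g(2) = 5$, so the chain rule yields $3 \cdot 5^{2} \cdot 4 = 300$, and the numerical estimate should land very close to that value.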



# Khan Academy Video on Chain Rule

https://www.khanacademy.org/math/ap-calculus-ab/ab-derivative-rules/ab-chain-rule/v/chain-rule-introduction    

In [None]:
# Reference:
# http://wiki.fast.ai/index.php/Calculus_for_Deep_Learning
# https://en.wikipedia.org/wiki/Derivative
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html
# https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.gradient.html