# Derivatives

## Definition
Put simply, a derivative is the rate of change of a function with respect to changes in its arguments. Derivatives can tell us how rapidly a loss function would increase or decrease were we to increase or decrease each parameter by an infinitesimally small amount.

## Formal Definition
Formally, for functions f: ℝ → ℝ that map from scalars to scalars, the derivative of f at a point x is defined as:

f'(x) = lim(h→0) [f(x + h) - f(x)] / h

The term on the right-hand side is called a limit, and it tells us what happens to the value of an expression as a specified variable approaches a particular value. Here, the limit tells us what the ratio of the change in the function value, f(x + h) - f(x), to the perturbation h converges to as we shrink h to zero.
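To make the limit concrete, here is a minimal sketch that evaluates the ratio for a shrinking sequence of h values. The helper name `difference_quotient` and the choice of the exponential function at x = 0 (whose derivative there is 1) are illustrative assumptions, not part of the original text.

```python
import math

def difference_quotient(f, x, h):
    """Return [f(x + h) - f(x)] / h, the ratio inside the limit."""
    return (f(x + h) - f(x)) / h

# Illustrative choice: f = exp at x = 0, where the derivative is exp(0) = 1.
exponents = range(1, 6)
quotients = [difference_quotient(math.exp, 0.0, 10.0 ** -k) for k in exponents]

for k, q in zip(exponents, quotients):
    print(f"h=1e-{k}: quotient={q:.6f}")
```

As h shrinks, the printed ratios should move toward 1, which is what the limit describes.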

## Differentiability
When f'(x) exists, f is said to be differentiable at x; and when f'(x) exists for all x on a set, e.g., the interval [a, b], we say that f is differentiable on this set. Not all functions are differentiable, including many that we wish to optimize, such as accuracy and the area under the receiver operating characteristic curve (AUC). However, because computing the derivative of the loss is a crucial step in nearly all algorithms for training deep neural networks, we often optimize a differentiable surrogate instead.
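The following sketch illustrates why a surrogate helps, under assumptions that are not in the original text: a hypothetical four-point toy dataset, a one-parameter classifier that predicts 1 when w·x > 0, and the mean binary cross-entropy as the surrogate. Accuracy is piecewise constant in w, so its difference quotient is zero away from the jumps and gives no direction to move in, while the surrogate has a nonzero slope.

```python
import numpy as np

# Hypothetical toy data: four 1-D inputs with binary labels.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0, 0, 1, 1])

def accuracy(w):
    """Fraction of points classified correctly by the rule 'predict 1 if w*x > 0'."""
    return np.mean((w * x > 0).astype(int) == y)

def log_loss(w):
    """Mean binary cross-entropy of sigmoid(w*x), used here as a surrogate."""
    p = 1.0 / (1.0 + np.exp(-w * x))
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

w, h = 0.5, 1e-4
acc_slope = (accuracy(w + h) - accuracy(w)) / h
loss_slope = (log_loss(w + h) - log_loss(w)) / h

print(f"accuracy slope at w={w}: {acc_slope:.5f}")
print(f"log-loss slope at w={w}: {loss_slope:.5f}")
```

In this toy setting the accuracy slope is exactly zero, whereas the log-loss slope is negative, indicating that increasing w would reduce the surrogate loss.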

## Interpretation
We can interpret the derivative f'(x) as the instantaneous rate of change of f(x) with respect to x. Let's develop some intuition with an example. Define u = f(x) = 3x² - 4x.

In [3]:
import numpy as np

In [1]:
def f(x):
    return 3*x**2 - 4*x

# setting x = 1, we see that (f(1+h)-f(1))/h approaches 2 as h approaches 0

In [4]:
for h in 10.0**np.arange(-1, -6, -1):
    print(f'h={h:.5f}, numerical limit={(f(1+h)-f(1))/h:.5f}')

h=0.10000, numerical limit=2.30000
h=0.01000, numerical limit=2.03000
h=0.00100, numerical limit=2.00300
h=0.00010, numerical limit=2.00030
h=0.00001, numerical limit=2.00003


## Notation Conventions
There are several equivalent notational conventions for derivatives. Given y = f(x), the following expressions are equivalent:

f'(x) = y' = dy/dx = df/dx = d/dx f(x) = D f(x) = Dₓ f(x),

where the symbols d/dx and D are differentiation operators.

## Common Function Derivatives
Below, we present the derivatives of some common functions:

d/dx C = 0 for any constant C
d/dx xⁿ = n xⁿ⁻¹ for n ≠ 0
d/dx eˣ = eˣ
d/dx ln x = 1/x
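As a quick sanity check, the formulas above can be compared against a numerical approximation. This is a sketch under stated assumptions: the helper `central_difference`, the evaluation point x = 2, the constant C = 5, and the exponent n = 3 are all illustrative choices, and the symmetric (central) difference is used instead of the one-sided ratio from the formal definition because it is typically more accurate for a given h.

```python
import math

def central_difference(f, x, h=1e-6):
    """Approximate f'(x) with the symmetric ratio [f(x+h) - f(x-h)] / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x = 2.0  # illustrative evaluation point
checks = {
    "constant C=5": (lambda t: 5.0, 0.0),
    "x**3": (lambda t: t ** 3, 3 * x ** 2),
    "exp(x)": (math.exp, math.exp(x)),
    "ln(x)": (math.log, 1 / x),
}

# Map each name to (numerical approximation, value from the formula).
results = {name: (central_difference(f, x), expected)
           for name, (f, expected) in checks.items()}

for name, (approx, expected) in results.items():
    print(f"{name}: numerical={approx:.6f}, formula={expected:.6f}")
```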

## Differentiation Rules
Functions composed from differentiable functions are often themselves differentiable. The following rules come in handy for working with compositions of any differentiable functions f and g, and constant C:

d/dx [C · f(x)] = C · d/dx f(x)                    (Constant multiple rule)
d/dx [f(x) + g(x)] = d/dx f(x) + d/dx g(x)        (Sum rule)
d/dx [f(x) · g(x)] = f(x) · d/dx g(x) + g(x) · d/dx f(x)  (Product rule)
d/dx [f(x) / g(x)] = [g(x) · d/dx f(x) - f(x) · d/dx g(x)] / g²(x)  (Quotient rule)
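The product and quotient rules can be checked numerically in the same spirit. In this sketch, the functions f(x) = x² and g(x) = eˣ, the point x = 1.5, and the helper `numerical_derivative` (a central difference) are assumptions made for illustration only.

```python
import math

def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Illustrative choices: f(x) = x**2 and g(x) = exp(x), with known derivatives.
f, df = (lambda t: t ** 2), (lambda t: 2 * t)
g, dg = math.exp, math.exp

x = 1.5

# Values predicted by the rules.
product_rule = f(x) * dg(x) + g(x) * df(x)
quotient_rule = (g(x) * df(x) - f(x) * dg(x)) / g(x) ** 2

# Values obtained by differentiating the combined functions numerically.
product_numeric = numerical_derivative(lambda t: f(t) * g(t), x)
quotient_numeric = numerical_derivative(lambda t: f(t) / g(t), x)

print(f"product:  rule={product_rule:.6f}, numerical={product_numeric:.6f}")
print(f"quotient: rule={quotient_rule:.6f}, numerical={quotient_numeric:.6f}")
```

Note that the quotient rule only applies where g(x) ≠ 0, which always holds for the exponential chosen here.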

## Example Application
Using the derivatives and rules above, we can find the derivative of 3x² - 4x via:

d/dx [3x² - 4x] = 3 · d/dx x² - 4 · d/dx x = 6x - 4.

Plugging in x = 1 shows that, indeed, the derivative equals 2 at this location. Note that derivatives tell us the slope of a function at a particular location.
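To illustrate the slope interpretation, the sketch below builds the tangent line to f(x) = 3x² - 4x at x = 1 from the derivative 6x - 4 and compares it with the function nearby. The names `df`, `tangent`, and `gaps` are illustrative and not part of the original text.

```python
def f(x):
    return 3 * x ** 2 - 4 * x

def df(x):
    # Derivative obtained with the sum and constant multiple rules: 6x - 4.
    return 6 * x - 4

x0 = 1.0
slope = df(x0)

def tangent(x):
    """Line through (x0, f(x0)) whose slope is f'(x0)."""
    return f(x0) + slope * (x - x0)

# Gap between the function and its tangent line for shrinking steps dx.
gaps = {dx: f(x0 + dx) - tangent(x0 + dx) for dx in (0.1, 0.01, 0.001)}

print(f"slope at x={x0}: {slope}")
for dx, gap in gaps.items():
    print(f"dx={dx}: f - tangent = {gap:.7f}")
```

The gap shrinks much faster than dx itself, which is the sense in which the tangent line with slope f'(1) = 2 approximates f near x = 1.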