# Common math used in ML

This guide covers the most common non-linear, non-calculus functions in machine learning, how to express them in NumPy and PyTorch, and why they are useful.

- Activation functions
    - Softmax/Sigmoid
    - ReLU
- Entropy and Cross-Entropy
- Seeding
- Minmax
- Mean and variance
- Sampling
- T-test


```python
# Imports we need
import torch as tch
import torch.nn.functional as F
```

## Activation Functions

An activation function in deep learning turns a linear transformation into a non-linear one. Without activation functions, any stack of linear layers collapses into a single linear transformation, so a network could only model linear relationships and every problem would boil down to a system of linear equations.
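To see why, here is a minimal sketch (random weights, with layer sizes chosen arbitrarily for illustration) showing that two linear layers with no activation between them compute exactly the same function as one linear layer with weight $W_2 W_1$ and bias $W_2 b_1 + b_2$:

```python
import torch as tch

tch.manual_seed(0)  # fixed seed so the random weights are reproducible
f64 = tch.float64   # double precision keeps rounding error negligible

# Two "layers" with no activation function between them.
W1, b1 = tch.randn(4, 3, dtype=f64), tch.randn(4, dtype=f64)
W2, b2 = tch.randn(2, 4, dtype=f64), tch.randn(2, dtype=f64)

x = tch.randn(3, dtype=f64)
stacked = W2 @ (W1 @ x + b1) + b2

# The same computation written as ONE linear layer: W x + b.
W = W2 @ W1
b = W2 @ b1 + b2
collapsed = W @ x + b

print(tch.allclose(stacked, collapsed))  # True

# Putting a ReLU between the layers breaks this: the output is no longer
# a single linear function of x.
with_relu = W2 @ tch.relu(W1 @ x + b1) + b2
print(with_relu)
```

However many linear layers are stacked, the product of their weight matrices is still just one matrix, which is why the non-linearity is what gives depth its extra expressive power.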

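Two of the most common activation functions are sigmoid (softmax is its generalization from two classes to many) and ReLU (Rectified Linear Unit).

Softmax maps a vector $z$ of $K$ real-valued scores to $K$ probabilities:

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$$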
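A worked example for $z = (1, 2, 3)$, with values rounded to three decimals:

$$\sum_{j=1}^{3} e^{z_j} = e^1 + e^2 + e^3 \approx 2.718 + 7.389 + 20.086 = 30.193$$

$$\sigma(z) \approx \left(\tfrac{2.718}{30.193},\ \tfrac{7.389}{30.193},\ \tfrac{20.086}{30.193}\right) \approx (0.090,\ 0.245,\ 0.665)$$

The outputs sum to 1, and the largest score receives the largest share of the probability.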

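```python
def softmax(z: tch.Tensor) -> tch.Tensor:
    """Map input scores to a probability distribution (a non-linear mapping).

    The z values are the raw input scores (logits); each is mapped to a number
    in (0, 1) representing a probability. This is often used for classification.
    The output values along the last dimension always sum to 1.

    Parameters
    ----------
    z : tch.Tensor
        input values (logits); softmax is taken over the last dimension

    Returns
    -------
    tch.Tensor
        probability of the ith value, same shape as z
    """
    # Subtracting the max does not change the result (it cancels in the ratio)
    # but keeps exp() from overflowing for large inputs.
    shifted = z - z.max(dim=-1, keepdim=True).values
    num = shifted.exp()  # e^z[i] for each element in z
    denom = num.sum(dim=-1, keepdim=True)  # sum over the last dimension
    return num / denom

z = tch.rand(3)
print(z)
print(softmax(z))
print(F.softmax(z, dim=-1))  # PyTorch's built-in should give the same values
```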
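Sigmoid and ReLU are named above but not implemented in this notebook. The sketch below defines them as helper functions of our own (`sigmoid`, `relu`), checks them against PyTorch's `tch.sigmoid` and `tch.relu`, and verifies that sigmoid is the two-class case of softmax, since $\frac{1}{1 + e^{-x}} = \frac{e^{x}}{e^{x} + e^{0}}$:

```python
import torch as tch

def sigmoid(x: tch.Tensor) -> tch.Tensor:
    """Element-wise logistic sigmoid 1 / (1 + e^-x); outputs lie in (0, 1)."""
    return 1 / (1 + tch.exp(-x))

def relu(x: tch.Tensor) -> tch.Tensor:
    """Element-wise ReLU: max(0, x)."""
    return tch.clamp(x, min=0)

x = tch.tensor([-2.0, 0.0, 3.0])
print(sigmoid(x))  # tensor([0.1192, 0.5000, 0.9526])
print(relu(x))     # tensor([0., 0., 3.])

# Both helpers agree with PyTorch's built-ins.
assert tch.allclose(sigmoid(x), tch.sigmoid(x))
assert tch.equal(relu(x), tch.relu(x))

# Sigmoid is softmax over the pair (x, 0): take the first entry of each pair.
pairs = tch.stack([x, tch.zeros_like(x)], dim=-1)  # shape (3, 2)
assert tch.allclose(sigmoid(x), tch.softmax(pairs, dim=-1)[:, 0])
```

Sigmoid squashes each value independently, whereas softmax couples all values so they sum to 1; ReLU simply zeroes out negative inputs.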

## Entropy and Cross-Entropy

Entropy is a measure of the average "surprise" in an outcome or, equivalently, of how uncertain we are about it. A 50/50 event is the most "surprising" and has the highest entropy, because neither outcome is more likely than the other. When an event has a 90% (or 10%) chance of happening, the outcome is much more predictable either way, so its entropy is lower. Another way to think about entropy is as the average amount of information gained by observing the outcome: a low-entropy outcome tells us little we didn't already expect, while a high-entropy one tells us more.
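A small sketch of this intuition computes the entropy of a yes/no event for the odds mentioned above. `binary_entropy` is a hypothetical helper written for this example; it uses the base-2 entropy formula defined just below:

```python
import torch as tch

def binary_entropy(p: float) -> float:
    """Entropy in bits of an event that happens with probability p."""
    probs = tch.tensor([p, 1 - p], dtype=tch.float64)
    probs = probs[probs > 0]  # convention: 0 * log2(0) = 0, so drop zeros
    return float(-(probs * tch.log2(probs)).sum())

for p in [0.5, 0.9, 0.1]:
    print(f"p = {p:.1f}: H = {binary_entropy(p):.3f} bits")
# p = 0.5: H = 1.000 bits
# p = 0.9: H = 0.469 bits
# p = 0.1: H = 0.469 bits
```

The 90% and 10% cases have identical entropy because they describe the same uncertainty with the two outcomes swapped.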

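Entropy is measured as:

$$H(X) = -\sum_{i=1}^{N} p(x_i) \log_{2} p(x_i)$$

where
- $x_i$ are the $N$ possible outcomes
- $p(x_i)$ is the probability of outcome $x_i$

The probabilities must sum to 1, i.e. $\sum_{i=1}^{N} p(x_i) = 1$. With $\log_2$ the result is measured in bits; using the natural log instead gives nats.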
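As a worked example, take a two-outcome distribution with $p = (0.25, 0.75)$, rounding to three decimals:

$$H = -\left(0.25 \log_2 0.25 + 0.75 \log_2 0.75\right) \approx -\left(0.25 \cdot (-2) + 0.75 \cdot (-0.415)\right) \approx 0.5 + 0.311 = 0.811 \text{ bits}$$

Using the natural log instead gives the same uncertainty in nats: $0.811 \times \ln 2 \approx 0.562$.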

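```python
def entropy(p: tch.Tensor) -> tch.Tensor:
    """Shannon entropy in bits of a distribution p (non-negative, sums to 1)."""
    nonzero = p[p > 0]  # 0 * log2(0) is taken as 0, so drop zero entries
    return -tch.sum(nonzero * tch.log2(nonzero))  # base 2, as in the formula

p = tch.tensor([.25, .75])
print(entropy(p))  # ≈ 0.8113 bits
```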
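As a cross-check, PyTorch's `torch.distributions.Categorical` has an `entropy()` method. It uses the natural log, so it reports entropy in nats; dividing by $\ln 2$ converts the result to bits. A sketch, assuming the same distribution as above:

```python
import math

import torch as tch
from torch.distributions import Categorical

p = tch.tensor([0.25, 0.75])

# Categorical.entropy() uses the natural log, so the result is in nats.
h_nats = Categorical(probs=p).entropy()

# Dividing by ln(2) converts nats to bits, matching the log2 formula.
h_bits = h_nats / math.log(2)

# Direct evaluation of H = -sum p log2 p for comparison.
h_direct = -(p * tch.log2(p)).sum()

print(h_nats)  # ≈ 0.5623 nats
print(h_bits)  # ≈ 0.8113 bits
assert tch.allclose(h_bits, h_direct)
```

Mixing up the two log bases is a common source of mismatched entropy values; the quantities differ only by the constant factor $\ln 2$.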