The intent of this notebook is to document different types of machine learning approaches and algorithms.

# Supervised learning
Wikipedia has an excellent definition:

**Supervised learning** is the machine learning task of inferring a function from labeled training data.
The training data consist of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal). A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used for mapping new examples. An optimal scenario will allow for the algorithm to correctly determine the class labels for unseen instances. This requires the learning algorithm to generalize from the training data to unseen situations in a "reasonable" way.
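
To make the definition concrete, here is a minimal sketch of "inferring a function from labeled training data". It uses a 1-nearest-neighbour rule, which is just one simple choice picked for illustration (it is not part of the definition above), and the data points and labels are made up.

```python
# Labeled training data: (input vector, desired output) pairs.
# The points and labels are made up for illustration.
training_data = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def predict(x):
    """The 'inferred function': return the label of the closest training example."""
    _, label = min(training_data, key=lambda pair: squared_distance(pair[0], x))
    return label

# Neither of these inputs appears in the training data.
print(predict((2.0, 1.0)))
print(predict((7.5, 8.0)))
```

The function `predict` never saw the two query points, yet it assigns them a label based on the examples it was given, which is the "generalize to unseen situations" part of the definition.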

## Classification ##
### Support Vector Machines (SVMs) ###
SVMs are a set of learning algorithms for classification. In a nutshell, an SVM tries to find a hyperplane that maximizes the distance between the hyperplane and the clusters of data it separates.

Visually:

![SVM Example](svm/svm-example.png)

The points/vectors that define where the hyperplane fits are called the **support vectors** (they're "supporting" the maximum distance between the clusters and the plane).


![Support Vectors](svm/support-vectors.png)


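Computing an SVM comes down to solving a constrained optimization problem: finding the hyperplane that maximizes the margin, i.e. the distance between the hyperplane and the closest points (the support vectors) of each cluster. This is usually rewritten as an equivalent minimization problem: minimize the norm $\|w\|$ of the hyperplane's weight vector, subject to every training point lying on the correct side. There are multiple algorithms for solving this.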

![SVM Minimization](svm/svm-minimize.png)
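
The idea of comparing hyperplanes by their distance to the closest points can be sketched in a few lines. This is not an SVM solver: the toy points and the two candidate hyperplanes are made up, and the code only measures the margin of each candidate and reports which points sit exactly at that margin.

```python
import math

# Toy 2-D points with class labels -1 / +1 (made up for illustration).
points = [(1.0, 1.0), (2.0, 0.0), (0.0, 0.0), (4.0, 4.0), (5.0, 3.0), (6.0, 6.0)]
labels = [-1, -1, -1, 1, 1, 1]

def signed_distance(w, b, x, y):
    """Distance from x to the hyperplane w.x + b = 0, positive when x lies on the side matching its label y."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def margin(w, b):
    """Smallest signed distance over all points for this hyperplane."""
    return min(signed_distance(w, b, x, y) for x, y in zip(points, labels))

def support_vectors(w, b, tol=1e-9):
    """Points that sit exactly at the margin for this hyperplane."""
    m = margin(w, b)
    return [x for x, y in zip(points, labels) if abs(signed_distance(w, b, x, y) - m) < tol]

# Two hand-picked candidate hyperplanes (w, b); a real solver would search for the best one.
candidates = {
    "x1 + x2 = 5": ((1.0, 1.0), -5.0),
    "x1 = 3": ((1.0, 0.0), -3.0),
}
for name, (w, b) in candidates.items():
    print(name, "margin:", round(margin(w, b), 3), "support vectors:", support_vectors(w, b))

best = max(candidates, key=lambda name: margin(*candidates[name]))
print("larger margin:", best)
```

Both candidates separate the two classes, but the diagonal one keeps the closest points further away, so it is the one a maximum-margin method would prefer.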

Note that a tricky part of defining an SVM is often finding the right feature space. Data that is hard to separate in its original dimensions might become easy to separate in a higher-dimensional space that represents the right features:

![Input and Feature Space](svm/input-feature-space.png)
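
A small sketch of this effect, using made-up one-dimensional data: the positive examples lie on both sides of the negative ones, so no single threshold separates them. After mapping each point `x` to the pair `(x, x*x)` (a feature map chosen by hand for this example), a threshold on the new second feature separates the classes.

```python
# 1-D inputs and labels (made up): positives are far from zero, negatives are near zero.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]

def separable_by_threshold(values, labels):
    """True if a single threshold puts all +1 values on one side and all -1 values on the other."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    return max(pos) < min(neg) or max(neg) < min(pos)

def feature_map(x):
    """Hand-picked map from the 1-D input space to a 2-D feature space."""
    return (x, x * x)

mapped = [feature_map(x) for x in xs]
squares = [m[1] for m in mapped]

print("separable in input space:", separable_by_threshold(xs, ys))
print("separable on the x*x feature:", separable_by_threshold(squares, ys))
```

In the mapped space the separating hyperplane is simply a horizontal line between the squared values of the two classes.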




### Neural Networks ###

Neural network example, with input, output and hidden layers:

![Neural Network](nn/neural_net.png)

Individual **neurons** are just weighted sum or **"transfer"** functions of their inputs:

![Sum Function](nn/sum-function.png)

In most cases the sum is then passed through an activation function, which decides what the neuron outputs, for example a constant or variable value once the sum reaches a certain threshold:

![Activation Function](nn/activation-function.png)

Note that $w_{ij}$ represents the weight for input $i$ in neuron $j$; together, these weights can be represented as a matrix.
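
As a rough sketch of the sum and threshold described above: `weights[i][j]` plays the role of $w_{ij}$, and a simple step function stands in for the activation. The input values, weights, and threshold are made up, and the step activation is an assumption rather than the exact function shown in the image.

```python
def layer_sums(inputs, weights):
    """weights[i][j] is the weight for input i in neuron j; returns one weighted sum per neuron."""
    n_neurons = len(weights[0])
    return [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        for j in range(n_neurons)
    ]

def step_activation(s, threshold=0.5):
    """Output 1 once the weighted sum reaches the threshold, otherwise 0."""
    return 1 if s >= threshold else 0

# Three inputs feeding two neurons (all values made up).
inputs = [1.0, 2.0, 3.0]
weights = [
    [0.5, -1.0],   # weights of input 0 for neuron 0 and neuron 1
    [0.25, 0.5],   # weights of input 1
    [0.0, 0.1],    # weights of input 2
]

sums = layer_sums(inputs, weights)
outputs = [step_activation(s) for s in sums]
print(sums, outputs)
```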


There are multiple types of activation functions. Some popular ones:
 - [Sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function), ```σ(x)```: squashes numbers into the range (0, 1)
 - The [hyperbolic tangent](https://en.wikipedia.org/wiki/Hyperbolic_function#Standard_analytic_expressions): ```tanh(x)```, which squashes numbers into the range (-1, 1)
 - The [rectified linear unit](https://en.wikipedia.org/wiki/Rectifier_(neural_networks)), ```ReLU(x)=max(0,x)```.
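
The three activation functions listed above are short enough to write out directly. This is a plain-Python sketch using only the standard `math` module; the sample inputs are arbitrary.

```python
import math

def sigmoid(x):
    """Squashes x into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes x into the range (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Passes positive values through unchanged and maps negative values to 0."""
    return max(0.0, x)

for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x), relu(x))
```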


#### Types ####
There are different neural network types:

![Neural Network Types](nn/neural-network-types.jpg)
    
    
**Perceptron**: no hidden layers, only input and output.

**Feed Forward**: No cycles or loops in the network.

**Deep Neural Networks**: neural networks that contain more than one hidden layer. 

**Recurrent Neural Network (RNN)**: also propagates data from later processing stages to earlier stages.
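
The simplest of these types, the perceptron, is small enough to train by hand. The sketch below uses the classic perceptron learning rule to learn the logical AND function; the function names, the learning rate, and the number of epochs are choices made for this example.

```python
def perceptron_predict(weights, bias, x):
    """Single neuron with a step activation: 1 if the weighted sum is positive, else 0."""
    s = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if s > 0 else 0

def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: nudge weights and bias by the prediction error on each sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - perceptron_predict(weights, bias, x)
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

# Truth table of logical AND as (inputs, desired output) pairs.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)

for x, target in and_samples:
    print(x, "->", perceptron_predict(weights, bias, x), "expected", target)
```

AND is linearly separable, which is why a network with no hidden layer can learn it; a function like XOR would need at least one hidden layer.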
    

## Recurrent Neural Networks ##

An RNN maintains an internal memory about the world (a hidden state that summarizes the information it has seen so far) to help perform its classifications. For example, when classifying activities in movie clips, it will "remember" what has happened in previous clips.

![Recurrent Neural Network](nn/rnn.png)


In this image, $\phi$ is the activation function, $W$ is the weights matrix associated with the current state, $U$ is the weights matrix associated with the previous state.

In programming terms, this can be interpreted as running a fixed program with certain inputs and some internal variables.

A very simple implementation of an RNN might look like this (from http://karpathy.github.io/2015/05/21/rnn-effectiveness/):

```python
import numpy as np

class RNN:
  # ...
  def step(self, x):
    # update the hidden state from the previous state and the current input
    self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
    # compute the output vector
    y = np.dot(self.W_hy, self.h)
    return y
```

Note how this compares to the picture above. In particular, note the following:
- the ```np.tanh``` activation function, which squashes the output between -1 and 1
- $h_{t-1}$, the hidden state from the previous step, is called ```self.h``` (the assignment then overwrites it with $h_t$)
- The terms in the sum inside ```np.tanh(...)``` are switched here, it's basically: ```np.tanh(prev state + current state)```, while the image above does $\phi (current + prev)$
- $W$ is called ```self.W_xh```, $U$ is called ```self.W_hh```
- in math terms, the code really does: $h_t = \tanh ( W_{hh} h_{t-1} + W_{xh} x_t )$
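
To actually run the `step` method, the elided parts of the class have to be filled in somehow. The sketch below is one possible way to do that: the constructor, the layer sizes, the random initialization, and the input sequence are all assumptions made for this example and are not taken from the linked post. The weights are untrained, so the outputs are meaningless as predictions; the point is only to show the hidden state being carried from step to step.

```python
import numpy as np

class RNN:
    def __init__(self, input_size, hidden_size, output_size, seed=0):
        # Hypothetical constructor: small random weights instead of trained ones.
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
        self.W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        self.W_hy = rng.normal(scale=0.1, size=(output_size, hidden_size))
        self.h = np.zeros(hidden_size)  # hidden state starts empty

    def step(self, x):
        # h_t = tanh(W_hh h_{t-1} + W_xh x_t)
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # output vector computed from the new hidden state
        return np.dot(self.W_hy, self.h)

rnn = RNN(input_size=3, hidden_size=4, output_size=2)

# A short sequence in which the first and last inputs are identical.
sequence = [
    np.array([1.0, 0.0, 0.0]),
    np.array([0.0, 1.0, 0.0]),
    np.array([1.0, 0.0, 0.0]),
]
outputs = [rnn.step(x) for x in sequence]

for t, y in enumerate(outputs):
    print(t, y)
```

The first and third inputs are the same vector, but the outputs differ because `self.h` still carries information from the earlier steps. That is the "memory" described at the start of this section.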

## Long short-term memory Networks (LSTM) ##

A lot of the info that follows is based off this blogpost: http://blog.echen.me/2017/05/30/exploring-lstms/

Whereas an RNN can overwrite its memory at each time step in a fairly uncontrolled fashion, an LSTM (a specific type of RNN) transforms its memory in a very precise way: by using specific learning mechanisms for which pieces of information to remember, which to update, and which to pay attention to. This helps it keep track of information over longer periods of time.

In the code above, an LSTM would make the computation of ```self.h``` more complicated.
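
As a rough idea of what "more complicated" means, here is a sketch of a single LSTM step in the commonly used formulation with forget, input, and output gates. These correspond loosely to "which to remember", "which to update", and "which to pay attention to". The class layout, the sizes, and the random weights are assumptions made for this example, and bias terms are left out to keep it short.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LSTM:
    def __init__(self, input_size, hidden_size, seed=0):
        # Hypothetical constructor: one untrained weight matrix per gate,
        # each acting on the concatenation [h, x].
        rng = np.random.default_rng(seed)
        shape = (hidden_size, hidden_size + input_size)
        self.W_f = rng.normal(scale=0.1, size=shape)  # forget gate: what to keep
        self.W_i = rng.normal(scale=0.1, size=shape)  # input gate: what to update
        self.W_o = rng.normal(scale=0.1, size=shape)  # output gate: what to expose
        self.W_c = rng.normal(scale=0.1, size=shape)  # candidate memory content
        self.h = np.zeros(hidden_size)  # working state, as in the RNN
        self.c = np.zeros(hidden_size)  # long-term cell memory

    def step(self, x):
        hx = np.concatenate([self.h, x])
        f = sigmoid(np.dot(self.W_f, hx))
        i = sigmoid(np.dot(self.W_i, hx))
        o = sigmoid(np.dot(self.W_o, hx))
        c_candidate = np.tanh(np.dot(self.W_c, hx))
        # keep part of the old memory, add part of the new candidate
        self.c = f * self.c + i * c_candidate
        # expose a gated view of the memory as the new hidden state
        self.h = o * np.tanh(self.c)
        return self.h

lstm = LSTM(input_size=3, hidden_size=4)
h = lstm.step(np.array([1.0, 0.0, 0.0]))
print(h)
```

Compared with the single ```np.tanh``` line in the RNN, the hidden state here is computed from a separate memory `self.c` that is only changed through the gates.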