import numpy as np
from matplotlib import pyplot as plt

# Recap from Week 1

- In this thread, we are interested in systems that exhibit <u>intelligent behaviour</u>

- The modern paradigm for building such systems is called <u>machine learning</u>

- The standard machine learning pipeline is:
    1. Define target behaviour in terms of inputs and outputs
    2. Collect data of target behaviour
    3. Initialise a **model**
    4. **Train** the model on data to predict the **correct** outputs from inputs
    5. **Evaluate** model performance
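
The five steps above can be sketched end to end on a toy problem. This is only an illustration under stated assumptions: the target behaviour ($y = 3x + 1$), the simulated noisy data, the straight-line model and the least-squares fit are all hypothetical choices made for this example, not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Define target behaviour (assumed for illustration): input x -> output 3x + 1
# 2. Collect data of the target behaviour (simulated here, with a little noise)
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 1.0 + rng.normal(scale=0.1, size=200)
x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

# 3. Initialise a model: a straight line y = w[0] + w[1] * x, starting at zero
w = np.zeros(2)

# 4. Train the model on the data (least-squares fit of the two parameters)
X_train = np.column_stack([np.ones_like(x_train), x_train])
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# 5. Evaluate model performance on data it was not trained on
X_test = np.column_stack([np.ones_like(x_test), x_test])
mse = np.mean((X_test @ w - y_test) ** 2)
print("learned parameters:", w)
print("test mean squared error:", mse)
```

The learned parameters should land close to the intercept 1 and slope 3 used to simulate the data.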

Our question this week is:
> How do we build a **model**?

# How to build a model?

* A model can be thought of as a function $\mathbf{y} = f(\mathbf{x})$. It takes in an input $\mathbf{x}$ and spits out an output $\mathbf{y}$.

* We want the model to <u>approximate</u> our data, predicting the correct output $\mathbf{y}$.

> The challenge is to come up with a **general architecture** that can **approximate any function**.

## Neural Networks

* Neural networks are one example of such *general architectures*. 

* They are composed of <u>perceptrons</u>.

![Deep Neural Network](https://www.ibm.com/content/dam/connectedassets-adobe-cms/worldwide-content/cdp/cf/ul/g/3a/b8/ICLH_Diagram_Batch_01_03-DeepNeuralNetwork.png)

*Image courtesy of IBM*

## Perceptron

Suppose you have a collection of inputs $\mathbf{x} = (x_1, x_2, x_3, ..., x_n)$.

A perceptron multiplies each input by a <u>weight</u> and sums the products:
$$
y = w_0 + w_1 x_1 + w_2 x_2 + ... + w_n x_n
$$

> **Note:** *The* $w_0$ *term is called the* <u>bias term</u>.
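
As a minimal sketch of the weighted sum above, a single perceptron can be written in a few lines of NumPy. The function name `perceptron` and the example values are made up for illustration.

```python
import numpy as np

def perceptron(x, w, w0):
    """Return w0 + w1*x1 + ... + wn*xn for inputs x, weights w and bias w0."""
    return w0 + np.dot(w, x)

# Hypothetical example values
x = np.array([1.0, 2.0, 3.0])    # inputs x1, x2, x3
w = np.array([0.5, -1.0, 2.0])   # weights w1, w2, w3
w0 = 0.5                         # bias term

y = perceptron(x, w, w0)
print(y)  # 0.5 + 0.5 - 2.0 + 6.0 = 5.0
```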

![A perceptron](https://miro.medium.com/v2/resize:fit:1400/1*n6sJ4yZQzwKL9wnF5wnVNg.png)

*Image courtesy of "What the Hell is Perceptron? The Fundamentals of Neural Networks", SAGAR SHARMA, Towards Data Science, Sep 9, 2017*

## Many Perceptrons

We can scale up to many perceptrons and many layers, where each perceptron in a layer takes all outputs from the previous layer as its inputs. This is called a <u>multilayer perceptron (MLP)</u>, and is one type of <u>deep neural network</u>:

![Multilayer perceptron](https://d3i71xaburhd42.cloudfront.net/2589b72d4e928896dc668a24839de4f2adcc6726/11-Figure3-1.png)

*Image courtesy of Shao, Changpeng. “A Quantum Model for Multilayer Perceptron.” arXiv: Quantum Physics (2018): n. pag.*

### In vector form...

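For a whole layer of perceptrons, each perceptron's weights form one row of a matrix $\mathbf{W}$ (the bias can be absorbed by adding a constant input of 1 to $\mathbf{x}$):

$$
y = \mathbf{W}\mathbf{x}
$$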

But a stack of successive perceptron layers is equivalent to just one, because a product of matrices is itself a single matrix!

$$
z = \mathbf{W}_n \cdots \mathbf{W}_2 \mathbf{W}_1 \mathbf{x} = (\mathbf{W}_n \cdots \mathbf{W}_2 \mathbf{W}_1)\mathbf{x} = \mathbf{W}_{\text{supreme}}\mathbf{x}
$$
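
The collapse above can be checked numerically. The sketch below assumes three layers with arbitrary sizes and random weights (all made up for illustration) and compares applying them one after another against applying their single product matrix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three hypothetical linear layers: 3 inputs -> 4 -> 5 -> 2 outputs
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(5, 4))
W3 = rng.normal(size=(2, 5))
x = rng.normal(size=3)

# Apply the layers one after another
z_layered = W3 @ (W2 @ (W1 @ x))

# Collapse all layers into a single matrix first, then apply it once
W_supreme = W3 @ W2 @ W1
z_single = W_supreme @ x

# The two results agree up to floating-point rounding
print(np.allclose(z_layered, z_single))
```

However many linear layers are stacked, the result is still one linear map from the inputs to the outputs, here a single 2-by-3 matrix.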

The solution? <u>Activation functions!</u>

## Activation Functions

These inject <u>non-linearity</u> into the perceptron output, so that the MLP can learn more complex relationships:

$$
y = \sigma(\mathbf{W}\mathbf{x})
$$
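
To see the effect of the activation, the sketch below compares a purely linear two-layer network with one that applies $\sigma$ after each layer. The notes do not fix a particular $\sigma$; `tanh` is used here purely as an assumption, and the layer sizes and random weights are likewise made up. A linear map always satisfies $f(\mathbf{a} + \mathbf{b}) = f(\mathbf{a}) + f(\mathbf{b})$, so a network that breaks this property cannot be collapsed into a single matrix.

```python
import numpy as np

def sigma(v):
    """Activation function; tanh is an assumed choice for this example."""
    return np.tanh(v)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))  # hypothetical first layer: 3 inputs -> 4 units
W2 = rng.normal(size=(2, 4))  # hypothetical second layer: 4 units -> 2 outputs

def linear_net(x):
    """Two stacked layers with no activation."""
    return W2 @ (W1 @ x)

def mlp(x):
    """The same two layers, with the activation applied after each one."""
    return sigma(W2 @ sigma(W1 @ x))

a = rng.normal(size=3)
b = rng.normal(size=3)

# Additivity holds for the linear network, but is expected to fail for the MLP
linear_is_additive = np.allclose(linear_net(a + b), linear_net(a) + linear_net(b))
mlp_is_additive = np.allclose(mlp(a + b), mlp(a) + mlp(b))
print("linear network additive:", linear_is_additive)
print("MLP with activation additive:", mlp_is_additive)
```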

## Universal Approximation Theorem

Neural networks have some very interesting <u>universal approximation</u> results: