# Neural Network

## 1.1 Why do we need machine learning?

### Some examples of tasks best solved by learning
- Recognizing patterns
	- objects in real scenes
	- facial identities or facial expressions
	- spoken words
- Recognizing anomalies
	- unusual sequences of credit card transactions
	- unusual patterns of sensor readings in a nuclear power plant
- Prediction
	- future stock prices or currency exchange rates

## 1.2 What are neural networks?
(some examples of biological neurons)

## 1.3 Some simple models of neurons

### Idealized neurons
- To model things we have to idealize them (e.g. atoms)
	- Idealization removes complicated details that are not essential for understanding the main principles
	- It allows us to apply mathematics and to make analogies to other, familiar systems
	- Once we understand the basic principles, it's easy to add complexity to make the model more faithful
- It is often worth understanding models that are known to be wrong (but we must not forget that they are wrong!)
	- E.g. neurons that communicate real values rather than discrete spikes of activity


### Linear neurons
- These are simple but computationally limited
	- If we can make them learn we *may* get insight into more complicated neurons
	- $y = b + \displaystyle{\sum_{i}x_iw_i}$
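
The formula above can be sketched in a few lines of plain Python. The function name `linear_neuron` and the example numbers are assumptions made for illustration; they are not part of the original notes.

```python
def linear_neuron(x, w, b):
    """Return y = b + sum_i x_i * w_i for an input list x and a weight list w."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return b + sum(xi * wi for xi, wi in zip(x, w))


# Hypothetical inputs, weights and bias, chosen only to exercise the formula.
y = linear_neuron([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], b=0.5)
print(y)
```

The output is an unbounded real number, which is what makes this unit simple to analyse but limited in what it can compute.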

### Binary threshold neurons
- McCulloch-Pitts (1943): *influenced von Neumann*
	- First compute a weighted sum of the inputs
	- Then send out a fixed-size spike of activity if the weighted sum exceeds a threshold
	- McCulloch and Pitts thought that each spike is like the truth value of a proposition and each neuron combines truth values to compute the truth value of another proposition.
- There are two equivalent ways to write the equations for a binary threshold neuron:

| Threshold form | Relation | Bias form |
|----------|----------|---------|
| $z = \displaystyle{\sum_{i}x_iw_i}$ | $\theta = -b$ | $z = \displaystyle{b + \sum_{i}x_iw_i}$ |
| $y = \begin{cases} 1 & \quad \text{if } z \geq \theta \\ 0 & \quad \text{otherwise} \end{cases}$ | | $y = \begin{cases} 1 & \quad \text{if } z \geq 0 \\ 0 & \quad \text{otherwise} \end{cases}$ |
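
As a rough check that the two ways of writing the neuron agree, the sketch below implements both and compares them on every input of a two-input unit. The function names and the particular weights are illustrative assumptions; the weights are chosen so that the unit fires only when both inputs are on, in the spirit of combining truth values.

```python
def weighted_sum(x, w):
    """Sum of x_i * w_i over all inputs."""
    return sum(xi * wi for xi, wi in zip(x, w))


def threshold_form(x, w, theta):
    """y = 1 if sum_i x_i w_i >= theta, else 0."""
    return 1 if weighted_sum(x, w) >= theta else 0


def bias_form(x, w, b):
    """y = 1 if b + sum_i x_i w_i >= 0, else 0."""
    return 1 if b + weighted_sum(x, w) >= 0 else 0


w = [1.0, 1.0]   # hypothetical weights
theta = 2.0      # fires only when both inputs are 1 (logical AND)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    # The two forms agree when the bias is the negated threshold, b = -theta.
    assert threshold_form(x, w, theta) == bias_form(x, w, b=-theta)
    print(x, threshold_form(x, w, theta))
```

Moving the threshold into a bias term is convenient because the bias can then be treated like any other weight.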

### Rectified Linear Neurons (sometimes called linear threshold neurons)
They compute a *linear* weighted sum of their inputs.
The output is a *non-linear* function of the total input.

$z = b + \displaystyle{\sum_{i}x_iw_i}$

$y = \begin{cases} z & \quad \text{if } z > 0 \\ 0 & \quad \text{otherwise} \end{cases}$
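
A minimal sketch of such a unit, assuming the usual definition in which the output equals the total input $z$ when $z > 0$ and is zero otherwise. The function name and the example values are hypothetical.

```python
def rectified_linear_neuron(x, w, b):
    """Compute z = b + sum_i x_i * w_i, then output z if z > 0, else 0."""
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return z if z > 0 else 0.0


# Hypothetical inputs: the same weighted sum with two different biases.
print(rectified_linear_neuron([1.0, 2.0], [1.0, 1.0], b=-1.0))  # positive total input
print(rectified_linear_neuron([1.0, 2.0], [1.0, 1.0], b=-5.0))  # negative total input
```

Above zero the unit behaves like a linear neuron; below zero it is switched off, which is the source of the non-linearity.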


### Sigmoid neurons
- These give a real-valued output that is a smooth and bounded function of their total input
	- Typically they use the logistic function
	- They have nice derivatives which make learning easy

$z = b + \displaystyle{\sum_{i}x_iw_i}$

$y = \frac{1}{1 + e^{-z}}$
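
The "nice derivatives" remark can be made concrete: for the logistic function, $dy/dz = y(1 - y)$, so the slope can be computed from the output alone. This identity is a standard fact rather than something stated in the notes, and the sketch below (with assumed function names) checks it numerically against a finite difference.

```python
import math


def logistic(z):
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_derivative(z):
    """dy/dz expressed through the output itself: y * (1 - y)."""
    y = logistic(z)
    return y * (1.0 - y)


# Compare the closed form with a central finite difference at an arbitrary point.
z, h = 0.7, 1e-6
numeric = (logistic(z + h) - logistic(z - h)) / (2 * h)
print(logistic_derivative(z), numeric)
```

Splitting on the sign of `z` is only a numerical precaution; both branches compute the same function.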


### Stochastic binary neurons

- These use the same equations as logistic units:
	- But they treat the output of the logistic as the *probability* of producing a spike in a short time window
- We can do a similar trick for rectified linear units:
	- The output is treated as the Poisson rate for spikes, i.e. the expected number of spikes per unit time; the actual spike times are random

$z = b + \displaystyle{\sum_{i}x_iw_i}$

$p(s=1) = \frac{1}{1 + e^{-z}}$
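
A small sketch of how such a unit might be simulated: compute $p(s=1)$ from the logistic, then draw a 0/1 spike with that probability. The function names, the seed and the example weights are assumptions, and the plain `math.exp(-z)` form is only suitable for moderate values of $z$.

```python
import math
import random


def spike_probability(x, w, b):
    """p(s=1) = 1 / (1 + e^{-z}) with z = b + sum_i x_i * w_i."""
    z = b + sum(xi * wi for xi, wi in zip(x, w))
    return 1.0 / (1.0 + math.exp(-z))


def sample_spike(x, w, b, rng):
    """Return 1 (spike) with probability p(s=1), otherwise 0."""
    return 1 if rng.random() < spike_probability(x, w, b) else 0


rng = random.Random(0)                       # fixed seed so runs are repeatable
x, w, b = [1.0, 0.0], [2.0, -1.0], -1.0      # hypothetical values giving z = 1.0
p = spike_probability(x, w, b)
samples = [sample_spike(x, w, b, rng) for _ in range(10000)]
print(p, sum(samples) / len(samples))
```

Over many time windows the fraction of spikes should approach `p`, even though each individual output is binary.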

![](http://image.slidesharecdn.com/2-typesofneurons-151231003446/95/neural-networks-types-of-neurons-13-638.jpg?cb=1451522241) (revisit)

## 1.4 A simple example of learning

## 1.5 Three types of learning
### Types of learning tasks
- Supervised learning
	- Learn to predict an output when given an input vector
- Reinforcement learning
	- Learn to select an action to maximize payoff
- Unsupervised learning
	- Discover a good internal representation of the input

### Two types of supervised learning
- Each training case consists of an input vector $x$ and a target output $t$
- _Regression:_ The target output is a real number or a whole vector of real numbers.
	- The price of a stock in 6 months' time.
	- The temperature at noon tomorrow.
- _Classification:_ The target output is a class label
	- The simplest case is a choice between 1 and 0
	- We can also have multiple alternative labels


### How supervised learning typically works
- We start by choosing a _model class_: $y = f(x;W)$
	- A model class, $f$, is a way of using some numerical parameters, $W$, to map each input vector, $x$, into a predicted output $y$.
- Learning usually means adjusting the parameters to reduce the discrepancy between the target output, $t$, on each training case and the actual output, $y$, produced by the model.
	- For regression, $\frac{1}{2}(y - t)^2$ is often a sensible measure of the discrepancy
	- For classification there are other measures that are generally more sensible (they also work better)
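
To make the regression case concrete, here is a minimal sketch that takes a linear neuron as the model class and adjusts its parameters by gradient descent on $\frac{1}{2}(y - t)^2$. The choice of model, the learning rate `lr`, the toy data and all function names are assumptions made for illustration; the notes do not prescribe a particular training procedure.

```python
def f(x, w, b):
    """Model class: y = b + sum_i x_i * w_i, with parameters W = (w, b)."""
    return b + sum(xi * wi for xi, wi in zip(x, w))


def squared_error(y, t):
    """Discrepancy between the actual output y and the target t."""
    return 0.5 * (y - t) ** 2


def train_epoch(cases, w, b, lr):
    """One pass over the training cases, updating the parameters after each case."""
    for x, t in cases:
        err = f(x, w, b) - t                  # derivative of 0.5*(y - t)^2 w.r.t. y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b = b - lr * err
    return w, b


# Hypothetical training cases generated by t = 2x + 1.
cases = [([0.0], 1.0), ([1.0], 3.0), ([2.0], 5.0)]
w, b = [0.0], 0.0
for _ in range(500):
    w, b = train_epoch(cases, w, b, lr=0.1)

total_error = sum(squared_error(f(x, w, b), t) for x, t in cases)
print(w, b, total_error)
```

On this toy data the parameters should approach a weight of 2 and a bias of 1, the values used to generate the targets.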


### Reinforcement Learning
- In reinforcement learning, the output is an action or sequence of actions and the only supervisory signal is an occasional scalar reward
	- The goal in selecting each action is to maximize the expected sum of future rewards
	- We usually use a discount factor for delayed rewards so that we don't have to look too far into the future
- Reinforcement learning is difficult:
	- The rewards are typically delayed so it's hard to know where we went wrong (or right)
	- A scalar reward does not supply much information
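
The effect of a discount factor can be sketched as follows. The symbol `gamma`, the function name and the reward sequence are assumptions for illustration: a reward received $k$ steps in the future is weighted by `gamma**k`.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], accumulated from the last reward backwards."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# Hypothetical episode: a single reward that arrives three steps late.
rewards = [0.0, 0.0, 0.0, 1.0]
print(discounted_return(rewards, gamma=0.9))   # delayed reward counts for less
print(discounted_return(rewards, gamma=1.0))   # no discounting
```

With `gamma` below 1, rewards far in the future contribute very little, which is why the learner does not need to look arbitrarily far ahead.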

### Unsupervised Learning
- For about 40 years, unsupervised learning was largely ignored by the machine learning community
	- Some widely used definitions of machine learning actually excluded it.
	- Many researchers thought that clustering was the only form of unsupervised learning
- It is hard to say what the aim of unsupervised learning is
	- One major aim is to create an internal representation of the input that is useful for subsequent supervised or reinforcement learning
	- You can compute the distance to a surface by using the disparity between two images, but you don't want to learn to compute disparities by stubbing your toe thousands of times

### Other goals for unsupervised learning
- It provides a compact, low-dimensional representation of the input
	- High-dimensional inputs typically live on or near a low-dimensional manifold (or several such manifolds)
- It provides an economical high-dimensional representation of the input in terms of learned features.
	- Binary features are economical.
	- So are real-valued features that are nearly all zero.
- It finds sensible clusters in the input.
	- This is an example of a *very* sparse code in which only one of the features is non-zero.
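
The view of clustering as a very sparse code can be illustrated with a short sketch: each input is represented by a vector with one feature per cluster, and only the feature of the nearest cluster is set to 1. The cluster centers here are fixed by hand and purely hypothetical; how they would be learned is outside the scope of this example.

```python
def nearest_center(x, centers):
    """Index of the center closest to x in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(range(len(centers)), key=lambda k: sq_dist(x, centers[k]))


def one_hot_code(x, centers):
    """Code with one feature per cluster; only the nearest cluster's feature is 1."""
    code = [0] * len(centers)
    code[nearest_center(x, centers)] = 1
    return code


# Hypothetical cluster centers, assumed to have been found already.
centers = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]]
print(one_hot_code([4.5, 5.2], centers))
```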

_Note:_ The goal of unsupervised learning would be to transform inputs, such as binary images of digits, into another representation that makes them easier to deal with in some way: for example, making the digits easier to classify, or discovering interesting semantic properties such as a dictionary of strokes that can be used to form each digit.