# Algorithms: neural network, feedforward, and accuracy

![Creative Commons License](https://i.creativecommons.org/l/by/4.0/88x31.png)  
This work by Jephian Lin is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).

In [2]:
import numpy as np

## Neural network
The neural network algorithm  
can be divided into three major parts:  
feedforward, backpropagation, and weight updates.  

Here is an example of using Keras.

In [21]:
from sklearn.datasets import load_iris
iris = load_iris()
X = iris['data']
y = iris['target']
print(X.shape)
print(y.shape)

(150, 4)
(150,)


In [4]:
N = X.shape[0] ### 150

y_oh = np.zeros((N, 3))
y_oh[np.arange(N), y] = 1

In [5]:
print("y =")
print(y[::25])
print()
print("y_oh =")
print(y_oh[::25])

y =
[0 0 1 1 2 2]

y_oh =
[[1. 0. 0.]
 [1. 0. 0.]
 [0. 1. 0.]
 [0. 1. 0.]
 [0. 0. 1.]
 [0. 0. 1.]]


In [7]:
from keras.models import Sequential
from keras.layers import Dense
model = Sequential()
model.add(Dense(4, activation='sigmoid', input_shape=(4,)))
### adding one more layer is not necessarily good
# model.add(Dense(3, activation='sigmoid')) 
model.add(Dense(3, activation='softmax'))
model.compile(optimizer='sgd', loss='categorical_crossentropy', metrics=['accuracy'])

In [8]:
model.fit(X, y_oh, steps_per_epoch=1000, epochs=10)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.callbacks.History at 0x7f4b0a0a72e8>

In [9]:
W1, b1, W2, b2 = model.get_weights()

In [10]:
print('W1.shape =', W1.shape)
print('b1.shape =', b1.shape)
print('W2.shape =', W2.shape)
print('b2.shape =', b2.shape)

W1.shape = (4, 4)
b1.shape = (4,)
W2.shape = (4, 3)
b2.shape = (3,)


Note:  If Keras or sklearn is not available to you,  
you may randomly initialize the weights  
to do the exercises below.
```Python
W1 = np.random.randn(4, 4)
b1 = np.random.randn(4)
W2 = np.random.randn(4, 3)
b2 = np.random.randn(3)
```

### Feedforward
Feedforward sends the data through the network  
and gets the prediction.  

It follows these recursive formulas.
1. $z_i = a_{i-1}W_i + b_i$ for $i=1,\ldots$
2. $a_i = \sigma(z_i)$

Here $a_0$ is the input.  
That is, rows of `X`.

Each layer contains  
a **weight** `Wi` and  
a **bias** `bi`.

The **sigmoid function** is  $\frac{1}{1 + e^{-x}}$.  
When it is applied to an array,  
the function is applied to each entry.

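An **activation function** is the function applied to $z_i$ in each layer.  
In the model above, it is the sigmoid function for the first layer  
and the softmax function for the last layer.  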

The softmax function matters  
only for the training part,  
so we ignore it here.  

Let's use $\sigma$ for the activation function.  
It may vary from layer to layer,  
though in the exercises below it is only the sigmoid function.
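
The recursion can be sketched as a short loop over the layers. This is only an illustration: the weights in `demo_layers` and the input `a0_demo` are made up (random values with the same shapes as the model above), and `feedforward` is a hypothetical helper name, not something provided by Keras.

```python
import numpy as np

rng = np.random.default_rng(0)

# made-up weights and biases with the shapes (4, 4), (4,), (4, 3), (3,)
demo_layers = [
    (rng.standard_normal((4, 4)), rng.standard_normal(4)),
    (rng.standard_normal((4, 3)), rng.standard_normal(3)),
]

def sigmoid(x):
    # NumPy arithmetic is applied to each entry of an array
    return 1 / (1 + np.exp(-x))

def feedforward(a0, layers):
    """Apply z_i = a_{i-1} W_i + b_i and a_i = sigmoid(z_i) for each layer.

    Returns the z and a of the last layer.
    """
    a = a0
    for W, b in layers:
        z = np.dot(a, W) + b
        a = sigmoid(z)
    return z, a

a0_demo = np.array([1.0, 2.0, 3.0, 4.0])  # made-up input with 4 features
z_last, a_last = feedforward(a0_demo, demo_layers)
print(z_last.shape, a_last.shape)
```

Here every layer uses the sigmoid function, which matches the simplification made in this section; the trained model instead uses softmax on the last layer.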

### One-hot encoding
`0 -> [1, 0, 0]`  
`1 -> [0, 1, 0]`  
`2 -> [0, 0, 1]`  
In one-hot encoding,  
the distance between two categories  
is the same for every pair of categories.
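
As a quick check of this claim, the sketch below computes the pairwise Euclidean distances between the three one-hot codes. The names `codes`, `diff`, and `dist` are chosen only for this illustration.

```python
import numpy as np

codes = np.eye(3)  # rows are the one-hot codes of 0, 1, 2

# difference between every pair of rows, shape (3, 3, 3)
diff = codes[:, None, :] - codes[None, :, :]

# Euclidean distance between every pair of codes, shape (3, 3)
dist = np.sqrt((diff ** 2).sum(axis=2))
print(dist)
```

Every off-diagonal entry is $\sqrt{2}$. In contrast, with the raw labels `0`, `1`, `2`, the label `0` is closer to `1` than to `2`, which suggests an ordering that the categories do not have.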

One way to transform  
the label to one-hot encoding  
is by **fancy indexing**.

In [19]:
a = np.arange(9).reshape(3,3)
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [23]:
a[[0,1,2,0,1,2,0,1,2], [0,0,0,1,1,1,2,2,2]]

array([0, 3, 6, 1, 4, 7, 2, 5, 8])

In [20]:
a[np.arange(3), np.arange(3)]

array([0, 4, 8])

Now review how we built `y_oh` earlier:

```Python
N = X.shape[0] ### 150

y_oh = np.zeros((N, 3))
y_oh[np.arange(N), y] = 1
```

### Prediction
When $a_2$ (the output of the last layer)  
is generated,   
the predicted answer is `np.argmax(a2)`.  
That is, the index for the maximum entry.  
(Or the one-hot encoding of `np.argmax(a2)`,  
depending on the structure of the target.)

Since softmax preserves the order of the entries,  
`np.argmax(a2) == np.argmax(z2)`,  
so it is enough to compute `np.argmax(z2)`.
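
The following small sketch illustrates this on a made-up vector `z2_demo`. The `softmax` function here is written by hand for the illustration and is not taken from Keras.

```python
import numpy as np

def softmax(z):
    # subtracting the maximum does not change the result
    # but avoids overflow in np.exp
    e = np.exp(z - np.max(z))
    return e / e.sum()

z2_demo = np.array([0.3, 2.1, -1.0])  # made-up output of the last layer
a2_demo = softmax(z2_demo)

print(a2_demo.sum())  # the entries of a softmax output sum to 1
print(np.argmax(a2_demo) == np.argmax(z2_demo))  # True
```

Because the exponential function is increasing and all entries are divided by the same positive number, the largest entry of `z2_demo` stays the largest entry of `a2_demo`.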

##### Exercise
Let `a0 = X[0]`, the first row of `X`.  
Compute `z1 = np.dot(a0, W1) + b1`.

In [1]:
### your answer here


##### Exercise
Use `lambda` syntax to write the sigmoid function.  
Then define `sigmoid` as the vectorized sigmoid function.

In [1]:
### your answer here


##### Exercise
Compute `a1 = sigmoid(z1)`.

In [1]:
### your answer here


##### Exercise
Compute `z2 = np.dot(a1, W2) + b2`.  
(As discussed, let's not compute `a2`.)

In [1]:
### your answer here


##### Exercise
Write a function `one_hot(k)`  
that returns an array of shape `(3,)`  
with a `1` at index `k` while  
other entries are `0`.

In [1]:
### your answer here


##### Exercise
Use `np.all`  
to tell whether  
`one_hot(np.argmax(z2))` and `y_oh[0]`  
are the same.

In [1]:
### your answer here


##### Exercise
Combine everything above.  
Use a `for` loop to iterate over every row in `X`  
and compute the accuracy of the prediction.  

Your answer should be the same as the output of Keras.

In [1]:
### your answer here


##### Exercise
Apply vectorization wherever possible.  
Can you generate the answer of the previous question  
without using a `for` loop?

In [1]:
### your answer here


##### Sample code for accuracy (without `for` loop)

In [26]:
### vectorized sigmoid function
sigmoid = np.vectorize(lambda x: 1 / (1+np.exp(-x)))

z1 = np.dot(X, W1) + b1
a1 = sigmoid(z1)
z2 = np.dot(a1, W2) + b2
### again, computing a2 is not necessary in this case

ans_label = np.argmax(z2, axis=1)

### create one-hot encoding
N = X.shape[0]
ans_oh = np.zeros((N, 3))
ans_oh[np.arange(N), ans_label] = 1
acc = np.all(ans_oh == y_oh, axis=1).sum() / N
print(acc)

0.9733333333333334
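
When the targets are available as integer labels, the accuracy can also be computed without building the one-hot arrays. The sketch below compares the two approaches on made-up labels `y_true` and predictions `y_pred`; these names and values are assumptions for the illustration and are not the iris results.

```python
import numpy as np

y_true = np.array([0, 0, 1, 1, 2, 2])  # made-up true labels
y_pred = np.array([0, 0, 1, 2, 2, 2])  # made-up predicted labels
n = len(y_true)

# accuracy from the labels directly
acc_labels = np.mean(y_pred == y_true)

# accuracy from the one-hot encodings, as in the sample code
true_oh = np.zeros((n, 3))
true_oh[np.arange(n), y_true] = 1
pred_oh = np.zeros((n, 3))
pred_oh[np.arange(n), y_pred] = 1
acc_oh = np.all(pred_oh == true_oh, axis=1).sum() / n

print(acc_labels, acc_oh)
```

Both values equal 5/6 here, since five of the six predictions match. Comparing the labels is shorter, while the one-hot version matches the structure of `y_oh` used for training.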
