# Simple Neural Networks
Neural networks are another family of algorithms we can use to learn
complicated, nonlinear relations among our data, and they work best with large
amounts of data. 

Every neural net has an input layer, zero or more hidden layers, and one output
layer. The input layer accepts the training examples, each hidden layer
performs the calculation $\sigma\left(w^TX + b\right)$, and the output layer
performs the same calculation one last time. 
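
As a rough illustration of the calculation $\sigma\left(w^TX + b\right)$
described above, here is a minimal NumPy sketch of a single neuron. The toy
values for `X`, `w`, and `b` and the helper names `sigmoid` and `layer_forward`
are made up for this example; examples are stored one per column so that the
code matches the $w^TX$ notation.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(X, w, b):
    """Compute sigma(w^T X + b) for a single neuron.

    X has shape (n_features, n_examples), matching the w^T X notation.
    """
    return sigmoid(w.T @ X + b)

# Hypothetical toy data: 2 features, 3 examples (one example per column)
X = np.array([[0.0, 1.0, -1.0],
              [0.0, 2.0, -2.0]])
w = np.array([[0.5], [0.25]])  # one weight per feature (assumed values)
b = 0.0                        # bias (assumed value)

activations = layer_forward(X, w, b)
print(activations.shape)  # (1, 3): one activation per example
```

An all-zero example gives $w^TX + b = 0$, so its activation is exactly 0.5.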

The simplest 1-layer neural net has only an input and output layer.
Linear and logistic regression are also instances of 1-layer neural nets
because they have an input layer that accepts training examples and an output
layer that performs a calculation and outputs a value. 

## Infrastructure
We'll load all libraries and pre-built functions here:

In [1]:
from keras.layers import Dense
from keras.models import Sequential
# note: load_boston was removed in scikit-learn 1.2, so this needs an older version
from sklearn.datasets import load_boston
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def norm_data(X):

    scaler = StandardScaler().fit(X)
    X_norm = scaler.transform(X)

    return X_norm

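def pca_transform(n, data, inv = False):
    pca = PCA(n_components = n, random_state = 0)
    data_norm = norm_data(data)
    # project the standardized data onto the first n principal components
    Z = pca.fit_transform(data_norm)

    if inv:
        # map the projection back to the original feature space and undo the scaling
        Z = pca.inverse_transform(Z)*data.std(axis=0) + data.mean(axis=0)

    return Z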

Using TensorFlow backend.


## Linear and Logistic Regression as Neural Nets
Let's take an algorithm we know, logistic regression, and describe it in terms
of a neural net. 

![Logistic Regression as Neural Net source: deeplearning.ai](log_reg_as_nn.png){width=50%}

We've seen previously that logistic regression takes the input examples ($x_i$
and $y_i$), learns optimal parameters ($w_i$ and $b$), computes the output
$w^TX + b$, squeezes the value into (0,1) via the sigmoid function, and finally
classifies each example as 1 (pass) or 0 (fail) based on a typical 0.5
probability threshold. 

Interpreting this as a neural net, the training examples ($x_i$) are the input
layer, and the "neuron" that computes $\sigma\left(w^TX + b\right)$ is the
output layer. In this way, we say logistic regression is also a 1-layer neural
network. 
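
The steps above can be sketched in a few lines of NumPy. This is only an
illustration: the parameters `w` and `b` below are hypothetical hand-picked
values rather than learned ones, and `predict_logistic` is a name invented for
this sketch.

```python
import numpy as np

def predict_logistic(X, w, b, threshold=0.5):
    """Logistic regression viewed as a 1-layer neural net.

    X has shape (n_features, n_examples); returns probabilities and 0/1 labels.
    """
    z = w.T @ X + b                      # linear part: w^T X + b
    probs = 1.0 / (1.0 + np.exp(-z))     # sigmoid squeezes z into (0, 1)
    labels = (probs > threshold).astype(int)
    return probs, labels

# Hypothetical inputs: 2 features, 3 examples (one example per column)
X = np.array([[2.0, -1.0, 1.5],
              [1.0, -3.0, 0.0]])
w = np.array([[1.0], [0.5]])  # assumed weights, not learned
b = -0.5                      # assumed bias, not learned

probs, labels = predict_logistic(X, w, b)
print(labels)  # [[1 0 1]]
```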

For linear regression, the interpretation of input and output layers is the
same, except $w^TX + b$ is not passed through an activation function like the
sigmoid function, $\sigma$. 

An activation function changes the nature of the values $w^TX + b$. For
example, if we want a probability, we send these values through the sigmoid 
activation function, which returns a value in (0,1). 
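
To make the effect of an activation function concrete, the sketch below applies
three choices to the same made-up values of $w^TX + b$: no activation (as in
linear regression), the sigmoid, and relu, which is used in the Keras examples
in this notebook. The array `z` holds hypothetical numbers chosen for
illustration.

```python
import numpy as np

# Hypothetical values of w^T X + b for three examples
z = np.array([-2.0, 0.0, 3.0])

identity_out = z                         # linear regression: values pass through unchanged
sigmoid_out = 1.0 / (1.0 + np.exp(-z))   # each value mapped into (0, 1)
relu_out = np.maximum(0.0, z)            # negatives clipped to 0, positives unchanged

print(identity_out)
print(sigmoid_out)
print(relu_out)
```

Only the sigmoid output can be read as a probability; relu output has no upper
bound.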

## Intro to Neural Networks

![Basic Neural Net source: Stanford CS231n](nn.png){width=50%}

The only difference between linear and logistic regression and neural networks
is the number of hidden layers. The smallest neural network has one hidden
layer. The hidden layer computes $w^TX + b$ and passes it through an activation 
function. In the simplest case, these values are passed to the output layer, 
where another set of $w^TX + b$ values is computed and passed through another 
activation function such as the sigmoid function. These values are then
classified as 1 (pass) or 0 (fail). 
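
The forward pass just described can be sketched in NumPy as follows. The layer
sizes (2 inputs, 5 hidden neurons, 1 output) are chosen to match the Keras
model built later in this notebook; the weights are random placeholders rather
than trained values, and the function names are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """One hidden layer followed by a one-neuron output layer."""
    hidden = relu(W1.T @ X + b1)          # shape (5, n_examples)
    probs = sigmoid(W2.T @ hidden + b2)   # shape (1, n_examples)
    return hidden, probs

# Hypothetical data and untrained parameters (one example per column)
X = rng.normal(size=(2, 4))
W1, b1 = rng.normal(size=(2, 5)), np.zeros((5, 1))
W2, b2 = rng.normal(size=(5, 1)), np.zeros((1, 1))

hidden, probs = forward(X, W1, b1, W2, b2)
labels = (probs > 0.5).astype(int)   # classify as 1 (pass) or 0 (fail)
print(hidden.shape, probs.shape)     # (5, 4) (1, 4)
```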




## Linear Classification with Keras
Keras is a pre-built library which provides us a simple interface to implement
neural networks. Let's try linear classification on the first two principal
components of Boston Housing.

In [2]:
boston = load_boston()
X, y = boston.data, boston.target
y_mean = np.where(y>y.mean(), 1, 0)
X_2D = pca_transform(2, X)

# lin. classification in terms of neural net
model = Sequential()
# this adds the input and output layers
# relu is a different choice than sigmoid for activation function
model.add(Dense(1, input_dim=2, activation='relu'))
# mse (mean squared error) is the loss function used for linear regression
# sgd (stochastic gradient descent) is the optimization algorithm used to 
# find the optimal parameters W
model.compile(loss='mse', optimizer='sgd', metrics=['accuracy'])
# epochs is the number of full passes over the training data
# batch size is the number of examples used for each sgd update
model.fit(X_2D, y_mean, epochs=100, batch_size=10)
loss, _ = model.evaluate(X_2D, y_mean)
# predict returns the raw model output; with relu it is >= 0, not a true probability
probabilities = model.predict(X_2D)
# rounding each output to the nearest integer gives the predicted class label
predictions = [float(np.round(x)) for x in probabilities]
accuracy = np.mean(predictions == y_mean)
print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))

Epoch 1/100
...
Epoch 100/100

Loss: 0.14, Accuracy: 81.42%
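
The `epochs=100` and `batch_size=10` arguments in the call to `fit` above can
be pictured as two nested loops. The sketch below only counts parameter updates
and does no training; it assumes the 506 examples of Boston Housing and that
the examples are reshuffled each epoch. `iterate_minibatches` is a helper name
made up for this illustration.

```python
import numpy as np

def iterate_minibatches(n_examples, batch_size, rng):
    """Yield arrays of example indices, one array per mini-batch."""
    order = rng.permutation(n_examples)   # shuffle once per epoch (assumption)
    for start in range(0, n_examples, batch_size):
        yield order[start:start + batch_size]

rng = np.random.default_rng(0)
n_examples, batch_size, epochs = 506, 10, 100

total_updates = 0
for epoch in range(epochs):
    batches_per_epoch = 0
    for batch in iterate_minibatches(n_examples, batch_size, rng):
        # a real training loop would compute the loss on this batch
        # and take one gradient step here
        batches_per_epoch += 1
    total_updates += batches_per_epoch

print(batches_per_epoch)  # 51: 50 full batches plus one batch of 6
print(total_updates)      # 5100
```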


## Logistic Classification with Keras
Now, let's implement logistic regression in a similar manner:

In [3]:
# log. classification in terms of neural net
model = Sequential()
model.add(Dense(1, input_dim=2, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_2D, y_mean, epochs=100, batch_size=10)
loss, _ = model.evaluate(X_2D, y_mean)
probabilities = model.predict(X_2D)
predictions = [float(np.round(x)) for x in probabilities]
accuracy = np.mean(predictions == y_mean)
print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))

Epoch 1/100
...
Epoch 100/100

Loss: 0.47, Accuracy: 80.24%
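
The two loss names passed to `compile` in the cells above, `mse` and
`binary_crossentropy`, correspond to simple formulas. The sketch below writes
them out in NumPy on made-up labels and predicted probabilities. The clipping
constant `eps` is an assumption added here to avoid taking the log of zero; it
is not taken from this notebook.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average squared gap between label and prediction."""
    return np.mean((y_true - y_pred) ** 2)

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Average negative log-likelihood of the true 0/1 labels."""
    p = np.clip(y_prob, eps, 1.0 - eps)   # keep log() finite (assumed safeguard)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Hypothetical labels and predicted probabilities
y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_prob = np.array([0.9, 0.1, 0.8, 0.4])

print(mse(y_true, y_prob))
print(binary_crossentropy(y_true, y_prob))
```

Because the two losses are on different scales, the loss values printed for the
linear and logistic models are not directly comparable; accuracy is the fairer
comparison.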


## Neural Net with Keras
Time to add a hidden layer and create a neural net:

In [4]:
# neural net on same data
model = Sequential()
# this adds the input layer and a 5 neuron hidden layer
model.add(Dense(5,input_dim=2, activation='relu'))
# this adds the one-neuron output layer
model.add(Dense(1, activation='sigmoid'))
# binary crossentropy is the loss used in logistic regression 
# adam is an adaptive variant of gradient descent used to find the optimal parameters, W.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_2D, y_mean, epochs=100, batch_size=10)
loss, _ = model.evaluate(X_2D, y_mean)
probabilities = model.predict(X_2D)
predictions = [float(np.round(x)) for x in probabilities]
accuracy = np.mean(predictions == y_mean)
print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))

Epoch 1/100
...
Epoch 100/100

Loss: 0.41, Accuracy: 82.02%
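
One way to compare this network with the single-layer models is to count
trainable parameters. A fully connected layer has one weight for every
input-unit pair plus one bias per unit. The helper names below are made up for
this sketch.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

def network_param_count(layer_sizes):
    """layer_sizes lists the input size followed by the size of each layer."""
    pairs = zip(layer_sizes[:-1], layer_sizes[1:])
    return sum(dense_param_count(n_in, n_out) for n_in, n_out in pairs)

print(network_param_count([2, 1]))     # 3: the single-layer models above
print(network_param_count([2, 5, 1]))  # 21: one 5-neuron hidden layer
```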


## Exercise
We've seen how to build a neural net with Keras by adding layers. Now, your
task is to build a 3-layer neural net on Boston Housing by adding one more
hidden layer. Use the previous examples as a guide and experiment with the
number of neurons in each layer to try to get a better fit:

In [5]:

#---------------Enter Your Code Here---------------------#

model = Sequential()
# Adds the input layer and 5 neuron hidden layer
model.add(Dense(5,input_dim=2, activation='relu'))
# Adds an additional 5 neuron hidden layer
model.add(Dense(5, activation = 'relu'))
# Adds a one neuron output layer
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_2D, y_mean, epochs=100, batch_size=10)
loss, _ = model.evaluate(X_2D, y_mean)
probabilities = model.predict(X_2D)
predictions = [float(np.round(x)) for x in probabilities]
accuracy = np.mean(predictions == y_mean)
print("\nLoss: %.2f, Accuracy: %.2f%%" % (loss, accuracy*100))

#--------------------------------------------------------#

Epoch 1/100
...
Epoch 100/100

Loss: 0.41, Accuracy: 82.21%


## Advanced: CNNs and RNNs
- Convolutional Neural Networks (CNNs) are a variation of neural networks. They
  are the default choice for image classification at present.
    - Google famously used CNNs to identify cats via millions of cat images.
    - [learn more](https://medium.freecodecamp.org/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050)
- Recurrent Neural Networks (RNNs) are another form of neural network, most
  commonly used for handwriting recognition, speech recognition, or text
  segmentation. 
    - A common task would be generating Shakespeare-style text or Siri
      identifying a question.
    - [learn more](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

## Conclusion
We were introduced to a new tool: neural networks. We saw that linear and
logistic regression were simple cases of neural networks, and we saw how to use
Keras to easily implement a neural network. 

Neural nets work best when there is a nonlinear relationship to learn and there
is a lot of data (millions of rows). New applications are still waiting to be
found in every industry. It's an exciting time because, with some effort, we
can be the first to apply deep learning in a novel, effective way. Deep
learning is a young field compared with physics or math, whose foundations are
far more established.