# Artificial Neural Network 

What is a neural network? The building block of a neural network is the neuron. We will cover the following in this note:
- The Neuron
- Activation Function
- How do NNs work?
- How do NNs learn?
- What is Gradient Descent?
- Stochastic Gradient Descent
- Backpropagation


#### The Neuron
The whole purpose of deep learning is to mimic the human brain, because the human brain is one of the most powerful learning mechanisms on the planet. So we first have to recreate a neuron. You can look at the oldest drawings of the neuron at this link:
https://www.the-scientist.com/foundations/the-first-neuron-drawings-1870s-34751

A neuron has a cell body, dendrites and an axon. A neuron on its own is pretty much useless, but when many of them work together they can work wonders. Here is how it works: the signal from one neuron travels along its axon and reaches the dendrites of the next neuron, and that's how they connect.

OK, so now let's move back to technology and see how we can recreate the neuroscience. In computer jargon, the neuron is a node that receives input signals, much like the human brain, which is a black box that only receives input signals through your five senses. The processing then occurs in the node (the activation), which outputs the processed result. The output value could be continuous, binary or categorical depending on the case.

Each input value is assigned an initial weight. The node adds up the weighted inputs and applies the activation function to that weighted sum; depending on the function we choose (more on that later), the neuron will either pass the signal on or not.
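
As a minimal sketch of the weighted sum followed by an activation, here is a single artificial neuron in plain Python. The input values, the weights and the choice of a threshold at zero are made up for illustration; they are not from any real model.

```python
def weighted_sum(inputs, weights):
    """Add up each input multiplied by its weight."""
    return sum(x * w for x, w in zip(inputs, weights))

def threshold(z):
    """Threshold activation: pass the signal (1) or not (0)."""
    return 1 if z >= 0 else 0

def neuron(inputs, weights):
    """A single node: weighted sum first, then the activation."""
    return threshold(weighted_sum(inputs, weights))

# Hypothetical values, chosen only for illustration
inputs = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 0.25]

z = weighted_sum(inputs, weights)   # 0.5 - 2.0 + 0.75 = -0.75
output = neuron(inputs, weights)    # 0, because the weighted sum is negative
```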

Always remember to normalize or standardize your input values before feeding them into the network (for more on standardization and normalization, check out this paper: Efficient BackProp, Yann LeCun et al., 1998).
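
The two rescaling options can be sketched as follows. This is an illustration using only the standard library, with a made-up feature column; in practice a library scaler such as scikit-learn's `StandardScaler` does the same job as `standardize` here.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale to zero mean and unit standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Rescale to the range [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature column, e.g. customer ages
ages = [20.0, 30.0, 40.0, 50.0, 60.0]

standardized = standardize(ages)   # centered on 0, the middle value maps to 0.0
normalized = normalize(ages)       # [0.0, 0.25, 0.5, 0.75, 1.0]
```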


#### The Activation Function
There are many types of activation function. Four common ones:
- Threshold function: not smooth (it jumps straight from one value to the other), but good for binary output cases because it returns only two values.
- Sigmoid function: smooth, with values between 0 and 1, which makes it good for probabilities.
- Rectifier function: popular for ANNs.
- Hyperbolic tangent (tanh): values go from -1 to 1.

Read more here: Deep Sparse Rectifier Neural Networks, Xavier Glorot et al., 2011
http://proceedings.mlr.press/v15/glorot11a/glorot11a.pdf
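
To make the four functions concrete, here is a small sketch in plain Python. The function names and the sample inputs are my own choices for illustration.

```python
import math

def threshold(z):
    """Returns only two values: 0 below zero, 1 otherwise."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Smooth curve between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def rectifier(z):
    """Zero for negative inputs, the input itself otherwise."""
    return max(0.0, z)

def tanh(z):
    """Smooth curve between -1 and 1."""
    return math.tanh(z)

# Compare the four functions on a few sample inputs
for z in (-2.0, 0.0, 2.0):
    print(z, threshold(z), round(sigmoid(z), 3), rectifier(z), round(tanh(z), 3))
```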

#### How do NNs work?
Let's look at an example: property values. Suppose we have the following inputs: area, number of bedrooms, distance to the downtown of a major city, and age of the building. These inputs are passed, each multiplied by its weight, to the nodes of the hidden layer. Each node in the hidden layer looks at a potential relationship between the inputs; for instance, if the combination of area and distance contributes to the price, then one node would pick up on that. In this way the hidden layer picks up on the combinations of inputs that are significant in determining the price. Note that we are talking about an already trained neural network.
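
The property example can be sketched as a forward pass through one hidden layer. Everything numeric here is hypothetical: the weights are invented to stand in for a trained network, the inputs are assumed to be already standardized, and the use of rectifier units with a linear output is my assumption.

```python
import numpy as np

def relu(z):
    """Rectifier activation applied element-wise."""
    return np.maximum(0.0, z)

def forward(x, W_hidden, b_hidden, w_out, b_out):
    """One hidden layer of rectifier nodes, then a linear output (the price)."""
    hidden = relu(W_hidden @ x + b_hidden)
    return float(w_out @ hidden + b_out)

# Inputs (assumed standardized): area, bedrooms, distance to downtown, age
x = np.array([1.0, 0.0, -1.0, 0.5])

# Hypothetical "trained" weights: 2 hidden nodes x 4 inputs
W_hidden = np.array([
    [1.0, 0.0, -1.0,  0.0],   # node 1 picks up on area and distance only
    [0.0, 1.0,  0.0, -1.0],   # node 2 picks up on bedrooms and age only
])
b_hidden = np.zeros(2)
w_out = np.array([2.0, 1.0])
b_out = 3.0

# Node 1 outputs 2.0, node 2 outputs 0.0, so the price is 2*2.0 + 1*0.0 + 3.0
price = forward(x, W_hidden, b_hidden, w_out, b_out)
```

A zero weight means the node ignores that input, which is how a node ends up specializing in one combination of inputs.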

#### How do NNs learn?
There are two ways to go about it: you could hard code the rules and tell the program what to look for, or you could set up a neural network and let it learn on its own. If you want to tell whether an image shows a cat or a dog, you could tell the machine that cat ears look like this or that, or you could provide the NN with a set of cat and dog images and let the perceptrons figure out what it means to be a cat or a dog. We provide the actual values, the NN's output is measured against them with a cost function, and the weights are updated until the output is as close to the actual values as it can get. Learning is basically an iterative process (each pass over the whole training set is called an epoch): keep adjusting the weights and try to minimize the cost function.
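
Measuring the output against the actual values can be sketched with a simple cost function. The note does not name a specific one, so the half squared error used here is an assumption, and the numbers are made up.

```python
def cost(predictions, actuals):
    """Half the sum of squared differences between predicted and actual values."""
    return sum(0.5 * (p - a) ** 2 for p, a in zip(predictions, actuals))

# Hypothetical actual values and two sets of network outputs
actuals = [1.0, 0.0, 1.0]

before = cost([0.5, 0.5, 0.5], actuals)      # 0.375, an undecided network
after = cost([0.75, 0.25, 0.75], actuals)    # 0.09375, outputs moved towards the actuals
```

The cost shrinks as the outputs approach the actual values and reaches zero when they match exactly.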

#### Gradient Descent
In order for a neural network to learn, we need to adjust its weights so that the cost function is minimized. We could do it through brute force, trying thousands of weights and seeing which one gives the lowest cost, but the curse of dimensionality will get you in the end, so yeah, not great! Sometimes even Sunway TaihuLight won't get it! Instead, let's look at the downhill method, or gradient descent. Basically, take the slope of the cost function at the current weights: if the slope is negative, the cost goes down as the weight increases, so step forward; if it is positive, the cost goes up, so step back. Keep taking those steps until you find the optimum. This way you need far fewer steps to descend to the minimum of the cost function.
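
Here is a minimal sketch of that downhill walk for a single weight. The cost function `(w - 3)^2`, the learning rate and the number of steps are all made up for illustration.

```python
def cost(w):
    """Toy cost function with its minimum at w = 3."""
    return (w - 3.0) ** 2

def slope(w):
    """Derivative of the toy cost function."""
    return 2.0 * (w - 3.0)

def gradient_descent(w, learning_rate=0.1, steps=100):
    """Repeatedly step against the slope: right if it is negative, left if positive."""
    for _ in range(steps):
        w = w - learning_rate * slope(w)
    return w

w_final = gradient_descent(w=0.0)   # ends up very close to 3.0
```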

#### Stochastic Gradient Descent
There is a problem with cost functions that are not convex: we could get stuck in a local minimum as opposed to the global one. To get around this problem, we use stochastic gradient descent. With a convex cost function it's common to use batch gradient descent: we run the whole batch of rows through the network, adjust the weights once, and keep repeating. The stochastic approach instead takes one row at a time, adjusts the weights, and then moves on to the next row. The updates fluctuate much more this way, which helps you escape local minima when the cost function is not convex. You could also combine the two and run a few rows at a time; this is called mini-batch gradient descent, and it's useful too!
(read more: Neural Network in 13 lines of Python, Andrew Trask, 2015)
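
The three variants differ only in how many rows are used per weight update, which a single `batch_size` parameter can express. This is a sketch on a made-up linear problem (which is convex, so it shows the mechanics only, not the escape from local minima); the data, learning rate and epoch count are assumptions.

```python
import numpy as np

def train(X, y, batch_size, epochs=200, learning_rate=0.1, seed=0):
    """Fit y ~ X @ w by gradient descent, updating the weights once per batch of rows."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)              # shuffle the rows every epoch
        for start in range(0, n, batch_size):
            rows = order[start:start + batch_size]
            error = X[rows] @ w - y[rows]
            gradient = X[rows].T @ error / len(rows)
            w -= learning_rate * gradient
    return w

# Toy data generated from known weights (hypothetical)
rng = np.random.default_rng(42)
X = rng.normal(size=(40, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w

w_batch = train(X, y, batch_size=len(y))   # batch: all rows per update
w_sgd = train(X, y, batch_size=1)          # stochastic: one row per update
w_mini = train(X, y, batch_size=8)         # mini-batch: a few rows per update
```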

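#### Backpropagation
This is the process through which the weights are updated, and it's a sophisticated mathematical process: the error at the output is propagated back through the network so that each weight can be adjusted according to how much it contributed to that error. The details of the math can be found in the book "Neural Networks and Deep Learning" by Michael Nielsen, 2015. All we need to know is that all the weights are adjusted at the same time.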
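
For a feel of the mechanics, here is a minimal sketch of backpropagation on a tiny network with one training example. The network size, the sigmoid activations, the half squared error cost, the starting weights and the learning rate of 0.5 are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, W2):
    hidden = sigmoid(W1 @ x)        # hidden layer activations
    output = sigmoid(W2 @ hidden)   # network output
    return hidden, output

def cost(output, target):
    return 0.5 * float(np.sum((output - target) ** 2))

def backprop(x, target, W1, W2):
    """Return the gradients of the cost with respect to W1 and W2."""
    hidden, output = forward(x, W1, W2)
    # Error signal at the output layer (chain rule through the sigmoid)
    delta_out = (output - target) * output * (1.0 - output)
    # Error signal propagated back to the hidden layer
    delta_hidden = (W2.T @ delta_out) * hidden * (1.0 - hidden)
    grad_W2 = np.outer(delta_out, hidden)
    grad_W1 = np.outer(delta_hidden, x)
    return grad_W1, grad_W2

# Hypothetical tiny network: 2 inputs, 2 hidden nodes, 1 output
x = np.array([1.0, 0.5])
target = np.array([1.0])
W1 = np.array([[0.1, -0.2], [0.4, 0.3]])
W2 = np.array([[0.2, -0.5]])

cost_before = cost(forward(x, W1, W2)[1], target)
for _ in range(100):
    grad_W1, grad_W2 = backprop(x, target, W1, W2)
    # Both weight matrices are adjusted in the same step
    W1 = W1 - 0.5 * grad_W1
    W2 = W2 - 0.5 * grad_W2
cost_after = cost(forward(x, W1, W2)[1], target)   # lower than cost_before
```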



In [None]:
# This code shows how to use an artificial neural network to predict the outcome
# of a single observation. In this case, predict user churn. The performance can be
# validated using k-fold cross validation. We tackle overfitting and dropout
# as well as parameter tuning.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

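# Encoding categorical data
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
# Column 2 is label encoded as before
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])
# One-hot encode column 1. OneHotEncoder's categorical_features argument was
# removed in scikit-learn 0.22, so a ColumnTransformer selects the column instead.
# The dummy columns come first in the output, followed by the remaining columns.
columntransformer = ColumnTransformer([('onehot', OneHotEncoder(), [1])],
                                      remainder = 'passthrough', sparse_threshold = 0)
X = columntransformer.fit_transform(X).astype(float)
# Drop the first dummy column to avoid the dummy variable trap
X = X[:, 1:]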

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Part 2 - Now let's make the ANN!

# Importing the Keras libraries and packages
import keras
from keras.models import Sequential
from keras.layers import Dense

# Initialising the ANN
classifier = Sequential()

# Adding the input layer and the first hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu', input_dim = 11))

# Adding the second hidden layer
classifier.add(Dense(units = 6, kernel_initializer = 'uniform', activation = 'relu'))

# Adding the output layer
classifier.add(Dense(units = 1, kernel_initializer = 'uniform', activation = 'sigmoid'))

# Compiling the ANN
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Fitting the ANN to the Training set
classifier.fit(X_train, y_train, batch_size = 10, epochs = 100)

# Part 3 - Making predictions and evaluating the model

# Predicting the Test set results
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)