# Creating an Artificial Neural Network (THE CONCEPT)

<br>
<font color="red" size=5>__Step 1 :__ </font><font size=4> Randomly initialize the weights to small numbers close to 0 (but not 0) </font>
![downward_arrow](downward_arrow.png)
<font color="red" size=5>__Step 2 :__ </font><font size=4> Input the first observation of your dataset in the input layer, each feature in one input node </font>
![downward_arrow](downward_arrow.png)
<font color="red" size=5>__Step 3 :__ </font><font size=4>__Forward Propagation:__ from left to right, the neurons are activated, and the impact of each neuron's activation on the next layer is scaled by the weights. Propagate the activations until you get the predicted result y</font>
![downward_arrow](downward_arrow.png)
<font color="red" size=5>__Step 4 :__ </font><font size=4>Compare the predicted result to the actual result. Measure the generated error</font>
![downward_arrow](downward_arrow.png)
<font color="red" size=5>__Step 5 :__ </font><font size=4>__Back Propagation:__ from right to left, the error is back-propagated. Update the weights according to how much they are responsible for the error. The __Learning Rate__ decides how much to update the weights</font>
![downward_arrow](downward_arrow.png)
<font color="red" size=5>__Step 6 :__ </font><font size=4>Repeat steps 2 to 5 and update the weights after each observation (__Stochastic Gradient Descent / Online Learning__), or repeat steps 2 to 5 but update the weights only after a batch of observations (__Batch Gradient Descent / Batch Learning__)</font>
![downward_arrow](downward_arrow.png)
<font color="red" size=5>__Step 7 :__ </font><font size=4>When the whole training set has been passed through the ANN, this completes 1 __Epoch__. Repeat for more epochs</font>
<br><br><br>
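
The seven steps above can be sketched in plain NumPy before moving to Keras. This is only a minimal illustration under stated assumptions: the toy data, the hidden layer size (6), the learning rate (0.1) and the batch size (10) are made up for the example and are not taken from the churn problem below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical): 200 observations, 3 features, binary target
X = rng.normal(size=(200, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: initialize the weights to small random numbers close to 0
W1 = rng.uniform(-0.05, 0.05, size=(3, 6))
b1 = np.zeros((1, 6))
W2 = rng.uniform(-0.05, 0.05, size=(6, 1))
b2 = np.zeros((1, 1))

learning_rate = 0.1
batch_size = 10
losses = []

for epoch in range(50):  # Step 7: one full pass over the data is 1 epoch
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):  # Step 6: update after each batch
        idx = order[start:start + batch_size]
        xb, yb = X[idx], y[idx]  # Step 2: feed observations to the input layer

        # Step 3: forward propagation (ReLU hidden layer, sigmoid output)
        h = np.maximum(0.0, xb @ W1 + b1)
        p = sigmoid(h @ W2 + b2)

        # Step 4: error signal; for sigmoid + log loss the gradient is (p - y)
        d_out = (p - yb) / len(xb)

        # Step 5: back-propagate the error and update the weights
        d_hidden = (d_out @ W2.T) * (h > 0)
        W2 -= learning_rate * (h.T @ d_out)
        b2 -= learning_rate * d_out.sum(axis=0, keepdims=True)
        W1 -= learning_rate * (xb.T @ d_hidden)
        b1 -= learning_rate * d_hidden.sum(axis=0, keepdims=True)

    # Track the log loss on the whole toy set after each epoch
    p_all = sigmoid(np.maximum(0.0, X @ W1 + b1) @ W2 + b2)
    eps = 1e-12
    loss = -np.mean(y * np.log(p_all + eps) + (1 - y) * np.log(1 - p_all + eps))
    losses.append(float(loss))

accuracy = float(np.mean((p_all > 0.5) == (y == 1)))
print("first epoch loss:", losses[0], "last epoch loss:", losses[-1])
print("training accuracy:", accuracy)
```

Setting `batch_size = 1` in this sketch gives the per-observation update of Step 6, and setting it to `len(X)` gives one update per epoch.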

# A Working Code Example 

In [19]:
!python -m pip install --upgrade pip
!pip install keras

Collecting pip
  Using cached https://files.pythonhosted.org/packages/5c/e0/be401c003291b56efc55aeba6a80ab790d3d4cece2778288d65323009420/pip-19.1.1-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 18.0
    Uninstalling pip-18.0:
      Successfully uninstalled pip-18.0
Successfully installed pip-19.1.1


In [37]:
# Importing the libraries
import warnings
warnings.simplefilter("ignore")
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

### *A Classification Problem (Churn Prediction)*

In [38]:
# Importing the dataset
dataset = pd.read_csv('Datasets/Churn_Modelling.csv')
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values
print('X Shape : ', X.shape)
print('y shape : ', y.shape)

X Shape :  (10000, 10)
y shape :  (10000,)


### Data Preprocessing

In [39]:
# Encoding categorical data
# Encoding the Independent Variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# ## label encoding Country
labelencoder_X_1 = LabelEncoder()
X[:, 1] = labelencoder_X_1.fit_transform(X[:, 1])

# ## label encoding Gender
labelencoder_X_2 = LabelEncoder()
X[:, 2] = labelencoder_X_2.fit_transform(X[:, 2])

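# ## Since the categorical variables are not ordinal, we need to convert them to one-hot encoded features, 
# ## but Gender already has a 0/1 encoding. Hence, we do this only for the Country (column 1)
# ## ColumnTransformer puts the encoded Country columns first and passes the other columns through unchanged
from sklearn.compose import ColumnTransformer
onehotencoder = ColumnTransformer([('country', OneHotEncoder(), [1])], remainder='passthrough', sparse_threshold=0)
X = np.array(onehotencoder.fit_transform(X), dtype=float)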

# ## Remove one of the dummy columns generated for the Country to avoid the Dummy Variable Trap 
# ## Ref: https://www.algosome.com/articles/dummy-variable-trap-regression.html
# ## the dummy columns come first, so dropping column 0 removes one Country dummy
X = X[:, 1:]
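
To see why one dummy column is dropped, here is a small pandas sketch. The three country names are hypothetical toy values used only for illustration. Every row of the full one-hot encoding sums to 1, so any one column can be computed from the others, and that redundancy is the Dummy Variable Trap.

```python
import pandas as pd

# Hypothetical toy column with three categories
countries = pd.DataFrame({"Country": ["France", "Spain", "Germany", "France"]})

# Full one-hot encoding: one column per category
full = pd.get_dummies(countries["Country"], dtype=int)

# Dropping the first dummy column removes the redundancy
reduced = pd.get_dummies(countries["Country"], drop_first=True, dtype=int)

print(full)
print(full.sum(axis=1).tolist())   # every row sums to 1
print(list(reduced.columns))       # the first category is now the implicit baseline
```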

In [40]:
# ## Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

In [41]:
# ## IMPORTANT !! Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
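
What the scaling step does can be sketched by hand in NumPy. The numbers below are made-up toy values. The point is that the mean and standard deviation are computed on the training data only and then reused for the test data, which is why the code above calls `fit_transform` on `X_train` but only `transform` on `X_test`.

```python
import numpy as np

# Toy data (hypothetical): two features on very different scales
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

# "fit": learn the statistics from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# "transform": apply the same statistics to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled)
print(test_scaled)
```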

### Creating the Artificial Neural Network

In [42]:
import keras # keras with Tensorflow Backend
from keras.models import Sequential # used to initialize the ANN
from keras.layers import Dense # used to create the layers in the ANN

NOTE:
1. We have 11 independent variables, hence 11 input nodes are needed (Ref. Step 2 in Prev Section)
2. The neurons are activated by the Activation Function from left to right: the higher the value of a neuron's activation function, the more impact that neuron has on the network, i.e. the more strongly it passes the signal from the nodes on its left to the nodes on its right
3. We will choose the Rectifier (ReLU) Activation Function for the hidden layers and the Sigmoid Activation Function for the Output Layer, because the sigmoid also gives us the probabilities associated with our predictions
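
The two activation functions mentioned in the notes can be written in a few lines of NumPy. This is only an illustrative sketch with made-up inputs, not the Keras implementation.

```python
import numpy as np

def relu(z):
    # Rectifier: passes positive signals through, blocks negative ones
    return np.maximum(0.0, z)

def sigmoid(z):
    # Squashes any real number into (0, 1), so it can be read as a probability
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, 0.0, 3.0])
relu_out = relu(z)
sigmoid_out = sigmoid(z)

print(relu_out)
print(sigmoid_out)
```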

TIPS:
1. Experimentation has shown it's usually good to use __Number of Nodes in Hidden Layer = Avg(Number of Nodes in Input Layer, Number of Nodes in Output Layer)__ OR use __Parameter Tuning methods__ to determine the optimal number of Nodes in the Hidden Layer

In [None]:
# ## We define the ANN as a sequence of layers

# ## Initializing the ANN 
classifier = Sequential()

# ## Adding the Input Layer and the first Hidden Layer
# ## we add a Dense layer as the first Hidden Layer 
# ## initialize the weights of this layer with small numbers close to 0; 'uniform' draws them from a uniform distribution
# ## number of nodes in the hidden layer = (input_nodes + output_nodes)/2 = (11+1)/2 = 6 = units
# ## the activation function of each hidden layer is the Rectifier (ReLU) Activation function
# ## input_dim = number of nodes in the input layer = number of independent variables (need to specify this for the 1st hidden layer)
classifier.add(layer=Dense(units=6, kernel_initializer='uniform', activation='relu', input_dim=11)) 

# ## Adding a Second Hidden Layer
# ## NOTE: input_dim is not specified because the layer infers its input size from the previous hidden layer
classifier.add(layer=Dense(units=6, kernel_initializer='uniform', activation='relu'))

# ## Adding the Output Layer
# ## units = 1 (a Binary output is required)
# ## We need the probabilities of our predictions --> use Sigmoid Activation Function
# ## NOTE: if there are 3 categories(classes) in output, then units=3 and activation='softmax'
classifier.add(layer=Dense(units=1, kernel_initializer='uniform', activation='sigmoid'))

# ## Compiling the ANN
# ## optimizer -- the algorithm used to find the optimal set of weights in the Neural Network; 'adam' is a variant of Stochastic Gradient Descent (SGD)
# ## loss -- the loss function that the optimizer minimizes when it updates the weights
# ## NOTE: We are using the Logarithmic loss function here
# ## NOTE: If our output variable is Binary -- log loss func -- 'binary_crossentropy'
# ## NOTE: If our output variable has more than 2 categories -- log loss func -- 'categorical_crossentropy'
# ## metrics -- evaluation criteria reported during training and evaluation; they are not used to update the weights
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['binary_accuracy'])
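
The logarithmic loss named in the comments above can be sketched in NumPy as follows. The probabilities are made-up toy values, and the clipping constant `eps` is an assumption added to avoid `log(0)`; this is an illustration of the formula, not Keras's own code.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip so that log(0) never occurs
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

y_true = np.array([1.0, 0.0, 1.0])

# Confident and correct predictions give a small loss
good = binary_crossentropy(y_true, np.array([0.9, 0.1, 0.8]))

# Confident and wrong predictions are penalized heavily
bad = binary_crossentropy(y_true, np.array([0.1, 0.9, 0.2]))

print(good, bad)
```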

### Fitting the ANN to Training Data

In [55]:
# ## Fit the ANN
# ## Batch size and Number of Epochs should be derived from Parameter Tuning ( but here we use some fixed values )
classifier.fit(x=X_train, y=y_train, batch_size=10, epochs=100)

Epoch 1/100
Epoch 2/100
...
Epoch 99/100
Epoch 100/100


<keras.callbacks.History at 0x2c4b4192e80>
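
To put `batch_size=10` and `epochs=100` in perspective, the sketch below counts the weight updates. It assumes 8000 training rows, which is what the 80/20 split of the 10000-row dataset should give.

```python
import math

n_train = 8000     # assumed: 80% of the 10000 rows
batch_size = 10
epochs = 100

# One weight update per batch; the last batch may be smaller, hence ceil
updates_per_epoch = math.ceil(n_train / batch_size)
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)
```

A smaller batch size means more (and noisier) updates per epoch; a larger one means fewer, smoother updates.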

### Predictions and Evaluations

In [59]:
# ## Making Predictions on the Test Set ( Returns the Probabilities )
y_pred = classifier.predict(X_test)
# ## changing the probabilities to actual 1/0 values
y_pred = (y_pred > 0.5) # True (1) if the predicted probability is > 0.5, otherwise False (0)

# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test, y_pred)

array([[1517,   78],
       [ 190,  215]], dtype=int64)
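
The confusion matrix above can be turned into the usual summary numbers. The sketch below assumes scikit-learn's layout (rows are the actual classes, columns are the predicted classes) and uses the matrix exactly as reported.

```python
import numpy as np

# Confusion matrix as reported above: [[TN, FP], [FN, TP]]
cm = np.array([[1517, 78],
               [190, 215]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tn + tp) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # of the predicted churners, how many really churned
recall = tp / (tp + fn)           # of the real churners, how many were caught

print(round(accuracy, 4), round(precision, 4), round(recall, 4))
```

Accuracy alone looks good here, but the recall shows that roughly half of the customers who actually churned were missed.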

<br/><br/>
# *Notes*

 ## Point 1 :::
 ### A Simple Neural Network with just __1 Neuron__ like the architecture below is called a __Perceptron Model__
 <br/>
 ![perceptron](images/ANN_perceptron.png) 
 <br/>
 ### Now if we use a __Sigmoid Activation Function__ for this Perceptron we get a __Logistic Regression__
 <br/>
 ![logistic_from_perceptron](images/ANN_logistic_from_perceptron.png)
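
A quick numeric check of this point: a single neuron with a sigmoid activation computes `sigmoid(w.x + b)`, which is exactly the logistic regression formula, so the log-odds of its output are linear in the inputs. The weights, bias and inputs below are hypothetical values chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias and inputs for a single neuron
w = np.array([0.5, -1.0])
b = 0.25
x = np.array([[4.0, 1.0],
              [0.0, 0.0],
              [-1.0, 2.0]])

z = x @ w + b                 # weighted sum of the inputs
p = sigmoid(z)                # output of the neuron = predicted probability

log_odds = np.log(p / (1 - p))
print(z)
print(log_odds)               # matches z, as in logistic regression
```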

## Point 2 :::
 
### Keras Regression Metrics
  
 * __Mean Squared Error__: mean_squared_error, MSE, mse
 * __Mean Absolute Error__: mean_absolute_error, MAE, mae
 * __Mean Absolute Percentage Error__: mean_absolute_percentage_error, MAPE, mape
 * __Cosine Proximity__: cosine_proximity, cosine

### Keras Classification Metrics
  
 * __Binary Accuracy__: binary_accuracy, acc
 * __Categorical Accuracy__: categorical_accuracy, acc
 * __Sparse Categorical Accuracy__: sparse_categorical_accuracy
 * __Top k Categorical Accuracy__: top_k_categorical_accuracy (requires you specify a k parameter)
 * __Sparse Top k Categorical Accuracy__: sparse_top_k_categorical_accuracy (requires you specify a k parameter)
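
To make two of these metrics concrete, here is a NumPy sketch of what binary accuracy and mean squared error measure. The function names mirror the Keras metric names, but the code is an illustration written for this note, not the Keras implementation; the 0.5 threshold and the toy values are assumptions.

```python
import numpy as np

def binary_accuracy(y_true, y_prob, threshold=0.5):
    # Share of observations whose thresholded prediction matches the label
    return float(np.mean((y_prob > threshold).astype(int) == y_true))

def mean_squared_error(y_true, y_pred):
    # Average of the squared differences
    return float(np.mean((y_true - y_pred) ** 2))

acc = binary_accuracy(np.array([1, 0, 1, 0]), np.array([0.9, 0.4, 0.3, 0.2]))
mse = mean_squared_error(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0]))

print(acc, mse)
```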
 
#### Quick Read: [Keras Metrics](https://machinelearningmastery.com/custom-metrics-deep-learning-keras-python/)
  