<img align="left" src="https://lever-client-logos.s3.amazonaws.com/864372b1-534c-480e-acd5-9711f850815c-1524247202159.png" width=200>
<br></br>
<br></br>

## *Data Science Unit 4 Sprint 2*

# Sprint Challenge - Neural Network Foundations

Table of Problems

1. [Defining Neural Networks](#Q1)
2. [Perceptron on AND Gates](#Q2)
3. [Multilayer Perceptron](#Q3)
4. [Keras MLP](#Q4)

<a id="Q1"></a>
## 1. Define the following terms:

- **Neuron:**

_An artificial neuron is a mathematical function conceived as a model of a biological neuron. Artificial neurons are the elementary units of an artificial neural network. The neuron receives one or more inputs (representing excitatory and inhibitory postsynaptic potentials at neural dendrites) and sums them to produce an output (or activation, representing the neuron's action potential, which is transmitted along its axon). Usually each input is separately weighted, and the sum is passed through a non-linear function known as an activation function or transfer function. Transfer functions usually have a sigmoid shape, but they may also take the form of other non-linear functions, piecewise linear functions, or step functions. They are also often monotonically increasing, continuous, differentiable and bounded. The thresholding function has inspired logic gates known as threshold logic, which can be used to build logic circuits resembling brain processing. For example, new devices such as memristors have been used extensively to develop such logic in recent times._
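
As a minimal sketch of the definition above, the following NumPy snippet computes one neuron's output as a weighted sum passed through a sigmoid. The function name `neuron_output` and the example values are illustrative assumptions, not part of the challenge.

```python
import numpy as np

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    weighted_sum = np.dot(inputs, weights) + bias
    return 1 / (1 + np.exp(-weighted_sum))

# Hypothetical values chosen only for illustration
x = np.array([1.0, 0.0])
w = np.array([0.5, -0.5])
out = neuron_output(x, w, bias=-0.5)
print(out)  # the weighted sum is 0.0 here, so the sigmoid returns 0.5
```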

- **Input Layer:**

_The Input Layer is what receives input from our dataset. Sometimes it is called the visible layer because it's the only part that is exposed to our data and that our data interacts with directly. Typically node maps are drawn with one input node for each of the different inputs/features/columns of our dataset that will be passed to the network._

- **Hidden Layer:**

_Layers between the input layer and the output layer are called Hidden Layers. They are hidden because data reaches them only through the input layer. They sit inside the network and perform their functions, but we don't interact with them directly._

- **Output Layer:**

_The purpose of the output layer is to output a vector of values in a format suitable for the type of problem we're trying to address. For example, a binary classifier typically ends in a single sigmoid unit that outputs a probability._

- **Activation:**

_Activation functions introduce non-linearity into neural networks. Many of them squash their input into a smaller range; for example, the sigmoid activation function maps values into the range 0 to 1. Many activation functions are used in deep learning, and ReLU, SELU and tanh are often preferred over the sigmoid activation function._

<img align="center" src="https://cdn-images-1.medium.com/max/1600/1*p_hyqAtyI8pbt2kEl6siOQ.png" width=800>
<br></br>
<br></br>
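
To make the ranges concrete, here is a small sketch of three common activation functions in NumPy. The sample inputs are arbitrary and chosen only for illustration. Sigmoid stays within (0, 1), tanh within (-1, 1), and ReLU clips negatives to zero but is unbounded above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)

# Arbitrary sample points: one negative, zero, one positive
x = np.array([-2.0, 0.0, 2.0])
for name, fn in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu)]:
    print(name, fn(x))
```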


- **Backpropagation:**

_An iterative, recursive and efficient method for calculating the weight updates that improve the network until it can perform the task for which it is being trained. It is closely related to the Gauss–Newton algorithm._

_Backpropagation requires the derivatives of activation functions to be known at network design time. Automatic differentiation is a technique that can automatically and analytically provide the derivatives to the training algorithm. In the context of learning, backpropagation is commonly used by the gradient descent optimization algorithm to adjust the weight of neurons by calculating the gradient of the loss function; backpropagation computes the gradient(s), whereas (stochastic) gradient descent uses the gradients for training the model (via optimization)._
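
The split between "backpropagation computes the gradients" and "gradient descent uses them" can be sketched on a tiny network with one hidden unit. Everything here (the scalar weights `w1` and `w2`, the squared-error loss, the learning rate) is an assumption chosen for illustration; the finite-difference check is only there to confirm the chain-rule result.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def loss(w1, w2, x, y):
    """Squared error of y_hat = sigmoid(w2 * sigmoid(w1 * x))."""
    return 0.5 * (sigmoid(w2 * sigmoid(w1 * x)) - y) ** 2

def backprop(w1, w2, x, y):
    """Return dL/dw1 and dL/dw2 via the chain rule, output layer first."""
    h = sigmoid(w1 * x)                              # hidden activation
    y_hat = sigmoid(w2 * h)                          # network output
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)    # dL/d(output pre-activation)
    grad_w2 = delta_out * h
    delta_hidden = delta_out * w2 * h * (1 - h)      # error passed back to the hidden unit
    grad_w1 = delta_hidden * x
    return grad_w1, grad_w2

# Hypothetical scalar example
w1, w2, x, y = 0.3, -0.8, 1.5, 1.0
grad_w1, grad_w2 = backprop(w1, w2, x, y)

# Finite-difference check of both gradients
eps = 1e-6
num_w1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
num_w2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
print(grad_w1, num_w1)
print(grad_w2, num_w2)

# Gradient descent then uses the gradients to update the weights
learning_rate = 0.5
new_w1 = w1 - learning_rate * grad_w1
new_w2 = w2 - learning_rate * grad_w2
print(loss(w1, w2, x, y), loss(new_w1, new_w2, x, y))
```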

## 2. Perceptron on AND Gates <a id="Q2"></a>

Create a perceptron class that can model the behavior of an AND gate. You can use the following table as your training data:

| x1 | x2 | x3 | y |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 0 | 1 | 0 |

In [None]:
import numpy as np

np.random.seed(38)


# Inputs
inputs = np.array(([1, 1, 1],
                   [1, 0, 1],
                   [0, 1, 1],
                   [0, 0, 1]), dtype=float)

# Ground Truth
ground_truth = np.array(([1],
                         [0],
                         [0],
                         [0]), dtype=float)

In [None]:
def sigmoid(x):
    return 1 / (1+np.exp(-x))

def sigmoid_derivative(x):
    sx = sigmoid(x)
    return sx * (1-sx)

In [None]:
# Initialize one weight per input column, uniformly in [-1, 1)
weights = 2 * np.random.random((3, 1)) - 1
weights

In [None]:
weighted_sum = np.dot(inputs, weights)

weighted_sum

In [None]:
for iteration in range(10000):

    # Weighted sum of inputs/weights
    weighted_sum = np.dot(inputs, weights)

    # Activate
    activated_output = sigmoid(weighted_sum)

    # Calculate the error
    error = ground_truth - activated_output

    # Adjustments: scale the error by the sigmoid slope at the weighted sum
    adjustments = error * sigmoid_derivative(weighted_sum)

    weights += np.dot(inputs.T, adjustments)

# Report once, after the loop has finished
print("Weights after training")
print(weights)

print("Output after training")
print(activated_output)
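
The prompt asks for a perceptron class. A minimal sketch that wraps the same steps (weighted sum, sigmoid, error-scaled update) might look like the following. The class name, method names and the 0.5 decision threshold are assumptions for illustration, not a required interface.

```python
import numpy as np

class Perceptron:
    """Single sigmoid unit trained with an error-times-slope update rule."""

    def __init__(self, n_inputs, seed=38):
        rng = np.random.RandomState(seed)
        # Weights start uniformly in [-1, 1)
        self.weights = 2 * rng.random_sample((n_inputs, 1)) - 1

    @staticmethod
    def _sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def predict_proba(self, X):
        return self._sigmoid(np.dot(X, self.weights))

    def predict(self, X):
        # Threshold the sigmoid output at 0.5 to get a 0/1 label
        return (self.predict_proba(X) >= 0.5).astype(int)

    def fit(self, X, y, iterations=10000):
        for _ in range(iterations):
            output = self.predict_proba(X)
            error = y - output
            # Sigmoid slope written in terms of the output: s * (1 - s)
            adjustments = error * output * (1 - output)
            self.weights += np.dot(X.T, adjustments)
        return self

# AND-gate table; the third column is the constant bias input
X_and = np.array([[1, 1, 1], [1, 0, 1], [0, 1, 1], [0, 0, 1]], dtype=float)
y_and = np.array([[1], [0], [0], [0]], dtype=float)

and_gate = Perceptron(n_inputs=3).fit(X_and, y_and)
print(and_gate.predict(X_and).ravel())
```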

## 3. Multilayer Perceptron <a id="Q3"></a>

Implement a Neural Network Multilayer Perceptron class that uses backpropagation to update the network's weights.

- Your network must have one hidden layer.
- You do not have to update weights via gradient descent. You can use something like the derivative of the sigmoid function to update weights.
- Train your model on the Heart Disease dataset from UCI:



In [None]:
cd Documents/GitHub/DS-Unit-4-Sprint-2-Neural-Networks/Sprint-Challenge

In [None]:
import pandas as pd
dataset = pd.read_csv('heart.csv')

In [None]:
dataset.head()

In [None]:
X = dataset.values[:,0:13]
print(X.shape)
print(X)

In [None]:
y = dataset.values[:,-1]
print(y.shape)
print(y)

In [None]:
import tensorflow
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
# fix random seed for reproducibility
np.random.seed(38)

In [None]:
model = Sequential()

In [None]:
model.add(Dense(1, input_dim=13, activation="sigmoid"))

In [None]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

In [None]:
model.fit(X, y, epochs=150, batch_size=32)

In [None]:
sum(y) / len(y) # Predicting target is 70%

In [None]:
scores = model.evaluate(X,y)
print(f"{model.metrics_names[1]}: {scores[1]*100}")

In [None]:
model = Sequential()
model.add(Dense(1, input_dim=13, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=150)

In [None]:
# evaluate the model
scores = model.evaluate(X, y)
print(f"{model.metrics_names[1]}: {scores[1]*100}")

In [None]:
model_improved = Sequential()

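# First hidden layer: 10 ReLU units fed by the 13 input features
model_improved.add(Dense(10, input_dim=13, activation='relu'))
# Second hidden layer: 3 sigmoid units
model_improved.add(Dense(3, activation='sigmoid'))
# Output layer: one sigmoid unit for the binary target
model_improved.add(Dense(1, activation='sigmoid'))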

model_improved.compile(loss='binary_crossentropy', 
                       optimizer='adam',
                       metrics=['accuracy'])

model_improved.summary()
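
The cells in this section use Keras, while the prompt at the top of the section describes a hand-written class with one hidden layer and backpropagation. A hedged sketch of such a class is shown here on toy data (an XOR table with a bias column, chosen as an assumption for illustration) rather than on `heart.csv`, whose features would probably need scaling before being fed to sigmoid units. The class name, layer sizes and learning rate are likewise illustrative.

```python
import numpy as np

class NeuralNetwork:
    """One-hidden-layer perceptron with sigmoid units, trained by backpropagation."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=38):
        rng = np.random.RandomState(seed)
        self.weights1 = rng.randn(n_inputs, n_hidden)    # input -> hidden
        self.weights2 = rng.randn(n_hidden, n_outputs)   # hidden -> output

    @staticmethod
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))

    def feed_forward(self, X):
        self.hidden = self.sigmoid(np.dot(X, self.weights1))
        self.output = self.sigmoid(np.dot(self.hidden, self.weights2))
        return self.output

    def backward(self, X, y, learning_rate):
        # Output-layer error scaled by the sigmoid slope s * (1 - s)
        output_delta = (y - self.output) * self.output * (1 - self.output)
        # Share of that error attributed to each hidden unit
        hidden_delta = (np.dot(output_delta, self.weights2.T)
                        * self.hidden * (1 - self.hidden))
        self.weights2 += learning_rate * np.dot(self.hidden.T, output_delta)
        self.weights1 += learning_rate * np.dot(X.T, hidden_delta)

    def train(self, X, y, epochs=5000, learning_rate=0.5):
        losses = []
        for _ in range(epochs):
            self.feed_forward(X)
            losses.append(float(np.mean((y - self.output) ** 2)))
            self.backward(X, y, learning_rate)
        return losses

# Toy data: x1 XOR x2, plus a constant bias column
X_toy = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y_toy = np.array([[0], [1], [1], [0]], dtype=float)

nn = NeuralNetwork(n_inputs=3, n_hidden=4, n_outputs=1)
losses = nn.train(X_toy, y_toy)
print(f"MSE before training: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

To try it on the heart data, the same class could be given the feature matrix and the target reshaped to a column, for example `y.reshape(-1, 1)`, with `n_inputs` set to the number of feature columns.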

## 4. Keras MLP <a id="Q4"></a>

Implement a Multilayer Perceptron architecture of your choosing using the Keras library. Train your model and report its baseline accuracy. Then hyperparameter tune at least two parameters and report your model's accuracy.

- Use the Heart Disease Dataset (binary classification).
- Use an appropriate loss function for a binary classification task.
- Use an appropriate activation function on the final layer of your network.
- Train your model using verbose output for ease of grading.
- Use GridSearchCV to hyperparameter tune your model (for at least two hyperparameters).
- When hyperparameter tuning, show your work by adding code cells for each new experiment.
- Report the accuracy for each combination of hyperparameters as you test them so that we can easily see which resulted in the highest accuracy.
- You must hyperparameter tune at least 5 parameters in order to get a 3 on this section.

In [None]:
import keras
from keras.models import Sequential
from keras.layers import Dense
import pandas as pd
import numpy as np
# fix random seed for reproducibility
np.random.seed(38)

In [None]:
df = pd.read_csv('heart.csv')


df.head()

In [None]:
from sklearn.model_selection import StratifiedKFold

# fix random seed for reproducibility
seed = 38
np.random.seed(seed)


# split into input (X) and output (y) variables
X = df.drop(columns=['target']).values
y = df['target'].values

# define 5-fold cross validation test harness
kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
cvscores = []
for train, test in kfold.split(X, y):
  # create model
  model = Sequential()
  model.add(Dense(12, input_dim=13, activation='relu'))
  model.add(Dense(8, activation='relu'))
  model.add(Dense(1, activation='sigmoid'))
  # Compile model
  model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
  # Fit the model
  model.fit(X[train], y[train], epochs=150, batch_size=10, verbose=0)
  # evaluate the model
  scores = model.evaluate(X[test], y[test], verbose=0)
  print(f'{model.metrics_names[1]}: {(scores[1]*100):.2f}%') 
  cvscores.append(scores[1]*100)
print(f'{np.mean(cvscores):.2f}% +/- {np.std(cvscores):.2f}%')

In [None]:
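from sklearn.model_selection import GridSearchCV
# Note: keras.wrappers.scikit_learn was removed from later Keras/TensorFlow
# releases; the separate scikeras package offers a replacement KerasClassifier.
from keras.wrappers.scikit_learn import KerasClassifier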

# fix random seed for reproducibility
seed = 38
np.random.seed(seed)


# split into input (X) and output (y) variables
X = df.drop(columns=['target']).values
y = df['target'].values

# Function to create model, required for KerasClassifier
def create_model():
    # create model
    model = Sequential()
    model.add(Dense(12, input_dim=13, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    # Compile model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create model
model = KerasClassifier(build_fn=create_model, verbose=1)

# define the grid search parameters
param_grid = {'batch_size': [10, 20, 40, 60, 80, 100],
              'epochs': [20]}

# Create Grid Search
grid = GridSearchCV(estimator=model, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}") 

In [None]:
# define the grid search parameters
param_grid = {'batch_size': [80, 100],
              'epochs': [20, 40, 60]}

# Create Grid Search
grid = GridSearchCV(estimator=model, cv=10, param_grid=param_grid, n_jobs=1)
grid_result = grid.fit(X, y)

# Report Results
print(f"Best: {grid_result.best_score_} using {grid_result.best_params_}")
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print(f"Means: {mean}, Stdev: {stdev} with: {param}")
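
To see how much work a grid like the last one implies, the combinations can be enumerated with the standard library. This sketch only counts and lists the candidates; it does not train anything. The name `example_grid` is a hypothetical copy of the grid above, and `cv_folds` mirrors the `cv=10` argument.

```python
from itertools import product

# Hypothetical copy of the grid used in the cell above
example_grid = {'batch_size': [80, 100],
                'epochs': [20, 40, 60]}
cv_folds = 10

# Every combination of the listed values, in the order the keys appear here
names = list(example_grid)
combinations = [dict(zip(names, values))
                for values in product(*example_grid.values())]

for combo in combinations:
    print(combo)

total_fits = len(combinations) * cv_folds
print(f"{len(combinations)} combinations x {cv_folds} folds = {total_fits} model fits")
```

With `refit=True`, which is the `GridSearchCV` default, one more fit on the full data follows for the best combination.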