<div class="table-of-contents" style="background-color:#000000; padding: 20px; margin: 10px; font-size: 110%; border-radius: 25px; box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);">
  <h1 style="color:#00F1FF;">TOC</h1>
  <ol>
    <li><a href="#1" style="color: #00F1FF;">1. Overview</a></li>
     <li><a href="#2" style="color: #00F1FF;">2. Imports</a></li>
    <li><a href="#3" style="color: #00F1FF;">3. Data Analysis</a></li>
    <li><a href="#4" style="color: #00F1FF;">4. Data Preprocessing</a></li>
    <li><a href="#5" style="color: #00F1FF;">5. Model Implementation Helper Functions </a></li>
    <li><a href="#6" style="color: #00F1FF;">6. Model Implementation</a></li>
    <li><a href="#7" style="color: #00F1FF;">7. Evaluation</a></li>
    <li><a href="#8" style="color: #00F1FF;">8. Thank You</a></li>  
  </ol>
</div>

<a id="1"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #00F1FF;'>Overview</center></h1>

# Overview
  

    
**Neural networks are often regarded as opaque, or "black boxes", due to their complex and abstract nature. However, many individuals, including myself, desire a deeper understanding of how they operate. In this notebook we will delve into the implementation and optimization of neural networks from scratch.**

**Our aim is to explore the underlying principles and mechanisms of neural networks, and develop a practical understanding of their inner workings. We will begin by implementing a basic neural network, utilizing fundamental concepts and techniques.**

**With this approach, we can gain a more profound understanding of neural networks, and develop the ability to tailor them to specific tasks and objectives. So, let us begin this journey of discovery and learning.**

<a id="2"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #00F1FF;'>Imports</center></h1>

# Imports
  

In [None]:
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

<a id="3"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #00F1FF;'>Data Analysis</center></h1>

# Data Analysis
  

In [None]:
df = pd.read_csv('/kaggle/input/breast-cancer-dataset/breast-cancer.csv')
df.head()

In [None]:
px.histogram(data_frame=df, x='diagnosis', color='diagnosis',color_discrete_sequence=['#05445E','#75E6DA'])


In [None]:
px.histogram(data_frame=df,x='area_mean',color='diagnosis',color_discrete_sequence=['#05445E','#75E6DA'])

In [None]:
px.histogram(data_frame=df,x='radius_mean',color='diagnosis',color_discrete_sequence=['#05445E','#75E6DA'])

In [None]:
px.histogram(data_frame=df,x='perimeter_mean',color='diagnosis',color_discrete_sequence=['#05445E','#75E6DA'])

In [None]:
px.histogram(data_frame=df,x='smoothness_mean',color='diagnosis',color_discrete_sequence=['#05445E','#75E6DA'])

In [None]:
px.histogram(data_frame=df,x='texture_mean',color='diagnosis',color_discrete_sequence=['#05445E','#75E6DA'])

In [None]:
px.scatter(data_frame=df,x='symmetry_worst',color='diagnosis',color_discrete_sequence=['#05445E','#75E6DA'])


In [None]:
px.scatter(data_frame=df,x='concavity_worst',color='diagnosis',color_discrete_sequence=['#05445E','#75E6DA'])


In [None]:
px.scatter(data_frame=df,x='fractal_dimension_worst',color='diagnosis',color_discrete_sequence=['#05445E','#75E6DA'])


<a id="4"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #00F1FF;'>Data Preprocessing</center></h1>

# Data Preprocessing
  

In [None]:
df = pd.read_csv('/kaggle/input/breast-cancer-dataset/breast-cancer.csv')
                 
df.head()

In [None]:
df.drop('id', axis=1, inplace=True) #drop redundant columns

In [None]:
df.describe().T

## Encode the target

In [None]:
df['diagnosis'] = (df['diagnosis'] == 'M').astype(int) #encode the label into 1/0

## Get highly correlated features

In [None]:
corr = df.corr()
plt.figure(figsize=(20,20))
sns.heatmap(corr, cmap='mako_r',annot=True)
plt.show()

In [None]:
# Get the absolute value of the correlation
cor_target = abs(corr["diagnosis"])

# Select highly correlated features (threshold = 0.2)
relevant_features = cor_target[cor_target>0.2]

# Collect the names of the features (Series.iteritems was removed in pandas 2.0)
names = relevant_features.index.tolist()

# Drop the target variable from the results
names.remove('diagnosis')

# Display the results
print(names)
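
**To make the selection step concrete, here is a minimal sketch of the same absolute-correlation filter applied to a tiny made-up DataFrame. The column names `signal` and `noise` and all values are hypothetical and chosen only for illustration: `signal` follows the target closely, while `noise` is uncorrelated with it.**

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the breast cancer dataset)
target = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy = pd.DataFrame({
    # 'signal' is roughly 2 * target, so it correlates strongly with the label
    "signal": target * 2.0 + np.array([0.1, -0.1, 0.05, 0.0, 0.1, -0.1, 0.0, 0.05]),
    # 'noise' alternates sign independently of the label
    "noise": np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]),
    "diagnosis": target,
})

# Absolute correlation of every column with the target
cor_target = toy.corr()["diagnosis"].abs()

# Keep columns above the threshold, then drop the target itself
selected = cor_target[cor_target > 0.2].index.drop("diagnosis").tolist()
print(selected)
```

**Only the `signal` column passes the 0.2 threshold here, which mirrors how weakly correlated features are filtered out of `names`.**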

## Assign data and labels

In [None]:
X = df[names].values
y = df['diagnosis'].values

In [None]:
def train_test_split(X, y, random_state=41, test_size=0.2):
    """
    Splits the data into training and testing sets.

    Parameters:
        X (numpy.ndarray): Features array of shape (n_samples, n_features).
        y (numpy.ndarray): Target array of shape (n_samples,).
        random_state (int): Seed for the random number generator. Default is 41.
        test_size (float): Proportion of samples to include in the test set. Default is 0.2.

    Returns:
        Tuple[numpy.ndarray]: A tuple containing X_train, X_test, y_train, y_test.
    """
    # Get number of samples
    n_samples = X.shape[0]

    # Set the seed for the random number generator
    np.random.seed(random_state)

    # Shuffle the indices
    shuffled_indices = np.random.permutation(np.arange(n_samples))

    # Determine the size of the test set
    test_size = int(n_samples * test_size)

    # Split the indices into test and train
    test_indices = shuffled_indices[:test_size]
    train_indices = shuffled_indices[test_size:]

    # Split the features and target arrays into test and train
    X_train, X_test = X[train_indices], X[test_indices]
    y_train, y_test = y[train_indices], y[test_indices]

    return X_train, X_test, y_train, y_test

In [None]:
def scale(X):
    """
    Standardizes the data in the array X.

    Parameters:
        X (numpy.ndarray): Features array of shape (n_samples, n_features).

    Returns:
        numpy.ndarray: The standardized features array.
    """
    # Calculate the mean and standard deviation of each feature
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)

    # Standardize the data
    X = (X - mean) / std
    return X


In [None]:
X = scale(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=42) # split the data into training and test sets


<a id="5"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #00F1FF;'>Model Implementation Helper Functions</center></h1>

# Model Implementation Helper Functions
  

## Activation Functions


**The Rectified Linear Unit (ReLU) is a simple, yet highly effective activation function commonly used in Neural Networks. It is defined as:**

**\begin{equation}
f(Z) = max(0, Z)
\end{equation}**

**where $Z$ is the input to the function.**

**ReLU sets all negative values of $Z$ to zero, and leaves the positive values unchanged. This non-linear activation function helps Neural Networks model complex non-linear relationships between inputs and outputs, allowing them to learn more complex representations of the data.**

**In addition to its effectiveness in Neural Networks, ReLU is also computationally efficient and easy to implement.**

In [None]:
def relu(Z):
    """
    Implement the ReLU function.

    Arguments:
    Z -- Output of the linear layer

    Returns:
    A -- Post-activation parameter
    cache -- used for backpropagation
    """
    A = np.maximum(0,Z)
    cache = Z 
    return A, cache

In [None]:
z = np.linspace(-12, 12, 200)
fig = px.line(x=z, y=relu(z)[0],title='ReLU Function',template="plotly_dark")
fig.update_layout(
    title_font_color="#00F1FF", 
    xaxis=dict(color="#00F1FF"), 
    yaxis=dict(color="#00F1FF") 
)
fig.show()

## The derivative of the ReLU function can be computed as:

**\begin{equation}
f'(Z) = \begin{cases}
0, & \text{if } Z \leq 0 \\
1, & \text{if } Z > 0
\end{cases}
\end{equation}**

In [None]:
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single ReLU unit.

    Arguments:
    dA -- post-activation gradient
    cache -- 'Z'  stored for backpropagation

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    dZ = np.array(dA, copy=True) 
    # When z <= 0, dz is equal to 0 as well. 
    dZ[Z <= 0] = 0
    
    return dZ
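
**As a quick sanity check, the ReLU derivative can be compared against a central finite difference. This is a minimal sketch: `relu_grad` is a hypothetical helper introduced here for illustration, not part of the model code, and the test points deliberately avoid $Z = 0$, where ReLU has a kink.**

```python
import numpy as np

def relu_grad(Z):
    """Derivative of ReLU: 1 where Z > 0, otherwise 0 (illustrative helper)."""
    return (Z > 0).astype(float)

# Test points away from the kink at Z = 0
Z = np.array([-2.0, -0.5, 0.5, 3.0])
eps = 1e-6

# Central finite-difference approximation of d/dZ max(0, Z)
numeric = (np.maximum(0, Z + eps) - np.maximum(0, Z - eps)) / (2 * eps)
analytic = relu_grad(Z)

print(analytic)
print(np.allclose(numeric, analytic))
```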

### Sigmoid

**The Sigmoid function is a common activation function used in Neural Networks, particularly for binary classification problems. It is represented by the following formula:**

**\begin{equation}
f(Z) = \frac{1}{1+e^{-Z}}
\end{equation}**

**where $Z$ is the input to the function.**

**The Sigmoid function maps any real-valued number to a value between 0 and 1, which can be interpreted as a probability. In binary classification problems, we often use the Sigmoid function as the activation function for the output layer of the Neural Network, since it can be used to compute the probability of the input belonging to the positive class.**





In [None]:
def sigmoid(Z):
    """
    Implement the Sigmoid function.

    Arguments:
    Z -- Output of the linear layer

    Returns:
    A -- Post-activation parameter
    cache -- 'Z', stored for backpropagation
    """
    A = 1/(1+np.exp(-Z))
    cache = Z
    return A, cache

In [None]:
z = np.linspace(-12, 12, 200)
fig = px.line(x=z, y=sigmoid(z)[0],title='Sigmoid Function',template="plotly_dark")
fig.update_layout(
    title_font_color="#00F1FF", 
    xaxis=dict(color="#00F1FF"), 
    yaxis=dict(color="#00F1FF") 
)
fig.show()

## The derivative of the Sigmoid function can be computed as:

\begin{equation}
f'(Z) = f(Z)(1-f(Z))
\end{equation}

In [None]:
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single sigmoid unit.

    Arguments:
    dA -- post-activation gradient
    cache -- 'Z' stored during forward pass

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    return dZ
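
**The identity $f'(Z) = f(Z)(1-f(Z))$ can be checked numerically in the same way. The sketch below is an assumption-labelled illustration: `sigmoid_value` is a hypothetical standalone helper that returns only the activation, unlike `sigmoid` above, which also returns a cache.**

```python
import numpy as np

def sigmoid_value(Z):
    """Plain sigmoid returning only the activation (illustrative helper)."""
    return 1.0 / (1.0 + np.exp(-Z))

Z = np.linspace(-4, 4, 9)   # -4, -3, ..., 4
eps = 1e-6

# Central finite difference versus the closed-form derivative
numeric = (sigmoid_value(Z + eps) - sigmoid_value(Z - eps)) / (2 * eps)
analytic = sigmoid_value(Z) * (1 - sigmoid_value(Z))

max_error = np.max(np.abs(numeric - analytic))
print(max_error)
```

**The derivative peaks at $Z = 0$, where it equals 0.25, and shrinks toward zero for large $|Z|$.**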

<a id="6"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #00F1FF;'>Model Implementation</center></h1>

# Model Implementation
  

**As always, we'll use a [vectorized implementation](https://www.kaggle.com/code/fareselmenshawii/vectorization/edit/run/109247974).**

# How the Algorithm works

**Initialize Parameters: We start by initializing the weights and biases of the model. For each layer $l$ in the network, we initialize $W^{[l]}$ to be a matrix with dimensions $(n^{[l]}, n^{[l-1]})$, where $n^{[l]}$ is the number of units in layer $l$ and $n^{[l-1]}$ is the number of units in the previous layer. We also initialize $b^{[l]}$ to be a vector with dimensions $(n^{[l]}, 1)$.**
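
**The shapes described above can be sketched as follows. This is a simplified illustration, not the class implementation: it stores the parameters in a plain dictionary called `params` (a name chosen here) and assumes an example architecture of 25 inputs, two hidden layers of 16 units, and one output unit.**

```python
import numpy as np

layer_dimensions = [25, 16, 16, 1]   # assumed example architecture
rng = np.random.default_rng(3)

params = {}
for l in range(1, len(layer_dimensions)):
    # W[l] has shape (n[l], n[l-1]); small random values break symmetry
    params[f"W{l}"] = rng.standard_normal(
        (layer_dimensions[l], layer_dimensions[l - 1])
    ) * 0.01
    # b[l] has shape (n[l], 1) and can start at zero
    params[f"b{l}"] = np.zeros((layer_dimensions[l], 1))

for name, value in params.items():
    print(name, value.shape)
```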

**Forward Propagation: In the forward pass, we propagate through the network calculating the output of every layer. For each layer $l$ in the network, we calculate:**

**\begin{equation}
Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}
\end{equation}**

**\begin{equation}
A^{[l]} = g(Z^{[l]})
\end{equation}**

**where $A^{[0]} = X$ is the input to the network and $g(.)$ is the activation function used in layer $l$**.
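
**A minimal forward pass through a two-layer network might look like the sketch below. The sizes (3 features, 4 hidden units, 5 examples) and the random inputs are assumptions made only to show how the shapes flow, with examples stored as columns, ReLU in the hidden layer, and sigmoid at the output.**

```python
import numpy as np

rng = np.random.default_rng(0)

# A[0] = X with shape (n_features, n_examples)
X = rng.standard_normal((3, 5))

W1, b1 = rng.standard_normal((4, 3)) * 0.01, np.zeros((4, 1))
W2, b2 = rng.standard_normal((1, 4)) * 0.01, np.zeros((1, 1))

# Hidden layer: linear step followed by ReLU
Z1 = W1 @ X + b1          # b1 broadcasts across the 5 example columns
A1 = np.maximum(0, Z1)

# Output layer: linear step followed by sigmoid
Z2 = W2 @ A1 + b2
A2 = 1 / (1 + np.exp(-Z2))

print(A1.shape, A2.shape)
```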

**Compute Cost: We calculate the cost function to determine how well we are doing. For binary classification problems, we often use the binary cross-entropy to measure our network performance. The cost function can be computed as:**

**\begin{equation}
J = -\frac{1}{m} \sum_{i=1}^{m} \left[y^{(i)} \log\left(a^{[L](i)}\right) + (1-y^{(i)})\log\left(1- a^{[L](i)}\right)\right]
\end{equation}**

**where $a^{[L](i)}$ is the predicted output of the network for the $i$-th input example, $y^{(i)}$ is the true label for the $i$-th input example, and $m$ is the number of examples.**
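
**To get a feel for the cost, here is a small worked example. `binary_cross_entropy` is a hypothetical standalone helper written for this illustration, and the labels and predictions are made up. Confident correct predictions give a low cost, while predicting 0.5 for everything gives a cost of about $\log 2 \approx 0.693$.**

```python
import numpy as np

def binary_cross_entropy(a, y, eps=1e-9):
    """Mean binary cross-entropy; eps guards against log(0) (illustrative helper)."""
    a = np.asarray(a, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(a + eps) + (1 - y) * np.log(1 - a + eps)))

y = np.array([1, 0, 1, 0])

confident = np.array([0.9, 0.1, 0.9, 0.1])   # correct and confident
uncertain = np.array([0.5, 0.5, 0.5, 0.5])   # no information

cost_confident = binary_cross_entropy(confident, y)   # about 0.105
cost_uncertain = binary_cross_entropy(uncertain, y)   # about 0.693
print(cost_confident, cost_uncertain)
```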

**Backpropagation: In the backward pass, we compute the derivatives of the loss function with respect to the parameters of the network using the chain rule of differentiation. Specifically, we calculate the derivatives of the cost function with respect to $Z^{[l]}$, which can then be used to calculate the derivatives of the cost function with respect to $W^{[l]}$ and $b^{[l]}$.**
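
**The chain-rule gradients can be verified numerically on a single sigmoid layer. The sketch below is an illustration under stated assumptions: the data are random, `cost_fn` is a hypothetical helper, and it uses the simplification that for a sigmoid output with cross-entropy cost the gradient with respect to $Z$ reduces to $A - Y$. The analytic $dW$ is then compared with a finite-difference estimate.**

```python
import numpy as np

def cost_fn(W, b, X, Y):
    """Cross-entropy cost of a single sigmoid layer (illustrative helper)."""
    A = 1 / (1 + np.exp(-(W @ X + b)))
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 6))          # 3 features, 6 examples
Y = np.array([[1, 0, 1, 0, 1, 0]])
W = rng.standard_normal((1, 3)) * 0.1
b = np.zeros((1, 1))
m = X.shape[1]

# Analytic gradients from the chain rule
A = 1 / (1 + np.exp(-(W @ X + b)))
dZ = A - Y
dW = (dZ @ X.T) / m
db = np.sum(dZ, axis=1, keepdims=True) / m

# Finite-difference estimate of dW, one weight at a time
eps = 1e-6
dW_numeric = np.zeros_like(W)
for j in range(W.shape[1]):
    step = np.zeros_like(W)
    step[0, j] = eps
    dW_numeric[0, j] = (cost_fn(W + step, b, X, Y) - cost_fn(W - step, b, X, Y)) / (2 * eps)

max_diff = np.max(np.abs(dW - dW_numeric))
print(max_diff)
```

**A very small `max_diff` indicates that the analytic gradient matches the numerical one, which is a useful check before trusting a hand-written backward pass.**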

**Updating Parameters: We update the parameters of the network using gradient descent, which tries to reduce the cost by adjusting the parameters in the opposite direction of the gradient. The gradient descent rule is, for each layer $l$ in the network:**

**\begin{equation}
W^{[l]} = W^{[l]} - \alpha \text{ } dW^{[l]}
\end{equation}**

**\begin{equation}
b^{[l]} = b^{[l]} - \alpha \text{ } db^{[l]}
\end{equation}**

**where $\alpha$ is the learning rate and $dW^{[l]}$ and $db^{[l]}$ are the derivatives of the cost function with respect to $W^{[l]}$ and $b^{[l]}$, respectively.**
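
**The update rule can be seen in action on a one-parameter toy problem. This sketch assumes a made-up cost $J(w) = (w - 3)^2$, whose gradient is $2(w - 3)$, and a learning rate of 0.1. Repeatedly stepping against the gradient moves $w$ toward the minimizer $w = 3$.**

```python
def gradient(w):
    """Derivative of the toy cost J(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

alpha = 0.1   # learning rate (assumed value for this toy problem)
w = 0.0       # initial parameter

for _ in range(100):
    # Same form as the rule above: parameter minus learning rate times gradient
    w = w - alpha * gradient(w)

print(w)
```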


In [None]:
class NeuralNetwork:
    def __init__(self, layer_dimensions=[25,16,16,1],learning_rate=0.00001):
        """
        Parameters
        ----------

        layer_dimensions : list
            python array (list) containing the dimensions of each layer in our network
                
        learning_rate :  float
            learning rate of the network.

        """
        self.layer_dimensions = layer_dimensions
        self.learning_rate = learning_rate
        
        
    def initialize_parameters(self):
        """initializes the parameters"""
        np.random.seed(3)
        self.n_layers =  len(self.layer_dimensions)
        for l in range(1, self.n_layers):
            vars(self)[f'W{l}'] = np.random.randn(self.layer_dimensions[l], self.layer_dimensions[l-1]) * 0.01
            vars(self)[f'b{l}'] = np.zeros((self.layer_dimensions[l], 1))

    
    def _linear_forward(self, A, W, b):
        """
        Implements the linear part of a layer's forward propagation.

        Arguments:
        A -- activations from previous layer (size of previous layer, number of examples)
        W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
        b -- bias vector, numpy array of shape (size of the current layer, 1)

        Returns:
        Z -- pre-activation parameter 
        cache -- a python tuple containing "A", "W" and "b"  for backpropagation
        """
        # Compute Z
        Z = np.dot(W,A) + b
        # Cache  A, W , b for backpropagation
        cache = (A, W, b)
        return Z, cache
    
    def _forward_propagation(self,A_prev ,W ,b , activation):
        """
        Implements the forward propagation for a network layer

        Arguments:
        A_prev -- activations from previous layer, shape : (size of previous layer, number of examples)
        W -- shape : (size of current layer, size of previous layer)
        b -- shape : (size of the current layer, 1)
        activation -- the activation to be used in this layer

        Returns:
        A -- the output of the activation function 
        cache -- a python tuple containing "linear_cache" and "activation_cache" for backpropagation
        """
        
        # Compute Z with the linear helper, then apply the requested activation function
        Z, linear_cache = self._linear_forward(A_prev, W, b)
        if activation == "sigmoid":
            A, activation_cache = sigmoid(Z)
        elif activation == "relu":
            A, activation_cache = relu(Z)
        else:
            raise ValueError(f"Unsupported activation: {activation}")
        # Store the cache for backpropagation
        cache = (linear_cache, activation_cache)
        return A, cache
    
    
    def forward_propagation(self, X):
        """
        Implements forward propagation for the whole network

        Arguments:
        X --  shape : (input size, number of examples)

        Returns:
        AL -- last post-activation value
        caches -- list of cache returned by _forward_propagation helper function
        """
        # Initialize empty list to store caches
        caches = []
        # Set initial A to X 
        A = X
        L =  self.n_layers -1
        for l in range(1, L):
            A_prev = A 
            # Forward propagate through the network except the last layer
            A, cache = self._forward_propagation(A_prev, vars(self)['W' + str(l)], vars(self)['b' + str(l)], "relu")
            caches.append(cache)
        # Forward propagate through the output layer and get the predictions
        predictions, cache = self._forward_propagation(A, vars(self)['W' + str(L)], vars(self)['b' + str(L)], "sigmoid")
        # Append the cache to caches list recall that cache will be (linear_cache, activation_cache)
        caches.append(cache)

        return predictions, caches
    
    def compute_cost(self, predictions, y):
        """
        Implements the cost function 

        Arguments:
        predictions -- The model predictions, shape : (1, number of examples)
        y -- The true values, shape : (1, number of examples)

        Returns:
        cost -- cross-entropy cost
        """
        # Get number of training examples (taken from predictions, so y may be 1-D or of shape (1, m))
        m = predictions.shape[1]
        # Compute cost we're adding small epsilon for numeric stability
        cost = (-1/m) * (np.dot(y, np.log(predictions+1e-9).T) + np.dot((1-y), np.log(1-predictions+1e-9).T))
        # squeeze the cost to set it into the correct shape 
        cost = np.squeeze(cost)
        return cost   
        
    def _linear_backward(self, dZ, cache):
        """
        Implements the linear portion of backward propagation 

        Arguments:
        dZ -- Gradient of the cost with respect to the linear output of the current layer 
        cache -- tuple of values (A_prev, W, b) coming from the forward propagation in the current layer

        Returns:
        dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
        dW -- Gradient of the cost with respect to W (current layer l), same shape as W
        db -- Gradient of the cost with respect to b (current layer l), same shape as b
        """
        # Get the cache from forward propagation
        A_prev, W, b = cache
        # Get number of training examples
        m = A_prev.shape[1]
        # Compute gradients for W, b and A
        dW = (1/m) * np.dot(dZ, A_prev.T)
        db = (1/m) * np.sum(dZ, axis=1, keepdims=True)
        dA_prev = np.dot(W.T,dZ)
        return dA_prev, dW, db
    
            
    def _back_propagation(self, dA, cache, activation):
        """
        Implements the backward propagation for a single layer.

        Arguments:
        dA -- post-activation gradient for current layer l 
        cache -- tuple of values (linear_cache, activation_cache) 
        activation -- the activation to be used in this layer

        Returns:
        dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
        dW -- Gradient of the cost with respect to W (current layer l), same shape as W
        db -- Gradient of the cost with respect to b (current layer l), same shape as b
        """
        # unpack the linear and activation caches stored during forward propagation
        linear_cache, activation_cache = cache
        # compute gradients for Z depending on the activation function
        if activation == "relu":
            dZ = relu_backward(dA, activation_cache)

        elif activation == "sigmoid":
            dZ = sigmoid_backward(dA, activation_cache)
        # Compute gradients for W, b and A 
        dA_prev, dW, db = self._linear_backward(dZ, linear_cache)
        return dA_prev, dW, db

    def back_propagation(self, predictions, Y, caches):
        """
        Implements the backward propagation for the NeuralNetwork

        Arguments:
        Prediction --  output of the forward propagation 
        Y -- true label
        caches -- list of caches
        """
        L =  self.n_layers - 1
        # Get number of examples
        m = predictions.shape[1]
        Y = Y.reshape(predictions.shape) 
        # Initializing the backpropagation we're adding a small epsilon for numeric stability 
        dAL = - (np.divide(Y, predictions+1e-9) - np.divide(1 - Y, 1 - predictions+1e-9))
        current_cache = caches[L-1] # Last Layer
        # Compute gradients of the predictions
        vars(self)[f'dA{L-1}'], vars(self)[f'dW{L}'], vars(self)[f'db{L}'] = self._back_propagation(dAL, current_cache, "sigmoid")
        for l in reversed(range(L-1)):
            # update the cache
            current_cache = caches[l]
            # compute gradients of the network layers 
            vars(self)[f'dA{l}'] , vars(self)[f'dW{l+1}'], vars(self)[f'db{l+1}'] = self._back_propagation(vars(self)[f'dA{l + 1}'], current_cache, activation = "relu")
            


    def update_parameters(self):
            """
            Updates parameters using gradient descent
            """
            L = self.n_layers - 1
            # Loop over parameters and update them using computed gradients
            for l in range(L):
                vars(self)[f'W{l+1}'] = vars(self)[f'W{l+1}'] - self.learning_rate * vars(self)[f'dW{l+1}']
                vars(self)[f'b{l+1}']  = vars(self)[f'b{l+1}'] - self.learning_rate * vars(self)[f'db{l+1}']
                

    def fit(self,X, Y, epochs=2000, print_cost=True):
            """
            Trains the Neural Network using input data
            
            Arguments:
            X -- input data
            Y -- true "label" 
            epochs -- number of iterations of the optimization loop
            print_cost -- If set to True, this will print the cost every 5000 iterations 
            """
            # Transpose X to get the correct shape
            X = X.T
            np.random.seed(1)
            #create empty array to store the costs
            costs = [] 
            # Get number of training examples
            m = X.shape[1]                           
            # Initialize parameters 
            self.initialize_parameters()
            # loop for stated number of epochs
            for i in range(0, epochs):
                # Forward propagate and get the predictions and caches
                predictions, caches = self.forward_propagation(X)
                #compute the cost function
                cost = self.compute_cost(predictions, Y)
                # Calculate the gradient and update the parameters
                self.back_propagation(predictions, Y, caches)

                self.update_parameters()


                # Print and record the cost every 5000 iterations
                if print_cost and i % 5000 == 0:
                    print("Cost after iteration {}: {}".format(i, np.squeeze(cost)))
                    costs.append(cost)
            if print_cost:         
            # Plot the cost over training    
                fig = px.line(y=np.squeeze(costs),title='Cost',template="plotly_dark")
                fig.update_layout(
                    title_font_color="#00F1FF", 
                    xaxis=dict(color="#00F1FF"), 
                    yaxis=dict(color="#00F1FF") 
                )
                fig.show()


    def predict(self,X,y):
        """
        uses the trained model to predict given X value

        Arguments:
        X -- data set of examples you would like to label
        y -- True values of examples; used for measuring the model's accuracy
        Returns:
        accuracy -- fraction of examples in X whose prediction matches y
        predictions -- predictions for the given dataset X
        """
        X = X.T
        # Get predictions from forward propagation
        predictions, _ = self.forward_propagation(X)
        # Predictions Above 0.5 are True otherwise they are False
        predictions = (predictions > 0.5)
        # Squeeze the predictions into the correct shape and cast true/false values to 1/0
        predictions = np.squeeze(predictions.astype(int))
        # Return the accuracy (fraction of correct predictions) together with the predictions
        return np.sum((predictions == y)/X.shape[1]), predictions.T

<a id="7"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #00F1FF;'>Evaluation</center></h1>

# Evaluation
  

## Create evaluation function

In [None]:
def train_evaluate_model(X_train, y_train, X_test, y_test, learning_rate, layer_dimensions, epochs):
    '''
    Keyword arguments:
    X_train -- training data
    y_train -- training labels
    X_test -- test data
    y_test -- test labels
    learning_rate -- learning rate of the network
    layer_dimensions -- python array (list) containing the dimensions of each layer in our network
    epochs -- number of iterations of the optimization loop

    Returns a one-row dataframe with the hyperparameters and the test accuracy
    '''
    # create model instance with the given hyperparameters
    model = NeuralNetwork(learning_rate=learning_rate,layer_dimensions=layer_dimensions)
    # fit the model
    model.fit(X_train, y_train,epochs=epochs,print_cost=False)
    accuracy, predictions = model.predict(X_test, y_test) # calculate accuracy and predictions
    
    #create a dataframe to visualize the results
    eval_df = pd.DataFrame([[learning_rate, layer_dimensions, epochs, accuracy]], columns=['Learning_Rate', 'Layers', 'Epochs', 'Accuracy'])
    return eval_df

In [None]:
learning_rate = 0.001
layers = [25,1,1]
epochs = 3000
results = train_evaluate_model(X_train, y_train, X_test, y_test, learning_rate=learning_rate, layer_dimensions=layers, epochs=epochs)

In [None]:
results.index = ['Model_1']

In [None]:
results.style.background_gradient(cmap =sns.cubehelix_palette(start=.5, rot=-.5, as_cmap=True))

In [None]:
learning_rate = 0.001
layers = [25,16,1]
epochs = 3000
temp_df = train_evaluate_model(X_train, y_train, X_test, y_test, learning_rate=learning_rate, layer_dimensions=layers, epochs=epochs)
temp_df.index = ['Model_2']
results = pd.concat([results, temp_df])

In [None]:
results.style.background_gradient(cmap =sns.cubehelix_palette(start=.5, rot=-.5, as_cmap=True))

In [None]:
learning_rate = 0.0001
layers = [25,16,1]
epochs = 3000
temp_df = train_evaluate_model(X_train, y_train, X_test, y_test, learning_rate=learning_rate, layer_dimensions=layers, epochs=epochs)
temp_df.index = ['Model_3']
results = pd.concat([results, temp_df])

In [None]:
results.style.background_gradient(cmap =sns.cubehelix_palette(start=.5, rot=-.5, as_cmap=True))

In [None]:
learning_rate = 0.0001
layers = [25,16,1]
epochs = 30000
temp_df = train_evaluate_model(X_train, y_train, X_test, y_test, learning_rate=learning_rate, layer_dimensions=layers, epochs=epochs)
temp_df.index = ['Model_4']
results = pd.concat([results, temp_df])

In [None]:
results.style.background_gradient(cmap =sns.cubehelix_palette(start=.5, rot=-.5, as_cmap=True))

In [None]:
learning_rate = 0.0001
layers = [25,16,16,1]
epochs = 30000
temp_df = train_evaluate_model(X_train, y_train, X_test, y_test, learning_rate=learning_rate, layer_dimensions=layers, epochs=epochs)
temp_df.index = ['Model_5']
results = pd.concat([results, temp_df])

In [None]:
results.style.background_gradient(cmap =sns.cubehelix_palette(start=.5, rot=-.5, as_cmap=True))

In [None]:
learning_rate = 0.0001
layers = [25,16,16,16,1]
epochs = 30000
temp_df = train_evaluate_model(X_train, y_train, X_test, y_test, learning_rate=learning_rate, layer_dimensions=layers, epochs=epochs)
temp_df.index = ['Model_6']
results = pd.concat([results, temp_df])

In [None]:
results.style.background_gradient(cmap =sns.cubehelix_palette(start=.5, rot=-.5, as_cmap=True))

In [None]:
learning_rate = 0.0001
layers = [25,32,32,1]
epochs = 30000
temp_df = train_evaluate_model(X_train, y_train, X_test, y_test, learning_rate=learning_rate, layer_dimensions=layers, epochs=epochs)
temp_df.index = ['Model_7']
results = pd.concat([results, temp_df])

In [None]:
results.style.background_gradient(cmap =sns.cubehelix_palette(start=.5, rot=-.5, as_cmap=True))

In [None]:
learning_rate = 0.0001
layers = [25,128,128,1]
epochs = 30000
temp_df = train_evaluate_model(X_train, y_train, X_test, y_test, learning_rate=learning_rate, layer_dimensions=layers, epochs=epochs)
temp_df.index = ['Model_8']
results = pd.concat([results, temp_df])

In [None]:
results.style.background_gradient(cmap =sns.cubehelix_palette(start=.5, rot=-.5, as_cmap=True))

## View training process for the best results "Model 5 with lowest runtime and highest accuracy"

In [None]:
model = NeuralNetwork(learning_rate=0.0001)

model.fit(X_train, y_train,epochs=30000,print_cost=True)


In [None]:
accuracy,predictions = model.predict(X_test, y_test)

<a id="8"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #00F1FF;'>Thank You</center></h1>

  
# Thank you
**Thank you for going through this notebook**

**If you have any suggestions please let me know**

<div style="padding:10px; 
            color:#333333;
            margin:10px;
            font-size:150%;
            display:fill;
            border-radius:1px;
            border-style:solid;
            border-color:#666666;
            background-color:#F9F9F9;
            overflow:hidden;">
    <center>
        <a id='top'></a>
        <b>Machine Learning From Scratch Series</b>
    </center>
    <br>
    <ul>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/linear-regression-from-scratch" style="color:#0072B2">1 - Linear Regression</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/logistic-regression-from-scratch" style="color:#0072B2">2 -  Logistic Regression</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/kmeans-from-scratch" style="color:#0072B2">3 - KMeans</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/decision-tree-classifier-from-scratch" style="color:#0072B2">4 - Decision Trees</a>
        </li> 
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/random-forest-classifier-from-scratch" style="color:#0072B2">5 -  Random Forest</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/knn-from-scratch" style="color:#0072B2">6 - KNearestNeighbor</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/pca-from-scratch?scriptVersionId=121402593" style="color:#0072B2">7 - PCA</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/svm-from-scratch" style="color:#0072B2">8 - SVM</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/naive-bayes-from-scratch" style="color:#0072B2">9 - Naive Bayes</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/optimized-neural-network-from-scratch" style="color:#0072B2">10 - Optimized Neural Network</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/neural-network-from-scratch" style="color:#0072B2">11 - Neural Network</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/cnn-from-scratch" style="color:#0072B2">12 - CNN</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/rnn-from-scratch" style="color:#0072B2">13 - RNN</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/lstm-from-scratch" style="color:#0072B2">14 - LSTM</a>
        </li>
        <li>
            <a href="https://www.kaggle.com/code/fareselmenshawii/gru-from-scratch" style="color:#0072B2">15 - GRU</a>
        </li>
    </ul>
</div>