# Neural Network
### Anya Kramar - Kubeflow DevX Team

## Introduction

<ul>
    <li>Mathematical models inspired by the human brain</li>
    <li>Learn patterns from data without explicit programming</li>
</ul>

<h3>Why are Neural Networks Important?</h3>
<ul>
    <li>Solve problems traditional algorithms can't handle</li>
    <li>Power everyday technologies we rely on:
        <ul>
            <li>Voice assistants</li>
            <li>Recommendation systems</li>
            <li>Image recognition</li>
            <li>Language translation</li>
            <li>Self‑driving car technologies</li>
        </ul>
    </li>
    <li>Continue to push AI boundaries forward</li>
</ul>

## Biological Neurons

<p>Neural networks are loosely inspired by the structure and function of the human brain.</p>
<div style="text-align:center;">
    <figure>
        <img src="https://cdn.i-scmp.com/sites/default/files/styles/1020x680/public/d8/images/methode/2020/07/10/ad89450a-c1d5-11ea-8c85-9f30eae6654e_image_hires_194031.JPG?itok=PdRHJEj7&v=1594381242" width="300"/>
    </figure>
</div>

<h3>How Biological Neurons Work</h3>
<ul>
    <li>The human brain contains billions of neurons</li>
    <li>Each neuron is connected to thousands of others</li>
    <li>Key components:
        <ul>
            <li><b>Dendrites:</b> receive signals</li>
            <li><b>Cell Body:</b> processes signals</li>
            <li><b>Axon:</b> transmits signals</li>
            <li><b>Synapses:</b> connections between neurons</li>
        </ul>
    </li>
    <li>When enough input signals arrive in a short time, the neuron “fires,” sending an electrical pulse down the axon</li>
    <li>This signal may then activate other connected neurons</li>
</ul>
<div style="text-align:center;">
    <figure>
        <img src="https://s3-us-west-2.amazonaws.com/courses-images/wp-content/uploads/sites/1223/2017/02/07195441/Figure_35_01_02-1024x687.png" width="600"/>
    </figure>
</div>

<h3>Biological vs. Artificial Neural Networks</h3>
<ul>
    <li>Artificial neural networks are <b>simplified mathematical models</b> of biological neural systems</li>
    <li>The human brain has about 86 billion neurons with trillions of connections</li>
</ul>


## Artifical Neurons

<h3>What is a neuron?</h3>
<p>An artificial neuron is a mathematical function designed to mimic the basic behaviour of a biological neuron.</p>
<ul>
    <li>Takes multiple inputs (features from our data)</li>
    <li>Processes them</li>
    <li>Produces an output value</li>
</ul>
<div style="text-align:center;">
    <figure>
        <img src="https://www.oreilly.com/api/v2/epubs/9781789346565/files/assets/46651759-ddc3-4669-8e1b-827bc63b1eca.png" width="500"/>
    </figure>
</div>
<h3>Components of an Artificial Neuron</h3>
<ul>
    <li>Real‑valued inputs: \(x_1, x_2, \ldots, x_n\)</li>
    <li>Weights for each input: \(w_1, w_2, \ldots, w_n\)</li>
    <li>Bias \(b\)</li>
    <li>Weighted sum:
        \[
            z = b + w_1x_1 + w_2x_2 + \dots + w_nx_n
        \]
    </li>
    <li>Activation function: transforms \(z\) into final output</li>
</ul>
<h3>Weights and Bias</h3>
<ul>
    <li><b>Weights</b> set the importance of each input</li>
    <li><b>Bias</b> lets the neuron fire even when inputs are zero</li>
    <li>Training adjusts weights and biases to minimise prediction errors</li>
</ul>

<h3>Activation Functions</h3>
<li>The neuron then usually applies an <b>activation function</b>, $g$, to the weighted sum, $z$.
        Many activation functions have been proposed, including:
        <ul>
            <li><b>linear activation function</b>: $$g(z) = z$$</li>
            <li><b>step activation function</b>:
                $$g(z) = \left\{ \begin{array}{lr}
                    0 & \mbox{if } z < 0 \\
                    1 & \mbox{if } z \geq 0
                    \end{array}
                  \right.
                $$
            </li>
            <li><b>sigmoid activation function</b>: $$g(z) = \frac{1}{1 + e^{-z}}$$</li>
            <li><b>ReLU activation function</b> (ReLU stands for Rectified Linear Unit): $$g(z) = max(0, z)$$</li>
            <li><b>tanh activation function</b> (tanh is the hyperbolic tangent): $$g(z) = \tanh(z)$$
                <li><b>softmax activation function</b> (multi‑class output):
        $$g(z_i)=\frac{e^{z_i}}{\sum_j e^{z_j}}$$
        produces a vector of values in (0, 1) that sum to 1.
    </li>
        </ul>
    </li>
    <li>Apart from the linear activation function, these activation functions are <b>non-linear</b>, which
        is important to the power of neural networks.
    </li>
</ul>


## Layers of Neurons

<p>Neural networks are built from layers of neurons; each layer processes inputs and passes outputs on.</p>
<div style="text-align:center;">
    <figure>
        <img src="https://cs231n.github.io/assets/nn1/neural_net.jpeg" width="500"/>
    </figure>
</div>

<p>These multi-layer networks have a distinct structure:</p>
<ul>
    <li><b>Input layer</b>: receives raw data</li>
    <li><b>Hidden layers</b>: do most computations and extract features</li>
    <li><b>Output layer</b>: produces final prediction</li>
</ul>
<p>When we talk about a network's <b>depth</b> - we're referring to the number of layers of neurons it contains. This is why modern approaches are called "deep learning" - they use networks with many layers.</p>
<p>What happens between these layers? Under the hood, it's mostly matrix multiplication.</p>

<p>The networks we're discussing now are called <b>layered</b>, <b>dense</b>, <b>feedforward</b> networks.</p>


## Learning of a Neural Network

<li>While we're responsible for deciding the network architecture and hyperparameters, the network itself learns the optimal parameter values:
    <ul>
        <li>The <b>parameters</b> of a neural network are its weights and biases</li>
        <li>These parameters are what get updated during training</li>
    </ul>
</li>

<h3>The Training Process</h3>
        <li><b>Initialization</b>: We start by assigning random small values to all weights and biases</li>
        <li><b>Forward Propagation</b>: Data flows through the network, with each layer performing its calculations</li>
        <li><b>Loss Calculation</b>: We measure how far our predictions are from the true values using a <b>loss function</b></li>
        <li><b>Backpropagation</b>: The error is propagated backward through the network to determine how each weight contributed</li>
        <li><b>Weight Update</b>: We adjust weights and biases to reduce error using an optimization algorithm</li>
    </ul>
<p>We repeat these steps with more data until the model converges to a solution.</p>


<h3>Key Concepts in Neural Network Learning</h3>
<ul>
    <li><b>Loss Function</b>: shows how far predictions are from true values
        <ul>
            <li>Mean Squared Error (MSE) for regression</li>
            <li>Cross‑Entropy Loss for classification</li>
        </ul>
    </li>
    <li>Optimizer that moves weights toward lower error
        <ul>
            <li>Learning rate sets the step size</li>
            <li>If too high, training can diverge</li>
            <li>If too low, training is slow</li>
        </ul>
    </li>
    <li><b>Epochs</b>: one full pass through the training set</li>
    <li><b>Batch Size</b>: number of samples per update step</li>
    <li><b>Overfitting</b>: great on training data but poor on new data</li>
    <li><b>Learning Approaches</b>:
        <ul>
            <li><b>Supervised Learning</b>: learn from labelled examples</li>
            <li><b>Unsupervised Learning</b>: find structure without labels</li>
            <li><b>Reinforcement Learning</b>: learn by interacting with an environment</li>
        </ul>
    </li>
</ul>

## Types of Neural Networks

<li><b>Feedforward Networks</b>:
    <ul>
        <li>Information flows in only one direction - from input to output</li>
        <li>They have no "memory" of previous inputs</li>
        <li>They form the foundation that all other architectures build upon</li>
    </ul>
</li>

<li><b>Single-layer Perceptron</b>:
    <ul>
        <li>It consists of just one layer of neurons</li>
        <li>It can only solve <b>linearly separable</b> problems</li>
    </ul>
</li>

<li><b>Multilayer Perceptrons (MLPs)</b>:
    <ul>
        <li>By adding hidden layers with non-linear activation functions, these networks overcame the limitations of single-layer perceptrons</li>
    </ul>
</li>

<li><b>Convolutional Neural Networks (CNNs)</b> :
    <ul>
        <li>Designed for image processing</li>
        <li>Uses convolutional layers to automatically learn hierarchical features from input images</li>
    </ul>
</li>

<li><b>Recurrent Neural Networks (RNNs)</b>:
    <ul>
        <li>Unlike feedforward networks, RNNs have connections that form cycles</li>
        <li>These feedback loops allow information to persist, giving the network a form of "memory"</li>
        <li>This makes them ideal for tasks where context and order matter</li>
    </ul>
</li>

<li><b>Long Short-Term Memory (LSTM)</b>:
    <ul>
        <li>LSTMs are a special type of RNN with a more complex cell structure</li>
        <li>They use ingenious "gates" to control what information to remember, what to update, and what to output</li>
        <li>This allows them to capture long-range dependencies much more effectively</li>
        <li>They're important for language processing, speech recognition, and time series forecasting</li>
    </ul>
</li>

## Implementation of Neural Networks

In [204]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

from tensorflow.keras import Model, Input
from tensorflow.keras.layers import Dense, Normalization, Rescaling
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import to_categorical

<h2>Introduction</h2>
<ul>
    <li>We load the <b>Titanic</b> dataset with <code>seaborn.load_dataset</code>, shuffle it,
        split 80 % train / 20 % test, and convert to NumPy arrays.</li>
    <li>We will reuse the same DataFrame but choose different targets for each task:
        <ul>
            <li><b>Regression:</b> predict <code>fare</code></li>
            <li><b>Binary classification:</b> predict <code>survived</code></li>
            <li><b>Multi‑class classification:</b> predict passenger class (<code>pclass</code>)</li>
        </ul>
    </li>
    <li>Categorical inputs (<code>sex</code> and <code>embarked</code>) are one‑hot encoded with <code>pd.get_dummies</code>.</li>
</ul>


In [205]:
# Load, shuffle, split once
titanic_df = sns.load_dataset("titanic").dropna(subset=["age", "fare", "embarked"])
titanic_df = titanic_df.sample(frac=1, random_state=2).reset_index(drop=True)

train_df, test_df = train_test_split(titanic_df, train_size=0.8, random_state=2)

# One‑hot encode categorical columns (drop_first avoids redundancy)
train_df = pd.get_dummies(train_df, columns=["sex", "embarked"], drop_first=True)
test_df  = pd.get_dummies(test_df,  columns=["sex", "embarked"], drop_first=True)

In [206]:
train_df.head(10)

Unnamed: 0,survived,pclass,age,sibsp,parch,fare,class,who,adult_male,deck,embark_town,alive,alone,sex_male,embarked_Q,embarked_S
544,0,3,61.0,0,0,6.2375,Third,man,True,,Southampton,no,True,True,False,True
440,0,3,33.0,0,0,7.8958,Third,man,True,,Southampton,no,True,True,False,True
13,0,1,65.0,0,0,26.55,First,man,True,E,Southampton,no,True,True,False,True
486,1,1,35.0,1,0,90.0,First,woman,False,C,Southampton,yes,False,False,False,True
411,0,2,28.0,0,0,13.5,Second,man,True,,Southampton,no,True,True,False,True
40,0,3,8.0,4,1,29.125,Third,child,False,,Queenstown,no,False,True,True,False
205,0,2,36.0,0,0,13.0,Second,man,True,,Southampton,no,True,True,False,True
65,0,2,30.0,0,0,13.0,Second,man,True,,Southampton,no,True,True,False,True
72,1,2,36.0,0,0,13.0,Second,woman,False,D,Southampton,yes,True,False,False,True
304,0,2,70.0,0,0,10.5,Second,man,True,,Southampton,no,True,True,False,True


<h1>A Neural Network for Regression</h1>
<ul>
    <li>Task: estimate a passenger’s <code>fare</code> from simple attributes.</li>
    <li>Architecture:
        <ul>
            <li>Input layer with 7 inputs (<i>age, sibsp, parch, pclass, sex_male, embarked_Q, embarked_S</i>).</li>
            <li>Two hidden layers, 64 neurons each, ReLU activation.</li>
            <li>Single linear output neuron.</li>
        </ul>
    </li>
</ul>

In [207]:
reg_features = ["age", "sibsp", "parch", "pclass", "sex_male", "embarked_Q", "embarked_S"]

train_X = train_df[reg_features].to_numpy(dtype="float32")
test_X  = test_df[reg_features].to_numpy(dtype="float32")
train_y = train_df["fare"].to_numpy(dtype="float32")
test_y  = test_df["fare"].to_numpy(dtype="float32")

In [208]:
inputs = Input(shape=(len(reg_features),))
x = Normalization()(inputs)
x = Dense(64, activation="relu")(x)
x = Dense(64, activation="relu")(x)
outputs = Dense(1, activation="linear")(x)
fare_model = Model(inputs, outputs)

fare_model.compile(optimizer=RMSprop(0.001), loss="mse", metrics=["mae"])

In [209]:
fare_model.fit(train_X, train_y, epochs=10, batch_size=32)

Epoch 1/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 499us/step - loss: 3535.4402 - mae: 30.3641
Epoch 2/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 542us/step - loss: 2787.0029 - mae: 26.5208
Epoch 3/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 597us/step - loss: 3438.6038 - mae: 29.0416
Epoch 4/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 488us/step - loss: 4510.0522 - mae: 32.5407
Epoch 5/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 455us/step - loss: 2245.5208 - mae: 26.7276
Epoch 6/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 410us/step - loss: 3148.0759 - mae: 30.5827
Epoch 7/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 405us/step - loss: 2739.0881 - mae: 29.3444
Epoch 8/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 348us/step - loss: 2828.3499 - mae: 27.6538
Epoch 9/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x37be91c40>

In [210]:
test_loss, test_mae = fare_model.evaluate(test_X, test_y)
test_mae

[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 868us/step - loss: 3650.2295 - mae: 30.2567


27.988588333129883

<h1>A Neural Network for Binary Classification</h1>
<ul>
    <li>Task: predict whether a passenger <code>survived</code> (0 or 1).</li>
    <li>Same input attributes as before.</li>
    <li>Output layer: one neuron with sigmoid activation.</li>
    <li>We'll scale inputs inside the model with a <code>Normalization</code> layer
        to avoid data‑leakage worries.</li>
</ul>

In [211]:
bin_features = reg_features

train_Xb = train_df[bin_features].to_numpy(dtype="float32")
test_Xb  = test_df[bin_features].to_numpy(dtype="float32")
train_yb = train_df["survived"].to_numpy(dtype="float32")
test_yb  = test_df["survived"].to_numpy(dtype="float32")

In [212]:
inputs = Input(shape=(len(bin_features),))
x = Normalization()(inputs)
x = Dense(64, activation="relu")(x)
x = Dense(64, activation="relu")(x)
outputs = Dense(1, activation="sigmoid")(x)
surv_model = Model(inputs, outputs)

surv_model.compile(optimizer=RMSprop(0.001), loss="binary_crossentropy", metrics=["accuracy"])

In [213]:
surv_model.fit(train_Xb, train_yb, epochs=10, batch_size=32)

Epoch 1/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 686us/step - accuracy: 0.5411 - loss: 1.3075
Epoch 2/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 450us/step - accuracy: 0.6194 - loss: 0.6495
Epoch 3/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 533us/step - accuracy: 0.6498 - loss: 0.6162
Epoch 4/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 420us/step - accuracy: 0.6774 - loss: 0.6022
Epoch 5/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 431us/step - accuracy: 0.6912 - loss: 0.6073
Epoch 6/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 333us/step - accuracy: 0.7055 - loss: 0.5777
Epoch 7/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 363us/step - accuracy: 0.6987 - loss: 0.5973
Epoch 8/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 444us/step - accuracy: 0.7054 - loss: 0.5953
Epoch 9/10
[1m18/18[0m [32m━━━━━━━━━━

<keras.src.callbacks.history.History at 0x381df7640>

In [214]:
test_loss, test_acc = surv_model.evaluate(test_Xb, test_yb)
test_acc

[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 771us/step - accuracy: 0.7887 - loss: 0.5455


0.7762237787246704

<h1>A Neural Network for Multi‑Class Classification</h1>
<ul>
    <li>Task: predict passenger class (<code>pclass</code> = 1, 2 or 3).</li>
    <li>Same 7 inputs.</li>
    <li>Output layer: three neurons with softmax activation.</li>
</ul>

In [215]:
multi_features = reg_features

train_Xm = train_df[multi_features].to_numpy(dtype="float32")
test_Xm  = test_df[multi_features].to_numpy(dtype="float32")
train_ym = (train_df["pclass"] - 1).to_numpy(dtype="float32")
test_ym  = (test_df["pclass"] - 1).to_numpy(dtype="float32")

In [216]:
inputs = Input(shape=(len(multi_features),))
x = Normalization()(inputs)
x = Dense(64, activation="relu")(x)
x = Dense(64, activation="relu")(x)
outputs = Dense(3, activation="softmax")(x)
class_model = Model(inputs, outputs)

class_model.compile(optimizer=RMSprop(0.001), loss="sparse_categorical_crossentropy", metrics=["accuracy"])

In [217]:
class_model.fit(train_Xm, train_ym, epochs=10, batch_size=32)

Epoch 1/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 551us/step - accuracy: 0.3659 - loss: 2.5451
Epoch 2/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 416us/step - accuracy: 0.5325 - loss: 1.0109
Epoch 3/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 438us/step - accuracy: 0.6353 - loss: 0.8401
Epoch 4/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 356us/step - accuracy: 0.6628 - loss: 0.7536
Epoch 5/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 382us/step - accuracy: 0.6696 - loss: 0.7378
Epoch 6/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 313us/step - accuracy: 0.6504 - loss: 0.7696
Epoch 7/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 332us/step - accuracy: 0.6849 - loss: 0.6758
Epoch 8/10
[1m18/18[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 324us/step - accuracy: 0.6918 - loss: 0.6545
Epoch 9/10
[1m18/18[0m [32m━━━━━━━━━━

<keras.src.callbacks.history.History at 0x37bf13dc0>

In [218]:
test_loss, test_acc = class_model.evaluate(test_Xm, test_ym)
test_acc

[1m5/5[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 769us/step - accuracy: 0.7522 - loss: 0.5410


0.7762237787246704

## Applications of Neural Networks

<ol>
    <li><b>Image and Video Recognition</b>: CNNs are extensively used in applications such as facial recognition, autonomous driving, and medical image analysis.</li>
    <li><b>Natural Language Processing (NLP)</b>: RNNs and transformers power language translation, chatbots, and sentiment analysis.</li>
    <li><b>Finance</b>: Predicting stock prices, fraud detection, and risk management.</li>
    <li><b>Healthcare</b>: Neural networks assist in diagnosing diseases, analyzing medical images, and personalizing treatment plans.</li>
    <li><b>Gaming and Autonomous Systems</b>: Neural networks enable real-time decision-making, enhancing user experience in video games and enabling autonomous systems like self-driving cars.</li>
</ol>