In [42]:
import warnings
from sklearn.exceptions import ConvergenceWarning

# Ignore the ConvergenceWarning
warnings.filterwarnings("ignore", category=ConvergenceWarning)


In [43]:
%pip install -r requirements.txt

Note: you may need to restart the kernel to use updated packages.


# Artificial Neural Networks 

## About this notebook

This notebook was created to help you understand more about machine learning. I intend to create tutorials covering several machine learning algorithms, from basic to advanced. I hope it helps you on your data science journey. For any information, you can contact me through the link below.

Contact me here: https://www.linkedin.com/in/vitorgamalemos/

## Introduction 

<img src="https://media.springernature.com/original/springer-static/image/art%3A10.1007%2Fs40846-016-0191-3/MediaObjects/40846_2016_191_Fig1_HTML.gif">

<p style="text-align: justify;">Artificial Neural Networks are mathematical models inspired by the human brain, specifically its ability to learn, process information, and perform tasks. They are powerful tools for solving complex problems, mainly in the areas of combinatorial optimization and machine learning. They have a wide range of applications because such models can adapt to the situations presented to them, gradually improving their performance from data rather than from hand-written rules. In other words, a machine is no longer bound to preprogrammed rules and can instead learn from its own mistakes.</p>

## Biological Model

<img src="https://www.neuroskills.com/images/photo-500x500-neuron.png">
<p style="text-align: justify;">Artificial neurons are designed to mimic aspects of their biological counterparts. The neuron is one of the fundamental units that make up the entire brain structure of the central nervous system; such cells are responsible for transmitting information through the electrical potential difference in their membrane. In this context, a biological neuron can be divided as follows.</p>

**Dendrites** – thin branches that extend from the nerve cell body. They receive nerve input from other neurons and from other parts of the body.

**Soma** – acts as a summation function. As positive and negative signals (excitatory and inhibitory, respectively) arrive in the soma from the dendrites, they are added together.

**Axon** – gets its signal from the summation that occurs inside the soma. It is a single long filament that extends from the soma and carries nerve impulses away from the cell to other cells.

## Artificial Neuron in Mathematical Notation
In general terms, an input X is multiplied by a weight W and a bias b is added, producing the net activation.
<img style="max-width:60%;max-height:60%;" src="https://miro.medium.com/max/1290/1*-JtN9TWuoZMz7z9QKbT85A.png">

We can summarize an artificial neuron with the following mathematical expression:
$$
\hat{y} = f\left(\text{net}\right)= f\left(\vec{w}\cdot\vec{x}+b\right) = f\left(\sum_{i=1}^{n}{w_i x_i} + b\right)
$$
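
The expression above can be evaluated by hand for a small case. The sketch below is only an illustration: the input values, weights, bias, and the choice of a step function for `f` are assumptions made for this example, not values from the notebook.

```python
import numpy as np

def step(net):
    """Heaviside step activation: 1 if net >= 0, else 0 (assumed choice of f)."""
    return 1 if net >= 0 else 0

def neuron_output(x, w, b, f):
    """Return f(w . x + b) for a single artificial neuron."""
    net = np.dot(w, x) + b  # net activation: weighted sum plus bias
    return f(net)

# Illustrative values (assumptions for this example)
x = np.array([1.0, 2.0, 3.0])     # inputs
w = np.array([0.5, -1.0, 0.25])   # weights
b = 0.5                           # bias

net = np.dot(w, x) + b            # 0.5 - 2.0 + 0.75 + 0.5 = -0.25
y_hat = neuron_output(x, w, b, step)
print(net, y_hat)
```

Because the net activation is negative here, the step function maps it to class 0.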

## The Single-Layer Perceptron

<p style="text-align: justify;">The Perceptron and its learning algorithm pioneered research in neurocomputing. The perceptron is an algorithm for supervised learning of binary classifiers [1]. A binary classifier is a function that decides whether or not an input, represented by a vector of numbers, belongs to a specific class. The perceptron is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.</p>
    
<img src="https://www.edureka.co/blog/wp-content/uploads/2017/12/Perceptron-Learning-Algorithm_03.gif">
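
To make the learning algorithm concrete, here is a minimal from-scratch sketch of the classic perceptron update rule, trained on the logical AND function. The function names `train_perceptron` and `predict`, the learning rate `eta`, and the toy dataset are assumptions chosen for illustration; this is not the implementation used by `sklearn`.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, epochs=20):
    """Train one perceptron with the classic update rule.

    X has shape (n_samples, n_features); y holds labels in {0, 1}.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if np.dot(w, xi) + b >= 0 else 0
            update = eta * (target - pred)  # 0 when the prediction is correct
            if update != 0:
                w += update * xi
                b += update
                errors += 1
        if errors == 0:  # a full pass without mistakes: stop early
            break
    return w, b

def predict(X, w, b):
    """Apply the learned linear decision rule to every row of X."""
    return (X @ w + b >= 0).astype(int)

# Logical AND is linearly separable, so the rule converges on it
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])

w_and, b_and = train_perceptron(X_and, y_and)
print(predict(X_and, w_and, b_and))
```

The weights only change when a sample is misclassified, which is why training stops once a full pass over the data produces no errors.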
    
#### References
    
- Freund, Y.; Schapire, R. E. (1999). "Large margin classification using the perceptron algorithm". Machine Learning.

- Aizerman, M. A.; Braverman, E. M.; Rozonoer, L. I. (1964). "Theoretical foundations of the potential function method in pattern recognition learning". Automation and Remote Control. 25: 821–837.

- Mohri, M.; Rostamizadeh, A. (2013). "Perceptron Mistake Bounds".

Source: https://www.kaggle.com/code/vitorgamalemos/perceptron-neural-network

## Training a Perceptron

We are going to use the `sklearn` library to build a simple classification model. We use `MLPClassifier`, a multi-layer perceptron, which by default has a single hidden layer of 100 neurons (`hidden_layer_sizes=(100,)`). In this exercise we shrink it to a minimal network by setting `hidden_layer_sizes` to 1, which gives one hidden layer containing a single neuron. Strictly speaking this is still an MLP rather than a classic single-layer perceptron, but it is the smallest network `MLPClassifier` can build.  
Before training, we prepare the data by applying Standard Scaling (to normalize the numeric features) and, where the dataset has categorical features, One-Hot Encoding (to convert them into a numeric format suitable for the model).  

##### 1. Data Preprocessing
- One-Hot Encoding converts categorical features into a numerical format by creating binary columns for each category.
- Standard scaling transforms features to have a mean of 0 and a standard deviation of 1. This normalization step is crucial for models like Perceptron, which are sensitive to varying scales in input data. Without scaling, features with larger magnitudes can dominate the learning process, leading to poor model performance.
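
The two preprocessing steps above can be combined in a single transformer. The sketch below assumes a hypothetical toy table with one numeric column (`height_cm`) and one categorical column (`color`); both the column names and the values are made up for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data (assumption for this example)
toy = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "color": ["red", "blue", "red", "green"],
})

# Scale the numeric column, one-hot encode the categorical column.
# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), ["height_cm"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["color"]),
    ],
    sparse_threshold=0,
)

toy_prepared = preprocess.fit_transform(toy)
print(toy_prepared.shape)  # one scaled column plus one column per color
print(toy_prepared)
```

After the transform, the first column has mean 0 and standard deviation 1, and each row has exactly one `1` among the color columns.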

##### 2. Splitting the Data
As with Decision Trees or Random Forests, we split our dataset into training and test sets so that we can evaluate the model on data it has not seen during training.

##### 3. Training the Perceptron
We initialize the model and train it with a gradient-based optimizer (`MLPClassifier` uses the Adam solver by default).  
The two key parameters we start fine-tuning with are:

###### Learning Rate (`learning_rate_init`)
- Controls how much the model updates its weights at each step. (The equivalent parameter in `sklearn.linear_model.Perceptron` is `eta0`.)
- **High learning rate:** Faster progress, but updates may overshoot the optimal weights and make training unstable.
- **Low learning rate:** More stable, but slow training.

###### Epochs (`max_iter`)
- Sets the maximum number of passes over the training data.
- More epochs give the model more opportunity to learn, but too many can cause overfitting.

##### 4. Fine-Tuning the Model
After training, we can fine-tune hyperparameters like:
- Learning rate (`learning_rate_init`)
- Epochs (`max_iter`)

By adjusting these parameters, we aim to improve accuracy and generalization.
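
One way to search over these two parameters systematically is a cross-validated grid search. The sketch below is one possible approach, not the procedure this notebook follows: the grid values, the 5-fold setting, and the use of a pipeline are assumptions made for illustration.

```python
import warnings

from sklearn.datasets import load_iris
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small max_iter values are expected to stop before convergence
warnings.filterwarnings("ignore", category=ConvergenceWarning)

X_iris, y_iris = load_iris(return_X_y=True)

# Scaling lives inside the pipeline so each fold is scaled on its own training part
pipe = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(1,), random_state=42),
)

# Illustrative grid (assumed values)
param_grid = {
    "mlpclassifier__learning_rate_init": [0.001, 0.01, 0.1],
    "mlpclassifier__max_iter": [5, 20, 50],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_iris, y_iris)

print("Best parameters:", search.best_params_)
print(f"Best cross-validated accuracy: {search.best_score_:.4f}")
```

The reported score is a cross-validation average, so it is a steadier basis for comparing settings than a single train/test split.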

In [44]:
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

In [45]:
# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

In [46]:
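# Preprocess the data (scale the features)
# Note: the scaler is fitted on the full dataset here, so test-set statistics
# influence the scaling; fitting on the training split only is the stricter approach.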
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

In [47]:
# Split the data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
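
A stricter variant of the two cells above splits first and then fits the scaler on the training portion only, so that no information from the test set enters the preprocessing. This is a sketch of that alternative ordering, with new variable names chosen so it does not overwrite the notebook's own variables; results obtained this way may differ slightly from the ones recorded in this notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X_raw, y_raw = load_iris(return_X_y=True)

# 1. Split the raw data first (80% train, 20% test)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=42
)

# 2. Fit the scaler on the training split only
train_scaler = StandardScaler().fit(X_tr)

# 3. Apply the same training-set statistics to both splits
X_tr_scaled = train_scaler.transform(X_tr)
X_te_scaled = train_scaler.transform(X_te)

print(X_tr_scaled.shape, X_te_scaled.shape)
```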

#### Iteration 1

In [48]:
# Set the learning rate, max_iter, and hidden layer sizes as specified
learning_rate = 0.001
max_iter = 5  # try values from 5 to 40
hidden_layer_sizes = 1

# Initialize the MLPClassifier with the given hyperparameters
model = MLPClassifier(
    hidden_layer_sizes=hidden_layer_sizes,
    learning_rate_init=learning_rate,
    max_iter=max_iter,
    random_state=42
)

# Train the model
model.fit(X_train, y_train)

# Predict on the training set
y_train_pred = model.predict(X_train)

# Predict on the test set
y_test_pred = model.predict(X_test)

# Calculate accuracy for both training and test sets
train_accuracy = accuracy_score(y_train, y_train_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)

# Print the results
print(f"Training Accuracy: {train_accuracy:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")


Training Accuracy: 0.3583
Test Accuracy: 0.3333


#### Iteration 2

In [49]:
# Set the learning rate, max_iter, and hidden layer sizes as specified
learning_rate = 0.05  # try values between 0.001 and 0.1, such as 0.01 or 0.05
max_iter = 4
hidden_layer_sizes = 1

# Initialize the MLPClassifier with the given hyperparameters
model = MLPClassifier(
    hidden_layer_sizes=hidden_layer_sizes,
    learning_rate_init=learning_rate,
    max_iter=max_iter,
    random_state=42
)

# Train the model
model.fit(X_train, y_train)

# Predict on the training set
y_train_pred = model.predict(X_train)

# Predict on the test set
y_test_pred = model.predict(X_test)

# Calculate accuracy for both training and test sets
train_accuracy = accuracy_score(y_train, y_train_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)

# Print the results
print(f"Training Accuracy: {train_accuracy:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")


Training Accuracy: 0.4583
Test Accuracy: 0.4333


#### Iteration 3

In [50]:
# Set the learning rate, max_iter, and hidden layer sizes as specified
learning_rate = 0.1
max_iter = 50
hidden_layer_sizes = 1

# Initialize the MLPClassifier with the given hyperparameters
model = MLPClassifier(
    hidden_layer_sizes=hidden_layer_sizes,
    learning_rate_init=learning_rate,
    max_iter=max_iter,
    random_state=42
)

# Train the model
model.fit(X_train, y_train)

# Predict on the training set
y_train_pred = model.predict(X_train)

# Predict on the test set
y_test_pred = model.predict(X_test)

# Calculate accuracy for both training and test sets
train_accuracy = accuracy_score(y_train, y_train_pred)
test_accuracy = accuracy_score(y_test, y_test_pred)

# Print the results
print(f"Training Accuracy: {train_accuracy:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")


Training Accuracy: 0.9750
Test Accuracy: 0.9667
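
Instead of retraining from scratch for every `max_iter` value, the accuracy can also be tracked epoch by epoch. The sketch below uses `MLPClassifier.partial_fit`, which performs a single pass over the data per call. It is an assumed alternative to the three cells above, so its numbers need not match the recorded results exactly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X_all, y_all = load_iris(return_X_y=True)
Xe_train, Xe_test, ye_train, ye_test = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42
)
epoch_scaler = StandardScaler().fit(Xe_train)
Xe_train = epoch_scaler.transform(Xe_train)
Xe_test = epoch_scaler.transform(Xe_test)

clf = MLPClassifier(hidden_layer_sizes=(1,), learning_rate_init=0.1, random_state=42)
classes = np.unique(y_all)  # partial_fit needs the full list of classes up front

history = []  # (epoch, train accuracy, test accuracy)
for epoch in range(1, 51):
    clf.partial_fit(Xe_train, ye_train, classes=classes)  # one pass over the data
    history.append((epoch, clf.score(Xe_train, ye_train), clf.score(Xe_test, ye_test)))

# Print every 10th epoch to see how accuracy evolves
for epoch, train_acc, test_acc in history[9::10]:
    print(f"Epoch {epoch:2d}: train={train_acc:.4f}  test={test_acc:.4f}")
```

Plotting `history` with `matplotlib` makes it easy to spot the point where additional epochs stop improving test accuracy.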


### Portfolio assignment 20
30 min: Train a perceptron to predict the digit shown in each image of the MNIST dataset.
- Fit a Perceptron model (keep `hidden_layer_sizes=1`) using the images in the `fetch_openml` dataset.
- Change `learning_rate_init` and `max_iter` to find the 'right fit'.
- Use your perceptron to make predictions for both the train and test set.<br>

In [51]:
from sklearn.datasets import fetch_openml
import seaborn as sns

# Load the MNIST dataset
mnist = fetch_openml('mnist_784')
X = mnist.data
y = mnist.target.astype(int)
print(X.shape)
print(y.shape)

(70000, 784)
(70000,)
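
Before fitting, image data is usually rescaled and split. The sketch below shows that preparation step on `load_digits`, the small 8x8 digits dataset bundled with `sklearn`, used here only as an offline stand-in for MNIST (an assumption of this example). Its pixel values range from 0 to 16, whereas MNIST pixels range from 0 to 255, so the divisor would be 255 for MNIST.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Stand-in dataset: 8x8 digit images flattened to 64 features
digits = load_digits()
X_digits = digits.data / 16.0   # rescale pixel values from 0..16 to 0..1
y_digits = digits.target

# Stratify so every digit appears in both splits in similar proportions
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_digits, y_digits, test_size=0.2, random_state=42, stratify=y_digits
)

print(Xd_train.shape, Xd_test.shape)
print(X_digits.min(), X_digits.max())
```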


![](https://i.imgur.com/0v1CGNV.png)<br>
- Fine-tune the learning rate and epochs (max iterations).
- Calculate the accuracy for both the train set predictions and the test set predictions. What happens per iteration?
- Is the accuracy different? Did you expect this difference?

Optional: Perform the same tasks but change the hidden_layer_sizes <br>

Findings: ...<br>

### Portfolio assignment 21
30 min: Train a perceptron to predict one of the categorical columns of your own dataset.
- Prepare the data:<br>
    - <b>Note</b>: Some machine learning algorithms cannot handle missing values. You will need to either:
         - replace missing values (with the mean or the most frequent value). For replacing missing values you can use .fillna(\<value\>) https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html
         - or remove rows with missing data. You can remove rows with missing data with .dropna() https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html <br>
    - <b>Note</b>: Some machine learning algorithms cannot handle categorical values.
        - To handle categorical data, you can use One-Hot Encoding to convert categories into binary columns with pd.get_dummies(\<data\>). This creates a new column for each category.
- Split your dataset into training (70%) and testing (30%) sets. 
- Use your Perceptron to make predictions for both the train and test set.<br>
<br>
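
The preparation steps listed above can be tried out on a small table first. The DataFrame below is hypothetical: its column names (`age`, `city`, `label`) and values are made up for illustration and should be replaced by the columns of your own dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values (assumption for this example)
people = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Paris", "Lyon", None, "Paris"],
    "label": ["yes", "no", "yes", "no"],
})

# Option A: replace missing values
filled = people.copy()
filled["age"] = filled["age"].fillna(filled["age"].mean())        # numeric: mean
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])  # categorical: most frequent

# Option B: remove rows that contain any missing value
dropped = people.dropna()

# One-hot encode the categorical feature column (the target stays as it is)
encoded = pd.get_dummies(filled, columns=["city"])

print(filled)
print(len(dropped), "rows remain after dropna()")
print(encoded.columns.tolist())
```

Filling keeps every row but introduces estimated values, while dropping keeps only fully observed rows; which one is preferable depends on how much data is missing.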

![](https://i.imgur.com/0v1CGNV.png)<br>
- Fit a Perceptron model using your own selected feature columns.
- Fine-tune the learning rate and epochs (max iterations). 
- Calculate the accuracy for both the train set predictions and the test set predictions. What happens per iteration?
- Is the accuracy different? Did you expect this difference?

Optional: Perform the same tasks but change the hidden_layer_sizes <br>

Findings: ...<br>