## Building and improving a Multi-Layer Perceptron: a feed-forward neural network using real-world data
## Overview

The dataset (Electronic Health Record Predicting) was collected from a private hospital in Indonesia. It contains patients' laboratory test results, which are used to determine the patient's next treatment: in-care or out-care.



This workshop is divided into 5 steps:



1.   Exploratory Data Analysis (EDA)
2.   Feature Engineering
3.   Building our MLP Model
4.   Attempting to increase accuracy and model performance with Feature Selection
5.   Attempting to increase accuracy and model performance with Feature Scaling







In [None]:
import numpy as np
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from matplotlib import pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

from sklearn.metrics import accuracy_score, f1_score, classification_report


import warnings
warnings.filterwarnings(action='ignore')

# Data Preprocessing
## Exploratory Data Analysis

In the following steps, we apply simple data analysis concepts.
In this part we use libraries such as pandas, matplotlib and numpy.
The main goal is to summarize the main characteristics of the dataset.
Before making any decision, it is essential to look at your dataset and understand each attribute. This includes:

*   Attribute Name and Type
*   Measurement unit

**Important note:**

A feature is an individual measurable property within a recorded dataset. In machine learning and statistics, features are often called “variables” or “attributes.” Relevant features are correlated with, or otherwise bear on, the quantity the model predicts; how much a feature contributes to the model's predictions is called its feature importance.

### Reading data

To read data from our dataset file, we import a library called pandas.
This library allows us not only to read the file but also to display the information it contains.

**.head()** returns the first 5 rows of our dataset by default.

**.head(n)** returns the first n rows of our dataset.






In [None]:
patient_data = pd.read_csv('data-ori.csv')

patient_data.head() # return the first 5 rows by default

In [None]:
patient_data.head(3) # return the first 3 rows

The next step is to display information about our dataset's attributes, to get a better perspective of how the data is organized.

**.info()** prints a concise summary of our dataset: the number of rows and, for each column, its name, non-null count and data type.



In [None]:
patient_data.info()

After looking at our dataset summary, we can observe the following attributes:

* **HAEMATOCRIT** - Patient laboratory test result of haematocrit;
* **HAEMOGLOBINS** - Patient laboratory test result of haemoglobins;
* **ERYTHROCYTE** - Patient laboratory test result of erythrocyte;
* **LEUCOCYTE** - Patient laboratory test result of leucocyte;
* **THROMBOCYTE** - Patient laboratory test result of thrombocyte;
* **MCH** - Patient laboratory test result of MCH (mean corpuscular hemoglobin), the average amount of hemoglobin in a red blood cell;
* **MCHC** - Patient laboratory test result of MCHC (mean corpuscular hemoglobin concentration), the amount of hemoglobin in a red blood cell relative to the size of the cell;
* **MCV** - Patient laboratory test result of MCV (mean corpuscular volume), the average size of the red blood cells;
* **AGE** - Patient age;
* **SEX** - Patient sex;
* **SOURCE** - Patient in-care or out-care (the target).

## Feature Engineering



There are a few general key steps we need to follow before building our neural network. Feature engineering uses domain knowledge to extract features from raw data.
In this workshop we go through the following steps:



1.   Identify if there are NaN Values
2.   Identify the number of distinct elements




**.isna()** followed by **.sum()** counts the NaN values in each column of our dataset.
NaN stands for Not a Number; pandas uses it to mark missing values, and missing data is one of the most common problems in data analysis.


In [None]:
patient_data.isna().sum()
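To read the output of `isna().sum()`, it helps to run it on a tiny hypothetical DataFrame (column names borrowed from the dataset, values invented). `isna()` flags both `np.nan` and `None` as missing, and summing the boolean mask counts them per column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column
toy = pd.DataFrame({
    "AGE": [34, np.nan, 51],
    "SEX": ["F", "M", None],
})

missing_per_column = toy.isna().sum()  # True counts as 1 when summed
print(missing_per_column)
print("Any missing values:", bool(toy.isna().any().any()))
```

A column of all zeros in the real output means no values need to be dropped or imputed.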

**.nunique()** counts the number of distinct elements along an axis (by default, per column).

In [None]:
patient_data.nunique()
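A small hypothetical example of what `nunique()` counts. By default NaN is not counted as a value (`dropna=True`); a column with exactly two distinct values, such as SEX or SOURCE here, is a natural candidate for binary encoding:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (values invented for illustration)
toy = pd.DataFrame({
    "SEX": ["F", "M", "F", "M"],
    "AGE": [30, 30, 45, np.nan],
})

print(toy.nunique())              # NaN ignored: SEX has 2, AGE has 2
print(toy.nunique(dropna=False))  # NaN counted as its own value: AGE has 3
```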

### Viewing statistical data of our dataset

**.describe()** returns a statistical summary of the numerical columns of our dataset:



*   **count** -> the number of non-NA/null observations
*   **mean** -> the mean of the observations (the sum of all values divided by the total number of values)
*   **std** -> the standard deviation of the observations (a measure of how much the values vary around their mean)
*   **min** and **max** -> the smallest and largest observed values
*   **25%**, **50%** and **75%** -> the quartiles; the 50% value is the median


  



In [None]:
patient_data.describe()
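One detail worth knowing when reading the `std` row: pandas reports the sample standard deviation, dividing by n - 1 rather than n. A small hypothetical series makes the numbers easy to check by hand:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric column
s = pd.Series([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
summary = s.describe()

# Recompute mean and std by hand; describe() uses the sample std (ddof=1)
manual_mean = s.sum() / len(s)
manual_std = np.sqrt(((s - manual_mean) ** 2).sum() / (len(s) - 1))

print(summary)
print(manual_mean, manual_std)
```

Here the squared deviations sum to 32, so the sample std is the square root of 32 / 7 (about 2.14), while the population std would be exactly 2.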

## Categorical Data Encoding

Looking at the attribute types, we can see that both 'SOURCE' and 'SEX' have **object** as their type.
Deep learning neural networks require input and output variables to be numbers, so categorical data must be encoded as numbers before we can use it to fit and evaluate a model.
Since each of these columns has only two values, we use a binary encoding strategy.


For 'SOURCE', we map 'in' to the numeric value 1 (True) and 'out' to 0 (False); for 'SEX', we map 'F' to 0 and 'M' to 1.

In [None]:
# map() converts the column to numbers directly; values missing from the dict become NaN
patient_data['SOURCE'] = patient_data['SOURCE'].map({'in': 1, 'out': 0})

In [None]:
patient_data['SEX'] = patient_data['SEX'].map({'F': 0, 'M': 1})
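A small hypothetical example shows why the choice between `Series.replace` and `Series.map` matters for binary encoding: `replace` leaves values that are not in the dictionary untouched (so the column can stay as strings), whereas `map` turns them into NaN, which makes unexpected spellings easy to detect. The stray value "In" below is invented for illustration; it is not claimed to occur in the dataset.

```python
import pandas as pd

# Hypothetical raw values, including one unexpected spelling ("In")
raw = pd.Series(["in", "out", "in", "In"])

encoded = raw.map({"in": 1, "out": 0})
print(encoded)  # "In" is not a key of the dict, so it becomes NaN

# List the raw values that the mapping did not cover
unmapped = raw[encoded.isna()].unique()
print("Unmapped values:", list(unmapped))
```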

After applying the transformation, we can check its outcome.
**.value_counts()** returns each distinct value present in a column together with the number of rows in which it appears.

In [None]:
patient_data.SOURCE.value_counts()
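Raw counts can be turned into class proportions with `normalize=True`, which makes any class imbalance easier to read. A sketch on a hypothetical encoded target (6 out-care, 4 in-care):

```python
import pandas as pd

# Hypothetical encoded target: 0 = out-care, 1 = in-care
source = pd.Series([0, 0, 1, 0, 1, 0, 1, 0, 0, 1])

counts = source.value_counts()                # absolute counts, most frequent first
shares = source.value_counts(normalize=True)  # fraction of rows per class

print(counts)
print(shares)
```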

## Plotting our data

After preparing the data, we can plot it to get a better interpretation of how it behaves.

In [None]:

# SOURCE is 1 for in-care and 0 for out-care, so (rows - sum) counts the out-care patients
slices = [patient_data.shape[0] - patient_data['SOURCE'].sum(), patient_data['SOURCE'].sum()]
labels = ['Out-care', 'In-care']
explode = [0, 0.1]

plt.pie(slices, labels=labels, explode=explode, shadow=True,
        startangle=90, autopct='%1.1f%%',
        wedgeprops={'edgecolor': 'black'})

plt.title("Patient Classification")
plt.show()

In [None]:
patient_data.SEX.value_counts()  # number of patients of each sex (0 = F, 1 = M)

In [None]:
patient_data.SEX.value_counts().plot(kind='bar')

## Patient Classification

To predict a patient's classification, we define that attribute (SOURCE) as our target.
The target is separated from the other attributes, which become the input features.


In [None]:
feat = [f for f in patient_data.columns if f !='SOURCE']

y = patient_data['SOURCE']
x = patient_data.drop('SOURCE', axis=1)

In [None]:
print(f"The dataset contains {patient_data.shape[0]} rows and {patient_data.shape[1]} columns")

num_feat = [f for f in feat if patient_data[f].dtype != object]
cat_feat = [f for f in feat if patient_data[f].dtype == object]

print(f"Total number of features : {len(feat)}")
print(f"Number of numerical features : {len(num_feat)}")
print(f"Number of categorical features : {len(cat_feat)}\n")

patient_data.info()

## Attribute Behaviour

Plotting the distribution (kernel density estimate) of each numerical attribute in our dataset.

In [None]:
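# One subplot per numerical feature (SEX counts as numerical after encoding), instead of a hard-coded 9
fig, axes = plt.subplots(len(num_feat), 1, figsize=(8, 2.5 * len(num_feat)))
for i, c in enumerate(num_feat):
    patient_data[[c]].plot(kind='kde', ax=axes[i])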

## Splitting the data into training and testing sets (70%/30%)


In Deep Learning, training and testing data are essential components used to develop and evaluate predictive models. Here's a brief explanation of each:

Training Data:

Training data is used to train a machine learning model. During training, the model learns patterns, relationships, and features from the input data to make predictions or classifications.
It consists of a labeled dataset, where both the input features and the corresponding correct output (target) are known. The model adjusts its parameters based on this labeled data to minimize the difference between its predictions and the actual outcomes.
The model iteratively updates its internal parameters through optimization algorithms, adjusting its behavior to capture the underlying patterns in the training data.


Testing Data:

Testing data is used to assess the model's performance and generalization ability. Once the model is trained, it is evaluated on this separate dataset to measure how well it can make predictions on new, unseen data.
Similar to training data, testing data includes input features, but the corresponding target values are kept hidden during the evaluation process. This allows the model to be tested on its ability to generalize and make accurate predictions for new, previously unseen examples.
The model's predictions on the testing data are compared against the actual outcomes to calculate performance metrics such as accuracy, precision, recall, and F1 score. These metrics help assess how well the model is expected to perform on new, real-world data.

In [None]:
x_train,x_test,y_train,y_test = train_test_split(x,y,train_size = 0.7,
                                                     shuffle = True,random_state = 1)
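The split above does not stratify by class. When the classes are imbalanced, a common variant is to pass `stratify=y` so that both splits keep the original class proportions. This is an alternative, not what this notebook does; the scores reported later come from the unstratified split. A sketch on hypothetical data with 30% positives:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 70 samples of class 0, 30 of class 1
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 70 + [1] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.7, shuffle=True, random_state=1, stratify=y
)

# With stratify=y, both splits keep the 30% share of class 1
print(len(y_tr), len(y_te), y_tr.mean(), y_te.mean())
```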

# Multi-layer perceptron neural network

A Multi-Layer Perceptron (MLP) is a type of artificial neural network with multiple layers of nodes. It's a feedforward network, processing information from input to output without loops. MLPs are used for tasks like classification and regression. Key components include:

Input Layer: Represents input data features, with each node corresponding to a feature.

Hidden Layers: Layers between input and output, applying weights, biases, and activation functions. The number and size of hidden layers are adjustable hyperparameters.

Weights and Bias: Connections between nodes have weights, learned during training. Each node also has a bias term, which shifts its output so that the learned function does not have to pass through the origin.

Activation Function: Applies non-linearity to node outputs, like sigmoid, tanh, or ReLU, enabling the network to learn complex relationships.

Feedforward Process: Input data passes through layers, with nodes processing input, applying weights, adding bias, and passing through activation functions to produce output.

Output Layer: Produces final results based on task; e.g., one node with sigmoid for binary classification or multiple nodes with softmax for multiclass.

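Training: Backpropagation computes how the loss changes with respect to each weight and bias (the gradients), and an optimizer such as Adam uses these gradients to update them, minimizing the difference between predicted and actual values.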

Loss Function: Measures the difference between predicted and actual values; goal during training is to minimize this difference.

MLPs are versatile neural networks suitable for diverse machine learning tasks, capable of learning intricate data relationships through non-linear processing.
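The feedforward process and the loss can be sketched in a few lines of NumPy. This is a hypothetical network with random, untrained weights (there is no backpropagation step): it only shows how inputs flow through a ReLU hidden layer to a sigmoid output, and how the binary cross-entropy (log-loss) is computed. scikit-learn's `MLPClassifier` minimizes the log-loss for classification, but its internals are not reproduced here.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, p, eps=1e-12):
    # Average log-loss; clipping avoids log(0)
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features -> 3 ReLU hidden units -> 1 sigmoid output
n_features, n_hidden = 4, 3
W1 = rng.normal(size=(n_features, n_hidden))  # input -> hidden weights
b1 = np.zeros(n_hidden)                       # hidden-layer biases
W2 = rng.normal(size=(n_hidden, 1))           # hidden -> output weights
b2 = np.zeros(1)                              # output bias

def forward(X):
    # Each layer: weighted sum plus bias, then a non-linear activation
    h = relu(X @ W1 + b1)
    return sigmoid(h @ W2 + b2).ravel()       # probability of class 1

X = rng.normal(size=(5, n_features))          # 5 hypothetical samples
y = np.array([1, 0, 1, 0, 0])

probs = forward(X)
preds = (probs >= 0.5).astype(int)            # threshold at 0.5 to get a class label
loss = binary_cross_entropy(y, probs)
print(probs, preds, loss)
```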



In [None]:
mlp=MLPClassifier(hidden_layer_sizes=(150,100,50), max_iter=300,activation = 'relu',solver='adam',random_state=1)
mlp.fit(x_train, y_train)

y_pred = mlp.predict(x_test)
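To see what a fitted network of this shape contains, the same `MLPClassifier` configuration can be trained on a small synthetic dataset and its attributes inspected. `make_classification` stands in for the patient data here, so the learned values will differ; `coefs_` holds one weight matrix per pair of consecutive layers, `out_activation_` names the output activation, and `loss_curve_` records the training loss after each iteration.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the patient data: 200 rows, 10 features, binary target
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(150, 100, 50), max_iter=300,
                    activation="relu", solver="adam", random_state=1)
mlp.fit(X, y)

# Layer sizes 10 -> 150 -> 100 -> 50 -> 1 give four weight matrices
print([w.shape for w in mlp.coefs_])
print(mlp.out_activation_)     # 'logistic' (sigmoid) for a binary target
print(mlp.n_iter_, mlp.loss_)  # iterations run and final training loss
```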



# Accuracy Score

The accuracy score is the proportion of instances the model classified correctly out of the total number of instances. It ranges from 0 to 1 and is often reported as a percentage.

Here it is computed on the test set, which the model did not see during training, so it estimates how well the model makes correct predictions on new data.

This differs from a training accuracy curve, which tracks accuracy on the training data over the course of training and shows how well the model fits the data it learns from.
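As a worked example on hypothetical labels, accuracy is simply the number of correct predictions divided by the number of predictions, which matches `accuracy_score`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and predictions for 5 test patients
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1])

manual = (y_true == y_pred).sum() / len(y_true)  # 3 correct out of 5
print(manual, accuracy_score(y_true, y_pred))   # both equal 0.6
```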

In [None]:
# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")


# F1 Score

Accuracy is only one of several metrics useful for measuring the performance of classification models.
It is reliable as long as your dataset has a roughly equal number of samples for each class.

For example, if 98% of the samples belong to Class A and 2% to Class B, a model that always predicts Class A reaches 98% accuracy without ever identifying a single Class B sample.

When the data is imbalanced, the F1 score comes to the rescue. It takes into account the type of errors (false positives and false negatives), not just the number of incorrect predictions.

F1 scores range from 0 to 1, with 1 representing a model that classifies every observation into the correct class and 0 a model that does not correctly identify a single positive observation.

The F1 score is useful in classification problems because it balances precision and recall: it is their harmonic mean.
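For reference, the standard definitions, where TP, FP and FN are the numbers of true positives, false positives and false negatives for the positive class:

$$
\text{precision} = \frac{TP}{TP + FP}, \qquad
\text{recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2\,\frac{\text{precision}\cdot\text{recall}}{\text{precision} + \text{recall}} = \frac{2\,TP}{2\,TP + FP + FN}
$$

True negatives do not appear in the formula, which is why a model that only ever predicts the negative class gets an F1 score of 0 however high its accuracy is.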

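In healthcare, a high F1 score indicates that the model finds most of the positive cases (high recall) while raising few false alarms (high precision), which helps minimize misdiagnosis and ensure patients receive proper treatment. In this notebook the positive class is 1 (in-care), which is the default `pos_label` of `f1_score`.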
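The 98%/2% example can be reproduced with scikit-learn, assuming the rare class (Class B) is labeled 1 and treated as the positive class. A "model" that always predicts the majority class scores high accuracy but an F1 score of zero:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Imbalanced toy labels: 98 samples of the majority class (0) and 2 of the rare class (1)
y_true = np.array([0] * 98 + [1] * 2)

# A "model" that ignores its input and always predicts the majority class
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)  # positive class is 1 (default pos_label)
print(f"Accuracy: {acc:.2f}")  # Accuracy: 0.98
print(f"F1 score: {f1:.2f}")   # F1 score: 0.00
```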

In [None]:

class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)


In [None]:
f1 = f1_score(y_test,y_pred)
print(" F1 score : {:.5f}".format(f1))

# Attempting to improve our model accuracy
In machine learning, including deep learning, feature selection is used to make models more accurate. It can increase the predictive power of an algorithm by keeping the most critical variables and eliminating redundant and irrelevant ones.

Three key benefits of feature selection are:


*   Decreases over-fitting: less redundant data means fewer chances of making decisions based on noise
*   Improves accuracy
*   Reduces training time



## Plotting the relationship between the target (Y) and the mean of each numerical feature

In [None]:
# Relationship between the target and the mean of each numerical feature
fig, axes = plt.subplots(5, 2, figsize=(14, 24))
axes = axes.ravel()  # flatten the 5x2 grid into a flat array of 10 axes
for i, c in enumerate(num_feat):
    means = patient_data.groupby("SOURCE")[c].mean()  # mean of feature c for each class
    means.plot(kind='bar', title=c, ax=axes[i], ylabel=f'Mean {c}')

## Pearson Correlation

The Pearson correlation coefficient (r) is the most common way of measuring a linear correlation. It is a number between –1 and 1 that measures the strength and direction of the relationship between two variables.
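Written out for a feature $x$ and the target $y$ over $n$ patients, with $\bar{x}$ and $\bar{y}$ their means, the coefficient is:

$$
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
$$

Because SOURCE only takes the values 0 and 1, this Pearson coefficient is also known as the point-biserial correlation. pandas' `DataFrame.corr()` uses `method='pearson'` by default.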

In [None]:
corr_matt = patient_data.corr()[['SOURCE']].sort_values(by='SOURCE',ascending=False)
plt.figure(figsize=(3,5))
corr = sns.heatmap(corr_matt, annot=True, cmap='BrBG', cbar=False)


## Selecting the features you want to remove

In [None]:
# Remove least correlated features

features_to_remove = ['MCHC','MCH','MCV']
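The list above was picked by reading the heatmap. A hypothetical helper, `least_correlated_features` (not part of pandas or scikit-learn), could apply the same "least correlated" criterion programmatically by ranking features by the absolute value of their Pearson correlation with the target. It relies on the `numeric_only` argument of `DataFrame.corr`, available in pandas 1.5 and later. The toy data is invented so the ranking is easy to check by hand.

```python
import pandas as pd

def least_correlated_features(df: pd.DataFrame, target: str, k: int) -> list:
    """Hypothetical helper: the k features with the smallest |Pearson r| with `target`."""
    r = df.corr(numeric_only=True)[target].drop(target)  # correlation of each feature with the target
    return r.abs().sort_values().index[:k].tolist()      # weakest |r| first

# Hypothetical toy data: 'a' tracks the target exactly, 'b' partially, 'c' not at all
toy = pd.DataFrame({
    "target": [0, 1, 0, 1],
    "a": [0, 2, 0, 2],
    "b": [0, 1, 0, 0],
    "c": [1, 1, -1, -1],
})
print(least_correlated_features(toy, "target", k=2))  # ['c', 'b']
```

On the real data it would be called as `least_correlated_features(patient_data, 'SOURCE', k=3)`; whether that reproduces MCHC, MCH and MCV is not verified here. Computing the correlations on the training split only would keep the test set from influencing the selection.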


In [None]:
x_train = x_train.drop(columns=features_to_remove)
x_test = x_test.drop(columns=features_to_remove)

# Keep the list of numerical features in sync with the remaining columns
num_feat = [f for f in num_feat if f not in features_to_remove]
# final train set
x_train.head()

## Checking whether feature selection increases our model accuracy

In [None]:
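# learning_rate, power_t, momentum and nesterovs_momentum only apply when solver='sgd', and
# validation_fraction only applies when early_stopping=True, so these settings have no effect here.
# warm_start=True only matters if fit() is called again on the same object; verbose=10 prints the loss per iteration.
mlp=MLPClassifier(hidden_layer_sizes=(50,50), activation='relu', solver='adam', alpha=0.0001,
                    learning_rate='adaptive', learning_rate_init=0.001, power_t=0.5, max_iter=400,
                    random_state=0, tol=0.0001, verbose=10, warm_start=True,
                    momentum=0.9, nesterovs_momentum=True, validation_fraction=0.2,
                    beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=100)
mlp.fit(x_train, y_train)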

# Make predictions on the test data
y_pred = mlp.predict(x_test)


In [None]:
# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

In [None]:
class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)

In [None]:
f1 = f1_score(y_test,y_pred)
print(" F1 score : {:.5f}".format(f1))

## Improving model performance (F1 score) by applying feature scaling


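Feature scaling in deep learning refers to the process of standardizing or normalizing the input features of a neural network. The goal is to put all features on a comparable scale, so that no feature dominates the learning process merely because its values are larger. Common techniques for feature scaling include:

*   **Standardization** -> rescales each feature to zero mean and unit variance
*   **Min-max normalization** -> rescales each feature to a fixed range, typically [0, 1]

This workshop uses standardization through scikit-learn's **StandardScaler**. Calling **.fit_transform()** on the training set computes the mean (μ) and standard deviation (σ) of each feature F and then transforms every value x of F into z = (x - μ) / σ. The test set is only passed to **.transform()**, so it is scaled with the μ and σ learned from the training set; fitting on the test set would leak information from it into the model.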
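A minimal check, on hypothetical one-feature data, that StandardScaler applies z = (x - μ) / σ using the statistics of the training set only. Note that it uses the population standard deviation (dividing by n):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical one-feature train and test sets
X_train = np.array([[1.0], [2.0], [3.0]])
X_test = np.array([[4.0]])

scaler = StandardScaler()
Z_train = scaler.fit_transform(X_train)  # learns mean_ and scale_ from the training data
Z_test = scaler.transform(X_test)        # reuses the training mean_ and scale_

mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)              # population std (ddof=0), as StandardScaler uses
print(scaler.mean_, scaler.scale_)
print(Z_train.ravel(), Z_test.ravel())
```

The test value 4.0 lies outside the training range, so its standardized value (about 2.45) is larger than any training value; that is expected and is not a reason to refit on the test set.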

In [None]:
scaler = StandardScaler()
x_train[num_feat] = scaler.fit_transform(x_train[num_feat])  # fit on the train set and transform it
x_test[num_feat] = scaler.transform(x_test[num_feat])        # transform the test set with the train statistics

In [None]:
x_train.head()

## Building our final model

To finish our workshop, we check whether we were able to improve our accuracy and model performance.
Let's start building our final MLP model.
In this workshop you applied the following techniques:

*   Feature engineering
*   Feature Selection
*   Feature Scaling







In [None]:
# Same settings as the previous model, except batch_size=1000 (the default 'auto' uses min(200, n_samples))
# and max_iter=200; shuffle=True is already the default.
mlp=MLPClassifier(hidden_layer_sizes=(50,50), activation='relu', solver='adam', alpha=0.0001, batch_size=1000,
                    learning_rate='adaptive', learning_rate_init=0.001, power_t=0.5, max_iter=200,
                    shuffle=True, random_state=0, tol=0.0001, verbose=10, warm_start=True,
                    momentum=0.9, nesterovs_momentum=True, validation_fraction=0.2,
                    beta_1=0.9, beta_2=0.999, epsilon=1e-08, n_iter_no_change=100)
mlp.fit(x_train, y_train)

# Make predictions on the test data
y_pred = mlp.predict(x_test)

In [None]:
# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

In [None]:
class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)

In [None]:
f1 = f1_score(y_test,y_pred)
print(" F1 score : {:.5f}".format(f1))

## Conclusion

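In our first model we achieved:


*   Accuracy score: **70% or 0.70**
*   Model performance (F1 score): **0.47632**

After applying feature selection and feature scaling:


*   Accuracy score: **73% or 0.73**
*   Model performance (F1 score): **0.65354**

Note that the first model also used a different architecture (hidden layers of 150, 100 and 50 units) from the later models (two hidden layers of 50 units), along with different training settings, and all scores come from a single train/test split. Part of the improvement may therefore come from these differences rather than from feature selection and feature scaling alone.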



