# Diabetes Detection Models

This [dataset](https://raw.githubusercontent.com/mansont/datasets-tests/main/diabetese.csv) contains patient data and each patient's diabetes status: "1" means the patient has diabetes, "0" means they do not.


Build the following models and compare their performance:
* A logistic regression model
* A single-layer perceptron model
* A multilayer perceptron
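
The comparison asked for above can be sketched as a single loop over the three estimators. This is a minimal illustration, not the notebook's own solution: it uses a synthetic dataset from `make_classification` as a stand-in for the diabetes data (768 rows and 8 features are assumptions chosen to mirror it), and the `scores` dictionary is a hypothetical helper structure.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data (assumption: 768 rows, 8 features).
X, y = make_classification(n_samples=768, n_features=8, random_state=40)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=40
)

# Scale with statistics learned from the training rows only.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

models = {
    "logistic regression": LogisticRegression(),
    "single-layer perceptron": Perceptron(random_state=40),
    "multilayer perceptron": MLPClassifier(random_state=40, max_iter=500),
}

# Map each model name to its (train accuracy, test accuracy) pair.
scores = {}
for name, model in models.items():
    model.fit(X_train_std, y_train)
    scores[name] = (
        accuracy_score(y_train, model.predict(X_train_std)),
        accuracy_score(y_test, model.predict(X_test_std)),
    )
    print(f"{name}: train={scores[name][0]:.2f}, test={scores[name][1]:.2f}")
```

Keeping the three models in one dictionary makes it easy to add or swap estimators without repeating the fit, predict and score cells.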

__IMPORTS__

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

__DATAFRAME DATA__

In [None]:
url_csv = "https://raw.githubusercontent.com/mansont/datasets-tests/main/diabetese.csv"

df = pd.read_csv(url_csv)

In [None]:
df.head()

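|   | pregnancies | glucose | diastolic | triceps | insulin | bmi | dpf | age | diabetes |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |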


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   pregnancies  768 non-null    int64  
 1   glucose      768 non-null    int64  
 2   diastolic    768 non-null    int64  
 3   triceps      768 non-null    int64  
 4   insulin      768 non-null    int64  
 5   bmi          768 non-null    float64
 6   dpf          768 non-null    float64
 7   age          768 non-null    int64  
 8   diabetes     768 non-null    int64  
dtypes: float64(2), int64(7)
memory usage: 54.1 KB


__MODELING__

__Split Data__

In [None]:
X = df[['pregnancies', 'glucose', 'diastolic', 'triceps', 'insulin', 'bmi', 'dpf', 'age']]
y = df['diabetes']

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=40, test_size=0.25)
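
The split above is purely random. When the two classes are not equally frequent, passing `stratify=y` to `train_test_split` keeps the class proportions nearly identical in both subsets. The sketch below illustrates this on hypothetical labels; the 35% positive rate and the random features are assumptions, not values taken from the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels (assumption: roughly 35% positives).
rng = np.random.default_rng(40)
y = (rng.random(768) < 0.35).astype(int)
X = rng.normal(size=(768, 8))

# stratify=y preserves the share of each class in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=40, stratify=y
)

print(f"positive rate overall: {y.mean():.3f}")
print(f"positive rate train:   {y_tr.mean():.3f}")
print(f"positive rate test:    {y_te.mean():.3f}")
```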

__Standardized data__

In [None]:
# Fit the scaler on the training set only, then apply it to both sets
sc = StandardScaler()
sc.fit(X_train)

X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)
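
An alternative to scaling by hand is to chain the scaler and the model in a scikit-learn `Pipeline`, which fits the scaler on whatever data is passed to `fit` and reuses those statistics at prediction time. The sketch below assumes a synthetic dataset in place of the diabetes data; `pipe` and `train_means` are illustrative names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data (assumption).
X, y = make_classification(n_samples=768, n_features=8, random_state=40)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=40
)

# The pipeline scales and then classifies in a single estimator.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)

# The scaler inside the pipeline was fitted on the training rows only,
# so the scaled training features have (numerically) zero mean.
scaler = pipe.named_steps["standardscaler"]
train_means = scaler.transform(X_train).mean(axis=0)

print(f"test accuracy: {test_accuracy:.2f}")
print("max |mean| of scaled training features:", np.abs(train_means).max())
```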

__Logistic Regression Model__

In [None]:
modelLoRe = LogisticRegression()
modelLoRe.fit(X_train_std, y_train)

In [None]:
y_train_pred_LoRe = modelLoRe.predict(X_train_std)
y_test_pred_LoRe = modelLoRe.predict(X_test_std)

In [None]:
accuracy_train_LoRe = accuracy_score(y_train, y_train_pred_LoRe)
accuracy_test_LoRe = accuracy_score(y_test, y_test_pred_LoRe)

print(f"Accuracy score of train set is {accuracy_train_LoRe:.2f}")
print(f"Accuracy score of test set is {accuracy_test_LoRe:.2f}")

Accuracy score of train set is 0.79
Accuracy score of test set is 0.72
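
An accuracy figure is easier to interpret next to a trivial baseline. One option is `DummyClassifier(strategy="most_frequent")`, which always predicts the majority class seen during training. The sketch below uses synthetic data with an assumed 65/35 class balance; the class balance of the real dataset is not shown in this notebook, so the numbers printed here say nothing about it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the diabetes data (assumption: 65/35).
X, y = make_classification(
    n_samples=768, n_features=8, weights=[0.65, 0.35], random_state=40
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=40
)

scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Baseline: always predict the most frequent training class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train_std, y_train)
baseline_acc = baseline.score(X_test_std, y_test)

# Model under evaluation.
model = LogisticRegression().fit(X_train_std, y_train)
model_acc = model.score(X_test_std, y_test)

print(f"baseline test accuracy: {baseline_acc:.2f}")
print(f"logistic regression test accuracy: {model_acc:.2f}")
```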


__Single Layer Perceptron__

In [None]:
ppn = Perceptron(random_state=40, max_iter=1000)
ppn.fit(X_train_std, y_train)

In [None]:
y_train_pred_ppn = ppn.predict(X_train_std)
y_test_pred_ppn = ppn.predict(X_test_std)

In [None]:
accuracy_train_ppn = accuracy_score(y_train, y_train_pred_ppn)
accuracy_test_ppn = accuracy_score(y_test, y_test_pred_ppn)

print(f"Accuracy score of train set is {accuracy_train_ppn:.2f}")
print(f"Accuracy score of test set is {accuracy_test_ppn:.2f}")

Accuracy score of train set is 0.69
Accuracy score of test set is 0.70
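
For intuition about what a single-layer perceptron learns, the classic update rule can be written in a few lines of NumPy: whenever a sample is misclassified, the weights move by `eta * (target - prediction) * x`. This is a simplified sketch, not scikit-learn's implementation (which, for example, shuffles the samples each epoch); `train_perceptron` is a hypothetical helper, and the logical AND toy data is an assumption chosen because it is linearly separable.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, epochs=20):
    """Train a perceptron with the classic error-driven update rule."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = int(xi @ w + b > 0)        # step activation
            update = eta * (target - pred)    # zero when the prediction is right
            if update != 0:
                w += update * xi
                b += update
                errors += 1
        if errors == 0:                       # a full pass without mistakes
            break
    return w, b

# Toy linearly separable problem: logical AND.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
preds = (X @ w + b > 0).astype(int)
print("weights:", w, "bias:", b)
print("predictions:", preds)
```

Because a perceptron can only draw a linear decision boundary, it may never reach zero training errors on data that is not linearly separable.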


__Multi Layer Perceptron__

In [None]:
mlp = MLPClassifier(random_state=40, activation="relu")
mlp.fit(X_train_std, y_train)



In [None]:
y_train_pred_mlp = mlp.predict(X_train_std)
y_test_pred_mlp = mlp.predict(X_test_std)

In [None]:
accuracy_train_mlp = accuracy_score(y_train, y_train_pred_mlp)
accuracy_test_mlp = accuracy_score(y_test, y_test_pred_mlp)

print(f"Accuracy score of train set is {accuracy_train_mlp:.2f}")
print(f"Accuracy score of test set is {accuracy_test_mlp:.2f}")

Accuracy score of train set is 0.82
Accuracy score of test set is 0.74


### Is there a notable difference in the MLP performance when a ReLU, Sigmoid or SoftMax activation function is used?


__MULTI LAYER PERCEPTRON__

__ReLU__

In [None]:
print(f"Accuracy score of train set is {accuracy_train_mlp}")
print(f"Accuracy score of test set is {accuracy_test_mlp}")

Accuracy score of train set is 0.8194444444444444
Accuracy score of test set is 0.7395833333333334


__Sigmoid__

In [None]:
mlp_sig = MLPClassifier(random_state=40, activation="logistic")
mlp_sig.fit(X_train_std, y_train)

In [None]:
y_train_pred_mlp_sig = mlp_sig.predict(X_train_std)
y_test_pred_mlp_sig = mlp_sig.predict(X_test_std)

In [None]:
accuracy_train_mlp_sig = accuracy_score(y_train, y_train_pred_mlp_sig)
accuracy_test_mlp_sig = accuracy_score(y_test, y_test_pred_mlp_sig)

print(f"Accuracy score of train set is {accuracy_train_mlp_sig}")
print(f"Accuracy score of test set is {accuracy_test_mlp_sig}")

Accuracy score of train set is 0.7899305555555556
Accuracy score of test set is 0.71875


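__SoftMax__

`MLPClassifier` does not offer softmax as a hidden-layer activation: its `activation` parameter accepts only `'identity'`, `'logistic'`, `'tanh'` and `'relu'`. The cell below therefore uses `'identity'` (a linear activation) as a stand-in, so it does not measure softmax itself.

In [None]:
mlp_soft = MLPClassifier(random_state=40, activation='identity', solver='adam')
mlp_soft.fit(X_train_std, y_train)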

In [None]:
y_train_pred_mlp_soft = mlp_soft.predict(X_train_std)
y_test_pred_mlp_soft = mlp_soft.predict(X_test_std)

In [None]:
accuracy_train_mlp_soft = accuracy_score(y_train, y_train_pred_mlp_soft)
accuracy_test_mlp_soft = accuracy_score(y_test, y_test_pred_mlp_soft)

print(f"Accuracy score of train set is {accuracy_train_mlp_soft}")
print(f"Accuracy score of test set is {accuracy_test_mlp_soft}")

Accuracy score of train set is 0.7916666666666666
Accuracy score of test set is 0.71875


__Comments__

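There is no notable difference between the activation functions tested: test accuracy is 0.74 with ReLU and 0.72 with both the logistic (sigmoid) activation and the `identity` activation used in the SoftMax cell.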
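
The activation comparison can also be run as a loop over every hidden-layer activation that `MLPClassifier` accepts. The sketch below uses synthetic data and a small hidden layer of 20 neurons (both assumptions, chosen to keep it fast). It also reads the fitted `out_activation_` attribute, which reports the output-layer activation scikit-learn selected: for a binary target this is `'logistic'`, while softmax is only used on the output layer for multiclass targets.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic binary stand-in for the diabetes data (assumption).
X, y = make_classification(n_samples=768, n_features=8, random_state=40)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=40
)
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Map each hidden activation to (test accuracy, output-layer activation).
results = {}
for activation in ("identity", "logistic", "tanh", "relu"):
    clf = MLPClassifier(
        activation=activation,
        hidden_layer_sizes=(20,),
        max_iter=300,
        random_state=40,
    )
    clf.fit(X_train_std, y_train)
    results[activation] = (clf.score(X_test_std, y_test), clf.out_activation_)
    print(f"{activation}: test={results[activation][0]:.2f}, "
          f"output activation={results[activation][1]}")
```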

### Does the network performance change when the density (number of neurons) of the hidden layers change?

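__Multi Layer Perceptron with 100 hidden neurons and ReLU__

This is the default `MLPClassifier` fitted above: `hidden_layer_sizes` defaults to `(100,)`, that is, one hidden layer of 100 neurons.

In [None]:
print(f"Accuracy score of train set is {accuracy_train_mlp}")
print(f"Accuracy score of test set is {accuracy_test_mlp}")

Accuracy score of train set is 0.9010416666666666
Accuracy score of test set is 0.7604166666666666

Note: these values match the 1000-neuron model below, which suggests this cell was run after `mlp` had been reassigned. The scores recorded earlier for the default 100-neuron model are 0.82 (train) and 0.74 (test).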


__Multi Layer Perceptron with 1000 hidden neurons and ReLU__

In [None]:
mlp = MLPClassifier(random_state=40, activation="relu", hidden_layer_sizes=(1000,))
mlp.fit(X_train_std, y_train)



In [None]:
y_train_pred_mlp = mlp.predict(X_train_std)
y_test_pred_mlp = mlp.predict(X_test_std)

In [None]:
accuracy_train_mlp = accuracy_score(y_train, y_train_pred_mlp)
accuracy_test_mlp = accuracy_score(y_test, y_test_pred_mlp)

print(f"Accuracy score of train set is {accuracy_train_mlp:.2f}")
print(f"Accuracy score of test set is {accuracy_test_mlp:.2f}")

Accuracy score of train set is 0.90
Accuracy score of test set is 0.76


__Comments__

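Increasing the hidden layer from 100 to 1000 neurons raises train accuracy from 0.82 to 0.90, but test accuracy only from 0.74 to 0.76. Most of the gain is on the training set, which suggests the larger network mainly overfits rather than generalizing better.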
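
A single train/test split gives a noisy picture of how hidden-layer width affects performance. One way to get a steadier estimate is to cross-validate each width, as sketched below. The data is synthetic, and the widths `(10, 100, 500)`, `cv=3` and `max_iter=300` are assumptions chosen to keep the run short, not settings from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetes data (assumption).
X, y = make_classification(n_samples=768, n_features=8, random_state=40)

# Map each hidden-layer width to its mean cross-validated accuracy.
cv_scores = {}
for n_neurons in (10, 100, 500):
    # Scaling lives inside the pipeline so each fold is scaled on its own
    # training portion only.
    pipe = make_pipeline(
        StandardScaler(),
        MLPClassifier(
            hidden_layer_sizes=(n_neurons,),
            activation="relu",
            max_iter=300,
            random_state=40,
        ),
    )
    fold_scores = cross_val_score(pipe, X, y, cv=3, scoring="accuracy")
    cv_scores[n_neurons] = fold_scores.mean()
    print(f"{n_neurons} neurons: mean CV accuracy = {cv_scores[n_neurons]:.2f}")
```

If the mean accuracy stays flat while the width grows, the extra capacity is not improving generalization.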