# Diabetes Detection Models

This [dataset](https://raw.githubusercontent.com/mansont/datasets-tests/main/diabetese.csv) contains patient data and each patient's diabetes status: "1" means the patient has diabetes, "0" means they do not.


Build the following models and compare their performance:
* A logistic regression model
* A single-layer perceptron model
* A multilayer perceptron

In [7]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, classification_report

1. Data loading

In [8]:
url = "https://raw.githubusercontent.com/mansont/datasets-tests/main/diabetese.csv"
df = pd.read_csv(url)
print(df.head())

   pregnancies  glucose  diastolic  triceps  insulin   bmi    dpf  age  \
0            6      148         72       35        0  33.6  0.627   50   
1            1       85         66       29        0  26.6  0.351   31   
2            8      183         64        0        0  23.3  0.672   32   
3            1       89         66       23       94  28.1  0.167   21   
4            0      137         40       35      168  43.1  2.288   33   

   diabetes  
0         1  
1         0  
2         1  
3         0  
4         1  


2. Preprocessing

In [9]:
y = df['diabetes']
X = df.drop('diabetes', axis=1)

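# Standardise every feature to zero mean and unit variance.
# Note: the scaler is fitted on the full dataset, before the split below.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Train/test split (80% train, 20% test)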
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
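
Fitting `StandardScaler` before the split, as in the cell above, lets the test rows contribute to the scaling mean and variance. A minimal sketch of the usual alternative follows: split first, then let a pipeline fit the scaler on the training rows only. The synthetic data from `make_classification` and the names `X_demo`, `y_demo`, `pipe` and `demo_accuracy` are assumptions for illustration, not part of the notebook; with the real data, the unscaled `X` and `y` would take their place.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient table (8 features, binary target).
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=42)

# Split first, so the test rows never influence the scaling statistics.
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# The pipeline fits StandardScaler on the training rows only and reuses
# those statistics when it transforms the test rows.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(Xd_train, yd_train)

demo_accuracy = pipe.score(Xd_test, yd_test)
print(f"Pipeline accuracy on synthetic data: {demo_accuracy:.2f}")
```

On a dataset of this size the difference is often small, but the pipeline keeps the test set strictly unseen during fitting.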

3. Logistic regression model

In [18]:
log_model = LogisticRegression()
log_model.fit(X_train, y_train)
y_pred_log = log_model.predict(X_test)
print("Logistic regression:")
print(classification_report(y_test, y_pred_log))
print('Logistic regression accuracy: %.2f' % accuracy_score(y_test, y_pred_log))

Logistic regression:
              precision    recall  f1-score   support

           0       0.81      0.80      0.81        99
           1       0.65      0.67      0.66        55

    accuracy                           0.75       154
   macro avg       0.73      0.74      0.73       154
weighted avg       0.76      0.75      0.75       154

Logistic regression accuracy: 0.75


4. A single-layer perceptron model

In [17]:
perceptron = Perceptron()
perceptron.fit(X_train, y_train)
y_pred_perc = perceptron.predict(X_test)
print("Perceptron:")
print(classification_report(y_test, y_pred_perc))
print('Perceptron accuracy: %.2f' % accuracy_score(y_test, y_pred_perc))

Perceptron:
              precision    recall  f1-score   support

           0       0.66      0.95      0.78        99
           1       0.58      0.13      0.21        55

    accuracy                           0.66       154
   macro avg       0.62      0.54      0.49       154
weighted avg       0.63      0.66      0.58       154

Perceptron accuracy: 0.66


5. A multilayer perceptron

In [15]:
mlp_relu = MLPClassifier(random_state=0)
mlp_relu.fit(X_train, y_train)
y_pred_mlp_relu = mlp_relu.predict(X_test)
print("MLP (ReLU):")
print(classification_report(y_test, y_pred_mlp_relu))
print('MLP accuracy: %.2f' % accuracy_score(y_test, y_pred_mlp_relu))


MLP (ReLU):
              precision    recall  f1-score   support

           0       0.80      0.79      0.80        99
           1       0.63      0.65      0.64        55

    accuracy                           0.74       154
   macro avg       0.72      0.72      0.72       154
weighted avg       0.74      0.74      0.74       154

MLP accuracy: 0.74
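
The three accuracies reported so far (0.75, 0.66 and 0.74) each come from a single 154-row test split, so gaps of a point or two may be noise. One way to make the comparison more robust is k-fold cross-validation, sketched below. The synthetic data, the 5-fold setting, `max_iter=500` and the names `candidates` and `cv_means` are assumptions chosen for illustration; the notebook itself uses the defaults and a single split.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient table (8 features, binary target).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "logistic regression": LogisticRegression(),
    "perceptron": Perceptron(random_state=0),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
}

cv_means = {}
for name, estimator in candidates.items():
    # Scaling lives inside the pipeline, so it is refitted on each training fold.
    pipe = make_pipeline(StandardScaler(), estimator)
    scores = cross_val_score(pipe, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The spread across folds gives a rough sense of how large a difference between models has to be before it is worth interpreting.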




### Is there a notable difference in the MLP performance when a ReLU, Sigmoid or SoftMax activation function is used?


In [19]:
# Note: `activation` is never passed to MLPClassifier below, so every
# iteration trains the same default (ReLU) network.
for activation in ['relu', 'logistic', 'softmax']:
    try:
        model = MLPClassifier(random_state=0)
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        acc = accuracy_score(y_test, y_pred)
        print(f"Activation: {activation} - Accuracy: {acc:.4f}")
    except Exception as e:
        print(f"Error with {activation}: {e}")

Activation: relu - Accuracy: 0.7403
Activation: logistic - Accuracy: 0.7403
Activation: softmax - Accuracy: 0.7403
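
The loop above builds `MLPClassifier(random_state=0)` without an `activation` argument, so all three runs train the same default ReLU network and the identical accuracies do not compare anything. A minimal sketch of a sweep that does pass the setting is shown below. It runs on synthetic data, and `max_iter=500`, `demo_scaler` and `activation_scores` are assumptions for illustration. Note also that scikit-learn's `MLPClassifier` accepts only `'identity'`, `'logistic'`, `'tanh'` and `'relu'` as hidden-layer activations; `'softmax'` is not a valid value, so `'tanh'` is swapped in here.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient table (8 features, binary target).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both sets.
demo_scaler = StandardScaler().fit(Xd_train)
Xd_train_s = demo_scaler.transform(Xd_train)
Xd_test_s = demo_scaler.transform(Xd_test)

activation_scores = {}
for activation in ["relu", "logistic", "tanh"]:
    # The loop variable is actually handed to the estimator here.
    model = MLPClassifier(activation=activation, max_iter=500, random_state=0)
    model.fit(Xd_train_s, yd_train)
    acc = accuracy_score(yd_test, model.predict(Xd_test_s))
    activation_scores[activation] = acc
    print(f"Activation: {activation} - Accuracy: {acc:.4f}")
```

For a binary target the output unit uses a logistic function regardless of the hidden activation, which can be checked through the fitted model's `out_activation_` attribute.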




### Does the network performance change when the density (number of neurons) of the hidden layers changes?

In [20]:
# Note: `size` is never passed to MLPClassifier below, so every iteration
# trains the same default network (one hidden layer of 100 neurons).
for size in [(10,), (50,), (100,), (100, 50), (100, 100)]:
    model = MLPClassifier(random_state=0)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    acc = accuracy_score(y_test, y_pred)
    print(f"Hidden layers: {size} - Accuracy: {acc:.4f}")

Hidden layers: (10,) - Accuracy: 0.7403
Hidden layers: (50,) - Accuracy: 0.7403
Hidden layers: (100,) - Accuracy: 0.7403
Hidden layers: (100, 50) - Accuracy: 0.7403
Hidden layers: (100, 100) - Accuracy: 0.7403
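
As with the activation loop, the cell above never passes `size` to the estimator, so each of the five runs trains the default network with one hidden layer of 100 neurons. The sketch below shows a sweep that sets `hidden_layer_sizes` explicitly. The synthetic data, `max_iter=500` and the names `size_scores` and `layer_counts` are assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient table (8 features, binary target).
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Fit the scaler on the training rows only, then apply it to both sets.
demo_scaler = StandardScaler().fit(Xd_train)
Xd_train_s = demo_scaler.transform(Xd_train)
Xd_test_s = demo_scaler.transform(Xd_test)

size_scores = {}
layer_counts = {}
for size in [(10,), (50,), (100,), (100, 50), (100, 100)]:
    # Each tuple entry is the number of neurons in one hidden layer.
    model = MLPClassifier(hidden_layer_sizes=size, max_iter=500, random_state=0)
    model.fit(Xd_train_s, yd_train)
    acc = accuracy_score(yd_test, model.predict(Xd_test_s))
    size_scores[size] = acc
    # One weight matrix per connection between consecutive layers.
    layer_counts[size] = len(model.coefs_)
    print(f"Hidden layers: {size} - Accuracy: {acc:.4f}")
```

Checking `len(model.coefs_)` after fitting is a quick way to confirm that the requested architecture was actually built.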




### Conclusion
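- Logistic regression provides a simple, fast baseline and reaches the highest test accuracy here (0.75).
- The single-layer perceptron (accuracy 0.66, recall 0.13 on the positive class) performs worse than the MLP (accuracy 0.74).
- The activation and hidden-layer experiments all report 0.7403 because neither loop passes its setting to `MLPClassifier`; every run trains the same default model, so these results do not show whether the activation function or the number of neurons affects performance.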
