# Comparing Models: Logistic vs 1-Layer vs 3-Layer Neural Networks
**Dataset**: Wine Quality (UCI)  
**Dataset Details**:  
- **Source**: UCI Machine Learning Repository (Red Wine Variants)  
- **Samples**: 1,599 red wines with 11 physicochemical features  
- **Features**: Fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol  
- **Target**: Binary quality classification (0=low quality ≤5, 1=high quality >5)  
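The target rule above is a simple threshold on the integer quality score. The minimal sketch below applies the same `> 5` rule through a hypothetical helper, `binarize_quality`. It also computes the majority-class baseline: the accuracy obtained by always predicting the more frequent label, which is a useful floor when reading the accuracies reported later. The scores are illustrative toy values, not rows from the dataset.

```python
from collections import Counter

def binarize_quality(scores, threshold=5):
    """Map integer quality scores to 1 (high, > threshold) or 0 (low, <= threshold)."""
    return [1 if s > threshold else 0 for s in scores]

def majority_baseline(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Illustrative toy scores (NOT taken from the Wine Quality data)
toy_scores = [5, 6, 7, 5, 4, 6, 6, 3]
toy_labels = binarize_quality(toy_scores)
print(toy_labels)                     # [0, 1, 1, 0, 0, 1, 1, 0]
print(Counter(toy_labels))            # class distribution of the toy labels
print(majority_baseline(toy_labels))  # 0.5
```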

**Challenge**:  
Physicochemical properties can interact nonlinearly in determining quality. Logistic regression models the log-odds of the positive class as a linear function of the features, so its decision boundary is a hyperplane. Neural networks with hidden layers can capture feature interactions through nonlinear activation functions such as ReLU.
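The classic XOR pattern makes the linear-versus-nonlinear contrast concrete, because its label depends on an interaction between two binary inputs. In the sketch below, the weights are hand-picked for illustration: they are not learned, and the data is not the wine data. Two ReLU hidden units reproduce XOR exactly. A brute-force search over a small grid of linear rules `w1*x1 + w2*x2 + b > 0` finds none that classifies all four points. The grid search only illustrates the point; the general fact that XOR is not linearly separable is a standard result.

```python
import itertools

XOR_POINTS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def relu(z):
    return max(0.0, z)

def relu_net(x1, x2):
    """Two ReLU hidden units with hand-picked weights that compute XOR."""
    h1 = relu(x1 + x2)        # how many inputs are on
    h2 = relu(x1 + x2 - 1)    # fires only when both inputs are on
    return h1 - 2 * h2        # gives 0, 1, 1, 0 on the four XOR inputs

def linear_rule_fits(w1, w2, b):
    """True if the linear threshold rule classifies every XOR point correctly."""
    return all((w1 * x1 + w2 * x2 + b > 0) == bool(label)
               for (x1, x2), label in XOR_POINTS)

# Hidden-layer network: exact on all four points
relu_correct = all(relu_net(x1, x2) == label for (x1, x2), label in XOR_POINTS)

# Linear rules: exhaustive search over a small grid of weights and biases
grid = [i / 2 for i in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
linear_hits = [p for p in itertools.product(grid, repeat=3) if linear_rule_fits(*p)]

print("ReLU network solves XOR:", relu_correct)
print("Linear rules on the grid that solve XOR:", len(linear_hits))
```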

<h1 style="color:red;">Instructions</h1>

- Work through the notebook cell by cell.
- Look for the **<a style="color:red;">Execute</a>** blocks: code for the <a style="color:green;">green</a> tasks is already written, and you are expected to write code for the remaining tasks.
- After completing all the coding tasks, attempt the **<a style="color:red;">Compute</a>** challenge at the end.


In [None]:
# Imports
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
from sklearn.exceptions import ConvergenceWarning

# Suppress only ConvergenceWarnings
warnings.filterwarnings("ignore", category=ConvergenceWarning)

## 1️⃣ Load Wine Quality Dataset
<h3 style="color:red;">Execute:</h3>

- <a style="color:green;">Load dataset from UCI URL</a>
- <a style="color:green;">Inspect class distribution</a>
- <a style="color:green;">Separate features (X) and labels (y)</a>
- Using `df.head()`, check the column names and first few rows of the data
- Print the shape of X and y and compare with the details provided in the above data description
- Print the correlation matrix (you can use the code from the previous tutorial)
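By default, `df.corr()` computes pairwise Pearson correlations. To show what each entry means, the sketch below computes a Pearson correlation matrix from scratch with NumPy. It centers each column, divides the covariance by the product of standard deviations, and checks the result against `np.corrcoef`. The data is synthetic (fixed seed), and the helper name `pearson_corr_matrix` is hypothetical.

```python
import numpy as np

def pearson_corr_matrix(X):
    """Pearson correlation between the columns of a 2-D array (samples x features)."""
    Xc = X - X.mean(axis=0)               # center each feature
    cov = Xc.T @ Xc / (X.shape[0] - 1)    # sample covariance matrix
    std = np.sqrt(np.diag(cov))           # per-feature standard deviation
    return cov / np.outer(std, std)       # normalize entries to [-1, 1]

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2.0 * a + rng.normal(scale=0.5, size=200)  # strongly correlated with a
c = rng.normal(size=200)                       # independent of a and b
X_demo = np.column_stack([a, b, c])

corr = pearson_corr_matrix(X_demo)
print(np.round(corr, 2))
print(np.allclose(corr, np.corrcoef(X_demo, rowvar=False)))
```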

In [None]:
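url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
df = pd.read_csv(url, delimiter=';')
X = df.drop('quality', axis=1).values
y = df['quality'].apply(lambda x: 1 if x > 5 else 0).values  # Binary classification

# Inspect class distribution (0 = low quality, 1 = high quality)
print(pd.Series(y).value_counts().sort_index())

# Print head of the dataframe
print(df.head())

# print shapes of X and y (expected: 1,599 samples and 11 features)
print("X shape:", X.shape)
print("y shape:", y.shape)

# print corr matrix (Pearson correlation between all columns, including quality)
corr = df.corr()
print(corr.round(2))
plt.figure(figsize=(10, 8))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm")
plt.title("Correlation matrix: red wine features and quality")
plt.show()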

## 2️⃣ Train/Test Split & Normalization
<h3 style="color:red;">Execute:</h3>

- Split data into 70% train / 30% test
- <a style="color:green;">Apply standardization</a>
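`StandardScaler` rescales each feature to z = (x - mean) / std. The mean and standard deviation are learned from the training split only and then reused on the test split, so no test-set statistics leak into training. The NumPy sketch below mirrors that flow on synthetic data with a manual shuffled 70/30 split. It illustrates the idea under those assumptions and does not replace `train_test_split` or `StandardScaler`. Like `StandardScaler`, it uses the population standard deviation (`ddof=0`).

```python
import numpy as np

rng = np.random.default_rng(42)
# Synthetic data: two features on very different scales
X_demo = rng.normal(loc=[5.0, 50.0], scale=[1.0, 10.0], size=(100, 2))

# Manual shuffled 70/30 split (train_test_split also shuffles by default)
idx = rng.permutation(len(X_demo))
n_train = (7 * len(X_demo)) // 10
X_tr, X_te = X_demo[idx[:n_train]], X_demo[idx[n_train:]]

# Fit: statistics come from the training split only
mu = X_tr.mean(axis=0)
sigma = X_tr.std(axis=0)  # population std (ddof=0)

# Transform: apply the same training statistics to both splits
X_tr_std = (X_tr - mu) / sigma
X_te_std = (X_te - mu) / sigma

print(X_tr_std.mean(axis=0).round(6), X_tr_std.std(axis=0).round(6))
print(X_te_std.mean(axis=0).round(3))  # close to 0, but not exactly 0
```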

In [None]:
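# Split the data into 70% train / 30% test (stratify keeps the class ratio similar in both splits)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# apply standardization (statistics are fitted on the training split only)
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)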

## 3️⃣ Model Definitions, Training & Evaluation
<h3 style="color:red;">Execute:</h3>

- <a style="color:green;">Initialize Logistic Regression</a>
- <a style="color:green;">Initialize MLP with 1 hidden layer (32 neurons)</a>
- <a style="color:green;">Initialize MLP with 3 hidden layers (128, 64, 32 neurons)</a>
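For binary targets, an `MLPClassifier` passes its inputs through ReLU hidden layers and then a single output unit with a logistic (sigmoid) activation. Logistic regression is the special case with no hidden layer. The NumPy sketch below writes out that forward pass with randomly initialized weights, which makes the weight-matrix shapes for 11 input features visible. The helper names `init_params` and `forward` are hypothetical, and the Gaussian initialization is a simplification, not scikit-learn's own scheme.

```python
import numpy as np

def init_params(layer_sizes, rng):
    """One (weight matrix, bias vector) pair per layer; simplified random init."""
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(X, params):
    """ReLU on hidden layers, sigmoid on the single output unit."""
    h = X
    for W, b in params[:-1]:
        h = np.maximum(0.0, h @ W + b)        # hidden layer with ReLU
    W_out, b_out = params[-1]
    z = h @ W_out + b_out                     # output logit, shape (n, 1)
    return (1.0 / (1.0 + np.exp(-z))).ravel() # probability of class 1

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 11))  # 5 synthetic samples, 11 features

architectures = {
    "logistic regression": [11, 1],
    "1-layer MLP": [11, 32, 1],
    "3-layer MLP": [11, 128, 64, 32, 1],
}
for name, sizes in architectures.items():
    params = init_params(sizes, rng)
    probs = forward(X_demo, params)
    shapes = [W.shape for W, _ in params]
    print(f"{name}: weight shapes {shapes}, output shape {probs.shape}")
```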

In [None]:
# Logistic regression model
log_reg = LogisticRegression(tol=1e-4, 
                             max_iter=10000, 
                             random_state=42)
# 1-layer NN classifier
mlp_1layer = MLPClassifier(hidden_layer_sizes=(32,), 
                           solver='adam', 
                           activation='relu',
                           tol=1e-4, 
                           max_iter=10000, 
                           random_state=42)
# 3-layer NN classifier
mlp_3layer = MLPClassifier(hidden_layer_sizes=(128, 64, 32), 
                           solver='adam', 
                           activation='relu',
                           tol=1e-4, 
                           max_iter=10000, 
                           random_state=42)

<h3 style="color:red;">Execute:</h3>

- <a style="color:green;">Fit models on training data</a>
- For each model, get predictions
- Compute the test accuracy of each model from the true and predicted labels
- Print the accuracies
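Test accuracy is the fraction of test samples whose predicted label equals the true label, which is what `accuracy_score(y_true, y_pred)` returns. Because `confusion_matrix` is already imported, it can also show which kind of error each model makes. The pure-Python sketch below computes both with hypothetical helpers on toy labels. It uses the same layout as scikit-learn's `confusion_matrix` for labels [0, 1]: rows are true classes and columns are predicted classes.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_confusion_matrix(y_true, y_pred):
    """2x2 matrix [[TN, FP], [FN, TP]]: rows = true label, columns = predicted label."""
    cm = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

# Toy labels for illustration only
toy_true = [0, 0, 1, 1, 1, 0]
toy_pred = [0, 1, 1, 1, 0, 0]
print(accuracy(toy_true, toy_pred))                 # 0.6666666666666666
print(binary_confusion_matrix(toy_true, toy_pred))  # [[2, 1], [1, 2]]
```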

In [None]:
log_reg.fit(X_train, y_train)
mlp_1layer.fit(X_train, y_train)
mlp_3layer.fit(X_train, y_train)

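# get predictions
y_pred_log = log_reg.predict(X_test)
y_pred_mlp1 = mlp_1layer.predict(X_test)
y_pred_mlp3 = mlp_3layer.predict(X_test)

# Compute accuracy for each model
acc_log = accuracy_score(y_test, y_pred_log)
acc_mlp1 = accuracy_score(y_test, y_pred_mlp1)
acc_mlp3 = accuracy_score(y_test, y_pred_mlp3)

# print the accuracies
print(f"Logistic Regression Test Accuracy: {acc_log:.4f}")
print(f"1-Layer MLP Test Accuracy: {acc_mlp1:.4f}")
print(f"3-Layer MLP Test Accuracy: {acc_mlp3:.4f}")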


## 4️⃣ Mean Performance Over 10 Experiments
<h3 style="color:red;">Execute:</h3>

- Combine the steps above to compute the average test accuracy over 10 experiments, each using a different random seed
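Averaging over seeds smooths out the variation that comes from a single random split and a single weight initialization. Reporting the spread alongside the mean shows whether a gap between two models is larger than that run-to-run variation. The minimal sketch below assumes the per-seed accuracies are collected in lists such as `acc_log_list`; `summarize` is a hypothetical helper, and the toy values are placeholders, not experimental results.

```python
import statistics

def summarize(name, accs):
    """Format the mean and sample standard deviation of a list of accuracies."""
    mean = statistics.mean(accs)
    std = statistics.stdev(accs) if len(accs) > 1 else 0.0
    return f"{name}: {mean:.4f} ± {std:.4f} over {len(accs)} runs"

# Placeholder values for illustration only (not experimental results)
print(summarize("Toy model", [0.70, 0.80, 0.90]))
```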

In [None]:
acc_log_list = []
acc_mlp1_list = []
acc_mlp3_list = []

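for seed in range(10):

    # Split the data (a different split for each seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y)

    # Scale the data (fit on the training split only)
    scaler = StandardScaler().fit(X_train)
    X_train = scaler.transform(X_train)
    X_test = scaler.transform(X_test)

    # Logistic regression model
    log_reg = LogisticRegression(tol=1e-4, max_iter=10000, random_state=seed)

    # 1-layer NN classifier
    mlp_1layer = MLPClassifier(hidden_layer_sizes=(32,), solver='adam', activation='relu',
                               tol=1e-4, max_iter=10000, random_state=seed)

    # 3-layer NN classifier
    mlp_3layer = MLPClassifier(hidden_layer_sizes=(128, 64, 32), solver='adam', activation='relu',
                               tol=1e-4, max_iter=10000, random_state=seed)

    # fit the models
    log_reg.fit(X_train, y_train)
    mlp_1layer.fit(X_train, y_train)
    mlp_3layer.fit(X_train, y_train)

    # get predictions
    y_pred_log = log_reg.predict(X_test)
    y_pred_mlp1 = mlp_1layer.predict(X_test)
    y_pred_mlp3 = mlp_3layer.predict(X_test)

    # Compute accuracy for each model
    acc_log = accuracy_score(y_test, y_pred_log)
    acc_mlp1 = accuracy_score(y_test, y_pred_mlp1)
    acc_mlp3 = accuracy_score(y_test, y_pred_mlp3)

    # print the accuracies for each experiment
    print(f"Seed {seed}: LogReg={acc_log:.4f}, MLP-1={acc_mlp1:.4f}, MLP-3={acc_mlp3:.4f}")

    # collect the accuracies
    acc_log_list.append(acc_log)
    acc_mlp1_list.append(acc_mlp1)
    acc_mlp3_list.append(acc_mlp3)
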
# print mean accuracies 
print("\n-------- Final mean results ---------")
print("Logistic Regression Mean Test Accuracy: ", np.mean(acc_log_list))
print("1-Layer MLP Mean Test Accuracy: ", np.mean(acc_mlp1_list))
print("3-Layer MLP Mean Test Accuracy: ", np.mean(acc_mlp3_list))

## 5️⃣ Parameter Count Challenge
<h3 style="color:red;">Compute the number of parameters:</h3>

Compute the number of learnable parameters (weights and biases) in each of the three models above. **Note** that for binary classification, `MLPClassifier` uses a single output unit with a logistic (`sigmoid`) activation rather than two `softmax` outputs.
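A fully connected layer with $n_{\text{in}}$ inputs and $n_{\text{out}}$ units has an $n_{\text{in}} \times n_{\text{out}}$ weight matrix plus $n_{\text{out}}$ biases. A network with layer sizes $n_0, n_1, \dots, n_L$ therefore has $P = \sum_{l=1}^{L} (n_{l-1} + 1)\, n_l$ learnable parameters. The sketch below applies that formula through a hypothetical helper, `count_params`, to the three architectures defined earlier, assuming 11 input features and one sigmoid output unit. For fitted scikit-learn models, the same totals can be cross-checked from the array shapes in `coefs_`/`intercepts_` (MLP) or `coef_`/`intercept_` (logistic regression).

```python
def count_params(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

n_features = 11
models = {
    "Logistic regression": [n_features, 1],
    "1-layer MLP (32)": [n_features, 32, 1],
    "3-layer MLP (128, 64, 32)": [n_features, 128, 64, 32, 1],
}
for name, sizes in models.items():
    print(f"{name}: {count_params(sizes)} parameters")
```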

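- *Logistic Regression:*  
  11 weights + 1 bias = **12** parameters.

- *1-Layer MLP above (32 hidden units):*  
  Input → hidden: 11 × 32 + 32 = 384; hidden → output: 32 × 1 + 1 = 33; total = **417** parameters.

- *3-Layer MLP above (128, 64, 32 hidden units):*  
  11 × 128 + 128 = 1,536; 128 × 64 + 64 = 8,256; 64 × 32 + 32 = 2,080; 32 × 1 + 1 = 33; total = **11,905** parameters.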