# Project 5: Ensemble Models on Wine Quality

**Name:** Saratchandra Golla    
**Date:** November 15, 2025

**Introduction:**   
This project uses ensemble machine learning models to classify the quality of red wine from the physicochemical properties in the UCI Wine Quality Dataset. Ensemble methods combine multiple models and often improve predictive performance by reducing overfitting and improving generalization. The goal is to compare the selected ensemble models and determine which performs best on this multi-class classification problem.
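
To make the claim about combining models concrete, here is a minimal sketch of why a majority vote can beat its individual members. It rests on a strong simplifying assumption that does not hold exactly for real classifiers: the models are binary, equally accurate, and make independent errors. The function name `majority_vote_accuracy` is hypothetical and used only for illustration.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent binary models is correct.

    Assumes every model is correct with the same probability p_correct and that
    errors are independent (an idealization, not a property of real ensembles).
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(
        comb(n_models, k) * p_correct**k * (1 - p_correct) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


# Compare a single model with ensembles of 5 and 25 models, each 60% accurate.
for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

In practice the errors of ensemble members are correlated, so the gain is smaller than this idealized calculation suggests, which is why methods such as bagging and random forests deliberately inject diversity among their members.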

## Imports

We import the libraries needed for data loading, model building (ensemble methods and their base estimators), train/test splitting, and evaluation.

In [8]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Ensemble Models
from sklearn.ensemble import (
    RandomForestClassifier,
    AdaBoostClassifier,
    GradientBoostingClassifier,
    BaggingClassifier,
    VotingClassifier,
)
# Base Estimators for Voting and Bagging
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Utilities and Metrics
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    confusion_matrix,
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
)

# Set random seed for reproducibility
RANDOM_STATE = 42
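
As a small illustration of what the fixed seed buys us, the sketch below splits a toy array twice with the same `random_state` and checks that both splits are identical. The toy data (`X`, `y`) and the `test_size` value are assumptions made for this example only; they are not taken from the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

RANDOM_STATE = 42

# Toy data for illustration only: 10 samples, 2 features, 2 balanced classes.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Splitting twice with the same seed should give exactly the same partitions.
split_a = train_test_split(X, y, test_size=0.2, random_state=RANDOM_STATE)
split_b = train_test_split(X, y, test_size=0.2, random_state=RANDOM_STATE)

same = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print("Identical splits:", same)
```

Passing the same constant to every estimator and splitter that accepts `random_state` keeps results comparable between runs.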

## Section 1. Load and Inspect the Data
We load the Red Wine Quality Dataset. The original dataset contains 11 physicochemical features and a quality target variable. The CSV file is semicolon-delimited, so we pass a semicolon (;) as the separator when reading it.
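
The effect of the separator can be seen on a tiny in-memory sample. This is a sketch only: the two rows and the reduced set of three columns are made up for illustration and merely mimic the file's semicolon-delimited layout.

```python
import io

import pandas as pd

# Hypothetical two-row sample in a semicolon-delimited layout (not the real file).
sample = (
    "fixed acidity;volatile acidity;quality\n"
    "7.4;0.7;5\n"
    "7.8;0.88;5\n"
)

wrong = pd.read_csv(io.StringIO(sample))           # default sep="," finds no delimiter
right = pd.read_csv(io.StringIO(sample), sep=";")  # matches the file's delimiter

print(wrong.shape)  # (2, 1): each row is read as one string column
print(right.shape)  # (2, 3): columns are split correctly
print(list(right.columns))
```

With the default comma separator the read does not raise an error; it silently yields a single column, so checking the shape or column names after loading is a cheap safeguard.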

In [9]:
# Load the wine quality dataset
try:
    df = pd.read_csv("winequality-red.csv", sep=";")
except FileNotFoundError:
    print("Error: 'winequality-red.csv' not found. Please ensure the file is in the same directory.")
    df = None # Handle case where file is missing

if df is not None:
    print("--- Wine Quality Dataset Info ---")
    df.info()
    print("\n--- Wine Quality Dataset Head ---")
    print(df.head())
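    # Report the shape from the DataFrame itself rather than hardcoding the column count
    print(f"\nDataset loaded with {df.shape[0]} samples and {df.shape[1]} columns ({df.shape[1] - 1} features + quality).")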

--- Wine Quality Dataset Info ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB

--- Wine Quality Dataset Head ---
   fixed acidity  volatile acidity  citric acid  residual