# Ensemble Learning for Wine Classification

In this notebook, we apply **ensemble learning** to classify the wine dataset. We use a **Voting Classifier** that combines three different models: **RandomForestClassifier**, **GradientBoostingClassifier**, and **LogisticRegression**. Because these models tend to make different kinds of errors, combining their predictions can give results that are more robust than those of any single model.

In [1]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

### Step 1: Load the Wine Dataset
First, we load the wine dataset, which contains measured features for each wine along with its class label (the wine type). We assume the dataset is stored in `Datasets/wine_data.csv`.

In [2]:
# Load the Wine dataset (assumed to be a CSV file named "wine_data.csv" in the "Datasets" folder)
wine_data = pd.read_csv('Datasets/wine_data.csv')

# Check the first few rows of the data
wine_data.head()
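If the CSV file is not available, a similar table can be built from scikit-learn's bundled wine data. This is a minimal sketch that assumes `Datasets/wine_data.csv` mirrors that dataset, which the notebook does not confirm; the name `wine_df` and the renaming of `target` to `Class` are illustrative choices made to match the column name used in this notebook.

```python
from sklearn.datasets import load_wine

# Assumption: the CSV used in this notebook resembles scikit-learn's built-in wine data.
wine = load_wine(as_frame=True)

# The bundled frame calls the label column 'target'; rename it to 'Class'
# so it lines up with the column name assumed in this notebook.
wine_df = wine.frame.rename(columns={"target": "Class"})

print(wine_df.shape)                                 # rows and columns
print(wine_df["Class"].value_counts().sort_index())  # samples per wine type
```

Checking the class counts up front is useful because an imbalanced target affects how the train/test split should be drawn.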

### Step 2: Data Preprocessing
Next, we will separate the features (X) and target variable (y), and then split the dataset into training and test sets. The target variable is assumed to be the `Class` column, which represents the wine type.

In [3]:
# Separate features (X) and target variable (y)
X = wine_data.drop(columns=['Class'])  # Features
y = wine_data['Class']  # Target variable (wine types)

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
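With a small multi-class dataset, a purely random split can leave the test set with class proportions that differ from the full data. As an optional variation (not what the cell above does), the sketch below passes `stratify` to `train_test_split` so both splits keep roughly the original class mix. It uses scikit-learn's bundled wine data as a stand-in for the CSV, and the `_demo` variable names are hypothetical.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

# Stand-in data: scikit-learn's bundled wine dataset (an assumption, not the notebook's CSV).
X_demo, y_demo = load_wine(return_X_y=True, as_frame=True)

# stratify=y_demo keeps the class proportions similar in the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Compare class proportions in the full data and in the test split.
print(y_demo.value_counts(normalize=True).sort_index().round(2))
print(y_te.value_counts(normalize=True).sort_index().round(2))
```

Switching to a stratified split changes which rows land in the test set, so the accuracy reported later in this notebook would not carry over unchanged.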

### Step 3: Standardize the Features
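We will standardize the features so that each one has a mean of 0 and a standard deviation of 1 on the training set. The scaler is fitted on the training data only and then applied to the test data, so no information from the test set leaks into training. Scaling mainly benefits **Logistic Regression**, whose optimization and regularization are sensitive to feature scale. The tree-based models (**Random Forest** and **Gradient Boosting**) split on feature thresholds and are largely unaffected by it.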

In [4]:
# Fit the scaler on the training set, then apply the same transform to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
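To see what the scaler does, we can check the column statistics after transforming. The sketch below again uses scikit-learn's bundled wine data as a stand-in (an assumption) and hypothetical `_demo` and `_scaled` names. The training columns should come out with mean 0 and standard deviation 1, while the test columns are only approximately standardized because they reuse the training statistics.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data: scikit-learn's bundled wine dataset (an assumption).
X_demo, y_demo = load_wine(return_X_y=True)
X_tr, X_te, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)  # statistics learned from training rows only
X_te_scaled = demo_scaler.transform(X_te)      # the same statistics reused on test rows

# Training columns are standardized up to floating-point error.
print(np.allclose(X_tr_scaled.mean(axis=0), 0.0))
print(np.allclose(X_tr_scaled.std(axis=0), 1.0))

# Test columns are close to, but not exactly, mean 0.
print(X_te_scaled.mean(axis=0).round(2))
```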

### Step 4: Define Base Models
We will define three base learners: **RandomForestClassifier**, **GradientBoostingClassifier**, and **LogisticRegression**. These models will be combined in the next step using a **VotingClassifier**.

In [5]:
# Define the base learners (models)
rf = RandomForestClassifier(n_estimators=100, random_state=42)
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
lr = LogisticRegression(max_iter=1000, random_state=42)
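Before combining the models, it can help to see how each one performs on its own. The following is a rough sketch using 5-fold cross-validation on scikit-learn's bundled wine data (an assumed stand-in for the CSV). The `candidates` dictionary and `cv_means` are hypothetical names, and the logistic regression is wrapped in a pipeline so scaling is refitted inside each fold.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data: scikit-learn's bundled wine dataset (an assumption).
X_demo, y_demo = load_wine(return_X_y=True)

# Same hyperparameters as the base learners defined above.
candidates = {
    "rf": RandomForestClassifier(n_estimators=100, random_state=42),
    "gb": GradientBoostingClassifier(n_estimators=100, random_state=42),
    # Scaling happens inside the pipeline, so each fold is scaled independently.
    "lr": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000, random_state=42)
    ),
}

cv_means = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

If one model is clearly weaker than the others, it may pull the vote down rather than help, which is worth knowing before building the ensemble.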

### Step 5: Create and Train the Voting Classifier
Now, we will create a **VotingClassifier** that combines the three base models. With `voting='hard'`, each model casts one vote per sample, and the ensemble predicts the class that receives the most votes.

In [6]:
# Create and train the Voting Classifier (ensemble method)
voting_clf = VotingClassifier(estimators=[
    ('rf', rf),
    ('gb', gb),
    ('lr', lr)
], voting='hard')  # 'hard' voting uses the majority class from the base learners

# Train the ensemble model on the training data
voting_clf.fit(X_train, y_train)
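To make the voting rule concrete, here is a small hand-rolled illustration of hard voting with NumPy. The prediction arrays are made up for the example and do not come from the fitted models; the code only mirrors the idea of majority voting and is not scikit-learn's internal implementation.

```python
import numpy as np

# Hypothetical class predictions from three models for five samples.
preds = np.array([
    [0, 1, 2, 1, 0],  # model A
    [0, 1, 1, 1, 2],  # model B
    [1, 1, 2, 0, 2],  # model C
])

n_classes = 3

# votes[c, i] = number of models that predicted class c for sample i.
votes = np.array([(preds == c).sum(axis=0) for c in range(n_classes)])

# The majority class is the one with the most votes; argmax breaks ties
# in favor of the lowest class index.
majority = votes.argmax(axis=0)
print(majority)  # [0 1 2 1 2]
```

An alternative is `voting='soft'`, which averages the predicted class probabilities from each model instead of counting votes. It requires every base model to implement `predict_proba`, which all three models used here do.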

### Step 6: Make Predictions and Evaluate the Model
After training the model, we will make predictions on the test set and evaluate them using accuracy, the fraction of test samples that are classified correctly.

In [7]:
# Make predictions on the test set
y_pred = voting_clf.predict(X_test)

# Evaluate the model's performance
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of the Ensemble Learning Model: {accuracy:.2f}')

Accuracy of the Ensemble Learning Model: 0.98
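Accuracy is a single number and does not show which classes are confused with each other. As a possible follow-up, the sketch below rebuilds the same workflow on scikit-learn's bundled wine data (an assumed stand-in for the CSV, so its numbers may differ from the result above) and prints a confusion matrix and per-class report. The `demo_` and `_hat` names are hypothetical.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data: scikit-learn's bundled wine dataset (an assumption).
X_demo, y_demo = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Scale using training statistics only.
demo_scaler = StandardScaler()
X_tr = demo_scaler.fit_transform(X_tr)
X_te = demo_scaler.transform(X_te)

# Same ensemble configuration as in this notebook.
demo_clf = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("gb", GradientBoostingClassifier(n_estimators=100, random_state=42)),
        ("lr", LogisticRegression(max_iter=1000, random_state=42)),
    ],
    voting="hard",
)
demo_clf.fit(X_tr, y_tr)
y_hat = demo_clf.predict(X_te)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_te, y_hat, labels=[0, 1, 2])
print(cm)

# Per-class precision, recall, and F1.
print(classification_report(y_te, y_hat))
```

Off-diagonal entries in the confusion matrix show which wine types the ensemble mixes up, which a single accuracy figure cannot reveal.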