----

# **Applying Averaging and Max Voting Ensemble Learning Techniques**

## **Author**   :  **Muhammad Adil Naeem**

## **Contact**   :   **madilnaeem0@gmail.com**
<br>

----



------

## **`Average Classifiers`**


-------

### **Import Libraries**

In [19]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn import model_selection
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report, r2_score

import warnings
warnings.filterwarnings('ignore')

### **Load Dataset**

In [5]:
data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data',
                   header=None)
data.columns = ['Sample code', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
                'Normal Nucleoli', 'Mitoses', 'Class']


### **Data Cleaning**

In [6]:
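# Drop the sample identifier; it carries no predictive information.
data.drop(['Sample code'], axis=1, inplace=True)

# Missing 'Bare Nuclei' values are coded as '?'. Replace them with 0 so the
# column can be cast to int.
data.replace('?', 0, inplace=True)
data['Bare Nuclei'] = data['Bare Nuclei'].astype(int)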

In [7]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 699 entries, 0 to 698
Data columns (total 10 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   Clump Thickness              699 non-null    int64
 1   Uniformity of Cell Size      699 non-null    int64
 2   Uniformity of Cell Shape     699 non-null    int64
 3   Marginal Adhesion            699 non-null    int64
 4   Single Epithelial Cell Size  699 non-null    int64
 5   Bare Nuclei                  699 non-null    int64
 6   Bland Chromatin              699 non-null    int64
 7   Normal Nucleoli              699 non-null    int64
 8   Mitoses                      699 non-null    int64
 9   Class                        699 non-null    int64
dtypes: int64(10)
memory usage: 54.7 KB


### **Apply Simple Imputer**

In [None]:
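values = data.values
# SimpleImputer fills NaN entries with the column mean by default. No NaN
# values are present at this point, so the data passes through unchanged
# (converted to floats).
imputer = SimpleImputer()
scaled = imputer.fit_transform(values)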
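
For the imputer to have any effect, the missing markers need to reach it as `NaN`. The following is a minimal sketch of that idea, not the notebook's method: the toy column and the choice of the median strategy are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical toy column standing in for 'Bare Nuclei'; '?' marks a missing value.
toy = pd.DataFrame({'Bare Nuclei': ['1', '10', '?', '3', '?', '2']})

# Convert to numbers; anything non-numeric (here '?') becomes NaN.
toy['Bare Nuclei'] = pd.to_numeric(toy['Bare Nuclei'], errors='coerce')

# SimpleImputer targets NaN by default, so it now has something to fill.
# The median strategy is an illustrative choice, not taken from the notebook.
imputer = SimpleImputer(strategy='median')
imputed = imputer.fit_transform(toy)

print(imputed.ravel())
```

Here the two missing entries are filled with the median of the observed values (1, 10, 3, 2), which is 2.5.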

### **Scale Data using Min-Max Scaler**

In [9]:
scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(scaled)
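
Min-max scaling maps each column linearly so that its minimum becomes 0 and its maximum becomes 1. The sketch below spells out that arithmetic in plain Python; the helper `min_max_scale` and the sample values are hypothetical, and the handling of a constant column is a choice made for this sketch.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Rescale a list of numbers linearly so that min -> low and max -> high."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant column has no spread; map every entry to `low`.
        return [low for _ in values]
    return [low + (v - v_min) * (high - low) / span for v in values]


# Hypothetical 'Clump Thickness' readings on the dataset's 1-10 scale.
clump_thickness = [1, 5, 10, 3]
scaled_column = min_max_scale(clump_thickness)
print(scaled_column)
```

The smallest reading maps to 0, the largest to 1, and a reading of 5 lands at (5 - 1) / (10 - 1) = 4/9.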

### **Splitting Data into Dependent and Independent Variables**

In [10]:
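# `scaled` was built from every column of `data`, so X also contains the
# scaled 'Class' column alongside the nine features.
X = pd.DataFrame(scaled)
y = data['Class']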
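
Because `scaled` is computed from all of `data.values`, the label column travels into `X` together with the features. One way to avoid that is to split off `Class` before scaling. The sketch below assumes a hypothetical miniature frame (`toy`) with the same layout; it is an illustration, not the notebook's original pipeline.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical miniature frame: two feature columns plus the 'Class' label.
toy = pd.DataFrame({
    'Clump Thickness': [5, 3, 8, 1],
    'Mitoses': [1, 1, 7, 1],
    'Class': [2, 2, 4, 2],
})

# Separate the label first so it never passes through the feature pipeline.
features = toy.drop(columns=['Class'])
labels = toy['Class']

# Scale only the features and keep their column names.
scaler = MinMaxScaler(feature_range=(0, 1))
X_toy = pd.DataFrame(scaler.fit_transform(features), columns=features.columns)

print(X_toy.columns.tolist())
print(X_toy.shape)
```

With this layout the models only ever see the feature columns, so a high score cannot come from the label itself.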

### **Initialization of Classification Models**

- This code initializes three different classifiers: a Logistic Regression model, a Decision Tree Classifier, and a Support Vector Machine (SVM) Classifier, preparing them for training and evaluation.

In [14]:
logistic_regression_clf = LogisticRegression()
decision_tree_clf = DecisionTreeClassifier()
svm_clf = SVC()

### **Fitting Classification Models**

- This code trains the three classifiers (Logistic Regression, Decision Tree, and Support Vector Machine) on the full dataset `X` with the corresponding labels `y`. No held-out test set is set aside.

In [15]:
logistic_regression_clf.fit(X, y)
decision_tree_clf.fit(X, y)
svm_clf.fit(X, y)

### **Averaging Predictions and Evaluating Accuracy**

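- This code predicts class labels with the three classifiers on the same data `X` they were trained on, averages the predicted labels element-wise, and scores the average against the true labels `y` with `r2_score`, then prints the result. Note that R² is a regression metric rather than a classification accuracy, and a score computed on the training data does not measure how well the models generalize.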

In [17]:
logistic_regression_clf_pred = logistic_regression_clf.predict(X)
decision_tree_clf_pred = decision_tree_clf.predict(X)
svm_clf_pred = svm_clf.predict(X)

avg_pred = (logistic_regression_clf_pred + decision_tree_clf_pred + svm_clf_pred) / 3
acc = r2_score(y, avg_pred)
print(acc)

1.0
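
Averaging the labels 2 and 4 directly can produce values such as 2.67 that are not valid classes whenever the models disagree. A common alternative is to average each model's predicted probability and then threshold it. The sketch below works through that arithmetic in plain Python; all probabilities and labels are made-up illustrative numbers, not outputs of the models above.

```python
# Hypothetical P(class = 4, malignant) from three models for four samples.
proba_logistic = [0.10, 0.80, 0.40, 0.95]
proba_tree = [0.00, 1.00, 1.00, 1.00]
proba_svm = [0.20, 0.70, 0.30, 0.90]
y_true = [2, 4, 2, 4]  # 2 = benign, 4 = malignant

# Average the three probabilities sample by sample.
avg_proba = [
    (a + b + c) / 3
    for a, b, c in zip(proba_logistic, proba_tree, proba_svm)
]

# Threshold the averaged probability at 0.5 to obtain a class label.
avg_label = [4 if p >= 0.5 else 2 for p in avg_proba]

# Accuracy is the fraction of samples whose predicted label matches the truth.
accuracy = sum(p == t for p, t in zip(avg_label, y_true)) / len(y_true)
print(avg_label, accuracy)
```

In this toy case the third sample is misclassified because the confident tree outweighs the other two models, giving an accuracy of 0.75.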


In [18]:
avg_pred

array([2., 2., 2., 2., 2., 4., 2., 2., 2., 2., 2., 2., 4., 2., 4., 4., 2.,
       2., 4., 2., 4., 4., 2., 4., 2., 4., 2., 2., 2., 2., 2., 2., 4., 2.,
       2., 2., 4., 2., 4., 4., 2., 4., 4., 4., 4., 2., 4., 2., 2., 4., 4.,
       4., 4., 4., 4., 4., 4., 4., 4., 4., 4., 2., 4., 4., 2., 4., 2., 4.,
       4., 2., 2., 4., 2., 4., 4., 2., 2., 2., 2., 2., 2., 2., 2., 2., 4.,
       4., 4., 4., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 4., 4., 4., 4.,
       2., 4., 4., 4., 4., 4., 2., 4., 2., 4., 4., 4., 2., 2., 2., 4., 2.,
       2., 2., 2., 4., 4., 4., 2., 4., 2., 4., 2., 2., 2., 4., 2., 2., 2.,
       2., 2., 2., 2., 2., 2., 4., 2., 2., 2., 4., 2., 2., 4., 2., 4., 4.,
       2., 2., 4., 2., 2., 2., 4., 4., 2., 2., 2., 2., 2., 4., 4., 2., 2.,
       2., 2., 2., 4., 4., 4., 2., 4., 2., 4., 2., 2., 2., 4., 4., 2., 4.,
       4., 4., 2., 4., 4., 2., 2., 2., 2., 2., 2., 2., 2., 4., 4., 2., 2.,
       2., 4., 4., 2., 2., 2., 4., 4., 2., 4., 4., 4., 2., 2., 4., 2., 2.,
       4., 4., 4., 4., 2.

------

## **`Voting Classifiers`**


-------

### **Create a KFold Splitter with 10 Splits**

In [20]:
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
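
To see what a `KFold` object produces, the sketch below splits ten hypothetical samples into five folds and records the size of each train/test partition. The toy array and the fold count are assumptions chosen to keep the output small.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten hypothetical samples with two features each.
X_toy = np.arange(20).reshape(10, 2)

kfold_toy = KFold(n_splits=5, shuffle=True, random_state=7)

fold_sizes = []
seen_test_indices = []
for train_idx, test_idx in kfold_toy.split(X_toy):
    # Each iteration yields disjoint index arrays for training and testing.
    fold_sizes.append((len(train_idx), len(test_idx)))
    seen_test_indices.extend(test_idx.tolist())

print(fold_sizes)
```

Every sample appears in exactly one test fold, so each model is always evaluated on rows it was not fitted on in that round.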

### **Voting Classifier with Cross-Validation**

- This code combines the three classifiers (Logistic Regression, Decision Tree, and Support Vector Machine) in a `VotingClassifier`, which by default uses hard voting: each model casts one vote and the majority class wins. The ensemble is then evaluated with 10-fold cross-validation on `X` and `y`, and the mean accuracy across the folds is printed.

In [21]:
# Collect (name, model) pairs for the ensemble.
estimators = []

model1 = LogisticRegression()
estimators.append(('logistic', model1))
model2 = DecisionTreeClassifier()
estimators.append(('cart', model2))
model3 = SVC()
estimators.append(('svm', model3))

# Hard (majority) voting is the default for VotingClassifier.
ensemble = VotingClassifier(estimators)
results = model_selection.cross_val_score(ensemble, X, y, cv=kfold)
print(results.mean())

1.0
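
`VotingClassifier` also supports soft voting, which averages predicted class probabilities instead of counting votes. The sketch below compares the two modes on synthetic data from `make_classification`; the dataset, the helper `make_ensemble`, and the hyperparameters are assumptions for illustration and are unrelated to the breast cancer data above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem (illustrative only).
X_syn, y_syn = make_classification(
    n_samples=200, n_features=9, n_informative=5, random_state=7
)


def make_ensemble(voting):
    """Build a three-model ensemble using the given voting mode."""
    return VotingClassifier(
        estimators=[
            ('logistic', LogisticRegression(max_iter=1000)),
            ('cart', DecisionTreeClassifier(random_state=7)),
            # probability=True lets SVC expose predict_proba for soft voting.
            ('svm', SVC(probability=True, random_state=7)),
        ],
        voting=voting,
    )


cv = KFold(n_splits=5, shuffle=True, random_state=7)
hard_scores = cross_val_score(make_ensemble('hard'), X_syn, y_syn, cv=cv)
soft_scores = cross_val_score(make_ensemble('soft'), X_syn, y_syn, cv=cv)

print(hard_scores.mean(), soft_scores.mean())
```

Which mode scores higher depends on the data and on how well calibrated the individual models' probabilities are, so it is worth checking both.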
