## Bagging Classifier



* Bagging (bootstrap aggregating) is an ensemble technique that trains several copies of a base model in parallel, each on a random sample of the training rows (and, optionally, a random subset of the features), and then combines their predictions. For classification, the final result is typically the mode (majority vote) of the individual predictions. Aggregating models trained on different samples reduces the variance of the final model.

* By default, scikit-learn's `BaggingClassifier` uses decision trees as the base model. A decision tree classifies a sample by testing feature values: it starts at the root node, follows one branch per test, and ends at a leaf that holds the predicted class.
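
To make the two points above concrete, here is a minimal hand-rolled sketch of bagging with decision trees. It is an illustration under stated assumptions, not how `BaggingClassifier` is implemented internally: it uses scikit-learn's bundled copy of Iris (`load_iris`), an illustrative count of 10 trees, and strict majority voting. scikit-learn's own `BaggingClassifier` averages predicted probabilities when the base model supports `predict_proba`, so its results can differ slightly.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled Iris copy stands in for the notebook's data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

rng = np.random.default_rng(42)
n_estimators = 10  # illustrative choice
trees = []
for _ in range(n_estimators):
    # Bootstrap sample: draw training row indices with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeClassifier(random_state=0)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

# Every tree votes on every test sample: shape (n_estimators, n_test).
votes = np.stack([tree.predict(X_test) for tree in trees])

# The most frequent label in each column (the mode) is the bagged prediction.
bagged_pred = np.array([np.bincount(col).argmax() for col in votes.T])

manual_accuracy = float((bagged_pred == y_test).mean())
print(f"manual bagging accuracy: {manual_accuracy:.2f}")
```

Each tree sees a slightly different training set, so individual trees disagree on borderline samples; the vote smooths out those disagreements.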


In [7]:
# importing the base libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier

## Business Problem

Predict the species of an iris flower from its sepal and petal measurements.


## Dataset

This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other.

Predicted attribute: class of iris plant.

This is an exceedingly simple domain.

This data differs from the data presented in Fisher's article (identified by Steve Chadwick, spchadwick '@' espeedaz.net). The 35th sample should be 4.9,3.1,1.5,0.2,"Iris-setosa", where the error is in the fourth feature. The 38th sample should be 4.9,3.6,1.4,0.1,"Iris-setosa", where the errors are in the second and third features.
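
The class balance described above (3 classes of 50 instances each) can be checked quickly. This sketch assumes scikit-learn's bundled copy of the dataset (`load_iris`), which may differ in a few values from the copy loaded elsewhere in this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris copy, not the seaborn one.
iris_bunch = load_iris()

# Count how many samples belong to each of the integer-encoded classes.
class_counts = np.bincount(iris_bunch.target)
counts_by_name = {
    str(name): int(count)
    for name, count in zip(iris_bunch.target_names, class_counts)
}

print(iris_bunch.data.shape)
print(counts_by_name)
```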



## Features

Each sample has four real-valued measurements and a class label:

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class (the `species` column: setosa, versicolor, or virginica)


In [8]:
#load the dataset
iris = sns.load_dataset('iris')  # already returns a pandas DataFrame
Df = pd.DataFrame(iris)
print(Df.head())  # head is a method, so it must be called
print(Df.shape)

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
(150, 5)


In [9]:
print(Df.columns)

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')


In [10]:
y = Df['species']
X = Df.drop('species', axis=1)

In [11]:
print(X.columns)

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width'], dtype='object')


In [12]:
# Doing the train test split for model validation
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.33, random_state=42)
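
As a sanity check on the split, the sketch below reproduces the same `test_size` and `random_state` and prints the resulting sizes. It also passes `stratify=y`, which is an optional variation not used in the notebook: it keeps the class proportions similar in both parts. The data comes from scikit-learn's bundled `load_iris`, which is an assumption made to keep the example self-contained.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris copy stands in for Df.
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y is an optional addition; the notebook's split does not use it.
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42, stratify=y_demo
)

print(Xs_train.shape, Xs_test.shape)

# Number of test samples per class; stratification keeps these nearly equal.
test_class_counts = np.bincount(ys_test)
print(test_class_counts)
```

With 150 rows and `test_size=0.33`, scikit-learn rounds the test set up to 50 rows and leaves 100 for training.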


In [13]:
# instantiating the Bagging classifier

model = BaggingClassifier(random_state = 42)
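
For reference, the sketch below spells out the main defaults that a bare `BaggingClassifier(random_state=42)` relies on. The base model is passed positionally because the keyword name differs between scikit-learn versions (`estimator` in recent releases, `base_estimator` in older ones). The name `explicit_model` and the use of `load_iris` are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled Iris copy, used only to fit the example.
X_all, y_all = load_iris(return_X_y=True)

explicit_model = BaggingClassifier(
    DecisionTreeClassifier(),  # base model, passed positionally
    n_estimators=10,           # number of trees in the ensemble
    max_samples=1.0,           # each tree draws as many rows as the training set
    max_features=1.0,          # each tree sees all features
    bootstrap=True,            # rows are drawn with replacement
    bootstrap_features=False,  # features are not resampled
    random_state=42,
)
explicit_model.fit(X_all, y_all)

# After fitting, the individual trees are available in estimators_.
print(len(explicit_model.estimators_))
```

Raising `n_estimators` usually stabilises the predictions at the cost of training time; lowering `max_samples` or `max_features` makes the trees more diverse.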

In [14]:
# fitting the model to the train data

model.fit(X_train, y_train)

BaggingClassifier(random_state=42)

In [15]:
# getting the train score

model.score(X_train, y_train)

0.99
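
The training score is optimistic because the model has already seen those rows. Bagging offers a cheap alternative: each tree can be evaluated on the rows left out of its bootstrap sample (the out-of-bag rows). The sketch below enables this with `oob_score=True`. It uses 50 trees rather than the default 10 so that every row is very likely to be out-of-bag for at least one tree; that count, the name `oob_model`, and the use of `load_iris` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris copy stands in for the notebook's data.
X_o, y_o = load_iris(return_X_y=True)
Xo_train, Xo_test, yo_train, yo_test = train_test_split(
    X_o, y_o, test_size=0.33, random_state=42
)

# oob_score=True scores each training row using only trees that did not see it.
oob_model = BaggingClassifier(n_estimators=50, oob_score=True, random_state=42)
oob_model.fit(Xo_train, yo_train)

print(f"train accuracy:      {oob_model.score(Xo_train, yo_train):.2f}")
print(f"out-of-bag accuracy: {oob_model.oob_score_:.2f}")
```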

In [16]:
# getting the test score

model.score(X_test, y_test)

1.0
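
A perfect score on a 50-row test set can partly reflect a lucky split. One common way to get a steadier estimate is k-fold cross-validation, sketched below with `cross_val_score`. The choice of 5 folds and the use of scikit-learn's bundled `load_iris` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score

# Assumption: scikit-learn's bundled Iris copy stands in for the notebook's data.
X_cv, y_cv = load_iris(return_X_y=True)

# Fit and score the model on 5 different train/test partitions.
cv_scores = cross_val_score(BaggingClassifier(random_state=42), X_cv, y_cv, cv=5)

print(cv_scores)
print(f"mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```

The spread across folds gives a rough sense of how much the accuracy depends on which rows end up in the test set.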