## ***ExtraTreesClassifier***

An extra-trees (extremely randomized trees) classifier is a meta-estimator that fits a number of randomized decision trees (a.k.a. extra-trees) on various sub-samples of the dataset and uses averaging to improve predictive accuracy and control over-fitting.

Extra Trees is like Random Forest in that it builds multiple trees and splits nodes using random subsets of features, but with two key differences: by default it does not bootstrap observations (each tree is trained on the whole training set rather than on a bootstrap sample), and nodes are split on randomly drawn thresholds rather than on the best possible ones. In summary, ExtraTrees:

*   builds multiple trees with bootstrap = False by default, so every tree sees the full training set
*   splits nodes using randomly drawn thresholds for a random subset of the features selected at every node, keeping the best of those random candidates
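The two differences can be checked on the estimators themselves. The sketch below assumes scikit-learn is installed and uses a small synthetic dataset from `make_classification` (not the diabetes data) purely for illustration; the names `X_demo`, `y_demo`, `et` and `rf` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

# Synthetic data, used only to illustrate the API (not the diabetes set)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

et = ExtraTreesClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_demo, y_demo)

# Extra Trees does not bootstrap by default; Random Forest does
print("ExtraTrees bootstrap:", et.bootstrap)
print("RandomForest bootstrap:", rf.bootstrap)

# Individual Extra Trees use the "random" splitter,
# while Random Forest trees use the "best" splitter
print("ExtraTrees splitter:", et.estimators_[0].splitter)
print("RandomForest splitter:", rf.estimators_[0].splitter)
```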



***Importing the Libraries***

In [0]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [0]:
from google.colab import files
uploaded = files.upload()

Saving diabetes.csv to diabetes.csv


***Importing the dataset***

In [0]:
diabetes = pd.read_csv('diabetes.csv')
diabetes.head()

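|   | Pregnancies | Glucose | BloodPressure | SkinThickness | Insulin | BMI | DiabetesPedigreeFunction | Age | Outcome |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1 |
| 1 | 1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0 |
| 2 | 8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1 |
| 3 | 1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0 |
| 4 | 0 | 137 | 40 | 35 | 168 | 43.1 | 2.288 | 33 | 1 |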


In [0]:
diabetes.isnull().sum()

Pregnancies                 0
Glucose                     0
BloodPressure               0
SkinThickness               0
Insulin                     0
BMI                         0
DiabetesPedigreeFunction    0
Age                         0
Outcome                     0
dtype: int64

***There are no null values, but many columns contain zeroes that stand in for missing measurements. We count the zeroes in each column and then replace them with the column mean.***

In [0]:
print("total number of rows :",(len(diabetes)))
print("number of rows missing Glucose",len(diabetes[diabetes['Glucose'] == 0]))
print("number of rows missing Blood Pressure",len(diabetes[diabetes['BloodPressure'] == 0]))
print("number of rows missing SkinThickness",len(diabetes[diabetes['SkinThickness'] == 0]))
print("number of rows missing Insulin",len(diabetes[diabetes['Insulin'] == 0]))
print("number of rows missing BMI",len(diabetes[diabetes['BMI'] == 0]))
print("number of rows missing Diabetespedigreefunction",len(diabetes[diabetes['DiabetesPedigreeFunction'] == 0]))
print("number of rows missing Age:",len(diabetes[diabetes['Age'] == 0]))


total number of rows : 768
number of rows missing Glucose 5
number of rows missing Blood Pressure 35
number of rows missing SkinThickness 227
number of rows missing Insulin 374
number of rows missing BMI 11
number of rows missing Diabetespedigreefunction 0
number of rows missing Age: 0
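The same per-column zero counts can be obtained in one expression. This is a minimal sketch on a hypothetical miniature frame named `demo` that stands in for `diabetes`; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature frame standing in for `diabetes`
demo = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "Insulin": [0, 0, 0, 94],
    "Age": [50, 31, 32, 21],
})

# A boolean mask of zeros, summed column-wise, gives the zero count per column
zero_counts = (demo == 0).sum()
print(zero_counts)
```

Applied to the real data, `(diabetes == 0).sum()` would report every column at once, including `Pregnancies` and `Outcome`, where zero is a legitimate value.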


In [0]:
x=diabetes.iloc[:, :-1].values


In [0]:
y=diabetes['Outcome'].values

In [0]:

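from sklearn.impute import SimpleImputer

# Imputer was removed from sklearn.preprocessing in scikit-learn 0.22; SimpleImputer replaces it
# Note: missing_values=0 treats zeros in every column as missing, including Pregnancies
fill_values = SimpleImputer(missing_values=0, strategy="mean")

x = fill_values.fit_transform(x)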
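Imputing with `missing_values=0` also overwrites zeros in `Pregnancies`, where zero is a valid value. One possible alternative, sketched here on a hypothetical frame `demo`, is to restrict the replacement to columns where zero cannot be a real measurement. The choice of columns in `cols_with_invalid_zero` is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame standing in for the feature columns
demo = pd.DataFrame({
    "Pregnancies": [6, 1, 0, 3],
    "Glucose": [148, 0, 183, 89],
    "BMI": [33.6, 26.6, 0.0, 28.1],
})

# Columns where a zero is assumed to mean "not measured"
cols_with_invalid_zero = ["Glucose", "BMI"]

imputed = demo.copy()
# Mark zeros as missing, then fill them with the mean of the remaining values
imputed[cols_with_invalid_zero] = imputed[cols_with_invalid_zero].replace(0, np.nan)
imputed[cols_with_invalid_zero] = imputed[cols_with_invalid_zero].fillna(
    imputed[cols_with_invalid_zero].mean()
)
print(imputed)
```

In this sketch the zero in `Pregnancies` is left untouched, while the zeros in `Glucose` and `BMI` are replaced by the mean of the non-zero entries.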



In [0]:
x=pd.DataFrame(x)

In [0]:
x.head()

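|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| 0 | 6.0 | 148.0 | 72.0 | 35.0 | 155.548223 | 33.6 | 0.627 | 50.0 |
| 1 | 1.0 | 85.0 | 66.0 | 29.0 | 155.548223 | 26.6 | 0.351 | 31.0 |
| 2 | 8.0 | 183.0 | 64.0 | 29.15342 | 155.548223 | 23.3 | 0.672 | 32.0 |
| 3 | 1.0 | 89.0 | 66.0 | 23.0 | 94.0 | 28.1 | 0.167 | 21.0 |
| 4 | 4.494673 | 137.0 | 40.0 | 35.0 | 168.0 | 43.1 | 2.288 | 33.0 |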


In [0]:
# Keep y one-dimensional; a single-column DataFrame triggers a DataConversionWarning in fit
y=pd.Series(y)

***Splitting the dataset into the Training set and Test set***

In [0]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.30, random_state=10)
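When the classes are imbalanced, `train_test_split` can also keep the class proportions equal in both parts through its `stratify` argument. The notebook does not do this; the sketch below only illustrates the option on hypothetical labels (`X_demo`, `y_demo`).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 70 negatives, 30 positives
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 70 + [1] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=10, stratify=y_demo
)

# With stratify, the share of positives is the same in both parts
print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```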

***Finding the best parameters using RandomizedSearchCV***

In [0]:
from sklearn.model_selection import RandomizedSearchCV

from sklearn.ensemble import ExtraTreesClassifier

In [0]:
classifier=ExtraTreesClassifier()

In [0]:
from scipy.stats import randint

In [0]:
# randint(1,3) draws max_features from {1, 2}; the upper bound is exclusive
parameters={'n_estimators':[40,60,80,90,100,140,220,250,300],
            'criterion':['gini','entropy'],
            'max_depth':[3,4,5,6,7],
            'max_features':randint(1,3)}
randomsearch=RandomizedSearchCV(estimator=classifier,param_distributions=parameters,
                                n_iter=20,cv=10,n_jobs=-1,scoring='accuracy')
randomsearch=randomsearch.fit(X_train,y_train)



In [0]:
randomsearch.best_estimator_

ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
                     max_depth=6, max_features=2, max_leaf_nodes=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=1, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=80, n_jobs=None,
                     oob_score=False, random_state=None, verbose=0,
                     warm_start=False)
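A fitted `RandomizedSearchCV` also exposes the chosen settings as `best_params_`, the mean cross-validated score as `best_score_`, and, with the default `refit=True`, a model already refit on the full training data as `best_estimator_`. The following is a minimal sketch on synthetic data with a deliberately small, hypothetical search space; it is not the search run in this notebook.

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, used only to illustrate the API
X_demo, y_demo = make_classification(n_samples=150, n_features=8, random_state=0)

search = RandomizedSearchCV(
    estimator=ExtraTreesClassifier(random_state=0),
    param_distributions={
        "n_estimators": [20, 40],
        "max_depth": [3, 5],
        "max_features": randint(1, 3),  # draws 1 or 2
    },
    n_iter=4, cv=3, scoring="accuracy", random_state=0,
)
search.fit(X_demo, y_demo)

print(search.best_params_)   # dictionary of the selected hyperparameters
print(search.best_score_)    # mean cross-validated accuracy of that setting

# best_estimator_ is already refit on all of X_demo, so it can predict directly
best_model = search.best_estimator_
predictions = best_model.predict(X_demo[:5])
```

Using `best_estimator_` directly avoids retyping the parameters by hand, and setting `random_state` makes the search repeatable.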

***Fitting ExtraTrees to the Training set***

In [0]:
# Only the tuned values are passed; everything else stays at its default.
# min_impurity_split was removed in scikit-learn 1.0, so it is no longer passed.
classifier=ExtraTreesClassifier(n_estimators=80, criterion='gini',
                     max_depth=6, max_features=2)
classifier.fit(X_train,y_train)

  


ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
                     max_depth=6, max_features=2, max_leaf_nodes=None,
                     min_impurity_decrease=0.0, min_impurity_split=None,
                     min_samples_leaf=1, min_samples_split=2,
                     min_weight_fraction_leaf=0.0, n_estimators=80, n_jobs=None,
                     oob_score=False, random_state=None, verbose=0,
                     warm_start=False)

***Predicting the Test set results***

In [0]:
y_pred=classifier.predict(X_test)

***Making the Confusion Matrix***

In [0]:
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

[[137   7]
 [ 59  28]]


In [0]:
from sklearn.metrics import accuracy_score
accuracy=accuracy_score(y_test,y_pred)
accuracy

0.7142857142857143
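The accuracy above can be recomputed from the confusion matrix, together with per-class rates that accuracy alone hides. This sketch hard-codes the matrix printed earlier and assumes scikit-learn's layout (rows are true classes, columns are predicted classes).

```python
import numpy as np

# Confusion matrix printed above: rows = true class, columns = predicted class
cm = np.array([[137, 7],
               [59, 28]])
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()     # share of all test rows classified correctly
precision = tp / (tp + fp)          # share of predicted positives that are correct
recall = tp / (tp + fn)             # share of true positives that were found
specificity = tn / (tn + fp)        # share of true negatives that were found

print("accuracy:", accuracy)
print("precision:", precision)
print("recall:", recall)
print("specificity:", specificity)
```

Of the 87 diabetic cases in the test set, only 28 are detected, so recall for the positive class is far lower than the overall accuracy suggests.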