The goal of ensemble methods is to combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator.

In [None]:
Two families of ensemble methods are usually distinguished: averaging methods and boosting methods.

# Averaging methods

The driving principle is to build several estimators independently and then to average their predictions (examples: bagging methods, forests of randomized trees).
On average, the combined estimator is usually better than any single base estimator because its variance is reduced.
The base estimators can be thought of as working in parallel.

# Bagging meta-estimator

Bagging methods form a class of algorithms which build several instances of an estimator on random subsets of the original training set and then aggregate their individual predictions to form a final prediction.

These methods are used as a way to reduce the variance of a base estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

As they provide a way to reduce overfitting, bagging methods work best with strong and complex models (e.g., fully developed decision trees).
 

# Flavours of bagging methods

In [None]:
Pasting: random subsets of the dataset are drawn as random subsets of the samples, without replacement.

Bagging: random subsets of the dataset are drawn as random subsets of the samples, with replacement.

Random Subspaces: random subsets of the dataset are drawn as random subsets of the features.

Random Patches: base estimators are built on subsets of both samples and features.
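The four flavours differ only in how samples and features are drawn, so they can be sketched as different settings of the same BaggingClassifier. The fractions (0.5) and the number of estimators below are illustrative assumptions, not recommended values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each flavour is a combination of sample and feature sampling settings.
flavours = {
    "pasting": dict(max_samples=0.5, bootstrap=False),
    "bagging": dict(max_samples=0.5, bootstrap=True),
    "random_subspaces": dict(max_samples=1.0, bootstrap=False, max_features=0.5),
    "random_patches": dict(max_samples=0.5, bootstrap=False, max_features=0.5),
}

models = {
    name: BaggingClassifier(n_estimators=5, random_state=0, **params).fit(X, y)
    for name, params in flavours.items()
}

# Number of features seen by the first base estimator of each ensemble.
n_features_used = {
    name: len(model.estimators_features_[0]) for name, model in models.items()
}
print(n_features_used)
```

Only the random subspaces and random patches ensembles restrict the features each base estimator sees.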

# BaggingClassifier

A bagging classifier is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions, either by voting or by averaging, to form a final prediction.


# Parameters

In [None]:
base_estimator : object, default=None

In [None]:
The base estimator to fit on random subsets of the dataset
If None, then the base estimator is a DecisionTreeClassifier.

In [None]:
n_estimators : int, default=10

In [None]:
The number of base estimators in the ensemble.

In [None]:
max_samples : int or float, default=1.0

In [None]:
The number of samples to draw from given dataset to train each base estimator (with or without replacement)

In [None]:
max_features : int or float, default=1.0

In [None]:
The number of features to draw from given dataset to train each base estimator
(without replacement by default, see bootstrap_features)

In [None]:
bootstrap : bool, default=True

In [None]:
Whether samples are drawn with replacement. If False, sampling without replacement is performed.

In [None]:
bootstrap_features : bool, default=False

In [None]:
Whether features are drawn with replacement.

In [None]:
oob_score : bool, default=False

Whether to use out-of-bag samples to estimate the generalization error. Only available if bootstrap=True.

When each base estimator is trained on only a subset of the available samples, the generalization accuracy can be estimated with the left-out (out-of-bag) samples by setting oob_score=True.
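A minimal sketch of reading the out-of-bag estimate after fitting. The dataset and n_estimators=50 are illustrative choices; with too few estimators some samples may never be out-of-bag and the estimate becomes unreliable.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier

X, y = load_breast_cancer(return_X_y=True)

# bootstrap=True is required for out-of-bag scoring.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, oob_score=True, random_state=0)
bag.fit(X, y)

# Accuracy measured only on samples each base estimator did not train on.
oob_accuracy = bag.oob_score_
print(f"OOB accuracy estimate: {oob_accuracy:.3f}")
```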

In [None]:
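n_jobs : int, default=None

In [None]:
The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.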

In [None]:
random_state and verbose

In [None]:
random_state controls the random resampling of the original dataset; verbose controls the verbosity when fitting and predicting.

# EXAMPLE DATASET

In [2]:
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)

In [3]:
scaled_X = preprocessing.scale(X)

X_train, X_test, y_train, y_test = train_test_split(scaled_X, y, stratify=y,
                                                   random_state=1)


# Parameter

The base estimator is left as the default DecisionTreeClassifier.
The parameters tuned here are n_estimators, base_estimator and max_samples.

In [4]:
clf = BaggingClassifier(
    base_estimator=None,       # None falls back to a DecisionTreeClassifier (renamed to `estimator` in scikit-learn 1.2)
    n_estimators=20,           # number of base estimators in the ensemble
    max_samples=0.32,          # fraction of samples drawn from X for each base estimator (32%)
    max_features=1.0,          # fraction of features drawn from X for each base estimator (all features)
    bootstrap=True,            # samples are drawn with replacement
    bootstrap_features=False,  # features are drawn without replacement
    oob_score=True,            # use out-of-bag samples to estimate the generalization error
    warm_start=False,          # default: fit a whole new ensemble
    n_jobs=None,               # default: a single job
    random_state=1,            # controls the random resampling of the original dataset
    verbose=0,                 # controls the verbosity when fitting and predicting
)

In [5]:
clf.fit(X_train, y_train) 

BaggingClassifier(max_samples=0.32, n_estimators=20, oob_score=True,
                  random_state=1)

The predicted class of an input sample is the class with the highest mean predicted probability. If the base estimators do not implement a predict_proba method, the ensemble resorts to voting.
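The averaging rule can be checked by hand. The sketch below assumes the default decision-tree base estimator (which implements predict_proba) and uses an illustrative ensemble of 11 trees; it recomputes the mean probability from the fitted base estimators and compares the result with predict.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier

X, y = load_breast_cancer(return_X_y=True)
bag = BaggingClassifier(n_estimators=11, random_state=0).fit(X, y)

# Average the class probabilities predicted by every base estimator,
# giving each one only the feature columns it was trained on.
mean_proba = np.mean(
    [est.predict_proba(X[:, feats])
     for est, feats in zip(bag.estimators_, bag.estimators_features_)],
    axis=0,
)

# The ensemble prediction is the class with the highest mean probability.
manual_pred = bag.classes_[np.argmax(mean_proba, axis=1)]
agreement = np.mean(manual_pred == bag.predict(X))
print(agreement)
```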

In [6]:
clf.predict(X_test) 

array([0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1,
       0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1,
       0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0])

In [7]:
clf.score( X_test, y_test) # mean accuracy on the given test data and labels.

0.9440559440559441

# Stacked generalization

1.) Stacked generalization is a method for combining estimators to reduce their biases.

2.) The predictions of each individual estimator are stacked together and used as input to a final estimator, which computes the prediction.

3.) This final estimator is trained through cross-validation.

# StackingClassifier / Stack of estimators with a final classifier

Stacked generalization consists in stacking the outputs of the individual estimators and using a classifier to compute the final prediction.

Note that estimators_ are fitted on the full X, while final_estimator_ is trained on cross-validated predictions of the base estimators obtained with cross_val_predict.
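To make the two training paths concrete, here is a hand-rolled sketch of the same idea. It is not StackingClassifier's internal code; the choice of base estimators and of 5 folds is an assumption made for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
base_estimators = [DecisionTreeClassifier(random_state=0), GaussianNB()]

# Out-of-fold probabilities of the positive class, one column per base estimator.
meta_features = np.column_stack([
    cross_val_predict(est, X, y, cv=5, method="predict_proba")[:, 1]
    for est in base_estimators
])

# The final estimator learns from the cross-validated predictions only...
final_estimator = LogisticRegression().fit(meta_features, y)

# ...while the base estimators themselves are refitted on the full X.
fitted_base = [est.fit(X, y) for est in base_estimators]
print(meta_features.shape)
```

Training the final estimator on out-of-fold predictions keeps it from learning on outputs that the base estimators produced for samples they had already seen.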

# Parameters

In [None]:
estimators

In [None]:
Base estimators which will be stacked together, given as a list of (name, estimator) tuples.

In [None]:
final_estimator

In [None]:
A classifier which will be used to combine the base estimators. The default classifier is a LogisticRegression.

In [None]:
cv

In [None]:
Determines the cross-validation splitting strategy used in cross_val_predict to train final_estimator.
1.) None, to use the default 5-fold cross-validation,
2.) an integer, to specify the number of folds in a (Stratified) KFold,
3.) an object to be used as a cross-validation generator,
4.) an iterable yielding train, test splits.
Note that cv is not used for model evaluation; it only produces the out-of-fold predictions on which final_estimator is trained.

In [None]:
stack_method

In [None]:
The method called on each base estimator.
'auto' tries to invoke, for each estimator, 'predict_proba', 'decision_function' or 'predict', in that order.
Otherwise, it must be one of 'predict_proba', 'decision_function' or 'predict'.
If the chosen method is not implemented by an estimator, an error is raised, so the default 'auto' is used here.

In [None]:
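n_jobs : int, default=None

In [None]:
The number of jobs to run in parallel when fitting all the estimators. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.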

In [None]:
passthrough

In [None]:
When False, only the predictions of estimators will be used as training data for final_estimator.

In [None]:
When True, the final_estimator is trained on the predictions as well as on the original training data.

In [None]:
verbose

In [None]:
Verbosity level.

In [None]:
# StackingRegressor / Stack of estimators with a final regressor

In [None]:
Every parameter is the same as for StackingClassifier, except that StackingRegressor has no stack_method parameter.

In [None]:
# EXAMPLE DATASET

In [15]:
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import LinearSVC
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)

In [16]:
scaled_X = preprocessing.scale(X)

X_train, X_test, y_train, y_test = train_test_split(scaled_X, y, stratify=y,
                                                   random_state=1)


In [17]:
estimators = [
     ('mf', RandomForestClassifier(n_estimators=10, random_state=42)),
     ('skr', LinearSVC(random_state=42)) ]

In [19]:
clf1 = StackingClassifier(
    estimators=estimators,  # base estimators which will be stacked together
    final_estimator=None,   # None falls back to a LogisticRegression
    cv=None,                # None uses the default 5-fold cross-validation
    stack_method='auto',    # default: tries predict_proba, decision_function, then predict for each estimator
    n_jobs=None,
    passthrough=False,      # only the predictions of the estimators are used to train final_estimator
    verbose=0,
)

In [20]:
clf1.fit(X_train, y_train) 

StackingClassifier(estimators=[('mf',
                                RandomForestClassifier(n_estimators=10,
                                                       random_state=42)),
                               ('skr', LinearSVC(random_state=42))])

In [22]:
clf1.predict(X_test) 

array([0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1,
       0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1,
       0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1])

In [24]:
clf1.score( X_test, y_test)

0.965034965034965

In [26]:
clf2 = StackingClassifier(
    estimators=estimators,  # base estimators which will be stacked together
    final_estimator=None,   # None falls back to a LogisticRegression
    cv=None,                # None uses the default 5-fold cross-validation
    stack_method='auto',    # default: tries predict_proba, decision_function, then predict for each estimator
    n_jobs=None,
    passthrough=True,       # final_estimator is trained on the predictions and on the original training data
    verbose=0,
)

In [27]:
clf2.fit(X_train, y_train) 
clf2.predict(X_test) 


array([0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1,
       0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1,
       0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1,
       0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0,
       1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1])

In [28]:
clf2.score( X_test, y_test)

0.958041958041958

# Forests of randomized trees

In [None]:
The sklearn.ensemble module includes two averaging algorithms based on randomized decision trees:
1) the RandomForest algorithm
2) the Extra-Trees method

In [None]:
Both algorithms are perturb-and-combine techniques (well suited to unstable methods) specifically designed for trees.

In [None]:
perturb: create different models
combine: merge them into a single prediction


In [None]:
Perturbation creates different models using
resampling,
subsampling,
adding noise,
adaptive reweighting,
randomly choosing among the competing splits.

In [None]:
Combining merges the predictions using voting, weighted voting or averaging.

# Random Forests

In [None]:
In random forests  each tree in the ensemble is built from a sample drawn with replacement from the training set.

In [None]:
When splitting each node during the construction of a tree, the best split is found either from all input features or from a random subset of size max_features.

In [None]:
Random forests achieve a reduced variance by combining diverse trees, sometimes at the cost of a slight increase in bias. In practice the variance reduction is often significant, hence yielding an overall better model.

# RandomForestClassifier

In [None]:
A random forest is a meta-estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.

Parameters

# min_weight_fraction_leaf

In [None]:
The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
Samples have equal weight when sample_weight is not provided.

# min_impurity_decrease

In [None]:
weighted impurity decrease = N_t / N * (impurity - N_t_R / N_t * right_impurity - N_t_L / N_t * left_impurity)
where N is the total number of samples, 
N_t is the number of samples at the current node,
N_t_L is the number of samples in the left child, 
and N_t_R is the number of samples in the right child.

example
N_t = 26
N = 90
N_t_R = 4
N_t_L = 22
impurity = 0.2041
right impurity = 0.375
left impurity = 0

In [7]:
a = 26/90  # N_t / N
impurity = 0.2041
c = 4/26   # N_t_R / N_t
d = 22/26  # N_t_L / N_t
right_impurity = 0.375
left_impurity = 0

In [8]:
min_impurity_decrease = a*(impurity - (c*right_impurity) - (d*left_impurity))

In [9]:
min_impurity_decrease

0.042295555555555545

In [None]:
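A node will be split if the split induces a decrease of the impurity greater than or equal to min_impurity_decrease. Here the decrease, 0.042295555555555545, is greater than the default threshold of 0.0, so the node is split.
With the default of 0.0 the tree is allowed to grow to its full depth.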
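The same calculation can be wrapped in a small helper. The function name and argument names are hypothetical; only the formula and the numbers come from the example above.

```python
def weighted_impurity_decrease(n, n_t, n_t_l, n_t_r, impurity, left_impurity, right_impurity):
    """Weighted impurity decrease of a candidate split, following the formula above."""
    return n_t / n * (
        impurity
        - n_t_r / n_t * right_impurity
        - n_t_l / n_t * left_impurity
    )

decrease = weighted_impurity_decrease(
    n=90, n_t=26, n_t_l=22, n_t_r=4,
    impurity=0.2041, left_impurity=0.0, right_impurity=0.375,
)
threshold = 0.0  # default value of min_impurity_decrease
will_split = decrease >= threshold
print(decrease, will_split)
```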

# bootstrap and  oob_score

In [None]:
bootstrap
1) Fix the sample size.
2) Randomly choose a data point for the sample.
3) After selection, put it back in the main training set (replacement).
4) Choose another data point from the main training set and, after selection, put it back again.
5) Repeat the above steps until the sample reaches the specified size.

oob_score

Because the data points of each sample are chosen randomly and with replacement, some data points never make it into that particular sample; these are known as out-of-bag (OOB) points.
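The sampling procedure and the resulting out-of-bag points can be sketched with NumPy. The training-set size of 10 and the seed are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 10                      # hypothetical training-set size
indices = np.arange(n_samples)

# Draw n_samples indices with replacement: every drawn point is "put back".
in_bag = rng.choice(indices, size=n_samples, replace=True)

# Points never drawn for this particular sample are its out-of-bag points.
out_of_bag = np.setdiff1d(indices, in_bag)
print(sorted(in_bag.tolist()), out_of_bag.tolist())
```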

In [None]:
Because each tree is scored only on points it never saw during training, the OOB score avoids data leakage, so it can be used to validate the model without a separate validation set.

In [None]:
No leakage of data, less variance, less computation.
Better predictive model: for small and medium size data sets.


# warm_start

In [None]:
warm_start is useful when you need to increase the number of estimators without refitting from scratch.
When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.
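A short sketch of growing a forest incrementally with warm_start=True. The tree counts (10, then 20) and the dataset are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(n_estimators=10, warm_start=True, random_state=0)
forest.fit(X, y)
first_trees = list(forest.estimators_)

# Ask for more trees, then fit again: the first 10 trees are kept
# and only the 10 additional ones are trained.
forest.set_params(n_estimators=20)
forest.fit(X, y)
print(len(forest.estimators_))
```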


# class_weight

In [None]:
Weights associated with classes, in the form {class_label: weight}. If not given (None), all classes are supposed to have weight one.

It penalizes mistakes on samples of class[i] with class_weight[i] instead of 1.
So a higher class weight means you want to put more emphasis on a class.

Example
Suppose class 0 is 19 times more frequent than class 1.
Then you should increase the class_weight of class 1 relative to class 0, say {0: .1, 1: .9}.

For estimators that have a regularization parameter, class weights that do not sum to 1 effectively change that parameter; for tree-based models it is mainly the ratio between the weights that matters.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)). Use it for unbalanced datasets.

For multi-output problems, a list of dicts can be provided in the same order as the columns of y, and the weights of each column of y will be multiplied.
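The "balanced" formula can be evaluated directly. The label vector below is hypothetical and reproduces the 19-to-1 imbalance from the example. compute_class_weight is scikit-learn's helper for the same heuristic.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: class 0 is 19 times more frequent than class 1.
y = np.array([0] * 95 + [1] * 5)

n_samples, n_classes = len(y), len(np.unique(y))
manual = n_samples / (n_classes * np.bincount(y))

# scikit-learn's helper for the same "balanced" heuristic.
balanced = compute_class_weight("balanced", classes=np.unique(y), y=y)
print(manual, balanced)
```

The rare class receives a weight 19 times larger than the frequent one, which mirrors the class imbalance.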



In [None]:
 ccp_alpha

In [None]:
Minimal cost-complexity pruning recursively finds the node with the “weakest link”.
As alpha increases, more of the tree is pruned, which increases the total impurity of its leaves.


This algorithm is parameterized by $\alpha \ge 0$, known as the complexity parameter.

The complexity parameter is used to define the cost-complexity measure $R_\alpha(T)$ of a given tree $T$: $R_\alpha(T) = R(T) + \alpha|T|$

where $|T|$ is the number of terminal nodes in $T$ and $R(T)$ is traditionally defined as the total misclassification rate of the terminal nodes.

DecisionTreeClassifier in sklearn has a method called cost_complexity_pruning_path, which returns the effective alphas of the subtrees during pruning together with the corresponding total leaf impurities. In other words, we can use these values of alpha to prune our decision tree.


In [None]:
procedure

In [None]:
1) Fit the data using all the default settings of the tree.
2) Find the maximum depth and the candidate ccp_alpha values using cost_complexity_pruning_path(X, y[, …]) and the get_depth() function.
3) Use GridSearchCV over these parameters (depth, ccp_alpha) to find the best combination.
4) Pass those parameters to RandomForest.
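The four steps above can be sketched as follows. The dataset, the 5-fold search and the random seeds are assumptions, and whether parameters tuned on a single tree carry over well to a forest is not guaranteed.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# 1) Fit a tree with default settings.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# 2) Maximum depth reached and candidate alphas from the pruning path.
max_depth = tree.get_depth()
path = tree.cost_complexity_pruning_path(X, y)
# Clip tiny negative rounding errors; drop the last alpha, which prunes the tree to its root.
alphas = [float(a) for a in np.unique(np.clip(path.ccp_alphas[:-1], 0.0, None))]

# 3) Grid search over depth and ccp_alpha.
param_grid = {"max_depth": list(range(1, max_depth + 1)), "ccp_alpha": alphas}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

# 4) Reuse the best parameters in a random forest.
forest = RandomForestClassifier(random_state=0, **search.best_params_).fit(X, y)
print(search.best_params_)
```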

# RandomForestClassifier

In [None]:
RandomForestClassifier(n_estimators=100,  # The number of trees in the forest.
                       criterion='gini',  # The function to measure the quality of a split.
                       max_depth=None,  # The maximum depth of the tree (None: nodes are expanded until all leaves are pure or too small to split).
                       min_samples_split=2,  # The minimum number of samples required to split an internal node (default).
                       min_samples_leaf=1,  # The minimum number of samples required to be at a leaf node (default).
                       min_weight_fraction_leaf=0.0,  # Samples have equal weight when sample_weight is not provided.
                       max_features='auto',  # The number of features to consider when looking for the best split ('auto' meant sqrt(n_features) here; removed in scikit-learn 1.3).
                       max_leaf_nodes=None,  # If set, grow trees with max_leaf_nodes in best-first fashion.
                       min_impurity_decrease=0.0,  # A node will be split if the impurity decrease is greater than or equal to this value.
                       min_impurity_split=0.0,  # A node will split if its impurity is above the threshold (deprecated; removed in scikit-learn 1.0).
                       bootstrap=True,  # Whether bootstrap samples are used when building trees; if False, the whole dataset is used for each tree.
                       oob_score=True,  # Whether to use out-of-bag samples to estimate the generalization score (requires bootstrap=True).
                       n_jobs=None,  # The number of jobs to run in parallel.
                       random_state=0,  # Controls the randomness of the bootstrapping of the samples and of the features sampled at each split.
                       verbose=0,  # Controls the verbosity when fitting and predicting.
                       warm_start=False,  # False: fit a whole new forest.
                       class_weight=None,  # None: all classes are supposed to have weight one.
                       ccp_alpha=0.0,  # Complexity parameter used for Minimal Cost-Complexity Pruning.
                       max_samples=None  # If bootstrap is True, the number of samples to draw from X to train each base estimator.
                      )

# RandomForestRegressor

In [None]:
RandomForestRegressor(n_estimators=100,  # The number of trees in the forest.
                      criterion='mse',  # The function to measure the quality of a split (renamed 'squared_error' in scikit-learn 1.0).
                      max_depth=None,  # The maximum depth of the tree (None: nodes are expanded until all leaves are pure or too small to split).
                      min_samples_split=2,  # The minimum number of samples required to split an internal node (default).
                      min_samples_leaf=1,  # The minimum number of samples required to be at a leaf node (default).
                      min_weight_fraction_leaf=0.0,  # Samples have equal weight when sample_weight is not provided.
                      max_features='auto',  # The number of features to consider when looking for the best split ('auto' meant n_features here; removed in scikit-learn 1.3).
                      max_leaf_nodes=None,  # If set, grow trees with max_leaf_nodes in best-first fashion.
                      min_impurity_decrease=0.0,  # A node will be split if the impurity decrease is greater than or equal to this value.
                      min_impurity_split=None,  # A node will split if its impurity is above the threshold (deprecated; removed in scikit-learn 1.0).
                      bootstrap=True,  # Whether bootstrap samples are used when building trees; if False, the whole dataset is used for each tree.
                      oob_score=False,  # Whether to use out-of-bag samples to estimate the generalization score (requires bootstrap=True).
                      n_jobs=None,  # The number of jobs to run in parallel.
                      random_state=None,  # Controls the randomness of the bootstrapping of the samples and of the features sampled at each split.
                      verbose=0,  # Controls the verbosity when fitting and predicting.
                      warm_start=False,  # False: fit a whole new forest.
                      ccp_alpha=0.0,  # Complexity parameter used for Minimal Cost-Complexity Pruning.
                      max_samples=None  # If bootstrap is True, the number of samples to draw from X to train each base estimator.
                     )

# Extremely Randomized Trees

In [None]:
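Random forest looks for the most discriminative threshold for each candidate feature, while Extra-Trees draws a threshold at random for each candidate feature and picks the best of these randomly generated thresholds as the splitting rule.
This usually reduces the variance of the model a bit more, at the expense of a slightly greater increase in bias.
As with random forests, averaging over the trees improves the predictive accuracy and controls over-fitting.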

In [None]:
ExtraTreesClassifier(n_estimators=100,#The number of trees in the forest.
                     criterion='gini',
                     max_depth=None, 
                     min_samples_split=2, 
                     min_samples_leaf=1, 
                     min_weight_fraction_leaf=0.0, 
                     max_features='auto',
                     max_leaf_nodes=None,
                     min_impurity_decrease=0.0, 
                     min_impurity_split=None,
                     bootstrap=False, #the whole dataset is used to build each tree.
                     oob_score=False, # Only available if bootstrap=True.
                     n_jobs=None, 
                     random_state=None, 
                     verbose=0, 
                     warm_start=False, 
                     class_weight=None, 
                     ccp_alpha=0.0,
                     max_samples=None
                    )

In [None]:
ExtraTreesRegressor(n_estimators=100,
                    criterion='mse',
                    max_depth=None,
                    min_samples_split=2, 
                    min_samples_leaf=1, 
                    min_weight_fraction_leaf=0.0, 
                    max_features='auto', 
                    max_leaf_nodes=None, 
                    min_impurity_decrease=0.0, 
                    min_impurity_split=None, 
                    bootstrap=False, 
                    oob_score=False, 
                    n_jobs=None, 
                    random_state=None, 
                    verbose=0, 
                    warm_start=False, 
                    ccp_alpha=0.0, 
                    max_samples=None)

# Boosting methods

In boosting methods, base estimators are built sequentially and one tries to reduce the bias of the combined estimator; the estimators work like a series rather than in parallel.
The motivation is to combine several weak models to produce a powerful ensemble.

# AdaBoost

In [None]:
The core principle of AdaBoost is to fit a sequence of weak learners  on repeatedly modified versions of the data.

In [None]:
The predictions from all of them are then combined through a weighted majority vote (or sum) to produce the final prediction

In [None]:
The first step simply trains a weak learner on the original data, with every sample weight initialized to 1/N.

For each successive iteration, the sample weights are individually modified and the learning algorithm is reapplied to the reweighted data.

At a given step, those training examples that were incorrectly predicted by the boosted model induced at the previous step have their weights increased, whereas the weights are decreased for those that were predicted correctly.

As iterations proceed, examples that are difficult to predict receive ever-increasing influence.
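One reweighting round can be sketched with NumPy. This follows the classic discrete (binary) AdaBoost update and is not scikit-learn's SAMME code; the labels and predictions are made up for illustration.

```python
import numpy as np

# Hypothetical round: 5 training examples, the weak learner misses the last two.
y_true = np.array([1, -1, 1, 1, -1])
y_pred = np.array([1, -1, 1, -1, 1])

weights = np.full(len(y_true), 1 / len(y_true))   # initial weights 1/N
missed = y_pred != y_true

error = np.sum(weights[missed])                    # weighted error rate
alpha = 0.5 * np.log((1 - error) / error)          # vote strength of this weak learner

# Raise the weights of the mistakes, lower the others, then renormalize.
new_weights = weights * np.exp(np.where(missed, alpha, -alpha))
new_weights /= new_weights.sum()
print(new_weights)
```

After the update the misclassified examples together carry half of the total weight, so the next weak learner is pushed to concentrate on them.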

In [None]:
base_estimator (weak learner)

In [None]:
Weak classifier: formally, a classifier that achieves slightly better than 50 percent accuracy (binary classification).

Weak learner: a model that performs slightly better than a naive model. The term has been generalized to multi-class classification, where it means more than just better than 50 percent accuracy.

Decision stump: a decision tree with a single node operating on one input variable, the output of which makes a prediction directly.

Examples of weak learning models:
k-Nearest Neighbors, with k=1 operating on one or a subset of input variables.
DecisionTreeClassifier initialized with max_depth=1.
Naive Bayes, operating on a single input variable.
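As an illustration, the sketch below fits a single decision stump and an AdaBoost ensemble that uses its default base estimator (a depth-1 tree). The dataset, split and n_estimators=50 are assumptions; no particular accuracy is claimed.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

# A single decision stump: one split on one input variable.
stump = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_train, y_train)

# AdaBoost's default base estimator is the same kind of stump.
boosted = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

stump_acc = stump.score(X_test, y_test)
boosted_acc = boosted.score(X_test, y_test)
print(stump_acc, boosted_acc)
```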


In [None]:
algorithm

In [None]:
For multi-class classification, AdaBoostClassifier implements AdaBoost-SAMME and AdaBoost-SAMME.R.
For regression, AdaBoostRegressor implements AdaBoost.R2.

In [None]:
AdaBoostClassifier(base_estimator=None,  # If None, the base estimator is a DecisionTreeClassifier initialized with max_depth=1.
                   n_estimators=50,  # The maximum number of estimators at which boosting is terminated.
                   learning_rate=1.0,  # Weight applied to each classifier at each boosting iteration.
                   algorithm='SAMME.R',  # Typically reaches a lower test error with fewer boosting iterations than 'SAMME'.
                   random_state=None  # Controls the random seed given to each base_estimator at each boosting iteration.
                  )

learning_rate

In [None]:
A higher learning rate increases the contribution of each classifier. 

In [None]:
AdaBoostRegressor(base_estimator=None,  # The base estimator from which the boosted ensemble is built.
                  n_estimators=50,  # The maximum number of estimators at which boosting is terminated.
                  learning_rate=1.0,  # Weight applied to each regressor at each boosting iteration.
                  loss='linear',  # The loss function to use when updating the weights after each boosting iteration.
                  random_state=None  # Controls the random seed given to each base_estimator at each boosting iteration.
                 )

# Voting Classifier

In [None]:
Combine conceptually different machine learning classifiers and use a majority vote (hard vote) or the average predicted probabilities (soft vote) to predict the class labels.

Such a classifier can be useful for a set of equally well performing models in order to balance out their individual weaknesses.

The two voting modes are:

Majority Class Labels (Majority/Hard Voting)

Weighted Average Probabilities (Soft Voting)

In [None]:
Majority Class Labels (Majority/Hard Voting)

In [None]:
If ‘hard’, uses predicted class labels for majority rule voting.

In [None]:
In majority voting, the predicted class label for a particular sample is the class label that represents the majority (mode)
of the class labels predicted by each individual classifier.

For example, if the predictions for a given sample are

classifier 1 -> class 1

classifier 2 -> class 1

classifier 3 -> class 2

the VotingClassifier (with voting='hard') would classify the sample as “class 1” based on the majority class label.
An optional sequence of weights (float or int) can be used to weight the occurrences of predicted class labels.
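The majority rule for the three-classifier example can be reproduced with NumPy. This is an illustrative sketch of the counting step only, with uniform weights assumed.

```python
import numpy as np

# Class labels predicted for one sample by the three classifiers in the example.
predictions = np.array([1, 1, 2])
weights = np.array([1, 1, 1])      # uniform weights, as with weights=None

# Weighted count of votes per class label; the label with the most votes wins.
votes = np.bincount(predictions, weights=weights)
majority_label = int(np.argmax(votes))
print(majority_label)
```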

In [None]:
voting

In [None]:
If ‘hard’, uses predicted class labels for majority rule voting.

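Weighted Average Probabilities (Soft Voting)
Soft voting returns the class label as the argmax of the sum of predicted probabilities.
Specific weights can be assigned to each classifier via the weights parameter: the predicted class probabilities of each classifier are multiplied by the classifier weight and averaged, and the final class label is the one with the highest average probability.

Soft voting is recommended for an ensemble of well-calibrated classifiers.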
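A small numeric sketch of soft voting. The probabilities and the uniform classifier weights are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical predicted probabilities: rows are classifiers, columns are classes 1..3.
probas = np.array([
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
])
weights = np.array([1.0, 1.0, 1.0])   # hypothetical classifier weights

# Weighted average probability per class, then argmax.
avg_proba = np.average(probas, axis=0, weights=weights)
class_labels = np.array([1, 2, 3])
soft_label = int(class_labels[np.argmax(avg_proba)])
print(avg_proba.round(2), soft_label)
```

Changing the weights can change the winner: a large enough weight on the second classifier would tip the result toward class 1.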

In [None]:
weights

In [None]:
Sequence of weights (float or int) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting).

array-like of shape (n_classifiers,), default=None
Uses uniform weights if None.
The number of estimators and the number of weights must be equal.

In [None]:
# voting='hard'
VotingClassifier(estimators,
                 voting='hard',  # 'hard' uses predicted class labels for majority rule voting.
                 weights=None,  # None uses uniform weights for the votes.
                 n_jobs=None,  # The number of jobs to run in parallel for fit.
                 flatten_transform=False,  # Only affects the shape of the transform output when voting='soft'.
                 verbose=False
                )

# voting='soft'
VotingClassifier(estimators,
                 voting='soft',  # 'soft' predicts the class label based on the argmax of the sums of the predicted probabilities.
                 weights=None,  # None uses uniform weights when averaging the class probabilities.
                 n_jobs=None,  # The number of jobs to run in parallel for fit.
                 flatten_transform=True,  # True: transform returns (n_samples, n_classifiers * n_classes); False: (n_classifiers, n_samples, n_classes).
                 verbose=False
                )

# EXAMPLE

In [17]:
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
import pandas as pd

In [18]:
iris = datasets.load_iris()
X=pd.DataFrame(iris.data)
y=pd.DataFrame(iris.target)

In [19]:
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                   random_state=1)


In [21]:
count = y_test[0].value_counts()

In [22]:
count

2    13
1    13
0    12
Name: 0, dtype: int64

# Parameters used for VotingClassifier

In [None]:

weights=None  # uniform weights: every classifier's vote counts equally
flatten_transform=False  # affects the shape of the transform output (soft voting only)
voting='hard'  # uses predicted class labels for majority rule voting

# estimators

In [32]:
# GaussianNB
gnb = GaussianNB(priors=None,  # Prior probabilities of the classes; estimated from the data when None.
                 var_smoothing=1e-09  # Portion of the largest variance of all features added to variances for calculation stability.
                )


# RandomForestClassifier
Ran = RandomForestClassifier(n_estimators=100,  # The number of trees in the forest.
                       criterion='gini',  # The function to measure the quality of a split.
                       max_depth=None,  # The maximum depth of the tree (None: nodes are expanded until all leaves are pure or too small to split).
                       min_samples_split=2,  # The minimum number of samples required to split an internal node (default).
                       min_samples_leaf=1,  # The minimum number of samples required to be at a leaf node (default).
                       min_weight_fraction_leaf=0.0,  # Samples have equal weight when sample_weight is not provided.
                       max_features='auto',  # The number of features to consider when looking for the best split ('auto' meant sqrt(n_features) here; removed in scikit-learn 1.3).
                       max_leaf_nodes=None,  # If set, grow trees with max_leaf_nodes in best-first fashion.
                       min_impurity_decrease=0.0,  # A node will be split if the impurity decrease is greater than or equal to this value.
                       min_impurity_split=0.0,  # A node will split if its impurity is above the threshold (deprecated; removed in scikit-learn 1.0).
                       bootstrap=True,  # Whether bootstrap samples are used when building trees; if False, the whole dataset is used for each tree.
                       oob_score=True,  # Whether to use out-of-bag samples to estimate the generalization score (requires bootstrap=True).
                       n_jobs=None,  # The number of jobs to run in parallel.
                       random_state=0,  # Controls the randomness of the bootstrapping of the samples and of the features sampled at each split.
                       verbose=0,  # Controls the verbosity when fitting and predicting.
                       warm_start=False,  # False: fit a whole new forest.
                       class_weight=None,  # None: all classes are supposed to have weight one.
                       ccp_alpha=0.0,  # Complexity parameter used for Minimal Cost-Complexity Pruning.
                       max_samples=None  # If bootstrap is True, the number of samples to draw from X to train each base estimator.
                      )

# Logistic regression
log = LogisticRegression(penalty='l2',  # Norm used in the penalization.
                       dual=False,  # Prefer dual=False when n_samples > n_features.
                       tol=0.0001,  # The tolerance for the stopping criterion.
                       fit_intercept=True,  # Add an intercept; the data is not expected to be centered.
                       class_weight=None,  # None: all classes are supposed to have weight one.
                       random_state=0,
                       max_iter=1000,  # Maximum number of iterations taken for the solvers to converge.
                       multi_class='multinomial',  # Multinomial loss for the multi-class problem.
                       verbose=0, warm_start=False, n_jobs=None)

In [33]:
estimators1=[('log', log), ('Ran', Ran), ('gnb', gnb)]

In [34]:
V1=VotingClassifier(estimators=estimators1, 
                 voting='hard', # If 'hard', uses predicted class labels for majority rule voting
                 weights=None, # Uses uniform weights if None
                 n_jobs=None, # The number of jobs to run in parallel for fit
                 flatten_transform=False, # Only affects the shape of the transform output when voting='soft'; if False, it returns (n_classifiers, n_samples, n_classes)
                 verbose=False
                )

In [35]:
V1.fit(X_train, y_train) 





VotingClassifier(estimators=[('log',
                              LogisticRegression(max_iter=1000,
                                                 multi_class='multinomial',
                                                 random_state=0)),
                             ('Ran',
                              RandomForestClassifier(min_impurity_split=0.0,
                                                     oob_score=True,
                                                     random_state=0)),
                             ('gnb', GaussianNB())],
                 flatten_transform=False)

In [39]:
V1.score(X_train, y_train) # Mean accuracy on the given data and labels (here, the training set)


0.9821428571428571

In [40]:
V1.predict(X_test) 

array([2, 0, 0, 0, 1, 0, 1, 1, 0, 1, 2, 2, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2,
       2, 1, 0, 0, 0, 1, 2, 0, 0, 2, 1, 0, 0, 1, 2, 2])
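
The majority rule behind `voting='hard'` can be sketched in a few lines of NumPy. This is only a minimal illustration, not scikit-learn's implementation; the `hard_vote` helper and the toy label arrays are hypothetical and are not taken from the notebook above.

```python
import numpy as np

def hard_vote(predictions, weights=None):
    """Majority vote over class labels.

    predictions: array of shape (n_classifiers, n_samples) with integer labels.
    weights: optional per-classifier weights; uniform if None.
    """
    predictions = np.asarray(predictions)
    n_classes = predictions.max() + 1
    # For each sample (column), count the (weighted) votes per class and keep the winner.
    # On a tie, argmax keeps the smallest class label.
    return np.array([
        np.bincount(col, weights=weights, minlength=n_classes).argmax()
        for col in predictions.T
    ])

# Hypothetical predictions of three classifiers on four samples.
toy_preds = [
    [0, 1, 2, 2],  # e.g. logistic regression
    [0, 1, 1, 2],  # e.g. random forest
    [1, 1, 1, 0],  # e.g. Gaussian naive Bayes
]
majority = hard_vote(toy_preds)
```

Passing `weights` lets a trusted classifier outvote the others, which is what the `weights` parameter of `VotingClassifier` controls.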

# VotingRegressor

In [None]:
A voting regressor combines conceptually different machine learning regressors and returns the average of their predicted values.

In [None]:
weights

In [None]:
Sequence of weights (float or int) used to weight the predicted values of each estimator before averaging.
Uniform weights are used if None.
The number of weights must equal the number of `estimators`.

In [None]:
VotingRegressor(estimators, 
                weights=None, #uniform weights if None.
                n_jobs=None,
                verbose=False)
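
The effect of `weights` on the averaged prediction can be illustrated with plain NumPy. This is a sketch under the assumption that the combination is a weighted mean normalized by the sum of the weights; the `weighted_vote` helper and the toy predictions are hypothetical.

```python
import numpy as np

def weighted_vote(predictions, weights=None):
    """Weighted mean of regressor predictions, shape (n_regressors, n_samples)."""
    return np.average(np.asarray(predictions, dtype=float), axis=0, weights=weights)

# Hypothetical predictions of three regressors on two samples.
toy_preds = [
    [10.0, 20.0],
    [20.0, 20.0],
    [30.0, 50.0],
]
uniform = weighted_vote(toy_preds)                     # plain mean of the three rows
weighted = weighted_vote(toy_preds, [0.1, 0.2, 0.3])   # weights are normalized by their sum
```

Because the weights are normalized, only their ratios matter: `[0.1, 0.2, 0.3]` and `[1, 2, 3]` give the same result, and the weights do not need to sum to one.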

# Example

In [52]:
from sklearn.datasets import load_boston # removed in scikit-learn 1.2; this import needs an older version
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.ensemble import VotingRegressor
import pandas as pd
import numpy as np

# weak learner

In [53]:

K_neigh_r = KNeighborsRegressor(n_neighbors=2)

reg=LinearRegression( fit_intercept=True, # Calculate the intercept for this model
                     normalize=True, # The regressors X will be normalized before regression
                    )


# AdaBoost

In [54]:
ada_boost_1=AdaBoostRegressor(base_estimator=K_neigh_r, # The base estimator from which the boosted ensemble is built
                  n_estimators=50, # The maximum number of estimators at which boosting is terminated
                  learning_rate=1.0, # Weight applied to each regressor at each boosting iteration
                  loss='linear', # The loss function to use when updating the weights after each boosting iteration
                  random_state=None # Controls the random seed given to each base_estimator at each boosting iteration
                 )

ada_boost_2=AdaBoostRegressor(base_estimator=reg, # The base estimator from which the boosted ensemble is built
                  n_estimators=50, # The maximum number of estimators at which boosting is terminated
                  learning_rate=1.0, # Weight applied to each regressor at each boosting iteration
                  loss='linear', # The loss function to use when updating the weights after each boosting iteration
                  random_state=None # Controls the random seed given to each base_estimator at each boosting iteration
                 )


ada_boost_3=AdaBoostRegressor(base_estimator=None, # If None, the base estimator is DecisionTreeRegressor(max_depth=3)
                  n_estimators=50, # The maximum number of estimators at which boosting is terminated
                  learning_rate=1.0, # Weight applied to each regressor at each boosting iteration
                  loss='linear', # The loss function to use when updating the weights after each boosting iteration
                  random_state=None # Controls the random seed given to each base_estimator at each boosting iteration
                 )
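
To make the `loss` and `learning_rate` comments above more concrete, here is a simplified sketch of the weight update in a single boosting round with the linear loss, following the AdaBoost.R2 scheme that `AdaBoostRegressor` is based on. It is not the library's code: it omits the weighted resampling of the training set, early termination, and the case where every prediction is exact. The `adaboost_r2_round` helper and the toy arrays are hypothetical.

```python
import numpy as np

def adaboost_r2_round(y_true, y_pred, sample_weight, learning_rate=1.0):
    """One simplified AdaBoost.R2 round with the linear loss.

    Returns the updated (normalized) sample weights and the estimator weight.
    Assumes at least one non-zero error and an average loss below 0.5.
    """
    abs_err = np.abs(np.asarray(y_pred) - np.asarray(y_true))
    loss = abs_err / abs_err.max()            # linear loss, scaled to [0, 1]
    avg_loss = np.sum(sample_weight * loss)   # weighted average loss of this estimator
    beta = avg_loss / (1.0 - avg_loss)        # a small beta means an accurate estimator
    estimator_weight = learning_rate * np.log(1.0 / beta)
    # Well-predicted samples (loss near 0) are scaled down by beta;
    # the worst-predicted sample (loss 1) keeps its weight before normalization.
    new_weight = sample_weight * beta ** ((1.0 - loss) * learning_rate)
    return new_weight / new_weight.sum(), estimator_weight

y_true_toy = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_toy = np.array([1.0, 2.5, 3.0, 6.0])   # hypothetical predictions of one weak learner
w0 = np.full(4, 0.25)                         # start from uniform sample weights
w1, alpha = adaboost_r2_round(y_true_toy, y_pred_toy, w0)
```

After the round, the badly predicted fourth sample carries the largest weight, so the next weak learner concentrates on it.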

In [55]:
estimators2=[('ada_boost_1', ada_boost_1), ('ada_boost_2', ada_boost_2), ('ada_boost_3', ada_boost_3)]

# create model

In [60]:
V_r=VotingRegressor(estimators=estimators2,weights= [0.1,0.2,0.3])

# load dataset

In [61]:

da1= load_boston()
X=da1.data
y=da1.target
scaled_X = preprocessing.scale(X)
X_train, X_test, y_train, y_test = train_test_split(scaled_X, y)
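
What `preprocessing.scale` does to `X` can be sketched with NumPy: each column is centred to mean zero and divided by its standard deviation. This is a minimal illustration that assumes no column is constant; the `standardize` helper and the toy matrix are hypothetical.

```python
import numpy as np

def standardize(X):
    """Centre each column to mean 0 and scale it to unit (population) standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

toy_X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])  # hypothetical data
toy_scaled = standardize(toy_X)
```

Putting the features on a common scale matters mostly for the distance-based `KNeighborsRegressor` used as a base estimator here.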

In [62]:
model4=V_r.fit(X_train, y_train)

In [63]:
y_predict=model4.predict(X_test)

In [65]:
model4.score(X_train, y_train, sample_weight=None) # Returns the coefficient of determination R^2 of the prediction (here, on the training set)

0.9120136463046514

In [67]:

r2_score(y_test,y_predict)


0.8345345410159364

In [None]:
# Both bias and variance look acceptable: the training R^2 (about 0.91) and the test R^2 (about 0.83) are reasonably close, so the model is usable.
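
The coefficient of determination returned by `score` and `r2_score` can be reproduced by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses a hypothetical `r2_manual` helper and toy values, not the Boston data.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - ss_res / ss_tot

toy_true = [3.0, 5.0, 7.0, 9.0]   # hypothetical targets
toy_pred = [2.5, 5.0, 7.5, 9.0]   # hypothetical predictions
toy_r2 = r2_manual(toy_true, toy_pred)
```

A perfect prediction gives 1.0, and always predicting the mean of the targets gives 0.0, which is why a gap between the training and test values is read as a sign of variance.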