## **Boosting**

In machine learning, boosting is an ensemble meta-algorithm that primarily reduces bias, and also variance, in supervised learning. The term also covers the family of machine learning algorithms that convert weak learners into strong ones.

By [Muhammad Huzaifa Shahbaz](https://www.linkedin.com/in/mhuzaifadev)
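To make "converting weak learners into strong ones" concrete, here is a minimal sketch of a single re-weighting round in textbook discrete AdaBoost. The labels and predictions are made-up toy values, and this illustrates the general idea rather than scikit-learn's exact internals.

```python
import math

# Toy labels in {-1, +1} and one weak learner's predictions (made-up values).
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, -1, -1]   # the last sample is misclassified

# Start from uniform sample weights.
weights = [1 / len(y_true)] * len(y_true)

# Weighted error of the weak learner: total weight of its mistakes.
error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)

# Learner weight: how much this learner's vote counts in the ensemble.
alpha = 0.5 * math.log((1 - error) / error)

# Up-weight mistakes, down-weight correct predictions, then renormalise.
weights = [w * math.exp(-alpha * t * p)
           for w, t, p in zip(weights, y_true, y_pred)]
total = sum(weights)
weights = [w / total for w in weights]

print(round(alpha, 3))
print([round(w, 3) for w in weights])
```

After this round the misclassified sample carries half of the total weight, so the next weak learner is pushed to get it right.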

## **Importing Libraries**

In [0]:
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split, GridSearchCV

## **Dataset Loading**

We will load the data from a semicolon-separated CSV file.

In [0]:
data = pd.read_csv("./wine.csv",sep = ";")
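The `sep = ";"` argument matters because pandas assumes commas by default. The sketch below uses a made-up two-row sample (not the real `wine.csv`) to show what happens with and without it.

```python
import io
import pandas as pd

# Made-up sample in a semicolon-separated layout (illustrative only).
raw = "fixed acidity;alcohol;quality\n7.4;9.4;5\n7.8;12.8;7\n"

wrong = pd.read_csv(io.StringIO(raw))            # default sep=","
right = pd.read_csv(io.StringIO(raw), sep=";")   # matches the file layout

print(wrong.shape)           # every row collapses into a single column
print(right.shape)           # three separate columns
print(list(right.columns))
```

Checking `data.shape` or `data.columns` right after loading is a cheap way to catch a wrong separator.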

## **Label Filtering Function: isTasty(quality)**

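Converting the regression target into a binary classification label: a wine with a quality score of 7 or higher is labelled tasty (1), and any other wine is labelled not tasty (0).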

In [0]:
def isTasty(quality):
  # Return 1 (tasty) for a quality score of 7 or higher, otherwise 0.
  return 1 if quality >= 7 else 0
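As an alternative to calling the function row by row, the same threshold can be applied to a whole column at once. This is a sketch on made-up scores. The `isTasty` definition is repeated only so the snippet runs on its own.

```python
import pandas as pd

# Made-up quality scores (not taken from the dataset).
quality = pd.Series([3, 5, 6, 7, 8, 9])

def isTasty(q):
    return 1 if q >= 7 else 0

via_apply = quality.apply(isTasty)          # row-by-row, as in this notebook
via_vector = (quality >= 7).astype(int)     # vectorised comparison

print(via_vector.tolist())
print(via_vector.mean())   # share of positive labels
```

Both approaches give identical labels. The mean of the label column is a quick check of how imbalanced the two classes are.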

## **Features Extraction**

To apply a classifier to this data, we need to extract the feature columns and the target column, then split them into training and test sets.

In [0]:
# Input columns: the 11 physicochemical measurements.
features = data[["fixed acidity","volatile acidity","citric acid","residual sugar","chlorides","free sulfur dioxide","total sulfur dioxide","density","pH","sulphates","alcohol"]]
# Binary target derived from the quality score.
data['tasty'] = data['quality'].apply(isTasty)
targets = data['tasty']

# Hold out 20% of the rows for testing.
feature_train, feature_test, target_train, target_test = train_test_split(features, targets, test_size=0.2)
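The split above is random on every run, so results will vary between executions. A hedged variation, shown here on synthetic stand-in data rather than the wine data, fixes `random_state` for reproducibility and uses `stratify` to keep the class ratio the same in both parts. Both are standard `train_test_split` parameters, but using them is a suggestion and not part of the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 20% positives (not the wine data).
X = np.arange(100).reshape(-1, 1)
y = np.array([1] * 20 + [0] * 80)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    stratify=y,        # keep the positive/negative ratio in both splits
    random_state=0,    # make the split repeatable
)

print(y_train.mean(), y_test.mean())
```

With an imbalanced target such as `tasty`, stratifying avoids a test set that happens to contain too few positive examples.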

## **Finding an Optimal Value**

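The grid search cross-validates every combination of `n_estimators` and `learning_rate` in the grid. It may take up to 45 minutes to run on Colab, or possibly hours on a local desktop, depending on processing power.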

In [0]:
param_grid = {
    'n_estimators' : [50,100,200,300,500,1000],
    'learning_rate' : [0.01,0.05,0.1,0.3,1],
    }

grid_search = GridSearchCV(estimator=AdaBoostClassifier(), param_grid=param_grid,cv=10)
grid_search.fit(feature_train, target_train)

print(grid_search.best_params_)

optimal_estimators = grid_search.best_params_.get("n_estimators")
optimal_lrate = grid_search.best_params_.get("learning_rate")

{'learning_rate': 0.3, 'n_estimators': 300}
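To see why the search is slow, the sketch below counts the work implied by the grid and the `cv=10` setting used above. The weak-learner total is an upper bound, since boosting can stop early on some fits.

```python
from itertools import product

param_grid = {
    'n_estimators': [50, 100, 200, 300, 500, 1000],
    'learning_rate': [0.01, 0.05, 0.1, 0.3, 1],
}
cv_folds = 10

# Every pairing of n_estimators and learning_rate is one candidate.
combinations = list(product(param_grid['n_estimators'],
                            param_grid['learning_rate']))

# Each candidate is fitted once per cross-validation fold.
total_fits = len(combinations) * cv_folds

# Upper bound on the number of weak learners trained during the search.
max_weak_learners = sum(n for n, _ in combinations) * cv_folds

print(len(combinations))
print(total_fits)
print(max_weak_learners)
```

Passing `n_jobs=-1` to `GridSearchCV` runs these fits in parallel across CPU cores, which can shorten the wall-clock time on multi-core machines.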


## **Training the Model**

We will train an AdaBoostClassifier on the training set using the optimal hyperparameters found by the grid search, then predict labels for the test set.

In [0]:
best_model = AdaBoostClassifier(n_estimators=optimal_estimators,learning_rate=optimal_lrate)
best_model.fit(feature_train, target_train)

predictions = best_model.predict(feature_test)
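With its default `refit=True`, `GridSearchCV` already retrains the best configuration on all the data it was fitted on and exposes it as `best_estimator_`. The sketch below shows this on a small synthetic dataset and a deliberately tiny grid, both chosen only to keep the example fast.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset (a stand-in, not the wine data).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

search = GridSearchCV(
    estimator=AdaBoostClassifier(random_state=0),
    param_grid={'n_estimators': [10, 20], 'learning_rate': [0.1, 1]},
    cv=3,
)
search.fit(X, y)

# The winning configuration, already refitted on all of X and y.
model = search.best_estimator_

print(search.best_params_)
print(model.predict(X[:5]))
```

Building a fresh `AdaBoostClassifier` from `best_params_`, as done above, gives a model with the same hyperparameters. Reusing `best_estimator_` simply skips the extra fit.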

## **Printing an Error Matrix and Accuracy Score**

In [0]:
print(confusion_matrix(target_test,predictions))
print(accuracy_score(target_test,predictions))

[[717  45]
 [135  83]]
0.8163265306122449
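Accuracy alone can flatter a model on imbalanced labels. As a rough check, the sketch below derives precision, recall and a majority-class baseline from the matrix printed above, assuming scikit-learn's layout (true classes in rows, predicted classes in columns).

```python
# Values copied from the confusion matrix printed above.
tn, fp = 717, 45    # true class 0 (not tasty)
fn, tp = 135, 83    # true class 1 (tasty)

total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)      # how many predicted-tasty wines are tasty
recall = tp / (tp + fn)         # how many tasty wines were found
baseline = (tn + fp) / total    # accuracy of always predicting "not tasty"

print(f"accuracy:  {accuracy:.4f}")
print(f"precision: {precision:.4f}")
print(f"recall:    {recall:.4f}")
print(f"baseline:  {baseline:.4f}")
```

On this particular run the model beats the always-"not tasty" baseline by only a few percentage points, and it recovers well under half of the tasty wines, so recall is the metric with the most room to improve.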
