# Red Wine Classification with Bagging and Random Forest

## What is bagging used for?

Bagging, also known as bootstrap aggregation, is an ensemble learning method commonly used to reduce variance when training on a noisy dataset. In bagging, random samples of the training set are drawn with replacement, meaning that an individual data point can be chosen more than once. A separate model is trained on each sample, and their predictions are combined by majority vote (classification) or averaging (regression).

  ![](https://upload.wikimedia.org/wikipedia/commons/thumb/c/c8/Ensemble_Bagging.svg/440px-Ensemble_Bagging.svg.png)
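
The "sampling with replacement" step can be illustrated with a minimal sketch. The sample size of 10 and the seed are arbitrary choices for the illustration and are not taken from the wine data.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for the illustration

n_samples = 10
indices = np.arange(n_samples)

# Draw a bootstrap sample: same size as the original, sampled with replacement
bootstrap_idx = rng.choice(indices, size=n_samples, replace=True)

# Points that were never drawn are "out-of-bag" for this sample
oob_idx = np.setdiff1d(indices, bootstrap_idx)

print("bootstrap sample:", np.sort(bootstrap_idx))
print("unique points drawn:", np.unique(bootstrap_idx).size)
print("out-of-bag points:", oob_idx)
```

Some indices typically appear more than once while others are left out. For large datasets, roughly 63% of the distinct points end up in any one bootstrap sample.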

# Importing Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
from sklearn.model_selection import train_test_split
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

# Loading dataset

In [None]:
data=pd.read_csv('../input/red-wine-quality-cortez-et-al-2009/winequality-red.csv')

In [None]:
data.head()

In [None]:
data.tail()

In [None]:
data.info()

In [None]:
data.describe()

# Checking for Missing Values

In [None]:
data.isnull().sum()

# Preprocessing Data

In [None]:
# Quality is the target variable

data.quality.value_counts()

In [None]:
data.quality.hist()

In [None]:
fig = go.Figure(data=[go.Pie(labels=data.quality.value_counts().index, values=data.quality.value_counts())])
fig.show()

## The quality scores are unevenly distributed (most wines score 5 or 6), so we convert the target into a binary label
### [3,4,5] = 0  Bad Wine
### [6,7,8] = 1  Good Wine

In [None]:
data.quality=data.quality.replace([3,4,5],0)
data.quality=data.quality.replace([6,7,8],1)
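
For scores in the range 3 to 8, the mapping above amounts to a threshold at 6. A minimal sketch on a toy Series (the variable names `scores` and `binary` are made up for the illustration):

```python
import pandas as pd

# Toy scores covering the values mapped above (3 to 8)
scores = pd.Series([3, 4, 5, 6, 7, 8])

# 1 = good wine (score >= 6), 0 = bad wine (score <= 5)
binary = (scores >= 6).astype(int)
print(binary.tolist())  # [0, 0, 0, 1, 1, 1]
```

The threshold form is also safe to re-run on the raw scores, whereas chained `replace` calls depend on the order in which they are applied.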

In [None]:
data.quality.value_counts()

In [None]:
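# Derive the labels from the value_counts index so each label matches its count
counts = data.quality.value_counts()
labels = counts.index.map({1: 'Good Red Wine', 0: 'Bad Red Wine'})
fig = go.Figure(data=[go.Pie(labels=labels, values=counts)])
fig.show()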

# Visualizing the Dataset

In [None]:
# Set the figure size before drawing; setting it afterwards does not resize this plot
plt.figure(figsize=(20, 20))
corr = data.corr()
sns.heatmap(corr, cmap="YlGnBu", annot=True)
plt.title('Heatmap for Correlation of Parameters')
plt.show()

In [None]:
data.hist(figsize=(17,12),color='red')
plt.show()

# Splitting Data into Training Data and Testing Data

In [None]:
X=data.drop(['quality'],axis=1)
y=data.quality

In [None]:
X.shape

In [None]:
X

In [None]:
y.shape

In [None]:
y

In [None]:
X_train,X_test,y_train,y_test=train_test_split(X,y,random_state=42,test_size=0.2)
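
The split above is purely random. If the class proportions should be the same in the training and test sets, `train_test_split` accepts a `stratify` argument. The sketch below uses made-up toy data (`X_toy`, `y_toy`) rather than the wine data, so the numbers are illustrative only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows, 2 features, 30% positive labels (made up for the illustration)
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 30 + [0] * 70)

# stratify=y_toy keeps the 30/70 class ratio in both parts of the split
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

print("positive share, train:", y_tr.mean())
print("positive share, test:", y_te.mean())
```

In this notebook the equivalent change would be to pass `stratify=y` to the existing call.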

In [None]:
X_train.shape

In [None]:
y_train.shape

In [None]:
X_test.shape

In [None]:
y_test.shape

# Model Building 

In [None]:
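from sklearn.ensemble import RandomForestClassifier

# Fit a random forest of 90 trees; random_state fixes the seed for reproducibility
rf = RandomForestClassifier(n_estimators=90, random_state=200)
rf.fit(X_train, y_train)
rf_predictions = rf.predict(X_test)  # predict once and reuse the result
print('Accuracy', accuracy_score(y_test, rf_predictions) * 100)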
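
A random forest is itself a bagged ensemble of decision trees, with extra randomness in the features considered at each split. For comparison, plain bagging with `BaggingClassifier` could be sketched as follows. This is an illustration under assumptions: it uses synthetic data from `make_classification` as a stand-in for the wine features, and the choice of 50 trees is arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine features (11 columns, binary target)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=11, n_informative=5, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Bag 50 decision trees, each trained on a bootstrap sample of the training set.
# The base model is passed positionally because its keyword name differs
# between scikit-learn versions.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bag.fit(X_tr, y_tr)
bag_predictions = bag.predict(X_te)
bag_accuracy = accuracy_score(y_te, bag_predictions)
print('Bagging accuracy:', bag_accuracy * 100)
```

To try this on the wine data, `X_tr`, `X_te`, `y_tr` and `y_te` would be replaced by `X_train`, `X_test`, `y_train` and `y_test` from the split above.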

In [None]:
c=confusion_matrix(y_test,rf_predictions)
a=accuracy_score(y_test,rf_predictions)
p=precision_score(y_test,rf_predictions)
r=recall_score(y_test,rf_predictions)

In [None]:
print('Confusion Matrix:\n',c)

In [None]:
print('Accuracy:',a*100)

In [None]:
print('Precision:',p*100)

In [None]:
print('Recall:',r*100)
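
To show how these three scores relate to the confusion matrix, here is a small worked example on made-up labels (`y_true` and `y_pred` are hypothetical and not taken from the model above).

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, confusion_matrix, precision_score, recall_score
)

# Hypothetical labels: 1 = good wine, 0 = bad wine
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred = np.array([0, 1, 0, 1, 1, 0, 1, 0])

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted "good" that are truly good
recall = tp / (tp + fn)                     # share of truly good wines that were found

print(accuracy, precision, recall)  # 0.75 0.75 0.75
```

The hand-computed values agree with `accuracy_score`, `precision_score` and `recall_score` on the same labels.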

In [None]:
fig, ax = plt.subplots(figsize=(7.5, 7.5))
ax.matshow(c, cmap=plt.cm.Blues, alpha=0.3)
for i in range(c.shape[0]):
    for j in range(c.shape[1]):
        ax.text(x=j, y=i,s=c[i, j], va='center', ha='center', size='xx-large')
plt.xlabel('Predictions', fontsize=18)
plt.ylabel('Actuals', fontsize=18)
plt.title('Confusion Matrix', fontsize=18)
plt.show()