<h1>Table of contents</h1>

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ol>
        <li><a href="#about_dataset">About the dataset</a></li>
        <li><a href="#import_data">Importing the Data</a></li>
        <li><a href="#pre-processing">Pre-processing</a></li>
        <li><a href="#visualization">Visualization</a></li>
        <li><a href="#modeling">Modeling, Prediction, Evaluation</a></li> 
    </ol>
</div>
<br>
<hr>

#### List of Input Files

In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

<div id="about_dataset">
    <h2> About The Dataset </h2>
The two datasets are related to red and white variants of the Portuguese "Vinho Verde" wine. For more details, consult the reference [Cortez et al., 2009]. Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.). This notebook uses only the red-wine file.

These datasets can be viewed as classification or regression tasks. The classes are ordered and not balanced (e.g. there are many more normal wines than excellent or poor ones).

This dataset is also available from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/wine+quality

</div>
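The class imbalance mentioned above can be checked directly from the label column. The following is a minimal sketch: `class_distribution` is a hypothetical helper, and the labels are toy values for illustration only, not taken from the dataset. In this notebook one would pass `data['quality']` once the data is loaded.

```python
import pandas as pd


def class_distribution(labels: pd.Series) -> pd.DataFrame:
    """Return the count and share of each class, ordered by class label."""
    counts = labels.value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})


# Toy labels for illustration only (assumption, not real wine data).
toy_quality = pd.Series([5, 5, 6, 6, 6, 5, 7, 4, 5, 6], name="quality")
dist = class_distribution(toy_quality)
print(dist)
```

A strongly skewed `share` column is a hint that plain accuracy will favour the majority classes.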

_IMPORTING NECESSARY LIBRARIES_

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

<div id="import_data">
    <h2> Import Data </h2>
    _READING THE DATA INTO A PANDAS DATAFRAME_
</div>


In [None]:
data = pd.read_csv('/kaggle/input/red-wine-quality-cortez-et-al-2009/winequality-red.csv')

<div id="pre-processing">
    <h2> Pre-Processing </h2>
</div>
_PEEKING INTO DATAFRAME_

In [None]:
data.head(3)

#### INFORMATION ABOUT THE DATAFRAME'S COLUMNS
_Checking COLUMN NAMES, NUMBER OF OBSERVATIONS, NULL VALUES and DATATYPES_

In [None]:
data.info()
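`data.info()` reports non-null counts per column; an explicit tally of missing values and duplicate rows can make the same check easier to read. This is a sketch under assumptions: `summarize_columns` is a hypothetical helper and the small frame holds made-up values. In the notebook it would be called on `data`.

```python
import numpy as np
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, number of missing values, and number of distinct values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })


# Made-up values for illustration only.
toy = pd.DataFrame({
    "alcohol": [9.4, 9.8, np.nan, 9.8],
    "quality": [5, 5, 6, 5],
})
summary = summarize_columns(toy)
n_duplicates = int(toy.duplicated().sum())
print(summary)
print("duplicate rows:", n_duplicates)
```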

# EDA

_STATISTICAL SUMMARY OF THE DataFrame_

In [None]:
data.describe().T

<div id= "visualization">
    <h2> Visualization </h2>
</div>

### PAIRPLOT SHOWING RELATION AMONG ATTRIBUTES OF THE DATA

In [None]:
sns.pairplot(data, hue='quality', palette='husl')

### PAIRPLOTS OF ATTRIBUTES WITH LINEAR REGRESSION

In [None]:
sns.pairplot(data, kind='reg')

### HEATMAP

In [None]:
plt.figure(figsize=(14,12))
sns.heatmap(data.corr(), linewidths=0.2, cmap="YlGnBu", annot=True)  # linewidths is the documented seaborn parameter

### BOXPLOT
_FOR SPOTTING OUTLIERS_

In [None]:
plt.figure(figsize=(24,6))
sns.boxplot(data=data)
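The boxplot flags points beyond its whiskers, which by default sit at 1.5 times the interquartile range (IQR) from the quartiles. The same rule can be applied numerically to count flagged values per column. This is a sketch: `iqr_outlier_counts` is a hypothetical helper and the frame below contains made-up values. In the notebook it would be called on `data`.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum().sort_values(ascending=False)


# Made-up values for illustration only.
toy = pd.DataFrame({
    "chlorides": [0.07, 0.08, 0.08, 0.09, 0.61],
    "pH": [3.2, 3.3, 3.3, 3.4, 3.5],
})
counts = iqr_outlier_counts(toy)
print(counts)
```

Because the columns are on very different scales, per-column counts are often easier to interpret than a single shared-axis boxplot.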

### Analysing the effects of various _COMPONENTS_ on the _QUALITY_ of the wine

In [None]:
fig, axes = plt.subplots(3, 2, figsize=(20, 12))
# Each bar shows the mean of the feature for one quality score
sns.barplot(x='quality', y='volatile acidity', data=data, ax=axes[0][0])
sns.barplot(x='quality', y='citric acid', data=data, ax=axes[0][1])
sns.barplot(x='quality', y='chlorides', data=data, ax=axes[1][0])
sns.barplot(x='quality', y='sulphates', data=data, ax=axes[1][1])
sns.barplot(x='quality', y='alcohol', data=data, ax=axes[2][0])
axes[2][1].set_visible(False)  # only five features are plotted; hide the unused sixth panel


### Observations

> 1. Higher-QUALITY wines tend to have lower Volatile Acidity.
> 2. Higher-QUALITY wines tend to have more Citric Acid.
> 3. Higher-QUALITY wines tend to have lower Chlorides.
> 4. Higher-QUALITY wines tend to have more Sulphates.
> 5. Higher-QUALITY wines tend to have more Alcohol.
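The trends read off the bar plots can be cross-checked numerically with per-quality means and a correlation against the label. This is a sketch under assumptions: `quality_trends` is a hypothetical helper, Pearson correlation is used (the same default as `data.corr()` in the heatmap), and the toy frame holds made-up values. In the notebook it would be called on `data` with the five plotted features.

```python
import pandas as pd


def quality_trends(df: pd.DataFrame, features: list, target: str = "quality"):
    """Return (mean of each feature per target value, Pearson correlation with the target)."""
    means = df.groupby(target)[features].mean()
    corr = df[features].corrwith(df[target])
    return means, corr


# Made-up values for illustration only.
toy = pd.DataFrame({
    "quality": [4, 5, 5, 6, 7, 7],
    "alcohol": [9.0, 9.5, 9.7, 10.5, 11.5, 12.0],
    "volatile acidity": [0.9, 0.7, 0.6, 0.5, 0.4, 0.3],
})
means, corr = quality_trends(toy, ["alcohol", "volatile acidity"])
print(means)
print(corr)
```

A positive coefficient matches "tends to be higher for better wines", a negative one the opposite. Neither the plots nor the coefficients establish causation.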


<div id="modeling"> </div>
# Random Forest Classifier

In [None]:
# Features: every column except the target; target: 1-D array of quality scores
x = data.drop(columns='quality').to_numpy()
y = data['quality'].to_numpy()  # 1-D target avoids sklearn's column-vector warning


from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.25,random_state=42)

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=100,max_depth=300,random_state=42)
clf.fit(x_train, y_train)
predict = clf.predict(x_test)

from sklearn import metrics
print('Accuracy :: ',metrics.accuracy_score(y_test,predict)*100, ' Percent ')
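With imbalanced classes, a single accuracy figure can hide poor performance on the rare classes. A confusion matrix and per-class precision and recall give a fuller picture. The sketch below is self-contained and uses synthetic data from `make_classification` as a stand-in (an assumption, not the wine data); the `stratify=y` argument is also an addition that keeps class proportions equal across the split. In the notebook the same calls would be applied to `y_test` and `predict`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced 3-class data standing in for the wine features.
X, y = make_classification(
    n_samples=600, n_features=11, n_informative=5,
    n_classes=3, weights=[0.15, 0.70, 0.15], random_state=42,
)

# Stratify so each class keeps the same proportion in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

cm = confusion_matrix(y_test, pred)  # rows: true class, columns: predicted class
print(cm)
print(classification_report(y_test, pred, zero_division=0))
```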

In [None]:
data1 = data.copy()  # work on a copy so that binarizing the label does not overwrite the original data

Converting __quality > 5 to 1__ (GOOD) and __quality <= 5 to 0__ (BAD).

#### This will enable us to categorize test samples as GOOD or BAD

In [None]:
# Map quality scores to a binary label: 1 (GOOD) if quality > 5, else 0 (BAD)
data1['quality'] = (data1['quality'] > 5).astype(int)

In [None]:
data1.quality.value_counts()

Converting the numerical label to a categorical dtype

In [None]:
data1.quality = data1.quality.astype('category')

In [None]:
data1.dtypes

In [None]:
# Features: every column except the target; target: 1-D array of binary labels
x = data1.drop(columns='quality').to_numpy()
y = data1['quality'].astype(int).to_numpy()  # 1-D integer labels (0 = BAD, 1 = GOOD)


from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y,test_size=0.29,random_state=7)

from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(n_estimators=1000,max_depth=1700,random_state=7)
clf.fit(x_train, y_train)
predict = clf.predict(x_test)

from sklearn import metrics
print('Accuracy :: ',metrics.accuracy_score(y_test,predict)*100, ' Percent ')
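The accuracy above depends on one particular split (`test_size` and `random_state`). Stratified k-fold cross-validation averages over several splits and gives a less split-dependent estimate. The sketch below is self-contained and uses synthetic binary data from `make_classification` as a stand-in (an assumption, not the wine data); the fold count and forest size are illustrative choices. In the notebook, `x` and `y` from the cell above would be passed instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic binary data standing in for the GOOD/BAD wine labels.
X, y = make_classification(
    n_samples=400, n_features=11, n_informative=5, random_state=7
)

# Five stratified folds, shuffled once with a fixed seed for reproducibility.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)
model = RandomForestClassifier(n_estimators=100, random_state=7)

scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("Accuracy per fold:", scores)
print("Mean: %.3f, std: %.3f" % (scores.mean(), scores.std()))
```

Reporting the mean together with the spread across folds makes it easier to judge whether a difference between two models is meaningful.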