**Business Background and Project Overview**

The board of the production department at the large corporation I work for is concerned about the relatively high volume of wine being produced and wants a model that predicts whether a given wine is good or not. As a data scientist on the team assigned to this task, I am building that model to help the board better understand the situation.

**Import the necessary libraries**

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

In [2]:
# loading the dataset to a Pandas DataFrame
wine_dataset = pd.read_csv('../input/red-wine-quality-cortez-et-al-2009/winequality-red.csv')

In [3]:
# number of rows & columns in the dataset
wine_dataset.shape

In [4]:
# First 5 rows of the dataset
wine_dataset.head()

In [5]:
# Checking for missing values in the dataset
wine_dataset.isna().sum()

**Exploratory Data Analysis (EDA)**

In [6]:
# number of values for each quality
sns.catplot(x='quality', data = wine_dataset, kind = 'count')


In [7]:
# volatile acidity vs Quality
plot = plt.figure(figsize=(5,5))
sns.barplot(x='quality', y = 'volatile acidity', data = wine_dataset)

In [8]:
# citric acid vs Quality
plot = plt.figure(figsize=(5,5))
sns.barplot(x='quality', y = 'citric acid', data = wine_dataset)

In [9]:
correlation = wine_dataset.corr()
# constructing a heatmap to understand the correlation between the columns
plt.figure(figsize=(10,10))
sns.heatmap(correlation, cbar=True, square=True, fmt = '.1f', annot = True, annot_kws={'size':8}, cmap = 'Blues')

**Data Preprocessing**

In [10]:
# separate the features (X) from the label column
X = wine_dataset.drop('quality',axis=1)
print(X)

In [11]:
# binarize the label: quality of 7 or higher counts as good (1), anything lower as bad (0)
Y = (wine_dataset["quality"] >= 7).astype(int)

In [12]:
print(Y)
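Thresholding quality at 7 tends to leave far fewer "good" wines than "bad" ones, so it can help to look at the class balance before training. The snippet below is a minimal sketch of that check; the `quality` series is a small hypothetical stand-in for `wine_dataset['quality']`, not the real data.

```python
import pandas as pd

# Hypothetical quality scores standing in for wine_dataset['quality']
quality = pd.Series([5, 6, 7, 5, 8, 6, 4, 7, 5, 6], name="quality")

# Same rule as the notebook: 7 or higher is good (1), anything lower is bad (0)
labels = (quality >= 7).astype(int)

# Fraction of samples in each class, ordered by class label
class_share = labels.value_counts(normalize=True).sort_index()
print(class_share)
```

On the real dataset the same two lines can be run on `Y` directly, for example `Y.value_counts(normalize=True)`.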

**Train-Test Split**

In [13]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=3)

In [14]:
print(Y.shape, Y_train.shape, Y_test.shape)
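Because the binary label is likely imbalanced, a plain random split can leave the test set with a different share of good wines than the training set. One option, shown here as a sketch and not as the notebook's own method, is to pass `stratify` to `train_test_split`. The data below is synthetic and the names `X_demo` and `y_demo` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 rows, 15% positive class (assumption for illustration)
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
y_demo = pd.Series([1] * 30 + [0] * 170)

# stratify=y_demo keeps the class proportions close to equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=3, stratify=y_demo
)

# Compare the positive-class share overall, in train, and in test
print(y_demo.mean(), y_tr.mean(), y_te.mean())
```

In the notebook this would amount to adding `stratify=Y` to the existing call.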

**Model Training**

In [15]:
model = RandomForestClassifier()
model.fit(X_train, Y_train)
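A random forest is randomized, so repeated runs with default settings can give slightly different results. The sketch below fixes `random_state` for reproducibility and reads `feature_importances_` to see which inputs the fitted forest relies on. It runs on synthetic data; the column names are hypothetical stand-ins and the label rule is invented purely for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic features; names are illustrative stand-ins, not the real dataset
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.normal(size=(300, 3)), columns=["alcohol", "sulphates", "noise"]
)
# Invented label rule so that two of the three columns carry signal
y_demo = (X_demo["alcohol"] + 0.5 * X_demo["sulphates"] > 1).astype(int)

# Fixing random_state makes the fitted forest repeatable across runs
demo_model = RandomForestClassifier(n_estimators=100, random_state=0)
demo_model.fit(X_demo, y_demo)

# Impurity-based importances, one per feature, normalized to sum to 1
importances = pd.Series(
    demo_model.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

Applied to the notebook's model, the equivalent would be `pd.Series(model.feature_importances_, index=X.columns)`.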

**Model Evaluation**

In [16]:
# accuracy on test data
X_test_prediction = model.predict(X_test)
# accuracy_score expects the true labels first, then the predictions
test_data_accuracy = accuracy_score(Y_test, X_test_prediction)

In [17]:
print('Accuracy score is : ', test_data_accuracy)
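Accuracy alone can look strong on an imbalanced label even when the model misses many good wines. The following sketch, using small hand-written label lists that are purely hypothetical, shows how a confusion matrix, precision, and recall add detail that accuracy hides.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels: 8 bad wines (0) and 2 good wines (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
# Hypothetical predictions that miss one of the two good wines
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

acc = accuracy_score(y_true, y_pred)    # 0.9
cm = confusion_matrix(y_true, y_pred)   # rows are true classes, columns are predictions
prec = precision_score(y_true, y_pred)  # 1.0: every predicted good wine was good
rec = recall_score(y_true, y_pred)      # 0.5: only half of the good wines were found

print(acc, prec, rec)
print(cm)
```

Here accuracy is 90% while recall on the good class is only 50%. In the notebook the same functions can be called with `Y_test` and `X_test_prediction`.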

In [18]:
input_data = (7.5,0.5,0.36,6.1,0.071,17.0,102.0,0.9978,3.35,0.8,10.5)

# wrap the single instance in a one-row DataFrame that reuses the training
# column names, so the model sees the same feature layout it was fit on
# (a bare numpy array works too, but scikit-learn warns about missing feature names)
input_data_as_dataframe = pd.DataFrame([input_data], columns=X.columns)

prediction = model.predict(input_data_as_dataframe)
print(prediction)

if prediction[0] == 1:
    print('Good Quality Wine')
else:
    print('Bad Quality Wine')
