# Wine Quality Prediction Using Machine Learning

# OBJECTIVE

The objective of this project is to develop an accurate and reliable machine learning model that can predict the quality of wine based on its physicochemical properties. By analyzing features such as acidity, sugar content, and alcohol levels, this model aims to assist winemakers in quality control and help consumers make informed purchasing decisions.

# DATA SOURCE

The data for this project comes from the publicly available White Wine Quality dataset. It contains measurements of physicochemical properties of white wines, which are used to predict their quality.
The dataset consists of the following columns. The first eleven are input features and the last, Quality, is the prediction target:

Fixed acidity,
Volatile acidity,
Citric acid,
Residual sugar,
Chlorides,
Free sulfur dioxide,
Total sulfur dioxide,
Density,
pH,
Sulphates,
Alcohol,
Quality.

# Importing the libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Importing the dataset

In [None]:
dataset = pd.read_csv('winequality-white.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
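
Copies of this dataset circulate with different delimiters (the UCI distribution is semicolon-separated, while the file read above is evidently comma-separated). The following is a minimal sketch of a loader that lets pandas infer the delimiter. The helper name `load_wine_csv` and the in-memory sample rows are hypothetical and only stand in for the real file.

```python
import io

import pandas as pd


def load_wine_csv(path_or_buffer):
    """Read a wine-quality CSV, letting pandas infer the delimiter."""
    # sep=None with the python engine asks pandas to sniff the delimiter
    return pd.read_csv(path_or_buffer, sep=None, engine="python")


# Tiny in-memory stand-in for the real file (values are illustrative only)
sample = io.StringIO(
    '"fixed acidity";"alcohol";"quality"\n'
    "7.0;8.8;6\n"
    "6.3;9.5;6\n"
)
df = load_wine_csv(sample)
print(df.columns.tolist())
print(df.shape)
```

With the real file, the call would be `load_wine_csv('winequality-white.csv')`.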

In [None]:
print(X)

[[ 7.    0.27  0.36 ...  3.    0.45  8.8 ]
 [ 6.3   0.3   0.34 ...  3.3   0.49  9.5 ]
 [ 8.1   0.28  0.4  ...  3.26  0.44 10.1 ]
 ...
 [ 6.5   0.24  0.19 ...  2.99  0.46  9.4 ]
 [ 5.5   0.29  0.3  ...  3.34  0.38 12.8 ]
 [ 6.    0.21  0.38 ...  3.26  0.32 11.8 ]]


In [None]:
print(y)

[6 6 6 ... 6 7 6]


In [None]:
dataset.describe()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 |
| mean | 6.854788 | 0.278241 | 0.334192 | 6.391415 | 0.045772 | 35.308085 | 138.360657 | 0.994027 | 3.188267 | 0.489847 | 10.514267 | 5.877909 |
| std | 0.843868 | 0.100795 | 0.12102 | 5.072058 | 0.021848 | 17.007137 | 42.498065 | 0.002991 | 0.151001 | 0.114126 | 1.230621 | 0.885639 |
| min | 3.8 | 0.08 | 0.0 | 0.6 | 0.009 | 2.0 | 9.0 | 0.98711 | 2.72 | 0.22 | 8.0 | 3.0 |
| 25% | 6.3 | 0.21 | 0.27 | 1.7 | 0.036 | 23.0 | 108.0 | 0.991723 | 3.09 | 0.41 | 9.5 | 5.0 |
| 50% | 6.8 | 0.26 | 0.32 | 5.2 | 0.043 | 34.0 | 134.0 | 0.99374 | 3.18 | 0.47 | 10.4 | 6.0 |
| 75% | 7.3 | 0.32 | 0.39 | 9.9 | 0.05 | 46.0 | 167.0 | 0.9961 | 3.28 | 0.55 | 11.4 | 6.0 |
| max | 14.2 | 1.1 | 1.66 | 65.8 | 0.346 | 289.0 | 440.0 | 1.03898 | 3.82 | 1.08 | 14.2 | 9.0 |


In [None]:
dataset.columns

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality'],
      dtype='object')

In [None]:
dataset.shape

(4898, 12)

# Define Target Variable (y) and Feature Variables (X)

In [None]:
y = dataset['quality']
X = dataset[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol']]

# Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X = sc.fit_transform(X)
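
The cell above fits the scaler on all 4898 rows, so the means and standard deviations it learns include rows that later end up in the test set. A common alternative is to split first and fit the scaler on the training rows only. The sketch below illustrates that order on synthetic data; the arrays `X_raw` and `y_raw` are made-up stand-ins, not the wine measurements.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 11 physicochemical columns (not real wine data)
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=2.0, size=(200, 11))
y_raw = rng.integers(3, 9, size=200)

# Split first, so the test rows stay unseen during preprocessing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_raw, y_raw, test_size=0.2, random_state=0
)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learn mean/std from training rows only
X_te_scaled = scaler.transform(X_te)      # reuse those statistics on the test rows

print(X_tr_scaled.mean(axis=0).round(6))  # approximately zero for every column
print(X_te_scaled.shape)
```

The test columns will generally not have exactly zero mean after this transform, which is expected: they are scaled with the training statistics.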

In [None]:
y.shape

(4898,)

In [None]:
X.shape

(4898, 11)

In [None]:
y

0       6
1       6
2       6
3       6
4       6
       ..
4893    6
4894    5
4895    6
4896    7
4897    6
Name: quality, Length: 4898, dtype: int64

In [None]:
X

array([[ 1.72096961e-01, -8.17699008e-02,  2.13280202e-01, ...,
        -1.24692128e+00, -3.49184257e-01, -1.39315246e+00],
       [-6.57501128e-01,  2.15895632e-01,  4.80011213e-02, ...,
         7.40028640e-01,  1.34184656e-03, -8.24275678e-01],
       [ 1.47575110e+00,  1.74519434e-02,  5.43838363e-01, ...,
         4.75101984e-01, -4.36815783e-01, -3.36667007e-01],
       ...,
       [-4.20473102e-01, -3.79435433e-01, -1.19159198e+00, ...,
        -1.31315295e+00, -2.61552731e-01, -9.05543789e-01],
       [-1.60561323e+00,  1.16673788e-01, -2.82557040e-01, ...,
         1.00495530e+00, -9.62604939e-01,  1.85757201e+00],
       [-1.01304317e+00, -6.77100966e-01,  3.78559282e-01, ...,
         4.75101984e-01, -1.48839409e+00,  1.04489089e+00]])

# Splitting the dataset into the Training set and Test set

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
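
Because the quality scores are unevenly distributed, a purely random split can leave the rare scores over- or under-represented in the test set. One option is the `stratify` argument of `train_test_split`, which keeps class proportions the same in both parts. The sketch below uses synthetic labels with made-up proportions to show the effect; it is not the split used in this notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels (illustrative proportions, not the real data)
rng = np.random.default_rng(0)
y_demo = np.repeat([5, 6, 7, 8], [300, 450, 200, 50])
X_demo = rng.normal(size=(y_demo.size, 11))

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

# Count how many test samples fall in each class
labels, counts = np.unique(y_te, return_counts=True)
test_counts = dict(zip(labels.tolist(), counts.tolist()))
print(test_counts)
```

Stratification requires at least two samples of every class, which is worth checking for the rarest quality scores before applying it to the real data.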

# Training the SVC model on the Training set

In [None]:
from sklearn.svm import SVC
regressor = SVC()
regressor.fit(X_train, y_train)

# Predicting the Test set results

In [None]:
y_pred = regressor.predict(X_test)


In [None]:
y_pred.shape

(980,)

In [None]:
y_pred

array([5, 5, 6, 7, 6, 5, 6, 6, 6, 5, 6, 7, 5, 5, 6, 6, 6, 5, 6, 6, 6, 6,
       6, 6, 6, 7, 5, 5, 6, 6, 5, 6, 6, 6, 5, 6, 6, 7, 7, 6, 6, 6, 6, 6,
       5, 6, 5, 7, 6, 6, 7, 6, 5, 6, 6, 6, 6, 6, 5, 6, 5, 5, 6, 6, 6, 6,
       5, 5, 5, 5, 6, 5, 6, 7, 6, 6, 6, 5, 5, 7, 5, 5, 6, 6, 5, 6, 6, 6,
       6, 6, 6, 6, 5, 7, 5, 6, 5, 5, 6, 7, 6, 5, 6, 6, 5, 7, 6, 6, 6, 5,
       5, 6, 6, 5, 6, 7, 5, 5, 7, 6, 5, 5, 6, 5, 5, 7, 5, 6, 6, 6, 6, 6,
       7, 6, 6, 5, 6, 6, 5, 6, 6, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,
       5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 5, 7, 6, 6, 5, 6, 6, 6, 6, 6, 6, 6,
       5, 5, 6, 5, 6, 6, 6, 7, 6, 6, 6, 6, 6, 6, 6, 5, 5, 6, 6, 6, 5, 6,
       6, 6, 6, 6, 5, 6, 6, 6, 6, 5, 7, 6, 5, 7, 7, 6, 6, 7, 6, 6, 7, 7,
       6, 6, 6, 6, 5, 5, 6, 5, 6, 5, 6, 6, 5, 5, 5, 6, 6, 6, 6, 6, 5, 6,
       7, 6, 5, 6, 5, 6, 7, 6, 6, 6, 5, 6, 6, 5, 6, 6, 6, 7, 6, 6, 6, 6,
       5, 6, 6, 6, 6, 5, 7, 6, 5, 6, 5, 5, 6, 6, 5, 6, 6, 5, 5, 6, 6, 6,
       6, 5, 5, 6, 6, 6, 6, 6, 5, 5, 6, 5, 5, 6, 5,

# Evaluating the Model Performance

In [None]:
from sklearn.metrics import confusion_matrix, accuracy_score, classification_report
confusion_matrix(y_test,y_pred)


array([[  0,   0,   6,   3,   0,   0],
       [  0,   0,  33,  17,   1,   0],
       [  0,   0, 166, 129,   0,   0],
       [  0,   0,  74, 319,  16,   0],
       [  0,   0,   7, 131,  45,   0],
       [  0,   0,   0,  23,  10,   0]])
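
To read the matrix, recall that rows are true labels and columns are predicted labels, here for quality scores 3 to 8 in sorted order. The sketch below copies the printed matrix into NumPy and derives the overall accuracy, the per-class recall, and how often each label was predicted at all.

```python
import numpy as np

# Confusion matrix as printed above; rows are true labels, columns are predictions
labels = np.array([3, 4, 5, 6, 7, 8])
cm = np.array([
    [0, 0,   6,   3,  0, 0],
    [0, 0,  33,  17,  1, 0],
    [0, 0, 166, 129,  0, 0],
    [0, 0,  74, 319, 16, 0],
    [0, 0,   7, 131, 45, 0],
    [0, 0,   0,  23, 10, 0],
])

accuracy = np.trace(cm) / cm.sum()     # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)  # fraction of each true class recovered
predicted_counts = cm.sum(axis=0)      # how often each label was predicted

for lab, r, n in zip(labels, recall, predicted_counts):
    print(f"quality {lab}: recall={r:.2f}, predicted {n} times")
print(f"accuracy={accuracy:.4f}")
```

The column sums show that the model never predicts 3, 4, or 8, so its recall for those classes is zero.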

In [None]:
accuracy_score(y_test,y_pred)

0.5408163265306123
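
An accuracy figure is easier to judge next to a trivial baseline. The sketch below computes the accuracy of always predicting the most common quality score, using the test-set class counts (the row sums of the confusion matrix above). It assumes the most common class in the training set is the same one, which is not verified here.

```python
# Test-set class counts: row sums of the confusion matrix above
support = {3: 9, 4: 51, 5: 295, 6: 409, 7: 183, 8: 33}

# A constant predictor is right exactly as often as its label occurs
majority_label = max(support, key=support.get)
baseline_accuracy = support[majority_label] / sum(support.values())

print(majority_label)
print(round(baseline_accuracy, 3))
```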

In [None]:
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           3       0.00      0.00      0.00         9
           4       0.00      0.00      0.00        51
           5       0.58      0.56      0.57       295
           6       0.51      0.78      0.62       409
           7       0.62      0.25      0.35       183
           8       0.00      0.00      0.00        33

    accuracy                           0.54       980
   macro avg       0.29      0.26      0.26       980
weighted avg       0.51      0.54      0.50       980
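
The zero precision for classes 3, 4, and 8 arises because the model never predicts them, which leaves precision undefined (0/0); scikit-learn reports 0 and emits a warning in that case. The sketch below rebuilds label vectors that reproduce the confusion matrix and recomputes the macro and weighted precision with `zero_division=0`, which keeps the same values without the warning. The rebuilt vectors match the matrix counts only, not the original sample order.

```python
import numpy as np
from sklearn.metrics import precision_score

labels = np.array([3, 4, 5, 6, 7, 8])
cm = np.array([
    [0, 0,   6,   3,  0, 0],
    [0, 0,  33,  17,  1, 0],
    [0, 0, 166, 129,  0, 0],
    [0, 0,  74, 319, 16, 0],
    [0, 0,   7, 131, 45, 0],
    [0, 0,   0,  23, 10, 0],
])

# Rebuild label vectors that reproduce the matrix (row = true, column = predicted)
true_idx, pred_idx = np.nonzero(cm)
reps = cm[true_idx, pred_idx]
y_true = np.repeat(labels[true_idx], reps)
y_hat = np.repeat(labels[pred_idx], reps)

# macro: plain mean over the six classes; weighted: mean weighted by support
macro = precision_score(y_true, y_hat, average="macro", zero_division=0)
weighted = precision_score(y_true, y_hat, average="weighted", zero_division=0)

print(round(macro, 2), round(weighted, 2))
```

The macro average treats all six classes equally, so the three never-predicted classes pull it well below the weighted average.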





# EXPLANATION

In this wine quality prediction project, we used the Support Vector Classifier (SVC) from scikit-learn, with pandas for loading and inspecting the data. The dataset has no missing values (every column has a count of 4898) and all columns are numeric, so preprocessing consisted of standardizing the features with StandardScaler. We then trained an SVC with default settings to predict wine quality from features such as acidity, sugar content, and alcohol level.

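The model reached an accuracy of 0.54 on the 980-sample test set. For comparison, always predicting the most common quality score (6, which accounts for 409 of the 980 test samples) would give about 0.42. The confusion matrix shows that the model only ever predicts qualities 5, 6, and 7; scores 3, 4, and 8 are never predicted, which pulls the macro-averaged F1 score down to 0.26. This performance is unlikely to be sufficient for practical use. Further model tuning, feature engineering, handling of the class imbalance, or trying different algorithms could improve the results and make the model more reliable for real-world applications.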
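
As one possible starting point for the tuning mentioned above, the sketch below wraps scaling and the SVC in a `Pipeline` and searches over `C` and `gamma` with `GridSearchCV`, scoring by macro F1 so that rare classes count as much as common ones. It runs on synthetic data from `make_classification`; the grid values, `class_weight="balanced"`, and the choice of scorer are assumptions for illustration, not settings tested on the wine data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic imbalanced data standing in for the wine features (hypothetical)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=11, n_informative=6, n_classes=3,
    weights=[0.6, 0.3, 0.1], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

# Scaling lives inside the pipeline, so each CV fold fits its own scaler
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("svc", SVC(class_weight="balanced")),
])
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="f1_macro")
search.fit(X_tr, y_tr)

test_score = search.score(X_te, y_te)  # macro F1 on the held-out rows
print(search.best_params_)
print(round(test_score, 3))
```

Keeping the scaler inside the pipeline also means the test rows never influence the scaling statistics.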





