# Prediction of Wine Quality

### Parameters present in wine-quality csv files:
#### 1. Fixed Acidity <br> 2. Volatile Acidity <br> 3. Citric Acid <br> 4. Residual Sugar <br> 5. Chlorides <br> 6. Free Sulfur Dioxide <br> 7. Total Sulfur Dioxide <br> 8. Density <br> 9. pH <br> 10. Sulphates <br> 11. Alcohol Percentage (%) <br> 12. Quality

## Preliminary Data Exploration

In [1]:
import pandas as pd
import numpy as np

In [2]:
# We start with White Wine
df = pd.read_csv("winequality-white.csv")

In [4]:
df.info()
df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4898 entries, 0 to 4897
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         4898 non-null   float64
 1   volatile acidity      4898 non-null   float64
 2   citric acid           4898 non-null   float64
 3   residual sugar        4898 non-null   float64
 4   chlorides             4898 non-null   float64
 5   free sulfur dioxide   4898 non-null   float64
 6   total sulfur dioxide  4898 non-null   float64
 7   density               4898 non-null   float64
 8   pH                    4898 non-null   float64
 9   sulphates             4898 non-null   float64
 10  alcohol               4898 non-null   float64
 11  quality               4898 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 459.3 KB


| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 |
| mean | 6.854788 | 0.278241 | 0.334192 | 6.391415 | 0.045772 | 35.308085 | 138.360657 | 0.994027 | 3.188267 | 0.489847 | 10.514267 | 5.877909 |
| std | 0.843868 | 0.100795 | 0.12102 | 5.072058 | 0.021848 | 17.007137 | 42.498065 | 0.002991 | 0.151001 | 0.114126 | 1.230621 | 0.885639 |
| min | 3.8 | 0.08 | 0.0 | 0.6 | 0.009 | 2.0 | 9.0 | 0.98711 | 2.72 | 0.22 | 8.0 | 3.0 |
| 25% | 6.3 | 0.21 | 0.27 | 1.7 | 0.036 | 23.0 | 108.0 | 0.991723 | 3.09 | 0.41 | 9.5 | 5.0 |
| 50% | 6.8 | 0.26 | 0.32 | 5.2 | 0.043 | 34.0 | 134.0 | 0.99374 | 3.18 | 0.47 | 10.4 | 6.0 |
| 75% | 7.3 | 0.32 | 0.39 | 9.9 | 0.05 | 46.0 | 167.0 | 0.9961 | 3.28 | 0.55 | 11.4 | 6.0 |
| max | 14.2 | 1.1 | 1.66 | 65.8 | 0.346 | 289.0 | 440.0 | 1.03898 | 3.82 | 1.08 | 14.2 | 9.0 |


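### Summary:<br>The data has 12 columns: 11 physicochemical features plus the quality score, with no missing values. The 11 features are continuous (float64), while quality is an integer score that ranges from 3 to 9 in this data. Treating quality as a numeric target, Linear Regression is a reasonable first model for setting a baseline.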
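Since quality takes only a few integer values, it can help to look at how the rows are spread across those values before modelling. The following is a minimal sketch; the helper name `quality_distribution` is hypothetical, and the small frame is made up for illustration rather than taken from the wine data.

```python
import pandas as pd

def quality_distribution(df: pd.DataFrame, target: str = "quality") -> pd.DataFrame:
    """Count and share of rows for each distinct target value, sorted by value."""
    counts = df[target].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "share": counts / counts.sum()})

# Toy frame for illustration only; these are NOT rows from the wine dataset.
toy = pd.DataFrame({
    "alcohol": [9.5, 10.1, 11.0, 12.3, 9.8, 10.4],
    "quality": [5, 6, 6, 7, 5, 6],
})
dist = quality_distribution(toy)
print(dist)
```

In the notebook, the same helper could be called as `quality_distribution(df)` on the loaded white-wine frame.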

## Linear Regression - Prediction of Quality using the Parameters

In [6]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

In [14]:
# Separate the features from the target
df_X = df.drop(columns=["quality"])
df_y = df["quality"]

# Hold out 20% of the rows for testing; the fixed seed makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(df_X, df_y, test_size=0.2, random_state=1)

linear_regression1 = LinearRegression()
linear_regression1.fit(X_train, y_train)

y_pred = linear_regression1.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse, "\nR2 score:", r2)

def mean_absolute_percentage_error(y_true, y_pred):
    """Mean of |error| / |true value|, in percent. Undefined if any true value is 0."""
    y_true, y_pred = np.array(y_true), np.array(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

mape = mean_absolute_percentage_error(y_test, y_pred)
print(f'Mean Absolute Percentage Error (MAPE): {mape:.2f}%')
# 100 - MAPE measures average relative closeness; it is not a classification accuracy
print(f'100 - MAPE: {100 - mape:.2f}%')

Mean Squared Error: 0.542548726356103 
R2 score: 0.2901221946674388
Mean Absolute Percentage Error (MAPE): 10.20%
100 - MAPE: 89.80%


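### Summary:<br> Linear Regression reaches a MAPE of 10.20% on the held-out test set, so its predictions are on average within about 10% of the true score (100 - MAPE = 89.8%). This figure is not a classification accuracy: the R2 of 0.29 shows that the model explains only about 29% of the variance in quality.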
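A MAPE near 10% is easier to judge next to a trivial reference. The sketch below uses only NumPy to compare a model's MAPE with that of a constant predictor that always outputs the training mean, and it also reports exact-match accuracy after rounding predictions to whole scores. The function name `compare_to_mean_baseline` and the toy values are hypothetical; they are not results from the wine data.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def compare_to_mean_baseline(y_train, y_test, y_pred):
    """Compare model predictions with a constant predictor (the training mean)."""
    y_test = np.asarray(y_test, dtype=float)
    # Baseline: predict the mean training score for every test row
    baseline = np.full_like(y_test, np.mean(y_train), dtype=float)
    return {
        "model_mape": mape(y_test, y_pred),
        "baseline_mape": mape(y_test, baseline),
        # Share of predictions that hit the true integer score after rounding
        "rounded_accuracy": float(np.mean(np.rint(y_pred) == y_test)),
    }

# Hypothetical values for illustration only
y_train_toy = [5, 6, 6, 7, 5, 6, 6, 5]
y_test_toy = [5, 6, 7, 6]
y_pred_toy = [5.4, 5.9, 6.2, 6.1]

report = compare_to_mean_baseline(y_train_toy, y_test_toy, y_pred_toy)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

With the notebook's variables, the call would be `compare_to_mean_baseline(y_train, y_test, y_pred)`. When scores cluster tightly around their mean, as the `describe()` output suggests for quality, a constant prediction can already achieve a low MAPE, so the gap between the two MAPE values is likely more informative than either number alone.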

## Linear Regression - Feature Importance

In [17]:
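# Refit on the full dataset to inspect the fitted coefficients
linear_regression2 = LinearRegression()
linear_regression2.fit(df_X, df_y)
# Note: the features are unscaled, so each coefficient is per unit of that feature
# and the magnitudes are not directly comparable across features
feature_importance = pd.DataFrame({'Feature': df_X.columns, 'Coefficient': linear_regression2.coef_})
feature_importance = feature_importance.sort_values(by='Coefficient', ascending=False)
print(feature_importance)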

                 Feature  Coefficient
8                     pH     0.686344
9              sulphates     0.631476
10               alcohol     0.193476
3         residual sugar     0.081483
0          fixed acidity     0.065520
2            citric acid     0.022090
5    free sulfur dioxide     0.003733
6   total sulfur dioxide    -0.000286
4              chlorides    -0.247277
1       volatile acidity    -1.863177
7                density  -150.284181
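The raw coefficients above are expressed per unit of each feature, so a feature with a very narrow range, such as density (0.987 to 1.039 in this data), receives a large coefficient regardless of how much it actually moves the prediction. One common way to make the magnitudes comparable is to standardize the features before fitting and rank by absolute coefficient. The following is a minimal sketch under that assumption; the helper name `standardized_coefficients` and the synthetic data are hypothetical and are not the notebook's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

def standardized_coefficients(X: pd.DataFrame, y) -> pd.DataFrame:
    """Fit on z-scored features and rank them by absolute coefficient."""
    X_scaled = StandardScaler().fit_transform(X)
    model = LinearRegression().fit(X_scaled, y)
    out = pd.DataFrame({"Feature": X.columns, "Std. coefficient": model.coef_})
    # Order rows by magnitude so strong negative effects are not pushed to the bottom
    order = out["Std. coefficient"].abs().sort_values(ascending=False).index
    return out.loc[order].reset_index(drop=True)

# Synthetic data for illustration only: one feature with a tiny spread
# and a large per-unit effect, one with a wide spread and a small per-unit effect.
rng = np.random.default_rng(0)
n = 500
toy_X = pd.DataFrame({
    "narrow_range": rng.normal(1.0, 0.003, n),
    "wide_range": rng.normal(10.0, 1.2, n),
})
toy_y = -100.0 * toy_X["narrow_range"] + 1.0 * toy_X["wide_range"] + rng.normal(0, 0.1, n)

raw = LinearRegression().fit(toy_X, toy_y).coef_
ranked = standardized_coefficients(toy_X, toy_y)
print("Raw coefficients:", dict(zip(toy_X.columns, raw)))
print(ranked)
```

In the notebook, the call would be `standardized_coefficients(df_X, df_y)`. Each standardized coefficient is the change in predicted quality per one standard deviation of the feature, which is a fairer basis for comparison than the per-unit values, although correlated features can still make individual coefficients hard to interpret.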
