# Wine Quality Classifier

## Business Understanding 
Predicting consumer preference: The dataset records physicochemical properties of each wine, such as alcohol content, acidity, and sugar content, along with a quality score assigned by human experts. This lets a business train machine learning models that predict how a wine is likely to be rated from its measurable characteristics.

The ability to accurately predict consumer preference can be valuable for wineries and retailers in several ways:

**Optimizing wine production:** 

Wineries can utilize these models to identify specific grape varietals, vineyard practices, or fermentation techniques that consistently result in wines with quality profiles that resonate with target consumer segments. 

This can inform grape selection, vineyard management practices, and winemaking decisions, ultimately leading to the production of wines that are more likely to achieve commercial success.

**Targeted marketing:**

Retailers can leverage these models to develop personalized recommendations for their customers. By understanding the quality attributes that are most important to individual customers, retailers can recommend specific wines that are likely to be a good match for their preferences. This can lead to increased customer satisfaction, loyalty, and sales.

In [6]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [7]:
data = pd.read_csv('winequality-red (1).csv')

data.head()

|   | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
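Rows 0 and 4 in the output above are identical, which suggests the file may contain repeated records. The following is a minimal sketch of how duplicates could be counted and removed. It rebuilds the five displayed rows by hand so that it runs without the CSV file; the name `sample` is illustrative and not part of the notebook. On the full dataset the same calls would be made on `data`. Whether duplicates should be dropped at all is a modelling decision this sketch does not settle.

```python
import pandas as pd

# The five rows printed by data.head(), rebuilt by hand so the sketch
# runs without the CSV file. `sample` is an illustrative name.
columns = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]
rows = [
    [7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5],
    [7.8, 0.88, 0.00, 2.6, 0.098, 25.0, 67.0, 0.9968, 3.20, 0.68, 9.8, 5],
    [7.8, 0.76, 0.04, 2.3, 0.092, 15.0, 54.0, 0.9970, 3.26, 0.65, 9.8, 5],
    [11.2, 0.28, 0.56, 1.9, 0.075, 17.0, 60.0, 0.9980, 3.16, 0.58, 9.8, 6],
    [7.4, 0.70, 0.00, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4, 5],
]
sample = pd.DataFrame(rows, columns=columns)

# duplicated() marks each row that repeats an earlier one;
# the first occurrence itself is not marked.
n_duplicates = int(sample.duplicated().sum())
print("duplicate rows:", n_duplicates)

# drop_duplicates() keeps the first occurrence of every distinct row.
deduplicated = sample.drop_duplicates()
print("shape after dropping duplicates:", deduplicated.shape)
```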


In [8]:
data.shape

(1599, 12)

In [10]:
data.nunique()

fixed acidity            96
volatile acidity        143
citric acid              80
residual sugar           91
chlorides               153
free sulfur dioxide      60
total sulfur dioxide    144
density                 436
pH                       89
sulphates                96
alcohol                  65
quality                   6
dtype: int64
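
Because `quality` takes only 6 distinct values, it can be treated as a class label, and it is worth checking how many wines fall under each score before modelling. A minimal sketch, using the five `quality` values from `data.head()` as stand-in data; on the full dataset the equivalent call would be `data['quality'].value_counts()`:

```python
import pandas as pd

# Stand-in data: the quality scores of the five rows shown by data.head().
quality = pd.Series([5, 5, 5, 6, 5], name="quality")

# Number of wines per score, ordered by score rather than by frequency.
class_counts = quality.value_counts().sort_index()
print(class_counts)

# Share of each score, useful for spotting class imbalance.
class_share = quality.value_counts(normalize=True).sort_index()
print(class_share)
```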

In [9]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB


# Data Understanding

## Data source

This dataset was sourced from the Kaggle website.

## Data Description

- This dataset contains 1599 rows and 12 columns of data. 

- The columns have two data types:
    1. float64 = 11 variables
    2. int64 = 1 variable (`quality`)

### Variables Description
- Acidity:

    1. fixed acidity, volatile acidity, citric acid and pH.

        - fixed acidity (g/l): This column represents the amount of tartaric acid present in the wine.  Higher values generally indicate higher acidity, which can contribute to a sour taste.

        - volatile acidity (g/l): This column indicates the level of volatile acids, primarily acetic acid, produced by bacterial activity during fermentation.  High values can indicate spoilage or poor winemaking practices.

        - citric acid (g/l):  This refers to the amount of citric acid, which contributes to the wine's tartness and freshness. 

        - pH: This column indicates the acidity of the wine on the pH scale, which runs from 0 (highly acidic) to 14 (highly basic). Wine typically has a pH between 3 and 4.

- Residuals:
    1. residual sugar, chlorides

        - residual sugar (g/l): This specifies the amount of unconverted sugar remaining after fermentation.  Higher values suggest sweeter wines.

        - chlorides (g/l): This column indicates the level of chloride salts present, which can influence a wine's taste perception.

- Sulfur Dioxide
    1. free sulfur dioxide, total sulfur dioxide

        - free sulfur dioxide (mg/l):  This represents the amount of unbound sulfur dioxide, a preservative commonly added to wine to prevent oxidation and microbial growth.

        - total sulfur dioxide (mg/l): This includes both free and bound forms of sulfur dioxide. Excessive levels can affect taste and aroma.

- Other
    1. density, sulphates, alcohol  

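        - density (g/cm³): This column gives the mass per unit volume of the wine, which depends mainly on its alcohol and sugar content.

        - sulphates (g/l): This signifies the level of sulfate salts, which can influence a wine's minerality and mouthfeel.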

        - alcohol (% vol): This column represents the percentage of alcohol content in the wine by volume.  

- Quality (target variable)

    1. quality (0-10):

        - This is the dependent variable, a score between 0 (very bad) and 10 (very good) based on sensory evaluation by human experts. Only 6 distinct scores occur in this dataset.
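
Two of the descriptions above translate directly into code: bound sulfur dioxide is the difference between the total and free columns, and `quality` is separated from the other columns as the target. The following is a minimal sketch on a few values taken from `data.head()`. The names `sample`, `bound sulfur dioxide`, `X` and `y` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Stand-in data: three columns from the first four rows of data.head().
sample = pd.DataFrame({
    "free sulfur dioxide": [11.0, 25.0, 15.0, 17.0],
    "total sulfur dioxide": [34.0, 67.0, 54.0, 60.0],
    "quality": [5, 5, 5, 6],
})

# Total SO2 = free + bound, so the bound part is the difference (mg/l).
sample["bound sulfur dioxide"] = (
    sample["total sulfur dioxide"] - sample["free sulfur dioxide"]
)

# Separate the predictors (X) from the target (y).
X = sample.drop(columns="quality")
y = sample["quality"]

print(X)
print(y.tolist())
```

On the full dataset the same two steps would be applied to `data` instead of `sample`.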


In [16]:
# Check for missing values: count the nulls in each column
null = data.isnull().sum().to_frame(name='Total Missing Values')
null

Unnamed: 0,Total Missing Values
fixed acidity,0
volatile acidity,0
citric acid,0
residual sugar,0
chlorides,0
free sulfur dioxide,0
total sulfur dioxide,0
density,0
pH,0
sulphates,0
