## Wine Data Analysis
In this project, our aim is to perform exploratory data analysis on the Wine Quality dataset, where we will compare red wine with white wine.

<img src="https://www.teahub.io/photos/full/185-1852548_wallpaper-wine-bottle-cups-christmas-decoration-.jpg">

The Wine Quality dataset contains information about various physicochemical properties
of wines. The entire dataset is grouped into two categories: red wine and white wine. Each
wine has a quality label associated with it. The label is in the range of 0 to 10. In the next
section, we are going to download and load the dataset into Python and perform an initial
analysis to disclose what is inside it.

The main topics discussed in this project include the following:
<ul>
<li>Disclosing the wine quality dataset</li>
<li>Analyzing Red Wine</li>
<li>Analyzing White Wine</li>
<li>Model Development & Evaluation</li>
<li>Further Analysis</li>
</ul>

<h4>Table Description</h4>

Red Wine = `https://raw.githubusercontent.com/mrblink2002/EDA_Wine_Quality/main/dataset/winequality_red.csv`
<br>
White Wine = `https://raw.githubusercontent.com/mrblink2002/EDA_Wine_Quality/main/dataset/winequality_white.csv`

Short description of the columns:

Column|Description
---|-----
Fixed acidity | It indicates the amount of tartaric acid in the wine and is measured in g/dm3.
Volatile acidity | It indicates the amount of acetic acid in the wine. It is measured in g/dm3.
Citric acid| It indicates the amount of citric acid in the wine. It is also measured in g/dm3.
Residual sugar| It indicates the amount of sugar left in the wine after the fermentation process is done. It is also measured in g/dm3.
Chlorides| It indicates the amount of sodium chloride (salt) in the wine. It is also measured in g/dm3.
Free sulfur dioxide| It measures the amount of sulfur dioxide (SO2) in free form. It is measured in mg/dm3.
Total sulfur dioxide| It measures the total amount of SO2 in the wine, in mg/dm3. This chemical works as an antioxidant and antimicrobial agent.
Density| It indicates the density of the wine and is measured in g/cm3.
pH| It indicates the pH value of the wine. The value ranges from 0 to 14: 0 indicates very high acidity, and 14 indicates very high basicity.
Sulphates| It indicates the amount of potassium sulphate in the wine. It is also measured in g/dm3.
Alcohol| It indicates the alcohol content in the wine. 
Quality| It indicates the quality of the wine, which ranges from 0 to 10. The higher the value, the better the wine.

**Importing the Libraries**

In [1]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 
import plotly.express as px 
import plotly.graph_objects as go

**Importing the Dataset**

In [3]:
white_wine = pd.read_csv("https://raw.githubusercontent.com/mrblink2002/EDA_Wine_Quality/main/dataset/winequality_white.csv", delimiter=";")
white_wine.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45.0 | 170.0 | 1.001 | 3.0 | 0.45 | 8.8 | 6 |
| 1 | 6.3 | 0.3 | 0.34 | 1.6 | 0.049 | 14.0 | 132.0 | 0.994 | 3.3 | 0.49 | 9.5 | 6 |
| 2 | 8.1 | 0.28 | 0.4 | 6.9 | 0.05 | 30.0 | 97.0 | 0.9951 | 3.26 | 0.44 | 10.1 | 6 |
| 3 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |
| 4 | 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47.0 | 186.0 | 0.9956 | 3.19 | 0.4 | 9.9 | 6 |


In [4]:
red_wine = pd.read_csv("https://raw.githubusercontent.com/mrblink2002/EDA_Wine_Quality/main/dataset/winequality_red.csv", delimiter = ";")
red_wine.head()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
| 1 | 7.8 | 0.88 | 0.0 | 2.6 | 0.098 | 25.0 | 67.0 | 0.9968 | 3.2 | 0.68 | 9.8 | 5 |
| 2 | 7.8 | 0.76 | 0.04 | 2.3 | 0.092 | 15.0 | 54.0 | 0.997 | 3.26 | 0.65 | 9.8 | 5 |
| 3 | 11.2 | 0.28 | 0.56 | 1.9 | 0.075 | 17.0 | 60.0 | 0.998 | 3.16 | 0.58 | 9.8 | 6 |
| 4 | 7.4 | 0.7 | 0.0 | 1.9 | 0.076 | 11.0 | 34.0 | 0.9978 | 3.51 | 0.56 | 9.4 | 5 |
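
Since the goal of the project is to compare red wine with white wine, one possible next step is to stack the two tables into a single frame with a label column. The following is only a minimal sketch under stated assumptions: the column name `wine_type` and the small stand-in frames are hypothetical and are not part of the original notebook, which would use `red_wine` and `white_wine` directly.

```python
import pandas as pd

# Small stand-in frames using a few values from the head() outputs;
# in the notebook, red_wine and white_wine would take their place.
red_sample = pd.DataFrame({"alcohol": [9.4, 9.8, 9.8], "quality": [5, 5, 5]})
white_sample = pd.DataFrame({"alcohol": [8.8, 9.5, 10.1], "quality": [6, 6, 6]})

# Tag each frame with a hypothetical "wine_type" column, then stack them.
red_sample["wine_type"] = "red"
white_sample["wine_type"] = "white"
wines = pd.concat([red_sample, white_sample], ignore_index=True)

# Mean of the selected numeric columns for each wine type.
summary = wines.groupby("wine_type")[["alcohol", "quality"]].mean()
print(summary)
```

Keeping both wines in one frame makes later grouped statistics and plots a matter of grouping by the label column instead of repeating every step twice.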


**Exploring & Examining the Dataset**

In [5]:
red_wine.columns

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality'],
      dtype='object')

In [6]:
white_wine.columns

Index(['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
       'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
       'pH', 'sulphates', 'alcohol', 'quality'],
      dtype='object')
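
Rather than comparing the two column listings by eye, the check can be done programmatically. This is a small sketch, assuming the column names printed above; the variables `red_cols` and `white_cols` are stand-ins for `red_wine.columns` and `white_wine.columns`.

```python
import pandas as pd

# Column names copied from the output above.
cols = ["fixed acidity", "volatile acidity", "citric acid", "residual sugar",
        "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
        "pH", "sulphates", "alcohol", "quality"]

# Stand-ins for red_wine.columns and white_wine.columns.
red_cols = pd.Index(cols)
white_cols = pd.Index(cols)

# True only if both indexes hold the same labels in the same order.
same_columns = red_cols.equals(white_cols)

# Labels present in one frame but not in the other (empty here).
only_in_red = red_cols.difference(white_cols)
print(same_columns, list(only_in_red))
```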

**Descriptive Statistics**

In [8]:
red_wine.iloc[100:110]

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 100 | 8.3 | 0.61 | 0.3 | 2.1 | 0.084 | 11.0 | 50.0 | 0.9972 | 3.4 | 0.61 | 10.2 | 6 |
| 101 | 7.8 | 0.5 | 0.3 | 1.9 | 0.075 | 8.0 | 22.0 | 0.9959 | 3.31 | 0.56 | 10.4 | 6 |
| 102 | 8.1 | 0.545 | 0.18 | 1.9 | 0.08 | 13.0 | 35.0 | 0.9972 | 3.3 | 0.59 | 9.0 | 6 |
| 103 | 8.1 | 0.575 | 0.22 | 2.1 | 0.077 | 12.0 | 65.0 | 0.9967 | 3.29 | 0.51 | 9.2 | 5 |
| 104 | 7.2 | 0.49 | 0.24 | 2.2 | 0.07 | 5.0 | 36.0 | 0.996 | 3.33 | 0.48 | 9.4 | 5 |
| 105 | 8.1 | 0.575 | 0.22 | 2.1 | 0.077 | 12.0 | 65.0 | 0.9967 | 3.29 | 0.51 | 9.2 | 5 |
| 106 | 7.8 | 0.41 | 0.68 | 1.7 | 0.467 | 18.0 | 69.0 | 0.9973 | 3.08 | 1.31 | 9.3 | 5 |
| 107 | 6.2 | 0.63 | 0.31 | 1.7 | 0.088 | 15.0 | 64.0 | 0.9969 | 3.46 | 0.79 | 9.3 | 5 |
| 108 | 8.0 | 0.33 | 0.53 | 2.5 | 0.091 | 18.0 | 80.0 | 0.9976 | 3.37 | 0.8 | 9.6 | 6 |
| 109 | 8.1 | 0.785 | 0.52 | 2.0 | 0.122 | 37.0 | 153.0 | 0.9969 | 3.21 | 0.69 | 9.3 | 5 |
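
When slicing rows, it helps to remember that label-based `.loc` slices include the end label, while position-based `.iloc` slices exclude the end position. The sketch below illustrates the difference on a hypothetical frame `df` with a default integer index; it is not taken from the original notebook.

```python
import pandas as pd

# Hypothetical frame with a default RangeIndex of 0..199.
df = pd.DataFrame({"value": range(200)})

# Label-based slice: both endpoints are included (labels 100..110).
by_label = df.loc[100:110]

# Position-based slice: the end is excluded (positions 100..109).
by_position = df.iloc[100:110]

print(len(by_label), len(by_position))
```

With a default integer index the two slices differ by exactly one row, which is easy to overlook when reading the output.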


In [9]:
red_wine.dtypes

fixed acidity           float64
volatile acidity        float64
citric acid             float64
residual sugar          float64
chlorides               float64
free sulfur dioxide     float64
total sulfur dioxide    float64
density                 float64
pH                      float64
sulphates               float64
alcohol                 float64
quality                   int64
dtype: object

In [11]:
red_wine.describe()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 | 1599.0 |
| mean | 8.319637 | 0.527821 | 0.270976 | 2.538806 | 0.087467 | 15.874922 | 46.467792 | 0.996747 | 3.311113 | 0.658149 | 10.422983 | 5.636023 |
| std | 1.741096 | 0.17906 | 0.194801 | 1.409928 | 0.047065 | 10.460157 | 32.895324 | 0.001887 | 0.154386 | 0.169507 | 1.065668 | 0.807569 |
| min | 4.6 | 0.12 | 0.0 | 0.9 | 0.012 | 1.0 | 6.0 | 0.99007 | 2.74 | 0.33 | 8.4 | 3.0 |
| 25% | 7.1 | 0.39 | 0.09 | 1.9 | 0.07 | 7.0 | 22.0 | 0.9956 | 3.21 | 0.55 | 9.5 | 5.0 |
| 50% | 7.9 | 0.52 | 0.26 | 2.2 | 0.079 | 14.0 | 38.0 | 0.99675 | 3.31 | 0.62 | 10.2 | 6.0 |
| 75% | 9.2 | 0.64 | 0.42 | 2.6 | 0.09 | 21.0 | 62.0 | 0.997835 | 3.4 | 0.73 | 11.1 | 6.0 |
| max | 15.9 | 1.58 | 1.0 | 15.5 | 0.611 | 72.0 | 289.0 | 1.00369 | 4.01 | 2.0 | 14.9 | 8.0 |
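
One way to read the summary above is to check whether the minimum or maximum of a column lies far outside its quartiles. The sketch below applies the common 1.5 × IQR rule to the `residual sugar` figures from the table. The rule and the helper `iqr_fences` are assumptions chosen for illustration, not a method used in the original notebook.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the k * IQR rule for given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Values for "residual sugar" copied from the describe() output above.
q1, q3 = 1.9, 2.6
col_min, col_max = 0.9, 15.5

lower, upper = iqr_fences(q1, q3)

# A min below the lower fence or a max above the upper fence
# suggests that outliers exist on that side.
has_low_outliers = col_min < lower
has_high_outliers = col_max > upper

print(f"fences: ({lower:.2f}, {upper:.2f})")
print("low outliers:", has_low_outliers, "| high outliers:", has_high_outliers)
```

In the notebook itself, the quartiles could be taken from `red_wine["residual sugar"].quantile([0.25, 0.75])` instead of being typed in by hand.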


**Data Wrangling**

In [12]:
red_wine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   fixed acidity         1599 non-null   float64
 1   volatile acidity      1599 non-null   float64
 2   citric acid           1599 non-null   float64
 3   residual sugar        1599 non-null   float64
 4   chlorides             1599 non-null   float64
 5   free sulfur dioxide   1599 non-null   float64
 6   total sulfur dioxide  1599 non-null   float64
 7   density               1599 non-null   float64
 8   pH                    1599 non-null   float64
 9   sulphates             1599 non-null   float64
 10  alcohol               1599 non-null   float64
 11  quality               1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB
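
The `info()` output shows no null values, but it does not reveal repeated rows. The sketch below counts missing values and duplicates on a small stand-in frame built from a few columns of the first five red wine rows shown earlier. The variable names are hypothetical, and whether duplicates should be dropped at all is a judgment call the original does not address.

```python
import pandas as pd

# A few columns from the first five red wine rows shown earlier;
# in the notebook, red_wine would be used instead of this sample.
sample = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 7.8, 11.2, 7.4],
    "volatile acidity": [0.7, 0.88, 0.76, 0.28, 0.7],
    "alcohol": [9.4, 9.8, 9.8, 9.8, 9.4],
    "quality": [5, 5, 5, 6, 5],
})

# Number of missing values in each column.
missing_per_column = sample.isnull().sum()

# Number of rows that repeat an earlier row exactly.
n_duplicates = int(sample.duplicated().sum())

# Optional: keep only the first occurrence of each row.
deduplicated = sample.drop_duplicates().reset_index(drop=True)

print(missing_per_column)
print("duplicate rows:", n_duplicates)
```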


In [None]:
plt.figure(figsize=(8,4), dpi = 100)
sns.