# Exploratory Data Analysis on the Wine Quality Dataset with NumPy

This project is an exploratory data analysis of the Wine Quality dataset. The dataset contains 13 columns, including Id, fixed acidity, volatile acidity, alcohol, and quality.

The goal of this project is to practice with the NumPy library. We will use core NumPy operations such as importing data, indexing and slicing, and computing the mean, maximum, and standard deviation.

#### Sit back, and enjoy 

## Import Library

In [2]:
import numpy as np

## Import Dataset

In [3]:
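# Import the wine dataset into a NumPy array.
# Note: dtype=np.int32 truncates decimal values (e.g. acidity, density) to whole numbers.
wines = np.genfromtxt('./WineQT.csv',
                      comments="This wine quality dataset",
                      delimiter=",",
                      skip_header=1,
                      dtype=np.int32)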

In [4]:
print("Wine Dataset: ", wines)

Wine Dataset:  [[   7    0    0 ...    9    5    0]
 [   7    0    0 ...    9    5    1]
 [   7    0    0 ...    9    5    2]
 ...
 [   6    0    0 ...   10    5 1594]
 [   5    0    0 ...   11    6 1595]
 [   5    0    0 ...   10    5 1597]]
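The zeros in several columns above appear to come from `dtype=np.int32`, which drops the fractional part of measurements such as volatile acidity. As a minimal sketch, assuming decimals should be kept, the data can be loaded with the default float type instead. The inline CSV below is a hypothetical sample with made-up values, not the real file.

```python
import io
import numpy as np

# Hypothetical three-row sample laid out like a wine CSV (values are illustrative only)
sample_csv = io.StringIO(
    "fixed acidity,volatile acidity,alcohol,quality,Id\n"
    "7.4,0.70,9.4,5,0\n"
    "7.8,0.88,9.8,5,1\n"
    "11.2,0.28,9.8,6,3\n"
)

# With no dtype argument, genfromtxt parses every field as float64
wines_float = np.genfromtxt(sample_csv, delimiter=",", skip_header=1)

print(wines_float.dtype)   # float64
print(wines_float[:, 1])   # volatile acidity keeps its decimals
```

The rest of this notebook keeps the integer array, so the results shown elsewhere reflect truncated values.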


## Understand the Data

In [5]:
print("The shape of the wine dataset is: ", wines.shape)
print("The dimension of the wine dataset is: ", wines.ndim)

The shape of the wine dataset is:  (1143, 13)
The dimension of the wine dataset is:  2
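Beyond `shape` and `ndim`, two other attributes are often useful when first inspecting an array. The sketch below uses a small hypothetical stand-in array rather than the wine data.

```python
import numpy as np

# Hypothetical stand-in array: 4 rows, 3 columns
sample = np.zeros((4, 3), dtype=np.int32)

# For a 2-D array, shape is a (rows, columns) tuple
n_rows, n_cols = sample.shape
print(n_rows, n_cols)   # 4 3

# size is the total number of elements (rows * columns)
print(sample.size)      # 12

# dtype tells us how each element is stored
print(sample.dtype)     # int32
```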


## Return the first 10 rows of the Dataset

In [6]:
# display the first 10 rows.

first_ten_rows = wines[0:10]  # the stop index is exclusive, so this returns rows 0 to 9

print("Display: \n", first_ten_rows)

Display: 
 [[ 7  0  0  1  0 11 34  0  3  0  9  5  0]
 [ 7  0  0  2  0 25 67  0  3  0  9  5  1]
 [ 7  0  0  2  0 15 54  0  3  0  9  5  2]
 [11  0  0  1  0 17 60  0  3  0  9  6  3]
 [ 7  0  0  1  0 11 34  0  3  0  9  5  4]
 [ 7  0  0  1  0 13 40  0  3  0  9  5  5]
 [ 7  0  0  1  0 15 59  0  3  0  9  5  6]
 [ 7  0  0  1  0 15 21  0  3  0 10  7  7]
 [ 7  0  0  2  0  9 18  0  3  0  9  7  8]
 [ 6  0  0  1  0 15 65  0  3  0  9  5 10]]
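A point worth checking when slicing rows: the stop index of a slice is exclusive, so `[0:n]` yields exactly `n` rows. Here is a short sketch on a hypothetical 15-row array.

```python
import numpy as np

# Hypothetical array with 15 rows and 4 columns
sample = np.arange(60).reshape(15, 4)

# The stop index is exclusive: [:10] selects rows 0 through 9
first_ten = sample[:10]
print(first_ten.shape)      # (10, 4)

# A stop of 11 selects one extra row (rows 0 through 10)
print(sample[0:11].shape)   # (11, 4)
```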


## Return the 12th column of the Dataset

In [10]:
# display the 12th column

twelfth_col = wines[:, 11]  # index 11 is the 12th column because indexing starts at 0
print("Display: \n", twelfth_col)

Display: 
 [5 5 5 ... 5 6 5]
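The same column can be reached with a positive or a negative index. The sketch below uses a hypothetical five-column array to show that counting from the end gives the same result.

```python
import numpy as np

# Hypothetical array with 5 columns; the second-to-last column plays the role of quality
sample = np.array([[7, 0, 9, 5, 0],
                   [7, 0, 9, 5, 1],
                   [11, 0, 9, 6, 3]])

# Positive index counts from the left (0-based), negative index counts from the right
fourth_col = sample[:, 3]
second_to_last = sample[:, -2]

print(second_to_last)                                # [5 5 6]
print(np.array_equal(fourth_col, second_to_last))    # True
```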


## Find the sum of all the elements in the columns at index 1 to 10

In [12]:
one_to_ten_col = wines[:,1:11].sum()

print("Summation: ", one_to_ten_col)

Summation:  87680
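Calling `sum()` with no arguments collapses the whole slice into one number. As a hedged illustration on a small hypothetical array, the `axis` argument gives per-column or per-row totals instead.

```python
import numpy as np

# Hypothetical 2 x 4 array
sample = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8]])

# Columns at index 1 and 2 only (the stop index 3 is excluded)
block = sample[:, 1:3]

block_total = block.sum()         # every element: 2 + 3 + 6 + 7
per_column = block.sum(axis=0)    # one total per column
per_row = block.sum(axis=1)       # one total per row

print(block_total)   # 18
print(per_column)    # [ 8 10]
print(per_row)       # [ 5 13]
```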


## Find the mean of the array

In [13]:
mean_of_wines = wines.mean()

print("Mean Value: ", mean_of_wines)

Mean Value:  68.86163268053032


In [14]:
# cross-check: the mean is the sum of all elements divided by the element count
sum_of_wine = wines.sum()
size_of_wine = wines.size

print("Mean Value: ", sum_of_wine/size_of_wine)

Mean Value:  68.86163268053032
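A single mean over the whole array mixes unrelated columns, and a large-valued identifier column can dominate it. A per-column mean is usually easier to interpret. The sketch below assumes a hypothetical four-column array whose last column is an Id.

```python
import numpy as np

# Hypothetical array; the last column stands in for an Id column
sample = np.array([[7, 9, 5, 0],
                   [7, 9, 5, 1],
                   [11, 9, 6, 3],
                   [5, 11, 6, 1595]])

overall_mean = sample.mean()                    # mixes all columns, including the Id
column_means = sample.mean(axis=0)              # one mean per column
means_without_id = sample[:, :-1].mean(axis=0)  # drop the last column first

print(overall_mean)        # 105.5625
print(column_means)        # per-column means, the Id column is 399.75
print(means_without_id)    # [7.5 9.5 5.5]
```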


## Find the standard deviation of the array

In [15]:
std_dev = wines.std()

print("Standard Deviation: ", std_dev)

Standard Deviation:  248.87134655790663
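By default, `std()` computes the population standard deviation (`ddof=0`). If the data is treated as a sample, `ddof=1` gives the sample standard deviation. A minimal sketch on hypothetical ratings:

```python
import numpy as np

# Hypothetical quality ratings
values = np.array([5, 5, 6, 7, 7])

# Population standard deviation: divide the squared deviations by N
population_std = values.std()

# Sample standard deviation: divide by N - 1
sample_std = values.std(ddof=1)

print(population_std)   # about 0.894
print(sample_std)       # 1.0
```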


## Find the minimum value in the array

In [16]:
min_value = wines.min()
print("Minimum Value: ", min_value)

Minimum Value:  0


## Find the maximum value in the array

In [17]:
max_value = wines.max()
print("Maximum Value: ", max_value)

Maximum Value:  1597
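As with the mean, the minimum and maximum over the whole array say little about any single column. The sketch below, on a hypothetical array, shows per-column extremes and how `argmax` locates the row that holds a maximum.

```python
import numpy as np

# Hypothetical 3 x 3 array
sample = np.array([[7, 9, 5],
                   [11, 9, 6],
                   [5, 11, 6]])

col_min = sample.min(axis=0)   # smallest value in each column
col_max = sample.max(axis=0)   # largest value in each column

# argmax returns the position of the first maximum in the first column
row_of_max = sample[:, 0].argmax()

print(col_min)      # [5 9 5]
print(col_max)      # [11 11  6]
print(row_of_max)   # 1
```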


## Find where the quality rating of the wine is higher than 5

In [26]:
quality_rating_col = wines[:,-2] # original column

quality_rating_greater_than_5 = quality_rating_col > 5 # boolean array

display = quality_rating_col[quality_rating_greater_than_5]

print("Original Column: ", quality_rating_col)
print("Quality greater than 5: ", display)

# count the wines with a quality rating greater than 5
print("Total Numbers: ", display.size)


Original Column:  [5 5 5 ... 5 6 5]
Quality greater than 5:  [6 7 7 7 6 6 6 6 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 6 6 6 6 6 6 6 6 6
 6 6 6 6 6 7 6 7 7 6 6 6 6 6 6 6 7 6 6 6 6 6 6 7 7 6 6 6 6 7 8 6 6 6 6 6 6
 8 7 6 7 7 6 6 7 7 6 6 6 6 6 6 6 6 6 7 6 6 6 7 6 7 7 6 7 6 6 6 6 6 6 6 6 7
 7 6 6 7 7 7 6 6 6 7 6 6 6 8 6 7 6 6 6 6 7 6 7 6 6 7 7 7 6 6 6 7 6 6 6 8 7
 7 6 6 6 6 7 8 6 6 6 6 6 6 6 6 6 8 6 6 7 7 6 6 8 6 8 6 6 7 7 7 7 7 6 7 6 6
 7 7 6 6 6 6 6 6 6 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 7 8 6 6 6 6 6 6
 6 6 6 6 6 6 6 7 6 6 7 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6
 6 6 6 6 6 6 6 6 6 6 6 6 6 7 6 7 6 6 7 6 6 6 7 7 6 7 7 7 6 6 6 6 6 7 7 7 6
 6 6 6 6 6 6 6 6 7 6 7 7 7 7 6 6 6 6 6 6 6 6 6 6 7 6 7 6 6 7 7 7 7 7 7 7 7
 7 7 6 7 6 6 6 6 7 7 6 6 6 6 6 6 6 6 7 7 7 7 7 7 6 6 6 7 6 6 6 6 7 6 6 7 6
 7 7 7 6 6 6 6 6 6 6 7 6 7 7 7 6 8 6 6 6 6 7 7 7 6 6 7 7 6 6 7 6 7 7 8 6 6
 7 6 6 6 6 6 6 7 6 6 7 6 6 6 6 6 6 8 6 7 6 6 7 7 7 6 6 6 6 6 6 6 6 6 7 6 6
 6 6 7 7 7 7 6 6 6 6 6 6 6 6 6 7 6 6 6 
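When only the count is needed, the boolean mask can be counted directly without building the filtered array. A small sketch, using hypothetical ratings:

```python
import numpy as np

# Hypothetical quality ratings
quality = np.array([5, 5, 6, 7, 5, 8])

# Boolean mask: True where the rating is above 5
mask = quality > 5

# True counts as 1, so counting non-zero entries counts the matches
count = np.count_nonzero(mask)

# The mean of a boolean mask is the fraction of matches
share = mask.mean()

print(count)   # 3
print(share)   # 0.5
```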

## Find where the quality rating of the wine is equal to 10

In [28]:
def send_message(data):
    if len(data) == 0:
        print("No wine Quality Found")
    else:
        print(f"A total of {len(data)} wines have a quality rating equal to 10")
        

quality_rating_col = wines[:,-2] # original column

quality_rating_equal_10 = quality_rating_col == 10 # boolean array

display = quality_rating_col[quality_rating_equal_10]

print("Original Column: ", quality_rating_col)
print("Quality equal to 10: ", display)

# invoke function
send_message(display)



Original Column:  [5 5 5 ... 5 6 5]
Quality equal to 10:  []
No wine Quality Found
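An empty result like this is easier to anticipate by listing which ratings actually occur. As a hedged sketch on hypothetical ratings, `np.unique` with `return_counts=True` shows each distinct value and how often it appears, and `np.any` answers the yes/no question directly.

```python
import numpy as np

# Hypothetical quality ratings
quality = np.array([5, 5, 6, 7, 5, 8])

# Distinct ratings (sorted) and the number of times each occurs
ratings, counts = np.unique(quality, return_counts=True)
print(ratings)   # [5 6 7 8]
print(counts)    # [3 1 1 1]

# True only if at least one rating equals 10
has_ten = np.any(quality == 10)
print(has_ten)   # False
```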


## Find where the quality rating is greater than 5 and the alcohol content is greater than 7

In [32]:
# wines[:,-2] is the quality column and wines[:,-3] is the alcohol column
quality_greater_than_5_and_alcohol_greater_than_7 = (wines[:,-2] > 5) & (wines[:,-3] > 7)

wines[quality_greater_than_5_and_alcohol_greater_than_7]

array([[  11,    0,    0, ...,    9,    6,    3],
       [   7,    0,    0, ...,   10,    7,    7],
       [   7,    0,    0, ...,    9,    7,    8],
       ...,
       [   6,    0,    0, ...,   11,    6, 1592],
       [   6,    0,    0, ...,    9,    6, 1593],
       [   5,    0,    0, ...,   11,    6, 1595]])
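Two details matter when combining conditions: each comparison needs its own parentheses because `&` binds more tightly than `>`, and `&` works element by element where Python's `and` would not. The sketch below uses a hypothetical three-column array (alcohol, quality, Id) and also recovers the matching row positions.

```python
import numpy as np

# Hypothetical columns: alcohol, quality, Id
sample = np.array([[9, 5, 0],
                   [10, 7, 1],
                   [7, 6, 2],
                   [11, 6, 3]])

alcohol = sample[:, 0]
quality = sample[:, 1]

# Parentheses are required around each comparison before combining with &
both = (quality > 5) & (alcohol > 7)

matching_rows = sample[both]           # full rows where both conditions hold
row_positions = np.flatnonzero(both)   # positions of those rows

print(matching_rows)
print(row_positions)   # [1 3]
```

The third row is excluded because its alcohol value is exactly 7, and the comparison is strict.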