# Exploring Wine Data - Plotting pandas Datasets with Matplotlib

### Introduction

Wine has been produced for thousands of years. The earliest known traces of wine are from Georgia (c. 6000 BC), Iran (c. 5000 BC), and Sicily (c. 4000 BC), although there is evidence of a similar alcoholic drink being consumed earlier in China (c. 7000 BC). Source: [Wikipedia](https://en.wikipedia.org/wiki/Wine).

In [None]:
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt

### Get the dataset
Data Set Information:

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
[Wine Data Set at UCI](https://archive.ics.uci.edu/ml/datasets/Wine)


In [None]:
wine_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data'
 
# Define column headers. The first field in wine.data is the cultivar class label (1 to 3),
# followed by the 13 measured constituents.
wine_column_headers = ['Class', 'Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 'Magnesium', 'Total phenols', 'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins', 'Color intensity', 'Hue', 'OD280/OD315 of diluted wines', 'Proline']
wine_df = pd.read_csv(wine_url, names=wine_column_headers)
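
Each row of the raw file holds 14 comma-separated fields: a class label followed by the 13 measurements. The sketch below shows how `pd.read_csv` handles that layout depending on how many names it is given. The two sample rows are made-up values in the same layout, not rows from the real file, and the names `sample`, `measurement_names`, `implicit`, and `explicit` are hypothetical.

```python
import io
import pandas as pd

# Two made-up rows in the same layout as wine.data (class label + 13 values).
# These are illustrative numbers, not rows copied from the real file.
sample = io.StringIO(
    "1,13.0,1.7,2.4,15.0,120,2.8,3.0,0.28,2.3,5.6,1.04,3.9,1000\n"
    "2,12.3,1.1,2.0,17.0,90,2.0,1.9,0.40,1.5,3.0,1.10,2.8,500\n"
)

measurement_names = [
    'Alcohol', 'Malic acid', 'Ash', 'Alcalinity of ash', 'Magnesium',
    'Total phenols', 'Flavanoids', 'Nonflavanoid phenols', 'Proanthocyanins',
    'Color intensity', 'Hue', 'OD280/OD315 of diluted wines', 'Proline',
]

# 13 names for 14 fields: pandas uses the extra leading field as the index.
implicit = pd.read_csv(sample, names=measurement_names)

# 14 names: the class label becomes an ordinary column.
sample.seek(0)
explicit = pd.read_csv(sample, names=['Class'] + measurement_names)

print(implicit.index.tolist())
print(explicit.columns.tolist()[:3])
```

Naming the class column explicitly keeps the label available for grouping or coloring later, instead of leaving it as an unnamed index.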

In [None]:
wine_df.head()

In [None]:
wine_df.describe()

In [None]:
# Create the figure and axes
fig, ax1 = plt.subplots(figsize=(13, 10))

# Label the axes and add a title
ax1.set_xlabel('Alcohol')
ax1.set_ylabel('Color Intensity')
ax1.set_title('Relationship Between Color Intensity and Alcohol Content in Wines')

# Scatter plot; marker area scales with Proline content
ax1.scatter(wine_df['Alcohol'], wine_df['Color intensity'], color='g', s=wine_df['Proline'] * 0.5, alpha=0.5);

# Alternative: color each point by its color intensity and add a colorbar
# points = ax1.scatter(wine_df['Alcohol'], wine_df['Color intensity'], c=wine_df['Color intensity'], cmap='gist_rainbow', s=wine_df['Proline'] * 0.5, alpha=0.5)
# cbar = fig.colorbar(points, ax=ax1)
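
The commented-out alternative in the cell above maps a data column to point color and adds a colorbar. Here is a minimal sketch of that variant, assuming a small made-up frame (`demo_df` is hypothetical and only reuses the column names, not the real measurements):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for wine_df: made-up values, same column names.
demo_df = pd.DataFrame({
    'Alcohol': [12.4, 13.2, 13.9, 14.2],
    'Color intensity': [2.5, 4.4, 5.7, 7.1],
    'Proline': [450, 680, 1050, 1285],
})

fig, ax = plt.subplots(figsize=(6, 4))

# Keep the object returned by scatter so the colorbar can be built from it.
points = ax.scatter(
    demo_df['Alcohol'],
    demo_df['Color intensity'],
    c=demo_df['Color intensity'],  # color encodes the y value
    cmap='gist_rainbow',
    s=demo_df['Proline'] * 0.5,    # marker area encodes Proline
    alpha=0.5,
)
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('Color intensity')

ax.set_xlabel('Alcohol')
ax.set_ylabel('Color intensity')
```

Passing the scatter result to `fig.colorbar` ties the colorbar to that specific plot, which is less ambiguous than relying on pyplot's notion of the current image.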

In [None]:
# Pairwise Pearson correlation (the pandas default) between all columns
corr = wine_df.corr()

In [None]:
corr['Hue']
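
A single column of the correlation matrix is easier to read when sorted by strength. The sketch below does this on made-up toy data (`toy_df` and its columns `a`, `b`, `c` are hypothetical, not the wine measurements). The same pattern should carry over to `corr['Hue']`.

```python
import pandas as pd

# Hypothetical toy data: 'b' rises with 'a', 'c' falls as 'a' rises.
toy_df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0],
    'b': [2.0, 4.1, 5.9, 8.0],
    'c': [4.0, 3.0, 2.0, 1.0],
})

corr = toy_df.corr()  # Pearson correlation by default

# Correlations of every other column with 'a', strongest (by magnitude) first.
# The `key` argument of sort_values requires pandas 1.1 or newer.
ranked = corr['a'].drop('a').sort_values(key=abs, ascending=False)
print(ranked)
```

Dropping the column's own entry removes the trivial correlation of 1.0 with itself, and sorting by absolute value keeps strong negative relationships near the top.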

In [None]:
wine_df[['Alcohol', 'Hue', 'Color intensity']].boxplot(notch=True);