# Option 1: Data Visualisation Review with Pandas and Matplotlib

In this notebook we shall use Python to create some plots and visualisations for the ["Wine" dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html).

The data are the results of a chemical analysis of wines grown in the same
region of Italy by three different cultivators. Thirteen measurements were
taken of different constituents found in the three types of wine.

![](images/wine.jpeg)

## Loading a dataset

The dataset is loaded from the `data/` folder.

In [None]:
import pandas as pd
wine = pd.read_csv('data/wine.csv')
wine.head()

### Exercise: Pandas

Before creating the visualisations, we shall use Pandas to get a better understanding of the data we will be working with.

1. How many rows and columns are present in the data?

2. Which data type is used by each column?

3. Verify that there are no missing values.

4. Rename `od280/od315_of_diluted_wines` to something more meaningful, e.g. protein concentration.

5. How many wines are present in each class?

6. What is the median color intensity for class 0?

7. Create a separate DataFrame for each of the classes.

In [None]:
class_one = wine.loc[wine['class']==0]
class_two = ...
class_three = ...

In [None]:
# %load answers/classes.py
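
The questions above map onto a handful of common Pandas calls. The sketch below illustrates them on a small made-up DataFrame rather than on the wine data, so the values, the variable name `toy` and the new column name `protein_concentration` are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the wine data (values are invented for illustration).
toy = pd.DataFrame({
    "class": [0, 0, 1, 1, 2],
    "color_intensity": [5.6, 4.4, 3.0, 2.8, 9.1],
    "od280/od315_of_diluted_wines": [3.9, 3.4, 2.8, 3.1, 1.6],
})

n_rows, n_cols = toy.shape           # number of rows and columns
column_types = toy.dtypes            # data type of each column
n_missing = toy.isna().sum().sum()   # total number of missing values

# Rename a column; "protein_concentration" is just one possible name.
toy = toy.rename(
    columns={"od280/od315_of_diluted_wines": "protein_concentration"}
)

# Number of rows in each class.
class_counts = toy["class"].value_counts()

# Median of one column, restricted to the rows of a single class.
median_ci = toy.loc[toy["class"] == 0, "color_intensity"].median()
```

The same calls should work on `wine` once it has been loaded, provided the column names match those in the CSV file.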

## Pandas plotting

As well as being a great tool for data wrangling, we can also use Pandas to create plots directly from a DataFrame.
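
As a rough illustration of plotting straight from a DataFrame, the sketch below uses the `.plot(kind=...)` interface on an invented dataset. The column names `height_cm` and `weight_kg` are hypothetical, and the explicit Matplotlib axes are only there to keep the three plots apart.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented data, unrelated to the wine dataset.
toy = pd.DataFrame({
    "height_cm": [150, 160, 165, 172, 180, 185],
    "weight_kg": [52, 58, 63, 70, 79, 90],
})

# One row of three axes so that each plot gets its own panel.
fig, axes = plt.subplots(1, 3, figsize=(12, 3))

toy["height_cm"].plot(kind="box", ax=axes[0])            # boxplot of one column
toy["height_cm"].plot(kind="hist", bins=5, ax=axes[1])   # histogram of one column
toy.plot(kind="scatter", x="height_cm", y="weight_kg", ax=axes[2])  # two columns
```

In a notebook each call can also be made in its own cell without the `ax` argument, in which case Pandas draws onto a default set of axes.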

a. Create a boxplot for color intensity

In [None]:
# %load answers/pandas_a.py

b. Create a histogram for color intensity

In [None]:
# %load answers/pandas_b.py

c. Create a scatter plot of alcohol against color intensity

In [None]:
# %load answers/pandas_c.py

d. Create a bar chart to illustrate how many wines are present from each class.

*Hint: use the `.value_counts()` method to count how many wines belong to each class.*

In [None]:
# %load answers/pandas_d.py
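
To show the idea behind the hint, here is a minimal sketch on an invented Series: `.value_counts()` returns one count per distinct value, and the result can be plotted directly. The `fruit` data is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd

# Invented categorical data, unrelated to the wine dataset.
fruit = pd.Series(["apple", "pear", "apple", "plum", "apple", "pear"], name="fruit")

counts = fruit.value_counts()   # apple 3, pear 2, plum 1
ax = counts.plot(kind="bar")    # one bar per distinct value
ax.set_ylabel("count")
```

Passing `kind="pie"` instead of `kind="bar"` draws the same counts as a pie chart.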

e. Create a pie chart to illustrate the proportion of wines in each class.

*Hint: use the `.value_counts()` method to count how many wines belong to each class.*

In [None]:
# %load answers/pandas_e.py

## Matplotlib

In [None]:
import matplotlib.pyplot as plt

a. Change the style of the plot below so that it has red crosses as markers. Add a title, x label and y label too.

In [None]:
plt.plot(wine['alcohol'], wine['color_intensity'], 'o')

In [None]:
# %load answers/matplotlib_a.py
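
Marker style and colour can be set with a short format string, and titles and labels are set on the axes. The sketch below uses invented data and a different style (green triangles) from the one the exercise asks for.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Invented data for illustration.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()
# "g^" = green ("g") triangle-up markers ("^"), with no connecting line.
(line,) = ax.plot(x, y, "g^")
ax.set_title("Squares")
ax.set_xlabel("n")
ax.set_ylabel("n squared")
```

With the `plt.plot(...)` style used above, the equivalent calls are `plt.title`, `plt.xlabel` and `plt.ylabel`.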

b. Create a figure with *three* subplots, each a scatter plot of color intensity against alcohol percentage for one class. Control the figure size so that the plots are displayed nicely together.

In [None]:
# %load answers/matplotlib_b.py
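
One way to lay out several panels is `plt.subplots`, which returns a figure and an array of axes. The sketch below assumes an invented DataFrame with hypothetical columns `group`, `x` and `y`, and draws one panel per group.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented data with three groups.
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c", "c"],
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 3, 5, 4, 8, 7],
})

groups = sorted(toy["group"].unique())

# One row, one column per group; figsize is (width, height) in inches.
fig, axes = plt.subplots(1, len(groups), figsize=(12, 4), sharex=True, sharey=True)

for ax, name in zip(axes, groups):
    subset = toy[toy["group"] == name]   # rows belonging to this group
    ax.scatter(subset["x"], subset["y"])
    ax.set_title(f"group {name}")

fig.tight_layout()  # reduce overlap between panels
```

Sharing the x and y axes is a design choice that makes the panels directly comparable; it can be dropped if each panel should scale independently.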

c. Plot *three* scatter plots of color intensity against alcohol percentage, one for each class, on the same Matplotlib axes.

In [None]:
# %load answers/matplotlib_c.py
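
Calling `scatter` repeatedly on the same axes overlays the point sets, and a `label` on each call lets a legend tell them apart. The sketch below again uses an invented DataFrame with hypothetical columns `group`, `x` and `y`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented data with three groups.
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c", "c"],
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 3, 5, 4, 8, 7],
})

fig, ax = plt.subplots()

# groupby yields (group name, sub-DataFrame) pairs, one per distinct group.
for name, subset in toy.groupby("group"):
    ax.scatter(subset["x"], subset["y"], label=f"group {name}")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

Matplotlib cycles through its default colours on each `scatter` call, so the groups are distinguishable without setting colours by hand.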