# Plotting

### Questions
- How can I plot my data?
- How can I save my plot for publishing?

### Objectives
- Create a time series plot showing a single data set.
- Create a scatter plot showing the relationship between two data sets.

## `matplotlib` is the most widely used scientific plotting library in Python

- The Jupyter Notebook will render plots inline if we ask it to using a “magic” command.
- The most commonly used sub-library is `matplotlib.pyplot`, which is usually imported under the alias `plt`.

In [None]:
%matplotlib inline

In [None]:
import matplotlib.pyplot as plt

## Simple plots can be created using two series of data (x and y values)

In [None]:
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

plt.plot(time, position)

## We can then use `plt.xlabel` and `plt.ylabel` to label our x and y axes
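
A minimal sketch of labeling the axes, reusing the `time` and `position` lists from the cell above. The units in the label text (hours and kilometres) are assumptions made for illustration; the source does not state them.

```python
import matplotlib.pyplot as plt

time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

plt.plot(time, position)

# Label text is illustrative; the units are assumed, not given in the lesson
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')
```

In a notebook, the labels must be set in the same cell as the `plt.plot` call so that they apply to the same figure.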

## Plot data directly from a Pandas dataframe.

- We can also plot Pandas dataframes.
- This implicitly uses `matplotlib.pyplot`.

In [None]:
import pandas as pd

data = pd.read_csv('data/gapminder_gdp_oceania.csv', index_col='country')

In [None]:
# Plot GDP per capita for Australia per year


In [None]:
# Plot GDP per capita for New Zealand per year


**Exercise**: Plot them together in the same figure

In [None]:
# Extract year from last 4 characters of each column name
# The current column names are structured as 'gdpPercap_(year)', 
# so we want to keep the (year) part only for clarity when plotting GDP vs. years
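# To do this we use strip(), which removes any of the characters given in its argument
# from both ends of the string (it treats the argument as a set of characters, not as a prefix)
# strip() is a string method, so we reach it through the .str accessor of the column index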

years = data.columns.str.strip('gdpPercap_')
print(years)
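
The following sketch illustrates why `strip()` works here and where it could surprise you. It uses a small hand-written index that follows the `gdpPercap_(year)` naming pattern rather than the real file, and it shows `str.replace` as one possible alternative that removes the literal prefix. The name `paper_1952` is a hypothetical column name chosen only to show the character-set behaviour.

```python
import pandas as pd

# Hand-written column names following the 'gdpPercap_(year)' pattern
columns = pd.Index(['gdpPercap_1952', 'gdpPercap_1957', 'gdpPercap_1962'])

# strip() trims any of the listed characters from both ends of each string
stripped = columns.str.strip('gdpPercap_')

# replace() removes the literal text 'gdpPercap_' wherever it occurs
replaced = columns.str.replace('gdpPercap_', '', regex=False)

print(list(stripped))
print(list(replaced))

# Because strip() works on a set of characters, a different prefix built
# from the same letters is removed as well (hypothetical name)
other = 'paper_1952'.strip('gdpPercap_')
print(other)
```

Both approaches give the same years for this data because digits never appear in the set of characters being stripped.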

In [None]:
# Convert year values to integers, saving results back to dataframe

data.columns = years.astype(int)

In [None]:
# Plot GDP per capita for Australia and New Zealand per year


## Select and transform data, then plot it

- By default, `DataFrame.plot` plots with the rows as the X axis.
- We can transpose the data in order to plot multiple series using the `.T` attribute.

**Exercise**: Label the x and y axes

In [None]:
data.T.plot()
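
To see what transposing does, here is a small sketch with a made-up table. The country names and values are invented purely to show the shapes involved and are not taken from the Gapminder data.

```python
import pandas as pd

# Made-up values for two hypothetical countries (rows) over three years (columns)
toy = pd.DataFrame(
    {1990: [1.0, 2.0], 2000: [1.5, 2.5], 2010: [2.0, 3.5]},
    index=['Country A', 'Country B'],
)

print(toy.shape)    # rows are countries, columns are years
print(toy.T.shape)  # after transposing, rows are years and columns are countries

# Each column of the transposed table becomes one line, with the years on the x axis
ax = toy.T.plot()
```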

## Many styles of plot are available.

For example, do a bar plot using a fancier style.

In [None]:
plt.style.use('ggplot')

# bar plot
data.T.plot(kind='bar')

# add label
plt.ylabel('GDP per capita')

## Data can also be plotted by calling the matplotlib `plot` function directly

- The command is `plt.plot(x, y)`
- The color and format of the line or markers can also be specified as an optional argument: e.g. `'b-'` is a blue line, `'g--'` is a green dashed line.

In [None]:
years = data.columns
gdp_australia = data.loc['Australia']

plt.plot(years, gdp_australia, 'g--')
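
As a rough illustration of format strings, the sketch below draws three made-up series with different formats. The `x` and `y` values are arbitrary, and `'ro'` (red circles with no connecting line) is an extra format string not mentioned above.

```python
import matplotlib.pyplot as plt

# Arbitrary values, used only to show the effect of each format string
x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

fig, ax = plt.subplots()
solid, = ax.plot(x, y, 'b-')                      # blue solid line
dashed, = ax.plot(x, [v + 2 for v in y], 'g--')   # green dashed line
points, = ax.plot(x, [v + 4 for v in y], 'ro')    # red circle markers, no line
```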

## Many sets of data can be plotted together

In [None]:
# Select two countries' worth of data.

gdp_australia = data.loc['Australia']
gdp_nz = data.loc['New Zealand']

In [None]:
# Plot with differently-colored lines.
plt.plot(years, gdp_australia, 'b-', label='Australia')
plt.plot(years, gdp_nz, 'g-', label='New Zealand')

# Add legend
plt.legend()

# Add labels
plt.xlabel('Year')
plt.ylabel('GDP per capita ($)')

In [None]:
# Defining location for the legend

plt.legend(loc='upper left')

**Exercise**
- Plot a scatter plot correlating the GDP of Australia and New Zealand
- Use either `plt.scatter` or `DataFrame.plot.scatter`

In [None]:
plt.scatter(gdp_australia, gdp_nz)

In [None]:
data.T.plot.scatter(x = 'Australia', y = 'New Zealand')

## Minima and Maxima

**Exercise**

1) Fill in the blanks below to plot the minimum GDP per capita over time for all the countries in Europe.

```
data_europe = pd.read_csv('data/gapminder_gdp_europe.csv', index_col='country')
data_europe.____.plot(label='min')
plt.legend(loc='best')
plt.xticks(rotation=90)
```

**Exercise**

2) Modify the code above to plot both the minimum and maximum GDP per capita over time for Europe.

## Correlations

**Exercise**: Create a scatter plot showing the relationship between the minimum and maximum GDP per capita among the countries in Asia for each year in the data set. (*Hint: use the function `.describe()` and rotate/transpose your table to get the min and max columns*)

You can adjust the marker size by passing a value to the `s` argument when creating your scatter plot.
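
A small sketch of the `s` argument, using arbitrary numbers rather than the Asia data so that it does not give away the exercise. In `matplotlib`, `s` is interpreted as the marker area in points squared, and it accepts either a single number or one value per point.

```python
import matplotlib.pyplot as plt

# Arbitrary values for illustration only
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
sizes = [20, 80, 180, 320]  # one marker area (points squared) per data point

fig, ax = plt.subplots()
pts = ax.scatter(x, y, s=sizes)
```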

**Exercise: More correlations**

This short program creates a plot showing the correlation between GDP and life expectancy for 2007, normalizing marker size by population:

In [None]:
data_all = pd.read_csv('data/gapminder_all.csv', index_col='country')
data_all.plot(kind='scatter', x='gdpPercap_2007', y='lifeExp_2007',
              s=data_all['pop_2007']/1e6)

**Discuss in pairs**: what does each argument in the `.plot()` call do?

A good place to look is the documentation for the plot function - `help(data_all.plot)`.

## Saving your plot to a file

```plt.savefig(<filename>)```
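
A minimal sketch of saving a figure, assuming a hypothetical output name `my_figure.png`. The file extension selects the output format. The `dpi` and `bbox_inches` arguments are optional extras that are often useful for publication and are not required by the lesson.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 100, 200, 300])

# 'my_figure.png' is a hypothetical filename
# dpi sets the resolution; bbox_inches='tight' trims surrounding white space
fig.savefig('my_figure.png', dpi=300, bbox_inches='tight')
```

In a notebook with inline plotting, it is safest to call `savefig` in the same cell that creates the plot, since the figure may no longer be current in a later cell.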

In [None]:
data.plot(kind='bar')

# save your bar plot


## Making your plots accessible

- Always make sure your text is large enough to read. Use the `fontsize` parameter in `xlabel`, `ylabel`, `title`, and `legend`, and `tick_params` with `labelsize` to increase the text size of the numbers on your axes.
- Similarly, you should make your graph elements easy to see. Use `s` to increase the size of your scatterplot markers and `linewidth` to increase the sizes of your plot lines.
- Using color (and nothing else) to distinguish between different plot elements will make your plots unreadable to anyone who is colorblind, or who happens to have a black-and-white office printer. For lines, the `linestyle` parameter lets you use different types of lines. For scatterplots, `marker` lets you change the shape of your points. If you’re unsure about your colors, you can use Coblis or Color Oracle to simulate what your plots would look like to those with colorblindness.
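
The points above could be combined roughly as follows. This is a sketch with arbitrary data and placeholder label text; the specific font sizes and line widths are assumptions chosen for illustration, not recommended values from the lesson.

```python
import matplotlib.pyplot as plt

# Arbitrary data; the series names and label text are placeholders
x = [0, 1, 2, 3]

fig, ax = plt.subplots()

# Distinguish the series by line style and marker shape, not only by color
ax.plot(x, [0, 1, 2, 3], linestyle='-', marker='o', linewidth=3, label='Series A')
ax.plot(x, [0, 2, 4, 6], linestyle='--', marker='s', linewidth=3, label='Series B')

# Larger text for labels, title, tick numbers, and legend
ax.set_xlabel('x value', fontsize=16)
ax.set_ylabel('y value', fontsize=16)
ax.set_title('Two series with distinct styles', fontsize=18)
ax.tick_params(labelsize=14)
ax.legend(fontsize=14)
```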