# Basic Plotting with `pandas` -- Exercises

## Goal

Practice making basic plots from `pandas` DataFrames.

## Exercises

### 0. Import `pandas` and `matplotlib.pyplot`, then load the gapminder data set

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# The file is tab-separated, not comma-separated: still use read_csv, but pass sep="\t"
gapminder = pd.read_csv("https://raw.githubusercontent.com/jennybc/gapminder/master/inst/extdata/gapminder.tsv", sep="\t")
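
To see what the `sep="\t"` argument changes without touching the network, the sketch below parses a small tab-separated string with and without it. The sample text is made up for illustration and does not contain real gapminder values.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the country and numbers are invented.
tsv_text = "country\tyear\tpop\nAtlantis\t1952\t100\nAtlantis\t1957\t120\n"

# With the default comma separator, each line is read as a single column.
wrong = pd.read_csv(io.StringIO(tsv_text))

# With sep="\t", the fields are split into separate columns.
right = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(wrong.shape)           # one wide column
print(right.shape)           # three columns
print(list(right.columns))
```

If a freshly loaded DataFrame has only one column, a wrong separator is a likely cause.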

### 1. Make a line plot of the population of the United States versus year

Hint: Can you guess the name of this plot kind?  Its arguments are like those for `scatter`.

In [None]:
# We can use a subsetting operation followed by a plot command
gapminder[gapminder["country"] == "United States"].plot.line("year", "pop")

### 2. What does the distribution of life expectancy look like for the year 1952?

Hint: You can pass a `bins=NUMBER_OF_BINS` argument to the plot command to change the number of bins in a histogram.

In [None]:
gapminder.loc[gapminder["year"] == 1952, "lifeExp"].plot.hist(bins=20)
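
Exercises 1 and 2 subset the data in two slightly different ways: `df[mask]` keeps whole rows, while `df.loc[mask, column]` also picks out one column. The following sketch shows the difference on a toy DataFrame whose countries and values are invented for illustration.

```python
import pandas as pd

# Hypothetical toy data; names and numbers are made up.
toy = pd.DataFrame({
    "country": ["Atlantis", "Atlantis", "Erewhon", "Erewhon"],
    "year": [1952, 2007, 1952, 2007],
    "lifeExp": [40.0, 70.0, 50.0, 80.0],
})

# Comparing a column to a value yields a boolean Series, one entry per row.
mask = toy["year"] == 1952

# Indexing with the mask keeps the matching rows and all columns (a DataFrame).
rows_1952 = toy[mask]

# .loc with a mask and a column label returns only that column (a Series).
life_exp_1952 = toy.loc[mask, "lifeExp"]

print(mask.tolist())
print(rows_1952.shape)
print(life_exp_1952.tolist())
```

Either result has a `.plot` accessor, so the choice mostly depends on whether the plot needs one column or several.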

### 3. Make a plot that compares the distribution of life expectancy for 1952 and 2007.  What does this visualization tell you about how things have changed across this time span?

In [None]:
# first subset to get just the years 1952 and 2007
dat_1952_2007 = gapminder[gapminder["year"].isin([1952, 2007])]

# next pivot the data to get the years along the columns with life expectancy as the values
dat_1952_2007.pivot(columns="year", values="lifeExp").plot.hist(bins=20, alpha=0.5)
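
It can help to look at what the pivot step produces before plotting it. Because no `index` is passed, the original row labels are kept, so each row holds a value in one year column and `NaN` in the other. A minimal sketch on invented toy data (not real gapminder values):

```python
import pandas as pd

# Hypothetical toy data; names and numbers are made up.
toy = pd.DataFrame({
    "country": ["Atlantis", "Atlantis", "Erewhon", "Erewhon"],
    "year": [1952, 2007, 1952, 2007],
    "lifeExp": [40.0, 70.0, 50.0, 80.0],
})

# One column per year; rows keep their original labels 0..3.
wide = toy.pivot(columns="year", values="lifeExp")

print(wide)
print(wide[1952].dropna().tolist())
```

The missing values are not a problem for the histogram, since each column is drawn only from its non-missing entries.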

### 4. Save the plot you created in exercise 3 to a `.png` file.  Make sure the axes are properly labeled.

In [None]:
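# Create the figure and axes explicitly so the figure can be saved afterwards
fig, axs = plt.subplots(figsize=(12, 6))
dat_1952_2007.pivot(columns="year", values="lifeExp").plot.hist(ax=axs, bins=20, alpha=0.5)
# pandas labels the y-axis "Frequency" by default; the x-axis needs an explicit label
axs.set_xlabel("Life Expectancy (years)")
fig.savefig("life_exp_dist_1952_vs_2007.png")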
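
As a self-contained variation, the sketch below labels both axes and checks that the file was actually written. The toy values, the y-axis wording, and the output file name are assumptions made for illustration, and the file goes to a temporary directory so nothing in the working directory is overwritten.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical wide-format data: one column per year, values are made up.
toy = pd.DataFrame({1952: [40.0, 50.0, 45.0], 2007: [70.0, 80.0, 75.0]})

fig, ax = plt.subplots(figsize=(6, 3))
toy.plot.hist(ax=ax, bins=5, alpha=0.5)

# Label both axes explicitly instead of relying on the defaults.
ax.set_xlabel("Life Expectancy (years)")
ax.set_ylabel("Number of countries")

# Save into a temporary directory and confirm the file exists.
out_path = os.path.join(tempfile.mkdtemp(), "toy_hist.png")
fig.savefig(out_path)
plt.close(fig)

print(os.path.exists(out_path))
```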