# Working with Data in Python using `pandas`

<img src="https://pandas.pydata.org/_static/pandas_logo.png">

## Example: World Population Growth

Read in the UN population data.

In [None]:
import pandas as pd

data = pd.read_csv('Data/population.csv')

Print the data.

In [None]:
data

Print just the first few rows of data.

In [None]:
data.head()

Print the column names.

In [None]:
data.columns

### Select columns

In [None]:
my_columns = ['Year','Series','Value']

data[my_columns].head()
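
The difference between selecting with a list of column names and with a single name can be sketched on a small stand-in table. The `toy` frame and its values below are made up for illustration and are not taken from the UN dataset; only the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical stand-in data (placeholder values, not real UN figures)
toy = pd.DataFrame({
    "Year": [2005, 2010],
    "Series": ["Population mid-year estimates (millions)"] * 2,
    "Value": [1.0, 2.0],
})

# A single column label returns a one-dimensional Series
single = toy["Value"]

# A list of labels returns a DataFrame, even if the list has one entry
subset = toy[["Year", "Value"]]

print(type(single).__name__)  # Series
print(type(subset).__name__)  # DataFrame
```

Passing a list, as in `data[my_columns]`, keeps the result two-dimensional, which is why `.head()` on it still shows a table with column headers.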

Select data based on a matching criterion.

In [None]:
year = 2005

data[data['Year'] == year].head(n=20)

In [None]:
series = "Population mid-year estimates (millions)"

data[data['Series'] == series]

In [None]:
# We can construct more complex matching criteria. Here we want all
# the mid-year population estimates for Canada.
query = (data["Region/Country/Area"] == "Canada") & \
        (data["Series"] == "Population mid-year estimates (millions)")

data[query]

In [None]:
# The same pattern works for any country. Here we want all
# the mid-year population estimates for Germany.
query = (data["Region/Country/Area"] == "Germany") & \
        (data["Series"] == "Population mid-year estimates (millions)")

data[query]
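
To see what the `query` object actually is, the following minimal sketch builds the same kind of filter on a toy table. The frame, the series labels `"A"` and `"B"`, and the values are all hypothetical; the point is only that each comparison yields a boolean Series, and that `&` combines them element by element.

```python
import pandas as pd

# Hypothetical stand-in data (placeholder labels and values)
toy = pd.DataFrame({
    "Region/Country/Area": ["Canada", "Canada", "Germany", "Germany"],
    "Series": ["A", "B", "A", "B"],
    "Value": [1.0, 2.0, 3.0, 4.0],
})

# Each comparison produces a boolean Series with one entry per row
is_canada = toy["Region/Country/Area"] == "Canada"
is_series_a = toy["Series"] == "A"

# `&` is an element-wise AND; when the comparisons are written inline,
# each one needs its own parentheses because `&` binds tighter than `==`
both = is_canada & is_series_a
print(both.tolist())  # [True, False, False, False]

# Indexing with the mask keeps only the rows where it is True
selected = toy[both]

# `isin` matches several values at once, e.g. both countries in one query
either_country = toy["Region/Country/Area"].isin(["Canada", "Germany"])
both_countries = toy[either_country & is_series_a]
print(both_countries)
```

Using `isin` is one way to avoid repeating a near-identical cell for every country of interest.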

In [None]:
import pandas as pd

world = pd.read_csv('Data/world_population.csv')

In [None]:
world.head(n=20)

In [None]:
# Reverse the row order of the table (last row becomes first)
world = world[::-1]
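
A slice with step `-1` flips the rows but keeps each row's original index label. The sketch below shows this on a made-up three-row table and, as an alternative, sorts explicitly by a column; the `toy` frame and its `Year` column are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in data, listed from latest to earliest year
toy = pd.DataFrame({"Year": [2100, 2050, 2000], "Value": [3.0, 2.0, 1.0]})

# Reversing with a slice keeps the original index labels, now in reverse
rev = toy[::-1]
print(rev.index.tolist())  # [2, 1, 0]

# reset_index(drop=True) renumbers the rows from 0 if that is preferred
rev_clean = rev.reset_index(drop=True)

# Sorting by a column states the intended order explicitly
by_year = toy.sort_values("Year").reset_index(drop=True)
print(by_year["Year"].tolist())  # [2000, 2050, 2100]
```

Reversing is enough when the file is known to be in strictly descending order; sorting does not rely on that.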

In [None]:
# Split the table into one DataFrame per projection variant
high = world[world["Variant"] == "High"]
med  = world[world["Variant"] == "Medium"]
low  = world[world["Variant"] == "Low"]
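
The three filters above can also be expressed with `groupby`, which splits a table by the distinct values in a column. This is a sketch on made-up data, not a replacement for the cell above; the `toy` frame and its values are hypothetical, and only the column names `Variant`, `Year(s)` and `Value` follow the notebook.

```python
import pandas as pd

# Hypothetical stand-in data (placeholder values)
toy = pd.DataFrame({
    "Variant": ["High", "Medium", "Low", "High", "Medium", "Low"],
    "Year(s)": [2100, 2100, 2100, 2050, 2050, 2050],
    "Value": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
})

# Iterating over a groupby yields (group name, sub-DataFrame) pairs
groups = {name: frame for name, frame in toy.groupby("Variant")}

print(sorted(groups))                    # ['High', 'Low', 'Medium']
print(groups["High"]["Value"].tolist())  # [6.0, 3.0]
```

With this approach a new variant in the file would be picked up automatically, whereas the explicit filters make it obvious which three variants are expected.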

### Plot the world population by year for the three scenarios

In [None]:
import matplotlib.pyplot as plt

# Get the data for each variant, store as arrays
years_h = high["Year(s)"].values
years_m = med["Year(s)"].values
years_l = low["Year(s)"].values

# Population in thousands, convert to billions
pop_h = high["Value"].values / 1.0e6
pop_m = med["Value"].values / 1.0e6
pop_l = low["Value"].values / 1.0e6

# Plot population against year for each variant
plt.plot(years_l, pop_l)
plt.plot(years_m, pop_m)
plt.plot(years_h, pop_h)
plt.legend(["Low", "Medium", "High"])
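plt.grid(True, alpha=0.3)

# Label the axes so the units are visible on the figure
plt.xlabel("Year")
plt.ylabel("Population (billions)")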

## Learn More

You can learn more about `pandas` by visiting the [homepage](https://pandas.pydata.org/).

For a 10-minute tutorial, read "[10 minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html)".