# Percent change computation walkthrough
In this walkthrough, we're going to learn how to compute new columns from existing ones. Percent change is a good example: it's one of the most common calculations in data journalism, so knowing how to do it in Agate is important. As always, you start by importing Agate.
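For reference, percent change divides the difference between the new value and the old value by the old value, then multiplies by 100. The county figures in the second line are hypothetical round numbers, not values from the Census file:

$$\text{percent change} = \frac{\text{new} - \text{old}}{\text{old}} \times 100$$

$$\frac{55{,}000 - 50{,}000}{50{,}000} \times 100 = 10\%$$

A county that loses people gets a negative percent change.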

In [4]:
import warnings
warnings.filterwarnings('ignore')

In [5]:
import agate

Now get some data. We'll be using [county population estimates](https://www.dropbox.com/s/0n2ns9c90qjg2ch/population.csv?dl=0) from the Census Bureau.

In [6]:
counties = agate.Table.from_csv('../../Data/population.csv')

In [7]:
print(counties)

| column          | data_type |
| --------------- | --------- |
| STNAME          | Text      |
| CTYNAME         | Text      |
| POPESTIMATE2010 | Number    |
| POPESTIMATE2011 | Number    |
| POPESTIMATE2012 | Number    |
| POPESTIMATE2013 | Number    |
| POPESTIMATE2014 | Number    |
| POPESTIMATE2015 | Number    |
| POPESTIMATE2016 | Number    |



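The code for calculating a percent change is short, about the same as calculating a median or a mean. The difference is the method we call. An aggregate (`aggregate`) works on the table column-wise and boils a whole column down to a single value. A computation (`compute`) works row-wise: it calculates a new value for every row and returns a new table with the results added as a column. Aggregate = one value per column. Compute = one value per row. Got it? `agate.PercentChange` takes the earlier column first and the later column second, and gives the change as a percentage.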
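To make the column-wise/row-wise distinction concrete, here is a plain-Python sketch with no agate involved. The three counties and their populations are made up, and ordinary dictionaries stand in for agate rows. `percent_change` is a hypothetical helper that mirrors the before/after arithmetic described above; it is not agate's own code.

```python
from decimal import Decimal
from statistics import median

# Made-up rows; plain dicts stand in for agate rows (column name -> value).
rows = [
    {'CTYNAME': 'County A', 'POP2010': Decimal('1000'), 'POP2016': Decimal('1100')},
    {'CTYNAME': 'County B', 'POP2010': Decimal('2000'), 'POP2016': Decimal('1900')},
    {'CTYNAME': 'County C', 'POP2010': Decimal('500'), 'POP2016': Decimal('500')},
]

# Aggregate: the whole POP2010 column collapses into ONE value.
median_2010 = median(row['POP2010'] for row in rows)


def percent_change(before, after):
    """Hypothetical helper: (after - before) / before * 100, or None if a value is missing."""
    if before is None or after is None:
        return None
    return (after - before) / before * 100


# Compute: every row gets its OWN value, which together form a new column.
changes = [percent_change(row['POP2010'], row['POP2016']) for row in rows]

print(median_2010)  # 1000
print(changes)
```

The aggregate gives back one number for three rows; the computation gives back three numbers, one per row, which is why `compute` hands you a table with new columns rather than a single figure.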

In [13]:
change = counties.compute([
    ('change10-16', agate.PercentChange('POPESTIMATE2010', 'POPESTIMATE2016')),
    ('change15-16', agate.PercentChange('POPESTIMATE2015', 'POPESTIMATE2016'))
])

And let's see what that looks like. 

In [14]:
change.print_table(max_rows=10)

| STNAME  | CTYNAME         | POPESTIMATE2010 | POPESTIMATE2011 | POPESTIMATE2012 | POPESTIMATE2013 | ... |
| ------- | --------------- | --------------- | --------------- | --------------- | --------------- | --- |
| Alabama | Autauga County  |          54,742 |          55,255 |          55,027 |          54,792 | ... |
| Alabama | Baldwin County  |         183,199 |         186,653 |         190,403 |         195,147 | ... |
| Alabama | Barbour County  |          27,348 |          27,326 |          27,132 |          26,938 | ... |
| Alabama | Bibb County     |          22,861 |          22,736 |          22,645 |          22,501 | ... |
| Alabama | Blount County   |          57,376 |          57,707 |          57,772 |          57,746 | ... |
| Alabama | Bullock County  |          10,892 |          10,722 |          10,654 |          10,576 | ... |
| Alabama | Butler County   |          20,938 |          20,848 |          20,665 |          20,330 | ... |
| Alabama | Calhoun County  

Oy. That's ugly. `compute` adds the new columns at the end of the table, so they get cut off when it prints. There's a handy trick called `select` that keeps only the columns we name. In this case, we need the state, the county and the two percent-change columns. So we're going to create a new table only for the purposes of printing it out.
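Conceptually, `select` builds a new table containing only the named columns and leaves the original table untouched. Here is a plain-Python sketch of the same idea, using one row whose values are copied from the printouts in this walkthrough (the change figure is shortened, as it is in the printout):

```python
from decimal import Decimal

# One row shaped like the `change` table (values shortened from the printouts).
rows = [
    {
        'STNAME': 'Alabama',
        'CTYNAME': 'Autauga County',
        'POPESTIMATE2010': Decimal('54742'),
        'change10-16': Decimal('1.231'),
    },
]

keep = ['STNAME', 'CTYNAME', 'change10-16']

# Build NEW rows holding only the kept columns; `rows` itself is not modified.
narrow = [{name: row[name] for name in keep} for row in rows]

print(narrow)
```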

In [15]:
for_printing = change.select(['STNAME', 'CTYNAME', 'change10-16', 'change15-16'])

In [16]:
for_printing.print_table(max_rows=10)

| STNAME  | CTYNAME         | change10-16 | change15-16 |
| ------- | --------------- | ----------- | ----------- |
| Alabama | Autauga County  |      1.231… |      0.692… |
| Alabama | Baldwin County  |     13.845… |      2.392… |
| Alabama | Barbour County  |     -5.057… |     -1.161… |
| Alabama | Bibb County     |     -0.954… |      0.363… |
| Alabama | Blount County   |      0.572… |      0.049… |
| Alabama | Bullock County  |     -4.866… |     -0.890… |
| Alabama | Butler County   |     -4.489… |     -0.636… |
| Alabama | Calhoun County  |     -3.256… |     -0.585… |
| Alabama | Chambers County |     -0.757… |     -0.587… |
| Alabama | Cherokee County |     -0.970… |     -0.004… |
| ...     | ...             |         ... |         ... |


Better. But let's sort things so it's more interesting.
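`order_by` sorts on a column, and `reverse=True` puts the largest values first. Leaving `reverse` off gives the opposite order, most negative first, which is what you'll want for the fastest-shrinking counties in the assignment. Here is the same idea in plain Python with `sorted`, using three counties and the shortened `change10-16` figures from the printout above:

```python
from decimal import Decimal

rows = [
    {'CTYNAME': 'Autauga County', 'change10-16': Decimal('1.231')},
    {'CTYNAME': 'Baldwin County', 'change10-16': Decimal('13.845')},
    {'CTYNAME': 'Barbour County', 'change10-16': Decimal('-5.057')},
]


def change(row):
    # Sort key: the 2010-2016 percent change.
    return row['change10-16']


# Fastest growing first, like order_by('change10-16', reverse=True).
growing_first = sorted(rows, key=change, reverse=True)
# Fastest shrinking first, like order_by('change10-16') with the default order.
shrinking_first = sorted(rows, key=change)

print([row['CTYNAME'] for row in growing_first])
print([row['CTYNAME'] for row in shrinking_first])
```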

In [17]:
sorted_counties = for_printing.order_by('change10-16', reverse=True)

In [18]:
sorted_counties.print_table(max_rows=20)

| STNAME       | CTYNAME            | change10-16 | change15-16 |
| ------------ | ------------------ | ----------- | ----------- |
| North Dakota | McKenzie County    |     97.265… |     -1.337… |
| North Dakota | Williams County    |     52.028… |     -2.967… |
| Texas        | Loving County      |     36.145… |     -1.739… |
| North Dakota | Mountrail County   |     32.668… |     -0.631… |
| Florida      | Sumter County      |     31.519… |      4.302… |
| Texas        | Hays County        |     29.214… |      5.086… |
| Utah         | Wasatch County     |     29.197… |      4.673… |
| North Dakota | Stark County       |     28.117… |     -2.925… |
| Iowa         | Dallas County      |     26.713… |      4.629… |
| Texas        | Kendall County     |     26.415… |      5.162… |
| Georgia      | Long County        |     25.584… |      3.584… |
| Texas        | Fort Bend County   |     25.541… |      3.837… |
| Georgia      | Forsyth County     |     25.026… |      4.188… |
| Florida 

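Much better. Much, much better. But not great: all those decimal places are noise. Let's round the numbers off. How? It's not as easy as you'd think. Agate stores numbers as Python `Decimal` objects rather than ordinary floats, and you can't change the values in a table in place, so the rounded figures have to go into a new, computed column.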

To do this, we're going to have to write a function that rounds off the numbers. You do that like this:

In [21]:
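from decimal import Decimal

# Round one row's 2010-2016 percent change to a single decimal place.
def round_change(row):
    return row['change10-16'].quantize(Decimal('0.1'))

# Add the rounded figures to a new table as a Number column.
rounded_change = sorted_counties.compute([
    ('change_rounded', agate.Formula(agate.Number(), round_change))
])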

First, you import `Decimal` from Python's built-in `decimal` module. Agate uses `Decimal` for its Number columns because it represents decimal figures exactly, without the small rounding errors of standard floating-point numbers.

Then you create the function (`def round_change`). Its single argument is `row`, which means `agate.Formula` is going to call this function on each row of data. Whatever the function returns becomes that row's value in the new column, so you return `row['change10-16'].quantize(Decimal('0.1'))`, which rounds each change figure to one decimal place. The output looks like this:
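A few details of `quantize` are worth knowing. The sketch below wraps it in a hypothetical helper, `round_to_tenth`, which also passes missing values (`None`) through; `round_change` as written would crash on an empty cell, because `None` has no `quantize` method. It also shows the rounding rule: with Python's default decimal context, `quantize` rounds exact halves to the nearest even digit, and you can ask for `ROUND_HALF_UP` instead. The last line shows one reason rounding ordinary floats can surprise you.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_tenth(value, rounding=None):
    """Hypothetical helper: round a Decimal to one decimal place.

    rounding=None means "use the current decimal context", whose default
    rule is ROUND_HALF_EVEN. Missing values pass through as None.
    """
    if value is None:
        return None
    return value.quantize(Decimal('0.1'), rounding=rounding)


print(round_to_tenth(Decimal('97.265')))               # 97.3
print(round_to_tenth(Decimal('0.25')))                 # 0.2 (half to even)
print(round_to_tenth(Decimal('0.25'), ROUND_HALF_UP))  # 0.3
print(round_to_tenth(None))                            # None

# A binary float can't store 2.675 exactly, so this rounds down.
print(round(2.675, 2))                                 # 2.67
```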

In [20]:
rounded_change.print_table(max_rows=20)

| STNAME       | CTYNAME            | change10-16 | change15-16 | change_rounded |
| ------------ | ------------------ | ----------- | ----------- | -------------- |
| North Dakota | McKenzie County    |     97.265… |     -1.337… |           97.3 |
| North Dakota | Williams County    |     52.028… |     -2.967… |           52.0 |
| Texas        | Loving County      |     36.145… |     -1.739… |           36.1 |
| North Dakota | Mountrail County   |     32.668… |     -0.631… |           32.7 |
| Florida      | Sumter County      |     31.519… |      4.302… |           31.5 |
| Texas        | Hays County        |     29.214… |      5.086… |           29.2 |
| Utah         | Wasatch County     |     29.197… |      4.673… |           29.2 |
| North Dakota | Stark County       |     28.117… |     -2.925… |           28.1 |
| Iowa         | Dallas County      |     26.713… |      4.629… |           26.7 |
| Texas        | Kendall County     |     26.415… |      5.162… |           26.4 |
| Ge

That might seem like a lot of code, but you can reuse it. And most of it is boilerplate. 
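One way to make the rounding reusable is sketched here as a hypothetical helper called `make_rounder` (it is not part of agate). It takes a column name and returns a row function like `round_change`, so the same code works for `change15-16` or any other Number column. Plain dicts stand in for agate rows so the sketch runs on its own.

```python
from decimal import Decimal


def make_rounder(column_name, places='0.1'):
    """Hypothetical helper: build a row function that rounds one column.

    `places` is the exponent handed to Decimal.quantize, e.g. '0.1' for one
    decimal place or '0.01' for two. Missing values stay None.
    """
    exponent = Decimal(places)

    def round_column(row):
        value = row[column_name]
        if value is None:
            return None
        return value.quantize(exponent)

    return round_column


# Stand-in rows with shortened figures from the printouts above.
rows = [
    {'CTYNAME': 'Sumter County', 'change15-16': Decimal('4.302')},
    {'CTYNAME': 'Stark County', 'change15-16': Decimal('-2.925')},
]

round_15_16 = make_rounder('change15-16')
print([round_15_16(row) for row in rows])
```

In the notebook you would pass the returned function to `agate.Formula` exactly where `round_change` went, for example `('change15-16_rounded', agate.Formula(agate.Number(), make_rounder('change15-16')))` inside `compute`.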

# Assignment

1. Download [this dataset of voter registration totals](https://www.dropbox.com/s/ejaco2sv23uyuzf/registeredvoters.csv?dl=0) from the Nebraska Secretary of State. 
2. Calculate the percent change in total registered voters for every county from 2000 to 2016. 
3. Sort it fastest growing to fastest shrinking. Print it to the screen, limited to 10 rows.
4. Sort it fastest shrinking to fastest growing. Print it to the screen, limited to 10 rows.
5. Calculate the percent change in Republicans, Democrats and Independents for every county from 2000 to 2016.
6. Which counties are growing Republicans fastest? Which counties are growing Democrats fastest? Which counties are growing Independents (non-partisans) fastest?

Submit your analysis notebook to Canvas.