# Join walkthrough
When a database has two tables that are related to each other by a common element, we call it a relational database. When all of our data is in a single file, that's called a flat file.

Often, in data, we have one set of information stored in one table and another set of information stored in a different table. At the university, your student records are scattered in tables all over. Somewhere, there is a master student record that has your name, birthdate, ID number, home address and other basic info. Then, over in the registrar's office, we have the classes you took and the grades you received. Over in the bursar's office, we have how much you owe in tuition and how much you've paid. If we wanted to get a single table together that showed how much you paid for each grade you got, we'd have to JOIN them together somehow.

In Agate, as in SQL, that operation is called a join. So let's do that. I've got two datasets that I want to join together so I can calculate a new number from values in both.

Here is the question we're trying to answer: What Nebraska city has seen the largest growth in taxable sales since the Great Recession?

In [1]:
import agate

import warnings
warnings.filterwarnings('ignore')

Now we need to import our two tables.

In [6]:
taxes2015 = agate.Table.from_csv('../../Data/taxes2015.csv')
print(taxes2015)
print(len(taxes2015.rows))

| column             | data_type |
| ------------------ | --------- |
| Year               | Number    |
| County             | Text      |
| City               | Text      |
| Net Taxable Sales  | Number    |
| Nebraska Sales Tax | Number    |

458


In [8]:
taxes2016 = agate.Table.from_csv('../../Data/taxes2016.csv')
print(taxes2016)
print(len(taxes2016.rows))

| column             | data_type |
| ------------------ | --------- |
| Year               | Number    |
| County             | Text      |
| City               | Text      |
| Net Taxable Sales  | Number    |
| Nebraska Sales Tax | Number    |

456


Join syntax could not be easier. You create a new table and set it equal to the table you want to start with, followed by `.join()`. Inside the parentheses go the table you want to join to your starting table and then the fields you're going to join on: first the field from your original table, then the field from your second table. In my case, both tables carry the name of the city in a field called `City`, so that is the field we join on. We can get away with it because there aren't two towns with the same name in the dataset.

In [9]:
taxes = taxes2016.join(taxes2015, 'City', 'City', inner=True)
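
To make the mechanics concrete, here is a minimal plain-Python sketch of what an inner join on a shared key does. It is not how Agate implements `join`; the helper name `inner_join` and the city names and figures are made up for illustration, and the sketch assumes each key appears only once in the right-hand table.

```python
# Made-up rows for illustration only; these are not the real Nebraska figures.
sales_2016 = [
    {"City": "Alpha", "Net Taxable Sales": 120},
    {"City": "Beta", "Net Taxable Sales": 300},
    {"City": "Gamma", "Net Taxable Sales": 50},
]
sales_2015 = [
    {"City": "Alpha", "Net Taxable Sales": 100},
    {"City": "Beta", "Net Taxable Sales": 400},
    {"City": "Delta", "Net Taxable Sales": 75},
]


def inner_join(left, right, key):
    """Pair each left row with the right row that shares the same key value.

    Hypothetical helper: right-hand fields get a "2" suffix so they do not
    overwrite the left-hand fields, and rows without a match are dropped.
    """
    # Index the right table by key (assumes keys are unique on the right).
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is None:
            continue  # inner join: skip left rows with no partner
        combined = dict(row)
        for field, value in match.items():
            if field != key:
                combined[field + "2"] = value
        joined.append(combined)
    return joined


joined = inner_join(sales_2016, sales_2015, "City")
for row in joined:
    print(row)
```

Only Alpha and Beta survive, because Gamma and Delta each appear in just one of the two lists.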

In [15]:
print(taxes)
print(len(taxes.rows))
taxes.print_table(max_rows=10)

| column              | data_type |
| ------------------- | --------- |
| Year                | Number    |
| County              | Text      |
| City                | Text      |
| Net Taxable Sales   | Number    |
| Nebraska Sales Tax  | Number    |
| Year2               | Number    |
| County2             | Text      |
| Net Taxable Sales2  | Number    |
| Nebraska Sales Tax2 | Number    |

452
|  Year | County   | City       | Net Taxable Sales | Nebraska Sales Tax | Year2 | ... |
| ----- | -------- | ---------- | ----------------- | ------------------ | ----- | --- |
| 2,016 | Adams    | Ayr        |            35,187 |           1,935.32 | 2,015 | ... |
| 2,016 | Adams    | Hastings   |       370,623,979 |      20,408,258.47 | 2,015 | ... |
| 2,016 | Adams    | Holstein   |         1,417,719 |          77,974.75 | 2,015 | ... |
| 2,016 | Adams    | Juniata    |         3,974,707 |         218,609.24 | 2,015 | ... |
| 2,016 | Adams    | Kenesaw    |         3,387,844 |         186

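Note some cities dropped out: the 2015 table has 458 rows and the 2016 table has 456, but the joined table has only 452. An inner join keeps only the cities that appear in both tables, so the missing ones are likely due to reporting problems or changes between the reports. If we were doing a story, we'd investigate and figure out what happened and if we could fix it.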
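
One way to start that investigation is to compare the sets of join keys from each table. The sketch below uses made-up city names rather than the real data, and the variable names are only illustrative.

```python
# Made-up city names standing in for the City column of each table.
cities_2015 = {"Alpha", "Beta", "Delta"}
cities_2016 = {"Alpha", "Beta", "Gamma"}

# Cities present in one year but missing from the other.
only_2015 = sorted(cities_2015 - cities_2016)
only_2016 = sorted(cities_2016 - cities_2015)

# Cities an inner join on City would keep.
in_both = cities_2015 & cities_2016

print("Only in 2015:", only_2015)
print("Only in 2016:", only_2016)
print("In both:", len(in_both))
```

With the real tables, the two sets could be built from the `City` columns, for example with something like `set(taxes2015.columns['City'].values())`. Names that differ only in spelling or spacing will show up in both "only" lists, which is often the first clue to a fixable mismatch.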

But for purposes of this assignment, the rest is stuff you've done. We'll calculate a percent change, sort it and print it. 

In [16]:
# Columns ending in "2" come from taxes2015, so each change runs from 2015 to 2016.
change = taxes.compute([
    ('taxable_change', agate.PercentChange('Net Taxable Sales2', 'Net Taxable Sales')),
    ('salestax_change', agate.PercentChange('Nebraska Sales Tax2', 'Nebraska Sales Tax'))
])
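
For reference, percent change is the new value minus the old value, divided by the old value, times 100. Here is a small plain-Python sketch of that arithmetic plus the descending sort that follows; the function name `percent_change` and the figures are made up for illustration.

```python
def percent_change(before, after):
    """Return the change from before to after as a percentage of before."""
    # Note: this raises ZeroDivisionError if before is 0.
    return (after - before) / before * 100


# Made-up rows: (city, 2015 value, 2016 value).
rows = [
    ("Alpha", 100, 150),
    ("Beta", 400, 300),
    ("Gamma", 200, 210),
]

changes = [(city, percent_change(old, new)) for city, old, new in rows]

# Largest increase first, like order_by(..., reverse=True).
ranked = sorted(changes, key=lambda pair: pair[1], reverse=True)
for city, pct in ranked:
    print(city, pct)
```

A very large percent change from a very small starting value deserves a second look before it goes in a story.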

In [17]:
sorted_change = change.order_by('taxable_change', reverse=True)

In [18]:
for_printing = sorted_change.select(['City', 'taxable_change'])

In [19]:
for_printing.print_table()

| City         | taxable_change |
| ------------ | -------------- |
| Harrisburg   |       652.701… |
| Sparks       |       157.235… |
| Dixon        |       144.814… |
| Taylor       |        75.241… |
| Venango      |        71.804… |
| Winnebago    |        54.881… |
| Farwell      |        53.931… |
| Kennard      |        52.851… |
| Rulo         |        44.586… |
| Linwood      |        43.760… |
| Ohiowa       |        43.436… |
| Allen        |        42.461… |
| Chester      |        40.340… |
| Malcolm      |        40.131… |
| Ithaca       |        36.240… |
| Blue Springs |        33.589… |
| Danbury      |        30.319… |
| Kilgore      |        29.870… |
| Avoca        |        28.638… |
| Waverly      |        27.905… |
| ...          |            ... |


## Assignment for Tuesday

There are two data files: [frl15](https://www.dropbox.com/s/glbasqy9sitqql4/frl15.csv?dl=0) and [frl16](https://www.dropbox.com/s/z1xfqh5aila13zf/frl16.csv?dl=0). They are the Free and Reduced Lunch participation totals for every school in Nebraska. I want you to join them together into a single table and calculate the percent change from 2015 to 2016 and sort them by the largest change. Which school in Nebraska saw the largest increase in participation in free and reduced school lunches, which is a proxy for poverty? 

To complete this assignment, you are going to have to join the datasets together. There is a field called `codistsch` that is a unique identifier for the school. The problem? Agate is going to interpret it as a number. It is not a number; it is an identifier. So you're going to have to [override Agate's type inference](http://agate.readthedocs.io/en/1.6.0/cookbook/create.html#override-type-inference), which is not hard.
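
To see why the type matters, here is a small sketch using only Python's standard library. The two sample rows are invented, and the leading zero in the first identifier is an assumption about what the real `codistsch` values might look like. The Agate-specific fix is covered in the cookbook page linked above.

```python
import csv
import io

# Invented two-row sample; the identifier values are hypothetical.
sample = io.StringIO(
    "codistsch,school\n"
    "010003001,Example Elementary\n"
    "550001002,Sample High\n"
)
rows = list(csv.DictReader(sample))

# Kept as text, the identifier is preserved exactly as written.
as_text = [row["codistsch"] for row in rows]

# Treated as a number, any leading zero is lost.
as_number = [int(row["codistsch"]) for row in rows]

print(as_text)
print(as_number)

# Converting back to text does not recover the original identifier.
print(str(as_number[0]) == as_text[0])
```

If one file's identifiers were read as text and the other's as numbers, the join keys would not match, so both tables need the column typed the same way.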