# 3. Joining data in pandas

In this notebook, we'll use pandas to join some relational data:
- `../data/country-codes.csv` -- a table of ISO country codes and country names
- `../data/country-population.csv` -- country population data from the U.N.

👉 [Read more about the `merge` method for joining dataframes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html)

In [None]:
import pandas as pd

When we read in the CSVs, we need to make sure pandas doesn't parse the ISO codes as numbers, because that would strip any leading zeroes. So in addition to the path to the CSV, we'll pass an argument called `dtype` to specify that the `code` column in each file should be parsed as a string.

👉 You can find more information on the `dtype` argument [in the documentation for the `read_csv()` function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
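To see why `dtype` matters, here's a minimal sketch that reads the same tiny CSV twice, once with pandas' default type inference and once with `dtype`. The two-row CSV is made up for illustration and is read from a string with `io.StringIO` rather than from the files used in this notebook.

```python
import io

import pandas as pd

# Made-up two-row CSV standing in for a file with zero-padded codes
csv_text = "code,name\n004,Afghanistan\n008,Albania\n"

# Default parsing: pandas infers integers, so the leading zeroes are lost
as_numbers = pd.read_csv(io.StringIO(csv_text))

# With dtype, the column is kept as text exactly as written in the file
as_strings = pd.read_csv(io.StringIO(csv_text), dtype={'code': str})

print(as_numbers['code'].tolist())  # [4, 8]
print(as_strings['code'].tolist())  # ['004', '008']
```

Once the zeroes are gone, `4` won't match `'004'` in another table, so it's worth setting `dtype` for any column you plan to join on.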

In [None]:
country_codes = pd.read_csv('../data/country-codes.csv', dtype={'code': str})

In [None]:
country_codes.head()

In [None]:
country_pop = pd.read_csv('../data/country-population.csv', dtype={'code': str})

In [None]:
country_pop.head()

### Join the data with the country codes lookup table

To join data in pandas, we can use [`merge()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html), called below as `pd.merge()`. At minimum, you need to hand it the two dataframes to join, plus specify the name of the column to join `on`. (If the key columns have different names in the two dataframes, you can use the `left_on` and `right_on` arguments instead -- the "left" dataframe is the first one you hand to `merge`.)

In [None]:
merged = pd.merge(country_pop,
                  country_codes,
                  on='code')

In [None]:
merged.head()
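By default, `pd.merge()` does an inner join, so any row whose key has no match in the other dataframe is silently dropped. Here's a small sketch of one way to check for that, using the `how` and `indicator` arguments along with `left_on` and `right_on`. The dataframes, column names and numbers are made up for illustration and aren't the notebook's data.

```python
import pandas as pd

# Made-up data: the key column has a different name in each dataframe
population = pd.DataFrame({'iso_code': ['004', '008', '999'],
                           'population': [100, 200, 300]})
names = pd.DataFrame({'code': ['004', '008'],
                      'country': ['Afghanistan', 'Albania']})

# how='left' keeps every row of the left dataframe, matched or not;
# indicator=True adds a `_merge` column saying where each row was found
checked = pd.merge(population,
                   names,
                   left_on='iso_code',
                   right_on='code',
                   how='left',
                   indicator=True)

# Rows marked 'left_only' had no match in the lookup table
unmatched = checked[checked['_merge'] == 'left_only']
print(unmatched[['iso_code', 'population']])
```

If `unmatched` is empty, every row found a partner and an inner join would have returned the same rows.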

### ✍️ Your turn

In the cells below, read in these two datasets and merge them:
- `../data/sdr-maintable.csv`: The main table of information from the Service Difficulty Reporting database maintained by the FAA.
- `../data/sdr-opcode.csv`: The lookup table that maps airline codes to airline names.

You'll want to join on the `OPCODE` column in the `sdr-maintable.csv` file and on the `CODE` column for the `sdr-opcode.csv` file, so you'll need to use the `left_on` and `right_on` arguments rather than `on`. Assign your newly joined dataframe to a new variable name.

Then:
- Select the columns you'd like to export to file
- Export the joined file to a CSV
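As a hint for the last two steps, here's a sketch of selecting columns and exporting, using a placeholder dataframe. The variable names and column names are invented for illustration, so swap in your own joined dataframe and the columns you care about.

```python
import pandas as pd

# Placeholder dataframe; these column names are invented for illustration
joined = pd.DataFrame({'code': ['A1', 'B2'],
                       'name': ['Alpha Air', 'Beta Air'],
                       'notes': ['x', 'y']})

# A list of column names inside the brackets selects just those columns
to_export = joined[['code', 'name']]

# With no path, to_csv() returns the CSV as a string;
# pass a file path as the first argument to write a file instead.
# index=False leaves out the row numbers.
csv_text = to_export.to_csv(index=False)
print(csv_text)
```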

### Joining on multiple columns

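You can join on multiple columns, which can be useful when conducting an enterprise join to hunt for leads. Just pass a list of column names to the `on`/`left_on`/`right_on` arguments instead of a single string. With `left_on` and `right_on`, the two lists are paired up by position, so they need to be the same length and in the same order: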

```python
merged = pd.merge(df1,
                  df2,
                  left_on=['lname', 'fname', 'zipcode'],
                  right_on=['last_name', 'first_name', 'zip'])
```
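To try this out, here's a runnable sketch of the same join with two small dataframes. The people, ZIP codes, and the `donation` and `employer` columns are all made up for illustration.

```python
import pandas as pd

# Made-up people: two share a last name, so one column isn't enough to match on
df1 = pd.DataFrame({'lname': ['Smith', 'Smith', 'Jones'],
                    'fname': ['Ann', 'Bob', 'Ann'],
                    'zipcode': ['65201', '65201', '80302'],
                    'donation': [50, 75, 20]})
df2 = pd.DataFrame({'last_name': ['Smith', 'Jones'],
                    'first_name': ['Ann', 'Ann'],
                    'zip': ['65201', '99999'],
                    'employer': ['City Hall', 'County Jail']})

# A row matches only when all three pairs of columns agree
merged = pd.merge(df1,
                  df2,
                  left_on=['lname', 'fname', 'zipcode'],
                  right_on=['last_name', 'first_name', 'zip'])

print(merged[['lname', 'fname', 'zipcode', 'employer']])
```

Only Ann Smith in 65201 comes through: Bob Smith has a different first name, and Ann Jones has a different ZIP code in each table.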