# Pandas II: Apply Functions, Creating New Columns, Concatenate and Merge

![hello](http://media.npr.org/assets/img/2010/12/09/panda_wave_wide-d933605fea3559c3fcaec7925f5c437b03895c18-s6-c30.jpg)

By the end of this lesson students will be able to:
- Demonstrate using df.apply()
- Demonstrate concatenating dataframes in pandas
- Demonstrate merging/joining dataframes in pandas
- Describe the differences between inner, outer, left and right joins

Outline:

| Lesson | Time |
|:---------------------------------- | :------------|
| Review pandas apply function | 10 minutes|
| Review importing/exploring data in pandas | 10 minutes |
| Introduce Concatenate| 10 minutes |
| Introduce Merge | 20 minutes |

# Apply

The `.apply()` method takes a function and calls it on each value of a specified column (a pandas Series). The syntax is:

    df['column'].apply(function)
    or
    df.column.apply(function)
    
Let's practice this with a real dataframe.
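Before loading the real file, here is a minimal sketch of the pattern on a tiny dataframe. The `state` and `pop` values are made up for illustration and do not come from the CSV. It defines a named function, applies it to one column, and stores the result in a new column.

```python
import pandas as pd

# Made-up stand-in for the states data; the values are illustrative only
df = pd.DataFrame({'state': ['A', 'B', 'C'], 'pop': [1_500_000, 250_000, 4_000_000]})

def to_millions(x):
    """Take one value from the column and return it expressed in millions."""
    return x / 1_000_000

# .apply() calls to_millions once for each value in the 'pop' column
df['pop_millions'] = df['pop'].apply(to_millions)
print(df)
```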

In [None]:
# first, import the csv located at 'https://raw.githubusercontent.com/plotly/datasets/master/2014_usa_states.csv'
import pandas as pd

states = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2014_usa_states.csv')


In [None]:
# show the first 5 rows of the dataframe to see what it contains



In [None]:
# Let's write a function to apply across the 'pop' column of the dataframe. It takes some input x,
# does an operation on x, and returns the altered x.




In [None]:
# Now we can apply the function to the column we want



In [None]:
# We can even assign the results to a new column


### Could I do the same thing with a lambda function?

In [None]:
# try it!
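One possible answer, sketched on made-up data: a lambda is an anonymous one-line function, so it can be passed straight to `.apply()` without a separate `def`.

```python
import pandas as pd

# Illustrative values only
df = pd.DataFrame({'pop': [1_500_000, 250_000, 4_000_000]})

# The lambda receives each value x in turn and returns x in millions
df['pop_millions'] = df['pop'].apply(lambda x: x / 1_000_000)
print(df)
```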

### What about a list comprehension?

In [None]:
#try it!
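A list comprehension can produce the same values. It loops over the column, builds an ordinary Python list, and pandas turns that list into a new column, so the list needs one entry per row. Here is a sketch on made-up data:

```python
import pandas as pd

# Illustrative values only
df = pd.DataFrame({'pop': [1_500_000, 250_000, 4_000_000]})

# Iterating over a Series yields its values; the list has one entry per row
df['pop_millions'] = [x / 1_000_000 for x in df['pop']]
print(df)
```

For simple arithmetic like this, the vectorized form `df['pop'] / 1_000_000` is usually the fastest and most idiomatic option. `.apply()` and comprehensions are most useful when the per-value logic is more involved.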

### For this lesson we'll be using three different dataframes. We already imported the first one. Let's import the other two.

In [None]:
airports = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_february_us_airport_traffic.csv')
agriculture = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/2011_us_ag_exports.csv')

In [None]:
# Show the first few rows of airports


In [None]:
# Show the first few rows of agriculture


In [None]:
# What does the category column describe - how can I determine what is in that column?


## Using Lambda functions on dataframes

In [None]:
#we won't use the category column, but let's walk through using lambdas to fix it (hint: we can use .replace())
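We haven't yet looked at exactly how the real `category` values are inconsistent. The sketch below therefore invents a small column with mixed capitalization, stray whitespace, and an abbreviation. These are hypothetical values, not the CSV's contents. `.unique()` and `.value_counts()` show what a column contains. The lambda then trims and lower-cases each value, and Python's string `.replace()` spells out the abbreviation.

```python
import pandas as pd

# Hypothetical messy values; the real 'category' column may look different
ag = pd.DataFrame({'category': ['state', 'State', ' state ', 'st.']})

print(ag['category'].value_counts())  # inspect what is there before fixing it

# Each value x is trimmed, lower-cased, and the abbreviation 'st.' is spelled out
ag['category'] = ag['category'].apply(lambda x: x.strip().lower().replace('st.', 'state'))

print(ag['category'].unique())
```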


In [None]:
#check the unique values again to make sure it worked!


In [None]:
# Let's check the state values and make sure there are 50 of them
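Here is a minimal sketch of this check. `.nunique()` counts distinct values, and `.duplicated()` flags repeats. The four-row frame is a made-up stand-in with one state deliberately repeated. With the real data, you would compare the count against 50.

```python
import pandas as pd

# Made-up stand-in; 'Alaska' is deliberately duplicated
ag = pd.DataFrame({'state': ['Alabama', 'Alaska', 'Arizona', 'Alaska']})

n_states = ag['state'].nunique()        # number of distinct state values
print(n_states)
print(ag['state'].duplicated().any())   # True if any state appears more than once
```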


# Concatenating and Merging Dataframes

## 1. Concatenate

Concatenate is used to add one dataframe to another, similar to appending. 

![Concat Image](https://pandas.pydata.org/pandas-docs/stable/_images/merging_concat_keys.png)

You can concatenate multiple dataframes along either the row axis or the column axis by setting `axis=0` (rows, the default) or `axis=1` (columns).
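Here is a small sketch of both directions using two made-up dataframes. `axis=0` stacks rows, while `axis=1` places the frames side by side and lines up rows by index label.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
right = pd.DataFrame({'a': [5, 6], 'b': [7, 8]})

# axis=0 (the default): stack rows; ignore_index=True renumbers the result 0..n-1
rows = pd.concat([left, right], axis=0, ignore_index=True)

# axis=1: place the frames side by side, aligning rows on the index
cols = pd.concat([left, right], axis=1)

print(rows.shape)  # 4 rows, 2 columns
print(cols.shape)  # 2 rows, 4 columns
```

Note that `cols` ends up with two columns named `a` and two named `b`, because concatenation does not rename overlapping columns.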


I'll split the states dataframe in half so we can practice concatenating.

In [None]:
states1 = states[0:25]
states2 = states[25:]

Let's check out the documentation to see how we can put them back together again: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html

In [None]:
## put the dataframes back together
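Here is one way to reassemble the halves. The sketch uses a made-up six-row frame so that it runs on its own. `pd.concat` with the default `axis=0` stacks the second slice under the first. Because slicing keeps the original index labels, the result matches the original frame.

```python
import pandas as pd

# Made-up stand-in for the states dataframe
states = pd.DataFrame({'state': list('ABCDEF'), 'pop': [10, 20, 30, 40, 50, 60]})
states1 = states[0:3]   # rows with index labels 0, 1, 2
states2 = states[3:]    # rows with index labels 3, 4, 5

rejoined = pd.concat([states1, states2])   # axis=0 is the default
print(rejoined.equals(states))
```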



**Check for understanding** What would the results be if I set axis=1?

## 2. Merging

Merging dataframes combines dataframes on common columns.

![merge](https://pandas.pydata.org/pandas-docs/stable/_images/merging_concat_axis1.png)
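Here is a minimal sketch of a merge on a shared key column, using made-up frames. `on='state'` matches rows by key value rather than by position. The second merge uses a hypothetical `codes` frame whose key column is named differently (`name`), which is the case `left_on`/`right_on` handle.

```python
import pandas as pd

# Made-up frames that share the 'state' key column (note the different row order)
pop = pd.DataFrame({'state': ['Alabama', 'Alaska'], 'pop': [100, 200]})
exports = pd.DataFrame({'state': ['Alaska', 'Alabama'], 'exports': [5.0, 7.5]})

# Rows are matched on the value of 'state', not on their position
merged = pd.merge(pop, exports, on='state')
print(merged)

# Hypothetical lookup table whose key column has a different name
codes = pd.DataFrame({'code': ['AL', 'AK'], 'name': ['Alabama', 'Alaska']})
with_codes = pd.merge(merged, codes, left_on='state', right_on='name')
print(with_codes)
```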

**Check for understanding** Work with a neighbor: Look at the pandas documentation for merge and identify and describe the arguments it takes.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html


Take a look at the dataframes above. What common key column could I use to merge two of them?

In [None]:
## Merge two of the dataframes



In [None]:
## Merge the remaining two dataframes



In [None]:
## Let's see how they look combined. Describe what happened.



### Inner, Outer, Left and Right Joins

Pandas can merge dataframes in different ways, selected with the `how` parameter of `merge`. These join types will be familiar to anyone who has worked with SQL.

![Merge Types](http://www.shanelynn.ie/wp-content/uploads/2017/03/join-types-merge-names.jpg)
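As a compact preview, here is a sketch that compares how many rows each `how` option returns for two made-up key columns. In this data, only `'b'` and `'c'` appear in both frames.

```python
import pandas as pd

# Made-up data: keys 'b' and 'c' are shared, 'a' is left-only, 'd' is right-only
left = pd.DataFrame({'key': ['a', 'b', 'c'], 'left_val': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'right_val': [20, 30, 40]})

row_counts = {}
for how in ['inner', 'outer', 'left', 'right']:
    result = pd.merge(left, right, on='key', how=how)
    row_counts[how] = len(result)
    print(how, sorted(result['key']))
```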

Let's try a couple of these with some simple dataframes to see how they differ. These examples are taken from Chris Albon's excellent tutorial on merging in pandas (https://chrisalbon.com/python/pandas_join_merge_dataframe.html).

In [None]:
#first, let's create a few dataframes

raw_data = {
        'subject_id': ['1', '2', '3', '4', '5'],
        'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung'],
        'last_name': ['Anderson', 'Ackerman', 'Ali', 'Aoni', 'Atiches']}
df_a = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_a

In [None]:
raw_data = {
        'subject_id': ['4', '5', '6', '7', '8'],
        'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty'],
        'last_name': ['Bonder', 'Black', 'Balwner', 'Brice', 'Btisan']}
df_b = pd.DataFrame(raw_data, columns = ['subject_id', 'first_name', 'last_name'])
df_b

In [None]:
raw_data = {
        'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
        'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]}
df_n = pd.DataFrame(raw_data, columns = ['subject_id','test_id'])
df_n

## Merge with Outer Join

A full outer join produces the set of all records in Table A (`df_a`) and Table B (`df_b`), with matching records from both sides where available. If there is no match, the missing side is filled with NaN.

In [None]:
pd.merge(df_a, df_b, on='subject_id', how='outer')

The outer join kept every row from both dataframes. Rows whose `subject_id` appears in both (4 and 5) were matched up, and NaN was filled in wherever one side had no data.
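To see which side each row came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. The sketch below uses trimmed copies of `df_a` and `df_b`, with `last_name` omitted for brevity.

```python
import pandas as pd

# Trimmed copies of df_a and df_b from above
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# indicator=True records where each row of the result came from
outer = pd.merge(df_a, df_b, on='subject_id', how='outer', indicator=True)
print(outer)
```

Both frames have a `first_name` column, so pandas adds its default suffixes `_x` and `_y` to tell them apart. The `suffixes` parameter changes these labels.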


## Merge with Inner Join

In [None]:
pd.merge(df_a, df_b, on='subject_id', how='inner')

The inner join kept only the rows whose `subject_id` appears in both dataframes (4 and 5).
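The `df_n` frame created above has not been used yet. As a sketch of one way to use it, an inner merge of `df_b` with `df_n` attaches a `test_id` to each subject that appears in both. Subject 6 has no entry in `df_n`, so that row is dropped. The sketch uses a trimmed copy of `df_b`, with `last_name` omitted.

```python
import pandas as pd

# Trimmed copy of df_b and a copy of df_n from above
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})
df_n = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5', '7', '8', '9', '10', '11'],
                     'test_id': [51, 15, 15, 61, 16, 14, 15, 1, 61, 16]})

# Subject 6 (Bran) has no test_id in df_n, so the inner join drops that row
scored = pd.merge(df_b, df_n, on='subject_id', how='inner')
print(scored)
```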

## Merge with Right Join

In [None]:
pd.merge(df_a, df_b, on='subject_id', how='right')


A right join keeps all of the rows from `df_b` (the right dataframe) along with any matching rows from `df_a`. Where `df_a` has no match, its columns are filled with NaN.
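The lesson objectives also mention the left join, which mirrors the right join. It keeps every row of the left dataframe (`df_a`) and fills in matching data from `df_b`. The sketch below uses trimmed copies of the frames above, with `last_name` omitted. It also uses the `suffixes` parameter to label the two `first_name` columns.

```python
import pandas as pd

# Trimmed copies of df_a and df_b from above
df_a = pd.DataFrame({'subject_id': ['1', '2', '3', '4', '5'],
                     'first_name': ['Alex', 'Amy', 'Allen', 'Alice', 'Ayoung']})
df_b = pd.DataFrame({'subject_id': ['4', '5', '6', '7', '8'],
                     'first_name': ['Billy', 'Brian', 'Bran', 'Bryce', 'Betty']})

# Left join: every df_a row is kept; df_b columns are NaN where there is no match
left = pd.merge(df_a, df_b, on='subject_id', how='left', suffixes=('_a', '_b'))
print(left)
```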

## Pandas Merged!
![pandas merged](https://media3.s-nbcnews.com/j/streams/2013/September/130923/4B9132043-tdy-130923-panda-baby-03.today-inline-large.jpg)

## Resources:
 

http://www.shanelynn.ie/merge-join-dataframes-python-pandas-index-1/

https://chrisalbon.com/python/pandas_join_merge_dataframe.html

https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/   

https://www.analyticsvidhya.com/blog/2016/01/12-pandas-techniques-python-data-manipulation/