## Merging Data
### [Merging DataFrames](#MD)
* [Merging company DataFrames](#McD)
* [Merging on a specific column](#Moasc)
* [Merging on columns with non-matching labels](#Mocwnl)
* [Merging on multiple columns](#Momc)


### [Joining DataFrames](#JD)
* [Joining by Index](#JbI)
* [Choosing a joining strategy](#Cajs)
* [Left & right merging on multiple columns](#L&rmomc)
* [Merging DataFrames with outer join](#MDwoj)

### [Ordered merges](#Om)
* [Using merge_ordered()](#Um)
* [Using merge_asof()](#Um)      


## Case Study - Summer Olympics
* [Medals in the Summer Olympics](#MitSO)
* [Loading Olympic edition DataFrame](#LOeD)
* [Loading IOC codes DataFrame](#LIcD)
* [Building medals DataFrame](#BmD)
* [Quantifying Performance](#QP)
* [Counting medals by country/edition in a pivot table](#Cmbciapt)
* [Computing fraction of medals per Olympic edition](#CfompOe)
* [Computing percentage change in fraction of medals won](#Cpcifomw)
* [Reshaping and plotting](#Rap)
* [Building hosts DataFrame](#BhD)
* [Reshaping for analysis](#Rfa)
* [Merging to compute influence](#Mtci)
* [Plotting influence of host country](#Piohc)
* [Final thoughts](#Ft)


In [1]:
import pandas as pd

In [52]:
revenue = pd.DataFrame.from_dict({
    'city': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
    'branch_id': [10, 20, 30, 47],
    'revenue': [100, 83, 4, 200],
})

In [53]:
manager = pd.DataFrame.from_dict({
    'city': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
    'branch_id': [10, 20, 30, 47],
    'manager': ['Charles', 'Joel', 'Brett', 'Sally'],
})

<p id ='MD'><p>
### Merging DataFrames

<p id ='McD'><p>
### Merging company DataFrames

<p id ='Moasc'><p>
### Merging on a specific column

In [54]:
manager

,city,branch_id,manager
0,Austin,10,Charles
1,Denver,20,Joel
2,SpringField,30,Brett
3,Mendocino,47,Sally


In [55]:
revenue

,city,branch_id,revenue
0,Austin,10,100
1,Denver,20,83
2,SpringField,30,4
3,Mendocino,47,200


In [56]:
# Merge on 'city' only; the shared 'branch_id' column is suffixed _x/_y
merge_by_city = pd.merge(manager, revenue, on='city')
merge_by_city

,city,branch_id_x,manager,branch_id_y,revenue
0,Austin,10,Charles,10,100
1,Denver,20,Joel,20,83
2,SpringField,30,Brett,30,4
3,Mendocino,47,Sally,47,200


In [57]:
# Merge on 'branch_id' only; now the shared 'city' column is suffixed _x/_y
merge_by_id = pd.merge(manager, revenue, on='branch_id')
merge_by_id

,city_x,branch_id,manager,city_y,revenue
0,Austin,10,Charles,Austin,100
1,Denver,20,Joel,Denver,83
2,SpringField,30,Brett,SpringField,4
3,Mendocino,47,Sally,Mendocino,200


In [58]:
# Merge on both shared columns, so neither one is duplicated
merger = pd.merge(manager, revenue, on=['city', 'branch_id'])
merger

,city,branch_id,manager,revenue
0,Austin,10,Charles,100
1,Denver,20,Joel,83
2,SpringField,30,Brett,4
3,Mendocino,47,Sally,200
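
When only one shared column is used as the key, the other shared column comes back with the default `_x`/`_y` suffixes. A minimal sketch of labelling those duplicates with the `suffixes` parameter; the names `mgr`, `rev`, `_mgr` and `_rev` are illustrative and not part of the notebook:

```python
import pandas as pd

# Standalone copies of the notebook's two frames (hypothetical names mgr/rev).
mgr = pd.DataFrame({
    'city': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
    'branch_id': [10, 20, 30, 47],
    'manager': ['Charles', 'Joel', 'Brett', 'Sally'],
})
rev = pd.DataFrame({
    'city': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
    'branch_id': [10, 20, 30, 47],
    'revenue': [100, 83, 4, 200],
})

# Merge on 'city' only and give the duplicated 'branch_id' columns readable suffixes.
labeled = pd.merge(mgr, rev, on='city', suffixes=('_mgr', '_rev'))
print(labeled.columns.tolist())
```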


<p id ='Mocwnl'><p>
### Merging on columns with non-matching labels

In [59]:
# Rename 'city' to 'branch' so the key columns no longer share a label
manager = manager.rename(columns={'city': 'branch'})
manager

,branch,branch_id,manager
0,Austin,10,Charles
1,Denver,20,Joel
2,SpringField,30,Brett
3,Mendocino,47,Sally


In [60]:
revenue

,city,branch_id,revenue
0,Austin,10,100
1,Denver,20,83
2,SpringField,30,4
3,Mendocino,47,200


In [61]:
# The key columns have different labels, so name the key on each side explicitly
combined = pd.merge(manager, revenue, left_on='branch', right_on='city')
combined

,branch,branch_id_x,manager,city,branch_id_y,revenue
0,Austin,10,Charles,Austin,10,100
1,Denver,20,Joel,Denver,20,83
2,SpringField,30,Brett,SpringField,30,4
3,Mendocino,47,Sally,Mendocino,47,200
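
With `left_on`/`right_on`, both key columns survive the merge even though they hold the same values. A small sketch of dropping the redundant one afterwards, using reduced stand-in frames (`mgr` and `rev` are hypothetical names, and `branch_id` is left out for brevity):

```python
import pandas as pd

# Reduced stand-ins for the notebook's frames, with differently labelled keys.
mgr = pd.DataFrame({
    'branch': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
    'manager': ['Charles', 'Joel', 'Brett', 'Sally'],
})
rev = pd.DataFrame({
    'city': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
    'revenue': [100, 83, 4, 200],
})

# Merge on the differently named keys, then drop the duplicate key column.
tidy_merge = pd.merge(mgr, rev, left_on='branch', right_on='city').drop(columns='city')
print(tidy_merge)
```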


<p id ='Momc'><p>
### Merging on multiple columns

In [62]:
# Add a 'state' column to revenue
revenue['state'] = ['TX', 'CO', 'IL', 'CA']

# Add a 'state' column to manager (two states deliberately differ from revenue)
manager['state'] = ['TX', 'CO', 'CA', 'MO']

In [63]:
revenue

,city,branch_id,revenue,state
0,Austin,10,100,TX
1,Denver,20,83,CO
2,SpringField,30,4,IL
3,Mendocino,47,200,CA


In [64]:
# Restore the 'city' label so both frames share the same key names again
manager = manager.rename(columns={'branch': 'city'})
manager

,city,branch_id,manager,state
0,Austin,10,Charles,TX
1,Denver,20,Joel,CO
2,SpringField,30,Brett,CA
3,Mendocino,47,Sally,MO


In [65]:
# Inner merge on three keys: only rows that agree on all of them survive
combined = pd.merge(revenue, manager, on=['city', 'state', 'branch_id'])
combined

,city,branch_id,revenue,state,manager
0,Austin,10,100,TX,Charles
1,Denver,20,83,CO,Joel
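
Only two rows come back because the default merge is an inner join and the `state` values disagree for SpringField and Mendocino. One way to see which rows were dropped is an outer merge with `indicator=True`; the sketch below rebuilds the two frames under the hypothetical names `rev` and `mgr`:

```python
import pandas as pd

# Standalone copies of the frames after the 'state' columns were added.
rev = pd.DataFrame({
    'city': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
    'branch_id': [10, 20, 30, 47],
    'revenue': [100, 83, 4, 200],
    'state': ['TX', 'CO', 'IL', 'CA'],
})
mgr = pd.DataFrame({
    'city': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
    'branch_id': [10, 20, 30, 47],
    'manager': ['Charles', 'Joel', 'Brett', 'Sally'],
    'state': ['TX', 'CO', 'CA', 'MO'],
})

# Outer merge keeps every row; the '_merge' column says which side each row came from.
check = pd.merge(rev, mgr, on=['city', 'state', 'branch_id'],
                 how='outer', indicator=True)
print(check[['city', 'state', '_merge']])
```

Rows marked `left_only` or `right_only` are the ones the inner merge discarded.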


<p id ='JD'><p>
### Joining DataFrames

In [66]:
revenue = revenue.set_index('branch_id')
revenue

branch_id,city,revenue,state
10,Austin,100,TX
20,Denver,83,CO
30,SpringField,4,IL
47,Mendocino,200,CA


In [67]:
manager = manager.set_index('branch_id')
manager

branch_id,city,manager,state
10,Austin,Charles,TX
20,Denver,Joel,CO
30,SpringField,Brett,CA
47,Mendocino,Sally,MO


In [68]:
temp = manager.loc[30].copy()  # keep a copy of row 30 before it is overwritten

In [69]:
manager.loc[30] = manager.loc[47]

In [70]:
manager.loc[47]

city       Mendocino
manager        Sally
state             MO
Name: 47, dtype: object

<p id ='JbI'><p>
### Joining by Index
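
A minimal sketch of an index-based join, assuming two frames that share a `branch_id` index and have no column names in common (the names `rev` and `mgr` are illustrative). `DataFrame.join` aligns on the index and performs a left join by default:

```python
import pandas as pd

idx = pd.Index([10, 20, 30, 47], name='branch_id')

# Toy frames indexed by branch_id, with non-overlapping columns.
rev = pd.DataFrame({'city': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
                    'revenue': [100, 83, 4, 200]}, index=idx)
mgr = pd.DataFrame({'manager': ['Charles', 'Joel', 'Brett', 'Sally']}, index=idx)

# Align the two frames on their shared index.
joined = rev.join(mgr)
print(joined)
```

If the frames shared column names, as the notebook's `revenue` and `manager` do, `join` would also need `lsuffix` and/or `rsuffix`.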

<p id ='Cajs'><p>
### Choosing a joining strategy
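
The `how` argument decides which index labels survive a join. The sketch below counts the rows each strategy produces, assuming toy frames whose indexes only partly overlap; branch 31 is a made-up value added for illustration:

```python
import pandas as pd

# rev has branch 30 but not 31; mgr has branch 31 but not 30 (hypothetical data).
rev = pd.DataFrame({'revenue': [100, 83, 4, 200]},
                   index=pd.Index([10, 20, 30, 47], name='branch_id'))
mgr = pd.DataFrame({'manager': ['Charles', 'Joel', 'Sally', 'Brett']},
                   index=pd.Index([10, 20, 47, 31], name='branch_id'))

# Row count produced by each joining strategy.
row_counts = {how: len(rev.join(mgr, how=how))
              for how in ('inner', 'left', 'right', 'outer')}
print(row_counts)
```

`inner` keeps only labels present in both frames, `left` and `right` keep every label from one side, and `outer` keeps the union.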

<p id ='L&rmomc'><p>
### Left & right merging on multiple columns
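
A sketch of left and right merges on two key columns, reusing the mismatched `state` values from the cells above in reduced stand-in frames (`rev` and `mgr` are hypothetical names). Rows without a partner on the other side are kept and filled with `NaN`:

```python
import pandas as pd

rev = pd.DataFrame({
    'city': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
    'state': ['TX', 'CO', 'IL', 'CA'],
    'revenue': [100, 83, 4, 200],
})
mgr = pd.DataFrame({
    'city': ['Austin', 'Denver', 'SpringField', 'Mendocino'],
    'state': ['TX', 'CO', 'CA', 'MO'],
    'manager': ['Charles', 'Joel', 'Brett', 'Sally'],
})

# Left merge: keep every revenue row; unmatched rows get NaN in 'manager'.
left_merged = pd.merge(rev, mgr, on=['city', 'state'], how='left')

# Right merge: keep every manager row; unmatched rows get NaN in 'revenue'.
right_merged = pd.merge(rev, mgr, on=['city', 'state'], how='right')

print(left_merged)
print(right_merged)
```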

<p id ='MDwoj'><p>
### Merging DataFrames with outer join

<p id ='Om'><p>
### Ordered merges

<p id ='Um'><p>
### Using merge_ordered()
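
`pd.merge_ordered()` performs an outer merge by default and returns the rows sorted by the key, which suits time series sampled on different dates. A minimal sketch with made-up data; `fill_method='ffill'` carries the last known value forward into the gaps:

```python
import pandas as pd

# Two hypothetical series observed on different dates.
series_a = pd.DataFrame({
    'date': pd.to_datetime(['2016-01-01', '2016-02-01', '2016-03-01']),
    'value_a': [1, 2, 3],
})
series_b = pd.DataFrame({
    'date': pd.to_datetime(['2016-01-15', '2016-02-15']),
    'value_b': [10, 20],
})

# Ordered outer merge on 'date', forward-filling the gaps.
ordered = pd.merge_ordered(series_a, series_b, on='date', fill_method='ffill')
print(ordered)
```

The first row still has no `value_b`, because there is nothing earlier to fill from.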

<p id ='Um'><p>
### Using merge_asof()
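
`pd.merge_asof()` is a left merge on the nearest key rather than an exact one: by default each left row is matched with the last right row whose key is less than or equal to its own. Both frames must already be sorted by the key. A small sketch with made-up `trades` and `prices` frames:

```python
import pandas as pd

# Hypothetical data: trades happen between the dates on which prices are published.
trades = pd.DataFrame({
    'date': pd.to_datetime(['2016-01-10', '2016-01-25', '2016-02-20']),
    'qty': [5, 7, 2],
})
prices = pd.DataFrame({
    'date': pd.to_datetime(['2016-01-01', '2016-02-01', '2016-03-01']),
    'price': [100, 110, 120],
})

# Attach to each trade the most recent price published on or before its date.
matched = pd.merge_asof(trades, prices, on='date')
print(matched)
```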

<p id ='MitSO'><p>
### Medals in the Summer Olympics

<p id ='LOeD'><p>
### Loading Olympic edition DataFrame

In [12]:
editions = pd.read_csv('./data/olympic/Summer Olympic medalists 1896 to 2008 - EDITIONS.tsv', delimiter='\t')

In [13]:
editions.head()

,Edition,Bronze,Gold,Silver,Grand Total,City,Country
0,1896,40,64,47,151,Athens,Greece
1,1900,142,178,192,512,Paris,France
2,1904,123,188,159,470,St. Louis,United States
3,1908,211,311,282,804,London,United Kingdom
4,1912,284,301,300,885,Stockholm,Sweden


In [21]:
editions = editions[['Edition', 'Grand Total', 'City', 'Country']]
editions.head()

,Edition,Grand Total,City,Country
0,1896,151,Athens,Greece
1,1900,512,Paris,France
2,1904,470,St. Louis,United States
3,1908,804,London,United Kingdom
4,1912,885,Stockholm,Sweden


<p id ='LIcD'><p>
### Loading IOC codes DataFrame

In [None]:
ioc_codes = pd.read_csv('./data/o')

<p id ='BmD'><p>
### Building medals DataFrame

<p id ='QP'><p>
### Quantifying Performance

<p id ='Cmbciapt'><p>
### Counting medals by country/edition in a pivot table
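
A sketch of counting medals per country and edition with `pivot_table`. The `medals` frame here is toy data, and the column names `NOC` and `Athlete` are assumptions about the medals DataFrame, which this notebook has not built yet:

```python
import pandas as pd

# Toy medals table: one row per medal awarded (values are made up).
medals = pd.DataFrame({
    'Edition': [1896, 1896, 1896, 1900, 1900, 1900],
    'NOC': ['GRE', 'GRE', 'USA', 'USA', 'FRA', 'FRA'],
    'Athlete': ['A', 'B', 'C', 'D', 'E', 'F'],
})

# Rows are editions, columns are countries, cells are medal counts.
medal_counts = medals.pivot_table(index='Edition', columns='NOC',
                                  values='Athlete', aggfunc='count')
print(medal_counts)
```

Country/edition pairs with no medals appear as `NaN`.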

<p id ='CfompOe'><p>
### Computing fraction of medals per Olympic edition
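
One way to turn counts into fractions is to divide each row of the pivot table by the total medals for that edition. The sketch below uses made-up counts and totals; in the case study the totals would presumably come from the `Grand Total` column of `editions`:

```python
import pandas as pd

editions_idx = pd.Index([1896, 1900], name='Edition')

# Toy medal counts per country (columns) and edition (rows).
medal_counts = pd.DataFrame({'GRE': [5, 4], 'USA': [2, 10]}, index=editions_idx)

# Toy total medals awarded in each edition.
totals = pd.Series([10, 20], index=editions_idx)

# Divide every row by the total for its edition (alignment is on the row index).
fractions = medal_counts.divide(totals, axis='index')
print(fractions)
```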

<p id ='Cpcifomw'><p>
### Computing percentage change in fraction of medals won
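
A minimal sketch of the percentage change between consecutive editions, using `pct_change()` on a toy table of fractions. The numbers are made up, and the case study may smooth the fractions first:

```python
import pandas as pd

# Toy fraction of medals won by one country across three editions.
fractions = pd.DataFrame({'USA': [0.2, 0.4, 0.3]},
                         index=pd.Index([1896, 1900, 1904], name='Edition'))

# Percentage change relative to the previous edition; the first row has no predecessor.
pct_changes = fractions.pct_change() * 100
print(pct_changes)
```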

<p id ='Rap'><p>
### Reshaping and plotting

<p id ='BhD'><p>
### Building hosts DataFrame

<p id ='Rfa'><p>
### Reshaping for analysis
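
Wide tables with one column per country are awkward to merge against per-edition data. A sketch of reshaping to long form with `pd.melt`, using toy values; the column names `NOC` and `Change` are illustrative choices:

```python
import pandas as pd

# Toy wide table: one row per edition, one column per country.
wide = pd.DataFrame({'GRE': [1.0, -2.0], 'USA': [3.0, 4.0]},
                    index=pd.Index([1896, 1900], name='Edition'))

# Move the index into a column, then unpivot the country columns into rows.
tidy = pd.melt(wide.reset_index(), id_vars='Edition',
               var_name='NOC', value_name='Change')
print(tidy)
```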

<p id ='Mtci'><p>
### Merging to compute influence

<p id ='Piohc'><p>
### Plotting influence of host country

<p id ='Ft'><p>
### Final thoughts