## Introduction

In [1]:
import pandas as pd


In [2]:
happiness2015 = pd.read_csv("World_Happiness_2015.csv")

happiness2016 = pd.read_csv("World_Happiness_2016.csv")

happiness2017 = pd.read_csv("World_Happiness_2017.csv")

In [4]:
happiness2015.info()
happiness2016.info()
happiness2017.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 158 entries, 0 to 157
Data columns (total 12 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Country                        158 non-null    object 
 1   Region                         158 non-null    object 
 2   Happiness Rank                 158 non-null    int64  
 3   Happiness Score                158 non-null    float64
 4   Standard Error                 158 non-null    float64
 5   Economy (GDP per Capita)       158 non-null    float64
 6   Family                         158 non-null    float64
 7   Health (Life Expectancy)       158 non-null    float64
 8   Freedom                        158 non-null    float64
 9   Trust (Government Corruption)  158 non-null    float64
 10  Generosity                     158 non-null    float64
 11  Dystopia Residual              158 non-null    float64
dtypes: float64(9), int64(1), object(2)
memory usage: 1

In [5]:
happiness2015['Year'] = 2015
happiness2016['Year'] = 2016
happiness2017['Year'] = 2017

## Combining Dataframes with the Concat Function

In [6]:
head_2015 = happiness2015[['Country','Happiness Score', 'Year']].head(3)
head_2016 = happiness2016[['Country','Happiness Score', 'Year']].head(3)

In [8]:
concat_axis0 = pd.concat([head_2015, head_2016])
concat_axis0

Unnamed: 0,Country,Happiness Score,Year
0,Switzerland,7.587,2015
1,Iceland,7.561,2015
2,Denmark,7.527,2015
0,Denmark,7.526,2016
1,Switzerland,7.509,2016
2,Iceland,7.501,2016


In [9]:
concat_axis1 = pd.concat([head_2015, head_2016], axis=1)
concat_axis1

Unnamed: 0,Country,Happiness Score,Year,Country.1,Happiness Score.1,Year.1
0,Switzerland,7.587,2015,Denmark,7.526,2016
1,Iceland,7.561,2015,Switzerland,7.509,2016
2,Denmark,7.527,2015,Iceland,7.501,2016


In [10]:
question1 = concat_axis0.shape[0]
question2 = concat_axis1.shape[0]

## Combining Dataframes with the Concat Function Continued

![Jupyter](./Glue.svg)

In [12]:
head_2015 = happiness2015[['Year','Country','Happiness Score', 'Standard Error']].head(4)
head_2016 = happiness2016[['Country','Happiness Score', 'Year']].head(3)

concat_axis0 = pd.concat([head_2015, head_2016])

rows = concat_axis0.shape[0]

columns = concat_axis0.shape[1]

In [13]:
concat_axis0

Unnamed: 0,Year,Country,Happiness Score,Standard Error
0,2015,Switzerland,7.587,0.03411
1,2015,Iceland,7.561,0.04884
2,2015,Denmark,7.527,0.03328
3,2015,Norway,7.522,0.0388
0,2016,Denmark,7.526,
1,2016,Switzerland,7.509,
2,2016,Iceland,7.501,


## Combining Dataframes with Different Shapes Using the Concat Function

In [16]:
concat_update_index = pd.concat([head_2015, head_2016], ignore_index=True)
concat_update_index

Unnamed: 0,Year,Country,Happiness Score,Standard Error
0,2015,Switzerland,7.587,0.03411
1,2015,Iceland,7.561,0.04884
2,2015,Denmark,7.527,0.03328
3,2015,Norway,7.522,0.0388
4,2016,Denmark,7.526,
5,2016,Switzerland,7.509,
6,2016,Iceland,7.501,


## Joining Dataframes with the Merge Function

Unlike the `concat` function, the `merge` function combines dataframes only horizontally (axis=1), and it combines only two dataframes per call.

![Jupyter](./Merge_syntax.svg)
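Because `merge` takes exactly two dataframes, joining a third means chaining calls. The sketch below uses small toy frames (`df1`, `df2`, `df3` and their values are made up for illustration and are not part of the happiness data) to show one way to chain merges, with `concat` handling all three frames in one call for comparison.

```python
from functools import reduce

import pandas as pd

# Toy frames for illustration only; names and values are hypothetical.
df1 = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})
df3 = pd.DataFrame({"key": ["c", "d", "e"], "z": [300, 400, 500]})

# merge accepts two frames, so a third frame needs a second call.
step = pd.merge(df1, df2, on="key")      # inner join keeps keys b and c
chained = pd.merge(step, df3, on="key")  # inner join keeps key c only

# The same chain written as a fold over a list of frames.
reduced = reduce(
    lambda left, right: pd.merge(left, right, on="key"),
    [df1, df2, df3],
)

# concat takes any number of frames in a single call.
stacked = pd.concat([df1, df2, df3], ignore_index=True)

print(chained)
print(stacked.shape)
```

Chaining is order-sensitive for left and right joins, so the fold is easiest to reason about when every step uses the same join type.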

In [17]:
three_2015 = happiness2015[['Country','Happiness Rank','Year']].iloc[2:5]
three_2016 = happiness2016[['Country','Happiness Rank','Year']].iloc[2:5]

merged = pd.merge(three_2015, three_2016, on='Country')

In [18]:
merged

Unnamed: 0,Country,Happiness Rank_x,Year_x,Happiness Rank_y,Year_y
0,Norway,4,2015,4,2016


## Joining on Columns with the Merge Function

The `how` parameter of `merge` selects one of four join types:

* **Inner**: includes only the rows whose key appears in both dataframes (the default)
* **Outer**: includes all rows from both dataframes, with missing values where a key appears in only one of them
* **Left**: includes all of the rows from the "left" dataframe along with any rows from the "right" dataframe with a common key; the result retains all columns from both of the original dataframes
* **Right**: includes all of the rows from the "right" dataframe along with any rows from the "left" dataframe with a common key; the result retains all columns from both of the original dataframes
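To compare the four join types side by side, the sketch below rebuilds the two three-row slices inline (countries and ranks copied from rows 2 to 4 of the 2015 and 2016 data shown in this notebook) so that it runs without the CSV files. The names `left_df`, `right_df`, and `results` are chosen for this example only.

```python
import pandas as pd

# Rows 2-4 of each year, rebuilt by hand so the example is self-contained.
left_df = pd.DataFrame(
    {"Country": ["Denmark", "Norway", "Canada"], "Happiness Rank": [3, 4, 5]}
)
right_df = pd.DataFrame(
    {"Country": ["Iceland", "Norway", "Finland"], "Happiness Rank": [3, 4, 5]}
)

# Run the same merge once per join type and keep each result.
results = {}
for how in ["inner", "outer", "left", "right"]:
    results[how] = pd.merge(
        left_df, right_df, on="Country", how=how, suffixes=("_2015", "_2016")
    )
    print(how, len(results[how]), sorted(results[how]["Country"]))
```

Only Norway appears in both slices, so the inner join keeps one row, the outer join keeps all five countries, and the left and right joins each keep the three countries of their own side.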

In [21]:
three_2015 = happiness2015[['Country','Happiness Rank','Year']].iloc[2:5]
three_2016 = happiness2016[['Country','Happiness Rank','Year']].iloc[2:5]

merged = pd.merge(left=three_2015, right=three_2016, on='Country')
merged

Unnamed: 0,Country,Happiness Rank_x,Year_x,Happiness Rank_y,Year_y
0,Norway,4,2015,4,2016


In [22]:
merged_left = pd.merge(left=three_2015, right=three_2016, on='Country', how='left')
merged_left

Unnamed: 0,Country,Happiness Rank_x,Year_x,Happiness Rank_y,Year_y
0,Denmark,3,2015,,
1,Norway,4,2015,4.0,2016.0
2,Canada,5,2015,,


In [24]:
merged_left_updated = pd.merge(left=three_2016, right=three_2015, on='Country', how='left')
merged_left_updated

Unnamed: 0,Country,Happiness Rank_x,Year_x,Happiness Rank_y,Year_y
0,Iceland,3,2016,,
1,Norway,4,2016,4.0,2015.0
2,Finland,5,2016,,


## Left Joins with the Merge Function

In summary:

* We'd use a left join when we don't want to drop any rows from the left dataframe.

* A right join works the same way as a left join, except that it keeps all of the rows from the "right" dataframe.
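One way to check the relationship between left and right joins is to confirm that a right join gives the same rows as a left join with the two dataframes swapped. The sketch below rebuilds the two slices inline (values copied from the outputs in this notebook); `df_2015`, `df_2016`, and the helper `normalize` are names introduced here for illustration.

```python
import pandas as pd

# Hand-built copies of the two slices so the example runs on its own.
df_2015 = pd.DataFrame(
    {"Country": ["Denmark", "Norway", "Canada"], "Happiness Rank": [3, 4, 5]}
)
df_2016 = pd.DataFrame(
    {"Country": ["Iceland", "Norway", "Finland"], "Happiness Rank": [3, 4, 5]}
)

# Right join: keep every row of the right frame (2016).
right_join = pd.merge(
    df_2015, df_2016, on="Country", how="right", suffixes=("_2015", "_2016")
)

# Left join with the frames swapped: also keeps every 2016 row.
left_swapped = pd.merge(
    df_2016, df_2015, on="Country", how="left", suffixes=("_2016", "_2015")
)


def normalize(df):
    """Sort rows by country and reset the index so frames compare cleanly."""
    return df.sort_values("Country").reset_index(drop=True)


# Column order differs between the two results, so align it before comparing.
same = normalize(right_join).equals(normalize(left_swapped[right_join.columns]))
print(same)
```

Since the two forms carry the same information, sticking to left joins and choosing which dataframe goes on the left is a reasonable convention.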

In [25]:
three_2015 = happiness2015[['Country','Happiness Rank','Year']].iloc[2:5]
three_2016 = happiness2016[['Country','Happiness Rank','Year']].iloc[2:5]
merged = pd.merge(left=three_2015, right=three_2016, how='left', on='Country')
merged_updated = pd.merge(left=three_2016, right=three_2015, how = 'left', on='Country')

In [28]:
merged_suffixes = pd.merge(left=three_2015, right=three_2016, how='left', on='Country', suffixes=('_2015', '_2016'))
merged_suffixes

Unnamed: 0,Country,Happiness Rank_2015,Year_2015,Happiness Rank_2016,Year_2016
0,Denmark,3,2015,,
1,Norway,4,2015,4.0,2016.0
2,Canada,5,2015,,


In [30]:
merged_updated_suffixes = pd.merge(left=three_2016, right=three_2015, how = 'left', on='Country', suffixes=('_2016', '_2015'))
merged_updated_suffixes

Unnamed: 0,Country,Happiness Rank_2016,Year_2016,Happiness Rank_2015,Year_2015
0,Iceland,3,2016,,
1,Norway,4,2016,4.0,2015.0
2,Finland,5,2016,,


## Join on Index with the Merge Function

In [31]:
four_2015 = happiness2015[['Country','Happiness Rank','Year']].iloc[2:6]
three_2016 = happiness2016[['Country','Happiness Rank','Year']].iloc[2:5]
merge_index = pd.merge(left=four_2015, right=three_2016, left_index=True, right_index=True, suffixes=('_2015', '_2016'))

In [32]:
four_2015

Unnamed: 0,Country,Happiness Rank,Year
2,Denmark,3,2015
3,Norway,4,2015
4,Canada,5,2015
5,Finland,6,2015


In [33]:
three_2016

Unnamed: 0,Country,Happiness Rank,Year
2,Iceland,3,2016
3,Norway,4,2016
4,Finland,5,2016


In [34]:
merge_index

Unnamed: 0,Country_2015,Happiness Rank_2015,Year_2015,Country_2016,Happiness Rank_2016,Year_2016
2,Denmark,3,2015,Iceland,3,2016
3,Norway,4,2015,Norway,4,2016
4,Canada,5,2015,Finland,5,2016


In [35]:
merge_index_left = pd.merge(left=four_2015, right=three_2016, how='left', left_index=True, right_index=True, suffixes=('_2015', '_2016'))

In [36]:
merge_index_left

Unnamed: 0,Country_2015,Happiness Rank_2015,Year_2015,Country_2016,Happiness Rank_2016,Year_2016
2,Denmark,3,2015,Iceland,3.0,2016.0
3,Norway,4,2015,Norway,4.0,2016.0
4,Canada,5,2015,Finland,5.0,2016.0
5,Finland,6,2015,,,


## Challenge: Combine Data and Create a Visualization

<table>
<tbody>
<tr>
<th></th>
<th>pd.concat()</th>
<th>pd.merge()</th>
</tr>
<tr>
<td>Default Join Type</td>
<td>Outer</td>
<td>Inner</td>
</tr>
<tr>
<td>Can Combine More Than Two Dataframes at a Time?</td>
<td>Yes</td>
<td>No</td>
</tr>
<tr>
<td>Can Combine Dataframes Vertically (axis=0) or Horizontally (axis=1)?</td>
<td>Both</td>
<td>Horizontally</td>
</tr>
<tr>
<td>Syntax</td>
<td><b>Concat (Vertically)</b><br><code>pd.concat([df1, df2, df3])</code><br><br><b>Concat (Horizontally)</b><br><code>pd.concat([df1, df2, df3], axis=1)</code></td>
<td><b>Merge (Join on Columns)</b><br><code>pd.merge(left=df1, right=df2, how='join_type', on='Col')</code><br><br><b>Merge (Join on Index)</b><br><code>pd.merge(left=df1, right=df2, how='join_type', left_index=True, right_index=True)</code></td>
</tr>
</tbody>
</table>
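The "Default Join Type" row can be seen directly with `concat`: by default it keeps the union of the columns (an outer join on column names), while passing `join='inner'` keeps only the columns the frames share. The sketch below rebuilds `head_2015` and `head_2016` inline, with values copied from the outputs earlier in this notebook, so it runs without the CSV files; `outer` and `inner` are names chosen for this example.

```python
import pandas as pd

# First four rows of 2015 (with Standard Error) and first three of 2016.
head_2015 = pd.DataFrame(
    {
        "Year": [2015, 2015, 2015, 2015],
        "Country": ["Switzerland", "Iceland", "Denmark", "Norway"],
        "Happiness Score": [7.587, 7.561, 7.527, 7.522],
        "Standard Error": [0.03411, 0.04884, 0.03328, 0.0388],
    }
)
head_2016 = pd.DataFrame(
    {
        "Country": ["Denmark", "Switzerland", "Iceland"],
        "Happiness Score": [7.526, 7.509, 7.501],
        "Year": [2016, 2016, 2016],
    }
)

# Default (outer): all columns kept; 2016 rows get NaN for Standard Error.
outer = pd.concat([head_2015, head_2016], ignore_index=True)

# Inner: only the columns present in both frames are kept.
inner = pd.concat([head_2015, head_2016], ignore_index=True, join="inner")

print(outer.shape, inner.shape)
```

The inner variant avoids the missing values, at the cost of dropping `Standard Error` for the 2015 rows as well.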

In [37]:
happiness2017.rename(columns={'Happiness.Score': 'Happiness Score'}, inplace=True)

In [40]:
combined = pd.concat([happiness2015, happiness2016, happiness2017], ignore_index=True)

In [41]:
combined

Unnamed: 0,Country,Region,Happiness Rank,Happiness Score,Standard Error,Economy (GDP per Capita),Family,Health (Life Expectancy),Freedom,Trust (Government Corruption),...,Year,Lower Confidence Interval,Upper Confidence Interval,Happiness.Rank,Whisker.high,Whisker.low,Economy..GDP.per.Capita.,Health..Life.Expectancy.,Trust..Government.Corruption.,Dystopia.Residual
0,Switzerland,Western Europe,1.0,7.587,0.03411,1.39651,1.349510,0.94143,0.665570,0.41978,...,2015,,,,,,,,,
1,Iceland,Western Europe,2.0,7.561,0.04884,1.30232,1.402230,0.94784,0.628770,0.14145,...,2015,,,,,,,,,
2,Denmark,Western Europe,3.0,7.527,0.03328,1.32548,1.360580,0.87464,0.649380,0.48357,...,2015,,,,,,,,,
3,Norway,Western Europe,4.0,7.522,0.03880,1.45900,1.330950,0.88521,0.669730,0.36503,...,2015,,,,,,,,,
4,Canada,North America,5.0,7.427,0.03553,1.32629,1.322610,0.90563,0.632970,0.32957,...,2015,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
465,Rwanda,,,3.471,,,0.945707,,0.581844,,...,2017,,,151.0,3.543030,3.398970,0.368746,0.326425,0.455220,0.540061
466,Syria,,,3.462,,,0.396103,,0.081539,,...,2017,,,152.0,3.663669,3.260331,0.777153,0.500533,0.151347,1.061574
467,Tanzania,,,3.349,,,1.041990,,0.390018,,...,2017,,,153.0,3.461430,3.236570,0.511136,0.364509,0.066035,0.621130
468,Burundi,,,2.905,,,0.629794,,0.059901,,...,2017,,,154.0,3.074690,2.735310,0.091623,0.151611,0.084148,1.683024


In [42]:
pivot_table_combined = pd.pivot_table(combined, values=['Happiness Score'], index=['Year'])

In [43]:
pivot_table_combined

Year,Happiness Score
2015,5.375734
2016,5.382185
2017,5.354019
