## Data Wrangling: Clean, Transform, Merge, Reshape 
##### - Much of the programming work in data analysis and modeling is spent in data preparation.
##### - That is, data loading, cleaning, transforming, and rearranging.
##### - This is discussed and demonstrated below.

In [4]:
# Importing the necessary libraries 

import pandas as pd 
import numpy as np 

### Performing DataFrame Merges

In [14]:
# Creating Pandas DataFrames 

df1 = pd.DataFrame(
    np.arange(12).reshape(4,3),
    index=list("abcd")
)

df2 = pd.DataFrame(
    np.arange(9).reshape(3,3),
    index=list("abc")
)

In [13]:
# In the operation below, the merge uses the overlapping column names as the keys.
# It is, however, good practice to specify the key columns explicitly.

pd.merge(df1, df2)

Unnamed: 0,0,1,2
0,0,1,2
1,3,4,5
2,6,7,8
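
The comment above recommends naming the key columns explicitly. A minimal sketch of what that could look like for these two frames follows; the names `left`, `right`, `implicit`, and `explicit` are introduced here for illustration only. It assumes the integer column labels 0, 1, and 2 that pandas assigns by default.

```python
import numpy as np
import pandas as pd

# Same frames as df1 and df2 above, under illustrative names
left = pd.DataFrame(np.arange(12).reshape(4, 3), index=list("abcd"))
right = pd.DataFrame(np.arange(9).reshape(3, 3), index=list("abc"))

# Implicit: pandas joins on every column label the two frames share
implicit = pd.merge(left, right)

# Explicit: the same join with the key columns spelled out
explicit = pd.merge(left, right, on=[0, 1, 2])

# Both calls should produce the same three matching rows
same = implicit.equals(explicit)
print(same)
```

Note that the original row labels ("a", "b", ...) are not kept: a column-based merge returns a fresh integer index.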


In [16]:
data_1 = pd.DataFrame(
    {
        'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
        "data": range(7)
    }
)

data_2 = pd.DataFrame(
    {
        'key': ['a', 'b', 'd'],
        "data": range(3)
    }
)

data_2

Unnamed: 0,key,data
0,a,0
1,b,1
2,d,2


In [17]:
data_1

Unnamed: 0,key,data
0,b,0
1,b,1
2,a,2
3,c,3
4,a,4
5,a,5
6,b,6


In [18]:
pd.merge(data_1, data_2, on="key")

Unnamed: 0,key,data_x,data_y
0,b,0,1
1,b,1,1
2,a,2,0
3,a,4,0
4,a,5,0
5,b,6,1
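
In the output above, the non-key column `data` exists in both frames, so pandas appends `_x` and `_y` to tell them apart. As a small sketch, the `suffixes` parameter of `pd.merge` can replace those defaults with more descriptive labels; the suffix strings `_left` and `_right` are arbitrary choices made for this example.

```python
import pandas as pd

data_1 = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "a", "b"], "data": range(7)})
data_2 = pd.DataFrame({"key": ["a", "b", "d"], "data": range(3)})

# Rename the clashing "data" columns with custom suffixes instead of _x / _y
merged = pd.merge(data_1, data_2, on="key", suffixes=("_left", "_right"))

print(merged.columns.tolist())  # ['key', 'data_left', 'data_right']
```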


In [30]:
# Creating a population DataFrame and an income DataFrame

df1 = pd.DataFrame({
    "Country": ["America", "Indonesia", "France"],
    "Location": ["New York", "Jakarta", "Paris"],
    "Population": [738100, 575030, 183305]
})

df2 = pd.DataFrame({
    "Country": ["America", "America", "Indonesia", "India", "France", "Greece"],
    "Location": ["New York", "Chicago", "Jakarta", "Mumbai", "Paris", "Yunani"],
    "Income": [1000, 1500, 1400, 1100, 900, 1200]
})

df1

Unnamed: 0,Country,Location,Population
0,America,New York,738100
1,Indonesia,Jakarta,575030
2,France,Paris,183305


In [31]:
df2

Unnamed: 0,Country,Location,Income
0,America,New York,1000
1,America,Chicago,1500
2,Indonesia,Jakarta,1400
3,India,Mumbai,1100
4,France,Paris,900
5,Greece,Yunani,1200


In [33]:
# Specifying the merge column as the key

pd.merge(df1, df2, on="Country")

Unnamed: 0,Country,Location_x,Population,Location_y,Income
0,America,New York,738100,New York,1000
1,America,New York,738100,Chicago,1500
2,Indonesia,Jakarta,575030,Jakarta,1400
3,France,Paris,183305,Paris,900
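
Merging on `Country` alone pairs New York with Chicago and splits `Location` into `Location_x` and `Location_y`. If the intent is to match on both country and city, one option is to pass a list of keys to `on`. The sketch below assumes that intent; the name `merged` is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Country": ["America", "Indonesia", "France"],
    "Location": ["New York", "Jakarta", "Paris"],
    "Population": [738100, 575030, 183305],
})
df2 = pd.DataFrame({
    "Country": ["America", "America", "Indonesia", "India", "France", "Greece"],
    "Location": ["New York", "Chicago", "Jakarta", "Mumbai", "Paris", "Yunani"],
    "Income": [1000, 1500, 1400, 1100, 900, 1200],
})

# Rows must agree on both Country and Location to be paired
merged = pd.merge(df1, df2, on=["Country", "Location"])

print(merged.columns.tolist())  # ['Country', 'Location', 'Population', 'Income']
```

With both columns as keys, the Chicago row no longer matches anything in `df1`, and `Location` appears once without a suffix.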


##### - When the key columns have different names in each DataFrame,
#####   you can pass them separately with the left_on and right_on parameters

In [34]:
# Specifying the merge columns separately as parameters

pd.merge(df1, df2, left_on="Country", right_on="Country")

Unnamed: 0,Country,Location_x,Population,Location_y,Income
0,America,New York,738100,New York,1000
1,America,New York,738100,Chicago,1500
2,Indonesia,Jakarta,575030,Jakarta,1400
3,France,Paris,183305,Paris,900
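
In the cell above both frames use the name `Country`, so `left_on` and `right_on` behave like `on="Country"`. To illustrate the case where the names really differ, the sketch below renames the key in the second frame to `Nation`. That column name and the variable `df2_renamed` are hypothetical and do not appear in the original data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Country": ["America", "Indonesia", "France"],
    "Location": ["New York", "Jakarta", "Paris"],
    "Population": [738100, 575030, 183305],
})
df2 = pd.DataFrame({
    "Country": ["America", "America", "Indonesia", "India", "France", "Greece"],
    "Location": ["New York", "Chicago", "Jakarta", "Mumbai", "Paris", "Yunani"],
    "Income": [1000, 1500, 1400, 1100, 900, 1200],
})

# Hypothetical: pretend the second frame calls its key column "Nation"
df2_renamed = df2.rename(columns={"Country": "Nation"})

# Each side names its own key column
merged = pd.merge(df1, df2_renamed, left_on="Country", right_on="Nation")

print(merged.columns.tolist())
```

Both key columns are kept in the result; if only one is wanted, the other can be dropped afterwards with `merged.drop(columns="Nation")`.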


##### By default, merge performs an "inner" join.
##### However, you can specify the join type with the how parameter, as in databases: "outer", "inner", "left", or "right".
##### - Check out the examples below

In [35]:
# Performing a left join

pd.merge(df1, df2, right_on="Country", left_on="Country", how="left")

Unnamed: 0,Country,Location_x,Population,Location_y,Income
0,America,New York,738100,New York,1000
1,America,New York,738100,Chicago,1500
2,Indonesia,Jakarta,575030,Jakarta,1400
3,France,Paris,183305,Paris,900
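
Every country in `df1` has a match in `df2`, so the left join above returns the same rows as the inner join. To see how a left join treats unmatched keys, the sketch below reuses `data_1` and `data_2` from earlier, where key `c` exists only in the left frame. The name `left_joined` is illustrative.

```python
import pandas as pd

data_1 = pd.DataFrame({"key": ["b", "b", "a", "c", "a", "a", "b"], "data": range(7)})
data_2 = pd.DataFrame({"key": ["a", "b", "d"], "data": range(3)})

# A left join keeps every row of data_1, matched or not
left_joined = pd.merge(data_1, data_2, on="key", how="left")

# The row with key "c" has no partner, so its data_y value is NaN
print(left_joined[left_joined["key"] == "c"])
```

Key `d`, which exists only in `data_2`, does not appear at all, because a left join draws its keys from the left frame.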


In [36]:
# Performing a right join 

pd.merge(df1, df2, right_on="Country", left_on="Country", how="right")

Unnamed: 0,Country,Location_x,Population,Location_y,Income
0,America,New York,738100.0,New York,1000
1,America,New York,738100.0,Chicago,1500
2,Indonesia,Jakarta,575030.0,Jakarta,1400
3,India,,,Mumbai,1100
4,France,Paris,183305.0,Paris,900
5,Greece,,,Yunani,1200
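
The text above also lists the "outer" join, which keeps the keys from both frames. A short sketch on the same data follows. It additionally passes `indicator=True`, a `pd.merge` option that adds a `_merge` column recording where each row came from; the name `outer` is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Country": ["America", "Indonesia", "France"],
    "Location": ["New York", "Jakarta", "Paris"],
    "Population": [738100, 575030, 183305],
})
df2 = pd.DataFrame({
    "Country": ["America", "America", "Indonesia", "India", "France", "Greece"],
    "Location": ["New York", "Chicago", "Jakarta", "Mumbai", "Paris", "Yunani"],
    "Income": [1000, 1500, 1400, 1100, 900, 1200],
})

# Outer join: the union of keys from both frames
# _merge is "both", "left_only", or "right_only" for each row
outer = pd.merge(df1, df2, on="Country", how="outer", indicator=True)

print(outer[["Country", "_merge"]])
```

Here India and Greece should be marked `right_only`, since they appear only in `df2`, and their `Location_x` and `Population` values are NaN.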
