<img src="https://pandas.pydata.org/static/img/pandas.svg" width="250">

## <center> Merging DataFrames

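+ **df.merge()**: SQL-style joins on key columns (left, right, inner)
+ **pd.concat()**: union of DataFrames, stacked vertically (taller) or side by side (fatter)
+ **df.append()**: add rows to the end of a DataFrame (deprecated in pandas 1.4 and removed in 2.0; use `pd.concat()` instead)
+ **df.join()**: join on the index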

In [1]:
import pandas as pd

In [2]:
df1 = pd.DataFrame({
    'letter': ['A', 'B', 'C', 'D'],
    'number': [1, 2, 3, 4]
})


df2 = pd.DataFrame({
    'letter': ['C', 'D', 'E', 'F'],
    'number': [3, 4, 5, 6]
})

In [3]:
df1

Unnamed: 0,letter,number
0,A,1
1,B,2
2,C,3
3,D,4


In [4]:
df2

Unnamed: 0,letter,number
0,C,3
1,D,4
2,E,5
3,F,6


-------

#  <b> Left Join
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f6/SQL_Join_-_01_A_Left_Join_B.svg/330px-SQL_Join_-_01_A_Left_Join_B.svg.png">

In [5]:
df1.merge(df2, how='left', on='number')

Unnamed: 0,letter_x,number,letter_y
0,A,1,
1,B,2,
2,C,3,C
3,D,4,D
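
To see which rows of the left join actually found a match, `merge` accepts an `indicator` flag. The sketch below rebuilds `df1` and `df2` so it runs on its own; the names `left_joined` and `unmatched` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'letter': ['A', 'B', 'C', 'D'], 'number': [1, 2, 3, 4]})
df2 = pd.DataFrame({'letter': ['C', 'D', 'E', 'F'], 'number': [3, 4, 5, 6]})

# indicator=True adds a `_merge` column with 'left_only', 'right_only' or 'both'
left_joined = df1.merge(df2, how='left', on='number', indicator=True)

# Rows of df1 that found no partner in df2 (their letter_y is NaN)
unmatched = left_joined[left_joined['_merge'] == 'left_only']
print(unmatched[['letter_x', 'number']])
```

Filtering on `_merge` is often easier to read than checking the right-hand columns for missing values.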


--------

# <b> Inner Join
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/1/18/SQL_Join_-_07_A_Inner_Join_B.svg/330px-SQL_Join_-_07_A_Inner_Join_B.svg.png">

In [6]:
df1.merge(df2, how='inner', left_on='number', right_on='number')

Unnamed: 0,letter_x,number,letter_y
0,C,3,C
1,D,4,D
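
`left_on` and `right_on` matter most when the key columns have different names. As a minimal sketch, assume the key in the second DataFrame were called `num` (a hypothetical name, introduced here by renaming):

```python
import pandas as pd

df1 = pd.DataFrame({'letter': ['A', 'B', 'C', 'D'], 'number': [1, 2, 3, 4]})
df2 = pd.DataFrame({'letter': ['C', 'D', 'E', 'F'], 'number': [3, 4, 5, 6]})

# Hypothetical: pretend the key column of df2 has a different name
df2_renamed = df2.rename(columns={'number': 'num'})

# Match df1.number against df2_renamed.num; both key columns are kept in the result
inner = df1.merge(df2_renamed, how='inner', left_on='number', right_on='num')
print(inner)
```

When the key has the same name on both sides, as in the cell above, `on='number'` is the shorter equivalent.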


-----

# <b> Right Join
<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5f/SQL_Join_-_03_A_Right_Join_B.svg/330px-SQL_Join_-_03_A_Right_Join_B.svg.png">

In [7]:
df1.merge(df2,  how='right', on='number')

Unnamed: 0,letter_x,number,letter_y
0,C,3,C
1,D,4,D
2,,5,E
3,,6,F


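## We can also specify custom suffixes on the joined result
+ `suffixes`: a pair of strings appended to overlapping column names from the left and right DataFrames (default `('_x', '_y')`)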

In [8]:
df1.merge(df2,  how='right', on='number', suffixes=('', '_right'))

Unnamed: 0,letter,number,letter_right
0,C,3,C
1,D,4,D
2,,5,E
3,,6,F


In [9]:
df1.merge(df2,  how='right', on='number', suffixes=('_Table1', '_Table2'))

Unnamed: 0,letter_Table1,number,letter_Table2
0,C,3,C
1,D,4,D
2,,5,E
3,,6,F


---------

# Union with `pd.concat`

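+ `pd.concat` stacks the rows of both DataFrames, so the result gets taller
+ the original indexes and any duplicate rows are kept, so we can 1) reset the index with `.reset_index(drop=True)`
+ and then 2) drop duplicates with `drop_duplicates()`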

In [14]:
df3 = pd.concat([df1,df2]) # including all duplicates

df3

Unnamed: 0,letter,number
0,A,1
1,B,2
2,C,3
3,D,4
0,C,3
1,D,4
2,E,5
3,F,6


### 1) Reset Index

In [25]:
df3 = pd.concat([df1,df2]).reset_index(drop=True)
df3

Unnamed: 0,letter,number
0,A,1
1,B,2
2,C,3
3,D,4
4,C,3
5,D,4
6,E,5
7,F,6


### 2) Drop duplicates

In [26]:
# drop=True discards the old index instead of keeping it as an extra column
pd.concat([df1,df2]).drop_duplicates().reset_index(drop=True)

Unnamed: 0,letter,number
0,A,1
1,B,2
2,C,3
3,D,4
4,E,5
5,F,6
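
The two steps above can also be folded into the calls themselves. This is a minimal sketch, assuming pandas 1.0 or later, where `drop_duplicates` accepts `ignore_index`; the name `union` is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'letter': ['A', 'B', 'C', 'D'], 'number': [1, 2, 3, 4]})
df2 = pd.DataFrame({'letter': ['C', 'D', 'E', 'F'], 'number': [3, 4, 5, 6]})

# ignore_index=True renumbers the rows from 0 without keeping the old index as a column
union = pd.concat([df1, df2], ignore_index=True).drop_duplicates(ignore_index=True)
print(union)
```

Note that `drop_duplicates()` compares whole rows by default; a row counts as a duplicate only if every column matches.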


--------

# Concatenate dataframes horizontally
+ with `axis=1`, the DataFrames are placed side by side and aligned on their index, adding columns (fatter)

In [18]:
df1

Unnamed: 0,letter,number
0,A,1
1,B,2
2,C,3
3,D,4


In [19]:
df2

Unnamed: 0,letter,number
0,C,3
1,D,4
2,E,5
3,F,6


In [21]:
df4 = pd.concat([df1, df2], axis=1)
df4

Unnamed: 0,letter,number,letter,number
0,A,1,C,3
1,B,2,D,4
2,C,3,E,5
3,D,4,F,6
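
Horizontal concatenation keeps the original column names, so `letter` and `number` each appear twice in the result. One way to keep them apart, sketched here with arbitrary labels `'df1'` and `'df2'`, is the `keys` argument, which adds an outer column level:

```python
import pandas as pd

df1 = pd.DataFrame({'letter': ['A', 'B', 'C', 'D'], 'number': [1, 2, 3, 4]})
df2 = pd.DataFrame({'letter': ['C', 'D', 'E', 'F'], 'number': [3, 4, 5, 6]})

# keys= labels each input, producing two-level (MultiIndex) column names
wide = pd.concat([df1, df2], axis=1, keys=['df1', 'df2'])

print(wide['df2'])               # all columns that came from df2
print(wide[('df1', 'number')])   # a single column, addressed by (label, name)
```

Without `keys`, selecting `df4['letter']` returns both `letter` columns at once, which is rarely what is wanted.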


------

# Append new row to your dataframe

In [27]:
df3

Unnamed: 0,letter,number
0,A,1
1,B,2
2,C,3
3,D,4
4,C,3
5,D,4
6,E,5
7,F,6


In [29]:
new_row = pd.Series(['z', 26], index=df3.columns)
new_row

letter     z
number    26
dtype: object

### Append new row to df3

In [30]:
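# DataFrame.append() was deprecated in pandas 1.4 and removed in 2.0.
# On current pandas, turn the Series into a one-row DataFrame and use pd.concat.
pd.concat([df3, new_row.to_frame().T], ignore_index=True)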

Unnamed: 0,letter,number
0,A,1
1,B,2
2,C,3
3,D,4
4,C,3
5,D,4
6,E,5
7,F,6
8,z,26
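
When several rows need to be added, it is usually simpler to collect them first and concatenate once. The sketch below rebuilds `df3` so it runs on its own; the row values and the names `new_rows` and `extended` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'letter': ['A', 'B', 'C', 'D'], 'number': [1, 2, 3, 4]})
df2 = pd.DataFrame({'letter': ['C', 'D', 'E', 'F'], 'number': [3, 4, 5, 6]})
df3 = pd.concat([df1, df2]).reset_index(drop=True)

# Hypothetical rows to add, one dict per row, with keys matching df3's columns
new_rows = pd.DataFrame([
    {'letter': 'y', 'number': 25},
    {'letter': 'z', 'number': 26},
])

# One concat call adds all rows; ignore_index=True continues the 0..n numbering
extended = pd.concat([df3, new_rows], ignore_index=True)
print(extended.tail(3))
```

Building the new rows as a DataFrame keeps `number` an integer column, whereas a mixed `pd.Series(['z', 26])` has `object` dtype.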


------

# Join along your index

+ we **don't need to specify join criteria, as the DataFrames are joined on their index**
+ when column names overlap, we need to pass a **suffix** (`lsuffix` and/or `rsuffix`) to differentiate them

In [31]:
df2

Unnamed: 0,letter,number
0,C,3
1,D,4
2,E,5
3,F,6


In [33]:
join_df = pd.DataFrame({
    'letter': ['F', 'G', 'H', 'I'],
    'number': [6, 7, 8, 9]
})

join_df

Unnamed: 0,letter,number
0,F,6
1,G,7
2,H,8
3,I,9


In [41]:
df2.join(join_df, rsuffix='_Table2')

Unnamed: 0,letter,number,letter_Table2,number_Table2
0,C,3,F,6
1,D,4,G,7
2,E,5,H,8
3,F,6,I,9


In [42]:
df2.join(join_df, rsuffix='_Table2', lsuffix='_Table1')

Unnamed: 0,letter_Table1,number_Table1,letter_Table2,number_Table2
0,C,3,F,6
1,D,4,G,7
2,E,5,H,8
3,F,6,I,9
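
For comparison, the same index-based join can be written with `merge` by setting `left_index` and `right_index`. A minimal sketch, assuming the default left join that `join` performs; the names `joined` and `merged` are only illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({'letter': ['C', 'D', 'E', 'F'], 'number': [3, 4, 5, 6]})
join_df = pd.DataFrame({'letter': ['F', 'G', 'H', 'I'], 'number': [6, 7, 8, 9]})

# join(): matches rows on the index; how='left' is the default
joined = df2.join(join_df, lsuffix='_Table1', rsuffix='_Table2')

# merge(): the same result, with the index named explicitly as the key on both sides
merged = df2.merge(
    join_df,
    how='left',
    left_index=True,
    right_index=True,
    suffixes=('_Table1', '_Table2'),
)

print(joined.equals(merged))
```

`join` is the shorter spelling when the index is the key; `merge` is more flexible when one side should match on a column instead.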
