# Pandas - Merging DataFrames
Merging means joining two tables together using a common column (like a key).

### For Example:
You have two lists of data, and you want to combine them based on something they share — like a student ID, country name, or product code.

In [1]:
# Importing pandas library
import pandas as pd


In [2]:
# Reading file 1
df1 = pd.read_csv('LOTR.csv')
df1


|   | FellowshipID | FirstName | Skills |
|---|---|---|---|
| 0 | 1001 | Frodo | Hiding |
| 1 | 1002 | Samwise | Gardening |
| 2 | 1003 | Gandalf | Spells |
| 3 | 1004 | Pippin | Fireworks |


In [3]:
# Reading file 2
df2 = pd.read_csv('LOTR 2.csv')
df2


|   | FellowshipID | FirstName | Age |
|---|---|---|---|
| 0 | 1001 | Frodo | 50 |
| 1 | 1002 | Samwise | 39 |
| 2 | 1006 | Legolas | 2931 |
| 3 | 1007 | Elrond | 6520 |
| 4 | 1008 | Barromir | 51 |
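The CSV files are not included with this notebook, so here is a minimal sketch that rebuilds both tables inline from the printed output above. It assumes those printouts show every row of LOTR.csv and LOTR 2.csv. It also lists the columns the two tables share, because those are the columns merge() matches on when no key is given.

```python
import pandas as pd

# Rebuild the two tables from the printed output above, so the examples
# run without LOTR.csv and LOTR 2.csv (assumes the printouts show every row)
df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

# Columns present in both tables; merge() uses these as keys when on= is omitted
shared = [col for col in df1.columns if col in df2.columns]
print(shared)
```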


## Merge
- Combines two DataFrames using a common column.
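- You choose the column(s) to match with on= (for example on='id'); by default it uses every column the two DataFrames share.
- Most flexible and powerful.
- Use when: you want to match rows on the values of one or more key columns.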
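When the key column has a different name in each table, merge() can still match them with left_on and right_on. In this sketch the table `ages` and its column `ID` are hypothetical stand-ins for a version of df2 with a renamed key; df1 is rebuilt inline from the output above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
# Hypothetical table whose key column is named 'ID' rather than 'FellowshipID'
ages = pd.DataFrame({
    'ID': [1001, 1002, 1006, 1007, 1008],
    'Age': [50, 39, 2931, 6520, 51],
})

# left_on/right_on name the key column on each side when the names differ;
# both key columns are kept in the result
result = df1.merge(ages, how='inner', left_on='FellowshipID', right_on='ID')
print(result)
```

Because both key columns survive, you may want to drop one afterwards, for example with result.drop(columns='ID').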

In [4]:
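# Merging both DataFrames
df1.merge(df2)

# By default merge() does an inner join on every column the tables share (here FellowshipID and FirstName)
# The how parameter lets us choose a different join type as well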


|   | FellowshipID | FirstName | Skills | Age |
|---|---|---|---|---|
| 0 | 1001 | Frodo | Hiding | 50 |
| 1 | 1002 | Samwise | Gardening | 39 |


In [12]:
# Merging both DataFrames with an inner join on FellowshipID only
# how='...' lets you choose the join type: 'inner', 'outer', 'left', 'right' or 'cross'

df1.merge(df2, how = 'inner', on= 'FellowshipID')

# With on='FellowshipID', the other column found in both tables (FirstName) is kept twice:
# FirstName_x comes from the first (left) DataFrame and FirstName_y from the second (right) one


|   | FellowshipID | FirstName_x | Skills | FirstName_y | Age |
|---|---|---|---|---|---|
| 0 | 1001 | Frodo | Hiding | Frodo | 50 |
| 1 | 1002 | Samwise | Gardening | Samwise | 39 |
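The _x and _y labels are only defaults. A small sketch, with the tables rebuilt inline from the output above, showing merge()'s suffixes parameter choosing clearer names for the duplicated column:

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

# suffixes replaces the default ('_x', '_y') for overlapping non-key columns
result = df1.merge(df2, how='inner', on='FellowshipID', suffixes=('_df1', '_df2'))
print(result)
```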


In [10]:
# Joining on both shared columns gives the same output as df1.merge(df2)
df1.merge(df2, how = 'inner', on= ['FellowshipID', 'FirstName'])


|   | FellowshipID | FirstName | Skills | Age |
|---|---|---|---|---|
| 0 | 1001 | Frodo | Hiding | 50 |
| 1 | 1002 | Samwise | Gardening | 39 |


In [13]:
# Making an outer join: keeps every row from both tables and fills the gaps with NaN (which is why Age becomes a float)
df1.merge(df2, how = 'outer')


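|   | FellowshipID | FirstName | Skills | Age |
|---|---|---|---|---|
| 0 | 1001 | Frodo | Hiding | 50.0 |
| 1 | 1002 | Samwise | Gardening | 39.0 |
| 2 | 1003 | Gandalf | Spells | NaN |
| 3 | 1004 | Pippin | Fireworks | NaN |
| 4 | 1006 | Legolas | NaN | 2931.0 |
| 5 | 1007 | Elrond | NaN | 6520.0 |
| 6 | 1008 | Barromir | NaN | 51.0 |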
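In an outer join it is not always obvious which table each row came from. As a sketch (tables rebuilt inline from the output above), merge()'s indicator=True adds a _merge column labelling every row as 'both', 'left_only' or 'right_only':

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

# indicator=True records where each row of the outer join came from
result = df1.merge(df2, how='outer', indicator=True)
print(result[['FellowshipID', 'FirstName', '_merge']])
print(result['_merge'].value_counts())
```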


In [14]:
# Creating a left join: keeps every row of df1 and adds Age where a matching row exists in df2
df1.merge(df2, how = 'left')


|   | FellowshipID | FirstName | Skills | Age |
|---|---|---|---|---|
| 0 | 1001 | Frodo | Hiding | 50.0 |
| 1 | 1002 | Samwise | Gardening | 39.0 |
| 2 | 1003 | Gandalf | Spells | NaN |
| 3 | 1004 | Pippin | Fireworks | NaN |
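A common follow-up to a left join is asking which rows found no match (a "left anti-join"). pandas has no dedicated how= value for this, so the sketch below is one way to do it, assuming the tables rebuilt inline from the output above: left-join against just the key column with indicator=True, then keep the 'left_only' rows.

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

# Left join against just the key column, flagging where each row came from
flagged = df1.merge(df2[['FellowshipID']], how='left', on='FellowshipID', indicator=True)
# Keep the df1 rows that found no partner in df2, then drop the helper column
only_in_df1 = flagged[flagged['_merge'] == 'left_only'].drop(columns='_merge')
print(only_in_df1)
```

For a single key column, df1[~df1['FellowshipID'].isin(df2['FellowshipID'])] selects the same rows without a merge.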


In [15]:
# Creating a right join: keeps every row of df2 and adds Skills where a matching row exists in df1
df1.merge(df2, how = 'right')


|   | FellowshipID | FirstName | Skills | Age |
|---|---|---|---|---|
| 0 | 1001 | Frodo | Hiding | 50 |
| 1 | 1002 | Samwise | Gardening | 39 |
| 2 | 1006 | Legolas | NaN | 2931 |
| 3 | 1007 | Elrond | NaN | 6520 |
| 4 | 1008 | Barromir | NaN | 51 |


In [16]:
# Creating a cross join
# Pairs every row of df1 with every row of df2 (a Cartesian product); no columns are matched
df1.merge(df2, how = 'cross')


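|   | FellowshipID_x | FirstName_x | Skills | FellowshipID_y | FirstName_y | Age |
|---|---|---|---|---|---|---|
| 0 | 1001 | Frodo | Hiding | 1001 | Frodo | 50 |
| 1 | 1001 | Frodo | Hiding | 1002 | Samwise | 39 |
| 2 | 1001 | Frodo | Hiding | 1006 | Legolas | 2931 |
| 3 | 1001 | Frodo | Hiding | 1007 | Elrond | 6520 |
| 4 | 1001 | Frodo | Hiding | 1008 | Barromir | 51 |
| 5 | 1002 | Samwise | Gardening | 1001 | Frodo | 50 |
| 6 | 1002 | Samwise | Gardening | 1002 | Samwise | 39 |
| 7 | 1002 | Samwise | Gardening | 1006 | Legolas | 2931 |
| 8 | 1002 | Samwise | Gardening | 1007 | Elrond | 6520 |
| 9 | 1002 | Samwise | Gardening | 1008 | Barromir | 51 |

Only the first 10 rows are shown here; the full result has 4 × 5 = 20 rows, one for each pairing of a df1 row with a df2 row.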
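A quick sanity check for cross joins is the row count: it should always be len(df1) * len(df2). A sketch with the tables rebuilt inline from the output above (how='cross' needs pandas 1.2 or newer):

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

result = df1.merge(df2, how='cross')
# Every left row meets every right row, so the sizes multiply
print(len(result), len(df1) * len(df2))
# Each df1 member appears once per df2 row
print(result['FirstName_x'].value_counts())
```

Because the size multiplies, cross joins on large tables can get very big very quickly.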


## Joins
- A shorter way to combine DataFrames: join() matches rows by index (row labels) by default.
- With on='column', a column of the calling DataFrame is matched against the index of the other DataFrame.
- Less flexible than merge, but shorter syntax for quick joins.
- Use when: your index is the matching key.
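A minimal index-based join, assuming the tables rebuilt inline from the output above. Selecting only the column you need from the right-hand table avoids overlapping names, so no lsuffix/rsuffix is required. join() defaults to how='left'.

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

left = df1.set_index('FellowshipID')
# Keep only the column we want from df2, so no names overlap and no suffixes are needed
right = df2.set_index('FellowshipID')[['Age']]
result = left.join(right, how='left')
print(result)
```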

In [20]:
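# Similar to merge, but trickier: on='FellowshipID' matches df1's FellowshipID column against df2's index (0 to 4), not against df2's FellowshipID column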
df1.join(df2, on = 'FellowshipID', how = 'outer', lsuffix = '_left', rsuffix = '_right')

# None of the IDs match that index, so the outer join lists the rows of the two tables separately


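|   | FellowshipID | FellowshipID_left | FirstName_left | Skills | FellowshipID_right | FirstName_right | Age |
|---|---|---|---|---|---|---|---|
| NaN | 0 | NaN | NaN | NaN | 1001.0 | Frodo | 50.0 |
| NaN | 1 | NaN | NaN | NaN | 1002.0 | Samwise | 39.0 |
| NaN | 2 | NaN | NaN | NaN | 1006.0 | Legolas | 2931.0 |
| NaN | 3 | NaN | NaN | NaN | 1007.0 | Elrond | 6520.0 |
| NaN | 4 | NaN | NaN | NaN | 1008.0 | Barromir | 51.0 |
| 0.0 | 1001 | 1001.0 | Frodo | Hiding | NaN | NaN | NaN |
| 1.0 | 1002 | 1002.0 | Samwise | Gardening | NaN | NaN | NaN |
| 2.0 | 1003 | 1003.0 | Gandalf | Spells | NaN | NaN | NaN |
| 3.0 | 1004 | 1004.0 | Pippin | Fireworks | NaN | NaN | NaN |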
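Since on= matches a column of the calling frame against the other frame's index, one way to make this join line up (a sketch, assuming the goal is to match on FellowshipID, with the tables rebuilt inline from the output above) is to move the key into df2's index only. how='left' is used here to keep the result simple.

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

# Put the key in df2's index; on= then matches df1's FellowshipID column against it
result = df1.join(df2.set_index('FellowshipID'), on='FellowshipID',
                  how='left', lsuffix='_left', rsuffix='_right')
print(result)
```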


In [26]:
# Making FellowshipID the index of both DataFrames and doing an outer join
df4 = df1.set_index('FellowshipID').join(df2.set_index('FellowshipID'),
              lsuffix = '_Left', rsuffix = '_Right',
              how = 'outer')
df4


| FellowshipID | FirstName_Left | Skills | FirstName_Right | Age |
|---|---|---|---|---|
| 1001 | Frodo | Hiding | Frodo | 50.0 |
| 1002 | Samwise | Gardening | Samwise | 39.0 |
| 1003 | Gandalf | Spells | NaN | NaN |
| 1004 | Pippin | Fireworks | NaN | NaN |
| 1006 | NaN | NaN | Legolas | 2931.0 |
| 1007 | NaN | NaN | Elrond | 6520.0 |
| 1008 | NaN | NaN | Barromir | 51.0 |


In [27]:
# Doing inner join
df4 = df1.set_index('FellowshipID').join(df2.set_index('FellowshipID'),
              lsuffix = '_Left', rsuffix = '_Right',
              how = 'inner')
df4


| FellowshipID | FirstName_Left | Skills | FirstName_Right | Age |
|---|---|---|---|---|
| 1001 | Frodo | Hiding | Frodo | 50 |
| 1002 | Samwise | Gardening | Samwise | 39 |
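An index-on-index join can also be written with merge() by passing left_index=True and right_index=True. A sketch, with the tables rebuilt inline from the output above, comparing the two spellings of the inner join:

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

left = df1.set_index('FellowshipID')
right = df2.set_index('FellowshipID')

# join() spelling: lsuffix/rsuffix label the overlapping FirstName column
via_join = left.join(right, how='inner', lsuffix='_Left', rsuffix='_Right')
# merge() spelling: match index to index, with suffixes as a tuple
via_merge = left.merge(right, how='inner', left_index=True, right_index=True,
                       suffixes=('_Left', '_Right'))
print(via_join)
print(via_merge)
```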


In [28]:
# Doing left join
df4 = df1.set_index('FellowshipID').join(df2.set_index('FellowshipID'),
              lsuffix = '_Left', rsuffix = '_Right',
              how = 'left')
df4


| FellowshipID | FirstName_Left | Skills | FirstName_Right | Age |
|---|---|---|---|---|
| 1001 | Frodo | Hiding | Frodo | 50.0 |
| 1002 | Samwise | Gardening | Samwise | 39.0 |
| 1003 | Gandalf | Spells | NaN | NaN |
| 1004 | Pippin | Fireworks | NaN | NaN |


In [29]:
# Doing right join
df4 = df1.set_index('FellowshipID').join(df2.set_index('FellowshipID'),
              lsuffix = '_Left', rsuffix = '_Right',
              how = 'right')
df4


| FellowshipID | FirstName_Left | Skills | FirstName_Right | Age |
|---|---|---|---|---|
| 1001 | Frodo | Hiding | Frodo | 50 |
| 1002 | Samwise | Gardening | Samwise | 39 |
| 1006 | NaN | NaN | Legolas | 2931 |
| 1007 | NaN | NaN | Elrond | 6520 |
| 1008 | NaN | NaN | Barromir | 51 |


In [30]:
# Doing cross join
# Pairs every row of df1 with every row of df2; the FellowshipID index is not kept in the result
df4 = df1.set_index('FellowshipID').join(df2.set_index('FellowshipID'),
              lsuffix = '_Left', rsuffix = '_Right',
              how = 'cross')
df4


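|   | FirstName_Left | Skills | FirstName_Right | Age |
|---|---|---|---|---|
| 0 | Frodo | Hiding | Frodo | 50 |
| 1 | Frodo | Hiding | Samwise | 39 |
| 2 | Frodo | Hiding | Legolas | 2931 |
| 3 | Frodo | Hiding | Elrond | 6520 |
| 4 | Frodo | Hiding | Barromir | 51 |
| 5 | Samwise | Gardening | Frodo | 50 |
| 6 | Samwise | Gardening | Samwise | 39 |
| 7 | Samwise | Gardening | Legolas | 2931 |
| 8 | Samwise | Gardening | Elrond | 6520 |
| 9 | Samwise | Gardening | Barromir | 51 |

Only the first 10 rows are shown here; the full result has 4 × 5 = 20 rows.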


## Concatenate
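- Combines DataFrames by stacking them on top of each other or placing them side by side, without matching rows on a key.
- Labels on the other axis are lined up (columns when stacking rows), and anything missing is filled with NaN.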
- No common key needed.
- Use when: you just want to stack rows or columns.
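When stacking, it can help to remember which table each row came from. A sketch using pd.concat's keys parameter, with the tables rebuilt inline from the output above; the labels 'df1' and 'df2' are just illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

# keys adds an outer index level naming the source of each row
stacked = pd.concat([df1, df2], keys=['df1', 'df2'])
print(stacked)
# Select only the rows that came from df2
print(stacked.loc['df2'])
```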


In [31]:
# Concatenating both dataframes
pd.concat([df1, df2])

# Puts one table on top of the other; each table keeps its original row labels

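|   | FellowshipID | FirstName | Skills | Age |
|---|---|---|---|---|
| 0 | 1001 | Frodo | Hiding | NaN |
| 1 | 1002 | Samwise | Gardening | NaN |
| 2 | 1003 | Gandalf | Spells | NaN |
| 3 | 1004 | Pippin | Fireworks | NaN |
| 0 | 1001 | Frodo | NaN | 50.0 |
| 1 | 1002 | Samwise | NaN | 39.0 |
| 2 | 1006 | Legolas | NaN | 2931.0 |
| 3 | 1007 | Elrond | NaN | 6520.0 |
| 4 | 1008 | Barromir | NaN | 51.0 |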
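The repeated row labels (0 to 3, then 0 to 4) make label-based lookups like .loc[0] return two rows. A sketch, with the tables rebuilt inline from the output above, using ignore_index=True to renumber the stacked result:

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

# ignore_index=True discards the original labels and numbers rows 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```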


In [33]:
# Stacking with join='inner'
pd.concat([df1, df2], join = 'inner')

# join='inner' keeps only the columns found in both tables (FellowshipID and FirstName), then stacks the rows

|   | FellowshipID | FirstName |
|---|---|---|
| 0 | 1001 | Frodo |
| 1 | 1002 | Samwise |
| 2 | 1003 | Gandalf |
| 3 | 1004 | Pippin |
| 0 | 1001 | Frodo |
| 1 | 1002 | Samwise |
| 2 | 1006 | Legolas |
| 3 | 1007 | Elrond |
| 4 | 1008 | Barromir |


In [34]:
# Stacking with join='outer'
pd.concat([df1, df2], join = 'outer')

# join='outer' is the default: it keeps every column from both tables, so this matches pd.concat([df1, df2])

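|   | FellowshipID | FirstName | Skills | Age |
|---|---|---|---|---|
| 0 | 1001 | Frodo | Hiding | NaN |
| 1 | 1002 | Samwise | Gardening | NaN |
| 2 | 1003 | Gandalf | Spells | NaN |
| 3 | 1004 | Pippin | Fireworks | NaN |
| 0 | 1001 | Frodo | NaN | 50.0 |
| 1 | 1002 | Samwise | NaN | 39.0 |
| 2 | 1006 | Legolas | NaN | 2931.0 |
| 3 | 1007 | Elrond | NaN | 6520.0 |
| 4 | 1008 | Barromir | NaN | 51.0 |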


In [38]:
# Concatenating side by side (axis = 1) with an outer join
pd.concat([df1, df2], join = 'outer', axis = 1)

# Rows are lined up by index label; df1 has no row 4, so its columns are NaN there
# Column names are not made unique, so FellowshipID and FirstName appear twice

|   | FellowshipID | FirstName | Skills | FellowshipID | FirstName | Age |
|---|---|---|---|---|---|---|
| 0 | 1001.0 | Frodo | Hiding | 1001 | Frodo | 50 |
| 1 | 1002.0 | Samwise | Gardening | 1002 | Samwise | 39 |
| 2 | 1003.0 | Gandalf | Spells | 1006 | Legolas | 2931 |
| 3 | 1004.0 | Pippin | Fireworks | 1007 | Elrond | 6520 |
| 4 | NaN | NaN | NaN | 1008 | Barromir | 51 |
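Duplicate column names make later selections like result['FirstName'] return two columns. One way to avoid that, sketched here with the tables rebuilt inline from the output above, is to rename one side with add_suffix before concatenating; the suffix '_2' is an arbitrary choice.

```python
import pandas as pd

df1 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1003, 1004],
    'FirstName': ['Frodo', 'Samwise', 'Gandalf', 'Pippin'],
    'Skills': ['Hiding', 'Gardening', 'Spells', 'Fireworks'],
})
df2 = pd.DataFrame({
    'FellowshipID': [1001, 1002, 1006, 1007, 1008],
    'FirstName': ['Frodo', 'Samwise', 'Legolas', 'Elrond', 'Barromir'],
    'Age': [50, 39, 2931, 6520, 51],
})

# add_suffix renames every column of df2, so the side-by-side result has unique names
side_by_side = pd.concat([df1, df2.add_suffix('_2')], axis=1)
print(side_by_side)
```

Remember that axis=1 lines rows up by position label, not by FellowshipID, so row 2 pairs Gandalf with Legolas; use merge or join when the rows must match on a key.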
