## Pandas merging
The `merge()` function in pandas combines two DataFrames by matching values in one or more common columns (keys), much like a SQL JOIN. It brings related data stored in separate tables together into a single DataFrame.

In [4]:
import pandas as pd 

## --- Basic Merging ---

### Creating two sample DataFrames with a common 'ID' column

In [14]:
df_1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                     'Class': [5, 6, 7, 8]})

df_2 = pd.DataFrame({'ID': [1, 2, 3, 4],
                     'Name': ['E', 'F', 'G', 'H']})

In [16]:
# With no 'on' argument, merge() joins on the columns the two DataFrames share (here, 'ID')
df_3 = pd.merge(df_1, df_2)
df_3

Unnamed: 0,ID,Class,Name
0,1,5,E
1,2,6,F
2,3,7,G
3,4,8,H
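
To make the automatic key detection concrete, the sketch below compares the implicit merge with one that passes the shared column names explicitly. It assumes the default behaviour of joining on the intersection of the two column sets; the names `common`, `implicit`, and `explicit` are illustrative only.

```python
import pandas as pd

df_1 = pd.DataFrame({'ID': [1, 2, 3, 4],
                     'Class': [5, 6, 7, 8]})
df_2 = pd.DataFrame({'ID': [1, 2, 3, 4],
                     'Name': ['E', 'F', 'G', 'H']})

# Columns present in both DataFrames; these are the default join keys
common = df_1.columns.intersection(df_2.columns).tolist()

implicit = pd.merge(df_1, df_2)              # keys inferred
explicit = pd.merge(df_1, df_2, on=common)   # keys stated

print(common)
print(implicit.equals(explicit))
```

Stating the key explicitly is usually safer: if the two DataFrames later gain another column with the same name, the implicit version would silently start joining on it as well.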


In [18]:
# Explicitly specifying the 'on' parameter to merge on the 'ID' column
df_3 = pd.merge(df_1, df_2, on='ID')
df_3

Unnamed: 0,ID,Class,Name
0,1,5,E
1,2,6,F
2,3,7,G
3,4,8,H
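
The `on` parameter also accepts a list of column names, in which case a row matches only when every listed key agrees. The following is a minimal sketch with made-up data; the `scores` and `grades` DataFrames and their `Year`, `Score`, and `Grade` columns are hypothetical and not part of the notebook above.

```python
import pandas as pd

# Hypothetical data: each row is identified by the pair (ID, Year)
scores = pd.DataFrame({'ID': [1, 1, 2],
                       'Year': [2023, 2024, 2023],
                       'Score': [70, 75, 88]})
grades = pd.DataFrame({'ID': [1, 1, 2],
                       'Year': [2023, 2024, 2024],
                       'Grade': ['B', 'B', 'A']})

# Inner join on both keys: (2, 2023) and (2, 2024) have no partner and are dropped
merged = pd.merge(scores, grades, on=['ID', 'Year'])
print(merged)
```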


### Updating DataFrames to have mismatched IDs 

In [26]:
df_1 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                     'Class': [5, 6, 7, 8, 10]})

df_2 = pd.DataFrame({'ID': [1, 2, 3, 4, 6],
                     'Name': ['E', 'F', 'G', 'H', 'U']})

### Merging using joins and other parameters 

In [24]:
# Inner Join (default): Keeps only rows with IDs present in both DataFrames
df_3 = pd.merge(df_1, df_2)   # Equivalent to how='inner'
df_3

Unnamed: 0,ID,Class,Name
0,1,5,E
1,2,6,F
2,3,7,G
3,4,8,H


In [26]:
# Left Join: Keeps every row from the left table (df_1); columns from df_2 are NaN where no match exists
df_3 = pd.merge(df_1, df_2, how='left')
df_3

Unnamed: 0,ID,Class,Name
0,1,5,E
1,2,6,F
2,3,7,G
3,4,8,H
4,5,10,NaN


In [28]:
# Right Join: Keeps every row from the right table (df_2); columns from df_1 are NaN where no match exists
df_3 = pd.merge(df_1, df_2, how='right')
df_3

Unnamed: 0,ID,Class,Name
0,1,5.0,E
1,2,6.0,F
2,3,7.0,G
3,4,8.0,H
4,6,NaN,U


In [30]:
# Outer Join: Keeps every row from both tables, filling missing values with NaN
df_3 = pd.merge(df_1, df_2, how='outer')
df_3

Unnamed: 0,ID,Class,Name
0,1,5.0,E
1,2,6.0,F
2,3,7.0,G
3,4,8.0,H
4,5,10.0,NaN
5,6,NaN,U
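
One side effect of joins that introduce NaN is worth illustrating: an integer column such as `Class` is upcast to `float64`, because NaN is a floating-point value. The sketch below shows this, along with one possible workaround that casts to the nullable `Int64` dtype before merging. The variable names `outer`, `float_dtype`, and `outer_nullable` are illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                     'Class': [5, 6, 7, 8, 10]})
df_2 = pd.DataFrame({'ID': [1, 2, 3, 4, 6],
                     'Name': ['E', 'F', 'G', 'H', 'U']})

outer = pd.merge(df_1, df_2, how='outer')
# 'Class' was int64 in df_1, but ID 6 has no Class, so the column becomes float64
float_dtype = outer['Class'].dtype
print(float_dtype)

# Optional: use the nullable integer dtype so gaps appear as <NA> and values stay integers
outer_nullable = pd.merge(df_1.astype({'Class': 'Int64'}), df_2, how='outer')
print(outer_nullable['Class'].dtype)
```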


In [32]:
# indicator=True adds a '_merge' column showing where each row came from: 'left_only', 'right_only', or 'both'
df_3 = pd.merge(df_1, df_2, how='outer', indicator=True)
df_3

Unnamed: 0,ID,Class,Name,_merge
0,1,5.0,E,both
1,2,6.0,F,both
2,3,7.0,G,both
3,4,8.0,H,both
4,5,10.0,NaN,left_only
5,6,NaN,U,right_only
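
A common use of the `_merge` column is to pick out rows that failed to match, sometimes called an anti-join. This is a minimal sketch of that idea, not the only way to do it; `merged`, `left_only`, and `right_only` are illustrative names.

```python
import pandas as pd

df_1 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                     'Class': [5, 6, 7, 8, 10]})
df_2 = pd.DataFrame({'ID': [1, 2, 3, 4, 6],
                     'Name': ['E', 'F', 'G', 'H', 'U']})

merged = pd.merge(df_1, df_2, how='outer', indicator=True)

# Rows whose ID exists only in df_1, and rows whose ID exists only in df_2
left_only = merged[merged['_merge'] == 'left_only']
right_only = merged[merged['_merge'] == 'right_only']

print(left_only['ID'].tolist())
print(right_only['ID'].tolist())
```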


In [37]:
# Merging on the row index instead of a column; suffixes disambiguate the 'ID' column that exists in both tables
df_3 = pd.merge(df_1, df_2, left_index=True, right_index=True, suffixes=('_Table_1', '_Table_2'))
df_3

Unnamed: 0,ID_Table_1,Class,ID_Table_2,Name
0,1,5,1,E
1,2,6,2,F
2,3,7,3,G
3,4,8,4,H
4,5,10,6,U
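
For index-based merges, `DataFrame.join()` offers a shorter spelling. The sketch below assumes the same two DataFrames and shows that an inner `join()` with `lsuffix`/`rsuffix` gives the same result as the index merge; `by_merge` and `by_join` are illustrative names.

```python
import pandas as pd

df_1 = pd.DataFrame({'ID': [1, 2, 3, 4, 5],
                     'Class': [5, 6, 7, 8, 10]})
df_2 = pd.DataFrame({'ID': [1, 2, 3, 4, 6],
                     'Name': ['E', 'F', 'G', 'H', 'U']})

by_merge = pd.merge(df_1, df_2, left_index=True, right_index=True,
                    suffixes=('_Table_1', '_Table_2'))

# join() matches on the index by default; how='inner' mirrors merge()'s default join type
by_join = df_1.join(df_2, how='inner', lsuffix='_Table_1', rsuffix='_Table_2')

print(by_merge.equals(by_join))
```

Note that rows are paired by position in the index here, not by `ID`, which is why the last row combines ID 5 from the first table with ID 6 from the second.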
