# Merging, Joining, and Concatenating

In [1]:
import numpy as np
import pandas as pd

## **Merging 2 DataFrames**

What is merge() in Pandas?

Pandas merge() works like a SQL join: it combines two DataFrames based on one or more common columns or indexes.

✅ Syntax:

pd.merge(left, right, how='inner', on=None)


| Parameter       | Description                                             |
| --------------- | ------------------------------------------------------- |
| `left`, `right` | DataFrames to merge                                     |
| `how`           | Type of join: `'inner'`, `'outer'`, `'left'`, `'right'` |
| `on`            | Column name(s) to join on                               |
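
When the key column has a different name in each DataFrame, `on` cannot be used directly. A minimal sketch of that case, using the `left_on` and `right_on` parameters of `pd.merge()`; the `staff` and `pay` frames and their column names are made up for illustration:

```python
import pandas as pd

# Hypothetical frames whose key columns have different names
staff = pd.DataFrame({'emp_id': [1, 2, 3], 'name': ['John', 'Anna', 'Peter']})
pay = pd.DataFrame({'employee_id': [2, 3, 4], 'salary': [80000, 65000, 70000]})

# left_on / right_on name the key column on each side
merged = pd.merge(staff, pay, how='inner',
                  left_on='emp_id', right_on='employee_id')
print(merged)
```

Both key columns are kept in the result, so one of them can be dropped afterwards if it is redundant.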


🔁 Types of Joins:

| Join Type | Code          | Description                            |
| --------- | ------------- | -------------------------------------- |
| `inner`   | `how='inner'` | Only matching rows                     |
| `left`    | `how='left'`  | All rows from left, matched from right |
| `right`   | `how='right'` | All rows from right, matched from left |
| `outer`   | `how='outer'` | All rows from both, NaN where no match |


In [2]:
# DataFrame 1: Employee information
employees = pd.DataFrame({
    'employee_id': [1, 2, 3, 4, 5],
    'name': ['John', 'Anna', 'Peter', 'Linda', 'Bob'],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR']
})

# DataFrame 2: Salary information
salaries = pd.DataFrame({
    'employee_id': [1, 2, 3, 5, 7],
    'salary': [60000, 80000, 65000, 70000, 90000],
    'bonus': [5000, 10000, 7000, 8000, 12000]
})


In [3]:
employees

Unnamed: 0,employee_id,name,department
0,1,John,HR
1,2,Anna,IT
2,3,Peter,Finance
3,4,Linda,IT
4,5,Bob,HR


In [4]:
salaries

Unnamed: 0,employee_id,salary,bonus
0,1,60000,5000
1,2,80000,10000
2,3,65000,7000
3,5,70000,8000
4,7,90000,12000


In [5]:
pd.merge(employees, salaries, on='employee_id', how='inner')

Unnamed: 0,employee_id,name,department,salary,bonus
0,1,John,HR,60000,5000
1,2,Anna,IT,80000,10000
2,3,Peter,Finance,65000,7000
3,5,Bob,HR,70000,8000


In [6]:
pd.merge(employees, salaries, on='employee_id', how='outer')

Unnamed: 0,employee_id,name,department,salary,bonus
0,1,John,HR,60000.0,5000.0
1,2,Anna,IT,80000.0,10000.0
2,3,Peter,Finance,65000.0,7000.0
3,4,Linda,IT,,
4,5,Bob,HR,70000.0,8000.0
5,7,,,90000.0,12000.0
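
In an outer join it can be hard to tell which side a row came from. One way to check, sketched here with cut-down versions of the same two frames, is the `indicator=True` option of `pd.merge()`. It adds a `_merge` column with the values `left_only`, `right_only`, or `both`:

```python
import pandas as pd

employees = pd.DataFrame({
    'employee_id': [1, 2, 3, 4, 5],
    'name': ['John', 'Anna', 'Peter', 'Linda', 'Bob'],
})
salaries = pd.DataFrame({
    'employee_id': [1, 2, 3, 5, 7],
    'salary': [60000, 80000, 65000, 70000, 90000],
})

# indicator=True records the origin of every row in a '_merge' column
outer = pd.merge(employees, salaries, on='employee_id',
                 how='outer', indicator=True)

# Keep only the rows that did not find a partner on the other side
unmatched = outer[outer['_merge'] != 'both']
print(unmatched[['employee_id', '_merge']])
```

Here the unmatched rows are employee 4 (present only in `employees`) and employee 7 (present only in `salaries`).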


In [7]:
pd.merge(employees, salaries, on='employee_id', how='left')

Unnamed: 0,employee_id,name,department,salary,bonus
0,1,John,HR,60000.0,5000.0
1,2,Anna,IT,80000.0,10000.0
2,3,Peter,Finance,65000.0,7000.0
3,4,Linda,IT,,
4,5,Bob,HR,70000.0,8000.0


In [8]:
pd.merge(employees, salaries, on='employee_id', how='right')

Unnamed: 0,employee_id,name,department,salary,bonus
0,1,John,HR,60000,5000
1,2,Anna,IT,80000,10000
2,3,Peter,Finance,65000,7000
3,5,Bob,HR,70000,8000
4,7,,,90000,12000


## **Concatenating 2 DataFrames**

In Pandas, pd.concat() is used to combine two or more DataFrames.



✅ Basic Syntax:

pd.concat([df1, df2], axis=0)  # Default: axis=0 (row-wise)


In [9]:
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2'],
    'C': ['C0', 'C1', 'C2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5'],
    'C': ['C3', 'C4', 'C5']
})

In [10]:
df1

Unnamed: 0,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2


In [11]:
df2

Unnamed: 0,A,B,C
0,A3,B3,C3
1,A4,B4,C4
2,A5,B5,C5


In [12]:
pd.concat([df1, df2])  # axis=0 (default): stack rows vertically

Unnamed: 0,A,B,C
0,A0,B0,C0
1,A1,B1,C1
2,A2,B2,C2
0,A3,B3,C3
1,A4,B4,C4
2,A5,B5,C5
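
Row-wise concatenation keeps each frame's original index, so the labels 0, 1, 2 repeat in the result. If a fresh index is wanted, `ignore_index=True` is one option. A small sketch with shortened example frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Default: the original index labels are kept, so they repeat
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]
```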


In [13]:
pd.concat([df1, df2], axis=1)  # axis=1: place the DataFrames side by side as columns

Unnamed: 0,A,B,C,A.1,B.1,C.1
0,A0,B0,C0,A3,B3,C3
1,A1,B1,C1,A4,B4,C4
2,A2,B2,C2,A5,B5,C5
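
When both frames share the same column names, a column-wise concat produces repeated labels, which makes selecting a single column ambiguous. One possible way around this, shown as a sketch, is the `keys` parameter of `pd.concat()`. The labels `'first'` and `'second'` are arbitrary names chosen for this example:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# keys adds an outer column level that records the source frame
side_by_side = pd.concat([df1, df2], axis=1, keys=['first', 'second'])
print(side_by_side)

# Columns can now be addressed unambiguously by (source, name)
second_a = side_by_side[('second', 'A')]
print(second_a.tolist())
```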


🧠 Summary:

| Axis | Operation                | Description              |
| ---- | ------------------------ | ------------------------ |
| 0    | `pd.concat(..., axis=0)` | Stack rows (Vertical)    |
| 1    | `pd.concat(..., axis=1)` | Add columns (Horizontal) |


## **Joining 2 DataFrames**

🔁 join() in Pandas

🧠 Purpose:

By default, join() combines DataFrames based on their index.

✅ Syntax:

df1.join(df2, how='left')


| Parameter | Description                                     |
| --------- | ----------------------------------------------- |
| `df2`     | DataFrame to join                               |
| `how`     | Type of join: 'left', 'right', 'outer', 'inner' |


In [14]:
df1 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'index': [1, 2, 3]
})

df2 = pd.DataFrame({
    'score': [85, 98, 75]
}, index=[2, 3, 4])

In [15]:
df1

Unnamed: 0,name,index
0,Alice,1
1,Bob,2
2,Charlie,3


In [16]:
df2

Unnamed: 0,score
2,85
3,98
4,75


In [18]:
df1.join(df2, how='outer')

Unnamed: 0,name,index,score
0,Alice,1.0,
1,Bob,2.0,
2,Charlie,3.0,85.0
3,,,98.0
4,,,75.0


In [19]:
df2.join(df1, how='outer')

Unnamed: 0,score,name,index
0,,Alice,1.0
1,,Bob,2.0
2,85.0,Charlie,3.0
3,98.0,,
4,75.0,,
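
Note that the outputs above align on the row labels 0 to 4, not on the values in df1's column named `index`, because join() matches on the actual index. If the intent is to match that column against df2's index, two possible approaches are sketched below: promoting the column with `set_index()`, or passing it through join()'s `on` parameter.

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'index': [1, 2, 3]
})
df2 = pd.DataFrame({'score': [85, 98, 75]}, index=[2, 3, 4])

# Option 1: make the 'index' column the real index, then join
by_key = df1.set_index('index').join(df2, how='inner')
print(by_key)

# Option 2: keep df1 as it is and match its 'index' column
# against df2's index (join() defaults to a left join)
via_on = df1.join(df2, on='index')
print(via_on)
```

In both cases Bob is paired with 85 and Charlie with 98. With the default left join in option 2, Alice is kept with a missing score.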


✅ Difference from merge():

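| Feature     | `merge()`                                                    | `join()`                                     |
| ----------- | ------------------------------------------------------------ | -------------------------------------------- |
| Join on     | Column(s) by default; index via `left_index` / `right_index` | Index by default; a caller column via `on`   |
| Flexibility | More (like SQL joins)                                        | Simpler, index-based joins                   |
| Usage       | Most used in data merging                                    | Used when index is key                       |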
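
To make the comparison concrete, the sketch below performs the same index-based inner join both ways. The `left` and `right` frames are made-up examples. `left_index=True` and `right_index=True` tell `pd.merge()` to use the index as the key on each side:

```python
import pandas as pd

left = pd.DataFrame({'name': ['Alice', 'Bob']}, index=[1, 2])
right = pd.DataFrame({'score': [85, 98]}, index=[2, 3])

# Index-based join using join()
joined = left.join(right, how='inner')

# The same join written with merge()
merged = pd.merge(left, right, left_index=True, right_index=True, how='inner')

print(joined)
print(joined.equals(merged))
```

Only index label 2 exists in both frames, so each result has a single row pairing Bob with 85.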


🔚 Summary of All 3:

| Function   | Joins On           | Use Case                             |
| ---------- | ------------------ | ------------------------------------ |
| `concat()` | Axis (0 or 1)      | Stack data (rows/columns)            |
| `merge()`  | Column(s) or index | SQL-style joins on common fields     |
| `join()`   | Index (by default) | Simpler, index-based DataFrame joins |
