
---

## 📌 Merging & Joining Data in Pandas

Real-world datasets are often split across multiple tables or files. **Pandas** provides `merge()`, `.join()`, and `concat()` to combine them, covering SQL-style key joins as well as index-based and positional stacking.

---

### 🔁 Merge: SQL-Style Joins

The `merge()` function works like a SQL join: it combines two DataFrames on one or more key columns, and the `how` argument selects the join type.
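As a minimal sketch of joining on more than one key, the example below passes a list of column names to `on`. The `sales` and `targets` tables and their values are hypothetical, made up purely for illustration.

```python
import pandas as pd

# Hypothetical tables keyed by both Region and Quarter (illustrative data only).
sales = pd.DataFrame({
    "Region": ["East", "East", "West"],
    "Quarter": ["Q1", "Q2", "Q1"],
    "Sales": [100, 120, 90],
})
targets = pd.DataFrame({
    "Region": ["East", "West", "West"],
    "Quarter": ["Q1", "Q1", "Q2"],
    "Target": [110, 95, 105],
})

# A list passed to `on` matches rows only when every key column agrees.
multi_key = pd.merge(sales, targets, on=["Region", "Quarter"])
print(multi_key)
```

Only the (East, Q1) and (West, Q1) combinations appear in both tables, so the default inner join keeps just those two rows.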

#### 🟢 Inner Join *(Default)*

* Combines rows with **matching keys** in both DataFrames.
* Result contains only the common entries.

#### 🟡 Left Join

* Keeps **all rows from the left** DataFrame.
* Fills columns from the right DataFrame with `NaN` where no match exists.

#### 🔵 Right Join

* Keeps **all rows from the right** DataFrame.
* Fills columns from the left DataFrame with `NaN` where no match exists.

#### 🔴 Outer Join

* Combines **all rows from both** DataFrames.
* Fills missing data with `NaN` where no match exists.
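When rows are filled with `NaN`, it can help to see which side each row came from. The sketch below uses `merge()`'s `indicator=True` option, which adds a `_merge` column. The two tables use the same employee and department values as the notebook cells in this section.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmpID": [1, 2, 3],
    "Name": ["Alice", "Bob", "Charlie"],
    "DeptID": [10, 20, 30],
})
departments = pd.DataFrame({
    "DeptID": [10, 20, 40],
    "DeptName": ["HR", "Engineering", "Marketing"],
})

# indicator=True adds a `_merge` column with the values
# "both", "left_only", or "right_only" for each result row.
merged = pd.merge(employees, departments, on="DeptID", how="outer", indicator=True)
print(merged[["DeptID", "_merge"]])
```

Rows marked `left_only` or `right_only` are the ones an inner join would have dropped.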

---

### 📚 Concatenating DataFrames

Use `concat()` to stack datasets along an axis **without** matching on key columns; alignment happens on index and column labels instead.

#### ⬇️ Vertical Concatenation (Rows)

* Appends one DataFrame below another.
* Useful when datasets have the **same columns**.
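By default, `concat()` keeps each input's original index labels, so a stacked result can contain repeated labels. A minimal sketch of one way to get a fresh 0..n-1 index, using the `ignore_index=True` option:

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Alice", "Bob"]})
df2 = pd.DataFrame({"Name": ["Charlie", "David"]})

# ignore_index=True discards the original labels and renumbers the rows.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```

Whether to renumber depends on the data: if the original index carries meaning (for example, timestamps), keeping it may be the better choice.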

#### ⬅️ Horizontal Concatenation (Columns)

* Places DataFrames **side by side**.
* Rows are matched by **index label**, not by position, so make sure the indexes align; unmatched labels produce `NaN`.
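To illustrate why index alignment matters, the sketch below concatenates two frames whose indexes only partly overlap, then repeats the operation after resetting both indexes. The index values here are chosen for illustration and are assumptions, not part of the original data.

```python
import pandas as pd

# Illustrative frames with deliberately different index labels.
ids = pd.DataFrame({"ID": [1, 2]}, index=[0, 1])
scores = pd.DataFrame({"Score": [90, 80]}, index=[1, 2])

# Labels 0 and 2 exist in only one frame, so the result has three rows
# and a NaN wherever a label is missing from one side.
misaligned = pd.concat([ids, scores], axis=1)
print(misaligned)

# Resetting both indexes makes the rows line up by position instead.
aligned = pd.concat(
    [ids.reset_index(drop=True), scores.reset_index(drop=True)],
    axis=1,
)
print(aligned)
```

Resetting the index is only appropriate when the rows really do correspond by position; if they correspond by a key, a `merge()` on that key is the safer route.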

---

### 🔧 When to Use What?

| Use Case                              | Method                                         |
| ------------------------------------- | ---------------------------------------------- |
| SQL-style row joins on key(s)         | `merge()` or `.join()`                         |
| Combine datasets row-wise             | `concat()` (default axis=0)                    |
| Combine features/columns side-by-side | `concat(axis=1)`                               |
| Align by index                        | `.join()` or `merge()` with `right_index=True` |
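The index-based row of the table can be sketched as follows. This assumes the department table is re-indexed by `DeptID` with `set_index()`; both calls then look up each employee's `DeptID` in the right-hand index.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmpID": [1, 2, 3],
    "Name": ["Alice", "Bob", "Charlie"],
    "DeptID": [10, 20, 30],
})
# Assumption for this sketch: departments are indexed by DeptID.
departments = pd.DataFrame({
    "DeptID": [10, 20, 40],
    "DeptName": ["HR", "Engineering", "Marketing"],
}).set_index("DeptID")

# merge(): key column on the left, index on the right.
by_merge = pd.merge(
    employees, departments, left_on="DeptID", right_index=True, how="left"
)

# .join(): left join by default; `on` names the left column to match
# against the right DataFrame's index.
by_join = employees.join(departments, on="DeptID")

print(by_join)
```

`.join()` is the shorter spelling when the right-hand table is already indexed by the key; `merge()` is more explicit when the keys are a mix of columns and indexes.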

---

### ✅ Summary

* Use `merge()` for SQL-style joins (inner, left, right, outer).
* Use `concat()` to stack datasets vertically or horizontally.
* Handle **missing keys or misaligned indexes** carefully for accurate results.
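One possible way to spot missing keys before merging is to check which key values on one side have no counterpart on the other. The sketch below uses `Series.isin()` for that check; the variable name `unmatched` is introduced here for illustration.

```python
import pandas as pd

employees = pd.DataFrame({
    "EmpID": [1, 2, 3],
    "Name": ["Alice", "Bob", "Charlie"],
    "DeptID": [10, 20, 30],
})
departments = pd.DataFrame({
    "DeptID": [10, 20, 40],
    "DeptName": ["HR", "Engineering", "Marketing"],
})

# Employees whose DeptID does not appear in the departments table.
# An inner join would drop these rows; a left join would fill them with NaN.
unmatched = employees.loc[~employees["DeptID"].isin(departments["DeptID"])]
print(unmatched)
```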

---

In [1]:
import pandas as pd

In [2]:
employees = pd.DataFrame({
    "EmpID": [1, 2, 3],
    "Name": ["Alice", "Bob", "Charlie"],
    "DeptID": [10, 20, 30]
})

departments = pd.DataFrame({
    "DeptID": [10, 20, 40],
    "DeptName": ["HR", "Engineering", "Marketing"]
})

In [3]:
employees

|     | EmpID | Name    | DeptID |
| --- | ----- | ------- | ------ |
| 0   | 1     | Alice   | 10     |
| 1   | 2     | Bob     | 20     |
| 2   | 3     | Charlie | 30     |


In [4]:
departments

|     | DeptID | DeptName    |
| --- | ------ | ----------- |
| 0   | 10     | HR          |
| 1   | 20     | Engineering |
| 2   | 40     | Marketing   |


In [5]:
pd.merge(employees, departments, on="DeptID")

|     | EmpID | Name  | DeptID | DeptName    |
| --- | ----- | ----- | ------ | ----------- |
| 0   | 1     | Alice | 10     | HR          |
| 1   | 2     | Bob   | 20     | Engineering |


In [7]:
pd.merge(employees, departments, on="DeptID", how="left")

|     | EmpID | Name    | DeptID | DeptName    |
| --- | ----- | ------- | ------ | ----------- |
| 0   | 1     | Alice   | 10     | HR          |
| 1   | 2     | Bob     | 20     | Engineering |
| 2   | 3     | Charlie | 30     | NaN         |


In [8]:
pd.merge(employees, departments, on="DeptID", how="right")

|     | EmpID | Name  | DeptID | DeptName    |
| --- | ----- | ----- | ------ | ----------- |
| 0   | 1.0   | Alice | 10     | HR          |
| 1   | 2.0   | Bob   | 20     | Engineering |
| 2   | NaN   | NaN   | 40     | Marketing   |


In [9]:
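pd.merge(employees, departments, on="DeptID", how="outer")

|     | EmpID | Name    | DeptID | DeptName    |
| --- | ----- | ------- | ------ | ----------- |
| 0   | 1.0   | Alice   | 10     | HR          |
| 1   | 2.0   | Bob     | 20     | Engineering |
| 2   | 3.0   | Charlie | 30     | NaN         |
| 3   | NaN   | NaN     | 40     | Marketing   |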


In [10]:
df1 = pd.DataFrame({"Name": ["Alice", "Bob"]})
df2 = pd.DataFrame({"Name": ["Charlie", "David"]})

pd.concat([df1, df2])

|     | Name    |
| --- | ------- |
| 0   | Alice   |
| 1   | Bob     |
| 0   | Charlie |
| 1   | David   |


In [11]:
df1 = pd.DataFrame({"Name": ["Alice", "Bob"]})
df2 = pd.DataFrame({"Name": ["Charlie", "David"]})

In [12]:
df1

|     | Name  |
| --- | ----- |
| 0   | Alice |
| 1   | Bob   |


In [13]:
df2

|     | Name    |
| --- | ------- |
| 0   | Charlie |
| 1   | David   |


In [18]:
df1 = pd.DataFrame({"ID": [1, 2]})
df2 = pd.DataFrame({"Score": [90, 80]})

pd.concat([df1, df2], axis=1)

|     | ID  | Score |
| --- | --- | ----- |
| 0   | 1   | 90    |
| 1   | 2   | 80    |
