# 🐼 Pandas - Class 7: Merging, Joining & Concatenation
Welcome to **Class 7** of our Pandas series. Today we'll learn how to combine multiple DataFrames using different techniques.

## 1. Using `concat()` for Stacking Data
- `pd.concat()` combines DataFrames vertically (rows) or horizontally (columns).
- Use `axis=0` for vertical stacking (default), `axis=1` for horizontal.
- `ignore_index=True` discards the original row labels and renumbers the result from 0.
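
The options above can be sketched on two tiny frames. The names `top`, `bottom`, and `extra` and their values are hypothetical, chosen only for illustration; they are not the datasets used in this class.

```python
import pandas as pd

# Hypothetical frames for illustration only
top = pd.DataFrame({"Item": ["Pen", "Marker"], "Price": [10, 30]})
bottom = pd.DataFrame({"Item": ["Eraser"], "Price": [8]})
extra = pd.DataFrame({"Stock": [100, 80]})

# Default (axis=0): rows are stacked and each frame keeps its own labels,
# so the label 0 appears twice in the result
stacked = pd.concat([top, bottom], axis=0)

# ignore_index=True drops the old labels and renumbers the rows 0..n-1
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)

# axis=1 places frames side by side, aligning rows on their index labels
side_by_side = pd.concat([top, extra], axis=1)

print(stacked.index.tolist())
print(renumbered.index.tolist())
print(side_by_side.columns.tolist())
```

Duplicate row labels are legal in pandas, but they make label-based selection ambiguous, which is why `ignore_index=True` is often a sensible choice when the old labels carry no meaning.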

In [1]:
import pandas as pd

# Dataset 1: Product details
df1 = pd.DataFrame({
    "ProductID": [101, 102, 103],
    "ProductName": ["Pen", "Notebook", "Marker"],
    "Price": [10, 50, 30]
})

# Dataset 2: More products (same columns as df1)
df2 = pd.DataFrame({
    "ProductID": [104, 105],
    "ProductName": ["Pencil", "Eraser"],
    "Price": [5, 8]
})

# Dataset 3: Additional info (different columns)
df3 = pd.DataFrame({
    "Stock": [100, 60, 80],
    "Rating": [4.5, 4.7, 4.8]
})



In [5]:
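# The frames share no columns, so vertical stacking fills the gaps with NaN
# (integer columns become float because NaN is a float value)
pd.concat([df1, df3], axis=0)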

Unnamed: 0,ProductID,ProductName,Price,Stock,Rating
0,101.0,Pen,10.0,,
1,102.0,Notebook,50.0,,
2,103.0,Marker,30.0,,
0,,,,100.0,4.5
1,,,,60.0,4.7
2,,,,80.0,4.8


In [6]:
# Both frames have row labels 0-2, so their columns line up side by side
pd.concat([df1, df3], axis=1)

Unnamed: 0,ProductID,ProductName,Price,Stock,Rating
0,101,Pen,10,100,4.5
1,102,Notebook,50,60,4.7
2,103,Marker,30,80,4.8


In [7]:

# Dataset 1: Product details


# Dataset 2: More products (same columns as df1 → good for vertical stacking)

# Dataset 3: Additional info (different columns → good for horizontal stacking)


## 2. Using `merge()` for SQL-Style Joins
- `merge()` is similar to SQL joins.
- Specify `on` or `left_on`/`right_on` to choose key columns.
- Join types: `inner` (default), `left`, `right`, `outer`.
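
When the key column has a different name in each DataFrame, `left_on`/`right_on` name it on each side. A minimal sketch follows; the frames `catalog` and `orders` and the column name `item_id` are hypothetical, made up only for illustration.

```python
import pandas as pd

# Hypothetical data: the key is called ProductID on one side, item_id on the other
catalog = pd.DataFrame({
    "ProductID": [101, 102, 103],
    "ProductName": ["Pen", "Notebook", "Marker"],
})
orders = pd.DataFrame({
    "item_id": [102, 103, 104],
    "UnitsSold": [20, 15, 5],
})

# inner: keep only keys present in both frames (102 and 103)
matched = pd.merge(catalog, orders, left_on="ProductID", right_on="item_id", how="inner")

# left: keep every catalog row; products without an order get NaN
all_products = pd.merge(catalog, orders, left_on="ProductID", right_on="item_id", how="left")

print(matched)
print(all_products)
```

Both key columns are kept in the result when their names differ, so you may want to drop one of them afterwards.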

In [8]:
Products = pd.concat([df1,df2],ignore_index=True)
Products

Unnamed: 0,ProductID,ProductName,Price
0,101,Pen,10
1,102,Notebook,50
2,103,Marker,30
3,104,Pencil,5
4,105,Eraser,8


In [9]:
sales = pd.DataFrame({
    "ProductID": [101, 102, 103, 104, 105],
    "UnitsSold": [10, 20, 15, 5, 8]
})

In [10]:
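# Every ProductID appears in both frames, so each join type returns the same five rows for this data
pd.merge(Products, sales, on="ProductID", how="inner")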

Unnamed: 0,ProductID,ProductName,Price,UnitsSold
0,101,Pen,10,10
1,102,Notebook,50,20
2,103,Marker,30,15
3,104,Pencil,5,5
4,105,Eraser,8,8


In [11]:
pd.merge(Products,sales, on="ProductID",how="left")

Unnamed: 0,ProductID,ProductName,Price,UnitsSold
0,101,Pen,10,10
1,102,Notebook,50,20
2,103,Marker,30,15
3,104,Pencil,5,5
4,105,Eraser,8,8


In [12]:
pd.merge(Products,sales, on="ProductID",how="right")

Unnamed: 0,ProductID,ProductName,Price,UnitsSold
0,101,Pen,10,10
1,102,Notebook,50,20
2,103,Marker,30,15
3,104,Pencil,5,5
4,105,Eraser,8,8


In [16]:
pd.merge(Products,sales, on="ProductID",how='outer')

Unnamed: 0,ProductID,ProductName,Price,UnitsSold
0,101,Pen,10,10
1,102,Notebook,50,20
2,103,Marker,30,15
3,104,Pencil,5,5
4,105,Eraser,8,8


## 3. Using `join()` with Index Alignment
- `join()` combines DataFrames by matching index labels.
- By default it performs a left join on the index; pass `on='col'` to match a column of the calling DataFrame against the other DataFrame's index.
- Useful for adding columns to an existing DataFrame.
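
The `on='col'` form can be sketched as follows. The frames `orders` and `prices` and their values are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical data: orders carries ProductID as a column,
# while prices uses the product IDs as its index
orders = pd.DataFrame({
    "OrderID": [1, 2, 3],
    "ProductID": [102, 101, 102],
})
prices = pd.DataFrame({"Price": [10, 50]}, index=[101, 102])

# on="ProductID" matches the caller's column against the other frame's index
with_price = orders.join(prices, on="ProductID")

print(with_price)
```

The result keeps the row order and index of `orders`, which is what makes `join()` convenient for attaching lookup columns to an existing DataFrame.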

In [17]:
Products = pd.concat([df1,df2],ignore_index=True)
Products

Unnamed: 0,ProductID,ProductName,Price
0,101,Pen,10
1,102,Notebook,50
2,103,Marker,30
3,104,Pencil,5
4,105,Eraser,8


In [32]:
# The index must hold the ProductIDs so that join() can align on it
stock = pd.DataFrame({
    "Stock": [10, 20, 15, 5, 8],
}, index=[101, 102, 103, 104, 105])

In [33]:
p_dataset = Products.set_index("ProductID")

In [34]:
p_dataset.join(stock)

ProductID,ProductName,Price,Stock
101,Pen,10,10
102,Notebook,50,20
103,Marker,30,15
104,Pencil,5,5
105,Eraser,8,8


## 4. Practical Examples
- Combine sales and revenue data to see performance.
- Merge students with marks to link records.
- Practice with realistic datasets to understand alignment and missing data handling.
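
The students-and-marks case could look roughly like this. All names, IDs, and marks below are invented for illustration.

```python
import pandas as pd

# Hypothetical student and marks tables
students = pd.DataFrame({
    "StudentID": [1, 2, 3],
    "Name": ["Asha", "Ben", "Chen"],
})
marks = pd.DataFrame({
    "StudentID": [1, 2, 2, 4],
    "Subject": ["Math", "Math", "Science", "Math"],
    "Marks": [88, 75, 91, 60],
})

# A left merge keeps every student. A student with several marks is repeated
# once per mark, and a student with no marks gets NaN in Subject and Marks.
report = pd.merge(students, marks, on="StudentID", how="left")

print(report)
```

StudentID 4 has marks but no student record, so a left merge drops it; `how="outer"` would keep it, with a missing `Name`.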

In [38]:
import pandas as pd
# Sales data
sales = pd.DataFrame({
    "ProductID": [201, 202, 203, 204],
    "UnitsSold": [500, 300, 400, 250]
})
# Revenue data (no entry for ProductID 204, extra entry for 205)
revenue = pd.DataFrame({
    "ProductID": [201, 202, 203, 205],
    "Revenue": [10000, 8000, 6000, 500]
})
# An outer join keeps every ProductID from both frames; unmatched cells become NaN
merged_df = pd.merge(sales, revenue, on="ProductID", how="outer")
merged_df


Unnamed: 0,ProductID,UnitsSold,Revenue
0,201,500.0,10000.0
1,202,300.0,8000.0
2,203,400.0,6000.0
3,204,250.0,
4,205,,500.0
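
Because these two frames only partly overlap, the join type changes the result. The sketch below reuses the same `sales` and `revenue` data; the names `row_counts`, `flagged`, and `missing_per_column` are introduced here for illustration. It compares the row count for each join type and uses `indicator=True` to label where each row came from.

```python
import pandas as pd

# Same data as the cell above
sales = pd.DataFrame({
    "ProductID": [201, 202, 203, 204],
    "UnitsSold": [500, 300, 400, 250],
})
revenue = pd.DataFrame({
    "ProductID": [201, 202, 203, 205],
    "Revenue": [10000, 8000, 6000, 500],
})

# Number of rows produced by each join type
row_counts = {
    how: len(pd.merge(sales, revenue, on="ProductID", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)

# indicator=True adds a _merge column: 'both', 'left_only', or 'right_only'
flagged = pd.merge(sales, revenue, on="ProductID", how="outer", indicator=True)
print(flagged)

# Count the missing values in each column after the join
missing_per_column = flagged.isna().sum()
print(missing_per_column)
```

The `_merge` column is a quick way to see which keys failed to match before deciding how to handle the gaps.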


## Mini Practice
1. Create three DataFrames: Products, Sales, and Revenue.
2. Concatenate them vertically and horizontally.
3. Merge Sales and Revenue using different join types.
4. Join Products with Sales based on index or a key.
5. Analyze missing values after joins.

In [13]:

# 1. Create three DataFrames: Products, Sales, and Revenue

# 2. Concatenate them vertically and horizontally
# Write your code here

# 3. Merge Sales and Revenue using different join types (inner, left, right, outer)
# Write your code here

# 4. Join Products with Sales based on index or a key
# Write your code here

# 5. Analyze missing values after joins
# Use isna(), info(), or describe() to explore gaps
# Write your code here


---
## Summary
- Used `concat()` for stacking data vertically and horizontally.
- Performed SQL-style joins with `merge()`.
- Applied `join()` with index alignment.
- Explored practical use-cases like sales vs revenue, students vs marks.