# **Data Selection & Indexing**

In [1]:
import pandas as pd

## 2. **Indexing and Reindexing**

Indexing and reindexing matter in pandas because **indexes provide structure, label-based alignment, and efficient data access**. An index is more than a row number: it can hold timestamps for a time series, several levels for hierarchical data, or unique identifiers.
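To make "alignment" concrete, here is a minimal sketch using two hypothetical Series (`q1`, `q2`, and their labels are made up for illustration). Arithmetic pairs values by index label rather than by position, and a label that exists on only one side produces `NaN`.

```python
import pandas as pd

# Two hypothetical Series whose labels overlap only partially
q1 = pd.Series([100, 200, 300], index=['a', 'b', 'c'])
q2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# Addition pairs values by label; 'a' and 'd' have no partner, so they become NaN
total = q1 + q2
print(total)
```

Only `b` and `c` appear in both Series, so only those two labels carry a numeric result.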

### 🔹 6. `set_index()` – Set a column as the index

#### ✅ **Purpose**: Move one or more columns into the index.

#### ✅ **Syntax**

```python
df.set_index('column_name', inplace=False)
```

In [5]:
data = {
    'ID': [101, 102, 103],
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35]
}
df = pd.DataFrame(data)

df

|     | ID  | Name    | Age |
| --- | --- | ------- | --- |
| 0   | 101 | Alice   | 25  |
| 1   | 102 | Bob     | 30  |
| 2   | 103 | Charlie | 35  |


In [7]:
df1 = df.set_index('ID')
df1

| ID  | Name    | Age |
| --- | ------- | --- |
| 101 | Alice   | 25  |
| 102 | Bob     | 30  |
| 103 | Charlie | 35  |


#### ✅ **Real-world Example**

In transaction data, `TransactionID` can be set as the index so that rows can be fetched by label with `.loc`:

```python
df.set_index('TransactionID', inplace=True)
```
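The lookup itself might look like the sketch below. The `tx` DataFrame, its column names, and its values are hypothetical. `verify_integrity=True` is an optional `set_index()` argument that raises a `ValueError` if the new index would contain duplicates, which is a useful guard when the column is meant to be a unique key.

```python
import pandas as pd

# Hypothetical transaction data; names and values are illustrative only
tx = pd.DataFrame({
    'TransactionID': ['T001', 'T002', 'T003'],
    'Amount': [250.0, 99.5, 410.0],
})

# Returns a new DataFrame; raises ValueError if TransactionID has duplicates
tx_by_id = tx.set_index('TransactionID', verify_integrity=True)

# Label-based lookup of a whole row, then of a single cell
row = tx_by_id.loc['T002']
amount = tx_by_id.loc['T003', 'Amount']
print(row)
print(amount)
```

This version assigns the result to a new name instead of using `inplace=True`, so the original `tx` stays untouched.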

### 🔹 7. `reset_index()` – Convert index back into a column

#### ✅ **Purpose**: Move the index back into a regular column and replace it with the default integer index.

#### ✅ **Syntax**

```python
df.reset_index(drop=False, inplace=False)
```

* `drop=True`: Discards the old index instead of keeping it as a column.
* `drop=False` (default): Moves the old index into a column.

In [8]:
df11 = df1.reset_index()
df11

|     | ID  | Name    | Age |
| --- | --- | ------- | --- |
| 0   | 101 | Alice   | 25  |
| 1   | 102 | Bob     | 30  |
| 2   | 103 | Charlie | 35  |


In [10]:
df111 = df1.reset_index(drop=True)
df111

|     | Name    | Age |
| --- | ------- | --- |
| 0   | Alice   | 25  |
| 1   | Bob     | 30  |
| 2   | Charlie | 35  |


#### ✅ **Real-world Example**

When preparing indexed data for CSV export or joining:

```python
# Move index back to column before exporting
df.reset_index().to_csv('output.csv', index=False)
```
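A runnable variant of the export, assuming a small hypothetical frame indexed by `ID`. It writes to an in-memory buffer rather than to `output.csv` so the result can be inspected directly. The index name becomes the header of the restored column.

```python
import io
import pandas as pd

# Hypothetical frame whose index is named 'ID'
indexed = pd.DataFrame(
    {'Name': ['Alice', 'Bob'], 'Age': [25, 30]},
    index=pd.Index([101, 102], name='ID'),
)

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
indexed.reset_index().to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Calling `to_csv(index=False)` without `reset_index()` first would drop the `ID` values from the file entirely.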

### 🔹 8. `reindex()` – Change row order or add new indices

#### ✅ **Purpose**: Conform the data to a new set of labels: reorder rows, drop labels that are not listed, or add new labels (filled with `NaN`).


#### ✅ **Syntax**

```python
df.reindex(new_index)
```

In [12]:
df = pd.DataFrame({
    'A': [10, 20, 30]
}, index=['x', 'y', 'z'])

df

|     | A   |
| --- | --- |
| x   | 10  |
| y   | 20  |
| z   | 30  |


In [13]:
new_index = ['x', 'y', 'z', 'a', 'b']
df_re = df.reindex(new_index)

df_re

|     | A    |
| --- | ---- |
| x   | 10.0 |
| y   | 20.0 |
| z   | 30.0 |
| a   | NaN  |
| b   | NaN  |
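In the output above, column `A` changed from integers to floats because `NaN` is a floating-point value. The sketch below repeats the same reindex and then uses the `fill_value` argument of `reindex()` to supply a placeholder instead; the choice of `0` as the placeholder is only an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30]}, index=['x', 'y', 'z'])
new_index = ['x', 'y', 'z', 'a', 'b']

# Default: new labels get NaN, so the integer column is upcast to float
with_nan = df.reindex(new_index)

# fill_value supplies a placeholder for the new labels instead of NaN
with_zero = df.reindex(new_index, fill_value=0)

print(with_nan['A'].dtype)
print(with_zero['A'].dtype)
```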


#### ✅ **Real-world Example**

Fill in missing dates in a time series:

```python
# Fill missing dates in time series
all_days = pd.date_range(start='2023-01-01', end='2023-01-10')
df_reindexed = df.reindex(all_days)
```
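A self-contained version of the idea, assuming a hypothetical `readings` frame with two days missing. It shows the gaps left as `NaN` and, as one possible policy, the gaps filled by carrying the last known value forward with `method='ffill'` (which requires a sorted index).

```python
import pandas as pd

# Hypothetical daily readings; Jan 3 and Jan 4 are missing
readings = pd.DataFrame(
    {'value': [1.0, 2.0, 5.0]},
    index=pd.to_datetime(['2023-01-01', '2023-01-02', '2023-01-05']),
)

all_days = pd.date_range(start='2023-01-01', end='2023-01-05')

# Missing days appear as rows of NaN
with_gaps = readings.reindex(all_days)

# Missing days take the most recent earlier value
forward_filled = readings.reindex(all_days, method='ffill')

print(with_gaps)
print(forward_filled)
```

Whether forward filling is appropriate depends on the data; for counts or sales, `fill_value=0` may be the better choice.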

### 🔹 9. `reindex_like()` – Reindex to match another DataFrame

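#### ✅ **Purpose**: Make one DataFrame's row index and columns match another's.

Values survive only where both the row label and the column name already exist in the original. In the example below, `df1` has column `A` and `df2` has column `B`, so neither result keeps any values and every cell is `NaN`.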

In [16]:
df1 = pd.DataFrame({'A': [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({'B': ['a', 'b']}, index=[0, 1])

print(df1)
print(df2)

   A
0  1
1  2
2  3
   B
0  a
1  b


In [17]:
df1.reindex_like(df2)

|     | B   |
| --- | --- |
| 0   | NaN |
| 1   | NaN |


In [18]:
df2.reindex_like(df1)

|     | A   |
| --- | --- |
| 0   | NaN |
| 1   | NaN |
| 2   | NaN |


#### ✅ **Real-world Example**

Reindex a forecasted dataset to align with actuals:

```python
forecast_df = forecast_df.reindex_like(actuals_df)
```
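A sketch of that alignment with made-up data. Here the two hypothetical frames share the column name `sales`, so values carry over wherever the row labels match; rows and columns that `actuals_df` lacks are dropped.

```python
import pandas as pd

# Hypothetical actuals and forecast; both use the column name 'sales'
actuals_df = pd.DataFrame(
    {'sales': [100, 110, 120]},
    index=['Jan', 'Feb', 'Mar'],
)
forecast_df = pd.DataFrame(
    {'sales': [115, 105, 130], 'confidence': [0.8, 0.9, 0.7]},
    index=['Feb', 'Jan', 'Apr'],
)

# The result takes the row labels and columns of actuals_df:
# rows are reordered, 'Apr' and 'confidence' are dropped, 'Mar' becomes NaN
aligned = forecast_df.reindex_like(actuals_df)
print(aligned)
```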

### 🔹 10. Directly setting index using `df.index = ...`

In [20]:
df

|     | A   |
| --- | --- |
| x   | 10  |
| y   | 20  |
| z   | 30  |


In [22]:
df.index = ['a', 'b', 'c']
df

|     | A   |
| --- | --- |
| a   | 10  |
| b   | 20  |
| c   | 30  |


#### ✅ **Real-world Use Case**

Replace auto-generated numeric indices with real-world identifiers:

```python
# Example identifiers; the list length must equal the number of rows in df
customer_id_list = [101, 102, 103]
df.index = customer_id_list
```
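Direct assignment changes the DataFrame in place and only replaces the labels; the data is not moved or realigned. The sketch below, using a small stand-in frame, also shows what happens when the number of labels does not match the number of rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30]})
customer_id_list = [101, 102, 103]

# Works: three labels for three rows; df itself is modified
df.index = customer_id_list

# Fails: two labels for three rows raises ValueError
try:
    df.index = [101, 102]
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

print(df.index.tolist())
print(mismatch_error)
```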

### 🔹 11. Hierarchical Indexing (MultiIndex)

#### ✅ **Purpose**: Index on multiple columns — useful for grouped or nested data.

#### ✅ **Syntax**

```python
df.set_index(['col1', 'col2'])
```


In [24]:
df = pd.DataFrame({
    'Year': [2020, 2020, 2021, 2021],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Sales': [200, 210, 250, 260]
})
df

|     | Year | Quarter | Sales |
| --- | ---- | ------- | ----- |
| 0   | 2020 | Q1      | 200   |
| 1   | 2020 | Q2      | 210   |
| 2   | 2021 | Q1      | 250   |
| 3   | 2021 | Q2      | 260   |


In [27]:
df.set_index(['Year', 'Quarter'])

| Year | Quarter | Sales |
| ---- | ------- | ----- |
| 2020 | Q1      | 200   |
| 2020 | Q2      | 210   |
| 2021 | Q1      | 250   |
| 2021 | Q2      | 260   |


#### ✅ **Real-world Examples**

A MultiIndex is commonly used for:

* Sales by Region and Quarter
* Time Series by Category and Subcategory
* Nested API response flattening
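Once the MultiIndex is in place, rows can be selected by one level or by a full key. A minimal sketch using the same Year/Quarter data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2020, 2020, 2021, 2021],
    'Quarter': ['Q1', 'Q2', 'Q1', 'Q2'],
    'Sales': [200, 210, 250, 260],
})
sales = df.set_index(['Year', 'Quarter'])

# All rows for one outer-level label
sales_2020 = sales.loc[2020]

# One cell, addressed by a (Year, Quarter) tuple
q2_2021 = sales.loc[(2021, 'Q2'), 'Sales']

# Cross-section on the inner level: Q1 across all years
q1_all_years = sales.xs('Q1', level='Quarter')

print(sales_2020)
print(q2_2021)
print(q1_all_years)
```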


### 🔸 Summary Table: Indexing & Reindexing

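| Method           | Use Case                            | Modifies in place            | Returns                                 |
| ---------------- | ----------------------------------- | ---------------------------- | --------------------------------------- |
| `set_index()`    | Set one/more columns as index       | Only with `inplace=True`     | New DataFrame (`None` if `inplace=True`) |
| `reset_index()`  | Convert index back to column(s)     | Only with `inplace=True`     | New DataFrame (`None` if `inplace=True`) |
| `reindex()`      | Change index (add/remove/reorder)   | No (no `inplace` parameter)  | New DataFrame                           |
| `reindex_like()` | Match another DataFrame's structure | No (no `inplace` parameter)  | New DataFrame                           |
| `df.index = ...` | Manual index assignment             | Always                       | Nothing (it is an assignment)           |
| MultiIndex       | Nested/grouped datasets             | Same as `set_index([...])`   | Same as `set_index([...])`              |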

---

## ✅ Best Practices

| Situation                            | Recommendation                      |
| ------------------------------------ | ----------------------------------- |
| Working with unique keys (e.g., IDs) | Use `set_index()`                   |
| Exporting or merging                 | Use `reset_index()`                 |
| Aligning data                        | Use `reindex()` or `reindex_like()` |
| Grouped data                         | Use MultiIndex                      |
| Avoiding errors                      | Always double-check index alignment |
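One way to double-check alignment before combining two frames is sketched below. `left` and `right` are hypothetical; `is_unique`, `equals()`, and `difference()` are standard `Index` attributes and methods.

```python
import pandas as pd

left = pd.DataFrame({'A': [1, 2, 3]}, index=[101, 102, 103])
right = pd.DataFrame({'B': [4, 5, 6]}, index=[101, 103, 104])

# Duplicated labels make label-based lookups ambiguous
keys_are_unique = left.index.is_unique

# True only if both indexes hold the same labels in the same order
same_index = left.index.equals(right.index)

# Labels present on one side only
only_in_left = left.index.difference(right.index)
only_in_right = right.index.difference(left.index)

print(keys_are_unique, same_index)
print(list(only_in_left), list(only_in_right))
```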
