## Reindexing
- Reindexing means creating a new object with the values rearranged to align with the new index.

---

### **Key Points**
1. **Reindexing Rows and Columns**:
   - The `reindex()` method is used to conform a DataFrame or Series to a new index, with optional filling of missing values.

2. **New Labels**:
   - When the new labels are not present in the original data, `NaN` values are introduced by default.

3. **Key Parameters**:
   - **`index`**: New labels for rows.
   - **`columns`**: New labels for columns (applicable to DataFrames).
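   - **`method`**: Filling method for missing values (`'ffill'`/`'pad'`, `'bfill'`/`'backfill'`, `'nearest'`); only applicable when the index is monotonically increasing or decreasing.
   - **`fill_value`**: Value to use for missing entries (default is `NaN`).
   - **`copy`**: Whether to return a new object even when the new index equals the current one (historically `True` by default; recent pandas versions with Copy-on-Write treat this keyword differently). `reindex()` never modifies the caller in place.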

---

### **Syntax**
```python
# simplified signature; fill_value defaults to NaN, not None
DataFrame.reindex(index=None, columns=None, method=None, fill_value=np.nan, copy=True)
```
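
To make the `fill_value` parameter concrete, here is a minimal sketch. The Series `s` and its labels are hypothetical sample data, not taken from these notes.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Labels missing from the original index get NaN by default,
# which also upcasts the integer data to float.
default_filled = s.reindex(["a", "b", "c", "d"])

# fill_value replaces the NaN placeholder for newly introduced labels only.
zero_filled = s.reindex(["a", "b", "c", "d"], fill_value=0)

print(default_filled)
print(zero_filled)
```

Note that `fill_value` only affects labels introduced by the reindex; it does not touch `NaN` values that were already present in the data.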

---


### **Use Cases of Reindexing**
1. **Aligning Data**: Align data from different sources to a common index.
2. **Adding New Indices**: Add new rows/columns for future data.
3. **Rearranging**: Rearrange rows/columns in a specific order.
4. **Missing Data Handling**: Specify how to handle missing data with fill methods.

Reindexing is a powerful feature in Pandas for data alignment and reshaping.
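
As a rough illustration of the "Aligning Data" use case, the sketch below conforms two Series to a common index before adding them. The names `sales_a`, `sales_b`, and `common`, and all values, are made up for this example.

```python
import pandas as pd

# Hypothetical readings from two sources with partly different labels.
sales_a = pd.Series([100, 150, 120], index=["Mon", "Tue", "Wed"])
sales_b = pd.Series([80, 90], index=["Tue", "Thu"])

# A shared, explicitly ordered index that both Series are conformed to.
common = ["Mon", "Tue", "Wed", "Thu"]

aligned_a = sales_a.reindex(common, fill_value=0)
aligned_b = sales_b.reindex(common, fill_value=0)

# With identical indexes, element-wise arithmetic lines up label by label.
total = aligned_a + aligned_b
print(total)
```

Filling with `0` is a choice that suits additive data like this; for other data, leaving the gaps as `NaN` may be the more honest representation.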

In [1]:
import numpy as np 
import pandas as pd
from pandas import Series, DataFrame

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [2]:
obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj

d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64

In [9]:
# calling reindex rearranges the data according to the new index, introducing missing values for any index labels that were not already present
obj2 = obj.reindex(['a', 'c', 'd', 'b', 'e', 'f'])
obj2

a   -5.3
c    3.6
d    4.5
b    7.2
e    NaN
f    NaN
dtype: float64

In [15]:
# replace missing values with constants (here, one value per label)
obj2.fillna({'e': 34, 'f': 45})  # value must be a scalar, dict, or Series; a list is not accepted

a    -5.3
c     3.6
d     4.5
b     7.2
e    34.0
f    45.0
dtype: float64

In [23]:
# forward fill 
obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0,2,4])
obj3

obj3.reindex(np.arange(7), method='ffill')

0      blue
2    purple
4    yellow
dtype: object

0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
6    yellow
dtype: object

In [22]:
# backward fill
# (method='bfill')
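
The backward-fill cell above has no code. A minimal sketch of what it might contain, reusing the `obj3` values from the forward-fill cell:

```python
import numpy as np
import pandas as pd

# Same data as the forward-fill cell.
obj3 = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# method="bfill" takes the next valid observation for each new label.
# Labels 5 and 6 have no later observation, so they stay missing.
backfilled = obj3.reindex(np.arange(7), method="bfill")
print(backfilled)
```

Unlike the forward-fill result, the trailing labels remain missing here because there is nothing after label `4` to propagate backward.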

In [25]:
# DataFrame

frame = pd.DataFrame(np.arange(9).reshape((3,3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
frame

frame2 = frame.reindex(index=['a', 'b', 'c', 'd'])
frame2

   Ohio  Texas  California
a     0      1           2
c     3      4           5
d     6      7           8


   Ohio  Texas  California
a   0.0    1.0         2.0
b   NaN    NaN         NaN
c   3.0    4.0         5.0
d   6.0    7.0         8.0


In [27]:
# the columns can be reindexed with the columns keyword
states = ['Texas', 'Utah', 'California']
frame.reindex(columns=states)

# Output: since Ohio was not in states, the data for that column is dropped from the result;
# Utah is a new label, so its column is filled with NaN

   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8


In [28]:
# another way:
# pass the new axis labels as a positional argument and then specify the axis to reindex with the axis keyword

frame.reindex(states, axis='columns')

   Texas  Utah  California
a      1   NaN           2
c      4   NaN           5
d      7   NaN           8


In [30]:
# reindex by using loc operator
# only works if all the new index labels already exist in the DataFrame,
# (whereas reindex will insert missing data for new labels)

frame.loc[['a','d','c'], ['California', 'Texas']]

   California  Texas
a           2      1
d           8      7
c           5      4
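
To illustrate the difference noted in the comments above, this sketch requests a label (`'b'`) that does not exist in `frame`. It assumes a reasonably recent pandas release, where list selections with missing labels raise `KeyError`.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=["a", "c", "d"],
    columns=["Ohio", "Texas", "California"],
)

# reindex accepts the unknown label "b" and inserts a row of NaN for it.
with_b = frame.reindex(index=["a", "b", "c"])

# loc refuses list selections that contain labels missing from the index.
try:
    frame.loc[["a", "b", "c"]]
    loc_failed = False
except KeyError:
    loc_failed = True

print(with_b)
print("loc raised KeyError:", loc_failed)
```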





The table below lists the arguments of the `reindex()` method in pandas:

| **Argument**     | **Type**                  | **Description**                                                                                  |
|------------------|---------------------------|--------------------------------------------------------------------------------------------------|
| **`labels`**     | `array-like` or `None`    | New labels for the axis selected by `axis`.                                                      |
| **`index`**      | `array-like` or `None`    | New labels for the rows (axis 0).                                                                |
| **`columns`**    | `array-like` or `None`    | New labels for the columns (axis 1, DataFrame only).                                             |
| **`axis`**       | `int` or `str`            | Axis that `labels` applies to: `0` or `'index'` for rows, `1` or `'columns'` for columns.        |
| **`level`**      | `int` or `str`            | Broadcast across a level, matching index values on the passed MultiIndex level.                  |
| **`fill_value`** | `scalar`                  | Value to use for missing entries introduced by reindexing (default `NaN`).                       |
| **`method`**     | `{None, 'ffill'/'pad', 'bfill'/'backfill', 'nearest'}` | Filling method for gaps: forward fill, backward fill, or nearest label. Only applicable when the index is monotonically increasing or decreasing. |
| **`limit`**      | `int`                     | Maximum number of consecutive elements to forward or backward fill.                              |
| **`tolerance`**  | scalar or `list-like`     | Maximum distance between original and new labels for inexact matches during reindexing.          |
| **`copy`**       | `bool`                    | Whether to return a new object even if the new index equals the current one. `reindex()` never modifies the caller in place. |
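
The `limit` and `tolerance` rows are easier to follow with a small example. The sketch below uses hypothetical data (observations at labels `0` and `10`) and should be read as an illustration under those assumptions rather than a full treatment.

```python
import pandas as pd

# Hypothetical sparse observations at labels 0 and 10.
s = pd.Series([1.0, 2.0], index=[0, 10])
new_index = [0, 1, 2, 3, 9, 10]

# limit caps how many consecutive new labels a fill may cover:
# labels 1 and 2 are forward-filled from 0, while 3 and 9 stay NaN.
limited = s.reindex(new_index, method="ffill", limit=2)

# tolerance restricts inexact matches to labels within the given distance:
# 1 is within 1 of label 0 and 9 is within 1 of label 10; 2 and 3 are not.
nearest = s.reindex(new_index, method="nearest", tolerance=1)

print(limited)
print(nearest)
```

`limit` counts filled labels in the new index, whereas `tolerance` measures distance between label values, so the two can give different results on the same data.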

---

### **Most Important Arguments**
1. **`index`**: Defines new row labels.
2. **`columns`**: Defines new column labels.
3. **`fill_value`**: Specifies a value to fill in for missing data.
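4. **`method`**: Determines how missing values are filled (`'ffill'`, `'bfill'`, `'nearest'`).
5. **`level`**: Matches labels against a specific level of a MultiIndex.
6. **`copy`**: Whether to return a new object even if the new index equals the current one; `reindex()` does not modify the caller in place.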
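
A quick check of what `reindex()` does to the object it is called on, using the same `obj` values as the earlier cells. The variable name `reordered` is introduced here for illustration.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=["d", "b", "a", "c"])

# reindex returns a new Series; obj keeps its original order.
reordered = obj.reindex(["a", "b", "c", "d"])

# Changing the result does not write through to the original data.
reordered["a"] = 0.0

print(obj)
print(reordered)
```

To keep the reindexed result, assign it to a name, as done with `obj2` and `frame2` in the cells above.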

---
