- **resampling** refers to the process of converting a time series from one frequency to another.
- aggregating higher frequency data to lower frequency is called **downsampling**, while converting lower frequency to higher frequency is called **upsampling**.
- not all resampling falls into either of these categories; for example, converting W-WED (weekly on Wednesday) to W-FRI is neither upsampling nor downsampling.
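
As a rough illustration of the last point, the sketch below re-bins a weekly series stamped on Wednesdays into weeks ending on Friday. The series name and values are made up for the example; the number of observations stays the same and only the labels move.

```python
import numpy as np
import pandas as pd

# Four weekly observations stamped on Wednesdays (2000-01-05 is a Wednesday).
wed_index = pd.date_range("2000-01-05", periods=4, freq="W-WED")
weekly_wed = pd.Series(np.arange(4), index=wed_index)

# Re-bin into weeks ending on Friday. Each bin holds exactly one observation,
# so taking the last value in each bin simply relabels it.
weekly_fri = weekly_wed.resample("W-FRI").last()

print(weekly_fri)
```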

- pandas objects are equipped with a `resample` method, which is the workhorse function for all frequency conversion.
- it has a similar API to groupby; you call resample to group the data, then call an aggregation function.
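
To make the groupby analogy concrete, here is a minimal sketch (with an invented twelve-minute series) that computes the same five-minute sums twice: once through `resample` and once through `groupby` with a `pd.Grouper`. It is meant only to show that the two calls share the "split into time bins, then aggregate" shape.

```python
import numpy as np
import pandas as pd

minutes = pd.date_range("2000-01-01", periods=12, freq="min")
minute_ts = pd.Series(np.arange(12), index=minutes)

# resample: bin by time, then aggregate
via_resample = minute_ts.resample("5min").sum()

# groupby with a time-based Grouper: same bins, same aggregation
via_groupby = minute_ts.groupby(pd.Grouper(freq="5min")).sum()

print(via_resample)
print(via_groupby)
```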

In [12]:
import numpy as np 
import pandas as pd 

In [13]:
dates = pd.date_range("2000-01-01", periods=100)
ts = pd.Series(np.random.standard_normal(len(dates)), index=dates)
ts

2000-01-01    0.395299
2000-01-02    0.597467
2000-01-03    0.341001
2000-01-04    2.312406
2000-01-05   -0.741688
                ...   
2000-04-05    1.643592
2000-04-06   -0.280640
2000-04-07   -0.735129
2000-04-08   -1.035427
2000-04-09    0.763317
Freq: D, Length: 100, dtype: float64

In [14]:
ts.resample("ME").mean()

2000-01-31    0.108772
2000-02-29   -0.103421
2000-03-31   -0.083714
2000-04-30   -0.034421
Freq: ME, dtype: float64

In [15]:
ts.resample("ME", kind="period").mean()

2000-01    0.108772
2000-02   -0.103421
2000-03   -0.083714
2000-04   -0.034421
Freq: M, dtype: float64

`resample` is a flexible method that can be used to process large time series.


#### **Key Arguments in `resample()`**

| Argument     | Description                                                                                   |
|--------------|-----------------------------------------------------------------------------------------------|
| `rule`       | String or offset giving the **resampling frequency** (e.g., `"ME"` for month end, `"15min"` for 15 minutes). |
| `how`        | *Removed.* Call an aggregation method such as `.mean()` or `.sum()` directly after `resample()`. |
| `axis`       | Axis to resample (default is `0`, which means rows).                                          |
| `on`         | Column to use instead of the index for resampling (useful for DataFrames without a DatetimeIndex). |
| `level`      | Use a specific level of a MultiIndex (if applicable) for resampling.                          |
| `label`      | Whether to label bins with the `'right'` or `'left'` edge. The default is `'left'`, except for month-end, quarter-end, year-end, and weekly frequencies, which default to `'right'`. |
| `closed`     | Which edge of each time bin is inclusive, `'right'` or `'left'`. The defaults follow the same rule as `label`. |
| `loffset`    | *Deprecated.* Shift the labels by adding an offset to the result's index instead.             |
| `kind`       | Return the result with a `'timestamp'` or `'period'` index. Deprecated in recent pandas versions. |
| `convention` | When resampling periods, whether to place values at the `'start'` or `'end'` of the period.   |
| `base`       | *Deprecated.* Use `origin` or `offset` instead.                                               |
| `fill_method`| *Removed.* To fill values when upsampling, call `.ffill()` or `.bfill()` after `resample()`.  |
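
A small sketch of how `on`, `closed`, and `label` interact, using a made-up `events` frame whose timestamps live in a column rather than the index. The column names are assumptions for the example only.

```python
import numpy as np
import pandas as pd

events = pd.DataFrame({
    "time": pd.date_range("2017-05-20 00:00", periods=6, freq="min"),
    "value": np.arange(6),
})

# Resample on the "time" column instead of the index.
# For a minute-based rule the defaults are closed="left", label="left".
left_bins = events.resample("5min", on="time")["value"].sum()

# Same data, but each bin now includes its right edge and is labeled by it.
right_bins = events.resample(
    "5min", on="time", closed="right", label="right"
)["value"].sum()

print(left_bins)
print(right_bins)
```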


---- 

## [ Downsampling ]

#### What is Downsampling?

**Downsampling** means converting **high-frequency data** (e.g., daily data) into **low-frequency** (e.g., monthly). You're summarizing data into **larger time chunks** — like averages per month, week, etc.

#### Key Points

1. **You don’t need perfect time intervals.**  
   Even if your original data isn’t perfectly regular, pandas can **cut it into bins** of your desired frequency (like months or weeks).

2. **Time is sliced into bins (chunks)** based on the frequency you choose.  
   For monthly data, pandas divides the time into 1-month blocks.

3. **Bins are half-open intervals**  
   One edge of each bin is inclusive and the other is exclusive, so each data point belongs to **exactly one bin**.  
   (For example, with monthly bins, 2023-01-31 belongs to **January**, not February.)

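4. **Defaults can be tricky:**  
   By default:
   - Month-end, quarter-end, year-end, and weekly frequencies (e.g. `"ME"`, `"QE"`, `"YE"`, `"W"`; spelled `"M"`, `"Q"`, `"A"` in older pandas) are **closed on the right** (the bin includes its last timestamp).  
   - Most other frequencies are **closed on the left** (the bin includes its first timestamp).
   - This affects which interval a point sitting exactly on a bin edge falls into.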

5. **Labeling matters:**  
   You can choose to label each bin with:
   - The **start** of the interval (e.g., `2023-01-01`)
   - Or the **end** of the interval (e.g., `2023-01-31`)
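
The points above can be sketched with a tiny, irregularly spaced series (the timestamps and values are invented). One observation sits exactly on a bin edge, so switching `closed` moves it from one bin to the other.

```python
import pandas as pd

irregular = pd.Series(
    [1, 2, 4, 8],
    index=pd.to_datetime([
        "2000-01-01 00:00:30",
        "2000-01-01 00:03:10",
        "2000-01-01 00:05:00",  # exactly on a bin edge
        "2000-01-01 00:12:45",
    ]),
)

# Default for "5min": bins are [start, end), so 00:05:00 opens the second bin.
left_closed = irregular.resample("5min").sum()

# closed="right": bins are (start, end], so 00:05:00 closes the first bin.
right_closed = irregular.resample("5min", closed="right").sum()

print(left_closed)
print(right_closed)
```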


In [16]:
dates = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(len(dates)), index=dates)
ts

2000-01-01 00:00:00     0
2000-01-01 00:01:00     1
2000-01-01 00:02:00     2
2000-01-01 00:03:00     3
2000-01-01 00:04:00     4
2000-01-01 00:05:00     5
2000-01-01 00:06:00     6
2000-01-01 00:07:00     7
2000-01-01 00:08:00     8
2000-01-01 00:09:00     9
2000-01-01 00:10:00    10
2000-01-01 00:11:00    11
Freq: min, dtype: int64

In [17]:
# suppose you wanted to aggregate this data into five-minute chunks or bars by taking the sum of each group

ts.resample("5min").sum()

# the frequency you pass defines bin edges in five-minute increments
# for this frequency, the left bin edge is inclusive by default, so the 00:00 value is included in the 00:00 to 00:05 interval, and the 00:05 value is excluded from that interval

2000-01-01 00:00:00    10
2000-01-01 00:05:00    35
2000-01-01 00:10:00    21
Freq: 5min, dtype: int64

In [18]:
ts.resample("5min", closed="right").sum()

1999-12-31 23:55:00     0
2000-01-01 00:00:00    15
2000-01-01 00:05:00    40
2000-01-01 00:10:00    11
Freq: 5min, dtype: int64

In [19]:
# the resulting time series is labeled by the timestamps from the left side of each bin
# by passing label="right" you can label them with the right bin edge

ts.resample("5min", closed="right", label="right").sum()

2000-01-01 00:00:00     0
2000-01-01 00:05:00    15
2000-01-01 00:10:00    40
2000-01-01 00:15:00    11
Freq: 5min, dtype: int64

In [20]:
# you might want to shift the result index by some amount, say subtracting one second from the right edge, to make it clearer which interval each timestamp refers to
# to do this, add an offset to the resulting index

from pandas.tseries.frequencies import to_offset

result = ts.resample("5min", closed="right", label="right").sum()
result.index = result.index + to_offset("-1s")
result

1999-12-31 23:59:59     0
2000-01-01 00:04:59    15
2000-01-01 00:09:59    40
2000-01-01 00:14:59    11
Freq: 5min, dtype: int64
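
The same shift can also be written without `to_offset`, by subtracting a `pd.Timedelta` from the index. This is only an alternative spelling of the step above, rebuilt here on the same twelve-minute series so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=12, freq="min")
ts = pd.Series(np.arange(len(dates)), index=dates)

result = ts.resample("5min", closed="right", label="right").sum()

# Subtracting a Timedelta moves every label back by one second.
shifted = result.copy()
shifted.index = result.index - pd.Timedelta(seconds=1)

print(shifted)
```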

Open-high-low-close (OHLC) resampling

#### **What is OHLC Resampling?**

**OHLC** stands for:

- **Open** → First value in the time interval  
- **High** → Maximum value in the interval  
- **Low** → Minimum value in the interval  
- **Close** → Last value in the interval

#### Why is it used?

In **finance**, especially for stock data, it's common to look at these four values to understand price movement over a time window (e.g., per minute, per hour, per day, etc.).


In [23]:
ts = pd.Series(np.random.permutation(np.arange(len(dates))), index=dates)

ts.resample("5min").ohlc()

Unnamed: 0,open,high,low,close
2000-01-01 00:00:00,2,8,2,8
2000-01-01 00:05:00,5,11,0,9
2000-01-01 00:10:00,1,10,1,10
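
To connect `ohlc()` back to the four definitions above, the sketch below builds the same bars by hand with `first`, `max`, `min`, and `last`. The price values are fixed (not random) so the result is reproducible; they are invented for the example.

```python
import pandas as pd

prices = pd.Series(
    [2, 8, 5, 3, 9, 1, 7, 4, 6, 0, 11, 10],
    index=pd.date_range("2000-01-01", periods=12, freq="min"),
)

# Built-in OHLC aggregation
bars = prices.resample("5min").ohlc()

# The same four statistics computed explicitly, then renamed to match
manual = prices.resample("5min").agg(["first", "max", "min", "last"])
manual.columns = ["open", "high", "low", "close"]

print(bars)
print(manual)
```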


## [ Upsampling and Interpolation ]

In [24]:
# upsampling is converting from a lower frequency to a higher frequency, where no aggregation is needed.

frame = pd.DataFrame(np.random.standard_normal((2, 4)),
                     index=pd.date_range("2000-01-01", periods=2,
                     freq="W-WED"),
                     columns=["Colorado", "Texas", "New York", "Ohio"])
frame

Unnamed: 0,Colorado,Texas,New York,Ohio
2000-01-05,-0.61046,1.186832,-0.778359,-0.186423
2000-01-12,-1.183543,0.773072,-0.876089,0.801438


In [25]:
# when you aggregate this data at daily frequency, each group holds at most one value, and days with no observation show up as gaps (missing values).
# we use the asfreq method to convert to the higher frequency without any aggregation

df_daily = frame.resample("D").asfreq()
df_daily

Unnamed: 0,Colorado,Texas,New York,Ohio
2000-01-05,-0.61046,1.186832,-0.778359,-0.186423
2000-01-06,,,,
2000-01-07,,,,
2000-01-08,,,,
2000-01-09,,,,
2000-01-10,,,,
2000-01-11,,,,
2000-01-12,-1.183543,0.773072,-0.876089,0.801438


In [26]:
# suppose you wanted to fill forward each weekly value on the non-Wednesdays.
# the same filling or interpolation methods available in the fillna and reindex methods are available for resampling

frame.resample("D").ffill()

Unnamed: 0,Colorado,Texas,New York,Ohio
2000-01-05,-0.61046,1.186832,-0.778359,-0.186423
2000-01-06,-0.61046,1.186832,-0.778359,-0.186423
2000-01-07,-0.61046,1.186832,-0.778359,-0.186423
2000-01-08,-0.61046,1.186832,-0.778359,-0.186423
2000-01-09,-0.61046,1.186832,-0.778359,-0.186423
2000-01-10,-0.61046,1.186832,-0.778359,-0.186423
2000-01-11,-0.61046,1.186832,-0.778359,-0.186423
2000-01-12,-1.183543,0.773072,-0.876089,0.801438


In [27]:
# you can similarly choose to only fill a certain number of periods forward to limit how far to continue using an observed value

frame.resample("D").ffill(limit=2)

Unnamed: 0,Colorado,Texas,New York,Ohio
2000-01-05,-0.61046,1.186832,-0.778359,-0.186423
2000-01-06,-0.61046,1.186832,-0.778359,-0.186423
2000-01-07,-0.61046,1.186832,-0.778359,-0.186423
2000-01-08,,,,
2000-01-09,,,,
2000-01-10,,,,
2000-01-11,,,,
2000-01-12,-1.183543,0.773072,-0.876089,0.801438


In [28]:
# the new date index need not coincide with the old one at all
frame.resample("W-THU").ffill()

Unnamed: 0,Colorado,Texas,New York,Ohio
2000-01-06,-0.61046,1.186832,-0.778359,-0.186423
2000-01-13,-1.183543,0.773072,-0.876089,0.801438
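
Besides forward filling, the resampler also has an `interpolate` method. A minimal sketch with two invented weekly values, chosen so that linear interpolation gives whole numbers:

```python
import numpy as np
import pandas as pd

weekly = pd.Series(
    [0.0, 7.0],
    index=pd.date_range("2000-01-05", periods=2, freq="W-WED"),
)

# Upsample to daily frequency and fill the gaps by linear interpolation
# between the two observed Wednesdays.
daily_linear = weekly.resample("D").interpolate()

print(daily_linear)
```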


## [ Resampling with Periods ]

In [31]:
# Resampling data indexed by periods is similar to timestamps:
frame = pd.DataFrame(np.random.standard_normal((24, 4)),
                     index=pd.period_range("1-2000", "12-2001",
                     freq="M"),
                     columns=["Colorado", "Texas", "New York", "Ohio"])
frame

Unnamed: 0,Colorado,Texas,New York,Ohio
2000-01,0.710167,-0.578509,-1.088563,0.15906
2000-02,-0.673981,-0.312867,-1.127645,1.304953
2000-03,0.579929,-0.118643,-1.481204,-0.152895
2000-04,-0.7517,-0.27853,-0.331975,-2.855526
2000-05,0.108504,-0.146423,-0.524415,0.440979
2000-06,0.787169,-1.957123,-0.264088,-2.033483
2000-07,0.573226,-0.508758,-1.552187,-1.689251
2000-08,0.889459,1.436143,0.713995,0.504853
2000-09,-0.503482,-0.251782,-1.353599,0.299712
2000-10,-2.040798,-0.018108,2.18643,-0.385476


In [33]:
annual_frame = frame.resample("Y-DEC").mean()
annual_frame

Unnamed: 0,Colorado,Texas,New York,Ohio
2000,-0.007336,-0.2643,-0.225689,-0.457763
2001,-0.177954,-0.144177,-0.452602,-0.017718


In [34]:
# Upsampling is more nuanced, as before resampling you must make a decision about which end of the time span in the new frequency to place the values. The convention argument defaults to "start" but can also be "end"

# Q-DEC: Quarterly, year ending in December

annual_frame.resample("Q-DEC").ffill()

Unnamed: 0,Colorado,Texas,New York,Ohio
2000Q1,-0.007336,-0.2643,-0.225689,-0.457763
2000Q2,-0.007336,-0.2643,-0.225689,-0.457763
2000Q3,-0.007336,-0.2643,-0.225689,-0.457763
2000Q4,-0.007336,-0.2643,-0.225689,-0.457763
2001Q1,-0.177954,-0.144177,-0.452602,-0.017718
2001Q2,-0.177954,-0.144177,-0.452602,-0.017718
2001Q3,-0.177954,-0.144177,-0.452602,-0.017718
2001Q4,-0.177954,-0.144177,-0.452602,-0.017718


In [35]:
annual_frame.resample("Q-DEC", convention="end").asfreq()

Unnamed: 0,Colorado,Texas,New York,Ohio
2000Q4,-0.007336,-0.2643,-0.225689,-0.457763
2001Q1,,,,
2001Q2,,,,
2001Q3,,,,
2001Q4,-0.177954,-0.144177,-0.452602,-0.017718
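
The start/end choice can be seen in isolation with `Period.asfreq`, whose `how` argument plays a role analogous to `convention`. This sketch uses a quarterly period converted to monthly (rather than the annual-to-quarterly case above) purely to keep the example small; it illustrates the idea and is not the resample call itself.

```python
import pandas as pd

quarter = pd.Period("2000Q1", freq="Q-DEC")

# Place the quarter at its first month or at its last month.
first_month = quarter.asfreq("M", how="start")
last_month = quarter.asfreq("M", how="end")

print(first_month, last_month)
```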


Let's break down the concepts of **upsampling** and **downsampling** with `PeriodIndex` in pandas, particularly the rules about **target frequencies**.

---

### 🕰️ **PeriodIndex vs DatetimeIndex**

- **`PeriodIndex`**: Represents **time spans**. For example, a period can represent a month (e.g., "January 2024") or a year (e.g., "2024").
- **`DatetimeIndex`**: Represents **specific points in time** (e.g., "2024-01-01 00:00:00").

---

### 📉 **Downsampling:**

In **downsampling**, you want to **reduce the frequency** of your data (e.g., converting monthly periods to quarterly or annual periods).

- **Rule**: The target frequency must be a **superperiod** of the source frequency.
  - **Superperiod** means a longer time span that fully contains whole periods of the source frequency.
  - **Example**: If your source data is at a **daily** frequency (`"D"`), you can downsample to monthly (`"M"`), quarterly (`"Q"`), or yearly (`"Y"`, formerly `"A"`), because every day falls entirely inside one month, one quarter, and one year.

**Example**:

- Data at **daily frequency** → Can downsample to **monthly** (`"M"`) or **yearly** (`"Y"`) frequencies.
  
```python
# Example: Downsample daily data to monthly
# (assumes `ts` is indexed by daily periods)
ts.resample('M').mean()  # valid: daily → monthly
```

---

### 📈 **Upsampling:**

In **upsampling**, you want to **increase the frequency** of your data (e.g., converting monthly periods to daily periods).

- **Rule**: The target frequency must be a **subperiod** of the source frequency.
  - **Subperiod** means a shorter time span that fits entirely inside one period of the source frequency.
  - **Example**: If your source data is at a **monthly** frequency (`"M"`), you can upsample to **daily** (`"D"`), **hourly** (`"h"`), or minute-level (`"min"`) periods, because each of those fits inside a single month.

**Example**:

- Data at **monthly frequency** → Can upsample to **daily** (`"D"`) or **hourly** (`"h"`) frequencies.
  
```python
# Example: Upsample monthly data to daily
# (assumes `ts` is indexed by monthly periods)
ts.resample('D').ffill()  # valid: monthly → daily
```

---

### 🚫 **Invalid Operations:**

The problem cases are frequencies whose spans **do not nest** inside each other, so neither one is a subperiod or superperiod of the other.

- **Weekly vs. monthly**: A week can straddle a month boundary, so weekly periods (`"W"`) are not subperiods of monthly periods (`"M"`).
  
```python
# Not a valid period conversion (assumes `ts` is indexed by weekly periods)
ts.resample('M').mean()  # weekly → monthly: weeks do not nest inside months
```

- **Misaligned quarters and years**: Quarters defined by `Q-JAN` end in January, April, July, and October, so one quarter straddles the end of a December-ending year (`Y-DEC`).
  
```python
# Not a valid period conversion (assumes `ts` is indexed by Y-DEC annual periods)
ts.resample('Q-JAN').mean()  # annual (Dec year end) → Q-JAN: quarters do not nest inside the year
```

---

### 🚀 **In summary**:

- **Downsampling**: The target frequency must be a **superperiod** of the source (e.g., daily → monthly).
- **Upsampling**: The target frequency must be a **subperiod** of the source (e.g., monthly → daily).
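
The nesting idea can be checked by hand. In the sketch below, `spans_nest` is a hypothetical helper written for this note (it is not part of pandas): it maps the first and last day of every source period to the target frequency and tests whether both land in the same target period. The second half shows one way to downsample period data with plain `groupby`, assuming monthly values 1 through 12.

```python
import numpy as np
import pandas as pd

def spans_nest(index, target):
    """Hypothetical helper: True if every period in `index` lies
    entirely inside a single period of frequency `target`."""
    starts = index.asfreq("D", how="start").asfreq(target)
    ends = index.asfreq("D", how="end").asfreq(target)
    return bool((starts == ends).all())

months = pd.period_range("2000-01", periods=12, freq="M")
weeks = pd.period_range("2000-01-03", periods=8, freq="W-SUN")

months_in_quarters = spans_nest(months, "Q-DEC")  # months fit inside quarters
weeks_in_months = spans_nest(weeks, "M")          # some weeks straddle months

# Downsample monthly periods to quarterly by grouping on the quarter
# that each month belongs to.
monthly = pd.Series(np.arange(1.0, 13.0), index=months)
quarterly_mean = monthly.groupby(monthly.index.asfreq("Q-DEC")).mean()

print(months_in_quarters, weeks_in_months)
print(quarterly_mean)
```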



In [36]:
# if these rules are not satisfied, an exception will be raised. This mainly affects the quarterly, annual, and weekly frequencies;
# for example, the time spans defined by Q-MAR only line up with A-MAR, A-JUN, A-SEP, and A-DEC
# annual_frame has a December year end, which lines up with Q-MAR, so the call below works


annual_frame.resample("Q-MAR").ffill()

Unnamed: 0,Colorado,Texas,New York,Ohio
2000Q4,-0.007336,-0.2643,-0.225689,-0.457763
2001Q1,-0.007336,-0.2643,-0.225689,-0.457763
2001Q2,-0.007336,-0.2643,-0.225689,-0.457763
2001Q3,-0.007336,-0.2643,-0.225689,-0.457763
2001Q4,-0.177954,-0.144177,-0.452602,-0.017718
2002Q1,-0.177954,-0.144177,-0.452602,-0.017718
2002Q2,-0.177954,-0.144177,-0.452602,-0.017718
2002Q3,-0.177954,-0.144177,-0.452602,-0.017718


## [ Group Time Resampling ]

In [37]:
# For time series data, the resample method is semantically a group operation based on a time intervalization. Here’s a small example table

N = 15
times = pd.date_range("2017-05-20 00:00", freq="1min", periods=N)
df = pd.DataFrame({"time": times, "value": np.arange(N)})
df

Unnamed: 0,time,value
0,2017-05-20 00:00:00,0
1,2017-05-20 00:01:00,1
2,2017-05-20 00:02:00,2
3,2017-05-20 00:03:00,3
4,2017-05-20 00:04:00,4
5,2017-05-20 00:05:00,5
6,2017-05-20 00:06:00,6
7,2017-05-20 00:07:00,7
8,2017-05-20 00:08:00,8
9,2017-05-20 00:09:00,9


In [38]:
# here we can index by "time" and then resample
df.set_index("time").resample("5min").count()

Unnamed: 0_level_0,value
time,Unnamed: 1_level_1
2017-05-20 00:00:00,5
2017-05-20 00:05:00,5
2017-05-20 00:10:00,5


In [39]:
# suppose that a DataFrame contains multiple time series, marked by an additional group key column
df2 = pd.DataFrame({"time": times.repeat(3),
                    "key": np.tile(["a", "b", "c"], N),
                    "value": np.arange(N * 3.)})
df2

Unnamed: 0,time,key,value
0,2017-05-20 00:00:00,a,0.0
1,2017-05-20 00:00:00,b,1.0
2,2017-05-20 00:00:00,c,2.0
3,2017-05-20 00:01:00,a,3.0
4,2017-05-20 00:01:00,b,4.0
5,2017-05-20 00:01:00,c,5.0
6,2017-05-20 00:02:00,a,6.0
7,2017-05-20 00:02:00,b,7.0
8,2017-05-20 00:02:00,c,8.0
9,2017-05-20 00:03:00,a,9.0


In [40]:
# to do the same resampling for each value of "key", we introduce the pandas.Grouper object

time_key = pd.Grouper(freq="5min")

# we can then set the time index, group by "key" and time_key, and aggregate
resampled = (df2.set_index("time")
             .groupby(["key", time_key])
             .sum())
resampled

Unnamed: 0_level_0,Unnamed: 1_level_0,value
key,time,Unnamed: 2_level_1
a,2017-05-20 00:00:00,30.0
a,2017-05-20 00:05:00,105.0
a,2017-05-20 00:10:00,180.0
b,2017-05-20 00:00:00,35.0
b,2017-05-20 00:05:00,110.0
b,2017-05-20 00:10:00,185.0
c,2017-05-20 00:00:00,40.0
c,2017-05-20 00:05:00,115.0
c,2017-05-20 00:10:00,190.0


In [41]:
resampled.reset_index()

Unnamed: 0,key,time,value
0,a,2017-05-20 00:00:00,30.0
1,a,2017-05-20 00:05:00,105.0
2,a,2017-05-20 00:10:00,180.0
3,b,2017-05-20 00:00:00,35.0
4,b,2017-05-20 00:05:00,110.0
5,b,2017-05-20 00:10:00,185.0
6,c,2017-05-20 00:00:00,40.0
7,c,2017-05-20 00:05:00,115.0
8,c,2017-05-20 00:10:00,190.0


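One constraint of using `pandas.Grouper` with only `freq` is that the time must be the index of the Series or DataFrame; when the timestamps live in a column, `Grouper` also takes a `key` argument naming that column.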
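
If setting the index is inconvenient, `pd.Grouper` also accepts a `key` argument naming the datetime column to bin on. A minimal sketch that rebuilds the same `df2` so it runs on its own:

```python
import numpy as np
import pandas as pd

N = 15
times = pd.date_range("2017-05-20 00:00", freq="1min", periods=N)
df2 = pd.DataFrame({"time": times.repeat(3),
                    "key": np.tile(["a", "b", "c"], N),
                    "value": np.arange(N * 3.)})

# Bin on the "time" column directly instead of calling set_index first.
by_key_and_time = (df2
                   .groupby(["key", pd.Grouper(key="time", freq="5min")])
                   .sum())

print(by_key_and_time)
```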