## [ Date Ranges, Frequencies, and Shifting ]
- In pandas, a **generic time series** is one where the time points (timestamps) **do not follow a regular pattern**.  
  → For example, data collected on random days.

- This is **okay for many cases**, like when data is naturally irregular (e.g., manual logs, alerts, etc.).

- But **sometimes we want to work with a fixed rhythm**, like:
  - **daily**
  - **monthly**
  - **every 15 minutes**, etc.

- Even if that means **some time points will be missing**, it’s still useful for:
  - consistent analysis
  - plotting
  - comparing across equal time intervals

- **Good news:**  
  Pandas provides **many built-in frequencies** and tools (like `.resample()`, `.asfreq()`) to **convert** or **reorganize** your time series into fixed intervals.


#### Bottom Line:
- Generic time series = irregular dates.
- Fixed-frequency = regular intervals (daily, monthly, etc.).
- Pandas can **convert between them** using powerful tools like `resample()`.

---

#### Generic Time Series in Pandas 
- In pandas, a generic time series is simply a Series or DataFrame that uses date/time values as its index but is not tied to any specific frequency (daily, weekly, monthly, etc.).
- It's just time-stamped data, ordered by time, but not necessarily at fixed intervals.

#### Contrast With Fixed-Frequency Time Series
- Fixed frequency: data collected every day, every hour, etc. (`freq='D'`, `freq='h'`)
- Generic: Dates can be random or unevenly spaced

#### Summary
- A generic time series is any time-indexed data in pandas that has no fixed frequency attached
- Useful when data is recorded irregularly (e.g., sensor readings, manual logs)
- You can still sort, slice, and analyze it like regular time series

In [56]:
import numpy as np 
import pandas as pd 
from datetime import datetime 

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
         datetime(2011, 1, 7), datetime(2011, 1, 8),
         datetime(2011, 1, 10), datetime(2011, 1, 12)]

ts = pd.Series(np.random.standard_normal(6), index=dates)
ts

2011-01-02   -1.319853
2011-01-05    1.093538
2011-01-07    0.200472
2011-01-08   -1.563153
2011-01-10    0.471333
2011-01-12   -0.841870
dtype: float64
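One way to confirm that an index like this is irregular is to inspect its `freq` attribute or ask pandas to infer a frequency. The sketch below rebuilds the same six dates with placeholder values; the names `irregular` and `regular` are illustrative and do not appear in the original cell.

```python
import pandas as pd

# Same six unevenly spaced dates as above; the values are placeholders.
dates = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                        "2011-01-08", "2011-01-10", "2011-01-12"])
irregular = pd.Series(range(6), index=dates)

# A fixed-frequency index of the same length, for comparison.
regular = pd.Series(range(6),
                    index=pd.date_range("2011-01-02", periods=6, freq="D"))

print(irregular.index.freq)            # no frequency is attached
print(pd.infer_freq(irregular.index))  # no regular pattern can be inferred
print(regular.index.freqstr)           # the daily alias
```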

In [57]:
# We can convert the sample time series to a fixed daily frequency by calling `resample`.
# The string "D" is interpreted as daily frequency.
resampler = ts.resample("D")
resampler

# Conversion between frequencies (resampling) is a big enough topic to have its own section later.
# First we need to learn how to use the base frequencies and multiples thereof.

<pandas.core.resample.DatetimeIndexResampler object at 0x7523b36f5810>
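The resampler on its own holds no reindexed data until a method is called on it. As a small sketch (using placeholder values instead of the random ones above, and the illustrative names `ts_demo` and `daily`), calling `asfreq()` materializes the daily index and leaves `NaN` on days that had no observation.

```python
import pandas as pd

dates = pd.to_datetime(["2011-01-02", "2011-01-05", "2011-01-07",
                        "2011-01-08", "2011-01-10", "2011-01-12"])
ts_demo = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=dates)

# Reindex onto every calendar day between the first and last timestamp.
daily = ts_demo.resample("D").asfreq()

print(daily)
print(len(daily), int(daily.isna().sum()))  # total days, days without data
```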

## [ Generating Date Ranges ]
`pandas.date_range` generates a `DatetimeIndex` of a given length according to a particular frequency.

In [58]:
index = pd.date_range("2025-04-01", "2025-06-01")
index

DatetimeIndex(['2025-04-01', '2025-04-02', '2025-04-03', '2025-04-04',
               '2025-04-05', '2025-04-06', '2025-04-07', '2025-04-08',
               '2025-04-09', '2025-04-10', '2025-04-11', '2025-04-12',
               '2025-04-13', '2025-04-14', '2025-04-15', '2025-04-16',
               '2025-04-17', '2025-04-18', '2025-04-19', '2025-04-20',
               '2025-04-21', '2025-04-22', '2025-04-23', '2025-04-24',
               '2025-04-25', '2025-04-26', '2025-04-27', '2025-04-28',
               '2025-04-29', '2025-04-30', '2025-05-01', '2025-05-02',
               '2025-05-03', '2025-05-04', '2025-05-05', '2025-05-06',
               '2025-05-07', '2025-05-08', '2025-05-09', '2025-05-10',
               '2025-05-11', '2025-05-12', '2025-05-13', '2025-05-14',
               '2025-05-15', '2025-05-16', '2025-05-17', '2025-05-18',
               '2025-05-19', '2025-05-20', '2025-05-21', '2025-05-22',
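               '2025-05-23', '2025-05-24', '2025-05-25', '2025-05-26',
               '2025-05-27', '2025-05-28', '2025-05-29', '2025-05-30',
               '2025-05-31', '2025-06-01'],
              dtype='datetime64[ns]', freq='D')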

In [59]:
# by default, pandas.date_range generates daily timestamps.
# if we pass only a start or end date, we must pass a number of periods to generate

pd.date_range(start="2025-04-01", periods=20)

DatetimeIndex(['2025-04-01', '2025-04-02', '2025-04-03', '2025-04-04',
               '2025-04-05', '2025-04-06', '2025-04-07', '2025-04-08',
               '2025-04-09', '2025-04-10', '2025-04-11', '2025-04-12',
               '2025-04-13', '2025-04-14', '2025-04-15', '2025-04-16',
               '2025-04-17', '2025-04-18', '2025-04-19', '2025-04-20'],
              dtype='datetime64[ns]', freq='D')

In [60]:
pd.date_range(end="2025-04-20", periods=20)

DatetimeIndex(['2025-04-01', '2025-04-02', '2025-04-03', '2025-04-04',
               '2025-04-05', '2025-04-06', '2025-04-07', '2025-04-08',
               '2025-04-09', '2025-04-10', '2025-04-11', '2025-04-12',
               '2025-04-13', '2025-04-14', '2025-04-15', '2025-04-16',
               '2025-04-17', '2025-04-18', '2025-04-19', '2025-04-20'],
              dtype='datetime64[ns]', freq='D')
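The two calls above describe the same twenty days from opposite ends. A quick sketch to check that, plus the variant where `start`, `end`, and `periods` are all given without a frequency (the variable names are illustrative):

```python
import pandas as pd

from_start = pd.date_range(start="2025-04-01", periods=20)
from_end = pd.date_range(end="2025-04-20", periods=20)

# Both calls describe the same daily index.
print(from_start.equals(from_end))

# With start, end, and periods but no freq, the points are spaced evenly
# between the two boundaries.
spaced = pd.date_range(start="2025-04-01", end="2025-04-20", periods=4)
print(spaced)
```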

In [61]:
# The start and end dates define strict boundaries for the generated date index.
# Example: for a date index containing the last business day of each month, pass the "BME" frequency.
# Only dates falling on or inside the date interval will be included.

pd.date_range("2024-01-01", "2026-01-01", freq="BME")

DatetimeIndex(['2024-01-31', '2024-02-29', '2024-03-29', '2024-04-30',
               '2024-05-31', '2024-06-28', '2024-07-31', '2024-08-30',
               '2024-09-30', '2024-10-31', '2024-11-29', '2024-12-31',
               '2025-01-31', '2025-02-28', '2025-03-31', '2025-04-30',
               '2025-05-30', '2025-06-30', '2025-07-31', '2025-08-29',
               '2025-09-30', '2025-10-31', '2025-11-28', '2025-12-31'],
              dtype='datetime64[ns]', freq='BME')
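To see the strict boundaries more directly, here is a small sketch with a window that starts and ends mid-month. It passes the `BMonthEnd` offset object rather than the `"BME"` string so that it does not depend on which alias spelling the installed pandas version accepts; the window dates are chosen only for illustration.

```python
import pandas as pd

# BMonthEnd() is the offset object behind the business-month-end alias.
bme = pd.offsets.BMonthEnd()

# The window runs from mid-January to mid-March 2024.
idx = pd.date_range("2024-01-15", "2024-03-15", freq=bme)

# Only business month ends inside the window are kept; the end of March
# falls after the end boundary and is left out.
print(idx)
```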


###  **Date Offset Aliases (Frequencies) in Pandas**

#### 📅 **Calendar Frequencies**

| Code      | Description                         | Older alias (deprecated since pandas 2.2) |
|-----------|-------------------------------------|-------------------------------------------|
| `B`       | Business day frequency              |                 |
| `C`       | Custom business day (requires calendar) |             |
| `D`       | Calendar day frequency              |                 |
| `W`       | Weekly frequency                    |                 |
| `W-MON`...`W-SUN` | Weekly on specific weekday  |                 |
| `ME`      | Month end frequency                 | `M`             |
| `MS`      | Month start frequency               |                 |
| `BME`     | Business month end frequency        | `BM`            |
| `BMS`     | Business month start frequency      |                 |
| `CBME`    | Custom business month end frequency | `CBM`           |
| `CBMS`    | Custom business month start frequency |               |
| `QE`      | Quarter end frequency               | `Q`             |
| `QS`      | Quarter start frequency             |                 |
| `BQE`     | Business quarter end frequency      | `BQ`            |
| `BQS`     | Business quarter start frequency    |                 |
| `YE`      | Year end frequency                  | `A`, `Y`        |
| `YS`      | Year start frequency                | `AS`            |
| `BYE`     | Business year end frequency         | `BA`, `BY`      |
| `BYS`     | Business year start frequency       | `BAS`           |
| `bh`      | Business hour frequency             | `BH`            |



#### ⏰ **Time Frequencies**

| Code      | Description                         | Older alias (deprecated since pandas 2.2) |
|-----------|-------------------------------------|-------------------------------------------|
| `h`       | Hourly frequency                    | `H`             |
| `min`     | Minutely frequency                  | `T`             |
| `s`       | Secondly frequency                  | `S`             |
| `ms`      | Millisecond frequency               | `L`             |
| `us`      | Microsecond frequency               | `U`             |
| `ns`      | Nanosecond frequency                | `N`             |


#### 🧪 **Special / Anchored Frequencies**

| Code      | Description                         |
|-----------|-------------------------------------|
| `WOM-1MON` to `WOM-4SUN` | Week of month (e.g., `WOM-2FRI` is the 2nd Friday of each month) |
| `QE-JAN`, `QE-FEB`, ..., `QE-DEC` | Quarter end, with the year ending in the given month (older spelling: `Q-JAN`, ...) |
| `YE-JAN`, `YE-FEB`, ..., `YE-DEC` | Year end anchored to the given month (older spelling: `A-JAN`, ...) |


#### 🛠️ **Custom Frequencies**

You can also define **custom intervals** using multiples:

| Example  | Meaning                            |
|----------|------------------------------------|
| `2D`     | Every 2 days                       |
| `15min`  | Every 15 minutes                   |
| `3h`     | Every 3 hours                      |
| `10s`    | Every 10 seconds                   |
| `5B`     | Every 5 business days              |


### 📝 Notes:

- Multiples can be combined with anchored aliases, so strings like `2W-MON` are also valid.
- The custom business aliases (`C` and the custom business month start/end aliases) require a custom business calendar, such as a list of holidays.
- For business and financial data, these variations help match real-world schedules.
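A quick way to check what an alias string means is to parse it with `to_offset`, which returns the corresponding offset object. The sketch below also builds a custom business day with a single holiday; the holiday date is a hypothetical choice for illustration, not something from the notes above.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Parse a few alias strings into their offset objects.
for alias in ["2D", "15min", "5B", "2W-MON"]:
    print(alias, "->", repr(to_offset(alias)))

# A custom business day needs a calendar; here a single (hypothetical) holiday.
cbd = pd.offsets.CustomBusinessDay(holidays=["2025-01-01"])
custom_days = pd.date_range("2024-12-31", periods=3, freq=cbd)
print(custom_days)  # the holiday is skipped
```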


In [62]:
# pandas.date_range by default preserves the time (if any) of the start or end timestamp
pd.date_range("2025-05-02 12:56:31", periods=5)

DatetimeIndex(['2025-05-02 12:56:31', '2025-05-03 12:56:31',
               '2025-05-04 12:56:31', '2025-05-05 12:56:31',
               '2025-05-06 12:56:31'],
              dtype='datetime64[ns]', freq='D')

In [63]:
# Sometimes the start or end date carries time information, but we want the generated
# timestamps normalized to midnight (00:00:00) as a convention, because many time series
# tasks only care about the date part. The normalize option does this.

pd.date_range("2025-01-01 12:56:31", periods=5, normalize=True)

DatetimeIndex(['2025-01-01', '2025-01-02', '2025-01-03', '2025-01-04',
               '2025-01-05'],
              dtype='datetime64[ns]', freq='D')
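Normalization is also available after the fact. As a brief sketch, both `DatetimeIndex` and `Timestamp` have a `normalize()` method that resets the time part to midnight (the variable names are illustrative):

```python
import pandas as pd

# An index that carries a time of day.
idx = pd.date_range("2025-05-02 12:56:31", periods=3)

# normalize() keeps the dates and sets every time to 00:00:00.
midnight = idx.normalize()
print(midnight)

# The same method exists on a single Timestamp.
print(pd.Timestamp("2025-05-02 12:56:31").normalize())
```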

## [ Frequencies and Date Offsets ]
- Frequencies in pandas are composed of a "base frequency" and a multiplier.
- Base frequencies are typically referred to by a string alias, like "ME" for month end or "h" for hourly.
- For each base frequency, there is an object referred to as a `date offset`.

In [64]:
# for example, hourly frequency can be represented with the Hour class
from pandas.tseries.offsets import Hour, Minute
hour = Hour()
hour

<Hour>

In [65]:
# we can define a multiple of an offset by passing an integer
four_hours = Hour(4)
four_hours

<4 * Hours>

In [66]:
# In most applications, you would never need to create one of these objects explicitly; instead you'd use a string like "h" or "4h".
# Putting an integer before the base frequency creates a multiple.

pd.date_range("2025-01-01", "2025-01-04 23:59", freq="4h")

DatetimeIndex(['2025-01-01 00:00:00', '2025-01-01 04:00:00',
               '2025-01-01 08:00:00', '2025-01-01 12:00:00',
               '2025-01-01 16:00:00', '2025-01-01 20:00:00',
               '2025-01-02 00:00:00', '2025-01-02 04:00:00',
               '2025-01-02 08:00:00', '2025-01-02 12:00:00',
               '2025-01-02 16:00:00', '2025-01-02 20:00:00',
               '2025-01-03 00:00:00', '2025-01-03 04:00:00',
               '2025-01-03 08:00:00', '2025-01-03 12:00:00',
               '2025-01-03 16:00:00', '2025-01-03 20:00:00',
               '2025-01-04 00:00:00', '2025-01-04 04:00:00',
               '2025-01-04 08:00:00', '2025-01-04 12:00:00',
               '2025-01-04 16:00:00', '2025-01-04 20:00:00'],
              dtype='datetime64[ns]', freq='4h')

In [67]:
# many offsets can be combined by addition
Hour(2) + Minute(30)

<150 * Minutes>

In [68]:
# Similarly, we can pass frequency strings, like "1h30min", that will effectively be parsed to the same expression.

pd.date_range("2000-01-01", periods=10, freq="1h30min")

DatetimeIndex(['2000-01-01 00:00:00', '2000-01-01 01:30:00',
               '2000-01-01 03:00:00', '2000-01-01 04:30:00',
               '2000-01-01 06:00:00', '2000-01-01 07:30:00',
               '2000-01-01 09:00:00', '2000-01-01 10:30:00',
               '2000-01-01 12:00:00', '2000-01-01 13:30:00'],
              dtype='datetime64[ns]', freq='90min')
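To check that the string and the offset arithmetic really agree, a minimal sketch can parse the string with `to_offset` and compare it with the sum of the two offset objects:

```python
from pandas.tseries.frequencies import to_offset
from pandas.tseries.offsets import Hour, Minute

# Build the offset by addition and by parsing the string.
combined = Hour(1) + Minute(30)
parsed = to_offset("1h30min")

print(combined, parsed, combined == parsed)
```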

- Some frequencies describe points in time that are not evenly spaced.
- For example, "ME" (calendar month end) and "BME" (last business/weekday of month) depend on the number of days in a month and, in the latter case, on whether the month ends on a weekend.
- We refer to these as anchored offsets.
- Users can define their own custom frequency classes to provide date logic not available in pandas.
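The uneven spacing is easy to see by measuring the gaps between consecutive month ends. This sketch passes the `MonthEnd` offset object directly, so it does not depend on the alias spelling; the year 2025 is an arbitrary choice.

```python
import pandas as pd

# Four consecutive calendar month ends, starting at the end of January 2025.
month_ends = pd.date_range("2025-01-31", periods=4, freq=pd.offsets.MonthEnd())

# Number of days between each month end and the next one.
gaps = (month_ends[1:] - month_ends[:-1]).days

print(list(month_ends.strftime("%Y-%m-%d")))
print(list(gaps))  # the gaps differ from month to month
```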

In [69]:
# Week-of-month dates
# One useful frequency class is "week of month", whose aliases start with WOM.
# It lets you get dates like the third Friday of each month.

monthly_dates = pd.date_range("2025-01-01", "2025-10-01", freq="WOM-3FRI")
# monthly_dates
list(monthly_dates)

[Timestamp('2025-01-17 00:00:00'),
 Timestamp('2025-02-21 00:00:00'),
 Timestamp('2025-03-21 00:00:00'),
 Timestamp('2025-04-18 00:00:00'),
 Timestamp('2025-05-16 00:00:00'),
 Timestamp('2025-06-20 00:00:00'),
 Timestamp('2025-07-18 00:00:00'),
 Timestamp('2025-08-15 00:00:00'),
 Timestamp('2025-09-19 00:00:00')]
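The `WOM-3FRI` string corresponds to the `WeekOfMonth` offset class. As a sketch, the equivalent object can be built explicitly; note that its `week` and `weekday` arguments count from zero, so the third Friday is `week=2, weekday=4`.

```python
import pandas as pd

third_fridays = pd.date_range("2025-01-01", "2025-10-01", freq="WOM-3FRI")

# Same rule as an explicit offset object (week and weekday are zero-based).
wom = pd.offsets.WeekOfMonth(week=2, weekday=4)
same = pd.date_range("2025-01-01", "2025-10-01", freq=wom)

print(third_fridays.equals(same))
print(third_fridays.day_name().unique())  # every date is a Friday
```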

## [ Shifting (Leading and Lagging) Data ]
- 'Shifting' refers to moving data backward and forward through time.
- Both Series and DataFrame have a 'shift' method for doing naive shifts forward or backward, leaving the index unmodified

In [70]:
ts = pd.Series(np.random.standard_normal(4), index=pd.date_range("2000-01-01", periods=4, freq="MS"))
ts

2000-01-01    1.405658
2000-02-01   -0.739220
2000-03-01    0.927926
2000-04-01    0.656792
Freq: MS, dtype: float64

In [71]:
ts.shift(2)

2000-01-01         NaN
2000-02-01         NaN
2000-03-01    1.405658
2000-04-01   -0.739220
Freq: MS, dtype: float64

In [72]:
ts.shift(-2)

2000-01-01    0.927926
2000-02-01    0.656792
2000-03-01         NaN
2000-04-01         NaN
Freq: MS, dtype: float64

When we shift like this, missing data is introduced either at the start or the end of the time series.


- A common use of `shift` is calculating the percentage change between each value and the one before it in a time series or multiple time series as DataFrame columns
- This is expressed as: `ts / ts.shift(1) - 1`

In [73]:
print(ts)
print(ts / ts.shift(1) - 1)

2000-01-01    1.405658
2000-02-01   -0.739220
2000-03-01    0.927926
2000-04-01    0.656792
Freq: MS, dtype: float64
2000-01-01         NaN
2000-02-01   -1.525889
2000-03-01   -2.255278
2000-04-01   -0.292194
Freq: MS, dtype: float64
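pandas also has a built-in `pct_change` method that computes the same quantity. The sketch below uses made-up round numbers (the `prices` series is hypothetical) so the result is easy to verify by hand, and compares the manual formula with the built-in method.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen so the changes are +10%, -10%, +10%.
prices = pd.Series([100.0, 110.0, 99.0, 108.9],
                   index=pd.date_range("2000-01-01", periods=4, freq="MS"))

manual = prices / prices.shift(1) - 1   # formula from the text above
builtin = prices.pct_change()           # built-in equivalent

print(manual)
print(np.allclose(manual, builtin, equal_nan=True))
```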


- Naive shift (default behavior)
    - This moves the data values down by the given number of positions but leaves the index (timestamps) unchanged.
    - The leading values become NaN because there is no earlier data to fill them.

- Timestamp-aware shift
    - If your time series has a known frequency, you can pass it to `shift()`.
    - Instead of moving the data, this moves the timestamps forward by the given number of frequency steps, so no values are lost.

In [74]:
ts.shift(2, freq="MS")

2000-03-01    1.405658
2000-04-01   -0.739220
2000-05-01    0.927926
2000-06-01    0.656792
Freq: MS, dtype: float64

In [75]:
# other frequencies can be passed, too, giving you some flexibility in how to lead and lag the data
ts.shift(3, freq="D")

2000-01-04    1.405658
2000-02-04   -0.739220
2000-03-04    0.927926
2000-04-04    0.656792
dtype: float64

In [76]:
ts.shift(1, freq="90min")

2000-01-01 01:30:00    1.405658
2000-02-01 01:30:00   -0.739220
2000-03-01 01:30:00    0.927926
2000-04-01 01:30:00    0.656792
dtype: float64
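The difference between the two kinds of shift can be checked side by side. This is a small sketch with placeholder values (`ts_demo` is an illustrative name): the naive shift keeps the index and introduces missing values, while the frequency-aware shift keeps every value and moves the index.

```python
import pandas as pd

ts_demo = pd.Series([1.0, 2.0, 3.0, 4.0],
                    index=pd.date_range("2000-01-01", periods=4, freq="MS"))

naive = ts_demo.shift(2)             # values move, index stays
aware = ts_demo.shift(2, freq="MS")  # index moves, values stay

print(naive.index.equals(ts_demo.index), int(naive.isna().sum()))
print(aware.index[0], int(aware.isna().sum()))
```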

### Shifting Dates with Offsets
- pandas provides special objects like `MonthEnd`, `YearBegin`, `Week`, etc., to represent **calendar-based time offsets** (called **anchored offsets**).
- We can add these to `datetime` or `Timestamp` objects to shift dates according to calendar logic.

##### What does "roll forward" mean?
If you add an offset like `MonthEnd()` to a date that is not already the end of a month, pandas "rolls forward" to the end of that month.

##### Summary
- Anchored offset (like MonthEnd) aligns with a calendar rule.
- If the date is not aligned, it rolls forward to the next valid date.
- Useful for financial, monthly reporting, forecasting, etc

In [77]:
# the pandas date offsets can also be used with datetime or Timestamp objects
from pandas.tseries.offsets import Day, MonthEnd

now = datetime(2025, 4, 15)

now + 3 * Day()

Timestamp('2025-04-18 00:00:00')

- In pandas, an offset is a special object that represents a time-based increment -- like "1 day", "2 months", etc.
- An offset is a rule-based time interval you can add to or subtract from a date (like a `datetime` or `Timestamp`) to move forward or backward in time.

- Why use offsets?
    - To shift dates meaningfully (e.g., to the end of the month).
    - To resample time series (e.g., daily to monthly)
    - To generate date ranges (e.g., every 3rd Friday).
    - To align irregular dates to regular calendar periods.

In [78]:
# If you add an anchored offset like MonthEnd, the first increment will “roll forward” a date to the next date according to the frequency rule

now + MonthEnd()

Timestamp('2025-04-30 00:00:00')

In [79]:
now + MonthEnd(2)

Timestamp('2025-05-31 00:00:00')

In [80]:
# anchored offsets can explicitly "roll" dates forward or backward by simply using their `rollforward` and `rollback` methods, respectively

offset = MonthEnd()

offset.rollforward(now)

Timestamp('2025-04-30 00:00:00')

In [81]:
offset.rollback(now)

Timestamp('2025-03-31 00:00:00')
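One detail worth checking is what happens when the date already sits on the offset. In this sketch, `rollforward` leaves a month-end date where it is, while adding `MonthEnd()` moves it to the following month end; `MonthEnd(0)` behaves like a roll forward. The dates are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

mid_month = pd.Timestamp("2025-04-15")
month_end = pd.Timestamp("2025-04-30")
offset = MonthEnd()

# is_on_offset reports whether a date already satisfies the rule.
print(offset.is_on_offset(mid_month), offset.is_on_offset(month_end))

print(offset.rollforward(month_end))  # already a month end: unchanged
print(month_end + offset)             # addition always moves: next month end
print(mid_month + MonthEnd(0))        # n=0 rolls forward only if needed
```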

In [82]:
# a creative use of date offsets is to use these methods with groupby

ts = pd.Series(np.random.standard_normal(20), index=pd.date_range("2000-01-15", periods=20, freq="4D"))

ts

2000-01-15    0.510918
2000-01-19   -0.574625
2000-01-23    1.115692
2000-01-27    0.627061
2000-01-31    1.934287
2000-02-04    0.418243
2000-02-08   -0.049342
2000-02-12    0.516524
2000-02-16    0.576193
2000-02-20    0.043990
2000-02-24   -1.297040
2000-02-28    0.352054
2000-03-03    1.245451
2000-03-07   -0.286880
2000-03-11    0.090543
2000-03-15   -1.114484
2000-03-19    0.535872
2000-03-23    0.706531
2000-03-27    1.092861
2000-03-31   -0.062034
Freq: 4D, dtype: float64

In [83]:
ts.groupby(MonthEnd().rollforward).mean()

2000-01-31    0.722667
2000-02-29    0.080089
2000-03-31    0.275982
dtype: float64

In [84]:
# an easier and faster way to do this is with resample

ts.resample("ME").mean()

2000-01-31    0.722667
2000-02-29    0.080089
2000-03-31    0.275982
Freq: ME, dtype: float64
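The agreement between the two approaches can be verified on deterministic data. This sketch replaces the random values with `0, 1, ..., 19` and passes the `MonthEnd` offset object to `resample`, so it does not depend on the alias spelling; the names `by_group` and `by_resample` are illustrative.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# Same dates as above, with the values 0..19 instead of random numbers.
ts_demo = pd.Series(np.arange(20.0),
                    index=pd.date_range("2000-01-15", periods=20, freq="4D"))

# Group by the month end each timestamp rolls forward to.
by_group = ts_demo.groupby(MonthEnd().rollforward).mean()

# Resample to month-end frequency.
by_resample = ts_demo.resample(MonthEnd()).mean()

print(by_group)
print(np.allclose(by_group.to_numpy(), by_resample.to_numpy()))
```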