# **Day 13 - Working with Time Series ⏰**

#### **Goal:** Work with date/time-indexed data and master time series manipulation.

#### **Topics To Cover:** `DatetimeIndex`, `Timestamp`, `PeriodIndex`, `Timedelta` and `TimedeltaIndex`, Changing Frequency (`.resample()`), Time Shifting (`.shift()`), Rolling Window Calculations (`.rolling()`).
----

## **Introduction ⏰ Working with Time Series in Pandas**

**Time Series** data is a sequence of data points indexed (or listed) in chronological order. Unlike regular tabular data, the **order** of the observations is inherently meaningful. Pandas was originally developed for financial time series analysis, which is why its date and time tooling is so extensive.

### **Key Time Series Concepts in Pandas**

| Concept | Definition | Pandas Class |
| :--- | :--- | :--- |
| **Time Series** | A sequence of data points indexed by time. | `pd.Series` or `pd.DataFrame` with a `DatetimeIndex` |
| **Timestamp** | A specific point in time (e.g., 2025-09-21 07:35:13). | `pd.Timestamp` |
| **DatetimeIndex** | A collection of `Timestamp` objects used to index a DataFrame. This is the **most fundamental** structure for time series work. | `pd.DatetimeIndex` |
| **Timedelta** | A duration or difference between two points in time (e.g., 5 hours, 3 days). | `pd.Timedelta` |
| **Period** | A span or interval of time, often used for fixed-frequency periods (e.g., the month of July 2025). | `pd.Period` |
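
The classes in the table can be tried out on their own before touching the dataset. The snippet below is a minimal sketch; the dates and values are made up for illustration and are not taken from `stocks.csv`.

```python
import pandas as pd

# Timestamp: a single point in time
ts = pd.Timestamp("2025-09-21 07:35:13")

# Timedelta: a fixed duration
delta = pd.Timedelta(hours=5)

# Period: a span of time, here the whole month of July 2025
period = pd.Period("2025-07", freq="M")

# DatetimeIndex: several timestamps used as the index of a Series
idx = pd.date_range("2025-09-21", periods=3, freq="D")
series = pd.Series([10.0, 11.5, 9.8], index=idx)

print(ts + delta)                    # Timestamp moved 5 hours forward
print(period.start_time)             # first instant covered by the period
print(period.end_time)               # last instant covered by the period
print(type(series.index).__name__)   # class name of the index
```

A `Period` covers everything between its `start_time` and `end_time`, while a `Timestamp` is one instant inside such a span.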

### **Importance in Real Life and AIML**

Understanding time series is crucial because the real world is inherently sequential. From **predicting financial markets** (stock and crypto prices) and **forecasting energy demand** to tracking environmental changes, time series analysis provides the foundation for decision-making.

For an AIML student, this domain is paramount for building **Forecasting Models**. Traditional machine learning models (like Linear Regression) assume independence between data points, but time series models (like **ARIMA**, **Prophet**, and deep learning models like **LSTMs** and **RNNs**) explicitly model the time dependency. Mastering the Pandas tools like `.resample()` and `.shift()` is the necessary **Data Engineering** step to prepare sequential data for these complex AIML algorithms.

----

## **Let's Begin: Loading and Preparing the Data**

The **Hourly Crypto & Stocks Market Data** dataset is perfect for high-frequency time series analysis! It contains detailed timestamps (`Hour`, `Minute`, `Second`) which allow us to practice resampling and windowing.

First, let's load the necessary library and the `stocks.csv` dataset.

In [1]:
#import necessary libraries
import pandas as pd
import numpy as np

# Load the stocks.csv file.
# Note: For time series data, it is a best practice to use the 'parse_dates'
# argument to tell Pandas to immediately recognize and convert the date column.
try:
    df = pd.read_csv(r'..\data\stocks.csv', parse_dates=['timestamp'])
except FileNotFoundError:
    print(r"Error: '..\data\stocks.csv' not found. Check that the path is correct relative to this notebook.")
    df = None

if df is not None:
    print("DataFrame successfully loaded.")
    print("\n--- Initial Info ---")
    df.info()

DataFrame successfully loaded.

--- Initial Info ---
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 157118 entries, 0 to 157117
Data columns (total 9 columns):
 #   Column     Non-Null Count   Dtype         
---  ------     --------------   -----         
 0   timestamp  157118 non-null  datetime64[ns]
 1   name       157118 non-null  object        
 2   last       157118 non-null  float64       
 3   high       157118 non-null  float64       
 4   low        157118 non-null  float64       
 5   chg_       157118 non-null  float64       
 6   chg_%      157118 non-null  object        
 7   vol_       157118 non-null  object        
 8   time       157118 non-null  object        
dtypes: datetime64[ns](1), float64(4), object(4)
memory usage: 10.8+ MB


In [2]:
# For time series work, the first step is to set the column that holds the datetimes as the DataFrame's index

# check the 'timestamp' column data type
print(f"The current data type of 'timestamp' column: {df['timestamp'].dtype}")

# convert to datetime64[ns] if not already
# (pd.to_datetime returns a new Series, so the result must be assigned back)
df['timestamp'] = pd.to_datetime(df['timestamp'], errors='coerce')

# set timestamp as the DatetimeIndex of the DataFrame
df.set_index('timestamp', inplace=True)

# Now the DataFrame has a DatetimeIndex named 'timestamp'
print("\n--- First 5 rows ---")
df.head()

The current data type of 'timestamp' column: datetime64[ns]

--- First 5 rows ---


| timestamp | name | last | high | low | chg_ | chg_% | vol_ | time |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 2025-10-02 17:00:05 | Boeing | 216.3 | 217.4 | 215.31 | 1.1 | +0.51% | 1.22M | 9:58:40 |
| 2025-10-02 17:00:05 | Chevron | 156.2 | 156.38 | 153.96 | 1.62 | +1.05% | 730.93K | 9:59:01 |
| 2025-10-02 17:00:05 | Citigroup | 97.77 | 99.45 | 97.62 | -0.94 | -0.95% | 1.19M | 9:58:18 |
| 2025-10-02 17:00:05 | Caterpillar | 492.63 | 495.98 | 486.65 | 11.81 | +2.46% | 599.89K | 9:59:01 |
| 2025-10-02 17:00:05 | Microsoft | 517.67 | 521.6 | 516.5 | -2.04 | -0.39% | 2.53M | 9:58:02 |


---

## **13.1 Foundational Time Structures 🧱**
This section explores the core data types Pandas uses to represent time. Before manipulating time, we must ensure our data is stored in the proper `datetime64[ns]` format. Pandas uses specialized, high-performance data types to handle dates and times, making time-based calculations extremely fast. Understanding these types is the necessary foundation for all advanced time series operations.

### **13.1.1 Timestamp & DatetimeIndex**
These objects represent specific points in time. The `Timestamp` is a scalar value (a single date/time), and the `DatetimeIndex` is an array of these `Timestamp` values that acts as the backbone of your time series DataFrame.

| Key Method/Attribute | Purpose                                                        | Parameters/Notes                                                                 |
|-----------------------|----------------------------------------------------------------|----------------------------------------------------------------------------------|
| `pd.to_datetime()`    | Converts objects (strings, integers) into `Timestamp` objects. | Essential for initial data cleaning. Use `errors='coerce'` to turn invalid dates into `NaT` (Not a Time). |
| `.dt` Accessor        | Enables access to date/time properties of a datetime `Series`. | Used to extract components like year, month, day, or weekday. A `DatetimeIndex` exposes the same properties directly, without `.dt`. |
| `.dt.day_name()`      | Extracts the full name of the day of the week.                 | Useful for time series analysis involving weekly seasonality.                     |
| `.dt.is_month_start`  | Boolean check to see if a date is the first day of the month.  | One of many Boolean properties (e.g., `is_quarter_end`, `is_year_start`).         |


**Let's do some practice**

***Using `.dt` accessor:***
The `.dt` accessor is not available on a `DatetimeIndex` (or on a scalar `Timestamp`). It is designed for a Pandas Series that contains datetime values; on a `DatetimeIndex` the same properties are accessed directly. In the following table, `"col"` must be a column with the `datetime64[ns]` data type, otherwise you get an error.

| Property       | Value Returned                                      | Example (DatetimeIndex) | Example (.dt on Series) |
|----------------|-----------------------------------------------------|--------------------------|--------------------------|
| .year          | Integer (e.g., 2025)                                | `df.index.year`            | `df["col"].dt.year  `      |
| .month         | Integer (e.g., 10)                                  | `df.index.month`           | `df["col"].dt.month  `     |
| .day           | Integer (e.g., 2)                                   | `df.index.day`             | `df["col"].dt.day  `       |
| .hour          | Integer (0-23)                                      | `df.index.hour`            | `df["col"].dt.hour  `      |
| .minute        | Integer (0-59)                                      | `df.index.minute`          | `df["col"].dt.minute  `    |
| .day_name()    | Full name of the day (e.g., 'Thursday')             | `df.index.day_name()`      | `df["col"].dt.day_name()`  |
| .month_name()  | Full name of the month (e.g., 'October')            | `df.index.month_name()`    | `df["col"].dt.month_name()`|
| .is_month_start| Boolean (True or False)                             | `df.index.is_month_start`  | `df["col"].dt.is_month_start` |
| .weekday       | Integer day of the week (Monday=0, Sunday=6)        | `df.index.weekday`         | `df["col"].dt.weekday  `   |


In [3]:
# extract year from DatetimeIndex and create new column 'year'
df['year'] = df.index.year
df.head()

| timestamp | name | last | high | low | chg_ | chg_% | vol_ | time | year |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 2025-10-02 17:00:05 | Boeing | 216.3 | 217.4 | 215.31 | 1.1 | +0.51% | 1.22M | 9:58:40 | 2025 |
| 2025-10-02 17:00:05 | Chevron | 156.2 | 156.38 | 153.96 | 1.62 | +1.05% | 730.93K | 9:59:01 | 2025 |
| 2025-10-02 17:00:05 | Citigroup | 97.77 | 99.45 | 97.62 | -0.94 | -0.95% | 1.19M | 9:58:18 | 2025 |
| 2025-10-02 17:00:05 | Caterpillar | 492.63 | 495.98 | 486.65 | 11.81 | +2.46% | 599.89K | 9:59:01 | 2025 |
| 2025-10-02 17:00:05 | Microsoft | 517.67 | 521.6 | 516.5 | -2.04 | -0.39% | 2.53M | 9:58:02 | 2025 |


In [4]:
# extract month from DatetimeIndex and create new column 'month'
df['month'] = df.index.month # if you want numeric
df['month name'] = df.index.month_name()
df.tail()

| timestamp | name | last | high | low | chg_ | chg_% | vol_ | time | year | month | month name |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 2025-03-17 00:45:59 | UnitedHealth | 488.65 | 489.45 | 478.24 | 7.13 | +1.48% | 3.92M | 14/03 | 2025 | 3 | March |
| 2025-03-17 00:45:59 | Verizon | 43.57 | 43.77 | 43.01 | -0.14 | -0.32% | 17.43M | 14/03 | 2025 | 3 | March |
| 2025-03-17 00:45:59 | Visa A | 331.8 | 332.77 | 326.38 | 3.25 | +0.99% | 7.81M | 14/03 | 2025 | 3 | March |
| 2025-03-17 00:45:59 | Walmart | 85.35 | 85.37 | 84.06 | 0.85 | +1.01% | 35.5M | 14/03 | 2025 | 3 | March |
| 2025-03-17 00:45:59 | Walt Disney | 98.64 | 99.1 | 97.42 | 1.77 | +1.83% | 10.2M | 14/03 | 2025 | 3 | March |


In [5]:
# df['weekday'] = df.index.weekday
df['weekday'] = df.index.day_of_week
df['weekday name'] = df.index.day_name()
df.head()

| timestamp | name | last | high | low | chg_ | chg_% | vol_ | time | year | month | month name | weekday | weekday name |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 2025-10-02 17:00:05 | Boeing | 216.3 | 217.4 | 215.31 | 1.1 | +0.51% | 1.22M | 9:58:40 | 2025 | 10 | October | 3 | Thursday |
| 2025-10-02 17:00:05 | Chevron | 156.2 | 156.38 | 153.96 | 1.62 | +1.05% | 730.93K | 9:59:01 | 2025 | 10 | October | 3 | Thursday |
| 2025-10-02 17:00:05 | Citigroup | 97.77 | 99.45 | 97.62 | -0.94 | -0.95% | 1.19M | 9:58:18 | 2025 | 10 | October | 3 | Thursday |
| 2025-10-02 17:00:05 | Caterpillar | 492.63 | 495.98 | 486.65 | 11.81 | +2.46% | 599.89K | 9:59:01 | 2025 | 10 | October | 3 | Thursday |
| 2025-10-02 17:00:05 | Microsoft | 517.67 | 521.6 | 516.5 | -2.04 | -0.39% | 2.53M | 9:58:02 | 2025 | 10 | October | 3 | Thursday |


In [6]:
# df['month start'] = df.index.is_month_start # Indicates whether the date is the first day of the month.
df['quarter end'] = df.index.is_quarter_end # Indicates whether the date is the last day of a quarter.
df.head()

| timestamp | name | last | high | low | chg_ | chg_% | vol_ | time | year | month | month name | weekday | weekday name | quarter end |
| :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- | :--- |
| 2025-10-02 17:00:05 | Boeing | 216.3 | 217.4 | 215.31 | 1.1 | +0.51% | 1.22M | 9:58:40 | 2025 | 10 | October | 3 | Thursday | False |
| 2025-10-02 17:00:05 | Chevron | 156.2 | 156.38 | 153.96 | 1.62 | +1.05% | 730.93K | 9:59:01 | 2025 | 10 | October | 3 | Thursday | False |
| 2025-10-02 17:00:05 | Citigroup | 97.77 | 99.45 | 97.62 | -0.94 | -0.95% | 1.19M | 9:58:18 | 2025 | 10 | October | 3 | Thursday | False |
| 2025-10-02 17:00:05 | Caterpillar | 492.63 | 495.98 | 486.65 | 11.81 | +2.46% | 599.89K | 9:59:01 | 2025 | 10 | October | 3 | Thursday | False |
| 2025-10-02 17:00:05 | Microsoft | 517.67 | 521.6 | 516.5 | -2.04 | -0.39% | 2.53M | 9:58:02 | 2025 | 10 | October | 3 | Thursday | False |


----

### **13.1.2 Timedelta() & TimedeltaIndex**
A Timedelta represents a duration, or the difference between two Timestamp objects. They are essential for calculating lag times, lead times, and measuring the time elapsed between events.

| Key Method/Attribute | Purpose                                          | Parameters/Notes                                                                 |
|-----------------------|--------------------------------------------------|----------------------------------------------------------------------------------|
| `pd.Timedelta()`      | Creates a scalar time duration object.           | Takes arguments like weeks, days, hours, minutes, etc.                           |
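| `pd.to_timedelta()`   | Converts strings or numbers into timedeltas. A Series comes back as a `timedelta64[ns]` Series; a list comes back as a `TimedeltaIndex`. | Accepts duration strings like `'5 days'`, `'12h'`, `'30min'`. For plain numbers, pass `unit=` (e.g., `unit='m'` for minutes). |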
| Arithmetic            | Adding/Subtracting `Timedelta` from `Timestamp`. | You can add a `Timedelta` to a `Timestamp` to get a new `Timestamp`.             |

**Let's do some practice**

In [7]:
# 1. Create a Timedelta object (2 days)
two_days = pd.Timedelta(days=2)
print(f"Timedelta object created: {two_days}")

# 2. Get the first timestamp from the index
first_time = df.index[0]
print(f"First Timestamp: {first_time}")

# 3. Calculate the new time point (Arithmetic)
new_time = first_time + two_days

print(f"Time 2 Days Later: {new_time}")

Timedelta object created: 2 days 00:00:00
First Timestamp: 2025-10-02 17:00:05
Time 2 Days Later: 2025-10-04 17:00:05


In [8]:
# 1. Find start and end points
start_time = df.index.min()
end_time = df.index.max()

# 2. Calculate the time span (Timedelta)
time_span = end_time - start_time

# 3. Use .total_seconds()
total_seconds = time_span.total_seconds()

print(f"Data Start: {start_time}")
print(f"Data End: {end_time}")
print(f"Total Time Span (Timedelta): {time_span}")
print(f"Total Span in Seconds: {total_seconds:,.0f} seconds")

Data Start: 2025-03-17 00:45:59
Data End: 2025-10-02 17:00:05
Total Time Span (Timedelta): 199 days 16:14:06
Total Span in Seconds: 17,252,046 seconds


In [9]:
# Create a sample DataFrame for practice
data = {
    'duration_str': ['5 days', '12 hours', '3 minutes', '1.5 seconds'],
    'duration_num': [60, 120, 180, 240] # Assume these are minutes
}
temp_df = pd.DataFrame(data)

# 1. Convert duration strings (Pandas can infer units)
temp_df['td_from_str'] = pd.to_timedelta(temp_df['duration_str'])

# 2. Convert numeric data (Must specify unit='m' for minutes)
temp_df['td_from_num'] = pd.to_timedelta(temp_df['duration_num'], unit='m')

print(f"\nData Type of 'td_from_num': {temp_df['td_from_num'].dtype}")
temp_df


Data Type of 'td_from_num': timedelta64[ns]


|  | duration_str | duration_num | td_from_str | td_from_num |
| :--- | :--- | :--- | :--- | :--- |
| 0 | 5 days | 60 | 5 days 00:00:00 | 0 days 01:00:00 |
| 1 | 12 hours | 120 | 0 days 12:00:00 | 0 days 02:00:00 |
| 2 | 3 minutes | 180 | 0 days 00:03:00 | 0 days 03:00:00 |
| 3 | 1.5 seconds | 240 | 0 days 00:00:01.500000 | 0 days 04:00:00 |
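
The practice cells above work with scalar `Timedelta` values and a timedelta column. As a complement, the sketch below shows where a `TimedeltaIndex` comes from and how gaps between irregular timestamps can be measured. The three timestamps are sample values chosen for illustration.

```python
import pandas as pd

# Passing a list (rather than a Series) returns a TimedeltaIndex
tdi = pd.to_timedelta(["5 days", "12 hours", "3 minutes"])

# Gaps between consecutive, slightly irregular timestamps
stamps = pd.Series(pd.to_datetime([
    "2025-10-02 15:00:03",
    "2025-10-02 15:30:02",
    "2025-10-02 16:00:00",
]))
gaps = stamps.diff()                          # Series of timedelta values, first one is NaT
gap_minutes = gaps.dt.total_seconds() / 60    # plain floats, easier to filter or plot

print(type(tdi).__name__)
print(gaps)
print(gap_minutes)
```

Converting the gaps to minutes is a convenient way to check how regular a "30-minute" feed really is.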


----

## **13.2 Time Shifting & Lagging Data**

Time Shifting is a core operation in time series analysis. It involves moving data backward or forward in time, often to compare the current value to a past (lag) or future (lead) value. This is essential for calculating returns, finding correlations, and creating features for forecasting.

**13.2.1 `shift()`:**
This is the primary method for time shifting in Pandas. By default it shifts the data relative to the index, meaning the index values stay the same, but the data in the rows moves up or down.

Key Parameters:
* `periods`: Shifts the data N rows along the time axis. The lag/lead reading below assumes the index is sorted oldest-first.

    * Positive N (e.g., 1): Shifts data down, aligning a value with the next timestamp (creating a lag).
    * Negative N (e.g., −1): Shifts data up, aligning a value with the previous timestamp (creating a lead).
* `freq`: This parameter is optional. When it is given, the index itself is moved by `periods` × `freq` (e.g., `'1D'`, `'3h'`) and each value stays attached to its row, so no `NaN` values are introduced. Use it when you want to move the timestamps rather than realign rows.
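
The difference between shifting by rows and shifting by a time offset is easiest to see side by side. This is a minimal sketch on a made-up daily series; `prices`, `lagged` and `moved` are illustrative names.

```python
import pandas as pd

idx = pd.date_range("2025-01-01", periods=4, freq="D")
prices = pd.Series([100.0, 101.0, 103.0, 102.0], index=idx)

# periods only: values move down one row, the index is unchanged,
# and the first row becomes NaN
lagged = prices.shift(periods=1)

# periods + freq: the index moves forward one day,
# values stay attached to their rows and nothing becomes NaN
moved = prices.shift(periods=1, freq="D")

print(lagged)
print(moved)
```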

In [10]:
# --- STEP 1: Filter Data ---
# The file is stored newest-first, so sort the index oldest-first before shifting.
# Without the sort, shift(1) would pull the value from the LATER timestamp instead of the earlier one.
boeing_last = df[df['name'] == 'Boeing']['last'].sort_index()
boeing_df = boeing_last.to_frame() # Convert the Series to a DataFrame so we can add columns

# --- STEP 2: Practice Shifting ---
# Lagging: shifts data DOWN, index remains the same. Current row gets the PAST value.
boeing_df['lag_1'] = boeing_df['last'].shift(periods=1)

# Leading: shifts data UP, index remains the same. Current row gets the FUTURE value.
boeing_df['lead_1'] = boeing_df['last'].shift(periods=-1)

print("\n--- First 5 rows ---")
print(boeing_df.head())
print()
print("\n--- Last 5 rows ---")
print(boeing_df.tail())


--- First 5 rows ---
                       last   lag_1  lead_1
timestamp                                  
2025-03-17 00:45:59  161.81     NaN  161.81
2025-03-17 01:46:00  161.81  161.81  161.81
2025-03-17 02:46:01  161.81  161.81  161.81
2025-03-17 03:45:59  161.81  161.81  161.81
2025-03-17 04:46:00  161.81  161.81  161.81


--- Last 5 rows ---
                      last  lag_1  lead_1
timestamp                                
2025-10-02 15:00:03  215.2  215.2   215.2
2025-10-02 15:30:02  215.2  215.2   215.2
2025-10-02 16:00:00  215.2  215.2   215.2
2025-10-02 16:30:06  215.2  215.2   216.3
2025-10-02 17:00:05  216.3  215.2     NaN
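
One common use of a lag is computing period-over-period returns. The sketch below uses three hypothetical prices that are stored newest-first, sorts them oldest-first, and then compares the manual `shift` calculation with the built-in `pct_change()`.

```python
import pandas as pd

# Hypothetical prices stored newest-first
idx = pd.to_datetime(["2025-01-03", "2025-01-02", "2025-01-01"])
prices = pd.Series([110.0, 100.0, 80.0], index=idx)

# Sort oldest-first so that shift(1) refers to the previous observation in time
prices = prices.sort_index()

previous = prices.shift(1)
simple_return = (prices - previous) / previous

# pct_change() is the built-in shortcut for the same idea
builtin_return = prices.pct_change()

print(simple_return)
print(builtin_return)
```

The first row has no previous observation, so its return is `NaN` in both versions.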


----

## **13.3 Changing Frequency (Resampling) 📅**
Resampling is the process of changing the frequency of your time series data. This is necessary when your data is too granular (e.g., seconds) or not granular enough (e.g., yearly) for the analysis you need to perform.

The .resample() method is similar to a .groupby() operation, but it groups rows based on a time interval rather than a categorical value.

**13.3.1 `resample()`**
The `.resample()` method requires two main components:

* A Frequency String (Rule): A string that specifies the new time interval (e.g., 'D' for daily, 'W' for weekly).

* An Aggregation Method: A function to summarize the data within each new interval (e.g., .mean(), .sum(), .max()).

Key Parameters:
| Key Method/Parameter             | Purpose                                                                 | Notes                                                                 |
|----------------------------------|-------------------------------------------------------------------------|----------------------------------------------------------------------|
| `.resample(rule='...')`            | Groups the data into new time buckets based on the rule.                | Downsampling (e.g., hourly → daily) requires aggregation (e.g., `.mean()`). |
| `rule` (string or DateOffset)      | Defines the frequency of the resampling window.                         | Examples: `'D'` (daily), `'W'` (weekly), `'ME'` (month-end), `'QE'` (quarter-end), `'YE'` (year-end), `'h'` (hourly), `'min'` (minutes), `'s'` (seconds). The older aliases `'M'`, `'Q'`, `'A'`, `'H'`, `'T'`, `'S'` are deprecated since pandas 2.2. |
| `.mean()`, `.sum()`, `.min()`, `.max()`  | Aggregation methods applied to each resampled bucket.                   | Must follow `resample()`. Use `.ohlc()` for financial data.               |
| `.ohlc()`                          | Returns Open, High, Low, Close values for each resampled bucket.        | Especially used in time-series/finance contexts.                      |
| `label` (`'left'` or `'right'`)        | Controls labeling of resampled bins.                                    | Example: `'left'` → label by start of bin, `'right'` → by end.           |
| `closed` (`'left'` or `'right'`)       | Defines which side of interval is inclusive when binning.               | Default depends on frequency: `'left'` for most rules, `'right'` for week-, month-, quarter- and year-end rules. |
| `origin`                           | Defines the reference point for binning.                                | Options: `'epoch'`, `'start'`, `'start_day'` (default), `'end'`, `'end_day'`, or a specific timestamp. |
| `offset`                           | Shifts the resampling window by a fixed offset.                         | Example: `rule='7D', offset='2D'`.                                      |
| `loffset`                          | Removed in pandas 2.0 (deprecated since 1.1). Adjust the resulting index after resampling instead. | Previously used to adjust bin labels.                                 |
| `on`                               | Column to use instead of index for resampling.                          | Useful if datetime info is in a column, not the index.                |
| `axis`                             | Axis to resample on.                                                    | Default is `axis=0` (rows). Deprecated in pandas 2.x.                   |
| `group_keys`                       | When resampling on multiple groupers, controls inclusion of group keys. | Rarely used, more for groupby + resample.                            |
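
The table above can be exercised on a small synthetic series before applying it to the stock data. The following is a sketch under simple assumptions: 30-minute prices that just count upward over two full days, so the expected bucket values are easy to verify by hand.

```python
import pandas as pd

# Hypothetical 30-minute prices covering exactly two days (48 rows per day)
idx = pd.date_range("2025-01-01", periods=96, freq="30min")
prices = pd.Series(range(96), index=idx, dtype="float64")

# One row per day: the average of the 48 values in each bucket
daily_mean = prices.resample("D").mean()

# One row per day with open, high, low and close columns
daily_ohlc = prices.resample("D").ohlc()

# agg() accepts several aggregations at once
daily_stats = prices.resample("D").agg(["min", "max", "count"])

print(daily_mean)
print(daily_ohlc)
print(daily_stats)
```

Because `.ohlc()` takes the first and last value in each bucket as open and close, it is a natural fit for turning intraday prices into daily bars.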

**Let's do some practice:**

In [11]:
# --- STEP 1 & 2: Filter and Resample ---
# Filter to get only the 'last' price for 'Boeing'
boeing_last = df[df['name'] == 'Boeing']['last']

# Resample to a daily frequency ('D') and calculate the mean
daily_data = boeing_last.resample(rule='D').mean().to_frame()

print("Boeing 'last' price, resampled to a daily mean:")
daily_data.head()

Boeing 'last' price, resampled to a daily mean:


| timestamp | last |
| :--- | :--- |
| 2025-03-17 | 161.684167 |
| 2025-03-18 | 161.542083 |
| 2025-03-19 | 164.963333 |
| 2025-03-20 | 172.493333 |
| 2025-03-21 | 174.865 |


In [12]:
# Upsample from Daily ('D') to Hourly ('h') and fill the gaps using ffill
upsampled_hourly = daily_data.resample(rule='h').ffill()

# Print a 2-day period to show the hourly fills clearly
upsampled_hourly['2025-05-15':'2025-05-16']

| timestamp | last |
| :--- | :--- |
| 2025-05-15 00:00:00 | 205.295 |
| 2025-05-15 01:00:00 | 205.295 |
| 2025-05-15 02:00:00 | 205.295 |
| 2025-05-15 03:00:00 | 205.295 |
| 2025-05-15 04:00:00 | 205.295 |
| 2025-05-15 05:00:00 | 205.295 |
| 2025-05-15 06:00:00 | 205.295 |
| 2025-05-15 07:00:00 | 205.295 |
| 2025-05-15 08:00:00 | 205.295 |
| 2025-05-15 09:00:00 | 205.295 |
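
Forward fill is only one way to populate the new rows created by upsampling. The sketch below compares three options on a made-up two-day series; the values `100.0` and `104.0` are illustrative.

```python
import pandas as pd

daily = pd.Series(
    [100.0, 104.0],
    index=pd.date_range("2025-05-15", periods=2, freq="D"),
)

upsampler = daily.resample("6h")

gaps = upsampler.asfreq()               # new rows are left as NaN
filled = upsampler.ffill()              # new rows repeat the last known value
interpolated = upsampler.interpolate()  # new rows are drawn on a straight line

print(gaps)
print(filled)
print(interpolated)
```

Which one is appropriate depends on the data: repeating a daily average across 24 hours does not create real hourly information, so upsampled values should be treated as placeholders rather than observations.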


---

## **13.4 Rolling Window & Expanding Calculations**
Rolling Window analysis (often called Moving Averages) calculates a statistic (like the mean or standard deviation) over a fixed, sliding window of observations. This technique is used to smooth out short-term fluctuations and highlight longer-term trends.

**13.4.1 `rolling()`:** 
This method works by taking a window of a specified size and applying a function to the data within that window.

Key parameters:
| Parameter         | Purpose                                                                 | Notes                                                                 |
|-------------------|-------------------------------------------------------------------------|----------------------------------------------------------------------|
| `window`            | Size of the moving window (int, offset string, or BaseIndexer).         | Example: `3` (3 periods), `'7D'` (7 days). Required.                 |
| `min_periods`       | Minimum number of observations in window to have a value.               | Default = window size for an integer window; 1 for an offset window such as `'7D'`. |
| `center`            | If True, sets labels at the center of the window instead of the right.  | Useful for aligning results in plots.                                 |
| `win_type`          | Apply a specific window function (string).                             | Examples: `'boxcar'`, `'triang'`, `'blackman'`, `'hamming'`, `'bartlett'`, `'parzen'`, `'hann'`, `'kaiser'`. Requires SciPy. |
| `axis`              | Axis along which to apply the rolling window.                          | Default = 0 (rows). Deprecated in pandas 2.x.                         |
| `closed`            | Which side of the window is inclusive: `'right'`, `'left'`, `'both'`, `'neither'`. | Default = `'right'`. Also supported for fixed (integer) windows since pandas 1.2. |
| `method`            | Calculation method: `'single'` or `'table'`.                           | `'table'` is experimental, for DataFrames with multiple columns, and requires `engine='numba'` in the aggregation call. |
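
A five-row series is enough to see what `min_periods`, `center`, and an offset window each change. This is a minimal sketch with made-up values, not taken from the dataset.

```python
import pandas as pd

idx = pd.date_range("2025-01-01", periods=5, freq="D")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=idx)

# Integer window: NaN until 3 observations are available
trailing = s.rolling(window=3).mean()

# min_periods=1: compute with however many observations exist so far
partial = s.rolling(window=3, min_periods=1).mean()

# center=True: the label sits in the middle of the window, so both ends are NaN
centered = s.rolling(window=3, center=True).mean()

# Offset window: every row whose timestamp falls within the trailing 2 days
by_time = s.rolling(window="2D").sum()

print(pd.DataFrame({
    "trailing": trailing,
    "partial": partial,
    "centered": centered,
    "by_time": by_time,
}))
```

Note that the offset window produces a value on the first row, because `min_periods` defaults to 1 when `window` is an offset string.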


In [13]:
# --- STEP 1: Filter Data ---
boeing_last = df[df['name'] == 'Boeing']['last'].copy()
boeing_df = boeing_last.to_frame()

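# --- STEP 2: Practice Rolling Window ---
# Calculate the 10-period Rolling Mean.
# Rows are stored newest-first here, so each window covers the current row and the
# 9 rows above it (later timestamps). Call .sort_index() first for a conventional trailing average.
boeing_df['MA_10'] = boeing_df['last'].rolling(window=10).mean()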

boeing_df.head(15)

| timestamp | last | MA_10 |
| :--- | :--- | :--- |
| 2025-10-02 17:00:05 | 216.3 | NaN |
| 2025-10-02 16:30:06 | 215.2 | NaN |
| 2025-10-02 16:00:00 | 215.2 | NaN |
| 2025-10-02 15:30:02 | 215.2 | NaN |
| 2025-10-02 15:00:03 | 215.2 | NaN |
| 2025-10-02 14:30:01 | 215.2 | NaN |
| 2025-10-02 14:00:02 | 215.2 | NaN |
| 2025-10-02 13:30:00 | 215.2 | NaN |
| 2025-10-02 13:00:01 | 215.2 | NaN |
| 2025-10-02 12:30:01 | 215.2 | 215.31 |


In [14]:
# --- Practice Rolling Window with Time Offset and Volatility ---
# Calculate the 3-Day Rolling Standard Deviation (Volatility)
# '3D' = a 3-day (72-hour) time span, however many rows fall inside it
boeing_df['Volatility_3D'] = boeing_df['last'].rolling(window='3D', min_periods=1).std()

boeing_df.head(15)

| timestamp | last | MA_10 | Volatility_3D |
| :--- | :--- | :--- | :--- |
| 2025-10-02 17:00:05 | 216.3 | NaN | NaN |
| 2025-10-02 16:30:06 | 215.2 | NaN | 0.777817 |
| 2025-10-02 16:00:00 | 215.2 | NaN | 0.635085 |
| 2025-10-02 15:30:02 | 215.2 | NaN | 0.55 |
| 2025-10-02 15:00:03 | 215.2 | NaN | 0.491935 |
| 2025-10-02 14:30:01 | 215.2 | NaN | 0.449073 |
| 2025-10-02 14:00:02 | 215.2 | NaN | 0.415761 |
| 2025-10-02 13:30:00 | 215.2 | NaN | 0.388909 |
| 2025-10-02 13:00:01 | 215.2 | NaN | 0.366667 |
| 2025-10-02 12:30:01 | 215.2 | 215.31 | 0.347851 |


**13.4.2 `expanding()`:** This method calculates a statistic by considering all prior data points up to the current timestamp. The window starts at the beginning of the time series and grows (expands) with each new observation.

Key Parameters:
| Parameter    | Purpose                                                               | Notes                                                                 |
|--------------|----------------------------------------------------------------------|----------------------------------------------------------------------|
| min_periods  | Minimum number of observations in the expanding window required.      | Default = 1. If fewer than `min_periods`, result is NaN.              |
| axis         | Axis along which the expanding calculation is applied.                | Default = 0 (rows). Deprecated in pandas 2.x.                         |
| method       | Calculation method: `'single'` or `'table'`.                          | `'table'` is experimental, for DataFrames with multiple columns.      |

*Usage Notes*

* After `.expanding()`, you must call an aggregation/stat function, e.g.:

    * `.expanding().mean()` → expanding (cumulative) mean

    * `.expanding().sum()` → cumulative sum

    * `.expanding().max()`, `.expanding().var()`, `.expanding().apply(func)`

* It’s very similar to `.rolling()`, but instead of a sliding window, the window starts at the first element and expands until the current index.
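
The growing-window idea can be checked by hand on four numbers. The sketch below uses made-up values and also shows the effect of `min_periods`.

```python
import pandas as pd

s = pd.Series([4.0, 2.0, 6.0, 8.0])

running_mean = s.expanding().mean()   # mean of everything seen so far
running_max = s.expanding().max()     # highest value seen so far
running_sum = s.expanding().sum()     # same result as s.cumsum()

# With min_periods=3 the first two rows do not have enough history yet
needs_three = s.expanding(min_periods=3).mean()

print(pd.DataFrame({
    "value": s,
    "running_mean": running_mean,
    "running_max": running_max,
    "running_sum": running_sum,
    "needs_three": needs_three,
}))
```

For sums, maxima and minima, `.cumsum()`, `.cummax()` and `.cummin()` give the same results more directly; `.expanding()` is the general form that also covers means, variances and custom functions.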

In [15]:
# Calculate the Expanding Mean (Cumulative Average)
boeing_df['Cumulative_Mean'] = boeing_df['last'].expanding(min_periods=1).mean()

boeing_df.head(15)

| timestamp | last | MA_10 | Volatility_3D | Cumulative_Mean |
| :--- | :--- | :--- | :--- | :--- |
| 2025-10-02 17:00:05 | 216.3 | NaN | NaN | 216.3 |
| 2025-10-02 16:30:06 | 215.2 | NaN | 0.777817 | 215.75 |
| 2025-10-02 16:00:00 | 215.2 | NaN | 0.635085 | 215.566667 |
| 2025-10-02 15:30:02 | 215.2 | NaN | 0.55 | 215.475 |
| 2025-10-02 15:00:03 | 215.2 | NaN | 0.491935 | 215.42 |
| 2025-10-02 14:30:01 | 215.2 | NaN | 0.449073 | 215.383333 |
| 2025-10-02 14:00:02 | 215.2 | NaN | 0.415761 | 215.357143 |
| 2025-10-02 13:30:00 | 215.2 | NaN | 0.388909 | 215.3375 |
| 2025-10-02 13:00:01 | 215.2 | NaN | 0.366667 | 215.322222 |
| 2025-10-02 12:30:01 | 215.2 | 215.31 | 0.347851 | 215.31 |


***

## **Summary & Key Takeaways 📝**

| Concept | Purpose | Key Method/Tool |
| :--- | :--- | :--- |
| **Foundational Structures** | Representing specific time points and durations. | **`df.index.property`** (e.g., `.year`), **`pd.Timedelta()`** |
| **Time Component Extraction** | Breaking dates/times into usable features (e.g., year, month, hour). | **`df.index.year`**, **`df.index.day_name()`** |
| **Lagging/Leading Data** | Comparing a current value to a past or future value. | **`.shift(N)`** (Positive $N$ for lag, Negative $N$ for lead) |
| **Changing Frequency** | Converting data from one time granularity to another (e.g., hourly to daily). | **`.resample(rule='...')`** |
| **Smoothing/Trend Analysis** | Calculating statistics over a fixed, sliding window to see trends. | **`.rolling(window=N)`** |
| **Cumulative Metrics** | Calculating statistics based on all data from the beginning up to the current point. | **`.expanding()`** |


***

### **Common Confusion ❓**

| Point of Confusion | Clarification |
| :--- | :--- |
| **`.dt` vs. Index Attributes** | The **`.dt` accessor** is used on a Pandas **Series** of datetime objects. For a **`DatetimeIndex`** (the DataFrame's index), properties like `.year`, `.month`, and methods like `.day_name()` are accessed **directly** (e.g., `df.index.year`). |
| **`df.shift(1)` vs. `df.shift(1, freq='D')`** | **`.shift(1)`** moves the data by one row, regardless of time gaps, and leaves the index unchanged. **`.shift(1, freq='D')`** moves the *index* forward by one day and keeps each value with its row. To lag by a fixed time step on irregular data, resample to a regular frequency first and then call `.shift(N)`. Stick to standard `.shift(N)` for simple lag/lead features. |
| **Time Offset vs. Period Offset** | **`window=10`** means 10 *observations*. **`window='3D'`** means all observations within the trailing 3-day (72-hour) span. Use the time offset when your data is irregularly spaced. |
| **`Timedelta` vs. `DateOffset`** | A **`Timedelta`** is a fixed duration (e.g., always 5 days). A **`pd.DateOffset()`** is a calendar-aware duration (e.g., `pd.DateOffset(months=1)` means the same day in the next month, handling month-end differences). |
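
The last row of the table is easiest to see with a month-end date. A small sketch, using an arbitrary start date of 31 January 2025:

```python
import pandas as pd

start = pd.Timestamp("2025-01-31")

# Timedelta: always exactly 30 * 24 hours, regardless of the calendar
fixed = start + pd.Timedelta(days=30)

# DateOffset: "one calendar month later", clipped to the last valid day of that month
calendar = start + pd.DateOffset(months=1)

print(fixed)
print(calendar)
```

February 2025 has 28 days, so the fixed 30-day step overshoots into March while the calendar step stops at the end of February.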


***

### **When to Use Which? (Resample, Rolling, Expanding) 💡**

These three methods are the "Big Three" of time series aggregation, but they serve completely different purposes.

#### **1. `.resample(rule='...')`**
* **Goal:** Change the **Granularity** (change the size of the time buckets).
* **Logic:** **Bucket-based aggregation.** Data is grouped by new, fixed time intervals (e.g., every Monday, every 5 minutes).
* **Use Case:** Converting 1-minute stock prices to **Daily** Open-High-Low-Close (OHLC), or converting sales data from daily to **Monthly** sums.

#### **2. `.rolling(window=N)`**
* **Goal:** **Smooth** out noise and reveal underlying **Trends**.
* **Logic:** **Sliding Window aggregation.** For any given point, the calculation only looks at a fixed window of $N$ points: the current point and the $N-1$ points immediately preceding it.
* **Use Case:** Calculating a **10-day Moving Average (MA)** to see the current trend, or calculating a **30-day Rolling Standard Deviation** to track volatility.

#### **3. `.expanding()`**
* **Goal:** Compute **Cumulative** metrics based on full history.
* **Logic:** **Growing Window aggregation.** The window starts at the first data point and includes *everything* up to the current point.
* **Use Case:** Finding the **Cumulative Average** price of a stock since its IPO, or tracking the total **Cumulative Sum** of revenue over the fiscal year.
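
To compare the three side by side, the sketch below applies each of them to the same made-up series of six half-day observations. Only the shape and meaning of the result differ.

```python
import pandas as pd

idx = pd.date_range("2025-01-01", periods=6, freq="12h")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)

# resample: fewer rows, one per day (bucket-based)
resampled = s.resample("D").mean()

# rolling: same number of rows, mean of the current and previous row (sliding window)
rolled = s.rolling(window=2).mean()

# expanding: same number of rows, mean of everything so far (growing window)
expanded = s.expanding().mean()

print(len(s), len(resampled), len(rolled), len(expanded))
print(resampled)
print(pd.DataFrame({"value": s, "rolled": rolled, "expanded": expanded}))
```

`.resample()` changes the index to the new frequency, whereas `.rolling()` and `.expanding()` keep the original index and add a derived value to every row.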