# 📘 Section 8: Time Series and Date Operations in Pandas

**Level:** Intermediate → Advanced

Time-series analysis is one of Pandas' most powerful capabilities. It lets you work efficiently with time-indexed data such as stock prices, IoT sensor readings, or sales trends.

In this section, we'll cover:
- Converting strings to datetime objects
- Indexing, resampling, and shifting time-series
- Rolling windows and moving averages
- Handling missing or irregular time intervals
- Real-world time-series applications

---

## 🔹 8.1 Creating and Parsing Date Columns

Let's create a dataset that simulates **daily sales** for an online retail company over 2 months.

In [None]:
import pandas as pd
import numpy as np

# Simulate daily sales data for 2 months
date_rng = pd.date_range(start='2024-01-01', end='2024-02-29', freq='D')
np.random.seed(42)

sales_data = pd.DataFrame({
    'date': date_rng,
    'sales': np.random.randint(200, 1000, size=len(date_rng)),
    'returns': np.random.randint(0, 50, size=len(date_rng))
})

sales_data.head()

### Converting to Datetime and Setting Index

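Convert a column to datetime with `pd.to_datetime()`, then set it as the index to enable time-based operations. Here the `date` column is already `datetime64` because it came from `pd.date_range()`, so the conversion changes nothing, but the same call is required when dates arrive as strings.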

In [None]:
sales_data['date'] = pd.to_datetime(sales_data['date'])
sales_data = sales_data.set_index('date')
sales_data.info()
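
When dates arrive as text, parsing does real work. The sketch below assumes a small hypothetical export (the `raw` frame and its values are made up for illustration) with one malformed entry, and shows one way to parse, discard bad rows, and index by date.

```python
import pandas as pd

# Hypothetical raw export: dates arrive as strings, one of them malformed
raw = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', 'not a date', '2024-01-04'],
    'sales': [250, 310, 275, 290],
})

# An explicit format avoids ambiguous day/month guessing;
# errors='coerce' turns unparseable values into NaT instead of raising
raw['date'] = pd.to_datetime(raw['date'], format='%Y-%m-%d', errors='coerce')

# Drop rows whose date could not be parsed, then index by date
parsed = raw.dropna(subset=['date']).set_index('date').sort_index()

print(parsed)
```

Whether to drop or repair unparseable rows depends on the dataset; dropping is only the simplest choice.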

## 🔹 8.2 Resampling and Frequency Conversion

Resampling aggregates data to a different frequency, for example **weekly or monthly sales summaries**.

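We'll use `resample()` with common frequency aliases:
- `'W'` → Weekly (weeks ending on Sunday)
- `'ME'` → Month end (`'M'` before pandas 2.2)
- `'QE'` → Quarter end (`'Q'` before pandas 2.2)
- `'YE'` → Year end (`'A'` before pandas 2.2)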

In [None]:
# Weekly total sales
weekly_sales = sales_data['sales'].resample('W').sum()
weekly_sales.head()

In [None]:
# Monthly summary (sales and returns)
# 'ME' (month end) requires pandas >= 2.2; on older versions use 'M'
monthly_summary = sales_data.resample('ME').agg({'sales': 'sum', 'returns': 'sum'})
monthly_summary
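
To make the binning behavior concrete, here is a minimal sketch on synthetic data (the `daily` series is an assumption, not the `sales_data` frame above). It uses `'MS'` (month start) for the monthly bins because that alias is accepted by both older and newer pandas versions.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', '2024-02-29', freq='D')
daily = pd.Series(np.arange(len(idx)), index=idx, name='sales')

# 'W' is shorthand for 'W-SUN': each bin ends on, and is labeled by, a Sunday
weekly = daily.resample('W').sum()
print(weekly.index.day_name().unique())

# Several aggregations at once; 'MS' labels each bin by the first day of the month
monthly = daily.resample('MS').agg(['sum', 'mean', 'max'])
print(monthly)

# Resampling regroups rows without dropping any, so totals are preserved
print(weekly.sum() == daily.sum())
```

Note that the last weekly bin is labeled 2024-03-03 even though the data ends on 2024-02-29, because the label is the end of the bin rather than the last observed date.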

## 🔹 8.3 Rolling Windows and Moving Averages

Rolling operations help analyze trends and smooth out short-term fluctuations.

For example, compute a **7-day moving average** for sales and returns.

In [None]:
sales_data['7d_avg_sales'] = sales_data['sales'].rolling(window=7).mean()
sales_data['7d_avg_returns'] = sales_data['returns'].rolling(window=7).mean()

sales_data[['sales', '7d_avg_sales']].head(10)
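
The first six rows of a 7-row rolling mean are `NaN` because the window is incomplete. The sketch below, on a small synthetic series (an assumption for illustration), compares that default with `min_periods=1` and with a time-based `'7D'` window.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=10, freq='D')
sales = pd.Series(np.arange(10, dtype=float) * 10 + 100, index=idx)  # 100, 110, ...

# Default: the first window-1 rows are NaN because the window is incomplete
strict = sales.rolling(window=7).mean()

# min_periods=1 averages whatever is available so far
partial = sales.rolling(window=7, min_periods=1).mean()

# An offset window ('7D') spans 7 calendar days rather than 7 rows,
# which matters when some dates are missing from the index
by_time = sales.rolling('7D').mean()

print(pd.DataFrame({'strict': strict, 'partial': partial, '7D': by_time}))
```

On a gap-free daily index the `'7D'` window and the 7-row window with `min_periods=1` give the same numbers; they diverge only when dates are missing.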

## 🔹 8.4 Time Shifting and Lag Features

We can shift time-series data forward or backward to build lag features, e.g., the **previous day's sales** as an input for forecasting.

In [None]:
sales_data['prev_day_sales'] = sales_data['sales'].shift(1)
sales_data['sales_diff'] = sales_data['sales'] - sales_data['prev_day_sales']
sales_data[['sales', 'prev_day_sales', 'sales_diff']].head(10)
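
The same idea extends to longer lags. The following sketch uses synthetic data and hypothetical column names (`lag_1`, `lag_7`, `wow_change`) to show a one-week lag, the `diff()` shorthand, and a week-over-week change.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=21, freq='D')
df = pd.DataFrame({'sales': np.arange(21) + 100}, index=idx)

# Lag features: value one day ago and on the same weekday one week ago
df['lag_1'] = df['sales'].shift(1)
df['lag_7'] = df['sales'].shift(7)

# diff() is shorthand for sales - sales.shift(1)
df['diff_1'] = df['sales'].diff()

# Week-over-week relative change
df['wow_change'] = df['sales'].pct_change(periods=7)

# Lagged columns start with NaN; drop those rows before fitting a model
features = df.dropna()
print(features.head())
```

A positive `shift()` moves past values forward in time, so each row only sees information that was already available, which is usually what a forecasting feature needs.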

## 🔹 8.5 Handling Missing Dates and Forward-Filling

Real-world time-series data often has missing timestamps. Pandas allows you to **reindex** and **fill** missing values easily.

In [None]:
# Randomly drop about 10% of the days to simulate gaps (seeded for reproducibility)
irregular_sales = sales_data.sample(frac=0.9, random_state=42).sort_index()

# Reindex to restore full date range
fixed_sales = irregular_sales.reindex(pd.date_range('2024-01-01', '2024-02-29', freq='D'))

# Forward fill missing values
fixed_sales_ffill = fixed_sales.ffill()
fixed_sales_ffill.head(10)
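
Forward-filling can hide long outages. As a hedged sketch on synthetic data (the dropped dates are chosen arbitrarily for illustration), the code below lists which dates are missing, caps the fill with `limit`, and shows linear interpolation as an alternative.

```python
import numpy as np
import pandas as pd

full_range = pd.date_range('2024-01-01', '2024-01-10', freq='D')
sales = pd.Series(np.arange(10, dtype=float) * 10 + 200, index=full_range)

# Simulate gaps: remove one isolated day and a three-day run
gappy = sales.drop(pd.to_datetime(
    ['2024-01-03', '2024-01-06', '2024-01-07', '2024-01-08']))

# Which dates are missing from the expected daily range?
missing = full_range.difference(gappy.index)
print(missing)

restored = gappy.reindex(full_range)

# limit=1 fills at most one consecutive missing day, so long outages stay visible
ffilled = restored.ffill(limit=1)

# Linear interpolation is an alternative for smoothly varying quantities
interpolated = restored.interpolate(method='linear')
print(pd.DataFrame({'ffill_limit_1': ffilled, 'interpolated': interpolated}))
```

Which fill strategy is appropriate depends on the data: forward-fill suits values that persist until updated, while interpolation suits quantities that change gradually.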

## ⚙️ Under the Hood

- Pandas' **DatetimeIndex** supports efficient slicing, arithmetic, and resampling.
- The **`resample()`** method works like a time-based `groupby()`: it assigns rows to date-range bins, then aggregates each bin.
- **`rolling()`** returns a window object that slides over the data; the window can be a fixed number of rows or a time offset such as `'7D'`.
- Built-in aggregations are **vectorized** and implemented in optimized Cython/C code.
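
The points above can be checked with a short sketch on synthetic data (an assumption for illustration): partial-string slicing and index arithmetic on a `DatetimeIndex`, and the equivalence between `resample()` and a `groupby()` with `pd.Grouper`.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', '2024-02-29', freq='D')
s = pd.Series(np.arange(len(idx)), index=idx)

# Partial-string slicing: a month string selects every row in that month
january = s.loc['2024-01']

# Index arithmetic: shift every timestamp by a fixed offset
shifted_index = s.index + pd.Timedelta(days=1)

# resample() and groupby(pd.Grouper(freq=...)) bin rows the same way
via_resample = s.resample('W').sum()
via_grouper = s.groupby(pd.Grouper(freq='W')).sum()

print(len(january))
print(via_resample.index.equals(via_grouper.index))
```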

---

## 💼 Real-World Problem 1: Sales Performance Over Time

**Scenario:** You are analyzing daily sales data to detect revenue trends and possible slowdowns.

**Tasks:**
1. Calculate weekly sales growth rate.
2. Identify weeks with negative growth.
3. Visualize top 5 weeks by total revenue.

In [None]:
# Weekly sales growth rate
weekly = sales_data['sales'].resample('W').sum().to_frame()
weekly['growth_rate'] = weekly['sales'].pct_change().round(3)

# Weeks with negative growth
negative_growth = weekly[weekly['growth_rate'] < 0]
print('Weeks with negative growth:')
display(negative_growth)

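# Top 5 weeks by total revenue, shown as a bar chart (plotting requires matplotlib)
top_weeks = weekly.sort_values(by='sales', ascending=False).head(5)
top_weeks['sales'].plot(kind='bar', title='Top 5 weeks by total sales')
top_weeks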
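
One caveat with weekly growth rates: the data ends on Thursday 2024-02-29, so the final weekly bin covers only four days and will show a drop that is an artifact of the binning. A minimal sketch of one way to guard against this, using constant synthetic sales (an assumption chosen so that true growth is zero):

```python
import pandas as pd

idx = pd.date_range('2024-01-01', '2024-02-29', freq='D')
sales = pd.Series(500, index=idx, name='sales')  # constant sales, so true growth is 0

# Count how many daily rows fall into each weekly bin
weekly = sales.resample('W').agg(['sum', 'count'])

# Keep only complete 7-day weeks before computing growth
full_weeks = weekly[weekly['count'] == 7].copy()
full_weeks['growth_rate'] = full_weeks['sum'].pct_change().round(3)

# Bins with fewer than 7 days would otherwise distort the growth rate
partial_weeks = weekly[weekly['count'] < 7]
print(partial_weeks)
```

Excluding partial weeks is one option; scaling them to a per-day average is another, depending on how the report will be read.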

## 🌍 Real-World Problem 2: IoT Sensor Data Cleaning

**Scenario:** A manufacturing facility records sensor readings every minute. Some readings are missing due to downtime.

**Goal:** Resample data to 10-minute intervals and interpolate missing values for smoother trend analysis.

In [None]:
# Simulate 1-minute sensor data
rng = pd.date_range('2024-01-01', periods=300, freq='min')  # 'min' replaces the deprecated 'T' alias
sensor = pd.DataFrame({'timestamp': rng, 'temperature': np.random.normal(25, 2, size=300)})
sensor = sensor.sample(frac=0.9).sort_values('timestamp').set_index('timestamp')

# Resample and interpolate
sensor_resampled = sensor.resample('10min').mean().interpolate(method='linear')
sensor_resampled.head(10)
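
With randomly scattered gaps, most 10-minute bins still contain readings, so interpolation rarely has anything to fill. The sketch below assumes a hypothetical 25-minute outage instead, and tracks how many raw readings support each bin so that interpolated values can be told apart from measured ones.

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2024-01-01', periods=60, freq='min')
temps = pd.Series(np.linspace(20.0, 26.0, 60), index=rng, name='temperature')

# Simulate a 25-minute outage (hypothetical downtime window)
outage = temps.loc['2024-01-01 00:20':'2024-01-01 00:44'].index
temps = temps.drop(outage)

bins = temps.resample('10min')
summary = pd.DataFrame({
    'mean_temp': bins.mean(),    # NaN where a bin holds no readings
    'n_readings': bins.count(),  # how many raw readings back each mean
})

# Flag empty bins before filling them, so estimates stay distinguishable
summary['interpolated'] = summary['n_readings'] == 0
summary['mean_temp'] = summary['mean_temp'].interpolate(method='linear')
print(summary)
```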

## ✅ Best Practices / Pitfalls

✅ Always ensure your time column is in `datetime64` format.
✅ Resample irregular data to a regular frequency before row-based rolling computations, so every window covers the same time span.
⚠️ Avoid `apply()` for resampling; prefer built-in vectorized methods such as `sum()` and `mean()`.
⚙️ When high-frequency data (seconds/milliseconds) stores elapsed time rather than absolute timestamps, consider converting it with **`pd.to_timedelta()`**.
📊 Use `.asfreq()` for precise time reindexing instead of `resample()` when no aggregation is needed.
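
To illustrate the last point, here is a small sketch on a three-row synthetic series (an assumption for illustration) contrasting `.asfreq()` with `resample()`.

```python
import pandas as pd

idx = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-05'])
s = pd.Series([10.0, 20.0, 50.0], index=idx)

# asfreq() only conforms the index to a daily grid; new rows are NaN
as_daily = s.asfreq('D')

# resample() groups rows into bins and needs an aggregation;
# empty bins sum to 0 rather than NaN
daily_sum = s.resample('D').sum()

# asfreq() can also fill the new rows directly
as_daily_ffill = s.asfreq('D', method='ffill')

print(pd.DataFrame({'asfreq': as_daily, 'resample_sum': daily_sum,
                    'asfreq_ffill': as_daily_ffill}))
```

The difference matters downstream: a `0` from `resample().sum()` reads as "no sales", while a `NaN` from `.asfreq()` reads as "no data".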

---

## 💪 Challenge Exercise

**Task:** Given hourly energy consumption data for multiple buildings, perform the following:
1. Convert timestamps to daily frequency and compute total energy per day.
2. Find top 3 buildings with most stable (lowest variance) consumption.
3. Detect and visualize weekends with unusually high energy usage.

_(Try implementing it on your own!)_

---
# --- End of Section 8. Continue to Section 9 ---