# Feature Engineering for Time Series

This notebook demonstrates how to create features from time series data using TimeSmith's featurizers.

## What You'll Learn

- Lag features
- Rolling window statistics
- Time-based features
- Differencing
- Seasonal features
- Combining features with FeatureUnion

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from timesmith import (
    LagFeaturizer,
    RollingFeaturizer,
    TimeFeaturizer,
    DifferencingFeaturizer,
    SeasonalFeaturizer,
    FeatureUnion,
)

np.random.seed(42)
print("Feature engineering tools loaded!")

## 1. Create Time Series Data

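Let's build a synthetic daily series of 100 points: a linear trend from 100 to 150, a weekly (7-day) sine cycle with amplitude 10, and Gaussian noise with standard deviation 5.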

In [None]:
# Create time series with trend and seasonality
dates = pd.date_range('2020-01-01', periods=100, freq='D')
trend = np.linspace(100, 150, len(dates))
seasonal = 10 * np.sin(2 * np.pi * np.arange(len(dates)) / 7)
noise = np.random.normal(0, 5, len(dates))
y = pd.Series(trend + seasonal + noise, index=dates)

plt.figure(figsize=(12, 5))
plt.plot(y.index, y.values, linewidth=2)
plt.title('Original Time Series', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Value')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

print(f"Original shape: {y.shape}")

## 2. Lag Features

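Lag features expose past values of the series as columns: a lag of k holds the value observed k steps earlier. The lags below cover the previous three days and the same weekday one week earlier.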

In [None]:
# Create lag features
lag_featurizer = LagFeaturizer(lags=[1, 2, 3, 7])
lag_features = lag_featurizer.fit_transform(y)

print(f"Original shape: {y.shape}")
print(f"Features shape: {lag_features.shape}")
print(f"\nFeature columns: {list(lag_features.columns)}")
print("\nFirst few rows:")
print(lag_features.head())
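To make the idea concrete, here is a minimal plain-pandas sketch of lag features built with `Series.shift`. It is not TimeSmith's implementation: the helper `make_lags` and the `lag_k` column names are hypothetical, and `LagFeaturizer` may name columns or handle missing rows differently.

```python
import numpy as np
import pandas as pd

# Small deterministic series so the shifted values are easy to read.
dates = pd.date_range("2020-01-01", periods=10, freq="D")
y_demo = pd.Series(np.arange(10, dtype=float), index=dates)


def make_lags(series, lags):
    """Hypothetical helper: one column per lag, built with Series.shift."""
    return pd.DataFrame({f"lag_{k}": series.shift(k) for k in lags})


lag_demo = make_lags(y_demo, lags=[1, 2, 3, 7])

# The first max(lags) rows contain NaN because no earlier values exist.
print(lag_demo.isna().sum())
print(lag_demo.dropna())
```

In this pandas version the first seven rows are incomplete, so only three of the ten rows survive `dropna()`. Longer lags cost more usable training rows.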

## 3. Rolling Window Features

Rolling features summarize a sliding window of recent observations. The 7-day window below matches the weekly cycle in the data.

In [None]:
# Rolling statistics
rolling_featurizer = RollingFeaturizer(window=7, functions=['mean', 'std', 'min', 'max'])
rolling_features = rolling_featurizer.fit_transform(y)

print(f"Rolling features shape: {rolling_features.shape}")
print(f"\nFeature columns: {list(rolling_features.columns)}")
print("\nFirst few rows:")
print(rolling_features.head())

# Visualize
fig, axes = plt.subplots(2, 1, figsize=(12, 8))
axes[0].plot(y.index, y.values, label='Original', linewidth=2)
axes[0].plot(rolling_features.index, rolling_features['rolling_mean'],
             label='Rolling Mean', linewidth=2, linestyle='--')
axes[0].set_title('Original vs Rolling Mean', fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(rolling_features.index, rolling_features['rolling_std'],
             label='Rolling Std', linewidth=2, color='orange')
axes[1].set_title('Rolling Standard Deviation', fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
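For comparison, rolling statistics can be sketched in plain pandas with `Series.rolling`. This is an illustration under assumptions, not TimeSmith's code: the `rolling_` prefix simply mirrors the column names used in the plot above, and whether `RollingFeaturizer` includes the current observation in its window is not something this notebook confirms.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-01-01", periods=14, freq="D")
y_demo = pd.Series(np.arange(14, dtype=float), index=dates)

window = 7

# One column per statistic; the first window - 1 rows are NaN.
rolling_demo = y_demo.rolling(window=window).agg(["mean", "std", "min", "max"])
rolling_demo.columns = [f"rolling_{name}" for name in rolling_demo.columns]

# pandas windows include the current value. When the feature is used to
# predict that same value, shift by one step first so that only past
# observations enter the window.
rolling_past_mean = y_demo.shift(1).rolling(window=window).mean()

print(rolling_demo.head(8))
print(rolling_past_mean.head(9))
```

The shifted variant is the safer choice for forecasting targets, because an unshifted window already contains the value being predicted.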

## 4. Time Features

Extract time-based features from the index.

In [None]:
# Time-based features
# The index is daily, so 'hour' would be 0 in every row; it is omitted here
time_featurizer = TimeFeaturizer(features=['day_of_week', 'month', 'is_weekend'])
time_features = time_featurizer.fit_transform(y)

print(f"Time features shape: {time_features.shape}")
print(f"\nFeature columns: {list(time_features.columns)}")
print("\nFirst few rows:")
print(time_features.head())
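Calendar features like these can be read directly from a pandas `DatetimeIndex`. The sketch below uses only pandas attributes; the column names are chosen to match the feature names above, and the 0/1 encoding of `is_weekend` is an assumption rather than TimeSmith's documented behavior.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=7, freq="D")

time_demo = pd.DataFrame(index=dates)
time_demo["day_of_week"] = dates.dayofweek.to_numpy()  # Monday=0 ... Sunday=6
time_demo["month"] = dates.month.to_numpy()
# Assumed encoding: 1 for Saturday or Sunday, 0 otherwise.
time_demo["is_weekend"] = (dates.dayofweek >= 5).astype(int)
# On a daily index every timestamp is midnight, so hour is constant.
time_demo["hour"] = dates.hour.to_numpy()

print(time_demo)
```

A constant column such as `hour` on daily data carries no information, so it is worth checking the index frequency before choosing which calendar features to request.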

## 5. Differencing

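First-order differencing replaces each value with its change from the previous step, which removes a linear trend. The weekly cycle remains in the differenced series.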

In [None]:
# Differencing
diff_featurizer = DifferencingFeaturizer(order=1)
diff_features = diff_featurizer.fit_transform(y)

print(f"Differenced series shape: {diff_features.shape}")
print("\nFirst few values:")
print(diff_features.head())

# Visualize
fig, axes = plt.subplots(2, 1, figsize=(12, 8))
axes[0].plot(y.index, y.values, linewidth=2, label='Original')
axes[0].set_title('Original Series', fontweight='bold')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

axes[1].plot(diff_features.index, diff_features.values, linewidth=2, label='Differenced', color='orange')
axes[1].set_title('First Difference (Detrended)', fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].legend()
axes[1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
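The effect of first-order differencing is easiest to see on a noise-free linear trend. This is a small pandas sketch using `Series.diff`, not `DifferencingFeaturizer` itself: the differenced series is constant (equal to the slope), and a cumulative sum recovers the original values.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-01-01", periods=10, freq="D")

# Pure linear trend with slope 0.5 per day.
trend_demo = pd.Series(100.0 + 0.5 * np.arange(10), index=dates)

# y[t] - y[t-1]; the first value is NaN because there is no predecessor.
diff_demo = trend_demo.diff(periods=1)

# Invert the transform: seed with the first level, then accumulate changes.
restored = diff_demo.fillna(trend_demo.iloc[0]).cumsum()

print(diff_demo.head())
print(np.allclose(restored, trend_demo))
```

Keeping the first original value around matters in practice: without it, forecasts made on the differenced scale cannot be converted back to the original scale.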

## 6. Seasonal Features

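Extract features tied to the repeating cycle in the series. Setting `seasonal_period=7` matches the weekly cycle built into the synthetic data.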

In [None]:
# Seasonal features
seasonal_featurizer = SeasonalFeaturizer(seasonal_period=7)
seasonal_features = seasonal_featurizer.fit_transform(y)

print(f"Seasonal features shape: {seasonal_features.shape}")
print(f"\nFeature columns: {list(seasonal_features.columns)}")
print("\nFirst few rows:")
print(seasonal_features.head())
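One common way to encode a known period is a sine/cosine (Fourier) pair. The sketch below shows that technique with NumPy and pandas only. This notebook does not say what columns `SeasonalFeaturizer` produces, so treat this as an independent illustration with hypothetical column names, not a description of its output.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-01-01", periods=21, freq="D")
period = 7

# Position in time, measured in steps from the start of the series.
t = np.arange(len(dates))

# The sine/cosine pair places each step on a circle, so the last day of
# a cycle sits next to the first day of the following cycle.
seasonal_demo = pd.DataFrame(
    {
        "season_sin": np.sin(2 * np.pi * t / period),
        "season_cos": np.cos(2 * np.pi * t / period),
    },
    index=dates,
)

print(seasonal_demo.head(8))
```

Unlike an integer position within the cycle, this encoding has no artificial jump at the cycle boundary, and rows exactly one period apart receive identical values.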

## 7. Feature Union: Combine Multiple Featurizers

FeatureUnion runs several featurizers on the same series and joins their outputs column-wise into a single feature table.

In [None]:
# Combine multiple featurizers
feature_union = FeatureUnion([
    ('lags', LagFeaturizer(lags=[1, 2, 3])),
    ('rolling', RollingFeaturizer(window=7, functions=['mean', 'std'])),
    ('time', TimeFeaturizer(features=['day_of_week', 'month']))
])

all_features = feature_union.fit_transform(y)

print(f"Combined features shape: {all_features.shape}")
print(f"\nTotal features: {all_features.shape[1]}")
print(f"\nFeature columns (first 10): {list(all_features.columns)[:10]}")
print("\nFirst few rows:")
print(all_features.head())
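The column-wise combination can be approximated in plain pandas with `pd.concat(axis=1)`. This sketch assumes nothing about how `FeatureUnion` names columns or treats incomplete rows. It also shows one way to drop the warm-up rows while keeping the features and the target aligned.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2020-01-01", periods=30, freq="D")
y_demo = pd.Series(np.arange(30, dtype=float), index=dates)

# Three feature blocks that share the same index.
lags = pd.DataFrame({f"lag_{k}": y_demo.shift(k) for k in (1, 2, 3)})
rolling = y_demo.rolling(7).agg(["mean", "std"]).add_prefix("rolling_")
calendar = pd.DataFrame(
    {
        "day_of_week": dates.dayofweek.to_numpy(),
        "month": dates.month.to_numpy(),
    },
    index=dates,
)

# Join column-wise; rows are matched on the shared DatetimeIndex.
X = pd.concat([lags, rolling, calendar], axis=1)

# Drop rows with missing features together with their targets.
data = X.assign(target=y_demo).dropna()
X_clean = data.drop(columns="target")
y_clean = data["target"]

print(X.shape, X_clean.shape)
```

Dropping rows on the joined table, rather than on each block separately, keeps every feature row paired with the correct target value.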

## Summary

You've learned:
- How to create lag features from past values
- How to compute rolling window statistics
- How to extract time-based features
- How to difference series to remove trends
- How to extract seasonal patterns
- How to combine multiple featurizers with FeatureUnion

**Tips:**
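- Lag features capture temporal dependencies: each column holds the value from a fixed number of steps back
- Rolling features capture local patterns such as the recent level (mean) and variability (std)
- Time features capture calendar effects such as weekday versus weekend behavior
- Differencing helps with non-stationary data by removing trends
- FeatureUnion makes it easy to combine different feature types in a single call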