
# TimeSeriesGenerator — Pedagogical Documentation

This guide explains how to use the **`TimeSeriesGenerator`** to create realistic synthetic time series for teaching
forecasting. It focuses on *intuition first*, followed by precise parameter details and reproducible recipes you can
copy in class.

---

## 1) What is this?
A small, self-contained Python class that produces time series with configurable components:
- **Trend:** linear, polynomial, or piecewise (with change-points).
- **Seasonality:** one or many periodic patterns with optional harmonics.
- **Cycle:** slow, long-term oscillations (like business cycles).
- **Noise:** iid Gaussian or **ARMA**(p, q) style autocorrelated errors.
- **Heteroskedasticity:** time-varying uncertainty (ARCH(1) demo).
- **Regime switches:** hidden Markov states affecting level and noise.
- **Exogenous variables (X):** external drivers with a linear effect on `y`.
- **Outliers & Missingness:** inject bad data to test robustness.
- **Multivariate** correlation via shared latent factors.
- **Calendar features:** year, month, weekday, and weekly/yearly Fourier encodings.
- **Train/Val/Test splits:** quick wide-format dataset for modeling labs.

You also get helper plots to visualize the data and ground-truth components.

---

## 2) Quickstart

```python
# Assumes the TimeSeriesGenerator class is already defined or imported in this session.
gen = TimeSeriesGenerator(random_state=42)

df, meta = gen.make(
    n_series=2,
    n_steps=300,
    freq="D",
    trend={"type": "piecewise", "knots": [120, 220], "slopes": [0.02, -0.01, 0.03], "intercept": 5.0},
    seasonality=[{"period": 7.0, "amplitude": 1.5, "harmonics": 2},
                 {"period": 30.0, "amplitude": 0.8}],
    noise={"ar": [0.6], "ma": [0.2], "sigma": 0.8},
    outliers={"prob": 0.01, "scale": 6.0},
    missing={"prob": 0.02, "block_prob": 0.01, "block_max": 4},
    add_calendar=True
)

# Plot series 0 and its components
gen.plot_series(df, series_id=0)
gen.plot_decompose(meta, series_id=0)
```

To create a *ready-to-train* split in wide format:

```python
ds = gen.make_train_ready(
    n_series=3, n_steps=500, horizon=24,
    trend={"type": "linear", "slope": 0.03, "intercept": 2.0},
    seasonality=[{"period": 7.0, "amplitude": 2.0, "harmonics": 2}],
    noise={"ar": [0.5], "ma": [], "sigma": 1.0}
)

train, val, test = ds["train"], ds["val"], ds["test"]
```

---

## 3) Intuition for Each Component

### 3.1 Trend
- **Linear:** `y_t = intercept + slope * t`. Useful for steady growth/decline.
- **Polynomial:** adds curvature. Coefficients `[c0, c1, c2, ...]` build \( c_0 + c_1 t + c_2 t^2 + \cdots \).
- **Piecewise linear:** multiple linear segments with change-points (knots). Great to simulate structural breaks.
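Here is a minimal NumPy sketch of the three trend shapes. It assumes `t` is the integer step index `0..n-1`, that each knot is the step where the next slope takes over, and that the piecewise line stays continuous at the knots. `linear_trend`, `poly_trend`, and `piecewise_trend` are hypothetical helpers, not the class's internals.

```python
import numpy as np

def linear_trend(n: int, slope: float, intercept: float) -> np.ndarray:
    """intercept + slope * t for t = 0..n-1."""
    t = np.arange(n, dtype=float)
    return intercept + slope * t

def poly_trend(n: int, coeffs: list[float]) -> np.ndarray:
    """c0 + c1*t + c2*t^2 + ... (coefficients in increasing order of power)."""
    t = np.arange(n, dtype=float)
    return np.polynomial.polynomial.polyval(t, coeffs)

def piecewise_trend(n: int, knots: list[int], slopes: list[float],
                    intercept: float) -> np.ndarray:
    """Continuous piecewise-linear trend; slopes[i] applies between knots[i-1] and knots[i]."""
    if len(slopes) != len(knots) + 1:
        raise ValueError("need exactly one more slope than knots")
    t = np.arange(n, dtype=float)
    # Start with the first segment's line, then add a hinge term at each knot
    # that changes the slope by (new - old) from that knot onward.
    y = intercept + slopes[0] * t
    for k, (old, new) in zip(knots, zip(slopes[:-1], slopes[1:])):
        y += (new - old) * np.maximum(t - k, 0.0)
    return y

# Same configuration as the Quickstart example
trend = piecewise_trend(300, knots=[120, 220], slopes=[0.02, -0.01, 0.03], intercept=5.0)
```

The hinge terms \( \max(t-k, 0) \) are a common way to write a continuous piecewise-linear function: each one changes the slope at its knot without moving the line before it.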

### 3.2 Seasonality
- Repeating patterns with **period** \( P \), measured in time steps. For daily data, weekly \( P=7 \) and yearly \( P \approx 365.25 \).
- **Harmonics:** richer shapes by summing sines at multiples of the base frequency:
  \[ s_t = \sum_{h=1}^{H} \frac{A}{h}\sin\!\big(2\pi h (t+\phi)/P\big) \]
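A direct NumPy transcription of the harmonic formula might look like this. It assumes `t` is the integer step index and that `phase` is measured in time steps (as \( \phi \) appears inside \( t+\phi \)). `seasonal_component` is a hypothetical helper, not the generator's own method.

```python
import numpy as np

def seasonal_component(n: int, period: float, amplitude: float,
                       phase: float = 0.0, harmonics: int = 1) -> np.ndarray:
    """Sum of H sines at multiples of the base frequency, with amplitude A/h."""
    t = np.arange(n, dtype=float)
    s = np.zeros(n)
    for h in range(1, harmonics + 1):
        s += (amplitude / h) * np.sin(2 * np.pi * h * (t + phase) / period)
    return s

weekly = seasonal_component(70, period=7.0, amplitude=1.5, harmonics=2)
monthly = seasonal_component(70, period=30.0, amplitude=0.8)
season = weekly + monthly  # multiple seasonal patterns simply add up
```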

### 3.3 Cycles
- Slow oscillations with a long period \( P \), optionally with **frequency drift** (`freq_drift`) so the cycle length changes over time, mimicking non-stationarity.
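The exact treatment of `freq_drift` is not documented here. One plausible reading is sketched below with a hypothetical `drifting_cycle` helper: the frequency changes linearly over the series and is integrated into a phase, so the wave stays smooth instead of distorting.

```python
import numpy as np

def drifting_cycle(n: int, period: float, amplitude: float,
                   freq_drift: float = 0.0, phase: float = 0.0) -> np.ndarray:
    """Sinusoidal cycle whose frequency changes linearly over time.

    Assumption (illustrative only): the instantaneous frequency goes from
    1/period at t=0 to (1 + freq_drift)/period at t=n-1.
    """
    t = np.arange(n, dtype=float)
    inst_freq = (1.0 + freq_drift * t / max(n - 1, 1)) / period
    # Accumulate frequency into phase so the wave stays continuous;
    # multiplying t by a changing frequency would warp the shape instead.
    angle = 2 * np.pi * (np.cumsum(inst_freq) - inst_freq[0]) + 2 * np.pi * phase / period
    return amplitude * np.sin(angle)

steady = drifting_cycle(1000, period=200.0, amplitude=2.0)
faster = drifting_cycle(1000, period=200.0, amplitude=2.0, freq_drift=0.5)
```

With `freq_drift=0` this reduces to a plain sine of period \( P \); a positive drift makes later cycles shorter.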

### 3.4 Noise
- **iid**: purely random shocks.
- **ARMA(p, q):** autocorrelated errors; the current error depends on the previous \( p \) errors (AR coefficients `ar`) and the previous \( q \) random shocks (MA coefficients `ma`).
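For concreteness, an ARMA(p, q) error can be written as \( \varepsilon_t = \sum_{i=1}^{p}\phi_i\varepsilon_{t-i} + u_t + \sum_{j=1}^{q}\theta_j u_{t-j} \) with white-noise shocks \( u_t \sim \mathcal{N}(0,\sigma^2) \). The loop below simulates that recursion. The `+θ` sign convention and the burn-in length are assumptions (libraries differ on the MA sign), and `arma_noise` is a hypothetical helper.

```python
import numpy as np

def arma_noise(n: int, ar: list[float], ma: list[float], sigma: float,
               rng: np.random.Generator, burn_in: int = 200) -> np.ndarray:
    """Simulate e_t = sum_i ar[i]*e_{t-1-i} + u_t + sum_j ma[j]*u_{t-1-j}, u_t ~ N(0, sigma^2)."""
    p, q = len(ar), len(ma)
    total = n + burn_in
    u = rng.normal(0.0, sigma, size=total)  # white-noise shocks
    e = np.zeros(total)
    for t in range(total):
        ar_part = sum(ar[i] * e[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma_part = sum(ma[j] * u[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        e[t] = ar_part + u[t] + ma_part
    # Drop the burn-in so the zero start-up values do not bias the sample.
    return e[burn_in:]

rng = np.random.default_rng(42)
iid = arma_noise(20_000, ar=[], ma=[], sigma=0.8, rng=rng)
ar1 = arma_noise(20_000, ar=[0.6], ma=[], sigma=0.8, rng=rng)
lag1_corr = np.corrcoef(ar1[:-1], ar1[1:])[0, 1]  # close to 0.6 for AR(1)
```

For AR(1), the lag-1 autocorrelation equals \( \phi_1 \), which makes it an easy sanity check for students.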

### 3.5 Heteroskedasticity (ARCH(1))
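- Variance changes over time. A simple version, with \( \varepsilon_t = \sigma_t z_t \) and iid unit-variance \( z_t \):
  \[ \sigma_t^2 = \alpha_0 + \alpha_1 \varepsilon_{t-1}^2, \qquad \alpha_0 > 0,\ 0 \le \alpha_1 < 1, \]
  where the constraint on \( \alpha_1 \) keeps the unconditional variance \( \alpha_0/(1-\alpha_1) \) finite.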
- Great to show why constant-variance assumptions may fail.
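A short simulation of the ARCH(1) recursion follows, assuming Gaussian \( z_t \) and a hypothetical `arch1_noise` helper. The errors themselves are uncorrelated, but their squares are not; that is the volatility clustering described above.

```python
import numpy as np

def arch1_noise(n: int, alpha0: float, alpha1: float,
                rng: np.random.Generator, burn_in: int = 500) -> np.ndarray:
    """ARCH(1): eps_t = sigma_t * z_t, sigma_t^2 = alpha0 + alpha1 * eps_{t-1}^2, z_t ~ N(0, 1)."""
    if alpha0 <= 0 or not 0 <= alpha1 < 1:
        raise ValueError("need alpha0 > 0 and 0 <= alpha1 < 1 for a finite variance")
    total = n + burn_in
    z = rng.standard_normal(total)
    eps = np.zeros(total)
    prev_sq = alpha0 / (1 - alpha1)  # start at the unconditional variance
    for t in range(total):
        sigma2 = alpha0 + alpha1 * prev_sq  # conditional variance for step t
        eps[t] = np.sqrt(sigma2) * z[t]
        prev_sq = eps[t] ** 2
    return eps[burn_in:]

rng = np.random.default_rng(0)
eps = arch1_noise(50_000, alpha0=0.7, alpha1=0.3, rng=rng)
# Unconditional variance is 0.7 / (1 - 0.3) = 1.0, yet squared errors are autocorrelated.
sq_lag1_corr = np.corrcoef(eps[:-1] ** 2, eps[1:] ** 2)[0, 1]
```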

### 3.6 Regime Switching
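- Hidden states (e.g., “calm” vs “volatile”) follow a Markov chain whose row-stochastic transition matrix is the `p` argument: entry \( (i, j) \) is the probability of moving from state \( i \) to state \( j \).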
- Each state can change the **level** (bias) and **scale** of noise — perfect to illustrate breaks and regime-dependent behavior.
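The sketch below shows how a state path can be sampled and then mapped to a level shift and a noise scale. It assumes a uniform initial distribution when `pi0` is `None` and uses iid base noise for brevity, whereas the generator combines regimes with its ARMA noise. `simulate_regimes` is a hypothetical helper.

```python
from typing import Optional

import numpy as np
from numpy.typing import ArrayLike

def simulate_regimes(n: int, p: ArrayLike, pi0: Optional[ArrayLike],
                     rng: np.random.Generator) -> np.ndarray:
    """Sample a hidden state path from a Markov chain with row-stochastic matrix p."""
    p = np.asarray(p, dtype=float)
    k = p.shape[0]
    if not np.allclose(p.sum(axis=1), 1.0):
        raise ValueError("each row of p must sum to 1")
    pi0 = np.full(k, 1.0 / k) if pi0 is None else np.asarray(pi0, dtype=float)
    states = np.empty(n, dtype=int)
    states[0] = rng.choice(k, p=pi0)
    for t in range(1, n):
        # The next state depends only on the current one (Markov property).
        states[t] = rng.choice(k, p=p[states[t - 1]])
    return states

rng = np.random.default_rng(7)
states = simulate_regimes(2000, [[0.95, 0.05], [0.05, 0.95]], pi0=None, rng=rng)

state_bias = np.array([0.0, 2.0])
state_sigma_scale = np.array([1.0, 1.6])
base_noise = rng.normal(0.0, 0.8, size=2000)  # iid here for brevity
y = state_bias[states] + state_sigma_scale[states] * base_noise
```

With a 0.95 probability of staying put, regimes last about 20 steps on average, long enough for students to see them in a plot.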

### 3.7 Exogenous Variables (X)
- External drivers like marketing spend or temperature.
- Generator can produce **random walks**, **seasonal signals**, **binary events**, or **noise** features and applies a **linear** effect:
  \[ y_t \leftarrow y_t + X_t \beta. \]
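The following sketch applies the linear effect with two made-up features that mirror the `"random_walk"` and `"seasonal"` types, then checks that ordinary least squares recovers \( \beta \). How the generator actually builds each feature type is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
t = np.arange(n)

# Hypothetical versions of two exogenous feature types.
x_rw = np.cumsum(rng.normal(0.0, 1.0, size=n))  # random walk
x_seas = np.sin(2 * np.pi * t / 7.0)            # weekly seasonal driver
X = np.column_stack([x_rw, x_seas])             # shape (n, 2)
beta = np.array([0.5, -1.0])

baseline = 10.0 + rng.normal(0.0, 1.0, size=n)  # stand-in for trend + season + noise
y = baseline + X @ beta                         # y_t <- y_t + X_t beta

# Recover beta by least squares, with an intercept column for the baseline level.
design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
beta_hat = coef[1:]
```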

### 3.8 Outliers & Missingness
- Inject *point anomalies* or blocks of missing values to practice robust modeling and imputation.
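The exact injection scheme is not documented here. This sketch assumes outliers are spikes of ± `scale` standard deviations at points chosen with probability `prob`, and that `block_prob` is the per-step probability of starting a missing block of 1 to `block_max` steps. Both helpers are hypothetical.

```python
import numpy as np

def inject_outliers(y: np.ndarray, prob: float, scale: float,
                    rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Add +/- scale * std(y) spikes at randomly chosen points (assumed scheme)."""
    y = y.astype(float).copy()
    spread = np.nanstd(y)
    hit = rng.random(y.size) < prob
    signs = rng.choice([-1.0, 1.0], size=int(hit.sum()))
    y[hit] += signs * scale * spread
    return y, hit

def inject_missing(y: np.ndarray, prob: float, block_prob: float, block_max: int,
                   rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """Blank out isolated points plus contiguous blocks of 1..block_max steps."""
    mask = rng.random(y.size) < prob                     # isolated NaNs
    for start in np.flatnonzero(rng.random(y.size) < block_prob):
        length = rng.integers(1, block_max + 1)          # block length in [1, block_max]
        mask[start:start + length] = True
    out = y.astype(float).copy()
    out[mask] = np.nan
    return out, mask

rng = np.random.default_rng(3)
clean = 4.0 * np.sin(2 * np.pi * np.arange(1000) / 7.0)
dirty, outlier_mask = inject_outliers(clean, prob=0.02, scale=10.0, rng=rng)
dirty, missing_mask = inject_missing(dirty, prob=0.03, block_prob=0.02, block_max=7, rng=rng)
```

Keeping both masks lets students score imputation only on the points that were actually removed.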

### 3.9 Multivariate Correlation
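- Create multiple related series by mixing a few shared latent **factors** (`n_factors`) into each series; `mix_strength` \( \in [0,1] \) sets how much of each series comes from the shared factors rather than from series-specific noise.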
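One common factor-mixing scheme is sketched below: each series is \( \sqrt{s}\,(\text{factors} \cdot w_i) + \sqrt{1-s}\,\text{noise}_i \) with unit-norm loadings \( w_i \), so every series has unit variance and the cross-correlation is \( s\,(w_i \cdot w_j) \). Whether the generator uses exactly this form is an assumption; `mix_factors` is hypothetical, and seasonality or trend would be added on top.

```python
import numpy as np

def mix_factors(n_steps: int, n_series: int, n_factors: int, mix_strength: float,
                rng: np.random.Generator) -> np.ndarray:
    """Return an (n_steps, n_series) array of unit-variance, correlated series (assumed scheme)."""
    if not 0.0 <= mix_strength <= 1.0:
        raise ValueError("mix_strength must be in [0, 1]")
    factors = rng.standard_normal((n_steps, n_factors))
    loadings = rng.standard_normal((n_factors, n_series))
    loadings /= np.linalg.norm(loadings, axis=0, keepdims=True)  # unit-norm columns
    common = factors @ loadings                                    # variance 1 per series
    idio = rng.standard_normal((n_steps, n_series))                # series-specific noise
    return np.sqrt(mix_strength) * common + np.sqrt(1.0 - mix_strength) * idio

rng = np.random.default_rng(11)
Z = mix_factors(20_000, n_series=5, n_factors=2, mix_strength=0.7, rng=rng)
corr = np.corrcoef(Z, rowvar=False)  # 5 x 5 cross-series correlation matrix
```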

### 3.10 Calendar Features
- Adds integer date parts and Fourier features for weekly/yearly seasonality to help linear models.
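A pandas sketch of such features follows. The column names and Fourier orders are hypothetical, since the generator's exact calendar columns are not listed here; the yearly period of 365.25 days matches the seasonality section.

```python
import numpy as np
import pandas as pd

def calendar_features(index: pd.DatetimeIndex, weekly_order: int = 2,
                      yearly_order: int = 3) -> pd.DataFrame:
    """Integer date parts plus sin/cos Fourier terms for weekly and yearly cycles."""
    feats = pd.DataFrame(index=index)
    feats["year"] = index.year.to_numpy()
    feats["month"] = index.month.to_numpy()
    feats["weekday"] = index.dayofweek.to_numpy()            # Monday = 0
    dow = index.dayofweek.to_numpy(dtype=float)
    doy = index.dayofyear.to_numpy(dtype=float) - 1.0        # 0-based day of year
    for k in range(1, weekly_order + 1):
        feats[f"week_sin{k}"] = np.sin(2 * np.pi * k * dow / 7.0)
        feats[f"week_cos{k}"] = np.cos(2 * np.pi * k * dow / 7.0)
    for k in range(1, yearly_order + 1):
        feats[f"year_sin{k}"] = np.sin(2 * np.pi * k * doy / 365.25)
        feats[f"year_cos{k}"] = np.cos(2 * np.pi * k * doy / 365.25)
    return feats

idx = pd.date_range("2020-01-01", periods=400, freq="D")
cal = calendar_features(idx)
```

Sin/cos pairs let a linear model fit a smooth periodic shape with a few coefficients instead of one dummy per weekday or day of year.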

---

## 4) Full API (Key Arguments to `make`)

```python
df, meta = gen.make(
    n_series=1,            # number of parallel series
    n_steps=500,           # length of each series
    freq="D",              # pandas frequency string (e.g., "H", "D", "MS")
    start=None,            # default "2020-01-01" if None

    trend=None or {
        "type": "none" | "linear" | "poly" | "piecewise",
        # linear
        "slope": float, "intercept": float,
        # poly
        "coeffs": [c0, c1, c2, ...],
        # piecewise
        "knots": [k1, k2, ...], "slopes": [m1, m2, ...], "intercept": float,
    },

    seasonality=None or [
        {"period": float, "amplitude": float, "phase": float=0.0, "harmonics": int=1},
        ...
    ],

    cycle=None or {
        "period": float, "amplitude": float, "freq_drift": float=0.0, "phase": float=0.0
    },

    noise=None or {
        "ar": [phi1, phi2, ...], "ma": [theta1, ...], "sigma": float
    },

    regime=None or {
        "n_states": int,
        "p": [[...], [...], ...],                 # row-stochastic transition matrix
        "pi0": [..] or None,                      # initial probs (optional)
        "state_bias": [b1, b2, ...],
        "state_sigma_scale": [s1, s2, ...],
    },

    heterosked=None or {"type": "arch1", "alpha0": float, "alpha1": float},

    outliers=None or {"prob": float, "scale": float},

    missing=None or {"prob": float, "block_prob": float, "block_max": int},

    exog=None or {
        "n_features": int,
        "types": List[str],         # choices: "random_walk", "seasonal", "binary_event", "noise"
        "beta": List[float] or float
    },

    multivariate=None or {
        "n_factors": int, "mix_strength": float (0..1)
    },

    add_calendar=True,             # add date parts + Fourier features
    return_components=True,        # include component arrays in meta["components"]
    splits=None                    # (train,val,test) fractions that sum to 1.0
)
```

**Returns**
- `df`: tidy DataFrame with columns `["series", "time", "y", ...exog..., ...calendar...]`.
- `meta`: dictionary with:
  - `time_index`, `params` (echo of inputs),
  - `components`: dict of arrays (trend, seasonality, cycle, noise, regime_bias, exog_effect),
  - `states`: latent regime indices (if any),
  - `missing_mask`: boolean mask of injected NaNs,
  - `splits`: indices if `splits` is used.
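Many models expect one column per series rather than the tidy layout. Assuming `df` has the documented `series`, `time`, and `y` columns, pandas' `pivot` performs the reshape; the tiny frame below is made up purely to illustrate it.

```python
import numpy as np
import pandas as pd

# A tiny stand-in with the documented tidy layout: one row per (series, time).
times = pd.date_range("2020-01-01", periods=4, freq="D")
df = pd.DataFrame({
    "series": np.repeat([0, 1], len(times)),
    "time": np.tile(times.to_numpy(), 2),
    "y": [1.0, 2.0, np.nan, 4.0, 10.0, 11.0, 12.0, 13.0],
})

# Wide layout: one row per timestamp, one column per series.
wide = df.pivot(index="time", columns="series", values="y")

# Per-series summary; count and mean skip injected NaNs.
summary = df.groupby("series")["y"].agg(["count", "mean"])
```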

---

## 5) Common Teaching Recipes

### A. Pure Seasonality (weekly)
```python
df, meta = gen.make(
    n_series=1, n_steps=200, freq="D",
    seasonality=[{"period": 7.0, "amplitude": 3.0, "harmonics": 2}],
    noise={"ar": [], "ma": [], "sigma": 0.5}
)
```

### B. Trend + Seasonality + AR(1) Noise
```python
df, meta = gen.make(
    n_series=1, n_steps=300, freq="D",
    trend={"type": "linear", "slope": 0.02, "intercept": 5.0},
    seasonality=[{"period": 7.0, "amplitude": 1.5}],
    noise={"ar": [0.7], "ma": [], "sigma": 1.0}
)
```

### C. Regime Switch (Calm vs. Volatile)
```python
df, meta = gen.make(
    n_series=1, n_steps=400, freq="D",
    noise={"ar": [0.4], "ma": [], "sigma": 0.8},
    regime={
        "n_states": 2,
        "p": [[0.95, 0.05],[0.05, 0.95]],
        "state_bias": [0.0, 2.0],
        "state_sigma_scale": [1.0, 1.6]
    }
)
```

### D. Multivariate, Shared Factors
```python
df, meta = gen.make(
    n_series=5, n_steps=300, freq="D",
    seasonality=[{"period": 7.0, "amplitude": 1.0}],
    noise={"ar": [0.3], "ma": [0.2], "sigma": 0.7},
    multivariate={"n_factors": 2, "mix_strength": 0.7}
)
```

### E. Exogenous Drivers + Linear Effect
```python
df, meta = gen.make(
    n_series=1, n_steps=250, freq="D",
    exog={"n_features": 2, "types": ["random_walk", "seasonal"], "beta": [0.5, -1.0]},
    seasonality=[{"period": 7.0, "amplitude": 1.0}],
    noise={"ar": [], "ma": [], "sigma": 1.0}
)
```

---

## 6) Classroom Exercises

1. **Decomposition**: Generate data with trend + weekly seasonality + AR(1) noise.
   - Ask students to estimate trend/seasonality (moving averages, STL, or Fourier regression).
   - Compare estimated components to `meta["components"]` ground truth.

2. **Model selection**: Create several datasets varying AR and MA parameters.
   - Fit baselines: Naive, Seasonal Naive, ARIMA/ETS, and a small LSTM.
   - Discuss error curves and overfitting vs. underfitting.

3. **Robustness**: Inject outliers and missing blocks.
   - Have students compare simple imputation vs. model-based imputation.
   - Evaluate MAE/RMSE before/after cleaning.

4. **Regime awareness**: Use regime switching.
   - Fit a single-regime model and analyze residuals.
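   - Discuss how hidden states break the constant-parameter assumption of a single-regime model (e.g., the residual mean and variance shift between periods) and how to detect them.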

5. **Exogenous variables**: Include `X` with known β.
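   - Regress `y` on the contemporaneous `X` (plus trend/seasonal terms) and compare the estimated coefficients to the true β. Because the effect is \( X_t\beta \), any extra lags of `X` should come out near zero.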

6. **Multivariate forecasting**: Create 5 correlated series.
   - Compare per-series ARIMA vs. a multivariate model (VAR or a shared RNN).
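For exercise 1, here is a self-contained sketch of a classical moving-average decomposition on a made-up series whose components are known. With the generator, the ground truth would come from `meta["components"]` instead; the shapes of those arrays are not specified here, so the sketch builds its own.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n, period = 364, 7
t = np.arange(n)

# Known ground truth (made-up values): linear trend + weekly sine + AR(1) noise.
true_trend = 5.0 + 0.02 * t
true_season = 1.5 * np.sin(2 * np.pi * t / period)
noise = np.zeros(n)
shocks = rng.normal(0.0, 0.5, size=n)
for i in range(1, n):
    noise[i] = 0.6 * noise[i - 1] + shocks[i]
y = pd.Series(true_trend + true_season + noise)

# 1) A centered 7-point moving average estimates the trend.
est_trend = y.rolling(window=period, center=True).mean()
# 2) Average the detrended values at each position in the week.
detrended = y - est_trend
pattern = detrended.groupby(t % period).mean()
pattern -= pattern.mean()  # additive seasonal effects should sum to zero
est_season = pattern.to_numpy()[t % period]

valid = est_trend.notna().to_numpy()
trend_rmse = np.sqrt(np.mean((est_trend.to_numpy()[valid] - true_trend[valid]) ** 2))
season_rmse = np.sqrt(np.mean((est_season - true_season) ** 2))
```

A 7-point average spans exactly one weekly cycle, so it cancels the seasonal term and leaves the trend plus smoothed noise.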

---

## 7) Tips for Forecasting Labs

- **Hold-out strategy:** Use `make_train_ready(horizon=H)` to get train/val/test splits aligned with real forecasting.
- **Scaling:** Many models benefit from standardization; remember to invert transforms before evaluating.
- **Feature engineering:** Try Fourier terms and calendar dummies for seasonality in linear models.
- **Lagged features:** For tree/boosting/MLP baselines, build lagged windows of `y` and `X` (e.g., lags 1..24).
- **Evaluation:** Always compare a model to Naive/Seasonal-Naive baselines.
- **Reproducibility:** Set `random_state`, and save `meta["params"]` (an echo of the inputs) alongside your results so the exact configuration can be regenerated.
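Several of these tips fit in a few lines of pandas. The sketch below uses a made-up, purely weekly series and hypothetical helpers (`make_lag_features`, `seasonal_naive`, `mae`). It scales with training statistics only, builds lag features with `Series.shift`, and scores Naive and Seasonal Naive baselines.

```python
import numpy as np
import pandas as pd

def make_lag_features(y: pd.Series, lags: range) -> pd.DataFrame:
    """Columns y_lag1..y_lagK; row t only uses values observed before t."""
    return pd.DataFrame({f"y_lag{k}": y.shift(k) for k in lags})

def seasonal_naive(history: np.ndarray, horizon: int, season: int) -> np.ndarray:
    """Repeat the last full season forward for `horizon` steps."""
    last_season = history[-season:]
    reps = int(np.ceil(horizon / season))
    return np.tile(last_season, reps)[:horizon]

def mae(actual: np.ndarray, forecast: np.ndarray) -> float:
    return float(np.mean(np.abs(actual - forecast)))

t = np.arange(140)
y = pd.Series(10.0 + 3.0 * np.sin(2 * np.pi * t / 7.0))  # toy weekly series
train, test = y.iloc[:-14], y.iloc[-14:]

# Scale with training statistics only, so no test information leaks in.
mu, sd = train.mean(), train.std()
train_scaled = (train - mu) / sd
# ... fit a model on train_scaled, then invert its forecasts with forecast * sd + mu

lags = make_lag_features(train, range(1, 8))
fc_naive = np.repeat(train.iloc[-1], len(test))  # Naive: repeat the last value
fc_snaive = seasonal_naive(train.to_numpy(), len(test), season=7)
scores = {"naive": mae(test.to_numpy(), fc_naive),
          "seasonal_naive": mae(test.to_numpy(), fc_snaive)}
```

On a purely periodic series, Seasonal Naive is essentially perfect, which is why any model should be checked against it before claiming an improvement.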

---

## 8) Troubleshooting

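- **Exploding values:** Check trend slopes and AR coefficients. For AR, keep all roots of \( 1 - \phi_1 z - \cdots - \phi_p z^p \) outside the unit circle. For AR(1) this reduces to \( |\phi_1| < 1 \); for higher orders, having every \( |\phi_i| < 1 \) is not sufficient (e.g., `ar=[0.5, 0.6]` is non-stationary).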
- **Weird seasonality shape:** Increase `harmonics` or adjust `phase`.
- **Too many NaNs:** Reduce `missing` probabilities or `block_max`.
- **Exog mismatch:** Ensure `exog["types"]` length equals `exog["n_features"]`; set `beta` length to 1 or `n_features`.
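To check AR coefficients before generating data, a small root test with `numpy.roots` is enough. `ar_is_stationary` is a hypothetical helper, not part of the generator.

```python
import numpy as np

def ar_is_stationary(ar: list[float]) -> bool:
    """True if all roots of 1 - phi_1 z - ... - phi_p z^p lie outside the unit circle."""
    if not any(ar):
        return True  # no (or all-zero) AR terms: nothing can explode
    # np.roots expects coefficients ordered from the highest power down to the constant.
    coeffs = [-phi for phi in reversed(ar)] + [1.0]
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

checks = {
    "[0.6]": ar_is_stationary([0.6]),
    "[1.1]": ar_is_stationary([1.1]),
    "[0.5, 0.3]": ar_is_stationary([0.5, 0.3]),
    "[0.5, 0.6]": ar_is_stationary([0.5, 0.6]),
}
```

`[0.5, 0.6]` fails even though each coefficient is below 1, because one root of \( 1 - 0.5z - 0.6z^2 \) lies at about \( z \approx 0.94 \).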

---

## 9) License and Attribution

You may use this generator freely for teaching and student projects.
Please retain a small attribution comment if you redistribute the class in course materials.


---

## 📦 Template Patterns for Forecasting

```python
# Typical Time Series Templates
# ---------------------------------
# Each function returns (df, meta) with a clear "story".
# Students can import and call them directly, then train their model.

from typing import Tuple, Dict
import pandas as pd

def template_trend_seasonal_noise(gen) -> Tuple[pd.DataFrame, Dict]:
    """Upward trend + strong weekly seasonality + AR(1) noise."""
    return gen.make(
        n_series=1, n_steps=500, freq="D",
        trend={"type": "linear", "slope": 0.05, "intercept": 10.0},
        seasonality=[{"period": 7.0, "amplitude": 5.0, "harmonics": 2}],
        noise={"ar": [0.6], "ma": [], "sigma": 1.0}
    )

def template_regime_switch(gen) -> Tuple[pd.DataFrame, Dict]:
    """Two regimes: calm vs. volatile (level shift + variance)."""
    return gen.make(
        n_series=1, n_steps=600, freq="D",
        noise={"ar": [0.4], "ma": [], "sigma": 0.8},
        regime={
            "n_states": 2,
            "p": [[0.95, 0.05], [0.05, 0.95]],
            "state_bias": [0.0, 5.0],
            "state_sigma_scale": [1.0, 2.0]
        }
    )

def template_outliers_missing(gen) -> Tuple[pd.DataFrame, Dict]:
    """Weekly seasonality with injected outliers and missing blocks."""
    return gen.make(
        n_series=1, n_steps=400, freq="D",
        seasonality=[{"period": 7.0, "amplitude": 4.0}],
        noise={"ar": [], "ma": [], "sigma": 1.0},
        outliers={"prob": 0.02, "scale": 10.0},
        missing={"prob": 0.03, "block_prob": 0.02, "block_max": 7}
    )

def template_multivariate_shared(gen) -> Tuple[pd.DataFrame, Dict]:
    """5 correlated series driven by common factors + seasonality."""
    return gen.make(
        n_series=5, n_steps=500, freq="D",
        seasonality=[{"period": 7.0, "amplitude": 2.0}],
        noise={"ar": [], "ma": [], "sigma": 1.0},
        multivariate={"n_factors": 2, "mix_strength": 0.7}
    )

def template_exog_driven(gen) -> Tuple[pd.DataFrame, Dict]:
    """Series influenced by exogenous drivers (random walk + seasonal)."""
    return gen.make(
        n_series=1, n_steps=400, freq="D",
        seasonality=[{"period": 7.0, "amplitude": 2.0}],
        exog={"n_features": 2, "types": ["random_walk", "seasonal"], "beta": [0.5, -1.2]},
        noise={"ar": [], "ma": [], "sigma": 1.0}
    )
```

---

## 📊 Suggested Classroom Activities

1. **Trend + Seasonality + Noise**

   * Baseline: Naive / Seasonal Naive / ARIMA.
   * Deep learning: an LSTM or Transformer can usually learn a fixed weekly pattern; check that it actually beats the Seasonal Naive baseline.

2. **Regime Switch**

   * Show how forecasts degrade when a model assumes a single regime with constant parameters.
   * Encourage use of attention models or regime-aware features.

3. **Outliers + Missing**

   * Test imputation strategies before forecasting.
   * Compare robust models vs. sensitive ones.

4. **Multivariate Shared Factors**

   * Encourage students to try **multivariate models** (VAR, multivariate LSTMs).
   * Show benefit of sharing information across series.

5. **Exogenous Drivers**

   * Classic causal forecasting problem.
   * Compare models with vs. without `X` inputs.