# Time Series Forecasting Demo

This notebook demonstrates time series forecasting using multiple model types:
- **Prophet**: Native time series model with trend and seasonality
- **ARIMA**: Classical time series model
- **Random Forest**: ML model with lag features for time series
- **Linear Regression**: Simple baseline with time features

## Key Concepts:
1. **Native time series models** (Prophet, ARIMA): Handle dates directly
2. **ML models** (Random Forest, Linear Regression): Require feature engineering (lags, rolling stats)
3. **Comprehensive outputs**: All models return the same standardized three-DataFrame structure (outputs, coefficients, stats)

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

from py_parsnip import prophet_reg, arima_reg, rand_forest, linear_reg

# Set random seed
np.random.seed(42)

## Generate Synthetic Time Series Data

We'll create daily sales data with:
- Trend (increasing over time)
- Weekly seasonality
- Random noise

In [2]:
# Generate 2 years of daily data
n_days = 730
start_date = datetime(2022, 1, 1)

dates = [start_date + timedelta(days=i) for i in range(n_days)]

# Create time series components
trend = np.linspace(100, 150, n_days)
seasonality = 20 * np.sin(2 * np.pi * np.arange(n_days) / 7)  # Weekly pattern
noise = np.random.normal(0, 5, n_days)

sales = trend + seasonality + noise

data = pd.DataFrame({
    'date': dates,
    'sales': sales
})

print(data.head(10))
print(f"\nData shape: {data.shape}")
print(f"Date range: {data['date'].min()} to {data['date'].max()}")

        date       sales
0 2022-01-01  102.483571
1 2022-01-02  115.013895
2 2022-01-03  122.874175
3 2022-01-04  116.498585
4 2022-01-05   90.425907
5 2022-01-06   79.673692
6 2022-01-07   92.670957
7 2022-01-08  104.317283
8 2022-01-09  113.837955
9 2022-01-10  122.828642

Data shape: (730, 2)
Date range: 2022-01-01 00:00:00 to 2023-12-31 00:00:00
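
As a quick sanity check on the synthetic series, the sketch below regenerates the same components and measures the autocorrelation of the detrended values. With a 7-day sine wave, the autocorrelation should be strongly positive at lag 7 and strongly negative near half a cycle. The helper `autocorr` is a hypothetical name introduced here and is not part of `py_parsnip`.

```python
import numpy as np

np.random.seed(42)
n_days = 730
t = np.arange(n_days)

# Same components as the data-generation cell
trend = np.linspace(100, 150, n_days)
seasonality = 20 * np.sin(2 * np.pi * t / 7)
noise = np.random.normal(0, 5, n_days)
sales = trend + seasonality + noise

# Remove the linear trend with a least-squares line fit
slope, intercept = np.polyfit(t, sales, deg=1)
detrended = sales - (slope * t + intercept)


def autocorr(x, lag):
    """Sample autocorrelation of x at a positive lag (hypothetical helper)."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))


acf_7 = autocorr(detrended, 7)  # one full weekly cycle
acf_3 = autocorr(detrended, 3)  # close to half a cycle
print(f"lag 7: {acf_7:.2f}, lag 3: {acf_3:.2f}")
```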


## Train/Test Split for Time Series

**Important**: For time series, we split chronologically (not randomly), so the model is never trained on observations that come after the ones it is tested on.

In [3]:
# Split: Train on first 600 days, test on last 130 days
train = data.iloc[:600].copy()
test = data.iloc[600:].copy()

print(f"Train: {train.shape[0]} days ({train['date'].min()} to {train['date'].max()})")
print(f"Test: {test.shape[0]} days ({test['date'].min()} to {test['date'].max()})")

Train: 600 days (2022-01-01 00:00:00 to 2023-08-23 00:00:00)
Test: 130 days (2023-08-24 00:00:00 to 2023-12-31 00:00:00)
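
The same split can also be expressed as a cut on a date rather than a row position, which stays correct if the data has gaps or is not sorted. This is a minimal sketch; `chronological_split` is a hypothetical helper, not a `py_parsnip` function.

```python
import pandas as pd


def chronological_split(df, date_col, cutoff):
    """Return (train, test): rows on or before `cutoff`, and rows after it."""
    df = df.sort_values(date_col).reset_index(drop=True)
    cutoff = pd.Timestamp(cutoff)
    train = df[df[date_col] <= cutoff].copy()
    test = df[df[date_col] > cutoff].copy()
    return train, test


# Stand-in frame with the same date range as the notebook's data
demo = pd.DataFrame({
    "date": pd.date_range("2022-01-01", periods=730, freq="D"),
    "sales": range(730),
})

train_demo, test_demo = chronological_split(demo, "date", "2023-08-23")
print(len(train_demo), len(test_demo))
```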


---

# Model 1: Prophet (Native Time Series)

Prophet handles dates natively and automatically detects trend + seasonality.

In [4]:
# Create Prophet specification
# Prophet will automatically detect weekly seasonality in our data
spec_prophet = prophet_reg(
    n_changepoints=25,  # Number of potential trend changes
    changepoint_prior_scale=0.05,  # Flexibility of trend
    seasonality_prior_scale=10.0   # Flexibility of seasonality
)

print(spec_prophet)

ModelSpec(model_type='prophet_reg', engine='prophet', mode='regression', args={'growth': 'linear', 'changepoint_prior_scale': 0.05, 'seasonality_prior_scale': 10.0, 'seasonality_mode': 'additive', 'n_changepoints': 25, 'changepoint_range': 0.8})


In [5]:
# Fit Prophet
fit_prophet = spec_prophet.fit(train, "sales ~ date")
print("Prophet model fitted!")

19:48:40 - cmdstanpy - INFO - Chain [1] start processing
19:48:40 - cmdstanpy - INFO - Chain [1] done processing


Prophet model fitted!


In [6]:
# Predict on test data
pred_prophet = fit_prophet.predict(test)
print(pred_prophet.head(10))

                 .pred
date                  
2023-08-24  121.509856
2023-08-25  124.885899
2023-08-26  140.727634
2023-08-27  156.711994
2023-08-28  159.677312
2023-08-29  149.760930
2023-08-30  131.996135
2023-08-31  121.968883
2023-09-01  125.344926
2023-09-02  141.186661


In [7]:
fit_prophet

ModelFit(spec=ModelSpec(model_type='prophet_reg', engine='prophet', mode='regression', args={'growth': 'linear', 'changepoint_prior_scale': 0.05, 'seasonality_prior_scale': 10.0, 'seasonality_mode': 'additive', 'n_changepoints': 25, 'changepoint_range': 0.8}), fit_data={'model': <prophet.forecaster.Prophet object at 0x11172c7c0>, 'n_obs': 600, 'predictor_name': 'date', 'outcome_name': 'sales', 'prophet_df':             ds           y
0   2022-01-01  102.483571
1   2022-01-02  115.013895
2   2022-01-03  122.874175
3   2022-01-04  116.498585
4   2022-01-05   90.425907
..         ...         ...
595 2023-08-19  138.259246
596 2023-08-20  155.165170
597 2023-08-21  155.551242
598 2023-08-22  147.471298
599 2023-08-23  134.292504

[600 rows x 2 columns], 'y_train': array([102.48357077, 115.01389525, 122.87417515, 116.49858538,
        90.42590677,  79.6736925 ,  92.67095706, 104.31728339,
       113.83795456, 122.82864241, 107.04645737,  89.74813461,
        82.53429838,  75.6886015 ,  92.3

In [8]:
# Evaluate and extract outputs
fit_prophet = fit_prophet.evaluate(test)
outputs_prophet, coefs_prophet, stats_prophet = fit_prophet.extract_outputs()

print("Prophet OUTPUTS:")
print(outputs_prophet[outputs_prophet['split'] == 'test'].head(10))

Prophet OUTPUTS:
          date     actuals      fitted    forecast  residuals split  \
600 2023-08-24  125.438648  121.509856  125.438648   3.928793  test   
601 2023-08-25  120.973394  124.885899  120.973394  -3.912504  test   
602 2023-08-26  145.637467  140.727634  145.637467   4.909834  test   
603 2023-08-27  163.772844  156.711994  163.772844   7.060849  test   
604 2023-08-28  162.992345  159.677312  162.992345   3.315033  test   
605 2023-08-29  159.556853  149.760930  159.556853   9.795923  test   
606 2023-08-30  129.017165  131.996135  129.017165  -2.978970  test   
607 2023-08-31  115.910541  121.968883  115.910541  -6.058341  test   
608 2023-09-01  117.170729  125.344926  117.170729  -8.174197  test   
609 2023-09-02  149.249769  141.186661  149.249769   8.063108  test   

           model model_group_name   group  
600  prophet_reg                   global  
601  prophet_reg                   global  
602  prophet_reg                   global  
603  prophet_reg         

In [9]:
outputs_prophet

          date     actuals      fitted    forecast  residuals  split        model model_group_name   group
0   2022-01-01  102.483571   99.464228  102.483571   3.019343  train  prophet_reg                   global
1   2022-01-02  115.013895  115.456136  115.013895  -0.442241  train  prophet_reg                   global
2   2022-01-03  122.874175  118.429001  122.874175   4.445174  train  prophet_reg                   global
3   2022-01-04  116.498585  108.520167  116.498585   7.978419  train  prophet_reg                   global
4   2022-01-05   90.425907   90.762920   90.425907  -0.337013  train  prophet_reg                   global
..         ...         ...         ...         ...        ...    ...          ...              ...     ...
725 2023-12-27  140.128060  139.799598  140.128060   0.328462   test  prophet_reg                   global
726 2023-12-28  130.387850  129.772346  130.387850   0.615504   test  prophet_reg                   global
727 2023-12-29  135.964105  133.148389  135.964105   2.815716   test  prophet_reg                   global
728 2023-12-30  147.232614  148.990124  147.232614  -1.757510   test  prophet_reg                   global


In [10]:
# Get test metrics
prophet_test_metrics = stats_prophet[
    (stats_prophet['split'] == 'test') & 
    (stats_prophet['metric'].isin(['rmse', 'mae', 'mape', 'r_squared']))
][['metric', 'value']]

print("\nProphet Test Metrics:")
print(prophet_test_metrics)


Prophet Test Metrics:
       metric     value
6        rmse  5.179655
7         mae  4.082371
8        mape  2.836648
10  r_squared  0.880861
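
For reference, the four metrics filtered above are commonly defined as in the sketch below. It assumes conventional definitions, with MAPE expressed in percent (which is consistent with the magnitudes printed above). The exact formulas used inside `py_parsnip` are not shown in this notebook, so `regression_metrics` is a hypothetical stand-in rather than the library's implementation.

```python
import numpy as np


def regression_metrics(actuals, predictions):
    """Compute RMSE, MAE, MAPE (percent) and R-squared (hypothetical helper)."""
    actuals = np.asarray(actuals, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    errors = actuals - predictions

    ss_res = np.sum(errors ** 2)                      # residual sum of squares
    ss_tot = np.sum((actuals - actuals.mean()) ** 2)  # total sum of squares

    return {
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
        "mae": float(np.mean(np.abs(errors))),
        "mape": float(np.mean(np.abs(errors / actuals)) * 100),
        "r_squared": float(1 - ss_res / ss_tot),
    }


# First five Prophet test rows, copied from the outputs table above
actuals = [125.438648, 120.973394, 145.637467, 163.772844, 162.992345]
fitted = [121.509856, 124.885899, 140.727634, 156.711994, 159.677312]

for name, value in regression_metrics(actuals, fitted).items():
    print(f"{name}: {value:.4f}")
```

MAPE divides by the actual values, so it is undefined when any actual is zero. That is not a concern for this series, which stays well above zero.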


---

# Model 2: ARIMA (Classical Time Series)

ARIMA models the autocorrelation structure of the time series.

In [11]:
# Create ARIMA specification
spec_arima = arima_reg(
    seasonal_period=7,  # Weekly seasonality
    non_seasonal_ar=1,
    non_seasonal_differences=1,
    non_seasonal_ma=1,
    seasonal_ar=1,
    seasonal_differences=0,
    seasonal_ma=1
)

print(spec_arima)

ModelSpec(model_type='arima_reg', engine='statsmodels', mode='regression', args={'seasonal_period': 7, 'non_seasonal_ar': 1, 'non_seasonal_differences': 1, 'non_seasonal_ma': 1, 'seasonal_ar': 1, 'seasonal_differences': 0, 'seasonal_ma': 1})
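
In conventional notation, the specification above is a seasonal ARIMA(1,1,1)(1,0,1)[7] model. The sketch below shows how the argument names could be mapped to the `order` and `seasonal_order` tuples that statsmodels' `SARIMAX` accepts. The function `to_sarimax_orders` is hypothetical, and how `py_parsnip` translates these arguments internally is an assumption, not something shown in this notebook.

```python
def to_sarimax_orders(args):
    """Map the spec's argument names to (p, d, q) and (P, D, Q, s) tuples."""
    order = (
        args["non_seasonal_ar"],           # p
        args["non_seasonal_differences"],  # d
        args["non_seasonal_ma"],           # q
    )
    seasonal_order = (
        args["seasonal_ar"],               # P
        args["seasonal_differences"],      # D
        args["seasonal_ma"],               # Q
        args["seasonal_period"],           # s
    )
    return order, seasonal_order


# Argument values copied from the printed ModelSpec
spec_args = {
    "seasonal_period": 7,
    "non_seasonal_ar": 1,
    "non_seasonal_differences": 1,
    "non_seasonal_ma": 1,
    "seasonal_ar": 1,
    "seasonal_differences": 0,
    "seasonal_ma": 1,
}

order, seasonal_order = to_sarimax_orders(spec_args)
print(order, seasonal_order)

# With statsmodels installed, these could be passed along as:
#   SARIMAX(y, order=order, seasonal_order=seasonal_order).fit(disp=False)
```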


In [12]:
# Fit ARIMA
fit_arima = spec_arima.fit(train, "sales ~ date")
print("ARIMA model fitted!")


ARIMA model fitted!


In [13]:
# Predict on test data
pred_arima = fit_arima.predict(test)
print(pred_arima.head(10))

                 .pred
date                  
2023-08-24  122.171910
2023-08-25  125.470230
2023-08-26  140.944822
2023-08-27  156.564862
2023-08-28  159.489148
2023-08-29  149.789418
2023-08-30  132.471099
2023-08-31  122.671651
2023-09-01  125.946153
2023-09-02  141.412524


In [14]:
# Evaluate and extract outputs
fit_arima = fit_arima.evaluate(test)
outputs_arima, coefs_arima, stats_arima = fit_arima.extract_outputs()

# Get test metrics
arima_test_metrics = stats_arima[
    (stats_arima['split'] == 'test') & 
    (stats_arima['metric'].isin(['rmse', 'mae', 'mape', 'r_squared']))
][['metric', 'value']]

print("ARIMA Test Metrics:")
print(arima_test_metrics)

ARIMA Test Metrics:
       metric     value
6        rmse  5.156017
7         mae  4.053007
8        mape  2.820957
10  r_squared  0.881946


---

# Model 3: Random Forest (ML with Time Features)

Random Forest doesn't handle dates natively, so we need to engineer features:
- Lag features (previous values)
- Rolling statistics (moving averages)
- Time-based features (day of week, month, etc.)
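
One detail matters when building rolling statistics: the window must only see values that precede the row being predicted. Shifting the target by one step before rolling, as the feature function in the next cell does, guarantees this. The toy series below is made up purely to illustrate the difference.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Leaky: the window ending at row i includes the value at row i itself
leaky = s.rolling(window=3, min_periods=1).mean()

# Safe: shift by one first, so the window at row i only covers earlier rows
safe = s.shift(1).rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"value": s, "leaky": leaky, "safe": safe}))
```

In the `safe` column the first row is `NaN`, because no earlier value exists. This is one reason the feature function ends with `dropna()`.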

In [15]:
def create_time_series_features(df, target_col='sales', lags=(1, 7, 14)):
    """
    Create time series features for ML models.
    
    Args:
        df: DataFrame with 'date' and target column
        target_col: Name of target variable
        lags: Iterable of lag periods to create
    
    Returns:
        DataFrame with engineered features
    """
    df = df.copy()
    
    # Time-based features
    df['day_of_week'] = df['date'].dt.dayofweek
    df['day_of_month'] = df['date'].dt.day
    df['month'] = df['date'].dt.month
    df['day_of_year'] = df['date'].dt.dayofyear
    
    # Lag features
    for lag in lags:
        df[f'lag_{lag}'] = df[target_col].shift(lag)
    
    # Rolling statistics
    df['rolling_mean_7'] = df[target_col].shift(1).rolling(window=7, min_periods=1).mean()
    df['rolling_std_7'] = df[target_col].shift(1).rolling(window=7, min_periods=1).std()
    df['rolling_mean_14'] = df[target_col].shift(1).rolling(window=14, min_periods=1).mean()
    
    # Drop rows with NaN (from lag/rolling features)
    df = df.dropna()
    
    return df

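# Create features for train and test
# Note: lags are computed within each frame separately, so the first 14 rows
# of each frame (including the first 14 test days) are removed by dropna()
train_rf = create_time_series_features(train)
test_rf = create_time_series_features(test)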

print("Random Forest features:")
print(train_rf.head(20))
print(f"\nTrain shape: {train_rf.shape}")
print(f"Test shape: {test_rf.shape}")

Random Forest features:
         date       sales  day_of_week  day_of_month  month  day_of_year  \
14 2022-01-15   92.335630            5            15      1           15   
15 2022-01-16  113.853999            6            16      1           16   
16 2022-01-17  115.531796            0            17      1           17   
17 2022-01-18  111.414892            1            18      1           18   
18 2022-01-19   88.016773            2            19      1           19   
19 2022-01-20   74.743078            3            20      1           20   
20 2022-01-21   93.063356            4            21      1           21   
21 2022-01-22  100.311448            5            22      1           22   
22 2022-01-23  117.483187            6            23      1           23   
23 2022-01-24  113.952321            0            24      1           24   
24 2022-01-25  107.601852            1            25      1           25   
25 2022-01-26   93.591616            2            26      1     
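
When lag features are computed on the test frame by itself, its first 14 rows have no `lag_14` value and are dropped. One way to keep every test row is to prepend the tail of the training data as context, compute the features, and then keep only the test dates. The sketch below uses a reduced, hypothetical `add_lag_features` helper and a stand-in `sales` column rather than the notebook's full function and data.

```python
import numpy as np
import pandas as pd


def add_lag_features(df, target_col="sales", lags=(1, 7, 14)):
    """Reduced stand-in for the notebook's feature function (lags only)."""
    df = df.copy()
    for lag in lags:
        df[f"lag_{lag}"] = df[target_col].shift(lag)
    return df


dates = pd.date_range("2022-01-01", periods=730, freq="D")
data = pd.DataFrame({"date": dates, "sales": np.arange(730, dtype=float)})
train, test = data.iloc[:600], data.iloc[600:]

# Features from the test frame alone: rows without a lag_14 value are dropped
test_only = add_lag_features(test).dropna()

# Features with the last 14 training rows prepended as history
max_lag = 14
context = pd.concat([train.tail(max_lag), test])
with_history = add_lag_features(context).dropna()
with_history = with_history[with_history["date"] >= test["date"].min()]

print(len(test_only), len(with_history))
```

Each lag here still comes from an actual observed value, so a model scored on `with_history` is being evaluated on one-step-ahead accuracy, not on a multi-step forecast.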

In [16]:
# Create Random Forest specification
spec_rf = rand_forest(
    trees=300,
    mtry=4,
    min_n=5
).set_mode("regression")

print(spec_rf)

ModelSpec(model_type='rand_forest', engine='sklearn', mode='regression', args={'mtry': 4, 'trees': 300, 'min_n': 5})


In [17]:
# Fit Random Forest (using engineered features)
formula_rf = "sales ~ day_of_week + day_of_month + month + lag_1 + lag_7 + lag_14 + rolling_mean_7 + rolling_std_7 + rolling_mean_14"

fit_rf = spec_rf.fit(train_rf, formula_rf)
print("Random Forest model fitted!")

Random Forest model fitted!


In [18]:
# Predict on test data
pred_rf = fit_rf.predict(test_rf)
print(pred_rf.head(10))

        .pred
0  118.796579
1  116.833078
2  147.459110
3  156.670711
4  160.149612
5  154.272904
6  130.955189
7  126.179749
8  123.707652
9  147.702509


In [19]:
# Evaluate and extract outputs
fit_rf = fit_rf.evaluate(test_rf)
outputs_rf, coefs_rf, stats_rf = fit_rf.extract_outputs()

print("\nRandom Forest Feature Importances:")
print(coefs_rf.sort_values('coefficient', ascending=False))


Random Forest Feature Importances:
          variable  coefficient  std_error  t_stat  p_value  ci_0.025  \
4            lag_7     0.422847        NaN     NaN      NaN       NaN   
5           lag_14     0.351539        NaN     NaN      NaN       NaN   
3            lag_1     0.100216        NaN     NaN      NaN       NaN   
8  rolling_mean_14     0.041979        NaN     NaN      NaN       NaN   
6   rolling_mean_7     0.037179        NaN     NaN      NaN       NaN   
0      day_of_week     0.021122        NaN     NaN      NaN       NaN   
7    rolling_std_7     0.010575        NaN     NaN      NaN       NaN   
1     day_of_month     0.008445        NaN     NaN      NaN       NaN   
2            month     0.006098        NaN     NaN      NaN       NaN   

   ci_0.975  vif        model model_group_name   group  
4       NaN  NaN  rand_forest                   global  
5       NaN  NaN  rand_forest                   global  
3       NaN  NaN  rand_forest                   global  
8    
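
The importances printed above sum to 1, so they can be read as shares of the total. A short sketch using the values copied from that output shows how much of the total the three lag features account for.

```python
import pandas as pd

# Values copied from the feature-importance output above
importances = pd.Series({
    "lag_7": 0.422847,
    "lag_14": 0.351539,
    "lag_1": 0.100216,
    "rolling_mean_14": 0.041979,
    "rolling_mean_7": 0.037179,
    "day_of_week": 0.021122,
    "rolling_std_7": 0.010575,
    "day_of_month": 0.008445,
    "month": 0.006098,
}).sort_values(ascending=False)

# Running total of importance, from most to least important feature
cumulative = importances.cumsum()
print(cumulative.round(3))

lag_share = importances[["lag_1", "lag_7", "lag_14"]].sum()
print(f"Lag features: {lag_share:.1%} of total importance")
```

The weekly lags dominate because the synthetic series repeats every 7 days, so the values 7 and 14 days back are the most informative predictors.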

In [20]:
# Get test metrics
rf_test_metrics = stats_rf[
    (stats_rf['split'] == 'test') & 
    (stats_rf['metric'].isin(['rmse', 'mae', 'mape', 'r_squared']))
][['metric', 'value']]

print("Random Forest Test Metrics:")
print(rf_test_metrics)

Random Forest Test Metrics:
       metric     value
7        rmse  6.808105
8         mae  5.435608
9        mape  3.731617
11  r_squared   0.78172


---

# Model 4: Linear Regression (Simple Baseline)

Linear regression with the same time features as Random Forest.

In [21]:
# Create Linear Regression specification
spec_lm = linear_reg()

# Fit using same formula as Random Forest
fit_lm = spec_lm.fit(train_rf, formula_rf)
print("Linear Regression model fitted!")

Linear Regression model fitted!


In [22]:
# Evaluate
fit_lm = fit_lm.evaluate(test_rf)
outputs_lm, coefs_lm, stats_lm = fit_lm.extract_outputs()

# Get test metrics
lm_test_metrics = stats_lm[
    (stats_lm['split'] == 'test') & 
    (stats_lm['metric'].isin(['rmse', 'mae', 'mape', 'r_squared']))
][['metric', 'value']]

print("Linear Regression Test Metrics:")
print(lm_test_metrics)

Linear Regression Test Metrics:
       metric     value
7        rmse  6.356702
8         mae  4.913321
9        mape  3.436576
11  r_squared  0.809706


In [32]:
outputs_lm

        actuals      fitted    forecast   residuals  split       model model_group_name   group
0     92.335630  102.448970   92.335630   92.335630  train  linear_reg                   global
1    113.853999  114.653714  113.853999  113.853999  train  linear_reg                   global
2    115.531796  121.960042  115.531796  115.531796  train  linear_reg                   global
3    111.414892  112.829976  111.414892  111.414892  train  linear_reg                   global
4     88.016773   93.283959   88.016773   88.016773  train  linear_reg                   global
..          ...         ...         ...         ...    ...         ...              ...     ...
697  140.128060  141.974866  140.128060   -1.846806   test  linear_reg                   global
698  130.387850  130.591752  130.387850   -0.203901   test  linear_reg                   global
699  135.964105  136.508827  135.964105   -0.544723   test  linear_reg                   global
700  147.232614  145.945399  147.232614    1.287215   test  linear_reg                   global


---

# Model Comparison

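Compare all models on test-set performance. Prophet and ARIMA are scored on all 130 test days, while Random Forest and Linear Regression are scored on the 116 test days that remain after the lag features are built, so the comparison is not strictly like-for-like.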

In [23]:
# Combine all test metrics
# assign() adds the label on a copy, so the filtered frames are not modified
all_metrics = pd.concat([
    prophet_test_metrics.assign(model='Prophet'),
    arima_test_metrics.assign(model='ARIMA'),
    rf_test_metrics.assign(model='Random Forest'),
    lm_test_metrics.assign(model='Linear Regression')
], ignore_index=True)

# Pivot for easy comparison
comparison = all_metrics.pivot(index='metric', columns='model', values='value')

print("\n" + "=" * 80)
print("MODEL COMPARISON - TEST SET METRICS")
print("=" * 80)
print(comparison)
print("\nLower is better for: RMSE, MAE, MAPE")
print("Higher is better for: R²")


MODEL COMPARISON - TEST SET METRICS
model         ARIMA Linear Regression   Prophet Random Forest
metric                                                       
mae        4.053007          4.913321  4.082371      5.435608
mape       2.820957          3.436576  2.836648      3.731617
r_squared  0.881946          0.809706  0.880861       0.78172
rmse       5.156017          6.356702  5.179655      6.808105

Lower is better for: RMSE, MAE, MAPE
Higher is better for: R²


In [24]:
# Find best model for each metric
print("\n" + "=" * 80)
print("BEST MODEL FOR EACH METRIC")
print("=" * 80)

for metric in ['rmse', 'mae', 'mape']:
    best_model = comparison.loc[metric].idxmin()
    best_value = comparison.loc[metric].min()
    print(f"{metric.upper():6s}: {best_model:20s} ({best_value:.2f})")

# R² is higher-is-better
best_model = comparison.loc['r_squared'].idxmax()
best_value = comparison.loc['r_squared'].max()
print(f"R²    : {best_model:20s} ({best_value:.4f})")


BEST MODEL FOR EACH METRIC
RMSE  : ARIMA                (5.16)
MAE   : ARIMA                (4.05)
MAPE  : ARIMA                (2.82)
R²    : ARIMA                (0.8819)


---

# Future Forecasting

Generate forecasts for the next 30 days beyond the test set.

In [25]:
# Create future dates
last_date = data['date'].max()
future_dates = [last_date + timedelta(days=i+1) for i in range(30)]
future_data = pd.DataFrame({'date': future_dates})

print(f"Forecasting for: {future_data['date'].min()} to {future_data['date'].max()}")

Forecasting for: 2024-01-01 00:00:00 to 2024-01-30 00:00:00


### Prophet Future Forecast

In [26]:
# Prophet can forecast directly
future_prophet = fit_prophet.predict(future_data)

print("Prophet Future Forecast:")
print(future_prophet)

Prophet Future Forecast:
                 .pred
date                  
2024-01-01  167.939802
2024-01-02  158.023420
2024-01-03  140.258626
2024-01-04  130.231373
2024-01-05  133.607416
2024-01-06  149.449151
2024-01-07  165.433512
2024-01-08  168.398829
2024-01-09  158.482447
2024-01-10  140.717653
2024-01-11  130.690400
2024-01-12  134.066444
2024-01-13  149.908178
2024-01-14  165.892539
2024-01-15  168.857857
2024-01-16  158.941474
2024-01-17  141.176680
2024-01-18  131.149428
2024-01-19  134.525471
2024-01-20  150.367206
2024-01-21  166.351567
2024-01-22  169.316884
2024-01-23  159.400502
2024-01-24  141.635707
2024-01-25  131.608455
2024-01-26  134.984498
2024-01-27  150.826233
2024-01-28  166.810594
2024-01-29  169.775911
2024-01-30  159.859529


### ARIMA Future Forecast

In [27]:
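# Ask ARIMA for the same 30 future dates.
# Caution: the first ten values below are identical to the test-period
# predictions printed in In [13], which suggests that predict() counts steps
# from the end of the training data and only relabels them with these dates.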
future_arima = fit_arima.predict(future_data)

print("ARIMA Future Forecast:")
print(future_arima)

ARIMA Future Forecast:
                 .pred
date                  
2024-01-01  122.171910
2024-01-02  125.470230
2024-01-03  140.944822
2024-01-04  156.564862
2024-01-05  159.489148
2024-01-06  149.789418
2024-01-07  132.471099
2024-01-08  122.671651
2024-01-09  125.946153
2024-01-10  141.412524
2024-01-11  157.024052
2024-01-12  159.946745
2024-01-13  150.252299
2024-01-14  132.943416
2024-01-15  123.149306
2024-01-16  126.422024
2024-01-17  141.879969
2024-01-18  157.482992
2024-01-19  160.404093
2024-01-20  150.714928
2024-01-21  133.415475
2024-01-22  123.626701
2024-01-23  126.897636
2024-01-24  142.347159
2024-01-25  157.941681
2024-01-26  160.861191
2024-01-27  151.177305
2024-01-28  133.887276
2024-01-29  124.103836
2024-01-30  127.372989


In [28]:
future_arima

                 .pred
date                  
2024-01-01  122.171910
2024-01-02  125.470230
2024-01-03  140.944822
2024-01-04  156.564862
2024-01-05  159.489148
2024-01-06  149.789418
2024-01-07  132.471099
2024-01-08  122.671651
2024-01-09  125.946153
2024-01-10  141.412524


### Random Forest Future Forecast

**Note**: For ML models, we need to generate features iteratively for multi-step forecasting.

In [29]:
# For ML models, we need to forecast iteratively
# Start with full historical data
full_history = data.copy()

# Forecast one day at a time
future_forecasts_rf = []

for future_date in future_dates:
    # Create features for this future date
    temp_data = pd.concat([
        full_history,
        pd.DataFrame({'date': [future_date], 'sales': [np.nan]})
    ], ignore_index=True)
    
    temp_features = create_time_series_features(temp_data.dropna())
    
    if len(temp_features) == 0:
        print(f"Warning: Could not create features for {future_date}")
        continue
    
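    # Get last row. The NaN placeholder was dropped above, so this row belongs
    # to the latest date already in full_history, not to future_date itself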
    future_row = temp_features.iloc[[-1]].copy()
    
    # Predict
    pred = fit_rf.predict(future_row)
    forecast_value = pred['.pred'].values[0]
    
    # Store forecast
    future_forecasts_rf.append({
        'date': future_date,
        '.pred': forecast_value
    })
    
    # Add to history for next iteration
    full_history = pd.concat([
        full_history,
        pd.DataFrame({'date': [future_date], 'sales': [forecast_value]})
    ], ignore_index=True)

future_rf = pd.DataFrame(future_forecasts_rf)
print("Random Forest Future Forecast:")
print(future_rf)

Random Forest Future Forecast:
         date       .pred
0  2024-01-01  155.275532
1  2024-01-02  159.384147
2  2024-01-03  156.569204
3  2024-01-04  143.638270
4  2024-01-05  131.436900
5  2024-01-06  132.264004
6  2024-01-07  147.299626
7  2024-01-08  154.071165
8  2024-01-09  158.382374
9  2024-01-10  155.945968
10 2024-01-11  149.484803
11 2024-01-12  138.154216
12 2024-01-13  133.512963
13 2024-01-14  141.205649
14 2024-01-15  152.123474
15 2024-01-16  155.692823
16 2024-01-17  155.598387
17 2024-01-18  155.219644
18 2024-01-19  148.362857
19 2024-01-20  134.285858
20 2024-01-21  133.737964
21 2024-01-22  150.342050
22 2024-01-23  153.392520
23 2024-01-24  155.359808
24 2024-01-25  154.374938
25 2024-01-26  151.263964
26 2024-01-27  143.759321
27 2024-01-28  135.015239
28 2024-01-29  142.266608
29 2024-01-30  153.267896
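
In the loop above, the `NaN` placeholder row is removed by `dropna()` before the features are computed, so each prediction is made from the feature row of the most recent known date instead of a row built for the target date. The sketch below shows one way to build the target-date row explicitly and feed each forecast back in as history. It is self-contained and swaps in a plain least-squares model on `lag_1` and `lag_7` in place of the fitted Random Forest. The names `build_row` and `recursive_forecast`, and the model itself, are illustrative assumptions.

```python
import numpy as np


def build_row(history, lags=(1, 7)):
    """Feature row for the step right after `history`: [1, y[t-1], y[t-7]]."""
    return np.array([1.0] + [history[-lag] for lag in lags])


def recursive_forecast(history, coefs, steps, lags=(1, 7)):
    """Forecast `steps` values, appending each prediction to the history."""
    history = list(history)
    forecasts = []
    for _ in range(steps):
        y_hat = float(build_row(history, lags) @ coefs)
        forecasts.append(y_hat)
        history.append(y_hat)  # the next step's lag_1 is this forecast
    return np.array(forecasts)


# Synthetic weekly series, similar in shape to the notebook's data
rng = np.random.default_rng(0)
t = np.arange(400)
y = 100 + 0.05 * t + 20 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 2, 400)

# Fit y[t] ~ 1 + y[t-1] + y[t-7] by ordinary least squares
max_lag = 7
X = np.array([build_row(y[:i]) for i in range(max_lag, len(y))])
coefs, *_ = np.linalg.lstsq(X, y[max_lag:], rcond=None)

future = recursive_forecast(y, coefs, steps=30)
print(future[:7].round(1))
```

Because every step reuses earlier forecasts as inputs, errors can compound over the horizon. That is the usual trade-off of recursive multi-step forecasting with lag-based models.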


### Compare Future Forecasts

In [30]:
future_rf

        date       .pred
0 2024-01-01  155.275532
1 2024-01-02  159.384147
2 2024-01-03  156.569204
3 2024-01-04  143.638270
4 2024-01-05  131.436900
5 2024-01-06  132.264004
6 2024-01-07  147.299626
7 2024-01-08  154.071165
8 2024-01-09  158.382374
9 2024-01-10  155.945968


In [31]:
# Combine all forecasts
forecast_comparison = pd.DataFrame({
    'date': future_dates,
    'Prophet': future_prophet['.pred'].values,
    'ARIMA': future_arima['.pred'].values,
    'Random Forest': future_rf['.pred'].values
})

print("\nFuture Forecast Comparison:")
print(forecast_comparison)

print("\nForecast Statistics:")
print(forecast_comparison[['Prophet', 'ARIMA', 'Random Forest']].describe())


Future Forecast Comparison:
         date     Prophet       ARIMA  Random Forest
0  2024-01-01  167.939802  122.171910     155.275532
1  2024-01-02  158.023420  125.470230     159.384147
2  2024-01-03  140.258626  140.944822     156.569204
3  2024-01-04  130.231373  156.564862     143.638270
4  2024-01-05  133.607416  159.489148     131.436900
5  2024-01-06  149.449151  149.789418     132.264004
6  2024-01-07  165.433512  132.471099     147.299626
7  2024-01-08  168.398829  122.671651     154.071165
8  2024-01-09  158.482447  125.946153     158.382374
9  2024-01-10  140.717653  141.412524     155.945968
10 2024-01-11  130.690400  157.024052     149.484803
11 2024-01-12  134.066444  159.946745     138.154216
12 2024-01-13  149.908178  150.252299     133.512963
13 2024-01-14  165.892539  132.943416     141.205649
14 2024-01-15  168.857857  123.149306     152.123474
15 2024-01-16  158.941474  126.422024     155.692823
16 2024-01-17  141.176680  141.879969     155.598387
17 2024-01-18  13

---

# Summary

## Model Characteristics:

### 1. Prophet
- **Pros**: Handles dates natively, automatic trend/seasonality detection, robust to missing data
- **Cons**: Can be slower, less flexible for custom features
- **Best for**: Business time series with strong seasonality

### 2. ARIMA
- **Pros**: Classical approach, interpretable parameters, works well once the series is made stationary by differencing
- **Cons**: Assumes stationarity after differencing, and choosing the AR, differencing, and MA orders can be complex
- **Best for**: Stationary time series, short-term forecasts

### 3. Random Forest
- **Pros**: Captures non-linear relationships, handles complex interactions, feature importances
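- **Cons**: Requires feature engineering, cannot predict values outside the range of the training targets (so it cannot extrapolate a trend on its own), and needs an iterative loop for multi-step forecasting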
- **Best for**: Time series with rich features, non-linear patterns

### 4. Linear Regression
- **Pros**: Fast, interpretable coefficients, simple baseline
- **Cons**: Assumes linear relationships, limited flexibility
- **Best for**: Simple trends, baseline comparisons

## Key Takeaways:

1. **Native time series models** (Prophet, ARIMA) are easier to use but less flexible
2. **ML models** (Random Forest, Linear Regression) require feature engineering but can capture complex patterns
3. **All models** return standardized three-DataFrame outputs for consistent analysis
4. The **`evaluate()`** method enables easy train/test comparison across all model types
5. **Multi-step forecasting** is straightforward for Prophet/ARIMA, iterative for ML models