# Basic Forecasting with TimeCopilot

This notebook demonstrates the core forecasting capabilities of TimeCopilot: preparing data in the required format, forecasting with the agent or with specific models, evaluating models with cross-validation, and visualizing the results.

## Data Preparation

TimeCopilot works with pandas DataFrames that follow a specific format. Let's explore the data requirements and preparation steps.

In [None]:
import pandas as pd
import numpy as np
from timecopilot import TimeCopilot
from timecopilot.forecaster import TimeCopilotForecaster
from timecopilot.models.benchmarks import AutoETS, AutoARIMA
from timecopilot.models.foundational.timesfm import TimesFM
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

## Required Data Format

TimeCopilot requires your data to be in a specific format with these columns:

- **unique_id**: Unique identifier for each time series (string)
- **ds**: Date column (datetime format)
- **y**: Target variable for forecasting (float format)
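
The column requirements above can be checked before any forecasting call. The following is a minimal sketch, not part of TimeCopilot: `check_forecast_frame` is a hypothetical helper, and the duplicate-row check reflects an assumption that each series has at most one row per timestamp.

```python
import pandas as pd

REQUIRED_COLUMNS = ["unique_id", "ds", "y"]


def check_forecast_frame(df: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the frame looks usable."""
    problems = []
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        # Without the required columns the remaining checks cannot run
        problems.append(f"missing columns: {missing}")
        return problems
    if not pd.api.types.is_datetime64_any_dtype(df["ds"]):
        problems.append("ds is not a datetime column")
    if not pd.api.types.is_numeric_dtype(df["y"]):
        problems.append("y is not numeric")
    # Assumption: one observation per series and timestamp
    if df.duplicated(subset=["unique_id", "ds"]).any():
        problems.append("duplicate (unique_id, ds) rows")
    return problems


# Small hand-made frame in the expected layout
example = pd.DataFrame({
    "unique_id": ["series_a"] * 3,
    "ds": pd.to_datetime(["2024-01-01", "2024-02-01", "2024-03-01"]),
    "y": [112.0, 118.0, 132.0],
})
print(check_forecast_frame(example))
```

Returning a list of messages instead of raising on the first failure lets you see every formatting problem in one pass.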

Let's load and examine a sample dataset:

In [None]:
# Load a sample dataset
df = pd.read_csv("https://timecopilot.s3.amazonaws.com/public/data/air_passengers.csv")

# Display dataset information
print("Dataset shape:", df.shape)
print("\nColumn names:", df.columns.tolist())
print("\nData types:")
print(df.dtypes)
print("\nFirst few rows:")
df.head()

In [None]:
# Convert date column to datetime if needed
df['ds'] = pd.to_datetime(df['ds'])

# Basic statistics about the time series
print("Time series statistics:")
print(f"Date range: {df['ds'].min()} to {df['ds'].max()}")
print(f"Number of observations: {len(df)}")
print(f"Target variable range: {df['y'].min():.2f} to {df['y'].max():.2f}")
print(f"Missing values: {df['y'].isna().sum()}")
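
Counting `NaN` values in `y` does not reveal timestamps that are absent altogether. As a hedged sketch, the hypothetical helper `missing_timestamps` below compares the observed dates of a single series against a regular `pd.date_range`; the toy dates and the `"MS"` (month-start) frequency are illustrative assumptions, not properties of the sample dataset.

```python
import pandas as pd


def missing_timestamps(ds: pd.Series, freq: str) -> pd.DatetimeIndex:
    """Timestamps expected between the first and last date at `freq` but absent from `ds`."""
    expected = pd.date_range(ds.min(), ds.max(), freq=freq)
    return expected.difference(pd.DatetimeIndex(ds))


# Toy series with March missing
dates = pd.Series(pd.to_datetime(["2024-01-01", "2024-02-01", "2024-04-01"]))
gaps = missing_timestamps(dates, freq="MS")
print(list(gaps))
```

For a frame holding several series, the same check can be applied per `unique_id` group.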

## Using the TimeCopilot Agent

The TimeCopilot agent wraps forecasting behind an LLM: it selects a model, produces the forecast, and explains the result in natural language.

In [None]:
# Initialize the TimeCopilot agent
# Note: Make sure you have your API keys set up as environment variables
agent = TimeCopilot(
    model="openai:gpt-4o-mini",  # You can also use "openai:gpt-4o", "anthropic:claude-3-5-sonnet-20241022", etc.
)

# Generate forecast with specific parameters
result = agent.forecast(
    df=df,
    h=24,  # Forecast 24 periods ahead
    freq="M",  # Monthly (month-end) frequency; use "MS" if the dates fall on month starts
    level=[80, 95],  # Prediction intervals
    query="Analyze the seasonal patterns and provide insights about future trends"
)

print("Forecast generated successfully!")
print(f"Forecast shape: {result.forecast.shape}")
print(f"Model used: {result.model}")

## Analyzing the Results

In [None]:
# Display the forecast results
print("Forecast DataFrame:")
result.forecast.head(10)
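
A quick sanity check on any forecast frame with prediction intervals is that the bounds nest: the 95% interval should contain the 80% interval, which should contain the point forecast. The sketch below uses a toy frame and assumes the `lo-80`/`hi-80`/`lo-95`/`hi-95` column naming and a point-forecast column named `y`, as used in the plotting code of this notebook; `intervals_are_nested` is a hypothetical helper.

```python
import pandas as pd


def intervals_are_nested(fcst: pd.DataFrame, point_col: str = "y") -> bool:
    """True if lo-95 <= lo-80 <= point <= hi-80 <= hi-95 holds on every row."""
    ordered = ["lo-95", "lo-80", point_col, "hi-80", "hi-95"]
    # Only compare the columns that are actually present
    present = [c for c in ordered if c in fcst.columns]
    return all((fcst[a] <= fcst[b]).all() for a, b in zip(present, present[1:]))


# Toy forecast frame, values chosen by hand for illustration
toy = pd.DataFrame({
    "lo-95": [90.0, 95.0],
    "lo-80": [95.0, 100.0],
    "y": [100.0, 106.0],
    "hi-80": [105.0, 112.0],
    "hi-95": [110.0, 117.0],
})
print(intervals_are_nested(toy))
```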

In [None]:
# Read the agent's explanation
print("Agent's Analysis:")
print("=" * 50)
print(result.explanation)
print("=" * 50)

## Using the TimeCopilotForecaster Directly

For more control over the forecasting process, you can use the TimeCopilotForecaster directly with specific models.

In [None]:
# Initialize individual models
models = [
    AutoETS(),
    AutoARIMA(),
    TimesFM(),  # Foundation model
]

# Create the unified forecaster
forecaster = TimeCopilotForecaster(models=models)

# Generate forecasts
forecasts = forecaster.forecast(
    df=df,
    h=12,
    freq="M",
    level=[80, 95]
)

print("Multi-model forecast generated!")
print(f"Shape: {forecasts.shape}")
print(f"\nModels used: {forecasts['model'].unique()}")
forecasts.head()

## Cross-Validation

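Evaluate model performance using cross-validation: each model forecasts the `h` periods that follow each of `n_windows` cutoff dates, and the forecasts are compared with the held-out actuals.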

In [None]:
# Perform cross-validation
cv_results = forecaster.cross_validation(
    df=df,
    h=6,  # Forecast 6 periods ahead
    freq="M",
    n_windows=3,  # Number of cross-validation windows
    step_size=6   # Step size between windows
)

print("Cross-validation completed!")
print(f"CV results shape: {cv_results.shape}")
cv_results.head()
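
To make the roles of `h`, `n_windows`, and `step_size` concrete, the sketch below computes where each train/test split would fall. It assumes a rolling-origin layout in which the last window ends at the final observation and earlier windows are shifted back by `step_size`; this is a common convention, not something confirmed for TimeCopilot here. `cv_cutoffs` is a hypothetical helper, and 144 is the length of the classic monthly AirPassengers series.

```python
def cv_cutoffs(n_obs: int, h: int, n_windows: int, step_size: int) -> list[int]:
    """Training-set sizes for each window, oldest window first."""
    # Assumed layout: the newest window tests on the last h observations
    last_cutoff = n_obs - h
    cutoffs = [last_cutoff - i * step_size for i in range(n_windows)]
    if cutoffs[-1] <= 0:
        raise ValueError("not enough observations for the requested windows")
    return sorted(cutoffs)


h = 6
for train_size in cv_cutoffs(n_obs=144, h=h, n_windows=3, step_size=6):
    print(f"train on rows [0, {train_size}), test on rows [{train_size}, {train_size + h})")
```

With `step_size` equal to `h`, the test windows sit back to back without overlapping; a smaller `step_size` would make them overlap.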

In [None]:
# Calculate accuracy metrics
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Group by model and calculate metrics
metrics = []
for model in cv_results['model'].unique():
    model_data = cv_results[cv_results['model'] == model]
    
    mae = mean_absolute_error(model_data['y'], model_data['y_pred'])
    mse = mean_squared_error(model_data['y'], model_data['y_pred'])
    rmse = np.sqrt(mse)
    
    metrics.append({
        'model': model,
        'mae': mae,
        'mse': mse,
        'rmse': rmse
    })

metrics_df = pd.DataFrame(metrics)
print("Model Performance Metrics:")
metrics_df.round(2)
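
The same per-model metrics can be computed without an explicit loop. This is a sketch on a hand-made frame that assumes the long layout used in the cell above (`model`, `y`, `y_pred` columns); the values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy cross-validation output in the assumed long layout
toy_cv = pd.DataFrame({
    "model": ["A", "A", "B", "B"],
    "y": [10.0, 20.0, 10.0, 20.0],
    "y_pred": [12.0, 17.0, 10.0, 24.0],
})

errors = toy_cv["y"] - toy_cv["y_pred"]
summary = (
    toy_cv.assign(abs_err=errors.abs(), sq_err=errors**2)
    .groupby("model")
    .agg(mae=("abs_err", "mean"), mse=("sq_err", "mean"))
)
# RMSE is the square root of each model's mean squared error
summary["rmse"] = np.sqrt(summary["mse"])
print(summary.round(2))
```

Because MAE and MSE are plain means of per-row errors, a single `groupby` covers every model at once and extends naturally to grouping by `unique_id` as well.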

## Visualization

Plot the historical data, the agent forecast with its prediction intervals, the per-model forecasts, and the cross-validation error metrics in a 2x2 grid.

In [None]:
# Plot historical data and forecasts
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Plot 1: Historical data
axes[0, 0].plot(pd.to_datetime(df['ds']), df['y'], 'b-', linewidth=2)
axes[0, 0].set_title('Historical Data')
axes[0, 0].set_xlabel('Date')
axes[0, 0].set_ylabel('Air Passengers')
axes[0, 0].grid(True, alpha=0.3)

# Plot 2: Agent forecast with intervals
# Work on a copy so the agent's result is not modified in place
agent_forecast = result.forecast.copy()
agent_forecast['ds'] = pd.to_datetime(agent_forecast['ds'])

axes[0, 1].plot(pd.to_datetime(df['ds']), df['y'], 'b-', label='Historical', linewidth=2)
axes[0, 1].plot(agent_forecast['ds'], agent_forecast['y'], 'r--', label='Forecast', linewidth=2)

# Add prediction intervals if available
if 'lo-80' in agent_forecast.columns:
    axes[0, 1].fill_between(agent_forecast['ds'], agent_forecast['lo-80'], agent_forecast['hi-80'], 
                           alpha=0.3, color='red', label='80% PI')
if 'lo-95' in agent_forecast.columns:
    axes[0, 1].fill_between(agent_forecast['ds'], agent_forecast['lo-95'], agent_forecast['hi-95'], 
                           alpha=0.2, color='red', label='95% PI')

axes[0, 1].set_title('Agent Forecast with Prediction Intervals')
axes[0, 1].set_xlabel('Date')
axes[0, 1].set_ylabel('Air Passengers')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

# Plot 3: Model comparison
colors = ['red', 'green', 'blue', 'orange', 'purple']
axes[1, 0].plot(pd.to_datetime(df['ds']), df['y'], 'k-', label='Historical', linewidth=2, alpha=0.7)

for i, model in enumerate(forecasts['model'].unique()):
    # Copy the slice before adding a column to avoid pandas' SettingWithCopyWarning
    model_data = forecasts[forecasts['model'] == model].copy()
    model_data['ds'] = pd.to_datetime(model_data['ds'])
    axes[1, 0].plot(model_data['ds'], model_data['y'], 
                   color=colors[i % len(colors)], linestyle='--', 
                   label=model, linewidth=2)

axes[1, 0].set_title('Model Comparison')
axes[1, 0].set_xlabel('Date')
axes[1, 0].set_ylabel('Air Passengers')
axes[1, 0].legend()
axes[1, 0].grid(True, alpha=0.3)

# Plot 4: Model performance metrics
metrics_df.set_index('model')[['mae', 'rmse']].plot(kind='bar', ax=axes[1, 1])
axes[1, 1].set_title('Model Performance Metrics')
axes[1, 1].set_ylabel('Error')
axes[1, 1].legend(['MAE', 'RMSE'])
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## Key Takeaways

1. **Data Format**: TimeCopilot expects a DataFrame with `unique_id`, `ds`, and `y` columns
2. **Agent Interface**: The TimeCopilot agent selects a model and explains the forecast in natural language
3. **Direct Forecaster**: For more control, pass specific models to `TimeCopilotForecaster`
4. **Model Evaluation**: Cross-validation yields per-model error metrics (MAE, MSE, RMSE) for comparison
5. **Visualization**: Plotting history, forecasts, and prediction intervals together makes patterns and uncertainty easier to read

In the next notebook, we'll explore more advanced features like working with multiple time series and custom model configurations.