# Custom Time Series Model Tutorial with DigitalHub

This notebook demonstrates how to build, train, and serve a custom time series forecasting model using Darts (a Python library for time series) with the DigitalHub SDK. We'll work with the Air Passengers dataset, train an NBEATS model, and deploy it as a REST API service.

## Overview
- **Data Processing**: Load and prepare the Air Passengers time series dataset
- **Model Training**: Train an NBEATS deep learning model for time series forecasting
- **Model Packaging**: Package the trained model for deployment
- **Model Serving**: Deploy the model as a REST API with custom serving logic
- **Pipeline Orchestration**: Create a workflow pipeline to automate the entire process

## Setup and Function Definitions

First, we'll create the necessary directory structure and define all the functions we'll need for our time series pipeline. All functions will be stored in a single `src/functions.py` file for easy management.

In [None]:
from pathlib import Path

Path("src").mkdir(exist_ok=True)

### Function Definitions

This cell creates our main functions file with the following components:

- **`train_model`**: Trains an NBEATS model on the Air Passengers dataset, evaluates it on a held-out test set, and registers the model with its metrics in DigitalHub
- **`init_context`**: Initializes the serving context by loading the trained model
- **`serve_predictions`**: Serves time series predictions via REST API

The functions use the Darts library for time series modeling and implement custom serving logic for real-time forecasting.
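
The packaging step in `train_model` and the unpacking step in `init_context` form a zip round trip: two files are written to one archive, which is later extracted into a local folder. The sketch below illustrates that round trip with the standard library only. The helper names `pack_model` and `unpack_model` are hypothetical, and the file contents are placeholders rather than a real model.

```python
import tempfile
from pathlib import Path
from zipfile import ZipFile


def pack_model(workdir: Path, names: list[str]) -> Path:
    """Zip the given files (relative to workdir) into one archive."""
    archive = workdir / "predictor_model.pt.zip"
    with ZipFile(archive, "w") as z:
        for name in names:
            # arcname keeps the archive flat, so extraction yields bare file names
            z.write(workdir / name, arcname=name)
    return archive


def unpack_model(archive: Path, target: Path) -> list[str]:
    """Extract the archive and return the sorted names of the extracted files."""
    with ZipFile(archive, "r") as z:
        z.extractall(target)
    return sorted(p.name for p in target.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    model_files = ["predictor_model.pt", "predictor_model.pt.ckpt"]
    # Placeholder bytes standing in for the real model files
    for name in model_files:
        (workdir / name).write_bytes(b"placeholder")
    archive = pack_model(workdir, model_files)
    extracted = unpack_model(archive, workdir / "extracted_model")
    print(extracted)
```

Shipping both files in one archive means the serving function only has to download a single artifact before loading the model.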

In [None]:
%%writefile "src/functions.py"
import json
from zipfile import ZipFile

import pandas as pd
from darts import TimeSeries
from darts.datasets import AirPassengersDataset
from darts.metrics import mae, mape, smape
from darts.models import NBEATSModel
from digitalhub_runtime_python import handler


@handler(outputs=["model"])
def train_model(project):
    """
    Train an NBEATS model on the Air Passengers dataset
    """
    # Load Air Passengers dataset
    series = AirPassengersDataset().load()
    train, test = series[:-36], series[-36:]

    # Configure and train NBEATS model
    model = NBEATSModel(input_chunk_length=24, output_chunk_length=12, n_epochs=200, random_state=0)
    model.fit(train)

    # Make predictions for evaluation
    pred = model.predict(n=36)

    # Save model artifacts
    model.save("predictor_model.pt")
    with ZipFile("predictor_model.pt.zip", "w") as z:
        z.write("predictor_model.pt")
        z.write("predictor_model.pt.ckpt")

    # Calculate metrics
    metrics = {"mape": mape(test, pred), "smape": smape(test, pred), "mae": mae(test, pred)}

    # Register model in DigitalHub
    model_artifact = project.log_model(
        name="air-passengers-forecaster",
        kind="model",
        source="predictor_model.pt.zip",
        algorithm="darts.models.NBEATSModel",
        framework="darts",
    )
    model_artifact.log_metrics(metrics)
    return model_artifact


def init_context(context, model_key):
    """
    Initialize serving context by loading the trained model
    """
    model = context.project.get_model(model_key)
    path = model.download()
    local_path_model = "extracted_model/"

    # Extract model from zip file
    with ZipFile(path, "r") as zip_ref:
        zip_ref.extractall(local_path_model)

    # Load the NBEATS model. `load` can be called on the class directly:
    # the saved files already carry the model configuration.
    name_model_local = local_path_model + "predictor_model.pt"
    mm = NBEATSModel.load(name_model_local)

    # Keep the model on the context so every request can reuse it
    context.model = mm


def serve_predictions(context, event):
    """
    Serve time series predictions via REST API
    """
    if isinstance(event.body, bytes):
        body = json.loads(event.body)
    else:
        body = event.body

    context.logger.info(f"Received event: {body}")
    inference_input = body["inference_input"]

    # Convert input to Darts TimeSeries format
    pdf = pd.DataFrame(inference_input)
    pdf["date"] = pd.to_datetime(pdf["date"], unit="ms")

    ts = TimeSeries.from_dataframe(pdf, time_col="date", value_cols="value")

    # Make predictions
    output_chunk_length = 12
    result = context.model.predict(n=output_chunk_length * 2, series=ts)

    # Convert result to JSON format
    jsonstr = result.pd_dataframe().reset_index().to_json(orient="records")
    return json.loads(jsonstr)


## Project Initialization

Now we'll initialize our DigitalHub project, using the same project name as the other tutorials.

In [None]:
import digitalhub as dh

p_name = "tutorial-project"
project = dh.get_or_create_project(p_name)

## Step 1: Model Training

We'll create and run our NBEATS training function. This will train a deep learning model for time series forecasting on the Air Passengers dataset, which contains monthly passenger numbers from 1949 to 1960.
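
To make the numbers in `train_model` concrete: the series is split into a training part and a 36-month test part, and the model is configured with `input_chunk_length=24` and `output_chunk_length=12`. The sketch below shows how such a configuration can be read as sliding (input, target) windows over the training data. It is only an illustration of the idea, assuming every consecutive window is used. Darts builds its training samples internally, and `sliding_windows` is a hypothetical helper, not part of its API.

```python
def sliding_windows(values, input_len, output_len):
    """Return every (input, target) pair of consecutive slices from values."""
    total = input_len + output_len
    return [
        (values[i : i + input_len], values[i + input_len : i + total])
        for i in range(len(values) - total + 1)
    ]


# 144 monthly observations (1949-01 to 1960-12), represented by their positions
series = list(range(144))
train, test = series[:-36], series[-36:]

windows = sliding_windows(train, input_len=24, output_len=12)
print(len(train), len(test), len(windows))
```

Each window pairs 24 months of history with the 12 months that follow, which mirrors what the model is asked to do at prediction time.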

In [None]:
train_fn = project.new_function(
    name="train-time-series-model",
    kind="python",
    python_version="PYTHON3_10",
    code_src="src/functions.py",
    handler="train_model",
)

In [None]:
train_build = train_fn.run(
    "build",
    instructions=[
        "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu",
        "pip3 install darts==0.30.0 patsy scikit-learn",
    ],
    wait=True,
)

In [None]:
train_run = train_fn.run("job", wait=True)

Let's examine the trained model and its performance metrics:
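
For reference, the three metrics logged by `train_model` can be sketched in plain Python. The formulas below are the usual textbook definitions (MAE in the units of the data, MAPE and sMAPE as percentages) and are assumed to match what `darts.metrics` computes. The Darts implementation may differ in details such as input validation. The sample values are made up.

```python
def mae(actual, predicted):
    """Mean absolute error, in the same units as the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; undefined when an actual value is 0."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def smape(actual, predicted):
    """Symmetric MAPE: the error is scaled by |actual| + |predicted|."""
    return 200 * sum(
        abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, predicted)
    ) / len(actual)


# Made-up values, not taken from the dataset
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mae(actual, predicted), mape(actual, predicted), smape(actual, predicted))
```

Lower is better for all three. MAPE and sMAPE are scale-free, which makes them easier to compare across series than MAE.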

In [None]:
model = train_run.output("model")
print("Time series model metrics:")
print(model.spec.get("metrics", {}))

## Step 2: Model Serving

Now we'll deploy our trained time series model as a REST API service. This involves creating a serving function with custom initialization and prediction logic.
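
The split between initialization and prediction can be simulated without any serving runtime. In the sketch below, `SimpleNamespace` objects stand in for the `context` and `event` objects that the platform passes in, and a trivial "repeat the last value" function stands in for the trained model. All of these stand-ins are assumptions for illustration only.

```python
import json
from types import SimpleNamespace


def parse_body(event):
    """Return the request body as a dict, decoding it when it arrives as bytes."""
    if isinstance(event.body, bytes):
        return json.loads(event.body)
    return event.body


def init_context(context):
    """Runs once at startup: attach a stand-in model to the context."""
    # Hypothetical forecaster that simply repeats the last observed value
    context.model = lambda values, n: [values[-1]] * n


def handler(context, event):
    """Runs per request: read the input and call the model stored on the context."""
    body = parse_body(event)
    values = [row["value"] for row in body["inference_input"]]
    return context.model(values, 3)


context = SimpleNamespace()
init_context(context)

payload = {"inference_input": [{"date": 0, "value": 1.0}, {"date": 1, "value": 2.0}]}
as_dict = handler(context, SimpleNamespace(body=payload))
as_bytes = handler(context, SimpleNamespace(body=json.dumps(payload).encode()))
print(as_dict == as_bytes)
```

Loading the model once in the init function keeps the expensive work out of the request path, so the handler only parses input and calls the model.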

In [None]:
serve_func = project.new_function(
    name="serve-time-series-model",
    kind="python",
    python_version="PYTHON3_10",
    code_src="src/functions.py",
    handler="serve_predictions",
    init_function="init_context",
)

In [None]:
run_build_model_serve = serve_func.run(
    "build",
    instructions=[
        "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu",
        "pip3 install darts==0.30.0 patsy scikit-learn",
    ],
    wait=True,
)

In [None]:
serve_run = serve_func.run("serve", init_parameters={"model_key": model.key}, labels=["time-series-service"], wait=True)

### Test the Time Series API

Let's test our deployed model by sending time series data for forecasting. We'll use the last 24 data points from the Air Passengers dataset as input.
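
The handler expects a body of the form `{"inference_input": [{"date": <epoch ms>, "value": <number>}, ...]}`. The sketch below builds such a payload with the standard library only and converts the timestamps back to dates. The helper names are hypothetical and the values are made up. Timestamps are computed in UTC on the assumption that the server decodes them with `pd.to_datetime(..., unit="ms")`, as `serve_predictions` does.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(year, month):
    """Epoch milliseconds (UTC) for the first day of the given month."""
    # Integer division of timedeltas avoids float rounding
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    return (start - EPOCH) // timedelta(milliseconds=1)


def from_epoch_ms(ms):
    """ISO date (UTC) for an epoch-millisecond timestamp; also valid before 1970."""
    return (EPOCH + timedelta(milliseconds=ms)).date().isoformat()


def build_payload(values, start_year, start_month):
    """Pair consecutive monthly values with month-start timestamps."""
    records = []
    for offset, value in enumerate(values):
        months = (start_month - 1) + offset
        year, month = start_year + months // 12, months % 12 + 1
        records.append({"date": to_epoch_ms(year, month), "value": value})
    return {"inference_input": records}


# Made-up values, not taken from the dataset
payload = build_payload([1.0, 2.0, 3.0], start_year=1959, start_month=11)
dates = [from_epoch_ms(r["date"]) for r in payload["inference_input"]]
print(dates)
```

Dates before 1970 map to negative timestamps, which is expected for this dataset. The same `from_epoch_ms` conversion should also work on the response, since pandas' `to_json` writes dates as epoch milliseconds by default.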

In [None]:
# Install darts locally for testing (if not already installed)
%pip install darts==0.30.0 --quiet

import json
from datetime import datetime
from darts.datasets import AirPassengersDataset

# Load test data
series = AirPassengersDataset().load()
val = series[-24:]  # Last 24 points for prediction
json_value = json.loads(val.to_json())

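# Prepare input data in the expected format: one record per observation,
# with the date as epoch milliseconds. The ISO timestamps are parsed as UTC
# ("+0000") so they match how the server decodes them with unit="ms".
data = [
    {"value": x[0], "date": datetime.strptime(y + "+0000", "%Y-%m-%dT%H:%M:%S.%f%z").timestamp() * 1000}
    for x, y in zip(json_value["data"], json_value["index"])
]
inputs = {"inference_input": data}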

In [None]:
# Make prediction request
result = serve_run.invoke(json=inputs).json()
print("Time series forecast result:")
print(f"Predicted {len(result)} future time points")
print("Sample predictions:", result[:3])  # Show first 3 predictions

## Pipeline Orchestration

Now let's create a workflow that orchestrates the entire time series modeling process. This pipeline uses Hera (Argo Workflows) to define the execution flow:

1. **A**: Build training environment 
2. **B**: Build serving environment 
3. **C**: Train the time series model
4. **D**: Deploy model service

The pipeline handles the complex dependencies and environment setup required for the Darts time series library.
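
The dependency structure of the four steps (both builds before training, training before serving) can be checked with a small standard-library sketch. This only illustrates the ordering constraints; it is not how Hera or Argo Workflows schedule steps. The step letters follow the list above.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it starts
dependencies = {
    "A": set(),        # build training environment
    "B": set(),        # build serving environment
    "C": {"A", "B"},   # train the model
    "D": {"C"},        # deploy the model service
}

order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

A and B have no dependencies on each other, so a workflow engine is free to run them in parallel, while C and D always come afterwards.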

In [None]:
%%writefile "src/pipeline.py"
from digitalhub_runtime_hera.dsl import step
from hera.workflows import DAG, Workflow


def pipeline():
    with Workflow(entrypoint="dag") as w:
        with DAG(name="dag"):
            A = step(
                template={
                    "action": "build",
                    "instructions": [
                        "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu",
                        "pip3 install darts==0.30.0 patsy scikit-learn",
                    ],
                },
                function="train-time-series-model",
            )
            B = step(
                template={
                    "action": "build",
                    "instructions": [
                        "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu",
                        "pip3 install darts==0.30.0 patsy scikit-learn",
                    ],
                },
                function="serve-time-series-model",
            )
            C = step(
                template={"action": "job"},
                function="train-time-series-model",
                outputs=["model"],
            )
            D = step(
                template={"action": "serve", "init_parameters": {"model_key": "{{inputs.parameters.model}}"}},
                function="serve-time-series-model",
                inputs={"model": C.get_parameter("model")},
            )
            [A, B] >> C >> D
    return w


### Execute the Complete Pipeline

Finally, let's create and execute our complete time series pipeline workflow. This will run environment setup, training, and serving deployment in an automated, orchestrated manner.

In [None]:
workflow = project.new_workflow(
    name="time-series-pipeline",
    kind="hera",
    code_src="src/pipeline.py",
    handler="pipeline",
)

In [None]:
wf_build = workflow.run("build", wait=True)

In [None]:
wf_run = workflow.run("pipeline", wait=True)