# Getting Started with Time Series Models on IBM WatsonX

This notebook demonstrates using the WatsonX SDK to perform inference calls against a model hosted remotely on [WatsonX](https://www.ibm.com/products/watsonx-ai).

### Install dependencies

> **NOTE**: When running this recipe in [Colab](https://colab.research.google.com/), you may see an error about dependency conflicts with `google-colab 1.0.0`. You can safely ignore this error.

In [None]:
!pip install git+https://github.com/ibm-granite-community/utils
!pip install ibm-watsonx-ai
!pip install matplotlib

In [None]:
import pprint

import matplotlib.pyplot as plt
import pandas as pd
from ibm_granite_community.notebook_utils import get_env_var
from ibm_watsonx_ai import APIClient, Credentials
from ibm_watsonx_ai.foundation_models import TSModelInference
from ibm_watsonx_ai.foundation_models.schema import TSForecastParameters

### Provide the environment variables

There are three ways to provide the required environment variables, in order of precedence:

1. Directly as an environment variable in the Python environment where the Jupyter notebook is running.
2. As a Google Colab secret, if you are running the notebook in Colab.
3. Supplied by the user in a prompt during execution of the notebook.
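
The lookup order above can be sketched in a few lines. This is only an illustration of the precedence, not the actual implementation of `get_env_var`; the function name `resolve_setting` and its `prompt` parameter are hypothetical and exist only for this example.

```python
import os
from getpass import getpass


def resolve_setting(name, prompt=getpass):
    """Hypothetical helper: look up `name` in the order described above."""
    # 1. Environment variable of the running Python process.
    value = os.environ.get(name)
    if value:
        return value

    # 2. Google Colab secret (the module is only importable inside Colab).
    try:
        from google.colab import userdata

        value = userdata.get(name)
        if value:
            return value
    except Exception:
        # Not running in Colab, or the secret is missing or not shared
        # with this notebook: fall through to the interactive prompt.
        pass

    # 3. Ask the user interactively.
    return prompt(f"Please enter {name}: ")
```

Passing the prompt function as a parameter keeps the sketch testable without keyboard input, for example `resolve_setting("WATSONX_URL", prompt=lambda msg: "https://example.invalid")`.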

#### Provide your API Key

Obtain your `WATSONX_APIKEY` by generating a [Platform API Key](https://www.ibm.com/docs/en/watsonx/watsonxdata/1.0.x?topic=started-generating-api-keys) on the watsonx.data web client.

#### Provide your Project Id

Get your `WATSONX_PROJECT_ID` from the [WatsonX](https://www.ibm.com/watsonx) web client by following [these instructions](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/fm-project-id.html?context=wx).

#### Provide your base WatsonX URL

Get your `WATSONX_URL` by viewing the details for the service instance from the Cloud Pak for Data web client, as described in [these watsonx.ai setup instructions](https://ibm.github.io/watsonx-ai-python-sdk/setup_cpd.html).

As an example, your `WATSONX_URL` may be `https://us-south.ml.cloud.ibm.com` for the Dallas zone.
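
A malformed base URL tends to surface later as a confusing connection or authentication error, so it can help to check the value before building the credentials. The sketch below is a generic sanity check using only the standard library; `check_base_url` is a hypothetical helper and is not part of the watsonx.ai SDK.

```python
from urllib.parse import urlparse


def check_base_url(url):
    """Hypothetical sanity check for a service base URL."""
    cleaned = url.strip().rstrip("/")
    parts = urlparse(cleaned)
    if parts.scheme != "https":
        raise ValueError(f"expected an https:// URL, got {url!r}")
    if not parts.netloc:
        raise ValueError(f"no host name found in {url!r}")
    # Return the value with surrounding whitespace and trailing slashes removed.
    return cleaned


print(check_base_url("https://us-south.ml.cloud.ibm.com/"))
# https://us-south.ml.cloud.ibm.com
```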

In [None]:
credentials = Credentials(
    api_key=get_env_var("WATSONX_APIKEY"),
    url=get_env_var("WATSONX_URL"),
)
client = APIClient(credentials)
client.set.default_project(get_env_var("WATSONX_PROJECT_ID"))

In [None]:
for model in client.foundation_models.get_time_series_model_specs()["resources"]:
    pprint.pp("--------------------------------------------------")
    pprint.pp(f'model_id: {model["model_id"]}')
    pprint.pp(f'functions: {model["functions"]}')
    pprint.pp(f'long_description: {model["long_description"]}')
    pprint.pp(f'label: {model["label"]}')

In [None]:
ts_model_id = client.foundation_models.TimeSeriesModels.GRANITE_TTM_512_96_R2

ts_model = TSModelInference(model_id=ts_model_id, api_client=client)
context_length = 512
prediction_length = 20
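
The constant name `GRANITE_TTM_512_96_R2` suggests a context length of 512 and a maximum forecast length of 96, which is why `context_length` is set to 512 and `prediction_length` stays below 96. As a minimal sketch, assuming the model identifier follows this `<context>-<forecast>` naming pattern, the two limits can be read from the identifier instead of being hard-coded. The helper `ttm_lengths` and the identifier string used below are assumptions for illustration.

```python
import re


def ttm_lengths(model_id):
    """Hypothetical helper: read (context, max forecast) from a TTM-style id."""
    match = re.search(r"ttm[-_](\d+)[-_](\d+)", model_id.lower())
    if match is None:
        raise ValueError(f"cannot infer lengths from {model_id!r}")
    return int(match.group(1)), int(match.group(2))


# Assumed identifier string; compare it with the model_id values printed
# by the model listing above before relying on it.
max_context, max_forecast = ttm_lengths("ibm/granite-ttm-512-96-r2")

requested_prediction_length = 20
if requested_prediction_length > max_forecast:
    raise ValueError("prediction length exceeds what the model name suggests")
print(max_context, max_forecast)  # 512 96
```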

### Download the data

We'll work with a [bike sharing dataset](https://archive.ics.uci.edu/dataset/275/bike+sharing+dataset) available from the UCI Machine learning repository. This dataset includes the count of rental bikes between the years 2011 and 2012 in the Capital bike share system with the corresponding weather and seasonal information.

You can download the dataset to a temporary directory by running the following commands. Later you can clean up any downloaded files by removing the `temp` folder.

In [None]:
%%bash
BIKE_SHARING=bike+sharing+dataset.zip
# Download and unpack only if the temp directory does not exist yet.
# If wget is unavailable, curl can be used instead:
#   curl https://archive.ics.uci.edu/static/public/275/$BIKE_SHARING -o $BIKE_SHARING
test -d temp || ( \
  mkdir -p temp && \
  cd temp && \
  wget https://archive.ics.uci.edu/static/public/275/$BIKE_SHARING -O $BIKE_SHARING && \
  unzip -o $BIKE_SHARING && \
  rm -f $BIKE_SHARING && \
  cd - \
) && ls -l temp/
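
On systems without `bash`, `wget`, or `unzip`, the same download-and-unpack step can be done with the Python standard library. The following is a rough equivalent of the cell above, not a replacement endorsed by the dataset authors; the function names and the local archive name `dataset.zip` are choices made for this sketch.

```python
import urllib.request
import zipfile
from pathlib import Path

DATASET_URL = "https://archive.ics.uci.edu/static/public/275/bike+sharing+dataset.zip"


def unpack(zip_path, dest_dir):
    """Extract the archive into dest_dir and return the file names now in it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    return sorted(p.name for p in dest.iterdir())


def download_dataset(dest_dir="temp", url=DATASET_URL):
    """Download the archive into dest_dir, unpack it, then delete the archive."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / "dataset.zip"
    urllib.request.urlretrieve(url, zip_path)
    names = unpack(zip_path, dest)
    zip_path.unlink()
    # The archive itself was still in the directory when the names were listed.
    return [name for name in names if name != zip_path.name]
```

Calling `download_dataset()` requires network access and writes into the `temp` folder, matching the layout the rest of the notebook expects.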

In [None]:
DATA_FILE_PATH = "temp/hour.csv"

### Read in the data

We parse the CSV into a pandas DataFrame, then add the `hr` column to the date column so that each row carries a full hourly timestamp. Finally, we convert the timestamps to ISO 8601 strings so they can be sent as plain strings in the request.

In [None]:
timestamp_column = "dteday"
target_columns = ["casual", "registered"]

# Read in the data from the downloaded file.
input_df = pd.read_csv(DATA_FILE_PATH, parse_dates=[timestamp_column])

# The date column carries no time of day, so add the hour column to get hourly timestamps.
input_df[timestamp_column] = input_df[timestamp_column] + pd.to_timedelta(input_df["hr"], unit="h")

# Convert the timestamps to ISO 8601 strings.
input_df[timestamp_column] = input_df[timestamp_column].apply(lambda x: x.isoformat())

# Show the last few rows of the dataset.
input_df.tail()
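
To make the timestamp handling concrete, here is the same transformation for individual values using only the standard library, together with a small gap check. Slicing windows by row position assumes that consecutive rows are exactly one hour apart; whether the file has a row for every hour is not verified in this notebook, so a check like `find_gaps` (a hypothetical helper) is a cheap safeguard.

```python
from datetime import datetime, timedelta


def hourly_timestamp(date_text, hour):
    """Combine a YYYY-MM-DD date and an hour of day into an ISO 8601 string."""
    day = datetime.strptime(date_text, "%Y-%m-%d")
    return (day + timedelta(hours=hour)).isoformat()


def find_gaps(iso_timestamps, step=timedelta(hours=1)):
    """Return (previous, next) pairs whose spacing differs from `step`."""
    times = [datetime.fromisoformat(t) for t in iso_timestamps]
    return [
        (a.isoformat(), b.isoformat())
        for a, b in zip(times, times[1:])
        if b - a != step
    ]


# Made-up sample in which hour 3 is absent.
stamps = [hourly_timestamp("2011-01-01", h) for h in (0, 1, 2, 4)]
print(stamps[1])          # 2011-01-01T01:00:00
print(find_gaps(stamps))  # [('2011-01-01T02:00:00', '2011-01-01T04:00:00')]
```

In the notebook itself, the check could be applied to `input_df[timestamp_column].tolist()` after the ISO conversion.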

### Create the dataset for the request

Rather than sending a single context window to the watsonx.ai service, we select a few starting indices and send one payload that requests several forecasts at once. We add an id column to distinguish these context windows.

In the cell below, we construct the `input_data` DataFrame containing the context windows, while `ground_truth_data` contains the true values of the target columns for the `prediction_length` time points that follow each window.

In [None]:
start_index = [2173, 10635, 10935, 14239]  # randomly chosen starting points
id_column = "id"

input_data = []
ground_truth_data = []
for i in start_index:
    df = input_df.iloc[i : i + context_length, :].copy()
    df[id_column] = f"id_{i}"
    input_data.append(df)
    df = input_df.iloc[i + context_length : i + context_length + prediction_length, :].copy()
    df[id_column] = f"id_{i}"
    ground_truth_data.append(df)

input_data = pd.concat(input_data)
ground_truth_data = pd.concat(ground_truth_data)

In [None]:
input_data.head()
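
Each starting index defines two adjacent row ranges: the context window and the ground-truth window that follows it. Because `iloc` slicing silently truncates ranges that run past the end of the DataFrame, a starting index that is too large would yield a short window without raising an error. The sketch below computes the two ranges and checks that they fit; `window_bounds` is a hypothetical helper, and `n_rows=3000` is an illustrative value standing in for `len(input_df)`.

```python
def window_bounds(start, context_length, prediction_length, n_rows):
    """Return ((ctx_start, ctx_stop), (gt_start, gt_stop)) as half-open row ranges."""
    ctx_stop = start + context_length
    gt_stop = ctx_stop + prediction_length
    if start < 0 or gt_stop > n_rows:
        raise ValueError(f"window starting at {start} does not fit in {n_rows} rows")
    return (start, ctx_stop), (ctx_stop, gt_stop)


print(window_bounds(2173, 512, 20, n_rows=3000))
# ((2173, 2685), (2685, 2705))
```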

### Build the arguments for the forecast request

We first construct the forecast parameters structure and then call the forecast method. The forecast parameters specify the id column, timestamp column, target columns, frequency, and prediction length.

In [None]:
forecasting_params = TSForecastParameters(
    id_columns=[id_column],
    timestamp_column=timestamp_column,
    freq="1h",
    target_columns=target_columns,
    prediction_length=prediction_length,
)
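
The `freq` and `prediction_length` parameters together determine which future time points a forecast covers. As a rough illustration, assuming the forecast continues at the declared frequency from the last timestamp of each context window, the expected forecast timestamps can be listed as follows. The helper name and the sample timestamp are made up for this sketch.

```python
from datetime import datetime, timedelta


def expected_forecast_times(last_context_time, prediction_length, step=timedelta(hours=1)):
    """List the ISO timestamps of the next `prediction_length` steps."""
    start = datetime.fromisoformat(last_context_time)
    return [(start + step * k).isoformat() for k in range(1, prediction_length + 1)]


times = expected_forecast_times("2011-04-05T23:00:00", prediction_length=3)
print(times)
# ['2011-04-06T00:00:00', '2011-04-06T01:00:00', '2011-04-06T02:00:00']
```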

### Make the forecast request and plot results

Here we use the forecast method to get the forecast for our data, then we plot the results.

In [None]:
results = ts_model.forecast(data=input_data, params=forecasting_params)
df_out = pd.DataFrame(results["results"][0], columns=[id_column, timestamp_column] + target_columns)
df_out.head()

In [None]:
# Use a cleaner plot style
plt.style.use("seaborn-v0_8-whitegrid")

# Number of historical time points to plot before each forecast
history = 60

grps = input_data.groupby(id_column)

num_plots = len(grps)
fig, axs = plt.subplots(num_plots, 1, figsize=(10, 2 * num_plots))


for i, (grp_name, grp) in enumerate(grps):
    # concatenate ground truth historical data with the ground truth for the future targets
    gt = ground_truth_data[ground_truth_data[id_column] == grp_name]
    gt_dt = pd.concat([grp[timestamp_column].iloc[-history:], gt[timestamp_column]])
    gt_val = pd.concat([grp[target_columns[0]].iloc[-history:], gt[target_columns[0]]])

    # plot ground truth
    axs[i].plot(pd.to_datetime(gt_dt), gt_val, label="Ground truth", linestyle="-", color="blue", linewidth=2)

    # plot forecasts
    pred = df_out[df_out[id_column] == grp_name]
    axs[i].plot(
        pd.to_datetime(pred[timestamp_column]),
        pred[target_columns[0]],
        label="Prediction",
        linestyle="--",
        color="orange",
        linewidth=2,
    )

    axs[i].legend()
    axs[i].set_title(grp_name)

plt.tight_layout()
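
The plots give a visual comparison only. To put a number on forecast quality, one common choice is the mean absolute error between the ground truth and the prediction. The function below is a plain-Python sketch of that metric, added here for illustration; it is not something the watsonx.ai SDK computes in this notebook.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    if len(actual) == 0:
        raise ValueError("series must not be empty")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up numbers: absolute errors are 2, 2 and 3, so the result is 7/3.
print(mean_absolute_error([10, 20, 30], [12, 18, 33]))
```

In this notebook the two sequences could come from `gt[target_columns[0]].tolist()` and `pred[target_columns[0]].tolist()` for each id, provided both cover the same `prediction_length` time points.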