## Time

> The objective of this notebook is to implement utilities that simplify time handling.

In [None]:
#| default_exp utils.time

In [None]:
#| hide
from nbdev.showdoc import *

import pkg_resources, importlib


In [None]:
#| export
import pandas as pd
import xarray as xr

## Forecast time handling

Time series forecasting often involves several time indices (e.g., run time and forecast time), which can lead to alignment errors if they are not handled carefully. This class provides a unified way to manage these indices and avoid common mistakes when manipulating forecast data. It provides methods to convert between two layouts:

- Forecast horizons as columns (e.g., t+1, t+2 columns)
- Forecast horizons and forecast times as row indices

The first layout is convenient for saving data, while the second is better suited for scoring and plotting because it explicitly tracks the actual forecast times.
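
As a rough illustration of the two layouts, here is a minimal sketch in plain pandas. It assumes that a horizon label `t+N` means N days after the run time; the variable names (`wide`, `stacked`, `offsets`) are illustrative and not part of the exported module.

```python
import pandas as pd

# Wide layout: one row per run time, one column per forecast horizon.
wide = pd.DataFrame(
    {"t+1": [1, 2], "t+2": [3, 4]},
    index=pd.date_range("2024-01-01", periods=2, freq="D", name="run_time"),
)
wide.columns.name = "forecast_horizon"

# Long layout: one row per (run_time, forecast_horizon) pair.
stacked = wide.stack().to_frame("pred")

# Forecast time = run time + horizon (assumption: "t+N" means N days ahead).
run_times = stacked.index.get_level_values("run_time")
horizons = stacked.index.get_level_values("forecast_horizon")
days = horizons.str.replace("t+", "", regex=False).astype(int)
offsets = pd.to_timedelta(days, unit="D")

# Store the forecast time as a third index level.
stacked["forecast_time"] = run_times + offsets
stacked = stacked.set_index("forecast_time", append=True)
print(stacked)
```

In the long layout, the run on 2024-01-01 at horizon t+2 and the run on 2024-01-02 at horizon t+1 both carry the forecast time 2024-01-03, which is what makes them directly comparable to the same observation.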

In [None]:
#| export
class ForecastTimeHandler:
    """
    A utility class for handling forecast time transformations.

    This class provides functionality to manipulate forecast time between different formats,
    specifically handling the conversion between columnar forecast horizons and stacked time series formats.
    It manages forecast horizons (e.g., 't+1', 't+2') and their corresponding timestamps.
    """
    def __init__(
            self,
            run_time_col_name: str = "run_time", # Name of the time index that represents the time at which the forecast is made
            stack_col_name: str = "pred" # Name of the value column created when the horizon columns are stacked
            ):
        self.stack_col_name = stack_col_name
        self.run_time_col_name = run_time_col_name

    def stack(self, df: pd.DataFrame) -> pd.DataFrame:
        """Stack the forecast horizon as index and add forecast time as index"""
        df = df.copy()
        if df.columns.name is None:
            df.columns.name = "forecast_horizon"
        df = self.transpose_forecast_horizon_as_index(df)
        df = self.add_forecast_time_as_index(df)
        return df
    
    def transpose_forecast_horizon_as_index(self, df: pd.DataFrame) -> pd.DataFrame:
        """Move the forecast horizon columns into an index level"""
        df = df.stack()
        df = df.to_frame(self.stack_col_name)
        return df

    def add_forecast_time_as_index(self, df: pd.DataFrame) -> pd.DataFrame:
        def get_daily_timedeltas(forecast_horizons):
            """Extract timedelta day values from forecast horizons starting with 't+'"""
            return [pd.Timedelta(days=int(fh.replace("t+", ""))) for fh in forecast_horizons if fh.startswith('t+')]
        df = df.copy()
        forecast_horizon = df.index.get_level_values("forecast_horizon")
        timedeltas = get_daily_timedeltas(forecast_horizon)
        df["forecast_time"] = df.index.get_level_values(self.run_time_col_name) + pd.Index(timedeltas)
        df.set_index("forecast_time", inplace=True, append=True)
        return df
    
    def unstack(self, df: pd.DataFrame) -> pd.DataFrame:
        """Convert stacked forecast horizon index back to horizon-as-columns format"""
        return df.reset_index("forecast_time", drop=True)[self.stack_col_name].unstack("forecast_horizon")
    
    def align(self, pred, obs, stack_pred=False):
        """Align the predictions and observations by forecast time"""
        obs, pred = obs.copy(), pred.copy()
        if stack_pred:
            pred = self.stack(pred)
        obs_index_name = obs.index.name
        obs.index.name = "forecast_time"
        pred, obs = pred.align(obs, join="outer", axis=0)
        obs.index.name = obs_index_name
        return pred, obs
    
    def join(self, pred, obs, stack_pred=False):
        """Join the predictions and observations by forecast time"""
        obs, pred = obs.copy(), pred.copy()
        if stack_pred:
            pred = self.stack(pred)
        obs.index.name = "forecast_time"
        return pred.join(obs, on="forecast_time")

    def align_as_xarray(self, pred, obs):
        """Align the predictions and observations by forecast horizon and return as xarray"""
        if 1 != len(obs.columns):
            raise ValueError("Observations must have only one column")
        obs_col = obs.columns[0]
        pred_col = self.stack_col_name

        obs, pred = obs.copy(), pred.copy()
        pred = self.stack(pred)
        obs.index.name = "forecast_time"
        pred, obs = self.align(pred, obs)

        self.stack_col_name = obs_col
        obs = self.unstack(obs)
        obs = obs.to_xarray().to_array("forecast_horizon", name=obs_col)
        self.stack_col_name = pred_col
        pred = self.unstack(pred)
        pred = pred.to_xarray().to_array("forecast_horizon", name=pred_col)
        
        return pred, obs
    
    def join_as_xarray(self, pred, obs):
        """Align the predictions and observations by forecast horizon and join them as xarray"""
        pred, obs = self.align_as_xarray(pred, obs)
        return xr.merge([pred, obs])


We will first create some synthetic forecast data

In [None]:
forecast_data = pd.DataFrame(
    {
        "t+1": [1, 2, 3],
        "t+2": [4, 5, 6],
        "t+3": [7, 8, 9]
    }, 
    index=pd.date_range("2024-01-01", periods=3, freq="D", name="run_time"),
)
forecast_data.head()

| run_time | t+1 | t+2 | t+3 |
|---|---|---|---|
| 2024-01-01 | 1 | 4 | 7 |
| 2024-01-02 | 2 | 5 | 8 |
| 2024-01-03 | 3 | 6 | 9 |


We will do the same for observations

In [None]:
observations = pd.DataFrame([1, 2, 3, 4, 5, 6], index=pd.date_range("2024-01-02", periods=6, freq="D", name="time"), columns=["obs"])
observations.head(3)

| time | obs |
|---|---|
| 2024-01-02 | 1 |
| 2024-01-03 | 2 |
| 2024-01-04 | 3 |


We will now create an instance of the ForecastTimeHandler class

In [None]:
frcst_time_handler = ForecastTimeHandler(run_time_col_name="run_time", stack_col_name="pred")

We can stack the data to convert from forecast horizons as columns to having forecast time as an index. This format is better suited for scoring and plotting.

In [None]:
show_doc(ForecastTimeHandler.stack)

---

### ForecastTimeHandler.stack

>      ForecastTimeHandler.stack (df:pandas.core.frame.DataFrame)

*Stack the forecast horizon as index and add forecast time as index*

In [None]:
stacked_forecast = frcst_time_handler.stack(forecast_data)
stacked_forecast.head()

| run_time | forecast_horizon | forecast_time | pred |
|---|---|---|---|
| 2024-01-01 | t+1 | 2024-01-02 | 1 |
| 2024-01-01 | t+2 | 2024-01-03 | 4 |
| 2024-01-01 | t+3 | 2024-01-04 | 7 |
| 2024-01-02 | t+1 | 2024-01-03 | 2 |
| 2024-01-02 | t+2 | 2024-01-04 | 5 |
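
With the forecast time stored as an index level, all predictions that target the same date can be selected in one step, regardless of which run produced them. The sketch below rebuilds the five rows shown above by hand in plain pandas (it does not call `ForecastTimeHandler`) and uses `DataFrame.xs` for the selection; the name `same_target` is illustrative.

```python
import pandas as pd

# Rebuild the first five stacked rows by hand (values copied from the output above).
stacked = pd.DataFrame(
    {"pred": [1, 4, 7, 2, 5]},
    index=pd.MultiIndex.from_arrays(
        [
            pd.to_datetime(["2024-01-01"] * 3 + ["2024-01-02"] * 2),
            ["t+1", "t+2", "t+3", "t+1", "t+2"],
            pd.to_datetime(
                ["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-03", "2024-01-04"]
            ),
        ],
        names=["run_time", "forecast_horizon", "forecast_time"],
    ),
)

# Every prediction made for 2024-01-04, whatever the run time or horizon.
same_target = stacked.xs(pd.Timestamp("2024-01-04"), level="forecast_time")
print(same_target)
```

Here the selection returns two rows: the t+3 forecast from the 2024-01-01 run and the t+2 forecast from the 2024-01-02 run.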


We can simply revert this operation as follows

In [None]:
show_doc(ForecastTimeHandler.unstack)

---

### ForecastTimeHandler.unstack

>      ForecastTimeHandler.unstack (df:pandas.core.frame.DataFrame)

*Convert stacked forecast horizon index back to horizon-as-columns format*

In [None]:
unstacked_forecast = frcst_time_handler.unstack(stacked_forecast)
unstacked_forecast.head()

| forecast_horizon | t+1 | t+2 | t+3 |
|---|---|---|---|
| run_time | | | |
| 2024-01-01 | 1 | 4 | 7 |
| 2024-01-02 | 2 | 5 | 8 |
| 2024-01-03 | 3 | 6 | 9 |


Note that the round trip sets the name of the columns index to "forecast_horizon". Let's try a few more things we can do. We can align indexes:

In [None]:
show_doc(ForecastTimeHandler.align)

---

### ForecastTimeHandler.align

>      ForecastTimeHandler.align (pred, obs, stack_pred=False)

*Align the predictions and observations by forecast time*

In [None]:
aligned_frcst, aligned_obs = frcst_time_handler.align(forecast_data, observations, stack_pred=True)
aligned_frcst.head(3)


| run_time | forecast_horizon | forecast_time | pred |
|---|---|---|---|
| 2024-01-01 | t+1 | 2024-01-02 | 1 |
| 2024-01-01 | t+2 | 2024-01-03 | 4 |
| 2024-01-01 | t+3 | 2024-01-04 | 7 |


In [None]:
aligned_obs.head(3)

| run_time | forecast_horizon | forecast_time | obs |
|---|---|---|---|
| 2024-01-01 | t+1 | 2024-01-02 | 1 |
| 2024-01-01 | t+2 | 2024-01-03 | 2 |
| 2024-01-01 | t+3 | 2024-01-04 | 3 |


We can do the same thing but get xarray DataArrays as output. Remember that in this case the forecast data must be given with forecast horizons as columns

In [None]:
show_doc(ForecastTimeHandler.align_as_xarray)

---

### ForecastTimeHandler.align_as_xarray

>      ForecastTimeHandler.align_as_xarray (pred, obs)

*Align the predictions and observations by forecast horizon and return as xarray*

In [None]:
forecast_ds, obs_ds = frcst_time_handler.align_as_xarray(forecast_data, observations)
forecast_ds

In [None]:
obs_ds

Finally, we can also join the data on the forecast time index

In [None]:
show_doc(ForecastTimeHandler.join)

---

### ForecastTimeHandler.join

>      ForecastTimeHandler.join (pred, obs, stack_pred=False)

*Join the predictions and observations by forecast time*

In [None]:
joint = frcst_time_handler.join(forecast_data, observations, stack_pred=True)
joint.head(3)

| run_time | forecast_horizon | forecast_time | pred | obs |
|---|---|---|---|---|
| 2024-01-01 | t+1 | 2024-01-02 | 1 | 1 |
| 2024-01-01 | t+2 | 2024-01-03 | 4 | 2 |
| 2024-01-01 | t+3 | 2024-01-04 | 7 | 3 |
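
Once predictions and observations share a row, scoring reduces to a group-by over the horizon level. The following is a minimal sketch in plain pandas that rebuilds the joined frame from the same synthetic data and computes the mean absolute error per horizon. It assumes daily horizons (`t+N` means N days ahead); the metric and the names `joined` and `mae_by_horizon` are illustrative choices, not part of the exported module.

```python
import pandas as pd

# Same synthetic forecasts and observations as in this notebook.
forecast = pd.DataFrame(
    {"t+1": [1, 2, 3], "t+2": [4, 5, 6], "t+3": [7, 8, 9]},
    index=pd.date_range("2024-01-01", periods=3, freq="D", name="run_time"),
)
forecast.columns.name = "forecast_horizon"
obs = pd.DataFrame(
    {"obs": [1, 2, 3, 4, 5, 6]},
    index=pd.date_range("2024-01-02", periods=6, freq="D", name="forecast_time"),
)

# Stack horizons into the index and derive the forecast time (run time + N days).
stacked = forecast.stack().to_frame("pred")
days = (
    stacked.index.get_level_values("forecast_horizon")
    .str.replace("t+", "", regex=False)
    .astype(int)
)
stacked["forecast_time"] = stacked.index.get_level_values("run_time") + pd.to_timedelta(
    days, unit="D"
)
stacked = stacked.set_index("forecast_time", append=True)

# Join observations on the forecast_time index level.
joined = stacked.join(obs, on="forecast_time")

# Mean absolute error for each forecast horizon.
abs_error = (joined["pred"] - joined["obs"]).abs()
mae_by_horizon = abs_error.groupby(level="forecast_horizon").mean()
print(mae_by_horizon)
```

With this synthetic data the error grows with the horizon: 0 for t+1, 2 for t+2 and 4 for t+3.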


We can do the same thing and get an xarray dataset as output

In [None]:
show_doc(ForecastTimeHandler.join_as_xarray)

---

### ForecastTimeHandler.join_as_xarray

>      ForecastTimeHandler.join_as_xarray (pred, obs)

*Align the predictions and observations by forecast horizon and join them as xarray*

In [None]:
frcst_time_handler.join_as_xarray(forecast_data, observations)

In [None]:
#| hide
import nbdev; nbdev.nbdev_export()