# Sktime marketplace: creating 2nd-party libraries 

Estimated time: 20 min

## Introduction

How do you create a 2nd-party library for sktime and share it with the community?




## Agenda

* New sktime model marketplace
* Developing the 2nd-party library
    * Overview of creating a custom forecaster
    * Implementing the model
    * Testing the model


## New sktime model marketplace

* The [new documentation table](https://www.sktime.net/en/stable/estimator_overview.html#filter=all&tags=%7B%7D) lets you filter models by tags, so users interested in a certain capability can find the models that provide it (including yours!).

<center>
<img src="img/sktime-doc-estimator-overview.png" style="max-width: 50%; display: block;">
</center>

Now it is easier than ever to create your own model and share it with the community.

<center>
<img src="img/sktime-2nd-party-lib.png" alt="sktime ecosystem" style="max-width: 50%;">
</center>







**Benefits**: productivity, scalability, reproducibility, reach more users, etc.

* Out-of-the-box compatibility with the sktime ecosystem and the hundreds of models available
* Focus on the model, not preprocessing logic
* You implement one model -> get multivariate forecasting, tuning strategies and cross-validation for free
* Zero friction for sktime users trying your model

## How it looks to the user

```python
from sktime.forecasting.yourlibrary import YourModel

model = YourModel()
model.fit(y_train)
y_pred = model.predict(fh)

```

## Developing the 2nd-party library

Custom estimators can be implemented directly in sktime, in a separate library or even in closed-source code. In this tutorial we will show you how to create a 2nd-party library.

Overview:

0. Have an idea 💡
1. Implement the model in your library as a forecaster/estimator
2. Run API-compatibility tests in your CI
3. Add a model placeholder to the sktime documentation

### Overview of creating a custom forecaster

1. Inherit from `BaseForecaster`
2. Define the model metadata (tags)
3. Implement the private `_fit` and `_predict` methods
4. Add a `get_test_params` class method for testing purposes


There's an [extension template](https://github.com/sktime/sktime/blob/main/extension_templates/forecasting.py) that is a good starting point - you can copy, paste and modify it according to your needs.

The tag system is a key feature of sktime. It is used not only to drive the preprocessing that happens before the user's data reaches your code, but also to provide a better user experience in the documentation.



In [1]:
from sktime.registry import all_tags

all_tags(estimator_types="forecaster", as_dataframe=True).head()

| | name | scitype | type | description |
|---|---|---|---|---|
| 0 | X-y-must-have-same-index | [forecaster, regressor, transformer] | bool | do X/y in fit/update and X/fh in predict have ... |
| 1 | X_inner_mtype | estimator | (list, str) | which machine type(s) is the internal _fit/_pr... |
| 2 | authors | object | (list, str) | list of authors of the object, each author a G... |
| 3 | capability:categorical_in_X | forecaster | bool | can the forecaster natively handle categorical... |
| 4 | capability:contractable | estimator | bool | contract time setting, does the estimator supp... |


In [2]:
from sktime.forecasting.base import BaseForecaster


class MyForecaster(BaseForecaster):
    _tags = {
        "y_inner_mtype": "pd.Series",  # Valid values: "pd.Series", "pd.DataFrame", "pd-multiindex", numpy3D
        "X_inner_mtype": "pd.DataFrame",
        "scitype:y": "univariate",
        "ignores-exogeneous-X": False,
        "requires-fh-in-fit": True,
        "X-y-must-have-same-index": True,
        "enforce_index_type": None,
        "handles-missing-data": False,
        "capability:insample": True,
        "capability:pred_int": False,
        "capability:pred_int:insample": True,
        "authors": ["author1", "author2"],
        "maintainers": ["maintainer1", "maintainer2"],
        "python_version": None,
        "python_dependencies": None,
    }



#### Private methods to implement (`_fit`, `_predict`)

<img src="img/sktime-public-private-interface-diagram.png" alt="sktime interface" width="700" style="display: block; margin-left: auto; margin-right: auto;">
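
The split between public and private methods can be illustrated with a toy base class. This is a simplified stand-in written for this tutorial, not sktime's actual implementation: `SketchBaseForecaster` and `LastValueForecaster` are hypothetical names, and the list conversion only mimics the input checks and conversions that the real base class performs.

```python
class SketchBaseForecaster:
    """Simplified stand-in for a base class; not sktime's actual code."""

    def fit(self, y, X=None, fh=None):
        # Public method: validate and convert the input, then delegate.
        if len(y) == 0:
            raise ValueError("y must not be empty")
        y_inner = [float(v) for v in y]  # stand-in for data conversion
        self._fit(y_inner, X, fh)
        self._is_fitted = True
        return self

    def predict(self, fh, X=None):
        # Public method: check state, then delegate to the private method.
        if not getattr(self, "_is_fitted", False):
            raise RuntimeError("call fit before predict")
        return self._predict(list(fh), X)

    def _fit(self, y, X, fh):
        raise NotImplementedError

    def _predict(self, fh, X):
        raise NotImplementedError


class LastValueForecaster(SketchBaseForecaster):
    """The extender implements only the private methods."""

    def _fit(self, y, X, fh):
        self.last_ = y[-1]
        return self

    def _predict(self, fh, X):
        return [self.last_ for _ in fh]


preds = LastValueForecaster().fit([1, 2, 3]).predict(fh=[1, 2])
print(preds)  # [3.0, 3.0]
```

The subclass never repeats the checks: by the time `_fit` runs, the data is already in the format the base class promised.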



### Practical example: residual boosting

* As a toy example, let's say you've read an interesting paper about gradient boosting and would like to implement something similar with sktime. 

* The idea is to fit many models sequentially, each one predicting the residuals of the previous one. 

* The final prediction is the sum of the predictions of all models.

<img src="img/residual-boosting.png" width="70%" />
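
Before moving to the sktime implementation, the idea can be sketched in plain Python. The example below is a toy illustration under simplifying assumptions: `ShrunkenMean` is a hypothetical weak learner that predicts half the mean of its training data, so each stage explains half of what the previous stages left unexplained.

```python
class ShrunkenMean:
    """Toy weak learner: predicts half the mean of its training data."""

    def fit(self, y):
        self.value_ = 0.5 * sum(y) / len(y)
        return self

    def predict(self, n):
        return [self.value_] * n


def fit_boosting(y, num_iter=2):
    """Fit num_iter models, each on the residuals left by the previous ones."""
    models = []
    target = list(y)
    fitted = [0.0] * len(y)
    for _ in range(num_iter):
        model = ShrunkenMean().fit(target)
        step = model.predict(len(y))
        # Accumulate the in-sample prediction of the ensemble so far.
        fitted = [f + s for f, s in zip(fitted, step)]
        # The next model is trained on residuals: actuals minus fitted values.
        target = [obs - f for obs, f in zip(y, fitted)]
        models.append(model)
    return models


def predict_boosting(models, n):
    """Final prediction: the sum of the predictions of all models."""
    total = [0.0] * n
    for model in models:
        total = [t + p for t, p in zip(total, model.predict(n))]
    return total


y = [4.0, 4.0, 4.0, 4.0]
models = fit_boosting(y, num_iter=3)
print([m.value_ for m in models])   # [2.0, 1.0, 0.5]
print(predict_boosting(models, 2))  # [3.5, 3.5]
```

Each extra iteration moves the summed prediction closer to the true level of 4.0, which is the behaviour the forecaster below aims to reproduce with real sktime models.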

In [3]:
from typing import Optional

import pandas as pd
from sktime.forecasting.base import BaseForecaster
from sktime.forecasting.exp_smoothing import ExponentialSmoothing


class ResidualBoostingForecaster(BaseForecaster):
    """Residual Boosting Forecaster.

    This forecaster fits a sequence of clones of a base forecaster, each one
    on the residuals left by the forecasters fitted before it.

    Parameters
    ----------
    base_forecaster : sktime forecaster, default=None
        The base forecaster instance to use. If None, ExponentialSmoothing()
        is used.
    num_iter : int, default=2
        Number of forecasters to fit sequentially, each on the residuals of
        the previous ones. Must be at least 1.
    """

    _tags = {
        # Tags that we won't clone from base forecaster
        "y_inner_mtype": "pd.Series",
        "X_inner_mtype": "pd.DataFrame",
        "scitype:y": "univariate",
        "capability:pred_int": False,
        "capability:pred_int:insample": False,
        "capability:insample": True,
        # Tags to clone
        "ignores-exogeneous-X": False,
        "requires-fh-in-fit": True,
        "X-y-must-have-same-index": True,
        "enforce_index_type": None,
        "handles-missing-data": False,
        # Other tags
        "authors": ["author1"],
        "maintainers": ["author1"],
        "python_version": None,
        "python_dependencies": None,
    }

    # todo: add any hyper-parameters and components to constructor
    def __init__(
        self,
        base_forecaster: Optional[BaseForecaster] = None,
        num_iter: int = 2
    ):
        # estimators should precede parameters
        #  if estimators have default values, set None and initialize below

        self.base_forecaster = base_forecaster
        self.num_iter = num_iter

        super().__init__()

        # Handle default base forecaster
        if base_forecaster is None:
            self._base_forecaster = ExponentialSmoothing()
        else:
            self._base_forecaster = base_forecaster

        # Parameter checking logic
        if not self._base_forecaster.get_tag("capability:insample"):
            raise ValueError("Base forecaster must have capability:insample")

        if num_iter < 1:
            raise ValueError("num_iter must be at least 1")

        # if tags of estimator depend on component tags, set them
        self.clone_tags(
            self._base_forecaster,
            [
                "capability:exogenous",
                "ignores-exogeneous-X",
                "requires-fh-in-fit",
                "X-y-must-have-same-index",
                "enforce_index_type",
                "handles-missing-data",
            ],
        )

    def _fit(self, y, X, fh):
        """Fit forecaster to training data.

        private _fit containing the core logic, called from fit

        Writes to self:
            Sets fitted model attributes ending in "_".

        Parameters
        ----------
        y : sktime time series object
            guaranteed to be of pd.Series
        fh : ForecastingHorizon or None, optional (default=None)
            The forecasting horizon with the steps ahead to predict.
            Required (non-optional) here if self.get_tag("requires-fh-in-fit")==True
            Otherwise, if not passed in _fit, guaranteed to be passed in _predict
        X : sktime time series object, optional (default=None)
            guaranteed to be pd.DataFrame, or None


        Returns
        -------
        self : reference to self
        """

        timeseries_to_predict = y.copy()
        self.forecasters_ = []
        
        # Initial prediction is 0
        y_pred = pd.Series(0.0, index=y.index)

        for _ in range(self.num_iter):
            forecaster = self._base_forecaster.clone()
            forecaster.fit(timeseries_to_predict, X, fh)

            # Forecast insample and accumulate the ensemble's fitted values
            insample_fh = y.index.get_level_values(-1).unique()
            insample_preds = forecaster.predict(fh=insample_fh, X=X)
            y_pred += insample_preds.values

            # The next forecaster is fitted on what is still unexplained:
            # residuals = actuals minus the sum of predictions so far
            residuals = y - y_pred.values
            timeseries_to_predict = residuals
            self.forecasters_.append(forecaster)

        return self

    def _predict(self, fh, X):
        """Forecast time series at future horizon.

        private _predict containing the core logic, called from predict

        State required:
            Requires state to be "fitted".

        Accesses in self:
            Fitted model attributes ending in "_"
            self.cutoff

        Parameters
        ----------
        fh : ForecastingHorizon or None, optional (default=None)
            The forecasting horizon with the steps ahead to predict.
            Required (non-optional) here if self.get_tag("requires-fh-in-fit")==True
            Otherwise, if not passed in _fit, guaranteed to be passed in _predict
        X : sktime time series object, optional (default=None)
            guaranteed to be pd.DataFrame, or None

        Returns
        -------
        y_pred : sktime time series object
            should be of the same type as seen in _fit, as in "y_inner_mtype" tag
            Point predictions
        """
        y_pred = 0
        for forecaster in self.forecasters_:
            y_pred += forecaster.predict(fh, X)

        return y_pred

    def _update(self, y, X=None, update_params=True):
        """Update time series to incremental training data.

        private _update containing the core logic, called from update

        State required:
            Requires state to be "fitted".

        Accesses in self:
            Fitted model attributes ending in "_"
            self.cutoff

        Writes to self:
            Sets fitted model attributes ending in "_", if update_params=True.
            Does not write to self if update_params=False.

        Parameters
        ----------
        y : sktime time series object
            guaranteed to be of an mtype in self.get_tag("y_inner_mtype")
            Time series with which to update the forecaster.
            if self.get_tag("scitype:y")=="univariate":
                guaranteed to have a single column/variable
            if self.get_tag("scitype:y")=="multivariate":
                guaranteed to have 2 or more columns
            if self.get_tag("scitype:y")=="both": no restrictions apply
        X :  sktime time series object, optional (default=None)
            guaranteed to be of an mtype in self.get_tag("X_inner_mtype")
            Exogeneous time series for the forecast
        update_params : bool, optional (default=True)
            whether model parameters should be updated

        Returns
        -------
        self : reference to self
        """

        timeseries_to_predict = y.copy()
        y_pred = pd.Series(0.0, index=y.index)
        for forecaster in self.forecasters_:
            forecaster.update(timeseries_to_predict, X, update_params=update_params)

            # Forecast insample and accumulate the ensemble's fitted values
            insample_fh = y.index.get_level_values(-1).unique()
            insample_preds = forecaster.predict(fh=insample_fh, X=X)
            y_pred += insample_preds.values

            # The next forecaster in the chain is updated with the residuals
            residuals = y - y_pred.values
            timeseries_to_predict = residuals

        return self

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        """Return testing parameter settings for the estimator.

        Parameters
        ----------
        parameter_set : str, default="default"
            Name of the set of test parameters to return, for use in tests. If no
            special parameters are defined for a value, will return `"default"` set.
            There are currently no reserved values for forecasters.

        Returns
        -------
        params : dict or list of dict, default = {}
            Parameters to create testing instances of the class
            Each dict are parameters to construct an "interesting" test instance, i.e.,
            `MyClass(**params)` or `MyClass(**params[i])` creates a valid test instance.
            `create_test_instance` uses the first (or only) dictionary in `params`
        """

        return [
            {"base_forecaster": None,
             "num_iter": 1},
            {
                "base_forecaster": ExponentialSmoothing(),
                "num_iter": 10,
            },
        ]

### Testing the model

In [4]:
import pytest
from residualboosting import ResidualBoostingForecaster
from sktime.utils.estimator_checks import check_estimator, parametrize_with_checks


@parametrize_with_checks(ResidualBoostingForecaster)
def test_sktime_api_compliance(obj, test_name):
    check_estimator(obj, tests_to_run=test_name, raise_exceptions=True)

In [5]:
import pytest
pytest.main(["../example_package/tests/", "-v"])

platform darwin -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0 -- /Users/fangelim/Documents/personal_workspace/sktime-tutorial-euroscipy2024/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/fangelim/Documents/personal_workspace/sktime-tutorial-euroscipy2024/example_package
configfile: pyproject.toml
plugins: anyio-4.4.0
collecting ... collected 55 items

../example_package/tests/test_check_estimator.py::test_sktime_api_compliance[ResidualBoostingForecaster-test_clone] PASSED [  1%]
../example_package/tests/test_check_estimator.py::test_sktime_api_compliance[ResidualBoostingForecaster-test_constructor] PASSED [  3%]
../example_package/tests/test_check_estimator.py::test_sktime_api_compliance[ResidualBoostingForecaster-test_create_test_instance] PASSED [  5%]
../example_package/tests/test_check_estimator.py::test_sktime_api_compliance[ResidualBoostingForecaster-test_create_test_instances_and_names] PASSED

<ExitCode.OK: 0>

### Adding it to sktime

To add the model to sktime's documentation, add a placeholder class to sktime, similar to what was done for [Prophetverse](https://github.com/sktime/sktime/blob/c81f6aa605d3a679d13d9e16f52b735980acd87d/sktime/forecasting/prophetverse.py#L32).

In [6]:
from sktime.forecasting.base._delegate import _DelegatedForecaster


def placeholder(cls):
    """Delegate to your model if installed, otherwise use placeholder."""
    from sktime.utils.dependencies import _check_soft_dependencies

    try:
        if _check_soft_dependencies("residualboosting>=0.1.0", severity="none"):
            from residualboosting import ResidualBoostingForecaster

            return ResidualBoostingForecaster
    except Exception:  # noqa: S110
        pass

    # else we return the placeholder, which is a delegator
    return cls

@placeholder
class ResidualBoostingForecaster(_DelegatedForecaster):
    """Residual Boosting Forecaster.

    This forecaster fits a sequence of clones of a base forecaster, each one
    on the residuals left by the forecasters fitted before it.

    Parameters
    ----------
    base_forecaster : sktime forecaster, default=None
        The base forecaster instance to use.
    num_iter : int, default=2
        Number of forecasters to fit sequentially, each on the residuals of
        the previous ones. Must be at least 1.
    """

    _tags = {
        # Tags that we won't clone from base forecaster
        "y_inner_mtype": "pd.Series",
        "X_inner_mtype": "pd.DataFrame",
        "scitype:y": "univariate",
        "capability:pred_int": False,
        "capability:pred_int:insample": False,
        "capability:insample": True,
        # Tags to clone
        "ignores-exogeneous-X": False,
        "requires-fh-in-fit": True,
        "X-y-must-have-same-index": True,
        "enforce_index_type": None,
        "handles-missing-data": False,
        "authors": ["felipeangelimvieira"],
        "maintainers": ["felipeangelimvieira"],
        "python_version": None,
        "python_dependencies": None,
    }
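
The mechanism behind the `placeholder` decorator can be tried in isolation with the standard library only. The sketch below is a simplified illustration: `placeholder_for` is a hypothetical helper, `importlib.util.find_spec` stands in for sktime's dependency check, and the stdlib module `fractions` plays the role of an installed 2nd-party package.

```python
import importlib
import importlib.util


def placeholder_for(module_name, class_name):
    """Return a decorator that swaps in the real class when it is available.

    Illustrative helper; module_name must be a top-level module name.
    """

    def decorator(cls):
        if importlib.util.find_spec(module_name) is not None:
            module = importlib.import_module(module_name)
            # Fall back to the placeholder if the class is not found.
            return getattr(module, class_name, cls)
        return cls

    return decorator


@placeholder_for("fractions", "Fraction")  # stdlib module, always available
class Fraction:
    """Placeholder, replaced by fractions.Fraction at import time."""


@placeholder_for("not_an_installed_package_xyz", "Model")
class Model:
    """Placeholder, kept because the package is not installed."""


print(Fraction.__module__)  # fractions
print(Model.__doc__)
```

When the package is installed, users importing the name from sktime get the real class; otherwise they get the placeholder, whose docstring and tags are still enough for the documentation table.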

## Recap

- sktime's new estimator overview
- Extend `BaseForecaster` and implement your model in your own library
- Run API-compatibility tests in your CI
- Add the model to sktime's documentation