# extending sktime - private extensions and marketplace

Estimated time: 20 min

### **Why "extend"?**

Use case "private plug-in": developing plug&play components in a private code base
* "secret sauce" estimators
* data specific components

Use case "2nd party": open library containing plug&play components
* interoperable API for your own algorithm
* making your estimator searchable by a wide audience!

**Benefits**: productivity, scalability, reproducibility, reach more users, etc.

* **Out-of-the-box compatibility** with the sktime ecosystem and the hundred models available
* Focus on the model, not preprocessing logic
* **Implement one model, get the rest for free!** Multivariate forecasting, tuning strategies, cross-validation
* marketplace: Zero-friction for sktime users to use and find your model!

### Contents
* Overview of creating a custom forecaster
* Implementing the model
* Testing the model


## New sktime model marketplace

* The [new documentation table](https://www.sktime.net/en/stable/estimator_overview.html#filter=all&tags=%7B%7D) lets users filter models by tags, so anyone interested in a certain capability can find the models that provide it (including yours!)

<center>
<img src="img/sktime-doc-estimator-overview.png" style="max-width: 75%; display: block;">
</center>

Now it is easier than ever to create your own model and share it with the community.

<center>
<img src="img/sktime-2nd-party-lib.png" alt="sktime ecosystem" style="max-width: 50%;">
</center>

## How it looks to the user

```python
from sktime.forecasting.yourlibrary import YourModel

model = YourModel()
model.fit(y_train)
y_pred = model.predict(fh)

```

## Developing a plug-in

The same workflow applies to:

* closed-source code base
* open library - indexed on marketplace ("2nd party")
* directly contributed to sktime ("1st party")

Overview:

0. Have an idea 💡
1. Implement the model in your library as forecaster/estimator
2. Run api-compatibility tests in your CI
3. optional: add model to sktime (1st) or marketplace (2nd party)

#### Overview of implementing the model

1. Inherit from `BaseForecaster` and define the model metadata (tags)
2. Write your `__init__`, `_fit` and `_predict` methods
3. Add `get_test_params` method for testing purposes


Start with the [extension template](https://github.com/sktime/sktime/blob/main/extension_templates/forecasting.py): copy it, then modify it according to your needs.

##### Inheriting and defining tags


The tag system is a key feature of sktime. It not only controls the preprocessing applied before user data reaches your code, but also provides a better user experience in the documentation.

In [1]:
from sktime.forecasting.base import BaseForecaster

class MyForecaster(BaseForecaster):
    _tags = {
        "y_inner_mtype": "pd.Series",  # Valid values include "pd.Series", "pd.DataFrame", "pd-multiindex", "numpy3D"
        "X_inner_mtype": "pd.DataFrame",
        "scitype:y": "univariate",
        "ignores-exogeneous-X": False,
        "capability:pred_int": False,
        "authors": ["author1", "author2"],
        "maintainers": ["maintainer1", "maintainer2"],
        "python_version": None,
        "python_dependencies": None,
    }

In [2]:
from sktime.registry import all_tags

all_tags(estimator_types="forecaster", as_dataframe=True).head()

| | name | scitype | type | description |
|---|---|---|---|---|
| 0 | X-y-must-have-same-index | [forecaster, regressor, transformer] | bool | do X/y in fit/update and X/fh in predict have ... |
| 1 | X_inner_mtype | estimator | (list, str) | which machine type(s) is the internal _fit/_pr... |
| 2 | authors | object | (list, str) | list of authors of the object, each author a G... |
| 3 | capability:categorical_in_X | [forecaster, transformer] | bool | can the estimator natively handle categorical ... |
| 4 | capability:contractable | estimator | bool | contract time setting, does the estimator supp... |


##### Private methods to implement (`_fit`, `_predict`)

<img src="img/sktime-public-private-interface-diagram.png" alt="sktime implementer contract" width="700" style="display: block; margin-left: auto; margin-right: auto;">

### Practical example: ensemble forecasting

* As a toy example, let's say you want to implement a simple ensemble forecasting model.

* The idea is to fit a set of models, and then combine their predictions to make the final forecast with their mean or median. 

(This code belongs in a module in your repository, not in a notebook.)

In [3]:
import pandas as pd
from sktime.forecasting.base import BaseForecaster


class SimpleEnsembleForecaster(BaseForecaster):
    """Simple ensemble forecaster.
    
    This forecaster fits a list of forecasters to the same training data and
    aggregates their predictions using a simple aggregation function.

    Parameters
    ----------
    forecasters : list of sktime forecasters
        List of forecasters to fit and aggregate.
    agg : str, optional (default="mean")
        Aggregation function to use. Must be one of "mean" or "median".
    """

    _tags = {
        # Model metadata
        "ignores-exogeneous-X": True,
        "requires-fh-in-fit": False,
        "handles-missing-data": False,
        "X_inner_mtype": "pd.DataFrame",
        "y_inner_mtype": "pd.Series",
        "scitype:y": "univariate",
        # Packaging info
        "authors": ["felipeangelimvieira"],
        "maintainers": ["felipeangelimvieira"],
        "python_version": None,
        "python_dependencies": None,
    }

    def __init__(self, forecasters, agg="mean"):
        # Convention: component estimators precede other parameters.
        # If a component has a default, use None here and resolve it below.

        self.forecasters = forecasters
        self.agg = agg

        super().__init__()

        # Handle default values, being careful to not overwrite the hyper-parameters
        # as they were passed!

        # Parameter checking logic
        if agg not in ["mean", "median"]:
            raise ValueError(f"agg must be 'mean' or 'median', got {agg}")

        # if tags of estimator depend on component tags, set them
        for forecaster in self.forecasters:
            if forecaster.get_tag("requires-fh-in-fit"):
                self.set_tags({"requires-fh-in-fit": True})

    def _fit(self, y, X, fh):
        """Fit forecaster to training data.

        private _fit containing the core logic, called from fit. 
        Sets fitted model attributes ending in "_".

        Parameters
        ----------
        y : sktime time series object
            guaranteed to be of pd.Series
        fh : ForecastingHorizon or None, optional (default=None)
            The forecasting horizon with the steps ahead to predict.
        X : sktime time series object, optional (default=None)
            guaranteed to be pd.DataFrame, or None

        Returns
        -------
        self : reference to self
        """

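        # Fit a fresh clone of each component, so that the forecasters
        # passed to __init__ are never mutated
        self.forecasters_ = []
        for forecaster in self.forecasters:
            forecaster = forecaster.clone()
            forecaster.fit(y=y, X=X, fh=fh)
            self.forecasters_.append(forecaster)
        return self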

    def _predict(self, fh, X):
        """Forecast time series at future horizon.

        private _predict containing the core logic, called from predict

        State required:
            Requires state to be "fitted".

        Accesses in self:
            Fitted model attributes ending in "_"
            self.cutoff

        Parameters
        ----------
        fh : ForecastingHorizon or None, optional (default=None)
            The forecasting horizon with the steps ahead to predict.
        X : sktime time series object, optional (default=None)
            guaranteed to be pd.DataFrame, or None

        Returns
        -------
        y_pred : sktime time series object
            should be of the same type as seen in _fit, as in "y_inner_mtype" tag
            Point predictions
        """
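        # Collect one point forecast per fitted component
        y_preds = []
        for forecaster in self.forecasters_:
            y_preds.append(forecaster.predict(fh=fh, X=X))

        # Place the forecasts side by side, then aggregate across columns
        aggregated = pd.concat(y_preds, axis=1).agg(self.agg, axis=1)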

        # Must keep the name of the original series
        aggregated.name = y_preds[0].name
        return aggregated
        

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        """Return testing parameter settings for the estimator.

        Parameters
        ----------
        parameter_set : str, default="default"
            Name of the set of test parameters to return, for use in tests. If no
            special parameters are defined for a value, will return `"default"` set.
            There are currently no reserved values for forecasters.

        Returns
        -------
        params : dict or list of dict, default = {}
            Parameters to create testing instances of the class
            Each dict contains parameters to construct an "interesting" test instance, i.e.,
            `MyClass(**params)` or `MyClass(**params[i])` creates a valid test instance.
            `create_test_instance` uses the first (or only) dictionary in `params`
        """
        from sktime.forecasting.exp_smoothing import ExponentialSmoothing

        return [
            {
                "forecasters": [ExponentialSmoothing()],
                "agg": "mean",
            },
            {
                "forecasters": [ExponentialSmoothing(), ExponentialSmoothing()],
                "agg": "median",
            },
        ]
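
The aggregation step inside `_predict` can be tried in isolation with plain pandas. The sketch below uses three made-up forecast series in place of real component predictions; the `aggregate` helper is a hypothetical stand-alone function mirroring the lines in `_predict`, not part of sktime.

```python
import pandas as pd

# Three made-up forecasts over the same horizon, as component models might return
index = pd.period_range("2024-01", periods=3, freq="M")
y_preds = [
    pd.Series([10.0, 12.0, 14.0], index=index, name="y"),
    pd.Series([11.0, 13.0, 15.0], index=index, name="y"),
    pd.Series([18.0, 20.0, 22.0], index=index, name="y"),
]


def aggregate(y_preds, agg="mean"):
    """Combine forecasts per time point, mirroring the logic in _predict."""
    # One column per component forecast, aligned on the time index
    combined = pd.concat(y_preds, axis=1)
    # Aggregate across columns, i.e. one value per time point
    aggregated = combined.agg(agg, axis=1)
    # Keep the name of the original series
    aggregated.name = y_preds[0].name
    return aggregated


mean_forecast = aggregate(y_preds, "mean")
median_forecast = aggregate(y_preds, "median")
print(mean_forecast)
print(median_forecast)
```

With these inputs the third series is an outlier, so the median follows the two lower forecasts while the mean is pulled upward.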


### Testing the model

Testing is easy! Put this in a test folder that pytest collects:

In [4]:
from ensemble_forecaster import SimpleEnsembleForecaster
from sktime.utils.estimator_checks import check_estimator, parametrize_with_checks


@parametrize_with_checks(SimpleEnsembleForecaster)
def test_sktime_api_compliance(obj, test_name):
    check_estimator(obj, tests_to_run=test_name, raise_exceptions=True)

In [5]:
import pytest
pytest.main(["../example_package/tests/", "-v"])

platform darwin -- Python 3.10.14, pytest-8.3.2, pluggy-1.5.0 -- /Users/fangelim/Documents/personal_workspace/sktime-tutorial-euroscipy2024/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/fangelim/Documents/personal_workspace/sktime-tutorial-euroscipy2024/example_package
configfile: pyproject.toml
plugins: anyio-4.4.0
collecting ... collected 55 items

../example_package/tests/test_check_estimator.py::test_sktime_api_compliance[SimpleEnsembleForecaster-test_clone] PASSED [  1%]
../example_package/tests/test_check_estimator.py::test_sktime_api_compliance[SimpleEnsembleForecaster-test_constructor] PASSED [  3%]
../example_package/tests/test_check_estimator.py::test_sktime_api_compliance[SimpleEnsembleForecaster-test_create_test_instance] PASSED [  5%]
../example_package/tests/test_check_estimator.py::test_sktime_api_compliance[SimpleEnsembleForecaster-test_create_test_instances_and_names] PASSED [  7%]

<ExitCode.OK: 0>

### Making estimator discoverable on `sktime` marketplace

Assumes:

* your own package release on `pypi`
* passing tests in package

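This is done by adding a placeholder class decorated with `_placeholder_record` to sktime, similar to [Prophetverse](https://github.com/sktime/sktime/blob/b070fcc1f76ba9136ceb6548e3c2bbe97dd7f37c/sktime/forecasting/prophetverse.py). The placeholder holds the docstring and tags, and the decorator argument is the import path of the real implementation in your package.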

In [8]:
from sktime.utils.dependencies import _placeholder_record
from sktime.forecasting.base import BaseForecaster

@_placeholder_record("ensemble_forecaster.forecaster")  # import path to the forecaster
class SimpleEnsembleForecaster(BaseForecaster):
    """Simple ensemble forecaster.

    This forecaster fits a list of forecasters to the same training data and
    aggregates their predictions using a simple aggregation function.

    Parameters
    ----------
    forecasters : list of sktime forecasters
        List of forecasters to fit and aggregate.
    agg : str, optional (default="mean")
        Aggregation function to use. Must be one of "mean" or "median".
    """

    _tags = {
        # Model metadata
        "ignores-exogeneous-X": True,
        "requires-fh-in-fit": False,
        "handles-missing-data": False,
        "X_inner_mtype": "pd.DataFrame",
        "y_inner_mtype": "pd.Series",
        "scitype:y": "univariate",
        # Packaging info
        "authors": ["felipeangelimvieira"],
        "maintainers": ["felipeangelimvieira"],
        "python_version": None,
        "python_dependencies": "ensemble_forecaster>=0.1.0",  # PEP 440 requirement string
    }

## Recap

- sktime's new estimator overview lets users find models by tag
- Extend `BaseForecaster` and implement the model in your library
- Run API-compatibility tests in your CI
- Add the model to sktime's documentation


## Credits notebook 4

- Notebook: felipeangelimvieira, partly based on pydata London 2022 (fkiraly, ltsaprounis)
- Estimator overview table: duydl
- Testing utilities: fkiraly, mloning, partly inspired by sklearn
- 2nd party integration patterns: fkiraly