![](../images/team.jpg)

![](../images/talk_agenda-01.png)

# Introduction to `skbase`

Contents of this tutorial:

1. introduction to the unified `sklearn` / `sktime`-like interface supported by `skbase`
2. `skbase` usage patterns
3. package building with `skbase`

## 1 - Introducing the ``sklearn`` / `sktime` interface

- it is recommended you have worked through either an ``sklearn`` or ``sktime`` tutorial
- for ``sktime``, check out a previous [pydata tutorial](https://www.youtube.com/watch?v=ODspi8-uWgo) of ours, and of course visit [our website](https://www.sktime.net/en/latest/index.html)! 
- ``skbase`` is currently maintained by the ``sktime`` project.
  - We *love* new contributors, even if you are new to open source software development!
  - Check out the ``sktime`` [new contributors guide](https://www.sktime.net/en/latest/get_involved/contributing.html).


### 1.1 ``skbase``, ``sklearn``, ``sktime`` in a nutshell

- `skbase` is a workbench package for developers, for creating "`sklearn`-likes"
  - reusable base class factory with `get_params`, config, nested composition interface, etc.
  - templated base classes compatible with `sklearn` / `sktime`
  - lookup and search utilities
  - factory templates for test frameworks 

- `sklearn` / `sktime` interface:
  - unified interface for objects/estimators
  - modular design, strategy pattern
  - composable, composites are interface homogeneous
  - simple specification language and parameter interface
  - visually informative pretty printing

- `sktime` base class design is an evolution on `sklearn`:
  - separation of `BaseObject` (non-fittable) and `BaseEstimator` (fittable)
  - `get_fitted_params` interface for fittable objects, similar to `get_params`
  - unified tag and config manager, dynamic tags
  - improved state handling - `clone`, `reset`
  - test case generation, e.g., `create_test_instances_and_names`
  - test framework with scenario and conditional fixture handling

### 1.2 sklearn unified interface - the strategy pattern

`sklearn` provides a unified interface to multiple learning tasks, such as classification and regression.

any (supervised) estimator has the following interface points:

1. **Instantiate** your model of choice, with parameter settings
2. **Fit** the instance of your model
3. Use that fitted instance to **predict** new data!

![](./img/estimator-conceptual-model.jpg)

In [None]:
# get data to use the model on
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [None]:
from sklearn.svm import SVC

# 1. Instantiate SVC with parameters gamma, C
clf = SVC(gamma=0.001, C=100.)

# 2. Fit clf to training data
clf.fit(X_train, y_train)

# 3. Predict labels on test data
y_test_pred = clf.predict(X_test)

y_test_pred

IMPORTANT: to use another classifier, only the specification line (step 1) changes!

`SVC` could have been `RandomForestClassifier`; steps 2 and 3 remain the same, thanks to the unified interface:

In [None]:
from sklearn.ensemble import RandomForestClassifier

# 1. Instantiate RandomForestClassifier with parameter n_estimators
clf = RandomForestClassifier(n_estimators=100)

# 2. Fit clf to training data
clf.fit(X_train, y_train)

# 3. Predict labels on test data
y_test_pred = clf.predict(X_test)

y_test_pred

in object-oriented design terminology, this is called the **"strategy pattern"**

= different estimators can be switched out without change to the interface

= like a power plug adapter, it's plug&play if it conforms with the interface

Pictorial summary:
![](./img/sklearn-unified-interface.jpg)

parameters can be accessed and set via `get_params`, `set_params`:

In [None]:
clf.get_params()

In [None]:
clf.set_params(n_estimators=42)
clf

fitted parameters end in an underscore:

In [None]:
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression()
clf.fit(X_train, y_train)
clf.coef_

### 1.3 sklearn - composition patterns

`sklearn`'s unified interface extends to composition such as:

* tuning such as grid search
* ensembling such as bagging
* pipelining such as chaining pre-processing with a classifier

in that the resulting composite also adheres to the unified interface!

This makes `sklearn` particularly powerful as a specification language,\
as compositors can be combined in any number of ways.

example compositions - tuning or ensembling:

![](./img/sklearn-composition-interface.png)
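
The tuning and ensembling compositions above can be sketched as follows. This is a minimal illustration, assuming `sklearn` is installed; the grid values and the number of bagged estimators are arbitrary choices made for this example, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1. Instantiate two composites that each wrap an SVC (values are illustrative)
composites = {
    "tuned": GridSearchCV(SVC(), param_grid={"C": [1.0, 10.0]}, cv=3),
    "bagged": BaggingClassifier(SVC(), n_estimators=5, random_state=0),
}

predictions = {}
for name, clf in composites.items():
    # 2. Fit the composite exactly like a plain classifier
    clf.fit(X_train, y_train)
    # 3. Predict with the composite exactly like a plain classifier
    predictions[name] = clf.predict(X_test)
```

Both composites are used through the same `fit` / `predict` calls as the plain `SVC` earlier; only the specification line differs.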

example - classification pipeline. Has the same interface!

In [None]:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 1. Instantiate the estimator
pipe = make_pipeline(StandardScaler(), SVC(gamma=0.01))

# 2. Fit pipe to training data
pipe.fit(X_train, y_train)

# 3. Predict labels on test data
y_test_pred = pipe.predict(X_test)

y_test_pred


In [None]:
# pretty printing
# that makes the specification easy to read
pipe

parameters of the composite are addressed by

`[componentname]__[paramname]` (separated by double-underscore)

this can be nested indefinitely, in multiply nested compositions!

In [None]:
pipe.get_params()
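
The double-underscore addressing also works for setting parameters. The sketch below assumes `sklearn` is installed and relies on `make_pipeline` naming each step after its lowercased class name; the value `C=10.0` is an arbitrary choice for illustration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = make_pipeline(StandardScaler(), SVC(gamma=0.01))

# make_pipeline names each step after its lowercased class name
step_names = [name for name, _ in pipe.steps]

# address the C parameter of the SVC step via [componentname]__[paramname]
pipe.set_params(svc__C=10.0)

# the new value is visible through the composite and on the component itself
c_from_composite = pipe.get_params()["svc__C"]
c_from_component = pipe.named_steps["svc"].C
```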

### 1.4 `skbase` / `sktime` is an evolution upon the `sklearn` base interface

`sktime` - and `skbase` which follows `sktime` - evolve the `sklearn` base interface in a number of ways, including:

- `get_fitted_params` interface for fittable objects, similar to `get_params`
- unified tag and config manager, dynamic tags
- improved state handling - `clone`, `reset`

`sktime` follows general `sklearn` interface patterns:

In [None]:
from sktime.datasets import load_airline
from sktime.forecasting.arima import ARIMA

y = load_airline()

fcst = ARIMA()

fcst.fit(y, fh=[1, 2, 3])

fcst.predict()

### 1.4.1 unified tag and config system

each `skbase` / `sktime` estimator has tags and configs.

* tags are "properties" of the estimator, for developers and for search
* configs are "instructions" to the estimators, for users to set

#### `skbase` estimator tag system

In [None]:
fcst.get_tags()

tags in `sktime` (and `skbase` templated packages) are listed and explained in the tag registry:

In [None]:
from sktime.registry import all_tags

all_tags("forecaster", as_dataframe=True)

this can be used to search, e.g., for forecasters that can produce prediction intervals

In [None]:
from sktime.registry import all_estimators

all_estimators(
    "forecaster", filter_tags={"capability:pred_int": True}, as_dataframe=True
)

#### `skbase` estimator config system

In [None]:
fcst.get_config()

using config to change display mode:

In [None]:
fcst

In [None]:
fcst.set_config(display="text")
fcst

### 1.4.2 `get_fitted_params` - unified access to fitted parameters

every fittable `skbase` / `sktime` estimator has a unified `get_fitted_params` interface point

this retrieves fitted parameters as a `str` keyed `dict`,

in complete analogy to `sklearn`'s `get_params`

the default retrieves attributes `[attrname]_` at the key `"[attrname]"`

In [None]:
fcst.get_fitted_params()
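
The default convention described above can be sketched in plain Python. This is a toy illustration of the naming rule only, not the `skbase` / `sktime` implementation; `MeanPredictor` and `fitted_params_sketch` are hypothetical names introduced for this example.

```python
class MeanPredictor:
    """Toy estimator that stores fitted state in trailing-underscore attributes."""

    def __init__(self, shrinkage=0.0):
        self.shrinkage = shrinkage

    def fit(self, y):
        self.mean_ = (1.0 - self.shrinkage) * sum(y) / len(y)
        self.n_samples_ = len(y)
        return self


def fitted_params_sketch(estimator):
    """Collect public attributes ending in "_", keyed without the underscore."""
    return {
        name[:-1]: value
        for name, value in vars(estimator).items()
        if name.endswith("_") and not name.startswith("_")
    }


est = MeanPredictor().fit([1.0, 2.0, 3.0])
fitted = fitted_params_sketch(est)
print(fitted)
```

The hyperparameter `shrinkage` is not returned, since it does not end in an underscore; it remains accessible via the ordinary parameter interface.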

`get_fitted_params` also works with nested composites, e.g., pipelines

the behaviour mirrors that of `get_params` from `sklearn`

In [None]:
from sktime.datasets import load_airline
from sktime.forecasting.arima import ARIMA
from sktime.transformations.series.detrend import Deseasonalizer
from sktime.pipeline import make_pipeline

y = load_airline()

fcst = make_pipeline(Deseasonalizer(sp=12), ARIMA())

fcst.fit(y, fh=[1, 2, 3])

fcst.predict()

In [None]:
fcst.get_fitted_params()

### 1.4.4 state handling via `clone`, `reset`

`skbase` / `sktime` estimators have `clone` and `reset` interface points:

* `clone` creates a blank, newly constructed copy of any object
* `reset` resets object to state after construction

both return an object with the same content!

* `clone` returns a new copy and does not mutate the original
* `reset` returns the identical object and mutates it

The equality dunder (`__eq__`) in `skbase` / `sktime` compares the *specification* for equality (not Python object identity).

In [None]:
fcst = ARIMA(order=(1, 1, 0))
fcst.fit(y, fh=[1, 2, 3])
fcst.is_fitted

#### `clone` - create true copy of the specification

In [None]:
fcst_clone = fcst.clone()
fcst_clone.is_fitted

In [None]:
fcst_clone is fcst

In [None]:
fcst_clone == fcst

why is `clone` useful as a method?

* allows handling case-specific logic in estimators and intermediate base classes
* no extensive coupling with a free-standing ("loose") `clone` function
* design consistent with `reset`

#### `reset` - reset an estimator as if freshly constructed

In [None]:
fcst_reset = fcst.reset()
fcst_reset.is_fitted

In [None]:
fcst_reset is fcst

In [None]:
# this is of course also equal
fcst_reset == fcst

why is `reset` useful?

* internally, it actually runs `__init__`!
* it is called in `set_params`
* it can be called at the start of `fit` (and is in `sktime`)

so, preparation and parameter checking logic can happen in `__init__` (unlike in `sklearn`)
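
The points above can be illustrated with a toy class. This is a minimal sketch of the idea and not the `skbase` implementation; the class `ResettableSketch`, its `window` parameter and the `tail_mean_` attribute are hypothetical.

```python
class ResettableSketch:
    """Toy object whose reset() re-runs __init__ with the current parameters."""

    def __init__(self, window=3):
        # parameter checking can live in __init__, since reset() re-runs it
        if window < 1:
            raise ValueError("window must be at least 1")
        self.window = window

    def get_params(self):
        return {"window": self.window}

    def reset(self):
        params = self.get_params()
        self.__dict__.clear()    # drop fitted state and all other attributes
        self.__init__(**params)  # reconstruct in place, same object identity
        return self

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self.reset()      # reset re-runs the checks in __init__

    def fit(self, y):
        self.reset()             # start every fit from a freshly constructed state
        tail = y[-self.window:]
        self.tail_mean_ = sum(tail) / len(tail)
        return self


obj = ResettableSketch(window=2).fit([1.0, 2.0, 6.0])
print(obj.tail_mean_)              # mean of the last two values
print(obj.reset() is obj)          # reset returns the same, mutated object
print(hasattr(obj, "tail_mean_"))  # fitted state is gone after reset
```

Because `set_params` goes through `reset`, an invalid value such as `window=0` is rejected by the same check that guards construction.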

### 1.4.5 integrated test case generation via `get_test_params`, `create_test_instances_and_names`

`skbase` / `sktime` estimators have test instance generation points:

* `get_test_params` which returns a list of parameter `dict`s, each of which can be passed to the constructor or `set_params`
* `create_test_instances_and_names` which returns the test instances together with their names

This can be used with the `skbase` test framework for systematic testing in type specific scenarios.

### 1.5 `sktime` evolves the `sklearn` extension interface for power users!

`sklearn` has pioneered the easily extensible estimator interface:

it is easy to write your own estimators and maintain them in third party code bases!

`sktime` expands on this by introducing the **template pattern** on the extender interface side:

* outer/inner methods, e.g., `fit`/`_fit`, with opportunity for boilerplate on `fit`, e.g., input checks, estimator `reset`
* tags that control boilerplate and functionality without need to write it, e.g., preferred data type, check logic

`skbase` does not *require* but *facilitates* these design patterns (combined strategy & template), more in notebook 3.
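
The outer/inner method split can be sketched in plain Python. This is a toy illustration of the template pattern, not `sktime` or `skbase` code; the classes `BaseForecasterSketch` and `LastValueForecaster` and the tag name `requires_nonempty_input` are hypothetical.

```python
class BaseForecasterSketch:
    """Toy base class: public fit/predict hold boilerplate, _fit/_predict hold logic."""

    # tag-like class attribute that switches a piece of boilerplate on or off
    _tags = {"requires_nonempty_input": True}

    def fit(self, y):
        # boilerplate shared by all subclasses: input check and state handling
        if self._tags["requires_nonempty_input"] and len(y) == 0:
            raise ValueError("y must not be empty")
        self._is_fitted = False
        self._fit(list(y))
        self._is_fitted = True
        return self

    def predict(self, n_steps):
        # boilerplate: refuse to predict before fit has been called
        if not getattr(self, "_is_fitted", False):
            raise RuntimeError("call fit before predict")
        return self._predict(n_steps)

    def _fit(self, y):
        raise NotImplementedError

    def _predict(self, n_steps):
        raise NotImplementedError


class LastValueForecaster(BaseForecasterSketch):
    """Extender code: only the inner methods are written."""

    def _fit(self, y):
        self.last_ = y[-1]

    def _predict(self, n_steps):
        return [self.last_] * n_steps


forecast = LastValueForecaster().fit([1, 2, 3]).predict(2)
print(forecast)
```

The extender implements only `_fit` and `_predict`; checks and state handling in the public `fit` and `predict` are inherited without being rewritten.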

### 1.6 Summary/What is next!

- `sklearn` has a seminal interface design: unified interface (strategy pattern), modular, composition stable, easy specification language
- `sktime` evolves and consolidates the `sklearn` API: parameter, tag, config, state handling, advanced extender interface support (template pattern)
- `skbase` is a convenient developer workbench to construct packages that are conformant and compatible with the above!
- next: core usage patterns of `skbase` & playing with `skbase` `BaseObject`, `BaseEstimator`
- then: recipes for building packages with `skbase`, with examples

---
### Credits: notebook 1 - Sktime intro, toolbox features, Forecasting

notebook creation: fkiraly

some vignettes based on existing `sktime` tutorials, credit: fkiraly, miraep8, mloning and danbartl

slides (png/jpg): from fkiraly's postgraduate course at UCL, Principles and Patterns in Data Scientific Software Engineering

General credit also to `sklearn` and `sktime` contributors