API comparison: sktime vs HCrystalball
Contributors: @fkiraly, @mloning
A comparison of the `sktime` and `HCrystalball` API designs for forecasting, and a proposed way forward.
Both `sktime` and `HCrystalball` adopt a sklearn-like fit/predict design and a unified interface.
The table below summarizes the main differences:
Area | sktime | HCrystalball |
---|---|---|
data container | pandas series | pandas DataFrame |
supports multivariate | no | yes |
supports exogenous | experimental | yes |
supports iloc use | yes | no |
supports loc use | no | yes |
type consistent composition | yes | no |
task interoperability | yes | no |
For explanation:
- type consistent composition means: composites inherit from, and follow the same interface as, a class type ancestor. For example, `GridSearchCV` in `sklearn` behaves as a classifier when constructed with a classifier. The compositor itself is an estimator class.
- task interoperability means: the interface is designed to allow reduction to other time series related tasks
- iloc and loc usage implies support for integer and date/time indices, and specification of the forecasting horizon as relative steps ahead and absolute time points, respectively
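To make type consistent composition concrete, here is a minimal sketch in plain Python. All class names (`BaseForecaster`, `LastValueForecaster`, `MeanForecaster`, `AveragingEnsemble`) are hypothetical and belong to neither package, and plain lists stand in for pandas containers. The point is only that the composite inherits from the same ancestor as its components, which are passed in the constructor.

```python
from abc import ABC, abstractmethod


class BaseForecaster(ABC):
    """Hypothetical common ancestor: every forecaster exposes fit/predict."""

    @abstractmethod
    def fit(self, y_past):
        ...

    @abstractmethod
    def predict(self, horizon):
        ...


class LastValueForecaster(BaseForecaster):
    """Predicts the last observed value for every step in the horizon."""

    def fit(self, y_past):
        self.last_ = y_past[-1]
        return self

    def predict(self, horizon):
        return [self.last_ for _ in horizon]


class MeanForecaster(BaseForecaster):
    """Predicts the mean of the observed values for every step."""

    def fit(self, y_past):
        self.mean_ = sum(y_past) / len(y_past)
        return self

    def predict(self, horizon):
        return [self.mean_ for _ in horizon]


class AveragingEnsemble(BaseForecaster):
    """Composite forecaster: components are passed in the constructor.

    Because it inherits from BaseForecaster, it can be used anywhere a
    forecaster is expected, including inside another composite.
    """

    def __init__(self, forecasters):
        self.forecasters = forecasters

    def fit(self, y_past):
        for forecaster in self.forecasters:
            forecaster.fit(y_past)
        return self

    def predict(self, horizon):
        predictions = [f.predict(horizon) for f in self.forecasters]
        # Average the component predictions step by step.
        return [sum(step) / len(step) for step in zip(*predictions)]


ensemble = AveragingEnsemble([LastValueForecaster(), MeanForecaster()])
y_pred = ensemble.fit([1.0, 2.0, 6.0]).predict(horizon=[1, 2])

# The composite is itself a forecaster, so it nests without special handling.
nested = AveragingEnsemble([ensemble, LastValueForecaster()])
nested_pred = nested.fit([1.0, 2.0, 6.0]).predict(horizon=[1])
```

This mirrors the `GridSearchCV` behaviour described above: code that accepts a `BaseForecaster` does not need to know whether it received a single model or a composite.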
At a high level, `HCrystalball`'s interface seems inspired by Facebook's `prophet`. `sktime`'s interface is closer to `statsmodels` and the Hyndman interfaces in R (e.g. `forecast`, `fable`).
This section highlights advantages, disadvantages, and problems, according to our opinion.
sktime advantages:
- "natural" interface in the univariate case
- higher-order operations, including composition and reduction, are well-handled

sktime disadvantages:
- lack of loc support
- no good multivariate support

HCrystalball advantages:
- support for multivariate and exogenous data
- uses `abc`

HCrystalball disadvantages:
- higher-order operations are not well-designed or consistent
- lack of iloc support
- interface is unintuitive in the univariate case

Problems in both:
- neither consistently covers both the univariate and the multivariate case well, which causes user frustration in at least one sub-case
- the user cannot use both series and DataFrame
- no support for both iloc and loc (indexed, e.g., datetime) indexing
Up to naming of variables, both sktime and HCrystalball adopt a fit/predict API, of the type
fit(y_past, [x_past], horizon)
predict([x_future], horizon)
where:
- `y_past` is the time series in the past
- `horizon` is the indices (loc or iloc) to predict at; note that some methods already require this in `fit`
- `x_past` is exogenous time series in the past
- `x_future` is exogenous time series in the future
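The remark that some methods already require the horizon in `fit` can be illustrated with a "direct" strategy, where a separate estimate is fitted for each step ahead. The sketch below is an assumption made for illustration: `DirectDriftForecaster` is a hypothetical class from neither package, plain lists stand in for pandas series, and the exogenous arguments are accepted but unused.

```python
class DirectDriftForecaster:
    """Hypothetical 'direct' forecaster with one estimate per horizon step.

    For each step h, fit stores the mean h-step change seen in y_past,
    so the horizon has to be known at fit time.
    """

    def fit(self, y_past, x_past=None, horizon=(1,)):
        self.horizon_ = list(horizon)
        self.last_ = y_past[-1]
        self.drift_ = {}
        for h in self.horizon_:
            if h >= len(y_past):
                raise ValueError(f"step {h} needs more than {h} observations")
            # All observed h-step changes y[t] - y[t - h].
            changes = [y_past[t] - y_past[t - h] for t in range(h, len(y_past))]
            self.drift_[h] = sum(changes) / len(changes)
        return self

    def predict(self, x_future=None, horizon=None):
        # Default to the horizon seen in fit; other steps were never fitted.
        steps = self.horizon_ if horizon is None else list(horizon)
        missing = [h for h in steps if h not in self.drift_]
        if missing:
            raise ValueError(f"steps {missing} were not passed to fit")
        return [self.last_ + self.drift_[h] for h in steps]


forecaster = DirectDriftForecaster().fit([1.0, 2.0, 4.0, 7.0], horizon=[1, 2])
y_pred = forecaster.predict()
```

Here the horizon is given as relative steps ahead (iloc style). Asking `predict` for a step that was not passed to `fit` raises an error, which is the situation that forces a choice between passing `horizon` to `fit` and deferring fitting to `predict`.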
The differences are mainly in expected type:
variable | sktime | HCrystalball |
---|---|---|
`y_past` | pandas series | pandas DataFrame |
`horizon` in `fit` | integer sequence | not supported (instead fitting is moved to predict in cases where horizon is required for fitting) |
`horizon` in `predict` | integer sequence | empty DataFrame with loc indices |
`x_past` | pandas DataFrame (experimental) | pandas DataFrame |
`x_future` | pandas DataFrame (experimental) | pandas DataFrame |
The interface differences suggest:
- different signature and type choices cover different use cases well (e.g., univariate vs multivariate); a joint/merged interface may therefore be desirable.
- the interfaces are currently incompatible; compatibility will require support for both series and DataFrames, and support for both `loc` and `iloc` indexing.
- the `sktime` interface has an advantage in composition and other higher-order operations. A joint interface should perhaps adopt this.
More precisely, a "good" consensus interface should satisfy the following requirements:
- support for both series and DataFrames as inputs/outputs
- support for both `loc` and `iloc` indexing
- support for exogenous variables
- `horizon` can be passed in `fit`
- consistent typing in higher-order motifs including composition, wrappers, reduction (inherits from resultant type class, components passed in constructor)
We therefore suggest:
- `sktime` and `HCrystalball` work together towards a unified forecasting interface in the next release.
- This unified interface should satisfy the requirements outlined above
- `HCrystalball` becomes an affiliated package of `sktime` (means: compatible interface)
- displayed on the landing page with other affiliated and coordinated packages
- `HCrystalball` specifies a scope and roadmaps, e.g., adapters to advanced forecasters with major package dependencies?
- individual `HCrystalball` team members are acknowledged as contributors to `sktime`, insofar as they contribute to the re-factor
- optionally, Heidelberg Cement is acknowledged as a contributing organisation to `sktime` post-refactor, pending approval of Heidelberg Cement comms
The proposed re-design is based on two work items:
- `HCrystalball` adopts `sktime`'s higher-order composition/reduction interface (correct class inheritance structure)
- re-factor of `fit`/`predict` signatures towards a consensus, which is type union based
The consensus could be as follows:
variable | consensus type |
---|---|
`y_past` | pandas series or DataFrame |
return of `predict` | same as type of `y_past` |
`horizon` | integer sequence (`iloc`) or sequence of `loc` indices or empty DataFrame with `loc` indices |
`x_past` | pandas series or DataFrame |
`x_future` | pandas series or DataFrame, needs same type and variables as `x_past` |
There may be an additional flag for whether `loc` or `iloc` indices are used.
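One way to read the consensus table is as a normalization step in front of every forecaster. The sketch below is an illustration under stated assumptions, not a proposed implementation: `to_loc_index` and `naive_predict` are hypothetical helper names, `y_past` is assumed to have a regularly spaced `DatetimeIndex`, and the horizon form is inferred from its type. That inference becomes ambiguous when `y_past` itself has an integer index, which is the case where an explicit `loc`/`iloc` flag would be needed.

```python
import numbers

import pandas as pd


def to_loc_index(horizon, y_past):
    """Normalize a horizon to absolute (loc) index values.

    Accepts the three forms from the consensus table:
    - an empty DataFrame whose index holds the loc values,
    - a sequence of loc values (here: timestamps),
    - a sequence of integers, read as relative steps ahead (iloc).
    Assumes y_past has a regularly spaced DatetimeIndex.
    """
    if isinstance(horizon, pd.DataFrame):
        return horizon.index
    horizon = list(horizon)
    if all(isinstance(h, numbers.Integral) for h in horizon):
        # Spacing between consecutive observations, e.g. one day.
        step = y_past.index[1] - y_past.index[0]
        return pd.DatetimeIndex([y_past.index[-1] + h * step for h in horizon])
    return pd.DatetimeIndex(horizon)


def naive_predict(y_past, horizon):
    """Last-value forecast that returns the same type as y_past."""
    index = to_loc_index(horizon, y_past)
    if isinstance(y_past, pd.DataFrame):
        # Repeat the last observed value of every column.
        last = {col: y_past[col].iloc[-1] for col in y_past.columns}
        return pd.DataFrame(last, index=index)
    return pd.Series(y_past.iloc[-1], index=index, name=y_past.name)


index = pd.date_range("2020-01-01", periods=3, freq="D")
y_series = pd.Series([1.0, 2.0, 3.0], index=index)
y_frame = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}, index=index)

# Relative steps ahead (iloc style).
pred_iloc = naive_predict(y_series, [1, 2])
# Absolute time points (loc style).
pred_loc = naive_predict(
    y_series, [pd.Timestamp("2020-01-04"), pd.Timestamp("2020-01-05")]
)
# Empty DataFrame carrying loc indices, with a DataFrame as y_past.
pred_frame = naive_predict(
    y_frame, pd.DataFrame(index=pd.to_datetime(["2020-01-04"]))
)
```

All three calls resolve to the same kind of absolute index, and the return type follows `y_past`: a series in, a series out; a DataFrame in, a DataFrame out.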
The low-level design could look similar to this, though the linked proposal is mainly concerned with support for `datetime`.