[ENH] make TabularToSeriesAdaptor compatible with sklearn transformers that accept only y, e.g., LabelEncoder #5982

Merged
merged 2 commits into from Apr 26, 2024
174 changes: 105 additions & 69 deletions sktime/transformations/series/adapt.py
@@ -21,69 +21,88 @@ class TabularToSeriesAdaptor(BaseTransformer):
do not require multiple :term:`instances <instance>` for fitting.

The adaptor behaves as follows.
If fit_in_transform = False and X is a series (pd.DataFrame, pd.Series, np.ndarray):
``fit(X)`` fits a clone of ``transformer`` to X (considered as a table)
``transform(X)`` applies transformer.transform to X and returns the result
``inverse_transform(X)`` applies transformer.inverse_transform to X
If fit_in_transform = True and X is a series (pd.DataFrame, pd.Series, np.ndarray):
``fit`` is empty
``transform(X)`` applies transformer.fit(X).transform(X) to X,
considered as a table, and returns the result
``inverse_transform(X)`` applies transformer.fit(X).inverse_transform(X) to X

If fit_in_transform = False, and X is of a panel/hierarchical type:
``fit(X)`` fits a clone of ``transformer`` for each individual series x in X
``transform(X)`` applies transform(x) of the clone belonging to x,
(where the index of x in transform equals the index of x in fit)
for each individual series x in X, and returns the result
``inverse_transform(X)`` applies transform(x) of the clone belonging to x,
(where the index of x in transform equals the index of x in fit)
for each individual series x in X, and returns the result
Note: instances indices in transform/inverse_transform
must be equal to those seen in fit
If fit_in_transform = True, and X is of a panel/hierarchical type:
``fit`` is empty
``transform(X)`` applies transformer.fit(x).transform(x)
to all individual series x in X and returns the result
``inverse_transform(X)`` applies transformer.fit(x).inverse_transform(x)
to all individual series x in X and returns the result

WARNING: if fit_in_transform is set to False,

If ``fit_in_transform = False`` and ``X`` is a series
(``pd.DataFrame``, ``pd.Series``, ``np.ndarray``):

* ``fit(X)`` fits a clone of ``transformer`` to X (considered as a table)
* ``transform(X)`` applies transformer.transform to X and returns the result
* ``inverse_transform(X)`` applies ``transformer.inverse_transform`` to ``X``

If ``fit_in_transform = True`` and ``X`` is a series
(``pd.DataFrame``, ``pd.Series``, ``np.ndarray``):

* ``fit`` is empty
* ``transform(X)`` applies ``transformer.fit(X).transform(X)`` to ``X``,
considered as a table, and returns the result
* ``inverse_transform(X)`` applies ``transformer.fit(X).inverse_transform(X)``
to ``X``

If ``fit_in_transform = False``, and ``X`` is of a panel/hierarchical type:
* ``fit(X)`` fits a clone of ``transformer`` for each individual
series ``x`` in ``X``
* ``transform(X)`` applies ``transform(x)`` of the clone belonging to ``x``
(where the index of x in transform equals the index of x in fit),
for each individual series ``x`` in ``X``, and returns the result
* ``inverse_transform(X)`` applies ``transform(x)`` of the clone belonging to
``x`` (where the index of x in transform equals the index of ``x`` in fit),
for each individual series ``x`` in ``X``, and returns the result
* Note: instances indices in ``transform/inverse_transform``
must be equal to those seen in ``fit``

If ``fit_in_transform = True``, and ``X`` is of a panel/hierarchical type:
* ``fit`` is empty
* ``transform(X)`` applies ``transformer.fit(x).transform(x)``
to all individual series ``x`` in ``X`` and returns the result
* ``inverse_transform(X)`` applies ``transformer.fit(x).inverse_transform(x)``
to all individual series ``x`` in ``X`` and returns the result

WARNING: if ``fit_in_transform`` is set to ``False``,
when applied to Panel or Hierarchical data,
the resulting transformer will identify individual series in test set
with series indices in training set, on which instances were fit
in particular, transform will not work if number of instances
and indices of instances in transform are different from those in fit
WARNING: if fit_in_transform is set to True,
and indices of instances in transform are different from those in fit

WARNING: if ``fit_in_transform`` is set to ``True``,
then each series in the test set will be transformed as batch by fit-predict,
this may cause information leakage in a forecasting setting
(but not in a time series classification/regression/clustering setting,
because in these settings the independent samples are the individual series)
(but not in a time series classification/regression/clustering setting,
because in these settings the independent samples are the individual series)

Whether ``y`` is passed to transformer methods is controlled by ``pass_y``.
If the inner transformer has non-defaulting ``y`` args, the default behaviour is
to pass ``y`` to ``fit``, ``fit_transform``, or ``transform``.
If no ``y`` arg is present, or if it has a default value, ``y`` is not passed.

If the passed transformer accepts only ``y`` in ``fit`` and ``transform``,
then ``pass_y`` is ignored, and ``X`` is plugged into the ``y`` argument.

Parameters
----------
transformer : Estimator
transformer : ``sklearn`` transformer, ``BaseEstimator`` descendant instance
scikit-learn-like transformer to fit and apply to series.
This is used as a "blueprint" and not fitted or otherwise mutated.

fit_in_transform: bool, optional, default=False
whether transformer_ should be fitted in transform (True), or in fit (False)
recommended setting in forecasting (single series or hierarchical): False
recommended setting in ts classification, regression, clustering: True
whether transformer_ should be fitted in transform (True), or in fit (False).

* recommended setting in forecasting (single series or hierarchical): ``False``
* recommended setting in ts classification, regression, clustering: ``True``

pass_y : str, optional, one of "auto" (default), "fit", "always", "never"
Whether to pass y to transformer methods of the ``transformer`` clone.
"auto": passes y to methods fit, transform, fit_transform, inverse_transform,
if and only if y is a named arg of either method without default.
Note: passes y even if it is None
"fit": passes y to method fit, but not to transform.
Note: passes y even if it is None, or if not a named arg
"always": passes y to all methods, fit, transform, inverse_transform.
Note: passes y even if it is None, or if not a named arg
"never": never passes y to any method.

* "auto": passes ``y`` to methods ``fit``, ``transform``, ``fit_transform``,
``inverse_transform``,
if and only if ``y`` is a named arg of either method without default.
Note: passes ``y`` even if it is ``None``
* "fit": passes ``y`` to method ``fit``, but not to ``transform``.
Note: passes ``y`` even if it is ``None``, or if not a named arg
* "always": passes ``y`` to all methods, ``fit``, ``transform``,
``inverse_transform``.
Note: passes ``y`` even if it is ``None``, or if not a named arg
* "never": never passes ``y`` to any method.

Attributes
----------
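
Review note: the docstring above says that a transformer accepting only ``y`` in ``fit`` and ``transform`` gets ``X`` plugged into its ``y`` argument. A minimal sketch of that routing rule follows, using only the standard library. ``ToyLabelEncoder`` and ``call_with_series`` are hypothetical names introduced for illustration; they are not part of sktime or scikit-learn, and the real adaptor may differ in detail.

```python
from inspect import signature


class ToyLabelEncoder:
    """Hypothetical stand-in for a y-only transformer such as LabelEncoder."""

    def fit(self, y):
        # remember the sorted distinct labels
        self.classes_ = sorted(set(y))
        return self

    def transform(self, y):
        # map each label to its position in classes_
        return [self.classes_.index(v) for v in y]

    def inverse_transform(self, y):
        # map each integer code back to its label
        return [self.classes_[i] for i in y]


def call_with_series(transformer, method, X):
    """Call transformer.method, routing X to ``y`` if there is no ``X`` parameter."""
    fun = getattr(transformer, method)
    params = signature(fun).parameters
    kwargs = {"X": X} if "X" in params else {"y": X}
    return fun(**kwargs)


series = ["b", "a", "c", "a"]
enc = call_with_series(ToyLabelEncoder(), "fit", series)
codes = call_with_series(enc, "transform", series)
restored = call_with_series(enc, "inverse_transform", codes)
```

Because every call is made with keyword arguments only, the same code path serves transformers that take ``X`` and transformers that take only ``y``.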
@@ -140,33 +159,50 @@ def __init__(self, transformer, fit_in_transform=False, pass_y="auto"):
if self._skip_fit:
self.set_tags(**{"fit_is_empty": True})

trafo_has_y, trafo_has_y_default = self._trafo_has_y_and_default("fit")
trafo_has_y, trafo_has_y_default = self._trafo_has_param_and_default("fit", "y")
need_y = trafo_has_y and not trafo_has_y_default
if need_y or pass_y not in ["auto", "no"]:
self.set_tags(**{"y_inner_mtype": "numpy1D"})

def _trafo_has_y_and_default(self, method="fit"):
"""Return if transformer.method has a y, and whether y has a default."""
def _trafo_has_param_and_default(self, method="fit", arg="y"):
"""Return if transformer.method has a parameter, and whether it has a default.

Parameters
----------
method : str, optional, default="fit"
method name to check
arg : str, optional, default="y"
parameter name to check

Returns
-------
has_param : bool
whether the method ``method`` has a parameter with name ``arg``
has_default : bool
whether the parameter ``arg`` of method ``method`` has a default value
"""
method_fun = getattr(self.transformer, method)
method_params = list(signature(method_fun).parameters.keys())
if "y" in method_params:
y_param = signature(self.transformer.fit).parameters["y"]
y_default = y_param.default
y_has_default = y_default is not y_param.empty
return True, y_has_default
if arg in method_params:
param = signature(self.transformer.fit).parameters[arg]
default = param.default
has_default = default is not param.empty
return True, has_default
else:
return False, False
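
Review note: as displayed in this hunk, the membership test uses the signature of ``method_fun``, while the parameter lookup goes through ``self.transformer.fit``. If ``method`` is not ``"fit"``, the two signatures can differ, so the reported default may belong to the wrong method, or the lookup may fail for a parameter that ``fit`` lacks. A hedged sketch of a variant that inspects the same method throughout is below. ``has_param_and_default`` and ``ToyTransformer`` are illustrative names, not sktime API.

```python
from inspect import Parameter, signature


def has_param_and_default(obj, method="fit", arg="y"):
    """Return (has_param, has_default) for parameter ``arg`` of ``obj.method``."""
    params = signature(getattr(obj, method)).parameters
    if arg not in params:
        return False, False
    # look the parameter up in the same signature that was tested above
    return True, params[arg].default is not Parameter.empty


class ToyTransformer:
    """Hypothetical transformer whose fit and transform treat y differently."""

    def fit(self, X, y=None):
        return self

    def transform(self, X, y):
        return X


fit_y = has_param_and_default(ToyTransformer(), "fit", "y")
transform_y = has_param_and_default(ToyTransformer(), "transform", "y")
missing = has_param_and_default(ToyTransformer(), "fit", "sample_weight")
```

Here ``y`` has a default in ``fit`` but not in ``transform``, which is exactly the case where looking up ``fit`` for every method would give a misleading answer.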

def _get_y_args(self, y, method="fit"):
"""Get empty dict or dict with y, depending on pass_y and method.
def _get_args(self, X, y, method="fit"):
"""Get kwargs for method, depending on pass_y and method.

The return is a dict which is passed to the method of name method,
according to the pass_y setting.
The return is a dict which is passed to the method of name method.
"""
if not self._trafo_has_param_and_default(method, "X"):
return {"y": X}

pass_y = self.pass_y

if pass_y == "auto":
has_y, has_y_default = self._trafo_has_y_and_default(method)
has_y, has_y_default = self._trafo_has_param_and_default(method, "y")
need_y = has_y and not has_y_default
return_y = need_y
elif pass_y == "fit":
@@ -182,9 +218,9 @@ def _get_y_args(self, y, method="fit"):
)

if return_y:
return {"y": y}
return {"X": X, "y": y}
else:
return {}
return {"X": X}
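
Review note: ``_trafo_has_param_and_default`` returns a 2-tuple, and a non-empty tuple is always truthy in Python. As the check ``if not self._trafo_has_param_and_default(method, "X")`` is displayed above, it would therefore never take the ``{"y": X}`` branch. The sketch below unpacks the tuple first. It covers only the ``X`` routing and the ``"auto"`` rule described in the docstring; the other ``pass_y`` branches are elided in this diff and are not reconstructed here. All names are hypothetical.

```python
from inspect import Parameter, signature


def has_param_and_default(obj, method, arg):
    """Return (has_param, has_default) for parameter ``arg`` of ``obj.method``."""
    params = signature(getattr(obj, method)).parameters
    if arg not in params:
        return False, False
    return True, params[arg].default is not Parameter.empty


def get_args_auto(transformer, X, y, method="fit"):
    """Build kwargs for ``method`` under the "auto" rule only."""
    # unpack first: testing the tuple itself would always be truthy
    has_X, _ = has_param_and_default(transformer, method, "X")
    if not has_X:
        return {"y": X}
    has_y, has_y_default = has_param_and_default(transformer, method, "y")
    if has_y and not has_y_default:
        return {"X": X, "y": y}
    return {"X": X}


class YOnly:
    def fit(self, y):
        return self


class NeedsY:
    def fit(self, X, y):
        return self


class OptionalY:
    def fit(self, X, y=None):
        return self


tuple_is_truthy = bool((False, False))
y_only_args = get_args_auto(YOnly(), X=[1], y=None)
needs_y_args = get_args_auto(NeedsY(), X=[1], y=None)
optional_y_args = get_args_auto(OptionalY(), X=[1], y=None)
```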

def _fit(self, X, y=None):
"""Fit transformer to X and y.
@@ -202,10 +238,10 @@ def _fit(self, X, y=None):
-------
self: a fitted instance of the estimator
"""
y_args = self._get_y_args(y, method="fit")
fit_args = self._get_args(X, y, method="fit")

if not self._skip_fit:
self.transformer_.fit(X, **y_args)
self.transformer_.fit(**fit_args)

return self

@@ -226,13 +262,13 @@ def _transform(self, X, y=None):
Xt : 2D np.ndarray
transformed version of X
"""
y_fit_args = self._get_y_args(y, method="fit")
y_trafo_args = self._get_y_args(y, method="transform")
fit_args = self._get_args(X, y, method="fit")
trafo_args = self._get_args(X, y, method="transform")

if self._skip_fit:
Xt = self.transformer_.fit(X, **y_fit_args).transform(X, **y_trafo_args)
Xt = self.transformer_.fit(**fit_args).transform(**trafo_args)
else:
Xt = self.transformer_.transform(X)
Xt = self.transformer_.transform(**trafo_args)

# coerce sensibly to 2D np.ndarray
if isinstance(Xt, (int, float, str)):
@@ -261,13 +297,13 @@ def _inverse_transform(self, X, y=None):
Xt : 2D np.ndarray
inverse transformed version of X
"""
y_fit_args = self._get_y_args(y, method="fit")
y_i_args = self._get_y_args(y, method="inverse_transform")
fit_args = self._get_args(X, y, method="fit")
it_args = self._get_args(X, y, method="inverse_transform")

if self.fit_in_transform:
Xt = self.transformer_.fit(X, **y_fit_args).inverse_transform(X, **y_i_args)
Xt = self.transformer_.fit(X, **fit_args).inverse_transform(X, **it_args)
else:
Xt = self.transformer_.inverse_transform(X, **y_i_args)
Xt = self.transformer_.inverse_transform(**it_args)
return Xt
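
Review note: in the ``fit_in_transform`` branch as displayed above, ``X`` is still passed positionally while ``fit_args`` and ``it_args`` now also carry ``X`` (or ``y``) as a keyword. Assuming the kwargs dict contains ``"X"``, Python raises ``TypeError`` for the duplicated argument. The toy example below demonstrates the failure and the keyword-only form used in the other branches. ``ToyShift`` is a hypothetical class for illustration, not an sktime or scikit-learn estimator.

```python
class ToyShift:
    """Hypothetical transformer: learns the minimum, adds it back on inverse."""

    def fit(self, X, y=None):
        self.offset_ = min(X)
        return self

    def inverse_transform(self, X):
        return [v + self.offset_ for v in X]


X = [3, 5, 4]
fit_args = {"X": X}
it_args = {"X": X}

# positional X plus X inside the kwargs: duplicate argument
try:
    ToyShift().fit(X, **fit_args)
    raised = False
except TypeError:
    raised = True

# keyword-only call, consistent with the other branches in this diff
Xt = ToyShift().fit(**fit_args).inverse_transform(**it_args)
```

Passing everything through the kwargs dict keeps the call valid for both ``X``-based and ``y``-only transformers.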

@classmethod