[ENH] Multioutput capability for all time series classifiers and regressors, broadcasting and tag #5408

Merged
merged 76 commits on Dec 25, 2023
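For orientation, a minimal sketch of the user-facing behaviour this PR enables (illustrative only, not taken from the PR): a classifier without native multioutput support is fitted on a multi-column y, and the base class broadcasts column-wise, one clone per target column, returning predictions as a pd.DataFrame. KNeighborsTimeSeriesClassifier and load_unit_test are assumed to be available at their usual sktime locations; the two-column target is a toy construction.

import pandas as pd

from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.datasets import load_unit_test

X_train, y_train = load_unit_test(split="train", return_X_y=True)
X_test, _ = load_unit_test(split="test", return_X_y=True)

# toy multioutput target: two label columns over the same instances
y_multi = pd.DataFrame({"label_a": y_train, "label_b": y_train})

clf = KNeighborsTimeSeriesClassifier()
clf.fit(X_train, y_multi)  # broadcasts internally: one fitted clone per target column
y_pred = clf.predict(X_test)  # expected: pd.DataFrame with one column per target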
Commits (76)
0445158
changes to resolve #5349
Vasudeva-bit Oct 12, 2023
1ddff52
draft PR for broadcast support #5182
Vasudeva-bit Oct 12, 2023
b34c92d
fix: incorrect function call
Vasudeva-bit Oct 12, 2023
8133f41
None and 'None' tests
Vasudeva-bit Oct 14, 2023
b811b05
Merge branch 'main' of https://github.com/Vasudeva-bit/sktime into br…
Vasudeva-bit Oct 16, 2023
7dd14b7
temp changes
Vasudeva-bit Oct 29, 2023
5caa291
pull compatability
Vasudeva-bit Nov 3, 2023
6c811db
Merge branch 'main' of https://github.com/Vasudeva-bit/sktime into br…
Vasudeva-bit Nov 3, 2023
d7451a7
extending classifiers for multitarget
Vasudeva-bit Nov 3, 2023
4b67a2b
update pull
Vasudeva-bit Dec 1, 2023
0003b79
Merge branch 'proba_test' of https://github.com/Vasudeva-bit/sktime i…
Vasudeva-bit Dec 1, 2023
2c7fe73
addressed test fails
Vasudeva-bit Dec 1, 2023
9f29790
delete irrelevant code
Vasudeva-bit Dec 1, 2023
9e73626
code refactor
Vasudeva-bit Dec 1, 2023
28d6bfc
merge instance methods with original methods
Vasudeva-bit Dec 2, 2023
ae5b282
resolve merge conflicts
Vasudeva-bit Dec 3, 2023
edfd33b
Merge branch 'main' into broadcast
Vasudeva-bit Dec 3, 2023
c7071ac
fixing tests
Vasudeva-bit Dec 3, 2023
dae15b0
Merge branch 'broadcast' of https://github.com/Vasudeva-bit/sktime in…
Vasudeva-bit Dec 3, 2023
3c39b19
test_base_classifier_fit: update X, y checks
Vasudeva-bit Dec 3, 2023
e1a7a4a
test y multioutput
Vasudeva-bit Dec 3, 2023
b464528
minor test fixes
Vasudeva-bit Dec 4, 2023
46f8eba
broadcast regression base
Vasudeva-bit Dec 5, 2023
a0cae34
minor fixes related y's dim
Vasudeva-bit Dec 6, 2023
a99b7be
spelling
Vasudeva-bit Dec 7, 2023
7086e6b
spelling error
Vasudeva-bit Dec 7, 2023
057f1c9
fix codec error
Vasudeva-bit Dec 11, 2023
2692de1
fix error
Vasudeva-bit Dec 11, 2023
bb508f9
docstrings
fkiraly Dec 22, 2023
748dcd0
docstring
fkiraly Dec 22, 2023
5bcb7b7
minimize change in predict methods
fkiraly Dec 22, 2023
8859088
Merge branch 'main' into pr/5408
fkiraly Dec 22, 2023
51ba9e4
[AUTOMATED] update CONTRIBUTORS.md
fkiraly Dec 22, 2023
6990bd8
lint
fkiraly Dec 22, 2023
c953ab8
move fit_predict_proba back to reduce diff
fkiraly Dec 22, 2023
9b5dbe5
reduce _fit footprint, correct timer
fkiraly Dec 22, 2023
bd77e70
Merge branch 'broadcast' of https://github.com/Vasudeva-bit/sktime in…
fkiraly Dec 22, 2023
007571c
docstrig
fkiraly Dec 22, 2023
8dff89b
programmatic conversion
fkiraly Dec 22, 2023
8c648d2
rename classifiers attrib
fkiraly Dec 22, 2023
1f8179f
move validation after error
fkiraly Dec 22, 2023
fc77b42
allow object dtype
fkiraly Dec 22, 2023
1ff077e
remove set init
fkiraly Dec 22, 2023
38c55d9
Update test_base.py
fkiraly Dec 22, 2023
7bf66b9
simplify
fkiraly Dec 22, 2023
4cc2b52
add missing multioutput tag in early classif
fkiraly Dec 22, 2023
c80f1be
Update base.py
fkiraly Dec 22, 2023
5bfb940
boss update logic
fkiraly Dec 22, 2023
8979fff
forgot trafo data
fkiraly Dec 22, 2023
2efca8c
Update base.py
fkiraly Dec 22, 2023
0085c6d
revert boss
fkiraly Dec 23, 2023
0a594af
linting
fkiraly Dec 23, 2023
ee1c1f5
base early classifier _convert_X
fkiraly Dec 23, 2023
6ccc2a5
legacy consistency
fkiraly Dec 24, 2023
a483bcd
Merge branch 'main' into pr/5408
fkiraly Dec 24, 2023
4151a38
Merge branch 'main' into pr/5408
fkiraly Dec 24, 2023
1837578
Merge branch 'main' into pr/5408
fkiraly Dec 25, 2023
38ab9bc
rename y mtype var
fkiraly Dec 25, 2023
5adce3f
tests, bugfixes
fkiraly Dec 25, 2023
4967c2d
tests
fkiraly Dec 25, 2023
4a4fefa
Update _boss.py
fkiraly Dec 25, 2023
cef643e
fix wrong conversion
fkiraly Dec 25, 2023
c5bceb2
revert erroneous CNN tag
fkiraly Dec 25, 2023
84c7804
revert test changes
fkiraly Dec 25, 2023
62bd29f
revert test changes
fkiraly Dec 25, 2023
a6840c7
minimize footprint in regressor
fkiraly Dec 25, 2023
64a0e33
conversions
fkiraly Dec 25, 2023
4a55f4c
test
fkiraly Dec 25, 2023
1c7c4d8
linting
fkiraly Dec 25, 2023
53036d2
linting
fkiraly Dec 25, 2023
b8af2d4
Update base.py
fkiraly Dec 25, 2023
b44bac3
Update base.py
fkiraly Dec 25, 2023
d4ac28c
classifier docstrings
fkiraly Dec 25, 2023
702dea3
docstrings regressors
fkiraly Dec 25, 2023
60fa92a
extender docstrings, extension template
fkiraly Dec 25, 2023
654fd4c
linting
fkiraly Dec 25, 2023
Changes from 7 commits
308 changes: 292 additions & 16 deletions sktime/classification/base.py
@@ -30,6 +30,7 @@ class name: BaseClassifier

from sktime.base import BaseEstimator
from sktime.datatypes import check_is_scitype, convert_to
from sktime.datatypes._vectorize import VectorizedDF
from sktime.utils.sklearn import is_sklearn_transformer
from sktime.utils.validation import check_n_jobs
from sktime.utils.validation._dependencies import _check_estimator_deps
@@ -56,6 +57,7 @@ class BaseClassifier(BaseEstimator, ABC):
"object_type": "classifier", # type of object
"X_inner_mtype": "numpy3D", # which type do _fit/_predict, support for X?
# it should be either "numpy3D" or "nested_univ" (nested pd.DataFrame)
"capability:multioutput": False, # whether classifier supports multioutput
"capability:multivariate": False,
"capability:unequal_length": False,
"capability:missing_values": False,
@@ -88,6 +90,8 @@ def __init__(self):
# required for compatibility with some sklearn interfaces
# i.e. CalibratedClassifierCV
self._estimator_type = "classifier"
self.classifiers_ = None
self._is_vectorized = False

super().__init__()
_check_estimator_deps(self)
@@ -131,7 +135,283 @@ def __rmul__(self, other):
return NotImplemented

def fit(self, X, y):
"""Fit time series classifier to training data.
"""
Fit time series classifier to training data.

Parameters
----------
X : 3D np.array (any number of dimensions, equal length series)
of shape [n_instances, n_dimensions, series_length]
or 2D np.array (univariate, equal length series)
of shape [n_instances, series_length]
or pd.DataFrame with each column a dimension, each cell a pd.Series
(any number of dimensions, equal or unequal length series)
or of any other supported Panel mtype
for list of mtypes, see datatypes.SCITYPE_REGISTER
for specifications, see examples/AA_datatypes_and_datasets.ipynb
y : 2D np.array of int, of shape [n_instances, n_dimensions] - class labels
for fitting, indices correspond to instance indices in X
or 1D np.array of int, of shape [n_instances] - class labels for fitting,
indices correspond to instance indices in X

Returns
-------
self : Reference to self.
"""
# check and convert X/y
y = self._check_y(y)

self._is_vectorized = isinstance(y, VectorizedDF)
# we call the ordinary _fit if no looping/vectorization needed
if not self._is_vectorized or self.get_tag("capability:multioutput"):
self._fit_instance(X=X, y=y)
else:
# otherwise we call the vectorized version of fit
self._vectorize("fit", X=X, y=y)

# this should happen last: fitted state is set to True
self._is_fitted = True

return self

def predict(self, X):
"""Predicts labels for sequences in X.

Parameters
----------
X : 3D np.array (any number of dimensions, equal length series)
of shape [n_instances, n_dimensions, series_length]
or 2D np.array (univariate, equal length series)
of shape [n_instances, series_length]
or pd.DataFrame with each column a dimension, each cell a pd.Series
(any number of dimensions, equal or unequal length series)
or of any other supported Panel mtype
for list of mtypes, see datatypes.SCITYPE_REGISTER
for specifications, see examples/AA_datatypes_and_datasets.ipynb

Returns
-------
pred : 1D np.array of int, of shape [n_instances] - predicted class labels
indices correspond to instance indices in X
or pd.DataFrame with each column a dimension/target, containing class
labels for each instance in X
"""
if not self._is_vectorized or self.get_tag("capability:multioutput"):
pred = self._predict_instance(X=X)
else:
# otherwise we call the vectorized version
pred = self._vectorize("predict", X=X)

return pred

def predict_proba(self, X):
"""Predicts labels probabilities for sequences in X.

Parameters
----------
X : 3D np.array (any number of dimensions, equal length series)
of shape [n_instances, n_dimensions, series_length]
or 2D np.array (univariate, equal length series)
of shape [n_instances, series_length]
or pd.DataFrame with each column a dimension, each cell a pd.Series
(any number of dimensions, equal or unequal length series)
or of any other supported Panel mtype
for list of mtypes, see datatypes.SCITYPE_REGISTER
for specifications, see examples/AA_datatypes_and_datasets.ipynb

Returns
-------
pred : 2D np.array of shape [n_instances, n_classes] - predicted class
probabilities, indices correspond to instance indices in X
or pd.DataFrame with each column a dimension/target, containing predicted
class probabilities for each instance in X
"""
# we call the ordinary method if no looping/vectorization needed
if not self._is_vectorized or self.get_tag("capability:multioutput"):
pred_dist = self._predict_proba_instance(X=X)
else:
# otherwise we call the vectorized version
pred_dist = self._vectorize("predict_proba", X=X)

return pred_dist

def fit_predict(self, X, y, cv=None, change_state=True):
"""Fit and predict labels for sequences in X.

Convenience method to produce in-sample predictions and
cross-validated out-of-sample predictions.

Writes to self, if change_state=True:
Sets self.is_fitted to True.
Sets fitted model attributes ending in "_".

Does not update state if change_state=False.

Parameters
----------
X : 3D np.array (any number of dimensions, equal length series)
of shape [n_instances, n_dimensions, series_length]
or 2D np.array (univariate, equal length series)
of shape [n_instances, series_length]
or pd.DataFrame with each column a dimension, each cell a pd.Series
(any number of dimensions, equal or unequal length series)
or of any other supported Panel mtype
for list of mtypes, see datatypes.SCITYPE_REGISTER
for specifications, see examples/AA_datatypes_and_datasets.ipynb
y : 2D np.array of int, of shape [n_instances, n_dimensions] - class labels
for fitting, indices correspond to instance indices in X
or 1D np.array of int, of shape [n_instances] - class labels for fitting,
indices correspond to instance indices in X
cv : None, int, or sklearn cross-validation object, optional, default=None
None : predictions are in-sample, equivalent to fit(X, y).predict(X)
cv : predictions are equivalent to fit(X_train, y_train).predict(X_test)
where multiple X_train, y_train, X_test are obtained from cv folds
returned y is union over all test fold predictions
cv test folds must be non-intersecting
int : equivalent to cv=KFold(cv, shuffle=True, random_state=x),
i.e., k-fold cross-validation predictions out-of-sample
random_state x is taken from self if exists, otherwise x=None
change_state : bool, optional (default=True)
if False, will not change the state of the classifier,
i.e., fit/predict sequence is run with a copy, self does not change
if True, will fit self to the full X and y,
end state will be equivalent to running fit(X, y)

Returns
-------
pred : 1D np.array of int, of shape [n_instances] - predicted class labels
indices correspond to instance indices in X
or pd.DataFrame with each column a dimension/target, containing class
labels for each instance in X
"""
# check and convert X/y
y = self._check_y(y)

self._is_vectorized = isinstance(y, VectorizedDF)
# we call the ordinary method if no looping/vectorization needed
if not self._is_vectorized or self.get_tag("capability:multioutput"):
pred = self._fit_predict_instance(X=X, y=y)
else:
# otherwise we call the vectorized version
pred = self._vectorize("fit_predict", X=X, y=y)

return pred

def fit_predict_proba(self, X, y, cv=None, change_state=True):
"""Fit and predict labels probabilities for sequences in X.

Convenience method to produce in-sample predictions and
cross-validated out-of-sample predictions.

Parameters
----------
X : 3D np.array (any number of dimensions, equal length series)
of shape [n_instances, n_dimensions, series_length]
or 2D np.array (univariate, equal length series)
of shape [n_instances, series_length]
or pd.DataFrame with each column a dimension, each cell a pd.Series
(any number of dimensions, equal or unequal length series)
or of any other supported Panel mtype
for list of mtypes, see datatypes.SCITYPE_REGISTER
for specifications, see examples/AA_datatypes_and_datasets.ipynb
y : 2D np.array of int, of shape [n_instances, n_dimensions] - class labels
for fitting, indices correspond to instance indices in X
or 1D np.array of int, of shape [n_instances] - class labels for fitting,
indices correspond to instance indices in X
cv : None, int, or sklearn cross-validation object, optional, default=None
None : predictions are in-sample, equivalent to fit(X, y).predict(X)
cv : predictions are equivalent to fit(X_train, y_train).predict(X_test)
where multiple X_train, y_train, X_test are obtained from cv folds
returned y is union over all test fold predictions
cv test folds must be non-intersecting
int : equivalent to cv=KFold(int), i.e., k-fold cross-validation predictions
change_state : bool, optional (default=True)
if False, will not change the state of the classifier,
i.e., fit/predict sequence is run with a copy, self does not change
if True, will fit self to the full X and y,
end state will be equivalent to running fit(X, y)

Returns
-------
pred : 2D np.array of shape [n_instances, n_classes] - predicted class
probabilities, indices correspond to instance indices in X
or pd.DataFrame with each column a dimension/target, containing predicted
class probabilities for each instance in X
"""
# check and convert X/y
y = self._check_y(y)

self._is_vectorized = isinstance(y, VectorizedDF)
# we call the ordinary method if no looping/vectorization needed
if not self._is_vectorized or self.get_tag("capability:multioutput"):
pred_dist = self._fit_predict_proba_instance(X=X, y=y)
else:
# otherwise we call the vectorized version
pred_dist = self._vectorize("fit_predict_proba", X=X, y=y)

return pred_dist

def _vectorize(self, methodname, **kwargs):
"""Vectorized/iterated loop over method of BaseClassifier.

Uses classifiers_ attribute to store one classifier per loop index.
"""
y = kwargs.get("y")
if y is not None:
self._y_vec = y
classifiers_ = self._y_vec.vectorize_est(
self,
method="clone",
)
if methodname == "fit":
self.classifiers_ = self._y_vec.vectorize_est(
classifiers_,
method=methodname,
args={"y": kwargs.get("y")} if kwargs.get("y") else {},
X=kwargs.get("X"),
)
return self
else:
if self.classifiers_ is not None:
classifiers_ = self.classifiers_
y_pred = self._y_vec.vectorize_est(
classifiers_,
method=methodname,
# return_type="list",
args={"y": y} if y is not None else {},
X=kwargs.get("X"),
)
y_pred = pd.DataFrame(
{str(i): y_pred[col].values[0] for i, col in enumerate(y_pred.columns)}
)
return y_pred

def _check_y(self, y=None):
"""Check and coerce X/y for fit/transform functions.

Parameters
----------
y : pd.DataFrame, pd.Series or np.ndarray

Returns
-------
y : object of sktime compatible time series type
can be Series, Panel, Hierarchical
"""
if isinstance(y, pd.DataFrame) and len(y.columns) == 1:
return y
if isinstance(y, pd.DataFrame):
y = VectorizedDF([y], iterate_cols=True)
self._is_vectorized = True
return y
if y.ndim == 1:
return y
y = VectorizedDF(np.array([y.T]), iterate_cols=True)
self._is_vectorized = True
return y

def _fit_instance(self, X, y):
"""Fit time series classifier to training data (single instance).

Parameters
----------
@@ -207,8 +487,8 @@ def fit(self, X, y):
self._is_fitted = True
return self

def predict(self, X) -> np.ndarray:
"""Predicts labels for sequences in X.
def _predict_instance(self, X) -> np.ndarray:
"""Predicts labels for sequences in X (single instance).

Parameters
----------
@@ -239,8 +519,8 @@ def predict(self, X) -> np.ndarray:
# call internal _predict_proba
return self._predict(X)

def predict_proba(self, X) -> np.ndarray:
"""Predicts labels probabilities for sequences in X.
def _predict_proba_instance(self, X) -> np.ndarray:
"""Predicts labels probabilities for sequences in X (single instance).

Parameters
----------
@@ -273,8 +553,8 @@ def predict_proba(self, X) -> np.ndarray:
# call internal _predict_proba
return self._predict_proba(X)

def fit_predict(self, X, y, cv=None, change_state=True) -> np.ndarray:
"""Fit and predict labels for sequences in X.
def _fit_predict_instance(self, X, y, cv=None, change_state=True) -> np.ndarray:
"""Fit and predict labels for sequences in X (single instance).

Convenience method to produce in-sample predictions and
cross-validated out-of-sample predictions.
@@ -395,8 +675,10 @@ def _fit_predict_boilerplate(self, X, y, cv, change_state, method):

return y_pred

def fit_predict_proba(self, X, y, cv=None, change_state=True) -> np.ndarray:
"""Fit and predict labels probabilities for sequences in X.
def _fit_predict_proba_instance(
self, X, y, cv=None, change_state=True
) -> np.ndarray:
"""Fit and predict labels probabilities for sequences in X (single instance).

Convenience method to produce in-sample predictions and
cross-validated out-of-sample predictions.
@@ -734,7 +1016,7 @@ def _check_classifier_input(
# Check y if passed
if y is not None:
# Check y valid input
if not isinstance(y, (pd.Series, np.ndarray)):
if not isinstance(y, (pd.Series, pd.DataFrame, np.ndarray)):
raise ValueError(
f"y must be a np.array or a pd.Series, but found type: {type(y)}"
)
@@ -745,12 +1027,6 @@
f"Mismatch in number of cases. Number in X = {n_cases} nos in y = "
f"{n_labels}"
)
if isinstance(y, np.ndarray):
if y.ndim > 1:
raise ValueError(
f"np.ndarray y must be 1-dimensional, "
f"but found {y.ndim} dimensions"
)
# warn if only a single class label is seen
# this should not raise exception since this can occur by train subsampling
if len(np.unique(y)) == 1:
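Taken together, the base.py changes above implement multioutput broadcasting as a column-wise loop: _check_y wraps a multi-column y in a VectorizedDF, and _vectorize clones the estimator once per column, fits or predicts with each clone, and collects the results into a pd.DataFrame. Below is a minimal sketch of that idea without sktime's VectorizedDF machinery; clone is sklearn's, and the helper names are illustrative rather than part of the PR.

import pandas as pd
from sklearn.base import clone

def fit_per_column(estimator, X, y):
    """Fit one clone of estimator per target column of the DataFrame y."""
    return {col: clone(estimator).fit(X, y[col]) for col in y.columns}

def predict_per_column(fitted, X):
    """Predict with each per-column clone and collect results into one DataFrame."""
    return pd.DataFrame({col: est.predict(X) for col, est in fitted.items()})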
5 changes: 4 additions & 1 deletion sktime/classification/deep_learning/cnn.py
@@ -67,7 +67,10 @@ class CNNClassifier(BaseDeepClassifier):
CNNClassifier(...)
"""

_tags = {"python_dependencies": "tensorflow"}
_tags = {
"python_dependencies": "tensorflow",
"capability:multioutput": True,
}

def __init__(
self,
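The CNNClassifier change above shows how an individual estimator opts in to native multioutput handling via the new tag; estimators that leave the tag at its default False instead get the column-wise broadcasting from BaseClassifier. A brief illustrative check of the tag, using get_class_tag, sktime's class-level tag accessor (assumes the module imports cleanly in the target environment):

from sktime.classification.deep_learning.cnn import CNNClassifier

# class-level tag lookup, no instantiation needed
print(CNNClassifier.get_class_tag("capability:multioutput"))  # expected: True after this PR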