[ENH] New transformation based pipeline classifiers (#1721)
* typo

* fitted logic in panel case

* boxcox refactored as example

* linting

* linting

* added new tags

* added checks for object dtype in Series types

* fixing if/else logic path 2 and 3

* added nested_univ to df-list conversion

* and back

* changed fit-in-transform default to True

* linting

* linting

* improved informativity of estimator tags error message

* added more tags

* fixed missing tag in boxcox refactor

* random interval transformer

* updated fit check with check_is_fitted

* removed superfluous comments

* remove return_pred_int errors

* corrected data frame check

* added inner types for panel trafos

* output conversion for Series/Primitives

* signature fixes

* linting signatures

* linting

* linting signatures

* stc transform saving

* signatures update

* typo

* coerce to pandas

* japanese vowels test

* Fixing error on parameters for experiment

* relaxing numpy series mtype

* new rotation forest

* remove useless atts

* rotf contracting, train estimate and comments

* doc fix

* rotf example fix

* stc comments p1

* include rotation forest

* hc refactor p1

* hc1 test fix

* stc rotf fix

* cboss train probs

* use transformed data in stc

* hc1 test outline

* config change

* hc2.0

* rotf seed fix

* st

* st fix

* stc and st tests and finishing touches

* remove mstc dev

* more st fixing

* import code quality

* rotf code quality

* hc1 test code quality

* hc2 tests and contracting

* hc2 stc default

* hc2 stc default (fix)

* st rename and test fixes

* hc2 train estimate fix

* stc test fixes and codeowners

* rotf fix

* remove identical shapelets at end and transform n_jobs

* rounding ST ig to prevent differences between OS

* rotf test fix

* st and stc docs

* hc1 and hc2 test fixes

* hc1 and hc2 examples fix

* higher max shapelets for tests

* more mv cases in tests

* hc test parameters

* hc test parameters 2

* hc test parameters 3

* hc test parameters 4

* wrong probas

* hopefully done

* dumb linalg packages ruin my week by causing differences between os

* experiments update and a couple of getattr bugfixes

* contrib experiments update

* set classifier hc2

* base

* enforce univariate in base class

* remove unnecessary classifier tags

* predict and predict_proba

* tweaks to classifier base class

* formatting 1

* formatting 3

* formatting 4

* formatting 6?

* blank lines or no blank lines?

* remove unnecessary argument to get_tag

* negate tag correctly, remove unnecessary get_tag argument

* correct tag negation

* _predict _predict_proba

_predict made abstract,
_predict_proba given a default implementation,
n_classes_ added to base class

* formatting 1

* formatting 2

* renamed fitted_trafos to transformers

* deprecation warning category, bump version number

* linting

* linting

* linting

* linting

* added defensive assert

* added error message to defensive assertion

* added transform-input tag in docstring

* wrong tag used, now "univariate-only"

* corrected reference in ForecastingPipeline

* clarified mtype in docstring

* linting

* fixing transformer tests to correctly refer to create_test_instance

* Revert "fixing transformer tests to correctly refer to create_test_instance"

This reverts commit ad6877f.

* transformer extension template

* extension template docstrings

* HC comments and experiments fixes

* added link to transformer extension template in README

* remove deprecated versions and more feature based updates

* test config

* feature based test tolerance

* clearer mtype/scitype tracking logic

* minor update to extension template

* refactored input checks

* bug in check_is function fixed

* linting

* linting

* removing mtypes with unsupported checks

* fixed typo in signature method

* signature _fit should ignore y

* added missing docstring in _series_as_panel/_convert

* fixed docstring

* corrected docstring

* added valueerror message and capture for dim 1 np.ndarray in series/panel conversions

* scitypes condition

* linting

* changed variable names in fit

* bugfix in output conversion in transform

* added comments suggested by Lovkush

* remove new transform classifiers

* signature and mp tests

* signature example

* reintroduce new transformers/transform classifiers

* only mv summarycls is broken now

* replaced fit-copy with fit-reference

* Update extension_templates/transformer.py

Co-authored-by: Lovkush <lovkush@gmail.com>

* clarify special case

* clarified

* clarification on output type

* fixed test reference

* replaced transform(Z) reference by X reference in test_date

* moved comments before lines

* numpydoc compliance in extension template

* clarified imports

* added clarification on init

* est2 fixed

* removed fixed arg for transform testing

* refactor summarizer and base class

* linting

* wrong place

* output for transformer fix

* had wrong X_inner_mtype

* fixed test_raises_not_fitted_error test

* changed fit-in-transform behaviour to "fit must always be called even if empty"

* fresh prince

* fresh prince 2

* i have to make this commit to swap branches and fix a bug

* fix secondary error caused by changing fit-in-transform behaviour

* testing changes, still some train estimate stuff to do

* fresh prince test

* summary classifier fix and docs

* pre-pr bug fix

* catch22 replace nans

* catch22 single feature public

* catch22 nan replacement change

* tsfresh link

Co-authored-by: Franz Király <f.kiraly@ucl.ac.uk>
Co-authored-by: a-pasos-ruiz <56823538+a-pasos-ruiz@users.noreply.github.com>
Co-authored-by: Alejandro Pasos Ruiz (CMP - Postgraduate Researcher) <fbu19zru@UEA.AC.UK>
Co-authored-by: Tony Bagnall <ajb@uea.ac.uk>
Co-authored-by: Lovkush <lovkush@gmail.com>
6 people committed Dec 19, 2021
1 parent 5c608f0 commit 8fee732
Showing 18 changed files with 2,336 additions and 956 deletions.
8 changes: 8 additions & 0 deletions sktime/classification/feature_based/__init__.py
@@ -3,15 +3,23 @@
__all__ = [
    "Catch22Classifier",
    "MatrixProfileClassifier",
    "RandomIntervalClassifier",
    "SignatureClassifier",
    "SummaryClassifier",
    "TSFreshClassifier",
    "FreshPRINCE",
]

from sktime.classification.feature_based._catch22_classifier import Catch22Classifier
from sktime.classification.feature_based._fresh_prince import FreshPRINCE
from sktime.classification.feature_based._matrix_profile_classifier import (
    MatrixProfileClassifier,
)
from sktime.classification.feature_based._random_interval_classifier import (
    RandomIntervalClassifier,
)
from sktime.classification.feature_based._signature_classifier import (
    SignatureClassifier,
)
from sktime.classification.feature_based._summary_classifier import SummaryClassifier
from sktime.classification.feature_based._tsfresh_classifier import TSFreshClassifier
21 changes: 11 additions & 10 deletions sktime/classification/feature_based/_catch22_classifier.py
@@ -26,6 +26,8 @@ class Catch22Classifier(BaseClassifier):
    outlier_norm : bool, default=False
        Normalise each series during the two outlier catch22 features, which can take a
        while to process for large values
    replace_nans : bool, default=True
        Replace NaN or inf values from the catch22 transform with 0.
    estimator : sklearn classifier, default=None
        An sklearn estimator to be built using the transformed data. Defaults to a
        Random Forest with 200 trees.
@@ -83,11 +85,13 @@ class Catch22Classifier(BaseClassifier):
    def __init__(
        self,
        outlier_norm=False,
        replace_nans=True,
        estimator=None,
        n_jobs=1,
        random_state=None,
    ):
        self.outlier_norm = outlier_norm
        self.replace_nans = replace_nans
        self.estimator = estimator

        self.n_jobs = n_jobs
@@ -118,7 +122,9 @@ def _fit(self, X, y):
        Changes state by creating a fitted model that updates attributes
        ending in "_" and sets is_fitted flag to True.
        """
        self._transformer = Catch22(outlier_norm=self.outlier_norm)
        self._transformer = Catch22(
            outlier_norm=self.outlier_norm, replace_nans=self.replace_nans
        )

        self._estimator = _clone_estimator(
            RandomForestClassifier(n_estimators=200)
@@ -132,7 +138,7 @@ def _fit(self, X, y):
            self._estimator.n_jobs = self._threads_to_use

        X_t = self._transformer.fit_transform(X, y)
        X_t = np.nan_to_num(X_t, False, 0, 0, 0)

        self._estimator.fit(X_t, y)

        return self
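
The hunks above move NaN handling out of the classifier and into the transformer through the new `replace_nans` parameter. The removed call `np.nan_to_num(X_t, False, 0, 0, 0)` passes `copy=False, nan=0, posinf=0, neginf=0`, so NaN and both infinities become 0. A minimal dependency-free sketch of that replacement is shown below; `replace_non_finite` is a hypothetical helper written for illustration and is not part of sktime or NumPy, and the feature values are made up.

```python
import math


def replace_non_finite(rows, fill=0.0):
    """Return a copy of a 2D feature table with NaN/+inf/-inf set to `fill`.

    Hypothetical helper: mirrors the effect of the removed nan_to_num call
    on a plain list-of-lists instead of a NumPy array.
    """
    return [[v if math.isfinite(v) else fill for v in row] for row in rows]


# Made-up transformed features containing one NaN and two infinities.
features = [
    [1.5, float("nan")],
    [float("inf"), -2.0],
    [float("-inf"), 0.25],
]
cleaned = replace_non_finite(features)
```

Doing the replacement once inside the transformer means `_fit`, `_predict`, and `_predict_proba` no longer each need their own clean-up step.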
@@ -150,9 +156,7 @@ def _predict(self, X):
        y : array-like, shape = [n_instances]
            Predicted class labels.
        """
        X_t = self._transformer.transform(X)
        X_t = np.nan_to_num(X_t, False, 0, 0, 0)
        return self._estimator.predict(X_t)
        return self._estimator.predict(self._transformer.transform(X))

    def _predict_proba(self, X):
        """Predict class probabilities for n instances in X.
@@ -167,15 +171,12 @@ def _predict_proba(self, X):
        y : array-like, shape = [n_instances, n_classes_]
            Predicted probabilities using the ordering in classes_.
        """
        X_t = self._transformer.transform(X)
        X_t = np.nan_to_num(X_t, False, 0, 0, 0)

        m = getattr(self._estimator, "predict_proba", None)
        if callable(m):
            return self._estimator.predict_proba(X_t)
            return self._estimator.predict_proba(self._transformer.transform(X))
        else:
            dists = np.zeros((X.shape[0], self.n_classes_))
            preds = self._estimator.predict(X_t)
            preds = self._estimator.predict(self._transformer.transform(X))
            for i in range(0, X.shape[0]):
                dists[i, self._class_dictionary[preds[i]]] = 1
            return dists
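
The `else` branch above covers estimators without a `predict_proba` method: each hard prediction is turned into a one-hot row, with the column chosen through `_class_dictionary`. A rough standalone sketch of that fallback follows, using plain lists rather than NumPy; `one_hot_probas` and the example labels are hypothetical and only illustrate the idea.

```python
def one_hot_probas(preds, class_dictionary):
    """Build one-hot 'probabilities' from hard label predictions.

    `class_dictionary` maps each class label to its column index, playing
    the role of `_class_dictionary` in the diff (names here are illustrative).
    """
    n_classes = len(class_dictionary)
    dists = [[0.0] * n_classes for _ in preds]
    for i, label in enumerate(preds):
        # Put all probability mass on the predicted class.
        dists[i][class_dictionary[label]] = 1.0
    return dists


# Two classes, three predictions.
probas = one_hot_probas(["b", "a", "b"], {"a": 0, "b": 1})
```

Every row sums to 1, so downstream code that expects a probability matrix keeps working, although the values carry no confidence information.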
206 changes: 206 additions & 0 deletions sktime/classification/feature_based/_fresh_prince.py
@@ -0,0 +1,206 @@
# -*- coding: utf-8 -*-
"""FreshPRINCE Classifier.

Pipeline classifier using the full set of TSFresh features and a RotationForest
classifier.
"""

__author__ = ["MatthewMiddlehurst"]
__all__ = ["FreshPRINCE"]


from sktime.classification.base import BaseClassifier
from sktime.contrib.vector_classifiers._rotation_forest import RotationForest
from sktime.transformations.panel.tsfresh import TSFreshFeatureExtractor
from sktime.utils.validation.panel import check_X_y


class FreshPRINCE(BaseClassifier):
    """Fresh Pipeline with RotatIoN forest Classifier.

    This classifier simply transforms the input data using the TSFresh [1]_
    transformer with comprehensive features and builds a RotationForest estimator using
    the transformed data.

    Parameters
    ----------
    default_fc_parameters : str, default="comprehensive"
        Set of TSFresh features to be extracted, options are "minimal", "efficient" or
        "comprehensive".
    n_estimators : int, default=200
        Number of estimators for the RotationForest ensemble.
    verbose : int, default=0
        Level of output printed to the console (for information only)
    n_jobs : int, default=1
        The number of jobs to run in parallel for both `fit` and `predict`.
        ``-1`` means using all processors.
    chunksize : int or None, default=None
        Number of series processed in each parallel TSFresh job, should be optimised
        for efficient parallelisation.
    random_state : int or None, default=None
        Seed for random, integer.

    Attributes
    ----------
    n_classes_ : int
        Number of classes. Extracted from the data.
    classes_ : ndarray of shape (n_classes_)
        Holds the label for each class.

    See Also
    --------
    TSFreshFeatureExtractor, TSFreshClassifier, RotationForest

    References
    ----------
    .. [1] Christ, Maximilian, et al. "Time series feature extraction on basis of
        scalable hypothesis tests (tsfresh–a python package)." Neurocomputing 307
        (2018): 72-77.
        https://www.sciencedirect.com/science/article/pii/S0925231218304843

    Examples
    --------
    >>> from sktime.classification.feature_based import FreshPRINCE
    >>> from sktime.contrib.vector_classifiers._rotation_forest import RotationForest
    >>> from sktime.datasets import load_unit_test
    >>> X_train, y_train = load_unit_test(split="train", return_X_y=True)
    >>> X_test, y_test = load_unit_test(split="test", return_X_y=True)
    >>> clf = FreshPRINCE(
    ...     default_fc_parameters="minimal",
    ...     n_estimators=10,
    ... )
    >>> clf.fit(X_train, y_train)
    FreshPRINCE(...)
    >>> y_pred = clf.predict(X_test)
    """

    _tags = {
        "capability:multivariate": True,
        "capability:multithreading": True,
        "capability:train_estimate": True,
    }

    def __init__(
        self,
        default_fc_parameters="comprehensive",
        n_estimators=200,
        save_transformed_data=False,
        verbose=0,
        n_jobs=1,
        chunksize=None,
        random_state=None,
    ):
        self.default_fc_parameters = default_fc_parameters
        self.n_estimators = n_estimators

        self.save_transformed_data = save_transformed_data
        self.verbose = verbose
        self.n_jobs = n_jobs
        self.chunksize = chunksize
        self.random_state = random_state

        self.n_instances_ = 0
        self.n_dims_ = 0
        self.series_length_ = 0
        self.transformed_data_ = []

        self._rotf = None
        self._tsfresh = None

        super(FreshPRINCE, self).__init__()

    def _fit(self, X, y):
        """Fit a pipeline on cases (X,y), where y is the target variable.

        Parameters
        ----------
        X : 3D np.array of shape = [n_instances, n_dimensions, series_length]
            The training data.
        y : array-like, shape = [n_instances]
            The class labels.

        Returns
        -------
        self :
            Reference to self.

        Notes
        -----
        Changes state by creating a fitted model that updates attributes
        ending in "_" and sets is_fitted flag to True.
        """
        self.n_instances_, self.n_dims_, self.series_length_ = X.shape

        self._rotf = RotationForest(
            n_estimators=self.n_estimators,
            save_transformed_data=self.save_transformed_data,
            n_jobs=self._threads_to_use,
            random_state=self.random_state,
        )
        self._tsfresh = TSFreshFeatureExtractor(
            default_fc_parameters=self.default_fc_parameters,
            n_jobs=self._threads_to_use,
            chunksize=self.chunksize,
            show_warnings=self.verbose > 1,
            disable_progressbar=self.verbose < 1,
        )

        X_t = self._tsfresh.fit_transform(X, y)
        self._rotf.fit(X_t, y)

        if self.save_transformed_data:
            self.transformed_data_ = X_t

        return self

    def _predict(self, X):
        """Predict class values of n instances in X.

        Parameters
        ----------
        X : 3D np.array of shape = [n_instances, n_dimensions, series_length]
            The data to make predictions for.

        Returns
        -------
        y : array-like, shape = [n_instances]
            Predicted class labels.
        """
        return self._rotf.predict(self._tsfresh.transform(X))

    def _predict_proba(self, X):
        """Predict class probabilities for n instances in X.

        Parameters
        ----------
        X : 3D np.array of shape = [n_instances, n_dimensions, series_length]
            The data to predict probabilities for.

        Returns
        -------
        y : array-like, shape = [n_instances, n_classes_]
            Predicted probabilities using the ordering in classes_.
        """
        return self._rotf.predict_proba(self._tsfresh.transform(X))

    def _get_train_probs(self, X, y):
        self.check_is_fitted()
        X, y = check_X_y(X, y, coerce_to_numpy=True)

        n_instances, n_dims, series_length = X.shape

        if (
            n_instances != self.n_instances_
            or n_dims != self.n_dims_
            or series_length != self.series_length_
        ):
            raise ValueError(
                "n_instances, n_dims, series_length mismatch. X should be "
                "the same as the training data used in fit for generating train "
                "probabilities."
            )

        if not self.save_transformed_data:
            raise ValueError("Currently only works with saved transform data from fit.")

        return self._rotf._get_train_probs(self.transformed_data_, y)
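
`_get_train_probs` above only proceeds when the data passed in has the same `(n_instances, n_dims, series_length)` shape as the data seen in `fit`, and when the transformed training data was saved. The two guards can be sketched in isolation as below; `check_train_probs_input` is a hypothetical function written for this illustration and does not exist in sktime.

```python
def check_train_probs_input(fitted_shape, new_shape, save_transformed_data):
    """Raise ValueError unless train probabilities can be generated.

    Illustrative stand-in for the two checks in `_get_train_probs`:
    the shape must match the training data, and the transform from
    `fit` must have been saved.
    """
    if tuple(new_shape) != tuple(fitted_shape):
        raise ValueError(
            "n_instances, n_dims, series_length mismatch. X should be "
            "the same as the training data used in fit."
        )
    if not save_transformed_data:
        raise ValueError("Currently only works with saved transform data from fit.")


# Matching shape and saved transform: passes silently.
check_train_probs_input((20, 1, 24), (20, 1, 24), save_transformed_data=True)
```

Note that a shape comparison cannot confirm the values are identical to the training data; it only catches obvious mismatches, which is why the classifier relies on the saved transform rather than re-transforming `X`.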
