Sktime: Python implementation of HIVE-COTE #842

Closed
iuimaki opened this issue Apr 26, 2021 · 12 comments

@iuimaki

iuimaki commented Apr 26, 2021

Is your feature request related to a problem? Please describe.
I'm new to sktime and time series classification.
I have installed the sktime toolkit and am trying to apply HIVE-COTE 1.0 to my dataset. The code is shown below; x_train, y_train, x_test, and y_test are my prepared datasets. I call HIVE-COTE directly through the HIVECOTEV1 class, but I am not sure whether this is the correct way to reproduce the HIVE-COTE algorithm. Does the HIVECOTEV1 class include the ensembling part (like TimeSeriesForestClassifier)?
[Image: Outlook-1gckqx1b (screenshot of the code)]

Another question: my time series dataset is a 3D array in the format (number of samples, time steps, number of features). When I fed this dataset into TimeSeriesForestClassifier, I got the error shown below. After processing the dataset with ColumnConcatenator(), it runs smoothly. Does that mean only 2D arrays can be fed into this algorithm, and into HIVE-COTE as well?
Shapes of x_train, x_test, y_train, and y_test: (28, 1918, 62) (16, 1918, 62) (28,) (16,)
Error message:
Traceback (most recent call last):
  File "h:/scipt/prediction/stacking for classification/HIVECOTE.py", line 73, in <module>
    clf.fit(x_train, y_train)
  File "C:\Anaconda3\envs\lib\site-packages\sktime\series_as_features\base\estimators\interval_based_tsf.py", line 86, in fit
    coerce_to_numpy=True,
  File "C:\envs\ARTC\lib\site-packages\sktime\utils\validation\panel.py", line 187, in check_X_y
    coerce_to_pandas=coerce_to_pandas,
  File "C:\Anaconda3\envs\lib\site-packages\sktime\utils\validation\panel.py", line 87, in check_X
    f"X must be univariate with X.shape[1] == 1, but found: "
ValueError: X must be univariate with X.shape[1] == 1, but found: X.shape[1] == 1918.
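
The error message suggests that the validator treats axis 1 as the dimension (feature) axis, so an array laid out as (samples, time steps, features) is read as having 1918 dimensions. A minimal NumPy sketch of reordering the axes, assuming the expected 3D layout is (n_instances, n_dimensions, series_length); the random toy array only stands in for the real data, and the final reshape is a rough analogue of joining the dimensions into one long series, not a reproduction of ColumnConcatenator itself:

```python
import numpy as np

# Toy stand-in with the layout described in the post:
# (n_instances, n_timepoints, n_features) = (28, 1918, 62)
x_train = np.random.default_rng(0).normal(size=(28, 1918, 62))

# Move the feature axis to position 1:
# (n_instances, n_dimensions, series_length)
x_train_3d = np.transpose(x_train, (0, 2, 1))
print(x_train_3d.shape)  # (28, 62, 1918)

# For a univariate-only classifier, join the 62 dimensions end to end
# into a single long series per instance (assumed workaround).
n_instances, n_dims, series_length = x_train_3d.shape
x_train_uni = x_train_3d.reshape(n_instances, 1, n_dims * series_length)
print(x_train_uni.shape)  # (28, 1, 118916)
```

Whether concatenating dimensions is appropriate depends on the data; it discards the alignment between features at each time step.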

Describe the solution you'd like
It would be much appreciated if you could provide a demo of multivariate time series classification with HIVE-COTE in Python, on GitHub or in the sktime documentation. I think that would be very helpful for applying this state-of-the-art algorithm in industry.

@Remmert-A

Remmert-A commented Apr 29, 2021

+1
An introduction to the usage and preprocessing requirements of the HIVECOTE module would be greatly appreciated by me as well.

@iuimaki
Author

iuimaki commented Apr 30, 2021

Another question: how does the class_weight parameter in sktime work?
I want to use TimeSeriesForestClassifier for binary classification on an imbalanced dataset. The classes are labelled 0 (negative) and 1 (positive), and the observed data has a ratio of about 9:1, with the majority of samples having a negative outcome.
The sktime documentation shows a way to set a class_weight parameter, similar to scikit-learn. But it fails when I run: clf = TimeSeriesForestClassifier(n_estimators=100, class_weight={0: 1, 1: w}). The error is: TypeError: __init__() got an unexpected keyword argument 'class_weight'.
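
When a classifier's constructor rejects class_weight, one generic workaround (not something proposed by the maintainers in this thread) is to rebalance the training set before fitting. The sketch below uses only NumPy; `oversample_minority` is a hypothetical helper written for illustration, and the toy arrays merely mimic the 9:1 ratio described above:

```python
import numpy as np


def oversample_minority(X, y, seed=0):
    """Hypothetical helper: resample each class, with replacement,
    up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    selected = []
    for cls, count in zip(classes, counts):
        cls_idx = np.flatnonzero(y == cls)
        # Draw the missing number of samples from this class.
        extra = rng.choice(cls_idx, size=target - count, replace=True)
        selected.append(np.concatenate([cls_idx, extra]))
    idx = np.concatenate(selected)
    rng.shuffle(idx)
    return X[idx], y[idx]


# Toy data with roughly the 9:1 imbalance from the question.
X = np.random.default_rng(1).normal(size=(100, 1, 50))
y = np.array([0] * 90 + [1] * 10)

X_bal, y_bal = oversample_minority(X, y)
print(np.bincount(y_bal))  # [90 90]
```

Resampling should be applied to the training split only, so that duplicated minority samples do not leak into the test set.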

@TonyBagnall
Contributor

hi, we are working on hive-cote for python, just tidying up and testing some features for the java version first. Once term has finished @MatthewMiddlehurst and I can give it our full attention.

@TonyBagnall
Contributor

The python version has all sorts of pythonesque issues (memory intensive, slow, etc.) that require significant engineering, and we are not really python programmers, so it is painful. In the short term, if anyone wants to run HIVE-COTE v2, just email me at ajb@uea.ac.uk. We can help you get it running in java (very easy: we can give you the jar file and you can run it on the command line or with a script), or we can run it ourselves and send you the results files.

@iuimaki
Author

iuimaki commented Apr 30, 2021

> hi, we are working on hive-cote for python, just tidying up and testing some features for the java version first. Once term has finished @MatthewMiddlehurst and I can give it our full attention.

Heya! Thank you for the feedback :)

@paulttt

paulttt commented May 20, 2021

Hi guys!
FYI, I was also trying to run HIVE-COTE v1 today and ran into the following error. Any help is appreciated, whether this is a known bug or I am doing something wrong. My data has shape X_train.shape = (N_samples, 1, time_bins) and y_train.shape = (N_samples,), with the labels in y_train taking the values 0 and 1.

RuntimeError                              Traceback (most recent call last)
<ipython-input-18-815c2880dcd0> in <module>
----> 1 hc.fit(X_train, y_train)

~/anaconda3/envs/py38/lib/python3.7/site-packages/sktime/classification/hybrid/_hivecote_v1.py in fit(self, X, y)
    101             time_contract_in_mins=60,
    102         )
--> 103         self.stc.fit(X, y)
    104         train_preds = cross_val_predict(
    105             ShapeletTransformClassifier(

~/anaconda3/envs/py38/lib/python3.7/site-packages/sktime/classification/shapelet_based/_stc.py in fit(self, X, y)
    119         self.classes_ = class_distribution(np.asarray(y).reshape(-1, 1))[0][0]
    120 
--> 121         self.classifier_.fit(X, y)
    122 
    123         self._is_fitted = True

~/anaconda3/envs/py38/lib/python3.7/site-packages/sklearn/pipeline.py in fit(self, X, y, **fit_params)
    328         """
    329         fit_params_steps = self._check_fit_params(**fit_params)
--> 330         Xt = self._fit(X, y, **fit_params_steps)
    331         with _print_elapsed_time('Pipeline',
    332                                  self._log_message(len(self.steps) - 1)):

~/anaconda3/envs/py38/lib/python3.7/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params_steps)
    294                 message_clsname='Pipeline',
    295                 message=self._log_message(step_idx),
--> 296                 **fit_params_steps[name])
    297             # Replace the transformer of the step with the fitted
    298             # transformer. This is necessary when loading the transformer

~/anaconda3/envs/py38/lib/python3.7/site-packages/joblib/memory.py in __call__(self, *args, **kwargs)
    350 
    351     def __call__(self, *args, **kwargs):
--> 352         return self.func(*args, **kwargs)
    353 
    354     def call_and_shelve(self, *args, **kwargs):

~/anaconda3/envs/py38/lib/python3.7/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
    738     with _print_elapsed_time(message_clsname, message):
    739         if hasattr(transformer, 'fit_transform'):
--> 740             res = transformer.fit_transform(X, y, **fit_params)
    741         else:
    742             res = transformer.fit(X, y, **fit_params).transform(X)

~/anaconda3/envs/py38/lib/python3.7/site-packages/sktime/transformations/base.py in fit_transform(self, Z, X)
     89         else:
     90             # Fit method of arity 2 (supervised transformation)
---> 91             return self.fit(Z, X).transform(Z)
     92 
     93     # def inverse_transform(self, Z, X=None):

~/anaconda3/envs/py38/lib/python3.7/site-packages/sktime/transformations/panel/shapelets.py in transform(self, X, y)
    699         if len(self.shapelets) == 0:
    700             raise RuntimeError(
--> 701                 "No shapelets were extracted in fit that exceeded the "
    702                 "minimum information gain threshold. Please retry with other "
    703                 "data and/or parameter settings."

RuntimeError: No shapelets were extracted in fit that exceeded the minimum information gain threshold. Please retry with other data and/or parameter settings.
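
The error message refers to a minimum information gain threshold. As a rough illustration of the quantity involved (a generic sketch, not sktime's shapelet code; the function names `entropy` and `information_gain` are illustrative), a candidate shapelet yields one distance per series, a distance threshold splits the series into two groups, and information gain measures how much that split reduces class entropy:

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(distances, labels, threshold):
    """Entropy reduction from splitting the series at a distance threshold."""
    left = [lab for d, lab in zip(distances, labels) if d <= threshold]
    right = [lab for d, lab in zip(distances, labels) if d > threshold]
    n = len(labels)
    children = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - children


distances = [0.1, 0.2, 0.9, 1.0]

# Distances separate the two classes perfectly: maximal gain for a binary problem.
gain_separable = information_gain(distances, [0, 0, 1, 1], threshold=0.5)

# Distances carry no class information: zero gain.
gain_uninformative = information_gain(distances, [0, 1, 0, 1], threshold=0.5)

print(gain_separable, gain_uninformative)
```

If no candidate achieves a gain above the configured minimum, there is nothing for the transform to keep, which is consistent with the error shown above.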

@MatthewMiddlehurst
Contributor

MatthewMiddlehurst commented May 21, 2021

Hi @paulttt, it's hard to know exactly what's going on without knowing a bit about the data. STC is definitely one of the more under-developed classifiers. We are hoping to sort out all our sktime classifiers after this teaching term at UEA ends.

@paulttt

paulttt commented May 21, 2021

Hi @MatthewMiddlehurst,
Thanks for the feedback! I will try fitting the shapelet classifier alone on my data and see if I face similar problems.
I use continuous-time signals recorded from 5500 neurons. So far, I have had no problems with any other classification model. The exact shape is (5500, 1, 1750).

@MatthewMiddlehurst
Contributor

I'm a bit late to the earlier comments on this issue, but for preprocessing I would look at the data_loading and classification notebooks in the examples folder. HIVE-COTE uses the same data format as the other sktime classifiers. It currently accepts only univariate data, not datasets with multiple series per instance.
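
A small NumPy sketch of preparing univariate input, assuming the 3D layout (n_instances, 1, series_length) that matches the (N_samples, 1, time_bins) shape mentioned earlier in the thread. The toy arrays are illustrative only:

```python
import numpy as np

# Toy univariate panel stored as a 2D array: (n_instances, series_length)
X_2d = np.arange(12, dtype=float).reshape(3, 4)

# Add a singleton dimension axis: (n_instances, 1, series_length)
X_uni = X_2d[:, np.newaxis, :]
print(X_uni.shape)  # (3, 1, 4)

# Toy multivariate panel: (n_instances, n_dimensions, series_length)
X_multi = np.arange(24, dtype=float).reshape(3, 2, 4)

# Keep only dimension 0 while preserving the 3D layout,
# e.g. to fit a univariate-only classifier on one channel.
X_dim0 = X_multi[:, [0], :]
print(X_dim0.shape)  # (3, 1, 4)
```

Indexing with the list `[0]` rather than the integer `0` keeps the dimension axis, so the result stays three-dimensional.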

@TonyBagnall TonyBagnall added the module:classification classification module: time series classification label Jun 20, 2021
@TonyBagnall
Contributor

update on this, Matt now has equivalence with tsml on DrCIF, TDE and Arsenal. Just STC to sort out now, which is a summer objective for my group

@MatthewMiddlehurst
Contributor

Close to completion, see #1504

@MatthewMiddlehurst
Contributor

The sktime master branch has an implementation of HIVE-COTE 2.0 and an updated version of HIVE-COTE 1.0 after the merge of #1504. If any issues with these arise they can be a separate issue.
