
Implement important series-to-series transformers #6

Closed
12 tasks
fkiraly opened this issue Jan 4, 2019 · 17 comments
Labels
good first issue Good for newcomers implementing algorithms Implementing algorithms, estimators, objects native to sktime implementing framework Implementing or improving framework for learning tasks, e.g., base class functionality

Comments

@fkiraly
Collaborator

fkiraly commented Jan 4, 2019

Some series-to-series transformers that would be useful.

unfitted, single-series, simple

  • binning/aggregation transformer

Behaviour:
returns the sequence of aggregator outputs (e.g., counts) over the bins. The index is the start time, end time, or ordinal position (from start) of each bin, depending on the index hyper-parameter

Hyper-parameters:
bin specs - start: time/index, end : time/index, numbins : integer
index - 'start', 'end', or 'bin'
aggregator - function to apply to values within bin, default = count

alternative to bin specs: index sequence
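A minimal pandas sketch of the described behaviour (function name and defaults are illustrative, not sktime API):

```python
import numpy as np
import pandas as pd

def bin_aggregate(s, start, end, numbins, aggregator="count", index="start"):
    """Aggregate a time-indexed series into equally sized bins (sketch).

    Returns one aggregated value per bin; the result index is the bin's
    start edge, end edge, or ordinal position, per the `index` argument.
    """
    edges = np.linspace(start, end, numbins + 1)
    # label each observation with the bin it falls into
    groups = pd.cut(s.index, bins=edges, labels=False, include_lowest=True)
    agg = s.groupby(groups).agg(aggregator)
    agg = agg.reindex(range(numbins))  # keep empty bins in the output
    if index == "start":
        agg.index = edges[:-1]
    elif index == "end":
        agg.index = edges[1:]
    # index == "bin": leave the 0..numbins-1 ordinal index
    return agg

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=[0.0, 1.0, 2.0, 3.0])
bin_aggregate(s, start=0, end=4, numbins=2, aggregator="count", index="bin")
```

The "index sequence" alternative would replace the `start`/`end`/`numbins` triple with an explicit `edges` argument.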

  • truncation transformer

Behaviour:
cuts off any entry in the sequence with index outside [lower, upper]

Hyper-parameters:
lower, upper : time
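In pandas this is a one-line boolean mask; a sketch (names illustrative):

```python
import pandas as pd

def truncate(s, lower, upper):
    """Drop entries whose index falls outside [lower, upper] (sketch)."""
    return s[(s.index >= lower) & (s.index <= upper)]

s = pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4])
truncate(s, lower=2, upper=3)  # keeps the entries at index 2 and 3
```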

  • simple equal spacing transformer

Behaviour:
inter-/extrapolates the series to the nodes by the specified strategy, e.g., fill in nearest or next (be careful with boundaries)

Hyper-parameters:
node specs - start: time/index, end : time/index, numsteps : integer
index - 'start', 'end', or 'bin'
strategy - 'nearest', 'last' , 'next', 'pw_linear'

alternative to node specs: index sequence
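A sketch using `Series.reindex` fill methods for the neighbour strategies and `np.interp` for piecewise-linear (names illustrative; boundary nodes outside the observed range stay NaN for 'last'/'next'):

```python
import numpy as np
import pandas as pd

_METHODS = {"nearest": "nearest", "last": "ffill", "next": "bfill"}

def equal_spacing(s, start, end, numsteps, strategy="nearest"):
    """Re-sample a series onto equally spaced nodes (sketch)."""
    nodes = np.linspace(start, end, numsteps)
    if strategy == "pw_linear":
        # piecewise-linear inter-/extrapolation from the observed points
        return pd.Series(np.interp(nodes, s.index, s.values), index=nodes)
    # fill each node from a neighbouring observation
    return s.reindex(nodes, method=_METHODS[strategy])

s = pd.Series([0.0, 10.0], index=[0.0, 2.0])
equal_spacing(s, start=0.0, end=2.0, numsteps=3, strategy="pw_linear")
```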

  • re-indexing transformer

Behaviour:
changes the index according to the strategy indicated in the strategy parameter:
integer = replace with an ascending count
field = take from a data frame column

Hyper-parameters:
strategy - 'integer', 'field'
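Both strategies map directly onto pandas operations; a sketch (function and column names illustrative):

```python
import pandas as pd

def reindex_transform(df, strategy="integer", field=None, value_col="value"):
    """Replace a series' index (sketch).

    strategy='integer' replaces it with an ascending 0..n-1 count;
    strategy='field' takes the new index from the DataFrame column
    named by `field`, returning the `value_col` column indexed by it.
    """
    if strategy == "integer":
        return df[value_col].reset_index(drop=True)
    if strategy == "field":
        return df.set_index(field)[value_col]
    raise ValueError(strategy)

df = pd.DataFrame({"t": [3, 5, 9], "value": [1.0, 2.0, 3.0]}, index=[10, 20, 30])
reindex_transform(df, strategy="field", field="t")  # now indexed by column 't'
```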

  • index extractor transformer

Behaviour:
creates a series from the index of the series

  • NA remover transformer

Behaviour:
removes sequence elements that are numpy.nan
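Both of these are essentially one-liners in pandas; for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0], index=[10, 20, 30])

# index extractor: build a new series from the index of the series
index_series = pd.Series(s.index.values)

# NA remover: drop elements that are NaN
cleaned = s.dropna()
```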

  • padding transformer

Behaviour:
pads a sequence/series with value at start or end until it has the desired length

Hyper-parameters:
where - 'start', 'end'
what - value
length - integer
optional: index treatment
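A sketch with the simplest possible index treatment, i.e., the padded result gets a fresh 0..n-1 integer index (names and defaults illustrative):

```python
import pandas as pd

def pad(s, length, what=0.0, where="end"):
    """Pad a series with a constant value until it reaches `length` (sketch)."""
    n_pad = max(length - len(s), 0)
    padding = [what] * n_pad
    values = padding + list(s) if where == "start" else list(s) + padding
    return pd.Series(values, index=range(len(values)))

pad(pd.Series([1.0, 2.0]), length=4, what=0.0, where="start")
```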

  • NA imputer

Behaviour:
Fills in NA values by the specified strategy

Hyper-parameters:
strategy - 'nearest', 'last', 'next', 'pw_linear'
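The strategies map onto pandas fill/interpolation methods; a sketch (the 'nearest' branch delegates to scipy via pandas):

```python
import numpy as np
import pandas as pd

def impute(s, strategy="last"):
    """Fill NA values by the given strategy (sketch).

    'last' forward-fills, 'next' back-fills, 'nearest' and
    'pw_linear' interpolate from the (numeric) index.
    """
    if strategy == "last":
        return s.ffill()
    if strategy == "next":
        return s.bfill()
    if strategy == "nearest":
        return s.interpolate(method="nearest")  # needs scipy installed
    if strategy == "pw_linear":
        return s.interpolate(method="linear")
    raise ValueError(strategy)

s = pd.Series([1.0, np.nan, 3.0])
impute(s, strategy="pw_linear")  # middle value filled as 2.0
```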

unfitted, single-series, reduction

  • interpolation transformer

Behaviour:
uses a scikit-learn regressor or classifier to interpolate to the specified index set.
Fits series values against series index, and uses the regressor/classifier to predict value from index

Hyper-parameters:
index set
estimator - sklearn regressor
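A sketch of the reduction: fit series values against the series index, then predict at the new index points (names illustrative; the supervised NA imputer below would do the same, fitting only on the non-NA entries):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def regressor_interpolate(index, values, new_index, estimator=None):
    """Interpolate a series to `new_index` with a scikit-learn regressor (sketch)."""
    if estimator is None:
        # 1-NN regression reproduces 'nearest' interpolation
        estimator = KNeighborsRegressor(n_neighbors=1)
    # index is the feature, series values are the target
    X = np.asarray(index, dtype=float).reshape(-1, 1)
    estimator.fit(X, np.asarray(values, dtype=float))
    return estimator.predict(np.asarray(new_index, dtype=float).reshape(-1, 1))

regressor_interpolate([0, 1, 2], [0.0, 10.0, 20.0], [0.4, 1.6])
```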

  • Supervised NA imputer

Behaviour:
Fills in NA values by the specified strategy by using a scikit-learn regressor or classifier. Fits non-NA series values against series index, and uses the regressor/classifier to predict value from index

Hyper-parameters:
strategy - 'nearest', 'last' , 'next', 'pw_linear'

  • advanced: exogenous or multi-column versions

unfitted, multiple-series

Note: the below are "unfitted" since they run on the entire series

  • index homogenization transformer
    Behaviour:
    Looks up the indices for all the series and introduces them for all the series. Fills in values at new nodes by the specified strategy.

Hyper-parameters:
strategy - 'NA', 'nearest', 'last', 'next'
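A sketch over a list of pandas Series, using the union of all indices (names illustrative):

```python
import pandas as pd

def homogenize(series_list, strategy="NA"):
    """Give all series the union of all indices (sketch).

    Values at new nodes are filled per `strategy`: 'NA' leaves NaN,
    'last' forward-fills, 'next' back-fills, 'nearest' uses the
    closest observed node.
    """
    union = series_list[0].index
    for s in series_list[1:]:
        union = union.union(s.index)  # Index.union returns a sorted index
    method = {"NA": None, "last": "ffill", "next": "bfill", "nearest": "nearest"}[strategy]
    return [s.reindex(union, method=method) for s in series_list]

a = pd.Series([1.0, 2.0], index=[0, 2])
b = pd.Series([5.0, 6.0], index=[1, 2])
homogenize([a, b], strategy="last")
```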

design questions

  • would it make sense to create an "interpolator" class which in predict takes a series and an index sequence and returns the values?
  • does it make sense to expose dedicated index parameter interfaces in some of the above?
@fkiraly fkiraly added implementing framework Implementing or improving framework for learning tasks, e.g., base class functionality must - high priority labels Jan 4, 2019
@fkiraly
Collaborator Author

fkiraly commented Jan 4, 2019

Assigned Ahmed, but only for the task to post the link mentioned in call on Jan 4. Not for implementation.

@fkiraly fkiraly added the implementing algorithms Implementing algorithms, estimators, objects native to sktime label Jan 4, 2019
@fkiraly fkiraly added this to ToDo in Use case 1: TSC/TSR Jan 4, 2019
@fkiraly
Collaborator Author

fkiraly commented Jan 7, 2019

Copying Ahmed's email:

Dear all,

With regard to the whiteboard discussion, you may be able to reuse the series-to-series SmoothingSplineTransformer from pysf. There is an (abstract) class hierarchy involved: it implements an AbstractTransformer for reusability.

Kind regards,
Ahmed

@TonyBagnall
Contributor

Desired transforms for TSC
TB: desired transformers for classifiers:
A) FFT/DFT
B) Autocorrelation function (ACF)
C) Partial autocorrelation function (autoregressive coefficients) (PACF)
D) Principal Component Analysis (PCA)
E) Auto Regressive Moving Average (ARMA)
F) Power Spectrum
G) Cepstrum
H) Cosine
I) Piecewise Aggregate Approximation (PAA)
J) Symbolic Aggregate Approximation (SAX)
K) Symbolic Fourier Approximation (SFA)
L) Bag of Patterns (with SAX or SFA)
M) Shapelet
may be more

@fkiraly
Collaborator Author

fkiraly commented Jan 29, 2019

Oh dear, there's a lot of transformers!

Can we perhaps pull some of these together and/or sub-select?

Similarly, surely a lot of these should be available and readily interfaceable, e.g., in tsfresh?

@fkiraly
Collaborator Author

fkiraly commented Jan 29, 2019

I see that some of these are "single-row", while others are "multi-row". I feel there should be a distinction here, like in pysf?

@TonyBagnall
Contributor

Well, it's a wish list, and many are very easy to implement or can use existing code, although it's important to understand how the imported code works as it can affect the classifier (e.g. padding, normalisation etc.). All apart from PCA are series-to-series transforms (although SFA works with a data set transform called Multiple Coefficient Binning, which we can add). Single-series transforms should fit on sets of series or single series; data set transforms should not! Happy to make a distinction, perhaps by different directories (I won't say packages, as that means something else over here), but I would like each in its own file if your design will permit. I don't like the scikit style of chucking a whole bunch of stuff in one file myself. But then I wouldn't, I write Java, so will defer to Python style if preferred.

@TonyBagnall
Contributor

So A to F should be available in any sensible stats package. G maybe, but it's strangely often not. H is trivial. I, J, K we should do from scratch as they are fundamental to many other operations, but we have stable (?) Java implementations. L is probably a heavyweight implementation. M is a project on its own, and I have someone lined up to do it.

@fkiraly
Collaborator Author

fkiraly commented Feb 5, 2019

Also check pysf and pyts for interfaceability, or the viability of using their low-level interfaces.

@fkiraly
Collaborator Author

fkiraly commented Feb 12, 2019

Q: can we re-use FunctionTransformer from sklearn?

@mloning mloning moved this from ToDo to In progress in Use case 1: TSC/TSR Feb 12, 2019
@mloning
Contributor

mloning commented Feb 13, 2019

Available transforms in Python:

Transform | Package/Function
A) FFT/DFT https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.fft.html
B) Autocorrelation function (ACF) https://www.statsmodels.org/0.9.0/generated/statsmodels.tsa.stattools.acf.html
C) Partial autocorrelation function (autoregressive coefficients) (PACF) https://www.statsmodels.org/0.9.0/generated/statsmodels.tsa.stattools.pacf.html
D) PCA https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
E) Auto Regressive Moving Average (ARMA) https://www.statsmodels.org/0.9.0/generated/statsmodels.tsa.arima_model.ARMA.html
F) Power Spectrum https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.periodogram.html
G) Cepstrum https://github.com/python-acoustics/python-acoustics/blob/master/acoustics/cepstrum.py
H) Cosine https://docs.scipy.org/doc/numpy/reference/generated/numpy.cos.html
I) Piecewise Aggregate Approximation (PAA) https://tslearn.readthedocs.io/en/latest/gen_modules/piecewise/tslearn.piecewise.PiecewiseAggregateApproximation.html,
J) Symbolic Aggregate Approximation (SAX) https://pyts.readthedocs.io/en/latest/quantization.html#pyts.quantization.SAX
K) Symbolic Fourier Approximation (SFA) https://pyts.readthedocs.io/en/latest/quantization.html#pyts.quantization.SAF
L) BOSS/Bag of Patterns (with SAX or SFA) https://pyts.readthedocs.io/en/latest/transformation.html#pyts.transformation.BOSS (with SFA)
M) Shapelet https://tslearn.readthedocs.io/en/latest/gen_modules/tslearn.shapelets.html

@fkiraly
Collaborator Author

fkiraly commented Feb 27, 2019

All except (D) and (M) are series-to-series.
Consider building a multiplexer/selector class for multiple interfacing?

@mloning mloning added the good first issue Good for newcomers label Jun 14, 2019
@jesellier
Contributor

Hello all. I just committed a prototype design in DEV for the transformers.

In a nutshell:

  • There is a BASE abstract class that stores the lambda functions in a static member dict and contains a single 'transform' method
  • The different SUBS then just need to override a method that returns a dict of parameters to be passed into the lambdas
  • I tried to push as much functionality as possible into the BASE

Happy to receive feedback/comments!
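For readers of the thread, a toy illustration of that design (class, method, and function names here are illustrative, not the actual committed code):

```python
import numpy as np

class BaseTransformer:
    """Sketch of the described design: the base class holds the transform
    functions in a static dict and implements `transform`; subclasses
    only supply the parameters for the lookup."""

    # static member dict of transform lambdas, keyed by name
    _funcs = {
        "square": lambda x, **kw: np.square(x),
        "power": lambda x, exponent=1, **kw: np.power(x, exponent),
    }

    def transform(self, x):
        params = self._get_params()
        func = self._funcs[params.pop("name")]
        return func(np.asarray(x), **params)

    def _get_params(self):
        raise NotImplementedError  # overridden by each SUB

class CubeTransformer(BaseTransformer):
    def _get_params(self):
        # parameters passed into the lambda looked up in the BASE
        return {"name": "power", "exponent": 3}

CubeTransformer().transform([1, 2, 3])  # cubes each element
```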

@jesellier
Contributor

done A, B, C, F and H

@TonyBagnall
Contributor

so, revised list of incomplete transforms:

  1. PCA
  2. ARMA
  3. Cepstrum
  4. Cosine

all of which are off-the-shelf transforms, so should be easy. Shapelets still need some work, so they go back on the list:
  5. Shapelets: revise and resubmit
and let's add:
  6. Truncator: truncate series to the shortest length
  7. Padder: zero-pad to the longest length
(I started these but got side-tracked.)

@mloning
Contributor

mloning commented Jul 25, 2019

Could we allow the Truncator to truncate the series to a given length as well? (via some kwargs maybe?) That may become useful in forecasting for efficiency reasons (actually used by some methods in M4).

@fkiraly
Collaborator Author

fkiraly commented Jun 19, 2020

Updating first post for the 2020 dev sprint.

This was referenced Jun 22, 2020
@mloning
Contributor

mloning commented Nov 13, 2020

Close this in favour of #483

@mloning mloning closed this as completed Nov 13, 2020
Use case 1: TSC/TSR automation moved this from In progress to Done Nov 13, 2020