
Implement important series-to-series transformers #6

Closed
12 tasks
fkiraly opened this issue Jan 4, 2019 · 17 comments
Labels
good first issue Good for newcomers implementing algorithms Implementing algorithms, estimators, objects native to sktime implementing framework Implementing or improving framework for learning tasks, e.g., base class functionality

Comments

@fkiraly
Collaborator

fkiraly commented Jan 4, 2019

Some series-to-series transformers that would be useful.

unfitted, single-series, simple

  • binning/aggregation transformer

Behaviour:
returns the sequence of aggregator outputs (e.g., counts) over the bins. The index is the start time, end time, or ordinal position (from start) of each bin, depending on the index hyper-parameter

Hyper-parameters:
bin specs - start: time/index, end : time/index, numbins : integer
index - 'start', 'end', or 'bin'
aggregator - function to apply to values within bin, default = count

alternative to bin specs: index sequence
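A minimal pandas sketch of the described behaviour (function name and defaults are illustrative, not sktime API):

```python
import numpy as np
import pandas as pd

def bin_aggregate(s, start, end, numbins, aggregator="count", index="start"):
    """Aggregate a time-indexed series into equally sized bins (sketch).

    Returns one aggregated value per bin; the result index is the bin's
    start edge, end edge, or ordinal position, per the `index` argument.
    """
    edges = np.linspace(start, end, numbins + 1)
    # label each observation with the bin it falls into
    groups = pd.cut(s.index, bins=edges, labels=False, include_lowest=True)
    agg = s.groupby(groups).agg(aggregator)
    agg = agg.reindex(range(numbins))  # keep empty bins in the output
    if index == "start":
        agg.index = edges[:-1]
    elif index == "end":
        agg.index = edges[1:]
    # index == "bin": leave the 0..numbins-1 ordinal index
    return agg

s = pd.Series([1.0, 2.0, 3.0, 4.0], index=[0.0, 1.0, 2.0, 3.0])
bin_aggregate(s, start=0, end=4, numbins=2, aggregator="count", index="bin")
```

The "index sequence" alternative would replace the `start`/`end`/`numbins` triple with an explicit `edges` argument.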

  • truncation transformer

Behaviour:
cuts off any entry in the sequence with index outside [lower, upper]

Hyper-parameters:
lower, upper : time
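In pandas this is a one-line boolean mask; a sketch (names illustrative):

```python
import pandas as pd

def truncate(s, lower, upper):
    """Drop entries whose index falls outside [lower, upper] (sketch)."""
    return s[(s.index >= lower) & (s.index <= upper)]

s = pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4])
truncate(s, lower=2, upper=3)  # keeps the entries at index 2 and 3
```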

  • simple equal spacing transformer

Behaviour:
inter-/extrapolates the series to the nodes by the specified strategy, e.g., fill in nearest or next (be careful with boundaries)

Hyper-parameters:
node specs - start: time/index, end : time/index, numsteps : integer
index - 'start', 'end', or 'bin'
strategy - 'nearest', 'last' , 'next', 'pw_linear'

alternative to node specs: index sequence
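A sketch using `Series.reindex` fill methods for the neighbour strategies and `np.interp` for piecewise-linear (names illustrative; boundary nodes outside the observed range stay NaN for 'last'/'next'):

```python
import numpy as np
import pandas as pd

_METHODS = {"nearest": "nearest", "last": "ffill", "next": "bfill"}

def equal_spacing(s, start, end, numsteps, strategy="nearest"):
    """Re-sample a series onto equally spaced nodes (sketch)."""
    nodes = np.linspace(start, end, numsteps)
    if strategy == "pw_linear":
        # piecewise-linear inter-/extrapolation from the observed points
        return pd.Series(np.interp(nodes, s.index, s.values), index=nodes)
    # fill each node from a neighbouring observation
    return s.reindex(nodes, method=_METHODS[strategy])

s = pd.Series([0.0, 10.0], index=[0.0, 2.0])
equal_spacing(s, start=0.0, end=2.0, numsteps=3, strategy="pw_linear")
```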

  • re-indexing transformer

Behaviour:
changes the index according to the strategy indicated in the strategy parameter:
integer = replace with an ascending count
field = take from a data frame column

Hyper-parameters:
strategy - 'integer', 'field'
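Both strategies map directly onto pandas operations; a sketch (function and column names illustrative):

```python
import pandas as pd

def reindex_transform(df, strategy="integer", field=None, value_col="value"):
    """Replace a series' index (sketch).

    strategy='integer' replaces it with an ascending 0..n-1 count;
    strategy='field' takes the new index from the DataFrame column
    named by `field`, returning the `value_col` column indexed by it.
    """
    if strategy == "integer":
        return df[value_col].reset_index(drop=True)
    if strategy == "field":
        return df.set_index(field)[value_col]
    raise ValueError(strategy)

df = pd.DataFrame({"t": [3, 5, 9], "value": [1.0, 2.0, 3.0]}, index=[10, 20, 30])
reindex_transform(df, strategy="field", field="t")  # now indexed by column 't'
```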

  • index extractor transformer

Behaviour:
creates a series from the index of the series

  • NA remover transformer

Behaviour:
removes sequence elements that are numpy.nan
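Both of these are essentially one-liners in pandas; for illustration:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0], index=[10, 20, 30])

# index extractor: build a new series from the index of the series
index_series = pd.Series(s.index.values)

# NA remover: drop elements that are NaN
cleaned = s.dropna()
```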

  • padding transformer

Behaviour:
pads a sequence/series with value at start or end until it has the desired length

Hyper-parameters:
where - 'start', 'end'
what - value
length - integer
optional: index treatment
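A sketch with the simplest possible index treatment, i.e., the padded result gets a fresh 0..n-1 integer index (names and defaults illustrative):

```python
import pandas as pd

def pad(s, length, what=0.0, where="end"):
    """Pad a series with a constant value until it reaches `length` (sketch)."""
    n_pad = max(length - len(s), 0)
    padding = [what] * n_pad
    values = padding + list(s) if where == "start" else list(s) + padding
    return pd.Series(values, index=range(len(values)))

pad(pd.Series([1.0, 2.0]), length=4, what=0.0, where="start")
```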

  • NA imputer

Behaviour:
Fills in NA values by the specified strategy

Hyper-parameters:
strategy - 'nearest', 'last', 'next', 'pw_linear'
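The strategies map onto pandas fill/interpolation methods; a sketch (the 'nearest' branch delegates to scipy via pandas):

```python
import numpy as np
import pandas as pd

def impute(s, strategy="last"):
    """Fill NA values by the given strategy (sketch).

    'last' forward-fills, 'next' back-fills, 'nearest' and
    'pw_linear' interpolate from the (numeric) index.
    """
    if strategy == "last":
        return s.ffill()
    if strategy == "next":
        return s.bfill()
    if strategy == "nearest":
        return s.interpolate(method="nearest")  # needs scipy installed
    if strategy == "pw_linear":
        return s.interpolate(method="linear")
    raise ValueError(strategy)

s = pd.Series([1.0, np.nan, 3.0])
impute(s, strategy="pw_linear")  # middle value filled as 2.0
```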

unfitted, single-series, reduction

  • interpolation transformer

Behaviour:
uses a scikit-learn regressor or classifier to interpolate to the specified index set.
Fits series values against series index, and uses the regressor/classifier to predict value from index

Hyper-parameters:
index set
estimator - sklearn regressor
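A sketch of the reduction: fit series values against the series index, then predict at the new index points (names illustrative; the supervised NA imputer below would do the same, fitting only on the non-NA entries):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def regressor_interpolate(index, values, new_index, estimator=None):
    """Interpolate a series to `new_index` with a scikit-learn regressor (sketch)."""
    if estimator is None:
        # 1-NN regression reproduces 'nearest' interpolation
        estimator = KNeighborsRegressor(n_neighbors=1)
    # index is the feature, series values are the target
    X = np.asarray(index, dtype=float).reshape(-1, 1)
    estimator.fit(X, np.asarray(values, dtype=float))
    return estimator.predict(np.asarray(new_index, dtype=float).reshape(-1, 1))

regressor_interpolate([0, 1, 2], [0.0, 10.0, 20.0], [0.4, 1.6])
```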

  • Supervised NA imputer

Behaviour:
Fills in NA values by the specified strategy by using a scikit-learn regressor or classifier. Fits non-NA series values against series index, and uses the regressor/classifier to predict value from index

Hyper-parameters:
strategy - 'nearest', 'last' , 'next', 'pw_linear'

  • advanced: exogenous or multi-column versions

unfitted, multiple-series

Note: the below are "unfitted" since they run on the entire series

  • index homogenization transformer
    Behaviour:
    Looks up the indices for all the series and introduces them for all the series. Fills in values at new nodes by the specified strategy.

Hyper-parameters:
strategy - 'NA', 'nearest', 'last', 'next'
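A sketch over a list of pandas Series, using the union of all indices (names illustrative):

```python
import pandas as pd

def homogenize(series_list, strategy="NA"):
    """Give all series the union of all indices (sketch).

    Values at new nodes are filled per `strategy`: 'NA' leaves NaN,
    'last' forward-fills, 'next' back-fills, 'nearest' uses the
    closest observed node.
    """
    union = series_list[0].index
    for s in series_list[1:]:
        union = union.union(s.index)  # Index.union returns a sorted index
    method = {"NA": None, "last": "ffill", "next": "bfill", "nearest": "nearest"}[strategy]
    return [s.reindex(union, method=method) for s in series_list]

a = pd.Series([1.0, 2.0], index=[0, 2])
b = pd.Series([5.0, 6.0], index=[1, 2])
homogenize([a, b], strategy="last")
```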

design questions

  • would it make sense to create an "interpolator" class which in predict takes a series and an index sequence and returns the values?
  • does it make sense to expose dedicated index parameter interfaces in some of the above?
@fkiraly fkiraly added implementing framework Implementing or improving framework for learning tasks, e.g., base class functionality must - high priority labels Jan 4, 2019
@fkiraly
Collaborator Author

fkiraly commented Jan 4, 2019

Assigned Ahmed, but only for the task to post the link mentioned in call on Jan 4. Not for implementation.

@fkiraly fkiraly added the implementing algorithms Implementing algorithms, estimators, objects native to sktime label Jan 4, 2019
@fkiraly fkiraly added this to ToDo in Use case 1: TSC/TSR Jan 4, 2019
@fkiraly
Collaborator Author

fkiraly commented Jan 7, 2019

Copying Ahmed's email:

Dear all,

With regard to the whiteboard discussion, you may be able to reuse the series-to-series SmoothingSplineTransformer from pysf. There is an (abstract) class hierarchy involved: it implements an AbstractTransformer for reusability.

Kind regards,
Ahmed

@TonyBagnall
Contributor

Desired transforms for TSC
TB: desired transformers for classifiers:
A) FFT/DFT
B) Autocorrelation function (ACF)
C) Partial autocorrelation function (autoregressive coefficients) (PACF)
D) Principal Component Analysis (PCA)
E) Auto Regressive Moving Average (ARMA)
F) Power Spectrum
G) Cepstrum
H) Cosine
I) Piecewise Aggregate Approximation (PAA)
J) Symbolic Aggregate Approximation (SAX)
K) Symbolic Fourier Approximation (SFA)
L) Bag of Patterns (with SAX or SFA)
M) Shapelet
may be more

@fkiraly
Collaborator Author

fkiraly commented Jan 29, 2019

Oh dear, there's a lot of transformers!

Can we perhaps pull some of these together and/or sub-select?

Similarly, surely a lot of these should be available and readily interfaceable, e.g., in tsfresh?

@fkiraly
Collaborator Author

fkiraly commented Jan 29, 2019

I see that some of these are "single-row", while others are "multi-row". I feel there should be a distinction here, like in pysf?

@TonyBagnall
Contributor

Well, it's a wish list, and many are very easy to implement or can use existing code, although it's important to understand how the imported code works as it can affect the classifier (e.g. padding, normalisation etc.). All apart from PCA are series-to-series transforms (although SFA works with a data set transform called Multiple Coefficient Binning, which we can add). Single-series transforms should fit on sets of series or single series; data set transforms should not! Happy to make a distinction, perhaps by different directories (I won't say packages, as that means something else over here), but I would like each in its own file if your design will permit. I don't like the scikit style of chucking a whole bunch of stuff in one file myself. But then I wouldn't, I write Java, so will defer to Python style if preferred.

@TonyBagnall
Contributor

So A to F should be available in any sensible stats package. G maybe, but it's strangely often not. H is trivial. I, J, K we should do from scratch as they are fundamental to many other operations, but we have stable (?) Java implementations. L is probably a heavyweight implementation. M is a project on its own, and I have someone lined up to do it.

@fkiraly
Collaborator Author

fkiraly commented Feb 5, 2019

Also check pysf and pyts for interfaceability, or the viability of using their low-level interfaces.

@fkiraly
Collaborator Author

fkiraly commented Feb 12, 2019

Q: can we re-use FunctionTransformer from sklearn?

@mloning mloning moved this from ToDo to In progress in Use case 1: TSC/TSR Feb 12, 2019
@mloning
Contributor

mloning commented Feb 13, 2019

Available transforms in Python:

Transform | Package/Function
A) FFT/DFT https://docs.scipy.org/doc/numpy-1.15.0/reference/routines.fft.html
B) Autocorrelation function (ACF) https://www.statsmodels.org/0.9.0/generated/statsmodels.tsa.stattools.acf.html
C) Partial autocorrelation function (autoregressive coefficients) (PACF) https://www.statsmodels.org/0.9.0/generated/statsmodels.tsa.stattools.pacf.html
D) PCA https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
E) Auto Regressive Moving Average (ARMA) https://www.statsmodels.org/0.9.0/generated/statsmodels.tsa.arima_model.ARMA.html
F) Power Spectrum https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.signal.periodogram.html
G) Cepstrum https://github.com/python-acoustics/python-acoustics/blob/master/acoustics/cepstrum.py
H) Cosine https://docs.scipy.org/doc/numpy/reference/generated/numpy.cos.html
I) Piecewise Aggregate Approximation (PAA) https://tslearn.readthedocs.io/en/latest/gen_modules/piecewise/tslearn.piecewise.PiecewiseAggregateApproximation.html,
J) Symbolic Aggregate Approximation (SAX) https://pyts.readthedocs.io/en/latest/quantization.html#pyts.quantization.SAX
K) Symbolic Fourier Approximation (SFA) https://pyts.readthedocs.io/en/latest/quantization.html#pyts.quantization.SAF
L) BOSS/Bag of Patterns (with SAX or SFA) https://pyts.readthedocs.io/en/latest/transformation.html#pyts.transformation.BOSS (with SFA)
M) Shapelet https://tslearn.readthedocs.io/en/latest/gen_modules/tslearn.shapelets.html

@fkiraly
Collaborator Author

fkiraly commented Feb 27, 2019

All except (D) and (M) are series-to-series.
Consider building a multiplexer/selector class for multiple interfacing?

@mloning mloning added the good first issue Good for newcomers label Jun 14, 2019
@jesellier
Contributor

Hello all. I just committed a prototype design in DEV for the transformers.

In a nutshell:

  • There is a BASE abstract class that stores the lambda functions in a static member dict and contains a single 'transform' method
  • The different SUBS then just need to override a method that returns a dict of parameters to be passed into the lambdas
  • I tried to push as much functionality as possible into the BASE

Happy to receive feedback/comments!
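For readers of the thread, a toy illustration of that design (class, method, and function names here are illustrative, not the actual committed code):

```python
import numpy as np

class BaseTransformer:
    """Sketch of the described design: the base class holds the transform
    functions in a static dict and implements `transform`; subclasses
    only supply the parameters for the lookup."""

    # static member dict of transform lambdas, keyed by name
    _funcs = {
        "square": lambda x, **kw: np.square(x),
        "power": lambda x, exponent=1, **kw: np.power(x, exponent),
    }

    def transform(self, x):
        params = self._get_params()
        func = self._funcs[params.pop("name")]
        return func(np.asarray(x), **params)

    def _get_params(self):
        raise NotImplementedError  # overridden by each SUB

class CubeTransformer(BaseTransformer):
    def _get_params(self):
        # parameters passed into the lambda looked up in the BASE
        return {"name": "power", "exponent": 3}

CubeTransformer().transform([1, 2, 3])  # cubes each element
```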

@jesellier
Contributor

done A, B, C, F and H

@TonyBagnall
Contributor

so, revised list of incomplete transforms:

  1. PCA
  2. ARMA
  3. Cepstrum
  4. Cosine

all of which are off-the-shelf transforms, so should be easy. Shapelets still need some work, so they go back on the list:
  5. Shapelets: revise and resubmit
and let's add:
  6. Truncator: truncate series to the shortest length
  7. Padder: zero-pad to the longest length
(I started these but got side-tracked.)

@mloning
Contributor

mloning commented Jul 25, 2019

Could we allow the Truncator to truncate the series to a given length as well? (via some kwargs maybe?) That may become useful in forecasting for efficiency reasons (actually used by some methods in M4).

@fkiraly
Collaborator Author

fkiraly commented Jun 19, 2020

Updating first post for the 2020 dev sprint.

This was referenced Jun 22, 2020
@mloning
Contributor

mloning commented Nov 13, 2020

Close this in favour of #483

@mloning mloning closed this as completed Nov 13, 2020
Use case 1: TSC/TSR automation moved this from In progress to Done Nov 13, 2020