Implement important series-to-series transformers #6
Comments
Assigned Ahmed, but only for the task to post the link mentioned in call on Jan 4. Not for implementation. |
Copying Ahmed's email: Dear all, With regard to the whiteboard discussion, you may be able to reuse the series-to-series SmoothingSplineTransformer from pysf. There is an (abstract) class hierarchy involved: it implements an AbstractTransformer for reusability. Kind regards, |
Desired transforms for TSC |
Oh dear, there's a lot of transformers! Can we perhaps pull some of these together and/or sub-select? Similarly, surely a lot of these should be available and readily interfaceable, e.g., in tsfresh? |
I see that some of these are "single-row", while others are "multi-row". I feel there should be a distinction here, like in pysf? |
Well, it's a wish list, and many are very easy to implement or can use existing code, although it's important to understand how the imported code works, as it can affect the classifier (e.g. padding, normalisation etc.). All apart from PCA are series-to-series transforms (although SFA works with a data set transform called Multiple Coefficient Binning, which we can add). Single-series transforms should fit on sets of series or single series; data set transforms should not! Happy to make a distinction, perhaps by different directories (I won't say packages, as that means something else over here), but I would like each in its own file if your design will permit; I don't like the scikit style of chucking a whole bunch of stuff in one file myself. But then I wouldn't, I write Java, so will defer to Python style if preferred. |
So A to F should be available in any sensible stats package. G maybe, but it's strangely often not. H is trivial. I, J and K we should do from scratch as they are fundamental to many other operations, but we have stable (?) Java implementations. L is probably a heavyweight implementation. M is a project on its own, and I have someone lined up to do it. |
also check pysf, and pyts for interfaceability or viability of using interface (low-level) |
Q: can we re-use FunctionTransformer from sklearn? |
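For reference on the question above, a minimal sketch of what wrapping a stateless series-to-series function in scikit-learn's `FunctionTransformer` could look like. It assumes equal-length series stored as rows of a 2D array, which is a simplification; the `first_difference` helper is hypothetical and only there for illustration.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer


def first_difference(X):
    """Row-wise first difference; each row of X is one equal-length series."""
    return np.diff(X, axis=1)


# FunctionTransformer has no state to learn, so fit() does nothing useful here.
differencer = FunctionTransformer(first_difference)

X = np.array([[1.0, 3.0, 6.0],
              [2.0, 2.0, 5.0]])
Xt = differencer.fit_transform(X)
```

This covers the "unfitted" case only. Series of unequal length, or series that carry their own time index, do not fit the 2D-array assumption made here.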
all except (D) and (M) are series-to-series. |
Hello all. I just committed prototype design in DEV for the transformers. In a nutshell :
Happy to receive feedback/comments ! |
done A, B, C, F and H |
so revised list of incomplete
all of which are off the shelf transforms, so should be easy. Shapelets still need some work so go back on the list |
Could we allow the |
Updating first post for the 2020 dev sprint. |
Close this in favour of #483 |
Some series-to-series transformers that would be useful.
unfitted, single-series, simple
Behaviour:
returns the sequence of [aggregator application] (e.g., count) in the bins. Index is start time, end time, or index (from start) of bin, depending on index hyper-parameter
Hyper-parameters:
bin specs - start: time/index, end : time/index, numbins : integer
index - 'start', 'end', or 'bin'
aggregator - function to apply to values within bin, default = count
alternative to bin specs: index sequence
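A minimal sketch of the binning behaviour above, assuming a pandas Series with a sorted numeric index and equal-width bins. The function name `bin_series` and its signature are hypothetical; only the hyper-parameter names come from the list above.

```python
import numpy as np
import pandas as pd


def bin_series(series, start, end, numbins, index="start", aggregator=len):
    """Apply `aggregator` to the values falling into each of `numbins` bins."""
    edges = np.linspace(start, end, numbins + 1)
    x = series.index.to_numpy(dtype=float)
    y = series.to_numpy()
    # Bin b covers [edges[b], edges[b + 1]); the last bin also includes `end`.
    positions = np.searchsorted(edges, x, side="right") - 1
    positions[x == end] = numbins - 1
    # Entries outside [start, end] get position -1 or numbins and are ignored.
    values = [aggregator(y[positions == b]) for b in range(numbins)]
    if index == "start":
        new_index = edges[:-1]
    elif index == "end":
        new_index = edges[1:]
    elif index == "bin":
        new_index = np.arange(numbins)
    else:
        raise ValueError("index must be 'start', 'end', or 'bin'")
    return pd.Series(values, index=new_index)
```

The default aggregator is `len`, which matches "default = count". Any callable that maps an array to a scalar, such as `np.sum`, can be passed instead.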
Behaviour:
cuts off any entry in the sequence with index outside [lower, upper]
Hyper-parameters:
lower, upper : time
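The cut-off behaviour above could be sketched as follows, assuming a pandas Series whose index is comparable with `lower` and `upper`, and treating the interval as closed at both ends. The name `truncate_to_range` is hypothetical.

```python
import pandas as pd


def truncate_to_range(series, lower, upper):
    """Keep only entries whose index lies in the closed interval [lower, upper]."""
    mask = (series.index >= lower) & (series.index <= upper)
    return series[mask]
```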
Behaviour:
interpolates/extrapolates the series to the nodes by the specified strategy, e.g., fill in nearest or next (careful with boundaries)
Hyper-parameters:
node specs - start: time/index, end : time/index, numsteps : integer
index - 'start', 'end', or 'bin'
strategy - 'nearest', 'last' , 'next', 'pw_linear'
alternative to node specs: index sequence
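A sketch of the four strategies above, assuming a sorted numeric index and taking the nodes as an explicit index sequence (node specs could be expanded beforehand with `np.linspace(start, end, numsteps)`). The boundary handling is one possible choice, not a settled design: 'last' and 'next' return NaN where no such observation exists, while 'pw_linear' holds the end values constant. The name `resample_to_nodes` is hypothetical.

```python
import numpy as np
import pandas as pd


def resample_to_nodes(series, nodes, strategy="pw_linear"):
    """Evaluate `series` at `nodes` using the given fill strategy."""
    x = series.index.to_numpy(dtype=float)
    y = series.to_numpy(dtype=float)
    nodes = np.asarray(nodes, dtype=float)
    last_pos = len(y) - 1
    if strategy == "pw_linear":
        # np.interp keeps the first/last value constant outside the data range.
        values = np.interp(nodes, x, y)
    elif strategy == "last":
        # Most recent observation at or before the node; NaN if there is none.
        pos = np.searchsorted(x, nodes, side="right") - 1
        values = np.where(pos >= 0, y[np.clip(pos, 0, last_pos)], np.nan)
    elif strategy == "next":
        # First observation at or after the node; NaN if there is none.
        pos = np.searchsorted(x, nodes, side="left")
        values = np.where(pos <= last_pos, y[np.clip(pos, 0, last_pos)], np.nan)
    elif strategy == "nearest":
        right = np.clip(np.searchsorted(x, nodes, side="left"), 0, last_pos)
        left = np.clip(right - 1, 0, last_pos)
        # Ties go to the earlier observation.
        use_left = np.abs(nodes - x[left]) <= np.abs(x[right] - nodes)
        values = np.where(use_left, y[left], y[right])
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))
    return pd.Series(values, index=nodes)
```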
Behaviour:
changes the index by the strategy indicated in the reindexing parameter
integer = replace with ascending count
field = get from data frame column
Hyper-parameters:
strategy - 'integer', 'field'
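One way the two reindexing strategies could look, assuming the input is a pandas DataFrame so that the 'field' strategy has a column to read from. The function name and the `value_column` and `field` arguments are hypothetical.

```python
import pandas as pd


def reindex_series(frame, value_column, strategy="integer", field=None):
    """Return frame[value_column] with a replaced index."""
    values = frame[value_column]
    if strategy == "integer":
        # Replace the index with an ascending count 0, 1, 2, ...
        return values.reset_index(drop=True)
    if strategy == "field":
        # Take the new index from another column of the data frame.
        return pd.Series(values.to_numpy(), index=frame[field].to_numpy(),
                         name=values.name)
    raise ValueError("strategy must be 'integer' or 'field'")
```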
Behaviour:
creates a series from the index of the series
Behaviour:
removes sequence elements that are numpy.nan
Behaviour:
pads a sequence/series with value at start or end until it has the desired length
Hyper-parameters:
where - 'start', 'end'
what - value
length - integer
optional: index treatment
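A sketch of the padding behaviour above on plain value arrays. Since index treatment is still listed as optional, it is left out here. Sequences that already reach the desired length are returned unchanged, which is an assumption; truncation is not part of the description. The name `pad_sequence` is hypothetical.

```python
import numpy as np


def pad_sequence(values, length, what=0.0, where="end"):
    """Pad `values` with `what` at the start or end until it has `length` entries."""
    if where not in ("start", "end"):
        raise ValueError("where must be 'start' or 'end'")
    values = np.asarray(values, dtype=float)
    missing = length - len(values)
    if missing <= 0:
        return values  # already long enough; nothing to pad
    padding = np.full(missing, what, dtype=float)
    if where == "start":
        return np.concatenate([padding, values])
    return np.concatenate([values, padding])
```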
Behaviour:
Fills in NA values by the specified strategy
Hyper-parameters:
strategy - 'nearest', 'last' , 'next', 'pw_linear'
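The NA-filling strategies above might be sketched like this, assuming a pandas Series with a sorted numeric index. 'last' and 'next' map onto pandas forward and backward fill, so leading or trailing gaps can remain NaN; 'pw_linear' and 'nearest' are computed from the index values with NumPy. The name `fill_na` is hypothetical.

```python
import numpy as np
import pandas as pd


def fill_na(series, strategy="last"):
    """Fill NaN entries of `series` from its non-NaN entries."""
    if strategy == "last":
        return series.ffill()
    if strategy == "next":
        return series.bfill()
    known = series.dropna()
    known_x = known.index.to_numpy(dtype=float)
    known_y = known.to_numpy(dtype=float)
    x = series.index.to_numpy(dtype=float)
    if strategy == "pw_linear":
        # Linear in the index values; end values are held constant at the edges.
        filled = np.interp(x, known_x, known_y)
    elif strategy == "nearest":
        # Closest known observation by index distance; ties go to the earlier one.
        closest = np.abs(x[:, None] - known_x[None, :]).argmin(axis=1)
        filled = known_y[closest]
    else:
        raise ValueError("unknown strategy: %r" % (strategy,))
    return pd.Series(filled, index=series.index)
```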
unfitted, single-series, reduction
Behaviour:
uses a scikit-learn regressor or classifier to interpolate to the specified index set.
Fits series values against series index, and uses the regressor/classifier to predict value from index
Hyper-parameters:
index set
estimator - sklearn regressor
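A minimal sketch of the reduction described above: fit a scikit-learn regressor on (index, value) pairs, then predict values at the requested index set. It assumes a numeric index; the default `LinearRegression` is only a placeholder choice, and the name `interpolate_with_estimator` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


def interpolate_with_estimator(series, index_set, estimator=None):
    """Predict series values at `index_set` from a regressor fitted on the index."""
    # Clone so that the estimator passed in by the caller is never mutated.
    model = LinearRegression() if estimator is None else clone(estimator)
    x_train = series.index.to_numpy(dtype=float).reshape(-1, 1)
    model.fit(x_train, series.to_numpy(dtype=float))
    x_new = np.asarray(index_set, dtype=float).reshape(-1, 1)
    return pd.Series(model.predict(x_new), index=list(index_set))
```

The transformer stays "unfitted" in the sense used here: the regressor is refitted from scratch on every series that is passed in, so nothing is learned across series.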
Behaviour:
Fills in NA values by the specified strategy by using a scikit-learn regressor or classifier. Fits non-NA series values against series index, and uses the regressor/classifier to predict value from index
Hyper-parameters:
strategy - 'nearest', 'last' , 'next', 'pw_linear'
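The regressor-based NA filling above could be sketched as follows. It fits on the non-NA entries only and predicts at the NA positions. The `estimator` argument is an assumption carried over from the interpolation variant (it is not in the hyper-parameter list above), and the name `impute_with_estimator` is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.linear_model import LinearRegression


def impute_with_estimator(series, estimator=None):
    """Fill NaN entries with predictions of a regressor fitted on the index."""
    model = LinearRegression() if estimator is None else clone(estimator)
    values = series.to_numpy(dtype=float).copy()
    missing = np.isnan(values)
    if missing.any():
        x = series.index.to_numpy(dtype=float).reshape(-1, 1)
        # Fit on the observed entries only, then predict the missing ones.
        model.fit(x[~missing], values[~missing])
        values[missing] = model.predict(x[missing])
    return pd.Series(values, index=series.index)
```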
unfitted, multiple-series
Note: the below are "unfitted" since they run on the entire series
Behaviour:
Looks up the indices for all the series and introduces them for all the series. Fills in values at new nodes by the specified strategy.
Hyper-parameters:
strategy - 'NA', 'nearest', 'last', 'next'
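A sketch of the index alignment above, assuming each series is a pandas Series with a sorted, duplicate-free index. It builds the union of all indices and reindexes every series onto it; the strategies are mapped onto the `method` argument of `Series.reindex`. The name `align_indices` is hypothetical.

```python
import pandas as pd

# Map the strategy names onto pandas reindex fill methods.
_FILL_METHODS = {"NA": None, "last": "ffill", "next": "bfill", "nearest": "nearest"}


def align_indices(series_list, strategy="NA"):
    """Reindex every series onto the union of all indices."""
    if strategy not in _FILL_METHODS:
        raise ValueError("unknown strategy: %r" % (strategy,))
    union = series_list[0].index
    for series in series_list[1:]:
        union = union.union(series.index)  # sorted union of index values
    method = _FILL_METHODS[strategy]
    return [series.reindex(union, method=method) for series in series_list]
```

With 'last' or 'next', new nodes before the first or after the last observation of a series stay NaN, since there is nothing to carry forward or backward.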
design questions