sktime dev days 2021 forecasting work stream notes
Session: Forecasting/Annotation
Reporter: Taiwo Owoseni
- [name=Markus (he/him)] / PhD student at UCL / t: @mloning_ / gh: @mloning 🐢
- [name=Martin Walter] Senior Data Scientist @ Mercedes-Benz AG
- [name=Lovkush Agarwal] / Data Scientist at Shell / www.lovkush.com / gh: @Lovkush-A 🐢
- [name=Franz Kiraly] / Data Scientist at Shell, hon lecturer at UCL
- [name=Satya (he/him)] / Data Scientist at FICO / linkedin / gh: [@satya-pattnaik]
- [name=Taiwo (She/Her)] / Sktime Intern / t: twitter is suspended in Nigeria : / gh: @thayeylolu
- [name=Guzal (she/her)] / Sktime summer intern with Outreachy / linkedin / gh: @GuzalBulatova
What is the area to work on during the dev days, as a group? Be as specific as you can, without going too much into details.
- finish refactoring of forecaster interface - done = all forecasters in sktime are interface compliant
- "multivariate forecasting" - what's the goal?
- interface design for multivariate
- "automatic" handling of different input/output types
- interfacing concrete multivariate forecasters like VARIMAX
- Multivariate pipelining (already work in progress), most important compositors (tuning etc) too
- annotation design and prototype finished, should include multivariate and panel (!)
- example annotators implemented in prototype
- implement pyOD wrapper
Only issues that already exist.
- Finish Refactoring Forecasting Interface 👍
- multivariate interface with base functionality for multiple input/output types https://github.com/alan-turing-institute/sktime/pull/980
- Multivariate pipelining: open PR
- multivariate forecasting draft pr, https://github.com/alan-turing-institute/sktime/pull/980
- PyOD Adapter - https://github.com/alan-turing-institute/sktime/issues/798
- VAR and VECM forecaster https://github.com/alan-turing-institute/sktime/issues/929
- refactoring forecaster.update, .update_predict, etc. https://github.com/alan-turing-institute/sktime/issues/982
issues and prs relating to refactoring univariate forecasters:
- refactoring existing forecasters to new interface, central issue https://github.com/alan-turing-institute/sktime/issues/955
- refactoring bunch of forecasters, pr, https://github.com/alan-turing-institute/sktime/pull/977
- CLOSED. refactoring bunch of forecasters (old version of pr 977), pr, https://github.com/alan-turing-institute/sktime/pull/965
- refactor polynomial forecaster, https://github.com/alan-turing-institute/sktime/pull/1003
- refactor hcrystalball forecaster, https://github.com/alan-turing-institute/sktime/pull/1004
- refactor fbprophet forecaster, https://github.com/alan-turing-institute/sktime/pull/1005
- refactoring reducer, pr, https://github.com/alan-turing-institute/sktime/pull/976
enhancement proposals:
- Multivariate forecasting: https://github.com/sktime/enhancement-proposals/pull/4
- I/O checks/conversions: https://github.com/sktime/enhancement-proposals/tree/main/steps/05_scitype_based_IO_checks
doc related:
- forecasting tutorial, ToC https://github.com/alan-turing-institute/sktime/issues/986
- forecasting tutorial, advanced composition and tuning https://github.com/alan-turing-institute/sktime/issues/988
- forecasting documentation re-write. https://github.com/alan-turing-institute/sktime/pull/972
missing/need:
- issue for composition patterns in multivariate forecasting
- issue related to wrappers/composers that turn univariate to multivariate
- issue related to annotation
- issue related to concrete annotators (pyOD already exists)
- annotation sub-case specific: segmentation
- annotation sub-case specific: change point detection
- annotation sub-case specific: outlier detection
- annotation conditional compositors, e.g., conditional removal or series-to-panel (e.g., epoching)
Identify the most important work items, in bullet points. Identify which are crucial dependencies, which are optional. Estimate how much time the work will roughly take. Tentatively put names against the work items.
Think carefully about:
- what is realistic to achieve during the dev sprint (3 days)
- what should go on longer roadmap
- which items are "good first issues", which ones are expert issues
Create a work plan for the week. Prioritize so crucial items are covered. Ensure there are a number of "good first issues" for new community members
Work item | Coordinator |
---|---|
list of forecasters to work on: https://github.com/alan-turing-institute/sktime/issues/955 | Taiwo |
reduction module refactoring `sktime.forecasting.compose._reduce.py` | Taiwo, Lovkush, Markus |
refactor fbprophet, https://github.com/alan-turing-institute/sktime/pull/1005 | Help needed |
forecasting tutorial, advanced composition and tuning https://github.com/alan-turing-institute/sktime/issues/988 | Martin |
Forecasting Refactoring and Progress https://github.com/alan-turing-institute/sktime/issues/1007 | Taiwo |
Exemplary refactoring: https://github.com/alan-turing-institute/sktime/pull/953
Work item | Coordinator |
---|---|
finish PR for annotation framework (base annotator and unit testing) | Satya |
finish PR for PyOD wrapper | Satya |
add unit test for PyOD wrapper | Satya |
handle PyOD as soft dependency | Satya |
annotation data container designs | Franz |
unsupervised segmentation | Franz |
interfacing basic segmenters: hmmlearn etc | Franz |
supervised segmentation/annotation | Franz |
alignment & distances? | Franz |
need to ensure compatibility between outlier detection, different tasks, and new forecasting interface
- conditional on univariate forecasting refactoring
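The "handle PyOD as soft dependency" work item above could follow the usual lazy-import pattern. The following is a minimal sketch using only the standard library; the helper name `check_soft_dependency` and the wording of the error message are assumptions for illustration, not an existing sktime function.

```python
import importlib


def check_soft_dependency(package):
    """Import an optional package, or fail with an informative message.

    Hypothetical helper: the name and message are illustrative only.
    """
    try:
        return importlib.import_module(package)
    except ModuleNotFoundError as err:
        raise ModuleNotFoundError(
            f"'{package}' is an optional (soft) dependency and is not installed. "
            f"Install it with: pip install {package}"
        ) from err
```

A wrapper would call this helper inside its constructor rather than at module import time, so that importing the rest of the library does not require the optional package.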
Work item | Coordinator |
---|---|
API design: multivariate interface with base functionality for multiple input/output types https://github.com/alan-turing-institute/sktime/pull/980 | Franz |
group existing forecasting functionality into whether it can be easily extended to handle multivariate series or not | Lovkush (or Markus?) |
"obvious" conversion wrappers like "apply-per-variable" | |
Multivariate Pipelining (`ForecastingPipeline`) | Martin |
interfaces for new multivariate forecasters | Lovkush (or Markus?) |
The reporter should prepare a quick summary of the above.
Markdown is perfectly fine here, but can also be PowerPoint or Paint.
Turn the high-level work plan into issues!
Write descriptive issue descriptions, with a clear definition of "done".
Consider using "checkbox items" to create sub-tasks - i.e., use `- [ ]` in the issue description.
Consider using a project board (but don't overcomplicate it) or linking the issue to an existing board.
Add issue tags.
Tracking which forecasters have been started; there should be a PR for the ones that are ticked
- `NaiveForecasters` #953
- `EnsembleForecaster` #977
- `MultiplexForecaster` #977
- `TransformedTargetForecaster` #977
- `_Reducer` and related code #1031
- `StackingForecaster` #977
- `ForecastingGridSearchCV`, `RandomizedGridSearchCV`, and `BaseGridSearch`
- `OnlineEnsembleForecaster` and descendants #1015
- `_PmdArimaAdapter` and descendants `ARIMA`, `AutoARIMA` #1016 - adapter only
- `BATS`, `TBATS`, and `_Tbatsadapter` #1017 adapter only
- `_StatsModelAdapter` and descendants `AutoETS`, `ExponentialSmoothing`, `ThetaForecaster`
- `HCrystalBallForecaster` #1004
- `Prophet` and `_ProphetAdapter` #1005
- `PolynomialTrendForecaster` #1003
Tracking which forecasters have been finished; there should be a closed PR for the ones that are ticked
- `NaiveForecasters` #953
- `EnsembleForecaster`
- `MultiplexForecaster`
- `TransformedTargetForecaster`
- `_Reducer` and related code
- `StackingForecaster`
- `ForecastingGridSearchCV`, `RandomizedGridSearchCV`, and `BaseGridSearch`
- `OnlineEnsembleForecaster` and descendants
- `_PmdArimaAdapter` and descendants `ARIMA`, `AutoARIMA`
- `BATS`, `TBATS`, and `_Tbatsadapter`
- `_StatsModelAdapter` and descendants `AutoETS`, `ExponentialSmoothing`, `ThetaForecaster`
- `HCrystalBallForecaster`
- `Prophet` and `_ProphetAdapter`
- `PolynomialTrendForecaster`
:::info
Example: (sklearn pipeline) Create issues on GitHub about:
- Implementing `ColumnTransformer`
- Implementing `FeatureUnion`
- Updating existing pipeline class
- Update docs
:::
- tags discussion - Franz, Markus, Martin, Taiwo, Tony
- use of tags - semantic/indexing lookup for user, or only internal/testing?
- importance in checks and conversions (e.g., what to)
- input validity & related properties
- algorithmic features/properties, availability of interface points e.g., can produce performance estimates
- writing common tests for estimators with the same tags
- user guidance
- inheritance and default values
- if inheritance: child classes to specify all tags, or only tags deviating from default? (FK: perhaps all, more robust against changes to defaults)
- inheritance yes, and override only non-defaults
- tests for robustness
- object or class level
- both for long-term; short/mid-term, focus on class level
- object and class level should both be inspectable
- user wants to be shown object level one typically
- documentation of tags - where, how?
- fixed description of tags, but where?
- ALL_TAGS to be factored out and supplemented by plain english descriptions; docs auto-generated from that
- how do we agree on which tags?
- that file has codeowners, it's us
- "meta-tags", e.g., which tags apply to which scitypes?
- dataframe with three columns? tag name, list of scitypes, plain English description?
- maybe structured strings in tags, like `"forecaster:supports_exogeneous_X"`; some might be generic, like `"handles_missing_data"` or `"multivariate"` (?)
- forecaster tags - boolean?
- can have non-boolean, but we need to carefully watch testing
- which tags should we have? for forecasters?
- lookup of tags, `all_tags` like `all_estimators`? by scitype?
- yes, and is easy with the table above
- lookup of estimators based on tags?
- ML: yes should be easy to implement via an additional filtering of the list of collected estimators
- display of estimators with tags, #995, #996
- great idea, should be autogenerated from above
- tag refactor? #1013
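Several of the ideas in the tags discussion above (a three-column table of tag name, scitypes, and plain-English description; an `all_tags` lookup by scitype; inheritance where children override only non-default tags; class-level and object-level inspection; lookup of estimators by tag) can be sketched together. This is only an illustration under stated assumptions: `TAG_REGISTRY`, `TaggedBase`, `get_class_tags`, `get_tags`, `_instance_tags`, `estimators_with_tags`, and the toy classes are hypothetical names, the descriptions are made up, and nothing here is the design that was agreed.

```python
# Hypothetical registry: (tag name, scitypes it applies to, plain-English description).
# Tag names are the ones floated in the notes; descriptions are illustrative.
TAG_REGISTRY = [
    ("forecaster:supports_exogeneous_X", ["forecaster"], "can make use of exogenous data X"),
    ("handles_missing_data", ["forecaster", "annotator"], "accepts series with missing values"),
    ("multivariate", ["forecaster", "annotator"], "accepts series with several variables"),
]


def all_tags(scitype=None):
    """Return registry rows, optionally filtered to one scitype."""
    return [row for row in TAG_REGISTRY if scitype is None or scitype in row[1]]


class TaggedBase:
    """Hypothetical base class holding the default tag values."""

    _tags = {"handles_missing_data": False, "multivariate": False}

    @classmethod
    def get_class_tags(cls):
        """Class-level tags: merge `_tags` along the MRO so children override parents."""
        tags = {}
        for klass in reversed(cls.__mro__):
            tags.update(klass.__dict__.get("_tags", {}))
        return tags

    def get_tags(self):
        """Object-level tags: class-level tags plus any per-instance overrides."""
        return {**self.get_class_tags(), **getattr(self, "_instance_tags", {})}


class ToyMultivariateForecaster(TaggedBase):
    _tags = {"multivariate": True}  # only the tag that deviates from the default


class ToyRobustForecaster(TaggedBase):
    _tags = {"handles_missing_data": True}


def estimators_with_tags(estimators, **wanted):
    """Filter a list of estimator classes by required tag values."""
    return [
        est
        for est in estimators
        if all(est.get_class_tags().get(tag) == value for tag, value in wanted.items())
    ]
```

For example, `estimators_with_tags([ToyMultivariateForecaster, ToyRobustForecaster], multivariate=True)` keeps only the first class, and `all_tags("annotator")` returns the two generic rows. Documentation of tags could then be auto-generated from the same registry.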
(written by LA, and so represents their perspective/their biases) Discussion focussed on this PR https://github.com/alan-turing-institute/sktime/pull/980.
Points of agreement:
- IO type conversion is fundamentally separate from multivariate forecasting (though they are related)
- We should have some sort of wrapper that converts univariate forecasters into multivariate forecasters. But how precisely this is done needs to be determined
- (Thought by LA later: this sounds like a reduction of multivariate forecasting to univariate forecasting. This suggests a reduction-style interface, where the user creates a multivariate forecaster using a reduction function and a univariate forecaster object. Separately, we can have reductions from multivariate forecasting to tabular regression.)
- I think there is agreement that it is a bad idea to re-write all forecasters so that they only use methods that can deal with both pandas dataframes and pandas series.
- Note that even in the example Markus showed, there was an if statement to distinguish between dataframe and series in one of the imputations
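The wrapper idea in the points of agreement above (turning a univariate forecaster into a multivariate one by applying it per variable) might look roughly like the sketch below. To stay dependency-free it represents a multivariate series as a dict mapping variable names to lists, which is an assumption made for illustration and not what sktime uses; `LastValueForecaster` and `ColumnwiseForecaster` are hypothetical names, and the many design decisions LA mentions (exogenous data, tuning per column, updating) are ignored.

```python
import copy


class LastValueForecaster:
    """Toy univariate forecaster (hypothetical): repeats the last observed value."""

    def fit(self, y):
        self.last_ = y[-1]
        return self

    def predict(self, fh):
        # one prediction per step in the forecasting horizon
        return [self.last_ for _ in fh]


class ColumnwiseForecaster:
    """Hypothetical 'apply-per-variable' wrapper around a univariate forecaster."""

    def __init__(self, forecaster):
        self.forecaster = forecaster

    def fit(self, Y):
        # Y: dict mapping variable name -> list of observations (assumed layout).
        # Each variable gets its own independent copy of the univariate forecaster.
        self.forecasters_ = {
            name: copy.deepcopy(self.forecaster).fit(y) for name, y in Y.items()
        }
        return self

    def predict(self, fh):
        return {name: f.predict(fh) for name, f in self.forecasters_.items()}
```

Usage would be along the lines of `ColumnwiseForecaster(LastValueForecaster()).fit({"a": [1, 2, 3], "b": [10.0, 20.0]}).predict(fh=[1, 2])`. Keeping one fitted copy per variable is what allows hyper-parameters to differ by column later.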
ML's concerns:
- IO type conversion is a significant change, and ML wants more time for himself (and others) to fully think through the consequences.
- E.g. how large is the cost of the various conversions?
- FK's counter to cost concerns: the only time IO type conversion creates extra cost is when the input would otherwise raise an error.
- E.g. philosophy of allowing multiple internal types, rather than sticking to some single internal type
- we want to create some sort of multivariate functionality quickly, quicker than the time needed to consider the IO type conversion question
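FK's counter to the cost concern above can be illustrated with a small sketch: if the input is already one of the types the estimator supports internally, it is passed through untouched, and conversion work only happens in cases that would otherwise have ended in an error. The function name `to_internal` and the use of `list` (univariate) and `dict` of lists (multivariate) as stand-ins for the real container types are assumptions for illustration only.

```python
def to_internal(y, supported):
    """Return `y` unchanged if its type is supported, otherwise convert it.

    `supported` is a tuple of types the (hypothetical) estimator handles
    natively; `list` stands in for a univariate series and `dict` of lists
    for a multivariate one.
    """
    if isinstance(y, supported):
        return y  # already supported: no conversion, no extra cost
    if isinstance(y, list) and dict in supported:
        return {"y": y}  # univariate -> single-variable multivariate
    if isinstance(y, dict) and list in supported and len(y) == 1:
        return next(iter(y.values()))  # single-variable multivariate -> univariate
    raise TypeError(
        f"cannot convert input of type {type(y).__name__} to any of "
        f"{[t.__name__ for t in supported]}"
    )
```

The actual cost of each conversion branch, and whether allowing several internal types is desirable at all, are exactly the open questions raised in the discussion.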
FK's concerns:
- There is a risk of delaying the decision on IO type conversion if we do not make the decision now or do not have an explicit plan for when it will be made
- Dev days week is best time for FK to spend time on sktime and on big tasks, so quicker decision will allow FK to do more
Ryan's thoughts:
- IO type conversion is elegant, but needs time to think through consequences
- Ryan: Not sure how big the intersection of multivariate and univariate algorithms is:
- believes that majority of univariate forecasters will become multivariate by doing things column-by-column (i.e. via the reduction wrapper) b/c optimal hyper-params will vary by series (column).
- The multivariate algorithms that can be applied to univariate data typically do so when that algorithm can simplify to a univariate model.
- Are there a lot of models where we wouldn't have a univariate model already?
- What is the cost of raising an informative message to point users to correct univariate implementation?
LA's thoughts:
- Should pause on IO type conversion to give ML (and others) time to consider the consequences
- Believes that IO type conversion is a good idea, and should be implemented at some point.
- (Variable names can be improved in the PR)
- Didn't say this during discussion, but I don't think writing the multivariate wrapper is a quick task - lots of design decisions need to be made for it.
Guzal's thoughts:
- Likes the logic of FK's IO conversion, but doesn't know enough to judge if ML's concerns are valid or not
Taiwo:
- No thoughts on these issues