sktime dev days 2021 forecasting work stream notes

sktime 2021 dev days - break-out session notes

Session: Forecasting/Annotation

Reporter: Taiwo Owoseni

👋 Roll call

[name=Markus (he/him)] / PhD student at UCL / t: @mloning_ / gh: @mloning 🐢
[name=Martin Walter] Senior Data Scientist @ Mercedes-Benz AG
[name=Lovkush Agarwal] / Data Scientist at Shell / www.lovkush.com / gh: @Lovkush-A 🐢
[name=Franz Kiraly] / Data Scientist at Shell, hon lecturer at UCL
[name=Satya (he/him)] / Data Scientist at FICO / linkedin / gh: [@satya-pattnaik]
[name=Taiwo (She/Her)] / Sktime Intern / t: twitter is suspended in Nigeria : / gh: @thayeylolu
[name=Guzal (she/her)] / Sktime summer intern with Outreachy / linkedin / gh: @GuzalBulatova

💡 Define the workstream scope (10min)

What is the area to work on during the dev days, as a group? Be as specific as you can, without going too much into details.

Univariate forecasting

finish refactoring of forecaster interface - done = all forecasters in sktime are interface compliant

Multivariate forecasting

"multivariate forecasting" - what's the goal?
interface design for multivariate
"automatic" handling of different input/output types
interfacing concrete multivariate forecasters like VARIMAX
Multivariate pipelining (already work in progress), most important compositors (tuning etc) too

Annotation

annotation design and prototype finished, should include multivariate and panel (!)
example annotators implemented in prototype
implement pyOD wrapper

docs as we go along!

🚧 Collect related issues (15min)

Only issues that already exist.

Finish Refactoring Forecasting Interface 👍
multivariate interface with base functionality for multiple input/output types https://github.com/alan-turing-institute/sktime/pull/980
Multivariate pipelining: open PR
multivariate forecasting draft pr, https://github.com/alan-turing-institute/sktime/pull/980
PyOD Adapter - https://github.com/alan-turing-institute/sktime/issues/798
VAR and VECM forecaster https://github.com/alan-turing-institute/sktime/issues/929
refactoring forecaster.update, .update_predict, etc. https://github.com/alan-turing-institute/sktime/issues/982

issues and prs relating to refactoring univariate forecasters:

refactoring existing forecasters to new interface, central issue https://github.com/alan-turing-institute/sktime/issues/955
refactoring bunch of forecasters, pr, https://github.com/alan-turing-institute/sktime/pull/977
CLOSED. refactoring bunch of forecasters (old version of pr 977), pr, https://github.com/alan-turing-institute/sktime/pull/965
refactor polynomial forecaster, https://github.com/alan-turing-institute/sktime/pull/1003
refactor hcrystalball forecaster, https://github.com/alan-turing-institute/sktime/pull/1004
refactor fbprophet forecaster, https://github.com/alan-turing-institute/sktime/pull/1005
refactoring reducer, pr, https://github.com/alan-turing-institute/sktime/pull/976

enhancement proposals:

Multivariate forecasting: https://github.com/sktime/enhancement-proposals/pull/4
I/O checks/conversions: https://github.com/sktime/enhancement-proposals/tree/main/steps/05_scitype_based_IO_checks

doc related:

forecasting tutorial, ToC https://github.com/alan-turing-institute/sktime/issues/986
forecasting tutorial, advanced composition and tuning https://github.com/alan-turing-institute/sktime/issues/988
forecasting documentation re-write. https://github.com/alan-turing-institute/sktime/pull/972

missing/need:

issue for composition patterns in multivariate forecasting
issue related to wrappers/composers that turn univariate to multivariate
issue related to annotation
issue related to concrete annotators (pyOD already exists)
annotation sub-case specific: segmentation
annotation sub-case specific: change point detection
annotation sub-case specific: outlier detection
annotation conditional compositors, e.g., conditional removal or series-to-panel (e.g., epoching)

🔍 High-level work plan (20min)

Identify the most important work items, in bullet points. Identify which are crucial dependencies, which are optional. Estimate how much time the work will roughly take. Tentatively put names against the work items.

Think carefully about:

what is realistic to achieve during the dev sprint (3 days)
what should go on longer roadmap
which items are "good first issues", which ones are expert issues

Create a work plan for the week. Prioritize so crucial items are covered. Ensure there are a number of "good first issues" for new community members

Univariate forecasting

| Work item | Coordinator |
| --- | --- |
| list of forecasters to work on: https://github.com/alan-turing-institute/sktime/issues/955 | Taiwo |
| reduction module refactoring `sktime.forecasting.compose._reduce.py` | Taiwo, Lovkush, Markus |
| refactor fbprophet, https://github.com/alan-turing-institute/sktime/pull/1005 | Help needed |
| forecasting tutorial, advanced composition and tuning https://github.com/alan-turing-institute/sktime/issues/988 | Martin |
| Forecasting Refactoring and Progress https://github.com/alan-turing-institute/sktime/issues/1007 | Taiwo |

Exemplary refactoring: https://github.com/alan-turing-institute/sktime/pull/953

Annotation

| Work item | Coordinator |
| --- | --- |
| finish PR for annotation framework (base annotator and unit testing) | Satya |
| finish PR for PyOD wrapper | Satya |
| add unit test for PyOD wrapper | Satya |
| handle PyOD as soft dependency | Satya |
| annotation data container designs | Franz |
| unsupervised segmentation | Franz |
| interfacing basic segmenters: hmmlearn etc | Franz |
| supervised segmentation/annotation | Franz |
| alignment & distances? | Franz |

need to ensure compatibility between outlier detection, different tasks, and new forecasting interface

Multivariate forecasting

conditional on univariate forecasting refactoring

| Work item | Coordinator |
| --- | --- |
| API design: multivariate interface with base functionality for multiple input/output types https://github.com/alan-turing-institute/sktime/pull/980 | Franz |
| group existing forecasting functionality into whether it can be easily extended to handle multivariate series or not | Lovkush (or Markus?) |
| "obvious" conversion wrappers like "apply-per-variable" | |
| Multivariate Pipelining (`ForecastingPipeline`) | Martin |
| interfaces for new multivariate forecasters | Lovkush (or Markus?) |

📝 Prepare the report-out (10min)

The reporter should prepare a quick summary of the above.

Markdown is perfectly fine here, but can also be PowerPoint or Paint.

🔧 Create issues (15min, can also be done later & iteratively)

Turn the high-level work plan into issues!

Write descriptive issue descriptions, with a clear definition of "done".

Consider using "checkbox items" to create sub-tasks - i.e., use `- [ ]` in the issue description.

Consider using a project board (but don't overcomplicate it) or linking the issue to an existing board.

Add issue tags.

Tracking for refactoring of forecasters

Tracking which forecasters have been started; there should be a PR for the ones that are ticked

Tracking which forecasters have been finished; there should be a closed PR for the ones that are ticked

:::info

Example: (sklearn pipeline) Create issues on GitHub about:

Implementing ColumnTransformer
Implementing FeatureUnion
Updating existing pipeline class
Update docs
:::

2021-06-23 discussion points

tags discussion - Franz, Markus, Martin, Taiwo, Tony
- use of tags - semantic/indexing lookup for user, or only internal/testing?
  - importance in checks and conversions (e.g., what to)
  - input validity & related properties
  - algorithmic features/properties, availability of interface points e.g., can produce performance estimates
  - writing common tests for estimators with the same tags
  - user guidance
- inheritance and default values
- if inheritance: child classes to specify all tags, or only tags deviating from default? (FK: perhaps all, more robust against changes to defaults)
  - inheritance yes, and override only non-defaults
  - tests for robustness
- object or class level
  - both for long-term; short/mid-term, focus on class level
  - object and class level should both be inspectable
  - user wants to be shown object level one typically
- documentation of tags - where, how?
  - fixed description of tags, but where?
  - ALL_TAGS to be factored out and supplemented by plain english descriptions; docs auto-generated from that
- how do we agree on which tags?
  - that file has codeowners, it's us
- "meta-tags", e.g., which tags apply to which scitypes?
  - dataframe with three columns? tag name, list of scitypes, plain English description?
  - maybe structured strings in tags, like "forecaster:supports_exogeneous_X"; some might be generic, like "handles_missing_data" or "multivariate" (?)
- forecaster tags - boolean?
  - can have non-boolean, but we need to carefully watch testing
- which tags should we have? for forecasters?
  - https://github.com/alan-turing-institute/sktime/issues/957
- lookup of tags, all_tags like all_estimators? by scitype?
  - yes, and is easy with the table above
- lookup of estimators based on tags?
  - ML: yes should be easy to implement via an additional filtering of the list of collected estimators
- display of estimators with tags, #995, #996
  - great idea, should be autogenerated from above
- tag refactor? #1013
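
The tag ideas above (a register with tag name, scitypes and plain English description; class-level tags with inheritance where children override only non-defaults; lookup of tags by scitype and of estimators by tag) could be sketched roughly as follows. This is a minimal illustration only, not the sktime implementation: `TAG_REGISTER`, `get_class_tags`, `estimators_with_tags` and the two toy forecaster classes are hypothetical names made up for this example, and the descriptions in the register are placeholders.

```python
# Hypothetical register: (tag name, applicable scitypes, plain-English description).
TAG_REGISTER = [
    ("handles_missing_data", ["forecaster", "annotator"], "can handle missing values"),
    ("multivariate", ["forecaster"], "can handle multivariate series"),
]


def all_tags(scitype=None):
    """Return register rows, optionally restricted to one scitype."""
    return [row for row in TAG_REGISTER if scitype is None or scitype in row[1]]


class BaseForecaster:
    # Default tag values live on the base class.
    _tags = {"handles_missing_data": False, "multivariate": False}

    @classmethod
    def get_class_tags(cls):
        """Merge _tags along the inheritance chain; children override parents."""
        tags = {}
        for klass in reversed(cls.__mro__):
            # Only look at tags declared directly on each class.
            tags.update(klass.__dict__.get("_tags", {}))
        return tags


class NaiveLikeForecaster(BaseForecaster):
    pass  # inherits all defaults


class VarLikeForecaster(BaseForecaster):
    _tags = {"multivariate": True}  # override only the non-default tag


def estimators_with_tags(estimators, **required):
    """Filter a list of estimator classes by required tag values."""
    return [
        est
        for est in estimators
        if all(est.get_class_tags().get(k) == v for k, v in required.items())
    ]


multivariate_forecasters = estimators_with_tags(
    [NaiveLikeForecaster, VarLikeForecaster], multivariate=True
)
```

Keeping the register as plain rows means docs and the "which tags apply to which scitypes" lookup can both be generated from the same source.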

2021-06-24 discussion points on type-conversions and multivariate forecasting

(written by LA, and so represents their perspective/their biases) Discussion focussed on this PR https://github.com/alan-turing-institute/sktime/pull/980.

Points of agreement:

IO type conversion is fundamentally separate from multivariate forecasting (though they are related)
We should have some sort of wrapper that converts univariate forecasters into multivariate forecasters. But how precisely this is done needs to be determined
- (Thought by LA later: this sounds like a reduction of multivariate forecasting to univariate forecasting. This suggests a reduction-style interface, where the user creates a multivariate forecaster using a reduction function and a univariate forecaster object. Separately, we can have reductions from multivariate forecasting to tabular regression)
I think there is agreement that it is a bad idea to re-write all forecasters so that they only use methods that can deal with both pandas dataframes and pandas series.
- Note that even in the example Markus showed, there was an if statement to distinguish between dataframe and series in one of the imputations
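
One way to picture the univariate-to-multivariate wrapper discussed above is an "apply-per-column" reduction: fit one copy of a univariate forecaster per column. The sketch below is an assumption-laden toy, not a design proposal: `ColumnwiseForecaster` and `LastValueForecaster` are hypothetical names, and a dict of lists stands in for a pandas dataframe to keep the example self-contained.

```python
import copy


class LastValueForecaster:
    """Toy univariate forecaster: repeats the last observed value."""

    def fit(self, y):
        self.last_ = y[-1]
        return self

    def predict(self, n_steps):
        return [self.last_] * n_steps


class ColumnwiseForecaster:
    """Hypothetical wrapper: one copy of a univariate forecaster per column."""

    def __init__(self, forecaster):
        self.forecaster = forecaster

    def fit(self, Y):
        # Y maps column name -> list of observations (stand-in for a dataframe).
        self.forecasters_ = {
            name: copy.deepcopy(self.forecaster).fit(values)
            for name, values in Y.items()
        }
        return self

    def predict(self, n_steps):
        # Each column is forecast independently by its own fitted copy.
        return {name: f.predict(n_steps) for name, f in self.forecasters_.items()}


Y = {"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0]}
forecasts = ColumnwiseForecaster(LastValueForecaster()).fit(Y).predict(2)
```

Even this toy leaves open the design questions raised in the discussion, e.g. per-column hyper-parameters or how exogenous data would be passed through.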

ML's concerns:

IO type conversion is a significant change, and ML wants more time for himself (and others) to fully think through the consequences.
- E.g. how large is the cost of the various conversions?
  - FK's counter to cost concerns: the only time IO type conversion creates extra cost is when there would otherwise be an error message.
- E.g. philosophy of allowing multiple internal types, rather than sticking to some single internal type
We want to create some sort of multivariate functionality quickly, quicker than the time needed to consider the IO type conversion question
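
FK's cost argument can be illustrated with a small sketch: if the input already has a type the estimator supports internally, it is passed through untouched, and a conversion only happens in the case that would otherwise raise an error. This is a hedged illustration and not the code in the PR; `convert_to_supported` and `CONVERTERS` are hypothetical names, and a list and a dict of lists stand in for a pandas series and dataframe.

```python
def _list_to_dict(y):
    """Univariate -> multivariate stand-in: wrap the series as a single column."""
    return {"y": list(y)}


def _dict_to_list(Y):
    """Multivariate -> univariate stand-in: only valid for a single column."""
    if len(Y) != 1:
        raise ValueError("can only convert a single-column input to univariate")
    return list(next(iter(Y.values())))


# Hypothetical converter lookup, keyed by (from type, to type).
CONVERTERS = {
    (list, dict): _list_to_dict,
    (dict, list): _dict_to_list,
}


def convert_to_supported(y, supported_types):
    """Return y unchanged if its type is supported, else convert it."""
    if type(y) in supported_types:
        return y  # supported already: no conversion, no extra cost
    for target in supported_types:
        converter = CONVERTERS.get((type(y), target))
        if converter is not None:
            return converter(y)
    raise TypeError(f"cannot convert {type(y).__name__} to a supported type")


y = [1.0, 2.0]
same = convert_to_supported(y, (list,))       # passed through unchanged
as_columns = convert_to_supported(y, (dict,))  # converted instead of erroring
```

The sketch does not address ML's other point, namely whether allowing several internal types is preferable to a single one.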

FK's concerns:

There is a risk of delaying the decision on input IO type conversion if we do not make a decision now or do not have an explicit plan for the decision to be made
Dev days week is the best time for FK to spend time on sktime and on big tasks, so a quicker decision will allow FK to do more

Ryan's thoughts:

IO type conversion is elegant, but needs time to think through consequences
Ryan: Not sure how big the intersection of multivariate and univariate algorithms is:
- believes that the majority of univariate forecasters will become multivariate by doing things column-by-column (i.e. via the reduction wrapper) b/c optimal hyper-params will vary by series (column).
- The multivariate algorithms that can be applied to univariate data typically do so when that algorithm can simplify to a univariate model.
  - Are there a lot of models where we wouldn't have a univariate model already?
  - What is the cost of raising an informative message to point users to correct univariate implementation?

LA's thoughts:

Should pause on IO type conversion to give ML (and others) time to consider the consequences
Believes that IO type conversion is a good idea, and should be implemented at some point.
(Variable names can be improved in the PR)
Didn't say this during discussion, but I don't think writing the multivariate wrapper is a quick task - lots of design decisions need to be made for it.

Guzal's thoughts:

Likes the logic of FK's IO conversion, but doesn't know enough to judge if ML's concerns are valid or not

Taiwo:

No thoughts on these issues
