STL decomposition #907

Merged
merged 17 commits into from
Jul 6, 2022
paxcema (Contributor) commented Jun 20, 2022

Why

To expand the set of tools that time series mixers can use where they prove useful.

How

This PR makes several changes, namely:

  • Big refactor in the TS transform step. We now default to using pandas date/time utilities, meaning that indexes are now proper timestamps (multi-index in the case of grouped tasks). This both simplifies the logic and makes it more robust for things like inferring new rows according to the observed series frequencies.
  • Introduces the NeuralTs mixer, which inherits from the vanilla Neural but adds time-series specific logic. This was done to keep specialized procedures separated from the (simpler) classification and regression tasks. JsonAI dispatch has been modified accordingly to make use of this.
  • Available time series groups at training time are now logged and registered inside the StatisticalAnalysis object.
  • Some simplifications to ConcatedEncodedDs, though we should probably remove this abstraction entirely (see #746, "Make EncodedDs interface simpler").
  • Moves a bunch of TS utilities to lightwood.helpers.ts.
  • The ts_analysis phase now finds (and fits) optimal deseasonalizer and detrender objects for each series available at train time, using Optuna for the search.
  • The respective methods that mixers can use to leverage these STL blocks are in mixers.helpers.ts
  • LightGBMArray, SkTime, and Prophet have been updated to use STL blocks, controllable by JsonAI args if needed. (NOTE: NeuralTs calls the blocks, but this has no effect because it uses encoded values inside the DS. This will be addressed in a new PR.)
  • Encoder TsNumericEncoder has been simplified to avoid storing the sign, instead simply applying the underlying normalizer. This seems to be especially useful for NeuralTs to achieve improved forecasts, but formal benchmarks are still pending.
  • Modifications to the analysis.nc block to work with all the above.
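
For readers unfamiliar with the idea behind the STL blocks, here is a minimal sketch of fitting a detrender and a deseasonalizer on a training series, handing the residual to a model, and inverting both steps afterwards. It is not the code in this PR: the class names `LinearDetrender` and `AdditiveDeseasonalizer` are hypothetical, the trend is assumed to be linear and the seasonality additive, and the period is fixed by hand instead of being searched with Optuna.

```python
from statistics import mean, pstdev


class LinearDetrender:
    """Hypothetical block: fits y ~ intercept + slope * t by least squares."""

    def fit(self, y):
        n = len(y)
        t_mean = (n - 1) / 2
        y_mean = mean(y)
        denom = sum((t - t_mean) ** 2 for t in range(n))
        self.slope = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / denom
        self.intercept = y_mean - self.slope * t_mean
        return self

    def _trend(self, start, length):
        return [self.intercept + self.slope * t for t in range(start, start + length)]

    def transform(self, y, start=0):
        return [v - tr for v, tr in zip(y, self._trend(start, len(y)))]

    def inverse_transform(self, y, start=0):
        return [v + tr for v, tr in zip(y, self._trend(start, len(y)))]


class AdditiveDeseasonalizer:
    """Hypothetical block: removes the mean value of each position in the cycle."""

    def __init__(self, period):
        self.period = period

    def fit(self, y):
        self.seasonal = [mean(y[i::self.period]) for i in range(self.period)]
        return self

    def _season(self, start, length):
        return [self.seasonal[t % self.period] for t in range(start, start + length)]

    def transform(self, y, start=0):
        return [v - s for v, s in zip(y, self._season(start, len(y)))]

    def inverse_transform(self, y, start=0):
        return [v + s for v, s in zip(y, self._season(start, len(y)))]


# Synthetic series: linear trend plus a repeating pattern of period 4.
pattern = [1.0, -1.0, 2.0, -2.0]
series = [2.0 + 0.5 * t + pattern[t % 4] for t in range(12)]

# Fit the detrender on the raw series, then the deseasonalizer on what is left.
detrender = LinearDetrender().fit(series)
detrended = detrender.transform(series)
deseasonalizer = AdditiveDeseasonalizer(period=4).fit(detrended)
residual = deseasonalizer.transform(detrended)

# A model would be trained on `residual`; its output is mapped back by
# inverting the blocks in reverse order.
restored = detrender.inverse_transform(deseasonalizer.inverse_transform(residual))

print("spread before:", pstdev(series), "after:", pstdev(residual))
```

The `start` argument lets the same fitted blocks be applied to a forecast horizon, for example `start=len(series)` for the first unseen step, so that predictions made in residual space get the right trend and seasonal offsets added back.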

@paxcema paxcema marked this pull request as ready for review July 6, 2022 22:25
@paxcema paxcema merged commit f3bc6aa into staging Jul 6, 2022
paxcema added a commit that referenced this pull request Jul 7, 2022
Fix #907: covers edge case with no group subset
paxcema added a commit that referenced this pull request Jul 8, 2022
@paxcema paxcema mentioned this pull request Jul 11, 2022