[ENH] HierarchyEnsembleForecaster for level- or node-wise application of forecasters on panel/hierarchical data #3905

Merged
merged 27 commits into from
Jan 28, 2023


VyomkeshVyas
Contributor

@VyomkeshVyas VyomkeshVyas commented Dec 8, 2022

Reference Issues/PRs

Fixes #2764

What does this implement/fix? Explain your changes.

HierarchyEnsembleForecaster() aggregates panel-type data and applies different univariate forecasters to the aggregated data, by hierarchical level or node. For aggregation, it employs sktime's built-in 'Aggregator' class.

Does your contribution introduce a new dependency? If yes, which one?

No

What should a reviewer concentrate their feedback on?

A reviewer should concentrate their feedback on the forecaster's ability to:

  • Fit a separate forecaster on each hierarchical level of the panel/hierarchical data, with and without exogenous data.
  • Fit a separate forecaster on each hierarchical node of the panel/hierarchical data, with and without exogenous data.
  • Fit a 'default' forecaster (if passed as argument) on nodes/levels not mentioned in the 'forecasters' argument.
  • Make predictions by each hierarchical level of the fitted aggregated panel data.
  • Make predictions by each hierarchical node of the fitted aggregated panel data.
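
As a rough illustration of the level-wise behaviour listed above, the sketch below resolves which forecaster applies at each hierarchy level and falls back to a default for levels that are not mentioned. It is not the PR's implementation: `assign_by_level` is a hypothetical helper, plain strings stand in for forecaster objects, and leaving unassigned levels out when no default is given is an assumption made only for this sketch.

```python
def assign_by_level(forecasters, n_levels, default=None):
    """Map each hierarchy level to a forecaster name.

    `forecasters` is a list of (name, forecaster, levels) tuples, mirroring
    the tuple format described in this PR. Hypothetical helper, not sktime API.
    """
    assignment = {}
    for name, _forecaster, levels in forecasters:
        for level in levels:
            assignment[level] = name
    # Levels not mentioned in `forecasters` fall back to the default, if given.
    # (Assumption for this sketch: without a default they are left unassigned.)
    if default is not None:
        for level in range(n_levels):
            assignment.setdefault(level, default)
    return assignment


# Strings stand in for forecaster instances to keep the sketch self-contained.
spec = [("naive", "NaiveForecaster", [0]), ("trend", "TrendForecaster", [1])]
result = assign_by_level(spec, n_levels=3, default="poly_default")
print(result)
```

Level 2 is not listed in `spec`, so it receives the default entry.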

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

@fkiraly
Collaborator

fkiraly commented Dec 10, 2022

neat! let us know when you would like a review

@VyomkeshVyas VyomkeshVyas marked this pull request as ready for review December 13, 2022 20:12
@VyomkeshVyas
Contributor Author

> neat! let us know when you would like a review

Hi @fkiraly, this PR is ready for review now. Thanks.

@fkiraly
Collaborator

fkiraly commented Dec 20, 2022

excellent! Will start the CI and we'll see if anything fails.

@fkiraly
Collaborator

fkiraly commented Dec 22, 2022

the doctest is failing, since it checks actual printout against expected.

When you run the line

>>> forecaster.fit(y, fh=[1, 2, 3])

the printout is HierarchyEnsembleForecaster(

You can catch that by adding the line

HierarchyEnsembleForecaster(...)

(like this, with three dots, and without the three >) directly after it.
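
The ellipsis trick can be tried out with the standard library alone. The sketch below is only an illustration: `ToyForecaster` is a hypothetical stand-in for the real estimator, and it assumes that the doctest runner has the `ELLIPSIS` option enabled, which is what lets `...` in the expected output match the actual repr.

```python
import doctest


class ToyForecaster:
    """Hypothetical stand-in whose fit returns self, like an estimator."""

    def __init__(self, by="level"):
        self.by = by

    def fit(self, y):
        return self

    def __repr__(self):
        return f"ToyForecaster(by={self.by!r})"


# Expected output uses "..." instead of spelling out the full repr.
DOC = """
>>> forecaster = ToyForecaster()
>>> forecaster.fit([1, 2, 3])
ToyForecaster(...)
"""


def run_doc(optionflags):
    """Run the docstring example and return (failed, attempted) counts."""
    test = doctest.DocTestParser().get_doctest(
        DOC, {"ToyForecaster": ToyForecaster}, "toy_example", None, 0
    )
    runner = doctest.DocTestRunner(verbose=False, optionflags=optionflags)
    # Discard the failure report; only the counts are of interest here.
    outcome = runner.run(test, out=lambda text: None)
    return outcome.failed, outcome.attempted


with_ellipsis = run_doc(doctest.ELLIPSIS)
without_ellipsis = run_doc(0)
print(with_ellipsis, without_ellipsis)
```

With `ELLIPSIS` enabled both examples pass; without it, the second example fails because the literal text `ToyForecaster(...)` differs from the actual repr.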

Collaborator

@fkiraly fkiraly left a comment


Great, looks good!

Some small things:

  • can you kindly fix the doctest? Explanation is above.
  • there is a merge conflict with the contributors file, kindly fix

@fkiraly
Collaborator

fkiraly commented Dec 23, 2022

hm, these look like genuine failures when you are trying to hash your node dict

@fkiraly
Collaborator

fkiraly commented Dec 23, 2022

recommendation: run your tests locally!

@fkiraly fkiraly changed the title [ENH] HierarchyEnsembleForecaster() for panel/hierarchical data [ENH] HierarchyEnsembleForecaster for panel/hierarchical data Dec 31, 2022
@VyomkeshVyas
Contributor Author

> recommendation: run your tests locally!

It's strange, but the tests didn't fail locally. I fixed the docstring error, but I am still not able to reproduce the unhashable type failure.
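
For readers unfamiliar with this class of error, here is a generic sketch of how an "unhashable type" failure can arise when nodes are used as dictionary keys. The thread does not show the failing code, so this is not a reconstruction of the actual bug; `build_lookup` and the node values are hypothetical.

```python
def build_lookup(nodes, name):
    """Map each node to a forecaster name; requires hashable nodes."""
    return {node: name for node in nodes}


# Tuples are hashable, so they work as dict keys.
nodes_as_tuples = [("a", "b"), ("a", "c")]
lookup = build_lookup(nodes_as_tuples, "naive")

# Lists are mutable and therefore unhashable: using them as keys raises TypeError.
nodes_as_lists = [["a", "b"], ["a", "c"]]
try:
    build_lookup(nodes_as_lists, "naive")
    error = None
except TypeError as exc:
    error = str(exc)

# Converting each node to a tuple makes it hashable again.
fixed = build_lookup([tuple(node) for node in nodes_as_lists], "naive")
print(error)
print(fixed == lookup)
```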

@fkiraly
Collaborator

fkiraly commented Jan 10, 2023

how odd. Should we try to run them again? I'll restart.

There is also a merge conflict in the contributors file, kindly update from main.

@VyomkeshVyas VyomkeshVyas reopened this Jan 10, 2023
Collaborator

@fkiraly fkiraly left a comment


These are genuine failures now.

Have you run check_estimator locally and tried to debug?

I would recommend that, see here:
https://www.sktime.org/en/stable/developer_guide/add_estimators.html

@VyomkeshVyas
Contributor Author

> These are genuine failures now.
>
> Have you run check_estimator locally and tried to debug?
>
> I would recommend that, see here: https://www.sktime.org/en/stable/developer_guide/add_estimators.html

Hi @fkiraly, I have updated the code and the majority of the issues are fixed. However, some tests (around 12) are still failing due to a common error: ValueError('Length of names must match number of levels in MultiIndex.').

Given how the check_estimator tests are currently designed, this error is unavoidable for one particular test instance. Here is why I think so.
I'll give a very brief overview of HierarchyEnsembleForecaster(). The forecaster first aggregates the data and then fits a separate forecaster either by level or by node. For that, it takes three arguments: 'by', 'forecasters' and 'default'.
'by' can be 'level' or 'node'; 'forecasters' can be a list of tuples (name, BaseForecaster, level/node) or a single BaseForecaster; and 'default' is a BaseForecaster (None if not specified).

The above error occurs in all test instances with by = 'node'. Ideally, each node passed in the 'forecasters' argument should have length N-1, where N is the number of levels of the multi-index data (the last level is assumed to be the timepoint index, hence N-1). The way the tests are designed, the forecaster would need nodes of different lengths in the 'forecasters' argument for different categories of data (e.g., univariate and panel data). Since nodes can only be specified once before the tests run, a specification that fits one category of data fails for the others.
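
The N-1 rule can be made concrete with a small sketch. The helper `node_matches_index` and the index level names are hypothetical and only illustrate why one fixed node specification cannot fit data with different numbers of index levels.

```python
def node_matches_index(index_names, node):
    """Return True if `node` has one entry per non-time index level."""
    # The last index level is assumed to be the timepoint index, hence N - 1.
    n_hierarchy_levels = len(index_names) - 1
    return len(node) == n_hierarchy_levels


# Hypothetical index layouts for two categories of test data.
panel_index = ["region", "time"]           # N = 2, nodes need length 1
hier_index = ["region", "store", "time"]   # N = 3, nodes need length 2

node = ("north", "store_1")  # a node specification of length 2

fits_hier = node_matches_index(hier_index, node)
fits_panel = node_matches_index(panel_index, node)
print(fits_hier, fits_panel)
```

The same node fits the three-level index but not the two-level one, which is the situation the test framework runs into when it passes every data category to the same parameter set.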

Could you please give some suggestions on how to handle this issue?

@fkiraly
Collaborator

fkiraly commented Jan 22, 2023

@VyomkeshVyas, sorry for the delay in the reply. I was fixing some merge conflicts and doctest errors in this PR so the tests would run through to the failures that you are referring to (hope that's ok). Once I can see the failures, I'll have a look.

@fkiraly
Collaborator

fkiraly commented Jan 23, 2023

> Could you please give some suggestions on how to handle this issue?

Thanks for the explanation!
I think this is precisely the issue. It's a specific instance where the forecaster requires assumptions on the data format that are stronger than the input contract.

Indeed, as the test framework is designed, all inputs are passed to all forecasters, which in this case upsets the new forecaster, as its parameters need to match the levels.

I see multiple solutions:

  • test the node case separately, not using the default framework. I.e., skip the appropriate framework tests by not including the parameters in get_test_params or skipping tests via tests/_config, and add manual tests instead
  • modify the forecaster so it does something sensible for data that doesn't match the node specification (e.g., nodes not present are simply ignored or similar)
  • an extension to the testing framework or estimators that allows compatibility checks between parameters and data - I have been thinking about this but this would require a bigger design (e.g., STEP), written by me, you or someone else, so it may be overkill for the problem at hand

@VyomkeshVyas
Contributor Author

@fkiraly Thanks for the suggestions! That's very helpful.
I have added new functionality for the case where the length of an individual node does not match the number of levels of the multi-index data. The fix seems to work well, as all the tests pass now, but I would like your opinion on whether it is the right solution. For example, suppose the data has multi-index (A, B, C, D), with D being the timepoint index, and the 'forecasters' argument is ('f', F, [(x, y)]). Then the forecaster F will be fitted to the data with (A, B) == (x, y), whereas previously the node would have had to be given in full, as (A, B, C) == (x, y, z).
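
The matching rule in this example can be sketched with plain tuples. This is an illustration of the idea only, not the code in the PR: `select_by_prefix` is a hypothetical helper, and the timepoint level is left out of the index entries for brevity.

```python
def select_by_prefix(index_tuples, node):
    """Return the index entries whose leading levels equal `node`."""
    node = tuple(node)
    k = len(node)
    return [idx for idx in index_tuples if idx[:k] == node]


# Hypothetical (A, B, C) index entries; the timepoint level D is omitted.
index = [
    ("x", "y", "z1"),
    ("x", "y", "z2"),
    ("x", "w", "z1"),
    ("u", "y", "z1"),
]

# A shortened node (x, y) selects every entry below it in the hierarchy.
matched = select_by_prefix(index, ("x", "y"))
# A full-length node selects exactly one entry, as before the change.
exact = select_by_prefix(index, ("x", "y", "z1"))
print(matched)
print(exact)
```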

@VyomkeshVyas
Contributor Author

I forgot to add aggregation levels in X, y in update and predict functions. I am working on it.

@fkiraly
Collaborator

fkiraly commented Jan 24, 2023

i.e., you went with option 2, right?
Makes sense, with the "ignore nodes" option.

> I forgot to add aggregation levels in X, y in update and predict functions. I am working on it.

Let us know when you think this is ready.

@VyomkeshVyas
Contributor Author

> i.e., you went with option 2, right? Makes sense, with the "ignore nodes" option.

Yes, with option 2. However, instead of ignoring the mismatched node, I am grouping all the nodes that are supersets of the mismatched node.

> I forgot to add aggregation levels in X, y in update and predict functions. I am working on it.

> Let us know when you think this is ready.

It's ready now. Thanks again.

Collaborator

@fkiraly fkiraly left a comment


Excellent! Impressive for a first contribution!

To be frank, when I saw the first version and the specs, I thought, "well this might end in sweat and tears", because it was very ambitious - hierarchical data, _HeterogenousMetaEstimator which is not easy to inherit from, the issue with "data must fit the parameters" which I also didn't fully see how to solve, a docstring which is difficult to write with formal accuracy, etcetera.

But none of that blocked you!
Absolutely impressive!!

Welcome to sktime, @VyomkeshVyas!
Way to make an entrance.

@VyomkeshVyas
Contributor Author

@fkiraly Thank you very much!!
It's been a great learning experience for me and I totally enjoyed it. A big shout out to @ciaran-g for continuous guidance, without which it might actually have ended in "sweat and tears".

@fkiraly
Collaborator

fkiraly commented Jan 27, 2023

> A big shout out to @ciaran-g for continuous guidance, without which it might actually have ended in "sweat and tears".

Well, that's why sktime is a community of contributors - to help reach the best of one's potential :-)

@fkiraly fkiraly changed the title [ENH] HierarchyEnsembleForecaster for panel/hierarchical data [ENH] HierarchyEnsembleForecaster for level- or node-wise application of forecasters on panel/hierarchical data Jan 28, 2023
@fkiraly fkiraly merged commit eb2ca69 into sktime:main Jan 28, 2023
Development

Successfully merging this pull request may close these issues.

[ENH] An ensemble type forecaster for panel/hierarchical data
3 participants