Tests to run and document and list for a forecaster #4762
Currently, the best public way I know is to run […]. The other way is to either look into […]

What do you mean by this, can you explain? Missing from where, under what condition?

Can you give a definition for these three terms, "applicable", "available", "missing"?
Let's consider a forecasting algorithm `ABC`. As a user, I expect `ABC` to support the following types of data: […] (if `scitype:y` is not `multivariate`).

(There may be more types if `y_inner_mtype` or `X_inner_mtype` allow. I have skipped those as I have not used them myself, and even a few of the above may be wrong due to my lack of familiarity.)

Now, for all of these types of data inputs, `ABC` should be able to do the following: […] (if `capability:pred_int` is `True`).

Now, the main interest of the user will be in the prediction methods, and `ABC`'s results for these must satisfy the following: […]

As a user, I think these are the main expectations I have from `ABC`.

(There are sub-cases, like whether missing values are allowed or not based on `handles-missing-data`, whether the index of the passed data is indexed only by time or by both level(s) and time, or whether passing exogenous data is useless based on `ignores-exogeneous-X`, and so on. There are likely other combinations which I have not used myself yet, but I am skipping those here for the sake of brevity.)
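For concreteness, all of the tags mentioned above can be read off an estimator instance. A minimal sketch, using `NaiveForecaster` as a stand-in for `ABC` and the standard `get_tags` / `get_tag` accessors:

```python
# Sketch: inspecting the tags that the expectations above depend on.
# NaiveForecaster is only a stand-in for ABC.
from sktime.forecasting.naive import NaiveForecaster

forecaster = NaiveForecaster()

print(forecaster.get_tags())  # full tag dictionary of the estimator

# individual tags referenced in this post
print(forecaster.get_tag("scitype:y"))            # e.g. "univariate"
print(forecaster.get_tag("capability:pred_int"))  # True / False
print(forecaster.get_tag("handles-missing-data"))
print(forecaster.get_tag("ignores-exogeneous-X"))
```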
Given these user expectations, as a developer I want to know what functionalities are being tested explicitly, and where. So, ideally, I am looking for a way to get a list of tests that are applicable for `ABC`, which should be based on just the tags of `ABC` and nothing else. And I am also looking for a way to get a list of tests that are missing, which should be based on `_config.py` only.
As of now, running `check_estimator` returns a huge dictionary, with keys of the form `<name of test function>[<name of estimator>-<values of fixtures>-<names and values of parameters>]`.

Personally, I am of the opinion that these names are not really human-friendly. One has to know the test function's signature to understand which values correspond to what, then be familiar with the fixture generation process (and if that is conditional on tags, even more familiarity with the testing flow is needed), etc.
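To illustrate the kind of post-processing currently needed, here is a small sketch that groups the raw `check_estimator` result keys by test function name. It assumes the dictionary format described above and the default behaviour of `check_estimator` from `sktime.utils.estimator_checks`; exact arguments may differ between versions.

```python
# Sketch: summarising check_estimator output per test function, assuming the
# result is a dict keyed as "<test function>[<estimator>-<fixtures>-<params>]".
from collections import defaultdict

from sktime.forecasting.naive import NaiveForecaster
from sktime.utils.estimator_checks import check_estimator

results = check_estimator(NaiveForecaster)  # dict of per-fixture outcomes

by_test = defaultdict(list)
for key, outcome in results.items():
    test_name, _, fixture_id = key.partition("[")
    by_test[test_name].append((fixture_id.rstrip("]"), outcome))

for test_name, runs in sorted(by_test.items()):
    print(f"{test_name}: {len(runs)} parameterisation(s)")
```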
So, what I am hoping for is a quick and easy way to get the list of applicable, available and missing tests for a given forecaster, e.g. `ABC`. I understand that this "difficulty" may just be mine, due to unfamiliarity, in which case it is not worth making any changes. But if others feel similarly (not just for forecasters, but also for other estimators, which I have not used at all myself), maybe some change in the testing structure will help.
I do not have an explicit suggestion on how this could be realised. One way may be to implement all possible tests (a superset, not a power set, of all "applicable" tests, assuming/hoping that tag functionalities are mostly independent) just once. These tests would be parameterised by the name and arguments of the estimator (or by an instance of the estimator), and not by other fixtures.
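To make the idea concrete, here is a hypothetical sketch of what such estimator-only parameterisation could look like; the names `ALL_FORECASTERS` and `test_pred_int_capability` are illustrative, not existing sktime tests, and applicability is decided inside the test from the tags.

```python
# Hypothetical sketch of tests parameterised only by the estimator;
# not an existing sktime test module.
import pytest

from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.trend import TrendForecaster

ALL_FORECASTERS = [NaiveForecaster, TrendForecaster]  # illustrative registry


@pytest.mark.parametrize("estimator_class", ALL_FORECASTERS)
def test_pred_int_capability(estimator_class):
    """Run or skip probabilistic checks based on the estimator's own tags."""
    estimator = estimator_class()
    if not estimator.get_tag("capability:pred_int"):
        pytest.skip("not applicable: capability:pred_int is False")
    # ... fit on suitable data and check predict_interval output here
```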
If this is agreed upon (I am just sharing spontaneous ideas here, not thought-out plans, so it is extremely likely that they do not cover all cases), it will be a huge effort that will probably need continuous and serious commitment from at least one developer to ensure code quality and consistency throughout, given the scope of work. That would be very difficult in open source, with everyone having other things to do.