Fix short data CV #70

MichalChromcak · 2022-04-01T14:03:36Z

In certain cases, the scorer currently stores `cv_data` incorrectly (see the split numbers in the output below: split 0 appears twice and the 2021-06-30 row is listed last). Plotting and evaluation functionality that relies on `cv_data` is partially affected as a result.

This MR aims to fix this.

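Example:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

from hcrystalball.model_selection import ModelSelector
from hcrystalball.wrappers import get_sklearn_wrapper

df = pd.DataFrame(
    {"target": list(range(7))},
    index=pd.date_range(start="2021-03-31", end="2021-09-30", freq="M"),
)
```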

```
            target
2021-03-31       0
2021-04-30       1
2021-05-31       2
2021-06-30       3
2021-07-31       4
2021-08-31       5
2021-09-30       6
```

```python
ms = ModelSelector(
    horizon=1,
    frequency="M",
)
ms.create_gridsearch(
    sklearn_models=False,
    n_splits=4,
    between_split_lag=None,
    sklearn_models_optimize_for_horizon=False,
    autosarimax_models=False,
    prophet_models=False,
    tbats_models=False,
    exp_smooth_models=False,
    average_ensembles=False,
    stacking_ensembles=False,
)
ms.add_model_to_gridsearch(get_sklearn_wrapper(LinearRegression, name="linreg_3", lags=3))
ms.add_model_to_gridsearch(get_sklearn_wrapper(LinearRegression, name="linreg_1", lags=1))

ms.select_model(
    df=df,
    target_col_name="target",
)

print(ms.results[0].cv_data)
```

```
            split  y_true  b80cee186b053880a84ec8d7c4692365  e474b1ddba8a0a6f849b49abf903a4e3
2021-07-31      0     4.0                               3.0                               3.0
2021-08-31      1     5.0                               5.0                               4.0
2021-09-30      2     6.0                               6.0                               5.0
2021-06-30      0     3.0                               NaN                               3.0
```
2021-06-30	0	3.0	NaN	3.0

MichalChromcak · 2022-04-01T14:04:26Z

@pavelkrizek FYI

codecov-commenter · 2022-04-02T07:41:44Z

Codecov Report

Merging #70 (24bbfb3) into master (11166bd) will decrease coverage by 0.06%.
The diff coverage is 90.24%.

```diff
@@            Coverage Diff             @@
##           master      #70      +/-   ##
==========================================
- Coverage   93.79%   93.73%   -0.07%     
==========================================
  Files          56       56              
  Lines        2853     2888      +35     
==========================================
+ Hits         2676     2707      +31     
- Misses        177      181       +4
```

| Impacted Files | Coverage Δ | |
|---|---|---|
| src/hcrystalball/wrappers/_sklearn.py | `94.80% <ø> (ø)` | |
| src/hcrystalball/metrics/_scorer.py | `90.66% <78.94%> (-4.50%)` | ⬇️ |
| tests/unit/metrics/test_scorer.py | `93.93% <100.00%> (+1.73%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eeb32b9...24bbfb3.

pavelkrizek · 2022-04-04T08:11:19Z

@MichalChromcak Thanks for catching and fixing the bug! Everything is good from my side, just the function results_to_cv_data is quite complex and it's hard to see what exactly is happening there, so a more descriptive docstring would be helpful.
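One way to describe what a function like `results_to_cv_data` has to do is the hypothetical helper `build_cv_data` below. It is not the hcrystalball implementation; its name and input layout (dicts keyed by split number) are assumptions for illustration. The key idea is that split numbers come from the split keys, so a model that failed on a split leaves NaN in that split instead of shifting the numbering of the remaining rows. The docstring shows the kind of description requested in the review.

```python
import numpy as np
import pandas as pd


def build_cv_data(y_true_per_split, preds_per_model):
    """Assemble cross-validation results into one long DataFrame.

    Hypothetical helper for illustration, not hcrystalball's implementation.

    Parameters
    ----------
    y_true_per_split : dict[int, pd.Series]
        Maps split number -> actual values of that split's test window.
    preds_per_model : dict[str, dict[int, pd.Series]]
        Maps model name -> {split number -> predictions}. A model that failed
        on a split simply has no entry for that split number.

    Returns
    -------
    pd.DataFrame
        Indexed by timestamp, with a ``split`` column, a ``y_true`` column and
        one column per model. Missing predictions are NaN; split numbers come
        from the split keys, never from the position of a model's results.
    """
    frames = []
    for split_no in sorted(y_true_per_split):
        y_true = y_true_per_split[split_no]
        frame = pd.DataFrame(
            {"split": split_no, "y_true": y_true.astype(float)}, index=y_true.index
        )
        for model_name, preds in preds_per_model.items():
            if split_no in preds:
                # reindex aligns predictions by timestamp, not by position
                frame[model_name] = preds[split_no].reindex(y_true.index).astype(float)
            else:
                # the model failed on this split: keep the row, mark it missing
                frame[model_name] = np.nan
        frames.append(frame)
    return pd.concat(frames)


# Toy inputs (illustrative values only): 4 one-step splits,
# and "linreg_3" has no result for split 0 because it failed there.
idx = pd.to_datetime(["2021-06-30", "2021-07-31", "2021-08-31", "2021-09-30"])
y_true = {i: pd.Series([float(i + 3)], index=idx[i:i + 1]) for i in range(4)}
preds = {
    "linreg_1": {i: pd.Series([float(i + 2)], index=idx[i:i + 1]) for i in range(4)},
    "linreg_3": {i: pd.Series([float(i + 3)], index=idx[i:i + 1]) for i in range(1, 4)},
}
cv_data = build_cv_data(y_true, preds)
print(cv_data)
```

Keying by split number rather than enumerating each model's results is what keeps the split column at 0, 1, 2, 3 in chronological order even when one model is missing a split.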


Michal Chromcak added 3 commits April 1, 2022 15:35

- Merge branch 'master' of github.com:MichalChromcak/hcrystalball (9c1208a)
- Merge branch 'master' of github.com:MichalChromcak/hcrystalball (262340e)
- fix bug in cv data for failing models (0cf8096)

MichalChromcak self-assigned this Apr 1, 2022

Michal Chromcak added 5 commits April 1, 2022 16:43

- update precommit hooks (0147bf7)
- add --show-source to flake8 lint in CI (91ee87a)
- fix split numbers (651a9cf)
- fix bugs (c72b805)
- rethink non-consecutive missing splits (495dc3c)
- add test (b9bc6ce)

Michal Chromcak added 3 commits April 4, 2022 16:23

- update documentation for results_to_cv_data, make results_to_cv_data private (24bbfb3)
- update endog to X in ARIMA and AutoARIMA (20f12fd)
- fix bug with np.array vs pd.series in scorer, fix some warnings (83f07cd)
