Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix short data CV #70

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

MichalChromcak
Copy link
Collaborator

@MichalChromcak MichalChromcak commented Apr 1, 2022

In certain cases, the current behaviour of the scorer stores the cv_data wrongly (see the split numbers). Further plotting and evaluation functionality is partially affected by that.

This MR aims to fix this.

Example:

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame(
    {"target":list(range(7))}, 
    index=pd.date_range(start="2021-03-31", end="2021-09-30", freq="M")
)
target
2021-03-310
2021-04-301
2021-05-312
2021-06-303
2021-07-314
2021-08-315
2021-09-306
ms = ModelSelector(
    horizon=1,
    frequency="M",
)
ms.create_gridsearch(
    sklearn_models=False,
    n_splits=4,
    between_split_lag=None,
    sklearn_models_optimize_for_horizon=False,
    autosarimax_models=False,
    prophet_models=False,
    tbats_models=False,
    exp_smooth_models=False,
    average_ensembles=False,
    stacking_ensembles=False,
)
ms.add_model_to_gridsearch(get_sklearn_wrapper(LinearRegression, name="linreg_3", lags=3))
ms.add_model_to_gridsearch(get_sklearn_wrapper(LinearRegression, name="linreg_1", lags=1))

ms.select_model(
    df=df,
    target_col_name="target",
)

print(ms.results[0].cv_data)
splity_trueb80cee186b053880a84ec8d7c4692365e474b1ddba8a0a6f849b49abf903a4e3
2021-07-3104.03.03.0
2021-08-3115.05.04.0
2021-09-3026.06.05.0
2021-06-3003.0NaN3.0

@MichalChromcak MichalChromcak self-assigned this Apr 1, 2022
@MichalChromcak
Copy link
Collaborator Author

@pavelkrizek FYI

@codecov-commenter
Copy link

codecov-commenter commented Apr 2, 2022

Codecov Report

Merging #70 (24bbfb3) into master (11166bd) will decrease coverage by 0.06%.
The diff coverage is 90.24%.

@@            Coverage Diff             @@
##           master      #70      +/-   ##
==========================================
- Coverage   93.79%   93.73%   -0.07%     
==========================================
  Files          56       56              
  Lines        2853     2888      +35     
==========================================
+ Hits         2676     2707      +31     
- Misses        177      181       +4     
Impacted Files Coverage Δ
src/hcrystalball/wrappers/_sklearn.py 94.80% <ø> (ø)
src/hcrystalball/metrics/_scorer.py 90.66% <78.94%> (-4.50%) ⬇️
tests/unit/metrics/test_scorer.py 93.93% <100.00%> (+1.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update eeb32b9...24bbfb3. Read the comment docs.

@pavelkrizek
Copy link
Contributor

@MichalChromcak Thanks for catching and fixing the bug! Everything is good from my side, just the function results_to_cv_data is quite complex and it's hard to see what exactly is happening there, so a more descriptive docstring would be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants