[BUG] Ensembles works incorrectly with HierarchicalPipeline #167

Closed
1 task done
alex-hse-repository opened this issue Nov 30, 2023 · 0 comments · Fixed by #177
Labels
bug Something isn't working


alex-hse-repository commented Nov 30, 2023

🐛 Bug Report

Ensembles fail to work with HierarchicalPipeline: they lose the hierarchical structure during forecast and don't use it in backtest.

Expected behavior

You can run fit/forecast/backtest without errors and without losing the hierarchical structure.

How To Reproduce

Code

import pandas as pd
from etna.metrics import SMAPE
from etna.models import SeasonalMovingAverageModel
from etna.pipeline import HierarchicalPipeline
from etna.ensembles import VotingEnsemble
from etna.datasets import TSDataset
from etna.reconciliation import BottomUpReconciliator


# Download data 'curl "https://robjhyndman.com/data/hier1_with_names.csv" --ssl-no-revoke -o "hier1_with_names.csv"'

df = pd.read_csv("hier1_with_names.csv")

# Prepare Dataframe

periods = len(df)
city_segments = list(filter(lambda name: name.count("-") == 2, df.columns))

df["timestamp"] = pd.date_range("2006-01-01", periods=periods, freq="MS")
df.set_index("timestamp", inplace=True)

hierarchical_df = []
for segment_name in city_segments:
    segment = df[segment_name]
    region, reason, city = segment_name.split(" - ")

    seg_df = pd.DataFrame(
        data={
            "timestamp": segment.index,
            "target": segment.values,
            "city_level": [city] * periods,
            "region_level": [region] * periods,
            "reason_level": [reason] * periods,
        },
    )
    hierarchical_df.append(seg_df)

hierarchical_df = pd.concat(hierarchical_df, axis=0)


# Create Dataset
hierarchical_df, hierarchical_structure = TSDataset.to_hierarchical_dataset(
    df=hierarchical_df, level_columns=["reason_level", "region_level", "city_level"]
)
hierarchical_ts = TSDataset(df=hierarchical_df, freq="MS", hierarchical_structure=hierarchical_structure)


# Create pipeline
pipeline_1 = HierarchicalPipeline(
    model=SeasonalMovingAverageModel(window=1, seasonality=1),
    reconciliator=BottomUpReconciliator(target_level="region_level", source_level="city_level"),
)
pipeline_2 = HierarchicalPipeline(
    model=SeasonalMovingAverageModel(window=1, seasonality=2),
    reconciliator=BottomUpReconciliator(target_level="region_level", source_level="city_level"),
)
pipeline_vote = VotingEnsemble(pipelines=[pipeline_1, pipeline_2])

bottom_up_metrics, _, _ = pipeline_vote.backtest(
    ts=hierarchical_ts, metrics=[SMAPE()], n_folds=3, aggregate_metrics=True
)

Error

ValueError: There are segments in y_pred that are not in y_true, for example: vfr_NT, vfr_QLD, hol_WA, hol_QLD, vfr_VIC
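
One plausible reading of this error, sketched below with plain dicts rather than etna objects: the predictions come back at the reconciled `region_level`, while the true values used for the metric are still at `city_level`, so the segment names do not match. The helpers `check_segments` and `aggregate_to_regions` and the city-level segment names are hypothetical and only illustrate the mismatch; they are not etna's implementation.

```python
# Illustration only: plain dicts stand in for forecast and target tables.
# Region names are taken from the error message; city names are made up.


def check_segments(y_true, y_pred):
    """Raise if y_pred has segments missing from y_true (mimics the error above)."""
    extra = sorted(set(y_pred) - set(y_true))
    if extra:
        raise ValueError(
            "There are segments in y_pred that are not in y_true, "
            f"for example: {', '.join(extra[:5])}"
        )


def aggregate_to_regions(city_values, city_to_region):
    """Sum city-level series into region-level series."""
    regions = {}
    for city, series in city_values.items():
        region = city_to_region[city]
        current = regions.get(region, [0.0] * len(series))
        regions[region] = [a + b for a, b in zip(current, series)]
    return regions


city_to_region = {
    "hol_WA_city1": "hol_WA",
    "hol_WA_city2": "hol_WA",
    "vfr_NT_city1": "vfr_NT",
}
y_true_city = {
    "hol_WA_city1": [1.0, 2.0],
    "hol_WA_city2": [3.0, 4.0],
    "vfr_NT_city1": [5.0, 6.0],
}
y_pred_region = {"hol_WA": [4.0, 6.0], "vfr_NT": [5.0, 6.0]}

# Comparing region-level predictions with city-level targets fails.
try:
    check_segments(y_true_city, y_pred_region)
    mismatch_message = None
except ValueError as err:
    mismatch_message = str(err)

# Aggregating the targets to the same level first makes the check pass.
y_true_region = aggregate_to_regions(y_true_city, city_to_region)
check_segments(y_true_region, y_pred_region)
```

Under this assumption, a backtest that is aware of the hierarchy would bring the true values to the target level before computing metrics.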

Environment

No response

Additional context

  1. HierarchicalPipeline has its own special backtest, and ensembles do not take it into account
  2. The hierarchical structure is lost inside VotingEnsemble
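
The second point can be illustrated with a toy sketch, assuming the structure is dropped when the averaged forecast is wrapped in a new dataset object. `ToyDataset`, `vote_naive`, and `vote_keep_hierarchy` are hypothetical names used only for this illustration; they do not reflect etna's actual classes or the fix in #177.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset with optional hierarchy metadata."""

    values: Dict[str, List[float]]
    hierarchy: Optional[Dict[str, List[str]]] = None


def average_values(forecasts):
    """Average the per-segment series of several forecasts point by point."""
    segments = forecasts[0].values.keys()
    return {
        seg: [
            sum(points) / len(forecasts)
            for points in zip(*(f.values[seg] for f in forecasts))
        ]
        for seg in segments
    }


def vote_naive(forecasts):
    # The new object is built from values only, so the hierarchy is dropped.
    return ToyDataset(values=average_values(forecasts))


def vote_keep_hierarchy(forecasts):
    # Passing the hierarchy along explicitly preserves it.
    return ToyDataset(
        values=average_values(forecasts),
        hierarchy=forecasts[0].hierarchy,
    )


hierarchy = {"total": ["hol_WA", "vfr_NT"]}
f1 = ToyDataset({"hol_WA": [1.0, 3.0], "vfr_NT": [2.0, 2.0]}, hierarchy)
f2 = ToyDataset({"hol_WA": [3.0, 5.0], "vfr_NT": [4.0, 6.0]}, hierarchy)

naive = vote_naive([f1, f2])
kept = vote_keep_hierarchy([f1, f2])
```

Here `naive.hierarchy` ends up as `None`, while `kept.hierarchy` still holds the original structure; the averaged values are identical in both cases.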

Checklist

  • Bug appears in the latest library version
@alex-hse-repository alex-hse-repository added the bug Something isn't working label Nov 30, 2023
@alex-hse-repository alex-hse-repository self-assigned this Nov 30, 2023
@alex-hse-repository alex-hse-repository moved this from New to Todo in etna board Nov 30, 2023
@alex-hse-repository alex-hse-repository moved this from Todo to In Progress in etna board Dec 5, 2023
@martins0n martins0n moved this from In Progress to In Review in etna board Dec 6, 2023
@github-project-automation github-project-automation bot moved this from In Review to Done in etna board Dec 8, 2023