Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ProphetModel to handle external timestamp #203

Merged
merged 12 commits into from Dec 26, 2023
Merged

Conversation

d-a-bunin
Copy link
Collaborator

@d-a-bunin d-a-bunin commented Dec 20, 2023

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Proposed Changes

Look #186.

Closing issues

Closes #186.

# Conflicts:
#	tests/test_models/test_inference/test_forecast.py
#	tests/test_models/test_inference/test_predict.py
@d-a-bunin d-a-bunin self-assigned this Dec 20, 2023
Copy link

github-actions bot commented Dec 20, 2023

🚀 Deployed on https://deploy-preview-203--etna-docs.netlify.app

@github-actions github-actions bot temporarily deployed to pull request December 20, 2023 15:07 Inactive
Copy link

codecov bot commented Dec 20, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (unaligned-data@4f3afd5). Click here to learn what that means.

Additional details and impacted files
@@                Coverage Diff                @@
##             unaligned-data     #203   +/-   ##
=================================================
  Coverage                  ?   89.88%           
=================================================
  Files                     ?      198           
  Lines                     ?    13231           
  Branches                  ?        0           
=================================================
  Hits                      ?    11893           
  Misses                    ?     1338           
  Partials                  ?        0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions github-actions bot temporarily deployed to pull request December 20, 2023 16:22 Inactive
if not pd.api.types.is_datetime64_dtype(df[self.timestamp_column]):
raise ValueError("Invalid timestamp_column! Only datetime type is supported.")

if len(df[self.timestamp_column]) >= 3 and pd.infer_freq(df[self.timestamp_column]) is None:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't check that frequency is always the same. For example, it works fine if we have one frequency for train and for test. In theory, prophet can work fine even if there no regular frequency, but I'm not sure should we support this case or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could probably infer frequency during train, for example, and check if it is as expected.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should check the freq and give a warning if the freq for the train is different from the freq for the test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont like the idea of warning. It will be thrown in every per-segment model, which seems too much.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is a bad idea to have different frequencies, so the only option is to fail in this situation.

# Conflicts:
#	CHANGELOG.md
#	tests/test_models/test_inference/test_forecast.py
#	tests/test_models/test_inference/test_predict.py
@github-actions github-actions bot temporarily deployed to pull request December 21, 2023 15:38 Inactive
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we don't add this code to tests/conftest.py ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, it seems specific for inference tests. I'll think about moving it higher.

@@ -15,14 +16,26 @@
from tests.test_models.utils import assert_sampling_is_valid


@pytest.fixture
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code duplication (like in tests/test_models/test_inference/conftest.py)

@martins0n
Copy link
Collaborator

@d-a-bunin What do you think if Prophet would pick timestamp implicitly?
We can choose timestamp column by default. If timestamp column is of type int, we should choose the next timestamp typed column. If there are multiple timestamp typed columns - choose the first one.

And we should specify this implicit logic

@d-a-bunin
Copy link
Collaborator Author

Where do you think this could be useful? I can see it useful for working with pre-defined models on different datasets, e.g. in auto-ml there set of configs is pre-defined.

  • For now, it seems implicit, and I don't really like it. We could probably add such behavior in the future with some special parameter.
  • It still breaks if we have multiple timestamp columns.
  • We don't have such logic for DateFlagsTransform, TimeFlagsTransform and HolidayTransform.
  • I don't know how should we make auto-ml configs work in all those conditions.

We could make smth simple (like it works now) and easy-to-improve and then improve it according to our needs later.

@martins0n
Copy link
Collaborator

martins0n commented Dec 25, 2023

  • It still breaks if we have multiple timestamp columns.

No, if we choose random one

Where do you think this could be useful?

It's just simpler to support.

You've changed a lot of code here. Do we really want to change so many places just for supporting corner case?

We don't have such logic for DateFlagsTransform, TimeFlagsTransform and HolidayTransform.

Because we already have similiar beahivour for other transforms

@d-a-bunin
Copy link
Collaborator Author

No, if we choose random one

It works unpredictable, I don't think it is a good idea. Moreover, you suggested selecting the first one in the first message.

It's just simpler to support.

I don't think so. We have to make the same code changes + add logic for automatic column detection, which selects timestamp_column automatically instead of manually. I think that if we are going to provide some automatic selection we should also provide ability of a manual selection.

Current solution could be extended into automatic in the future by adding, e.g. parameter timestamp_mode with two possible values: "auto", "manual" with default "manual" to save current behavior.

@github-actions github-actions bot temporarily deployed to pull request December 26, 2023 07:36 Inactive
@github-actions github-actions bot temporarily deployed to pull request December 26, 2023 08:39 Inactive
@github-actions github-actions bot temporarily deployed to pull request December 26, 2023 08:59 Inactive
@d-a-bunin d-a-bunin merged commit 1e9cb9d into unaligned-data Dec 26, 2023
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants