Update documentation with new tutorial about working with misaligned data #288

Merged
d-a-bunin merged 7 commits into master from issue-277 on Apr 1, 2024

d-a-bunin (Collaborator) commented Mar 27, 2024

Before submitting (must do checklist)

  • Did you read the contribution guide?
  • Did you update the docs? We use Numpy format for all the methods and classes.
  • Did you write any new necessary tests?
  • Did you update the CHANGELOG?

Proposed Changes

See #277.

Closing issues

Closes #277.

@d-a-bunin d-a-bunin self-assigned this Mar 27, 2024

github-actions bot commented Mar 27, 2024

🚀 Deployed on https://deploy-preview-288--etna-docs.netlify.app

@github-actions github-actions bot temporarily deployed to pull request March 27, 2024 10:04 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 27, 2024 10:38 Inactive

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.92%. Comparing base (a0e4cb2) to head (af78a0d).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #288   +/-   ##
=======================================
  Coverage   89.91%   89.92%           
=======================================
  Files         200      200           
  Lines       13968    13971    +3     
=======================================
+ Hits        12560    12563    +3     
  Misses       1408     1408           

docs/source/glossary.rst (outdated, resolved)
docs/source/glossary.rst (outdated, resolved)
docs/source/glossary.rst (outdated, resolved)
docs/source/glossary.rst (resolved)
docs/source/glossary.rst (resolved)
review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:33Z
----------------------------------------------------------------

How does this logic fit with the inference strategies?


d-a-bunin commented on 2024-03-28T11:26:07Z
----------------------------------------------------------------

I don't really get what you mean by inference strategies here; can you elaborate?

alex-hse-repository commented on 2024-03-29T05:50:23Z
----------------------------------------------------------------

For example, we can fit the pipeline on misaligned data and run inference on an aligned dataset.

d-a-bunin commented on 2024-03-29T07:25:16Z
----------------------------------------------------------------

It isn't that simple. Working with misaligned data is implemented by working with an integer timestamp, that's all.

You can fit on data with an integer timestamp where some segments are misaligned with others (e.g. they are really old) and later run inference on the subset of segments that you want to forecast.
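
A rough, library-free sketch of the idea in the comment above. It is not code from this PR and does not use the etna API: the helper names `infer_last_timestamps` and `to_integer_index`, the daily step, and the convention that each segment's last observation maps to integer 0 are all assumptions made for illustration.

```python
from datetime import date, timedelta


def infer_last_timestamps(series):
    """Map each segment to its last observed timestamp (hypothetical helper)."""
    return {segment: max(points) for segment, points in series.items()}


def to_integer_index(series, alignment, step=timedelta(days=1)):
    """Re-index every segment so that its alignment timestamp becomes 0.

    Earlier points get negative integers. Regular spacing of `step` is assumed.
    """
    aligned = {}
    for segment, points in series.items():
        anchor = alignment[segment]
        aligned[segment] = {(ts - anchor) // step: value for ts, value in points.items()}
    return aligned


# One very old segment and one recent segment: misaligned in calendar time.
series = {
    "old": {date(2020, 1, 1): 1.0, date(2020, 1, 2): 2.0, date(2020, 1, 3): 3.0},
    "new": {date(2024, 3, 1): 10.0, date(2024, 3, 2): 11.0, date(2024, 3, 3): 12.0},
}

alignment = infer_last_timestamps(series)
aligned = to_integer_index(series, alignment)

# After alignment both segments share the integer index -2, -1, 0,
# so a model could be fit on all of them together.
# Inference can then be limited to the segments that still matter.
segments_to_forecast = ["new"]
subset = {segment: aligned[segment] for segment in segments_to_forecast}
```

The design point is that the calendar timestamps stop being the index and survive only as extra information (here, the `alignment` mapping), which is why old and recent segments can share one training set.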

alex-hse-repository commented on 2024-03-29T08:54:07Z
----------------------------------------------------------------

Let's add an example for this scenario: "pipeline on misaligned data and run inference on aligned dataset".

review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:34Z
----------------------------------------------------------------

Which combinations of alignment and "regularity" do we support?

  1. aligned + regular
  2. aligned + irregular
  3. misaligned + regular
  4. misaligned + irregular


d-a-bunin commented on 2024-03-28T11:25:22Z
----------------------------------------------------------------

  • aligned + regular: supported natively
  • aligned + irregular: supported* using int index + external timestamp, but we don't have any guarantees that it will work fine
  • misaligned + regular: supported using int index + external timestamp
  • misaligned + irregular: supported* using int index + external timestamp, but we don't have any guarantees that it will work fine
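
To make the four cases concrete, here is a small stdlib-only sketch that classifies a dataset and returns the support level from the list above. The definitions are assumptions for illustration, not taken from the etna code: "aligned" is taken to mean that all segments end at the same timestamp, and "regular" that consecutive timestamps always differ by one fixed step. The function names are hypothetical.

```python
from datetime import date, timedelta


def is_aligned(series):
    """True when all segments end at the same timestamp (assumed definition)."""
    return len({max(points) for points in series.values()}) == 1


def is_regular(series, step):
    """True when consecutive timestamps in every segment differ by exactly `step`."""
    for points in series.values():
        ordered = sorted(points)
        if any(later - earlier != step for earlier, later in zip(ordered, ordered[1:])):
            return False
    return True


def support_level(series, step=timedelta(days=1)):
    """Return the support level described in the list above."""
    aligned, regular = is_aligned(series), is_regular(series, step)
    if aligned and regular:
        return "native"
    if regular:
        return "int index + external timestamp"
    # Irregular data, aligned or not: workable, but without guarantees.
    return "int index + external timestamp (no guarantees)"


recent = [date(2024, 3, 2), date(2024, 3, 3)]
aligned_regular = {"a": [date(2024, 3, 1), date(2024, 3, 2), date(2024, 3, 3)], "b": recent}
aligned_irregular = {"a": [date(2024, 3, 1), date(2024, 3, 3)], "b": recent}
misaligned_regular = {"a": [date(2020, 1, 1), date(2020, 1, 2)], "b": recent}
misaligned_irregular = {"a": [date(2020, 1, 1), date(2020, 1, 5)], "b": recent}
```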

alex-hse-repository commented on 2024-03-29T05:51:49Z
----------------------------------------------------------------

Is this helpful info? What do you think?

review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:35Z
----------------------------------------------------------------

We need to somehow visualize that the series are now misaligned.


review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:36Z
----------------------------------------------------------------

  1. Maybe highlight "integer timestamp"
  2. "a special utilities" -> "special utilities"

review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:37Z
----------------------------------------------------------------

I don't get why we need the future_steps parameter. Can't we extend any timestamp to infinity?


d-a-bunin commented on 2024-03-28T11:15:31Z
----------------------------------------------------------------

These external timestamps sit inside df_exog, so they are limited by the size of df_exog. If we want to extend them to infinity, we need different logic.

alex-hse-repository commented on 2024-03-29T05:54:15Z
----------------------------------------------------------------

Looks like a parameter that I would always set to 100000.

d-a-bunin commented on 2024-03-29T07:25:47Z
----------------------------------------------------------------

If you have a lot of memory, feel free to do so :)
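
The trade-off discussed in this thread can be sketched without etna. The snippet below is an illustration under assumptions, not the library's implementation: `make_external_timestamps` is a hypothetical helper, the step is daily, and integer index 0 is assumed to correspond to the last observed timestamp. It shows why the future horizon has to be chosen up front: the external timestamps are materialized as a finite table, so its size grows with `future_steps`.

```python
from datetime import date, timedelta


def make_external_timestamps(last_timestamp, history_len, future_steps, step=timedelta(days=1)):
    """Return {integer index: original timestamp} for the history plus `future_steps` ahead.

    Index 0 is assumed to correspond to `last_timestamp`.
    """
    first_index = -(history_len - 1)
    return {i: last_timestamp + i * step for i in range(first_index, future_steps + 1)}


# Three observed points and a horizon of two steps into the future.
table = make_external_timestamps(date(2024, 3, 3), history_len=3, future_steps=2)

# The table holds history_len + future_steps rows per segment,
# so a very large future_steps costs memory proportionally.
rows_needed = len(table)
```

Any forecast beyond index `future_steps` has no external timestamp available, which is the limitation the parameter makes explicit.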

review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:37Z
----------------------------------------------------------------

We need to highlight the blocks with different utilities, maybe by adding subsections.


review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:38Z
----------------------------------------------------------------

What about models where "we should also pass a parameter timestamp_column to work"? It would be better to add an example for Prophet.


review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:39Z
----------------------------------------------------------------

It would be better to add more examples of such transforms.


review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:40Z
----------------------------------------------------------------

Line #5.    date_flags = DateFlagsTransform(

Maybe separate this transform from the others to highlight that we set in_column to "external_timestamp".


review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:40Z
----------------------------------------------------------------

What about forecasting? I want to forecast misaligned time series and get forecasts with the original index; how can I do it?

Can we plot forecasts in the original index?


d-a-bunin commented on 2024-03-28T11:22:19Z
----------------------------------------------------------------

You are forecasting values with an integer index; the original index is just a feature.

If you have this feature in your forecast, you know which value corresponds to which timestamp.

If you don't have this feature in your forecast, you could use make_timestamp_df_from_alignment to recreate it, I think. I haven't really thought about this problem. Do you have any ideas about how it should work for the user?

No, currently we can't plot forecasts with the original timestamps.
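
As a hedged illustration of "recreating" the original timestamps for a forecast, the sketch below maps integer indices back to calendar dates per segment. It does not call make_timestamp_df_from_alignment and makes no claim about how that function works: `restore_timestamps` is a hypothetical helper, and the daily step and the "index 0 equals the segment's alignment timestamp" convention are assumptions.

```python
from datetime import date, timedelta


def restore_timestamps(forecast, alignment, step=timedelta(days=1)):
    """Convert {segment: {int index: value}} into {segment: {timestamp: value}}."""
    return {
        segment: {alignment[segment] + i * step: value for i, value in points.items()}
        for segment, points in forecast.items()
    }


# Alignment: the original timestamp that integer index 0 stands for in each segment.
alignment = {"old": date(2020, 1, 3), "new": date(2024, 3, 3)}

# A forecast expressed in the shared integer index.
forecast = {"old": {1: 3.5}, "new": {1: 12.5, 2: 13.0}}

restored = restore_timestamps(forecast, alignment)
```

Note that the same integer index lands on different calendar dates in different segments, which is part of why plotting forecasts on the original time axis is not straightforward.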

alex-hse-repository commented on 2024-03-29T05:57:41Z
----------------------------------------------------------------

  1. I don't know the use case for forecasting misaligned data. I thought we did this track to handle cases where we want to train on misaligned data and forecast on aligned data.
  2. Maybe we can add such functionality (plotting forecasts), but I'm not sure; see (1).

d-a-bunin commented on 2024-03-29T07:28:04Z
----------------------------------------------------------------

Let's assume we have data where some segments are really old and don't have to be forecasted anymore. We can align them with the new segments and fit our pipeline. After that, we can use that pipeline to forecast only the recent segments that we are working with.

In that scenario, we are able to use the patterns that we learned from the old segments.

review-notebook-app bot commented Mar 28, 2024

alex-hse-repository commented on 2024-03-28T10:17:41Z
----------------------------------------------------------------

Maybe we can add an example here?


@github-actions github-actions bot temporarily deployed to pull request March 28, 2024 13:49 Inactive

@alex-hse-repository alex-hse-repository left a comment

See comments

@github-actions github-actions bot temporarily deployed to pull request March 29, 2024 11:43 Inactive
@github-actions github-actions bot temporarily deployed to pull request March 29, 2024 12:00 Inactive
@d-a-bunin d-a-bunin merged commit 0b2112f into master Apr 1, 2024
16 checks passed
@d-a-bunin d-a-bunin deleted the issue-277 branch April 1, 2024 09:19

Successfully merging this pull request may close these issues.

Update documentation after "Handling unaligned data" track
2 participants