Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Time Series Split #163

Merged
merged 9 commits into from
Apr 9, 2020
Merged

Time Series Split #163

merged 9 commits into from
Apr 9, 2020

Conversation

Srinidhi-Patil
Copy link
Contributor

@Srinidhi-Patil Srinidhi-Patil commented Apr 2, 2020

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Time Series Cross-Validation:
Implemented Time Series Split and Blocking Time Series Split with Tests

Any other comments?


def time_series_split(time_series: pd.DataFrame, n_splits=4, split_on='index'):
"""
Split the input DataFrame into n_splits. If the data is not a timeries then the split
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

gtime/model_selection/cross_validation.py Outdated Show resolved Hide resolved
gtime/model_selection/cross_validation.py Show resolved Hide resolved
gtime/model_selection/cross_validation.py Outdated Show resolved Hide resolved
gtime/model_selection/cross_validation.py Outdated Show resolved Hide resolved
gtime/model_selection/tests/test_cross_validation.py Outdated Show resolved Hide resolved
gtime/model_selection/tests/test_cross_validation.py Outdated Show resolved Hide resolved
gtime/model_selection/cross_validation.py Outdated Show resolved Hide resolved
next_date = start_date + pd.Timedelta(split_length)

for split in range(n_splits - 1):
time_fold = time_series[(time_series.index >= start_date) & (time_series.index < next_date)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do these operations diectly on the index, so that you don't copy the whole time series

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by 'copy the whole time series' ?
Are you suggesting that I should not assign it and directly send the yield like below

for split in range(n_splits - 1):
     yield time_series[(time_series.index >= start_date) & (time_series.index < next_date)].index
     next_date += pd.Timedelta(split_length)
yield time_series[0:].index

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First select the index and then compute the folds on it.
This line: time_series[(time_series.index >= start_date) & (time_series.index < next_date)] copies the all dataframe, data included

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


def time_series_split(time_series: pd.DataFrame, n_splits=4, split_on='index'):
"""
Split the input DataFrame into n_splits. If the data is not a timeries then the split
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, it is not super clear the description

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@deatinor deatinor merged commit 380d356 into giotto-ai:master Apr 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants