-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Time Series Split #163
Time Series Split #163
Conversation
|
||
def time_series_split(time_series: pd.DataFrame, n_splits=4, split_on='index'): | ||
""" | ||
Split the input DataFrame into n_splits. If the data is not a timeries then the split |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
next_date = start_date + pd.Timedelta(split_length) | ||
|
||
for split in range(n_splits - 1): | ||
time_fold = time_series[(time_series.index >= start_date) & (time_series.index < next_date)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do these operations diectly on the index, so that you don't copy the whole time series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by 'copy the whole time series' ?
Are you suggesting that I should not assign it and directly send the yield like below
for split in range(n_splits - 1):
yield time_series[(time_series.index >= start_date) & (time_series.index < next_date)].index
next_date += pd.Timedelta(split_length)
yield time_series[0:].index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First select the index and then compute the folds on it.
This line: time_series[(time_series.index >= start_date) & (time_series.index < next_date)]
copies the all dataframe, data included
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
||
def time_series_split(time_series: pd.DataFrame, n_splits=4, split_on='index'): | ||
""" | ||
Split the input DataFrame into n_splits. If the data is not a timeries then the split |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, it is not super clear the description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Implemented Blocking Time Series Split
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Time Series Cross-Validation:
Implemented Time Series Split and Blocking Time Series Split with Tests
Any other comments?