Feature: Panel splitter classes #1220
3c90dad to 1ef087b
Codecov Report
Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #1220      +/-   ##
===========================================
+ Coverage    75.97%   76.35%   +0.37%
===========================================
  Files           60       60
  Lines         5345     5349       +4
===========================================
+ Hits          4061     4084      +23
+ Misses        1284     1265      -19
da64d8c to 55b57f8
        self.n_splits = n_splits
        self.n_folds = n_splits + 1

    def split(
- Add return type hints
To be reviewed on my end now.
cc27bac to 7a10702
7a10702 to 44ce2b7
            sorted(Xy.index.get_level_values(0).unique())
        )
        real_dates = Xy.index.get_level_values(1).unique().sort_values()
Add:
freq_est = pd.infer_freq(real_dates)
freq_offset = pd.tseries.frequencies.to_offset(freq_est)
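A minimal sketch of what the two suggested lines do, using a hypothetical daily index in place of the panel's real_dates. Note that pd.infer_freq returns None when it cannot find a regular frequency (for example, it may do so for business-day data with holiday gaps), so the explicit None check here is an assumption about how the caller might want to handle that case, not something taken from this PR.

```python
import pandas as pd

# Hypothetical daily date index standing in for real_dates.
real_dates = pd.date_range("2020-01-01", periods=5, freq="D")

# infer_freq returns a frequency string, or None when no regular
# frequency can be inferred (it needs at least three dates).
freq_est = pd.infer_freq(real_dates)

# Assumed handling: fail loudly rather than carry a None offset forward.
if freq_est is None:
    raise ValueError("Could not infer a regular frequency from real_dates.")

# Convert the frequency string into a DateOffset object.
freq_offset = pd.tseries.frequencies.to_offset(freq_est)

# Adding the offset to a timestamp steps forward by one period.
next_date = real_dates[0] + freq_offset
```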
        xranges_train: List[
            Tuple[pd.Timestamp, pd.Timedelta]
        ] = self._calculate_xranges(cs_train_dates, real_dates)
Add freq_offset as an argument to _calculate_xranges
        )

    def _calculate_xranges(
        self, cs_dates: pd.DatetimeIndex, real_dates: pd.DatetimeIndex
Add argument
:param <pd.DateOffset> freq_offset: DateOffset object representing the frequency of the dates in the panel.
        # A single contiguous range of dates.
        if len(difference) == 0:
            xranges.append(
                (cs_dates.min(), cs_dates.max() - cs_dates.min() + pd.Timedelta(days=1))
replace pd.Timedelta(days=1) with freq_offset
difference = difference[(difference >= cs_dates.min())]

xranges.append(
    (cs_dates.min(), cs_dates.max() - cs_dates.min() + pd.Timedelta(days=1))
replace pd.Timedelta(days=1) with freq_offset
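One possible way to apply this replacement, sketched with hypothetical business-day dates and a hypothetical BDay offset. Adding an offset such as a business day or month end directly to a Timedelta is not guaranteed to give a Timedelta, so this sketch adds the offset to the end timestamp first and subtracts afterwards. That ordering is an assumption on my part and may differ from what the PR ends up doing.

```python
import pandas as pd

# Hypothetical business-day dates for one cross-section.
cs_dates = pd.bdate_range("2020-01-01", periods=10)

# Hypothetical offset; in the PR this would come from the inferred frequency.
freq_offset = pd.offsets.BDay()

# Timestamp + DateOffset is defined for every offset type, so step the
# end date forward by one period first, then subtract to get the width.
width = (cs_dates.max() + freq_offset) - cs_dates.min()

# Same (start, width) shape as the tuples appended to xranges.
xrange_entry = (cs_dates.min(), width)
```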
The original PanelTimeSeriesSplit class is refactored into three classes:
- ForwardPanelSplit: PanelTimeSeriesSplit with n_split_method="expanding"
- KFoldPanelSplit: PanelTimeSeriesSplit with n_split_method="rolling"
- IntervalPanelSplit: PanelTimeSeriesSplit with train_intervals

The split() methods now yield the splits instead of returning a list of all splits.
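To illustrate the change from returning a list to yielding, here is a minimal sketch of an expanding-window splitter written as a generator. SketchForwardSplit, the toy panel, and the fold logic are all hypothetical and are not the implementation in this PR. The sketch only borrows n_folds = n_splits + 1 and the convention that index level 1 holds the dates.

```python
from typing import Iterator, Tuple

import numpy as np
import pandas as pd


class SketchForwardSplit:
    """Hypothetical expanding-window splitter; not the class from this PR."""

    def __init__(self, n_splits: int = 3) -> None:
        self.n_splits = n_splits
        self.n_folds = n_splits + 1

    def split(self, X: pd.DataFrame) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
        # Level 1 of the MultiIndex is assumed to hold the dates.
        dates = X.index.get_level_values(1)
        unique_dates = dates.unique().sort_values()
        # Partition the positions of the unique dates into n_folds chunks.
        positions = np.array_split(np.arange(len(unique_dates)), self.n_folds)
        for i in range(1, self.n_folds):
            train_end = unique_dates[positions[i - 1][-1]]
            test_dates = unique_dates[positions[i]]
            # Train on everything up to the previous fold, test on fold i.
            train_idx = np.flatnonzero(dates <= train_end)
            test_idx = np.flatnonzero(dates.isin(test_dates))
            yield train_idx, test_idx


# Toy panel: two cross-sections, eight daily dates each.
idx = pd.MultiIndex.from_product(
    [["AUD", "CAD"], pd.date_range("2020-01-01", periods=8, freq="D")],
    names=["cid", "real_date"],
)
X = pd.DataFrame({"x": np.arange(len(idx))}, index=idx)

splits = SketchForwardSplit(n_splits=3).split(X)  # generator, lazily evaluated
first_train, first_test = next(splits)  # only the first split is computed
remaining = list(splits)  # materialise the rest if a list is needed
```

Callers that relied on indexing into the returned list would need to wrap the call in list(...), as in the last line of the sketch.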