New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Doc Improve. Replace timeseries_split unit-tests with a simple test #4070
Doc Improve. Replace timeseries_split unit-tests with a simple test #4070
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a nice clean up patch, thanks a lot
I made all sorts of minor comments, let's address them and then this is ready to be merged if travis is happy
@@ -6,16 +6,16 @@ RegressionLabels labels(f_labels) | |||
|
|||
#![set parameters] | |||
int num_subsets = 5 | |||
int min_subset_size = 4 | |||
int future_steps = 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest future_offset
but that is minor
#![set parameters] | ||
|
||
#![build subsets] | ||
splitting.build_subsets() | ||
#![build subsets] | ||
|
||
#![generate subsets and inverse (aka test labels and train labels)] | ||
IntVector test_labels_indices = splitting.generate_subset_indices(1) | ||
#![generate subsets and inverse (aka vaildation set and train set)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tag for the cookbooks, not free form comment. So pls make it super short (see other examples)
} | ||
|
||
void CTimeSeriesSplitting::build_subsets() | ||
{ | ||
REQUIRE(m_num_subsets > 0, "Number of subsets should be greater than 0."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing newline
|
||
/* filling current with indices on right end */ | ||
/* filling current with indices on right side */ | ||
for (auto k = split_index; k < indices.vlen; ++k) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto k : Range(indices.vlen)
but it is minor
REQUIRE(min_size > 0, "Minimum subset size should be atleast 1.") | ||
/* future_steps should be less than the difference between number of labels | ||
* and split index of second last fold */ | ||
index_t future_steps_upperbound = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
max_future_steps
or better max_future_offset
* indices greater than a split index. The split indices are \f$ c[N/K] \f$ | ||
/** @brief Implements a timeseries splitting strategy for cross-validation, | ||
* respecting time. | ||
* Each fold splits timeseries into train (subset_inverse) and validation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls remove the (subset_inverse) here. This is documented in the base class
The newlines are awkwards, could you remove them?
* (subset) set. | ||
* Train set contains indices less than a split index | ||
* and validation set contains rest of the indices. | ||
* The split indices are \f$ C*floor(N/K) \f$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in latex math, you don't use * for multiplication but \cdot or just nothing
splitting->build_subsets(); | ||
|
||
for (index_t i = 0; i < num_subsets; ++i) | ||
labels = new CRegressionLabels(num_labels); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
labels = some(num_labels); and then you dont have to SG_UNREF later
SG_UNREF(splitting); | ||
for (auto i : range(12)) | ||
{ | ||
EXPECT_EQ(train_indices[i], i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this is a better test in fact!
@Sahil333 any update on this? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue is now being closed due to a lack of activity. Feel free to reopen it. |
No description provided.