Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Doc Improve. Replace timeseries_split unit-tests with a simple test #4070

Closed
wants to merge 1 commit into from
Closed

Doc Improve. Replace timeseries_split unit-tests with a simple test #4070

wants to merge 1 commit into from

Conversation

Sahil333
Copy link
Contributor

@Sahil333 Sahil333 commented Jan 7, 2018

No description provided.

Copy link
Member

@karlnapf karlnapf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice clean up patch, thanks a lot

I made all sorts of minor comments, let's address them and then this is ready to be merged if travis is happy

@@ -6,16 +6,16 @@ RegressionLabels labels(f_labels)

#![set parameters]
int num_subsets = 5
int min_subset_size = 4
int future_steps = 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest future_offset but that is minor

#![set parameters]

#![build subsets]
splitting.build_subsets()
#![build subsets]

#![generate subsets and inverse (aka test labels and train labels)]
IntVector test_labels_indices = splitting.generate_subset_indices(1)
#![generate subsets and inverse (aka vaildation set and train set)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tag for the cookbooks, not free form comment. So pls make it super short (see other examples)

}

void CTimeSeriesSplitting::build_subsets()
{
REQUIRE(m_num_subsets > 0, "Number of subsets should be greater than 0.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline


/* filling current with indices on right end */
/* filling current with indices on right side */
for (auto k = split_index; k < indices.vlen; ++k)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

auto k : Range(indices.vlen) but it is minor

REQUIRE(min_size > 0, "Minimum subset size should be atleast 1.")
/* future_steps should be less than the difference between number of labels
* and split index of second last fold */
index_t future_steps_upperbound =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

max_future_steps
or better max_future_offset

* indices greater than a split index. The split indices are \f$ c[N/K] \f$
/** @brief Implements a timeseries splitting strategy for cross-validation,
* respecting time.
* Each fold splits timeseries into train (subset_inverse) and validation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove the (subset_inverse) here. This is documented in the base class
The newlines are awkwards, could you remove them?

* (subset) set.
* Train set contains indices less than a split index
* and validation set contains rest of the indices.
* The split indices are \f$ C*floor(N/K) \f$
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in latex math, you don't use * for multiplication but \cdot or just nothing

splitting->build_subsets();

for (index_t i = 0; i < num_subsets; ++i)
labels = new CRegressionLabels(num_labels);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

labels = some(num_labels); and then you dont have to SG_UNREF later

SG_UNREF(splitting);
for (auto i : range(12))
{
EXPECT_EQ(train_indices[i], i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes this is a better test in fact!

@vigsterkr
Copy link
Member

@Sahil333 any update on this?

@stale
Copy link

stale bot commented Feb 26, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 26, 2020
@stale
Copy link

stale bot commented Mar 4, 2020

This issue is now being closed due to a lack of activity. Feel free to reopen it.

@stale stale bot closed this Mar 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants