New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changing splitting strategy to clean up Cross Validation API #4180
Changing splitting strategy to clean up Cross Validation API #4180
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx for the patch!
I think the constness of the train/validation methods should be achieved otherwise
@@ -23,7 +23,7 @@ CCrossValidationSplitting::CCrossValidationSplitting( | |||
m_rng = sg_rand; | |||
} | |||
|
|||
void CCrossValidationSplitting::build_subsets() | |||
void CCrossValidationSplitting::build_subsets() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method cannot be const, it changes members
@@ -33,7 +33,7 @@ CSplittingStrategy::CSplittingStrategy(CLabels* labels, int32_t num_subsets) | |||
reset_subsets(); | |||
} | |||
|
|||
void CSplittingStrategy::reset_subsets() | |||
void CSplittingStrategy::reset_subsets() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cannot be const ...
@@ -69,13 +69,11 @@ CSplittingStrategy::~CSplittingStrategy() | |||
SG_UNREF(m_subset_indices); | |||
} | |||
|
|||
SGVector<index_t> CSplittingStrategy::generate_subset_indices(index_t subset_idx) | |||
const SGVector<index_t> CSplittingStrategy::validation(index_t subset_idx) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good idea to make this method const (if things are precomputed)
However, there is a "build_subsets" call below, which cannot be const, so the current const doenst make sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the function arg can be const :)
*/ | ||
SGVector<index_t> generate_subset_indices(index_t subset_idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this renaming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comment above
|
||
/** @return number of subsets. */ | ||
index_t get_num_subsets() const; | ||
|
||
protected: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you make this method private. Why not call it in the constructor then? Then we can make the train/validation method const..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed that, thanks.
@@ -99,14 +96,14 @@ class CSplittingStrategy: public CSGObject | |||
CLabels* m_labels; | |||
|
|||
/** subset indices */ | |||
CDynamicObjectArray* m_subset_indices; | |||
mutable CDynamicObjectArray* m_subset_indices; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now what your thinking was.
I don't think this is the best style, making it mutable and the other things const.
There are simpler ways of reaching this goal imo
35357cb
to
694d066
Compare
Hi @karlnapf @vigsterkr ! Thanks for the review. I've changed the constness strategy and now build_subsets() is called in the constructors of the derived classes. After these changes, |
*/ | ||
SGVector<index_t> generate_subset_indices(index_t subset_idx); | ||
const SGVector<index_t> validation(const index_t subset_idx) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure why we want the return type to be const?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed this according to the Issue description. I think this shouldn't be a problem, given the possibility to return it to const or non-const variables. Should I remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be ok
@@ -45,6 +45,7 @@ CTimeSeriesSplitting::CTimeSeriesSplitting(CLabels* labels, index_t num_subsets) | |||
: CSplittingStrategy(labels, num_subsets) | |||
{ | |||
init(); | |||
build_subsets(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the method seem to be called in init
, why call it again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build_subsets() you mean? It isn't called on init() in this class.
@@ -99,6 +100,9 @@ void CTimeSeriesSplitting::set_min_subset_size(index_t min_size) | |||
"subsets and labels.", | |||
num_labels - (num_subsets - 1) * (num_labels / num_subsets) - 1); | |||
m_min_subset_size = min_size; | |||
|
|||
// Rebuild subsets considering the new minimum subset size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it is built twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In init() m_min_subset_size
is set to 1, which is then used by build_subsets() (called by the constructor). If set_min_subset_size() is called, we should rebuild the subsets. I've found this due to a failure of timeseries_subset_linear_splits
unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to avoid this. Maybe you can shift things around that it is only built ones .. ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way to achieve this would be to allow m_min_subset_size to be passed as an argument in the constructor (default value 1) and remove the method set_min_subset_size(). Do you think that would be a problem?
@@ -69,7 +69,8 @@ CSplittingStrategy::~CSplittingStrategy() | |||
SG_UNREF(m_subset_indices); | |||
} | |||
|
|||
SGVector<index_t> CSplittingStrategy::generate_subset_indices(index_t subset_idx) | |||
const SGVector<index_t> | |||
CSplittingStrategy::validation(const index_t subset_idx) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for const on index_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Added a few comments, the build seems to be fine. So tests passing. |
Any updates here? |
694d066
to
74f35a6
Compare
Hi @karlnapf ! Sorry for my late reply. I'm sending a new patch set changing build_subsets() in As far as I've checked, all python notebooks and meta examples for the new API were changed. |
SGVector<index_t> subset=splitting->generate_subset_indices(i); | ||
SGVector<index_t> inverse=splitting->generate_subset_inverse(i); | ||
SGVector<index_t> subset = splitting->validation(i); | ||
SGVector<index_t> inverse = splitting->train(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just realised a minor inconsistency. Should be called "training" if the counterpart is called "validation"
But I would even prefer "train" and "test" ..
/** custom rng if using cross validation across different threads */ | ||
CRandom * m_rng; | ||
|
||
protected: | ||
/** implementation of the standard cross-validation splitting strategy */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment seems weird to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree and will remove it.
This is almost ready. |
I like "train" and "test"! I'll make the changes and submit another patch set. Thanks! |
74f35a6
to
0a2a42a
Compare
Hi @karlnapf ! Any other comments in this PR? Can I squash the commits to clean it up? |
Thanks for the udpate and ping! :) Looks good! No need to squash as github can do that these days One thing I want to make sure.Did you run all unit and integration tests? |
@@ -21,6 +21,7 @@ CCrossValidationSplitting::CCrossValidationSplitting( | |||
CSplittingStrategy(labels, num_subsets) | |||
{ | |||
m_rng = sg_rand; | |||
build_subsets(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a problem here is that we can only call the default ctor from swig, then we may still need to call build_subset from the outside? @karlnapf
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue is now being closed due to a lack of activity. Feel free to reopen it. |
Changing splitting strategy to clean up Cross Validation API (#4049)
respectively, to validation() and train();
in the constructor of the derived classes;