Fix issue #3612 #4127
Conversation
src/shogun/kernel/CombinedKernel.cpp
auto rhs_subset =
	rhs_subset_stack->get_last_subset()->get_subset_idx();
// clear all previous subsets
((CCustomKernel*)k)->remove_all_col_subsets();
I think you shouldn’t clear the subsets here.
This should happen from outside (and only the last one) as otherwise you might remove stuff a user has set previously
I operated under the assumption that the subset should be kept in sync with the combined features used to init the combined kernel. Normally, as per my understanding, the subset for the subkernels is obtained from the features used to init them. (The init(CFeatures* l, CFeatures* r) method of the custom kernel in fact clears the subsets.)
As the custom kernel has no features in the first place, I attempted to replicate what an init() would do if it had features.
Alternative approach: I could instantiate CIndexFeatures using lhs_subset/rhs_subset, then init the custom kernel using these. This might actually be a better approach, on second thought.
Update: There seems to be an inconsistency within CCustomKernel::init(CFeatures* l, CFeatures* r).
Namely, if l & r are of type CIndexFeatures the normalizer doesn't get called; see the following block:
if (l->get_feature_class()==C_INDEX && r->get_feature_class()==C_INDEX)
{
	CIndexFeatures* l_idx = (CIndexFeatures*)l;
	CIndexFeatures* r_idx = (CIndexFeatures*)r;

	remove_all_col_subsets();
	remove_all_row_subsets();

	add_row_subset(l_idx->get_feature_index());
	add_col_subset(r_idx->get_feature_index());

	lhs_equals_rhs=m_is_symmetric;
	return true;
}

/* For other types of CFeatures do the default actions below */
CKernel::init(l, r);

lhs_equals_rhs=m_is_symmetric;

SG_DEBUG("num_vec_lhs: %d vs num_rows %d\n", l->get_num_vectors(), kmatrix.num_rows)
SG_DEBUG("num_vec_rhs: %d vs num_cols %d\n", r->get_num_vectors(), kmatrix.num_cols)
ASSERT(l->get_num_vectors()==kmatrix.num_rows)
ASSERT(r->get_num_vectors()==kmatrix.num_cols)

return init_normalizer();
Is this intentional or not?
Most things in there were written on demand for one specific use case, so probably not.
I think everything should be consistent between normal kernels and custom kernels.
So maybe we keep it like this.
Memory check is important before merging though
I'm not 100% certain what you mean re: consistency; could you please clarify, perhaps on IRC?
Sorry. I meant do it the same way for custom kernels as for normal kernels (clearing the stack)
OK, so keep the current method, rather than providing dummy index features and calling the custom kernel's init on those? (And fix the memory issues.)
Great, thanks for the fix!
Definitely useful!
Did you check your changes with valgrind? There are memory leaks since you don't unref some of the objects you obtain from getters.
I am a bit concerned about the clearing of subsets; as said, can't you remove the last one from outside the function in xvalidation?
A minimal unit test that checks the matrices as in your output would be super helpful.
…xes an issue in which combined kernels would not subset custom kernels if features other than combined features were passed. The latter is achieved by replacing the previous recursive call with a call to a helper function made aware of any subsets contained.
Update: the fix turned out to be a bit larger than expected, mainly because the added unit test revealed that subsetting a combined kernel wouldn't behave as expected if custom kernels were contained and the features passed during init were of a type other than combined features. To fix this it was necessary to make the method aware of any subsets present. Hence, instead of the past recursive call, there is now a call to a helper method which contains the bulk of the old init and additionally takes the subsets as arguments.
src/shogun/kernel/CombinedKernel.cpp
"No kernel matrix was assigned to this Custom kernel\n") | ||
|
||
// clear all previous subsets | ||
((CCustomKernel*)k)->remove_all_row_subsets(); |
instead of the explicit cast, could we do a dynamic_cast here, so that we can clearly get a bad_cast exception in case of some weird k?
src/shogun/kernel/CombinedKernel.cpp
if ( ((CCombinedFeatures*) l)->get_num_feature_obj()<=0 ||
	((CCombinedFeatures*) l)->get_num_feature_obj() != ((CCombinedFeatures*) r)->get_num_feature_obj() )
	SG_ERROR("CombinedKernel: Number of features/kernels does not match - bailing out\n")
if (((CCombinedFeatures*)l)->get_num_feature_obj() <= 0 ||
you can dynamic_cast these into a var and then use them... one would of course suppose that the compiler does this under the hood as well... but it might make the code easier to read...?
thnx for the fix!
src/shogun/kernel/CombinedKernel.cpp
@@ -84,6 +84,9 @@ bool CCombinedKernel::init_with_extracted_subsets(
	CFeatures* rf = NULL;
	CKernel* k = NULL;

        auto l_combined = dynamic_cast<CCombinedFeatures*>(l);
imo this will be a formatting error :(
oh that's odd, no clue why my editor suddenly switched to spaces instead of tabs :(
bool result=true;
CKernel::init(l, r);

REQUIRE(
if you require this, you could have asked for those types in the signature of the method, no?
CKernel::init(l, r);

REQUIRE(
	l->get_feature_class() == C_COMBINED,
	"%s::init(): LHS features are"
we actually don't do "CLASSNAME::METHODNAME():" anymore
bool result=true;
CKernel::init(l, r);

REQUIRE(
	l->get_feature_class() == C_COMBINED,
this segfaults if the caller passes nullptr
CFeatures* rf = NULL;
CKernel* k = NULL;

auto l_combined = dynamic_cast<CCombinedFeatures*>(l);
why dynamic cast if you don't check whether it worked?
	init_subkernel_weights();
}

/*
 * The two subsets, we will be passing those to init_with_extracted_subsets
usually, we try to not have comments like this, a bit superfluous, the code tells the story :)
SGVector<index_t> rhs_subset;

/*
 * We will be passing these features to init_with_extracted_subsets
same as above
CCombinedFeatures* combined_r;

/*
 * Extract the subsets so that we can pass them on
and same again
SG_UNREF(l_subset_stack);
SG_UNREF(r_subset_stack);

/* if the specified features are not combined features, but a single other
this is a good comment
else
{
	/*
	 * Otherwise, we just pass l & r straight on
this can go
TEST(CombinedKernelTest, test_subset_mixed)
{
	CMeanShiftDataGenerator* gen = new CMeanShiftDataGenerator(0, 2);
if you use auto gen = some<CMeanShiftDataGenerator>(0, 2);, then you don't need to SG_UNREF later on
{
	for (index_t k = 0; k < 10; ++k)
	{
		EXPECT_LE(
EXPECT_NEAR would be much cleaner
SGVector<index_t> inds(10);
inds.range_fill();

for (index_t i = 0; i < 10; ++i)
why not put the 10 into a variable with a meaningful name?
CMeanShiftDataGenerator* gen = new CMeanShiftDataGenerator(0, 2);
CFeatures* feats = gen->get_streamed_features(10);

CCombinedFeatures* cf = new CCombinedFeatures();
better name: feats_combined
(good names avoid the need for code comments)
combined->init(cf, cf);

CKernel* k_g = combined->get_kernel(1);
CKernel* k_0 = combined->get_kernel(0);
these names are confusing me
SGMatrix<float64_t> custom_matrix_1 = k_0->get_kernel_matrix();
SGMatrix<float64_t> custom_matrix_2 = k_3->get_kernel_matrix();

for (index_t j = 0; j < 10; ++j)
I think SGMatrix has an equals method
Thanks for the patch!
I have made some comments that could go into a clean-up PR that would greatly increase the readability of this code
Fix issue shogun-toolbox#3612: adds unit test functions for combined kernel subsetting behaviour. Fixes an issue in which combined kernels would not subset custom kernels if features other than combined features were passed. The latter is achieved by replacing the previous recursive call with a call to a helper function made aware of any subsets contained.
This PR addresses issue #3612 by allowing CombinedKernel init to properly subset CustomKernel subkernels. Hence CustomKernel should now work with MKL / x-validation.
Test code used to verify the fix:
Sample output for the test to verify subsetting works properly:
print(k1.get_kernel_matrix())
print(k2.get_kernel_matrix())
print(k1.get_kernel_matrix()-k2.get_kernel_matrix())