Fix issue #3612 #4127

Merged: 16 commits merged into shogun-toolbox:develop from feature/customkernel-subset-fix, Jan 29, 2018

Conversation

dhelekal (Contributor)

This PR addresses issue #3612 by allowing CombinedKernel init to properly subset CustomKernel subkernels. Hence CustomKernel should now work with MKL / cross-validation.

Test code used to verify the fix:

from shogun import *
from numpy import *

init_shogun_with_defaults()

# 10 random 10-dimensional training vectors with +/-1 labels
feats_train = RealFeatures(random.rand(10, 10))
labels = BinaryLabels(array([1, 1, 1, 1, 1, -1, -1, -1, -1, -1]))

# precompute a Gaussian kernel matrix and wrap it in a CustomKernel
kerg = GaussianKernel(2.0)
kerg.init(feats_train, feats_train)

ker1 = CustomKernel(kerg.get_kernel_matrix())
ker2 = GaussianKernel(2.0)

# combine the precomputed kernel with a regular Gaussian kernel
cker = CombinedKernel()
cker.append_kernel(ker1)
cker.append_kernel(ker2)

cft = CombinedFeatures()
cft.append_feature_obj(feats_train)
cft.append_feature_obj(feats_train)

cker.init(cft, cft)

# MKL classifier on top of the combined kernel
libsvm = LibSVM()
svm = MKLClassification(libsvm)
svm.set_interleaved_optimization_enabled(False)
svm.set_kernel(cker)
svm.set_C_mkl(1000)
svm.set_mkl_norm(1)

# 3-fold stratified cross-validation, repeated twice
splitting_strategy = StratifiedCrossValidationSplitting(labels, 3)
evaluation_criterium = AccuracyMeasure()
cross = CrossValidation(svm, cft, labels, splitting_strategy, evaluation_criterium)
cross.set_autolock(False)
cross.set_num_runs(2)

# observe the per-fold trained machines so the subkernels can be inspected
mkl_obs = ParameterObserverCV(True)
cross.subscribe_to_parameters(mkl_obs)

result = CrossValidationResult.obtain_from_generic(cross.evaluate())

# pull the trained machine of the first fold and compare its two subkernel
# matrices: if subsetting works, they should agree up to floating-point noise
obs = mkl_obs.get_observation(0)
fold = obs.get_fold(0)
machine = MKLClassification.obtain_from_generic(fold.get_trained_machine())

k = CombinedKernel.obtain_from_generic(machine.get_kernel())
k1 = k.get_kernel(0)
k2 = k.get_kernel(1)

print(k1.get_kernel_matrix())
print("\n")
print(k2.get_kernel_matrix())
print("\n")
print(k1.get_kernel_matrix() - k2.get_kernel_matrix())

Sample output for the test to verify subsetting works properly:
print(k1.get_kernel_matrix())

[[ 1.          0.50979882  0.49804497  0.28813449  0.27958548  0.30324122]
 [ 0.50979882  1.          0.38971996  0.27236992  0.29889661  0.40002599]
 [ 0.49804497  0.38971996  1.          0.25557584  0.49059519  0.41748011]
 [ 0.28813449  0.27236992  0.25557584  1.          0.48876277  0.16951311]
 [ 0.27958548  0.29889661  0.49059519  0.48876277  1.          0.53417933]
 [ 0.30324122  0.40002599  0.41748011  0.16951311  0.53417933  1.        ]]

print(k2.get_kernel_matrix())

[[ 1.          0.50979884  0.49804496  0.28813448  0.27958548  0.30324122]
 [ 0.50979884  1.          0.38971996  0.27236991  0.29889661  0.400026  ]
 [ 0.49804496  0.38971996  1.          0.25557583  0.49059518  0.41748012]
 [ 0.28813448  0.27236991  0.25557583  1.          0.48876277  0.1695131 ]
 [ 0.27958548  0.29889661  0.49059518  0.48876277  1.          0.53417934]
 [ 0.30324122  0.400026    0.41748012  0.1695131   0.53417934  1.        ]]

print(k1.get_kernel_matrix()-k2.get_kernel_matrix())

[[  0.00000000e+00  -1.68008413e-08   6.80469564e-09   9.58425667e-09
   -2.85334684e-09   6.21391272e-11]
 [ -1.68008413e-08   0.00000000e+00   5.66272729e-09   1.40209140e-08
    3.09241743e-09  -3.52935375e-09]
 [  6.80469564e-09   5.66272729e-09   0.00000000e+00   3.50433216e-09
    1.25876629e-08  -1.27479330e-08]
 [  9.58425667e-09   1.40209140e-08   3.50433216e-09   0.00000000e+00
   -6.48492471e-09   6.75771872e-09]
 [ -2.85334684e-09   3.09241743e-09   1.25876629e-08  -6.48492471e-09
    0.00000000e+00  -9.37992795e-09]
 [  6.21391272e-11  -3.52935375e-09  -1.27479330e-08   6.75771872e-09
   -9.37992795e-09   0.00000000e+00]]

auto rhs_subset =
rhs_subset_stack->get_last_subset()->get_subset_idx();
// clear all previous subsets
((CCustomKernel*)k)->remove_all_col_subsets();
Member

I think you shouldn't clear the subsets here.
This should happen from outside (and only for the last one); otherwise you might remove subsets a user has set previously.
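A minimal sketch of that alternative, assuming the cross-validation code pushed the subset itself (variable names hypothetical; CCustomKernel's remove_col_subset() removes only the most recently added subset):

	// push the fold's column subset from the caller's side
	custom_kernel->add_col_subset(rhs_subset);
	// ... train / evaluate on the fold ...
	// pop only the subset we pushed; any user-installed
	// subsets below it on the stack stay intact
	custom_kernel->remove_col_subset();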

dhelekal (Contributor, Author)

I operated under the assumption that the subset should be kept in sync with the CombinedFeatures used to init the CombinedKernel. Normally, as I understand it, the subset for the subkernels is obtained from the features used to init them (the init(CFeatures* l, CFeatures* r) method of CustomKernel in fact clears the subsets).
Since the custom kernel has no features in the first place, I attempted to replicate what an init() would do if it had features.
Alternative approach: I could instantiate CIndexFeatures from lhs_subset/rhs_subset and then init the custom kernel with those. On second thought, this might actually be the better approach; see the sketch below.
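A rough sketch of that alternative, assuming lhs_subset and rhs_subset have already been extracted as SGVector<index_t> (names hypothetical; not the merged code):

	// wrap the extracted indices in index features;
	// CCustomKernel::init() then installs them as row/col subsets
	auto* lhs_idx = new CIndexFeatures(lhs_subset);
	auto* rhs_idx = new CIndexFeatures(rhs_subset);
	SG_REF(lhs_idx);
	SG_REF(rhs_idx);
	custom_kernel->init(lhs_idx, rhs_idx);
	SG_UNREF(lhs_idx);
	SG_UNREF(rhs_idx);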

dhelekal (Contributor, Author)

Update: there seems to be an inconsistency within CCustomKernel::init(CFeatures* l, CFeatures* r).
Namely, if l and r are of type CIndexFeatures, the normalizer doesn't get called; see the following block:

	if (l->get_feature_class()==C_INDEX && r->get_feature_class()==C_INDEX)
	{
		CIndexFeatures* l_idx = (CIndexFeatures*)l;
		CIndexFeatures* r_idx = (CIndexFeatures*)r;

		remove_all_col_subsets();
		remove_all_row_subsets();

		add_row_subset(l_idx->get_feature_index());
		add_col_subset(r_idx->get_feature_index());

		lhs_equals_rhs=m_is_symmetric;

		return true;
	}

	/* For other types of CFeatures do the default actions below */
	CKernel::init(l, r);

	lhs_equals_rhs=m_is_symmetric;

	SG_DEBUG("num_vec_lhs: %d vs num_rows %d\n", l->get_num_vectors(), kmatrix.num_rows)
	SG_DEBUG("num_vec_rhs: %d vs num_cols %d\n", r->get_num_vectors(), kmatrix.num_cols)
	ASSERT(l->get_num_vectors()==kmatrix.num_rows)
	ASSERT(r->get_num_vectors()==kmatrix.num_cols)
	return init_normalizer();

Is this intentional or not?

Member

Most things in there were written on demand for one specific use case, so probably not.

I think everything should be consistent between normal kernels and custom kernels.

So maybe we keep it like this.

A memory check is important before merging, though.

dhelekal (Contributor, Author)

I'm not 100% certain what you mean re: consistency; could you clarify, perhaps on IRC?

Member

Sorry, I meant: do it the same way for custom kernels as for normal kernels (clearing the stack).

dhelekal (Contributor, Author)

OK, so keep the current method rather than providing dummy IndexFeatures and calling the CustomKernel's init on those? (And fix the memory issues.)

karlnapf (Member) left a comment

Great, thanks for the fix.

Definitely useful!
Did you check your changes with valgrind? There are memory leaks, since you don't unref some of the objects you obtain from getters.

I am a bit concerned about the clearing of subsets; as said, can't you remove the last one from outside the function, in the cross-validation code?

karlnapf (Member)

A minimal unit test that checks the matrices as in your output would be super helpful

DavidH added 4 commits January 27, 2018 18:28
adds unit test functions for combined kernel subsetting behaviour. Fixes an issue in which combined kernels would not subset custom kernels if features other than combinedfeatures were passed

The latter is achieved by replacing the previous recursive call with a call to a helper function made aware of any subsets contained.
dhelekal (Contributor, Author)

Update: the fix turned out to be a bit larger than expected, mainly because the added unit test revealed that subsetting a combined kernel wouldn't behave as expected if custom kernels were contained and the features passed during init were of a type other than combined features. To fix this it was necessary to make the method aware of any subsets present. Hence, instead of the previous recursive call, there is now a call to a helper method that contains the bulk of the old init and additionally takes the subsets as arguments; a sketch of the resulting shape follows.
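A rough sketch of the resulting control flow (schematic; only the helper's name and the subset types are taken from the diff below, everything else is abridged):

	bool CCombinedKernel::init(CFeatures* l, CFeatures* r)
	{
		SGVector<index_t> lhs_subset;
		SGVector<index_t> rhs_subset;
		// ... extract lhs_subset/rhs_subset from the subset stacks of
		// l and r, wrapping plain (non-combined) features as needed ...
		return init_with_extracted_subsets(l, r, lhs_subset, rhs_subset);
	}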

"No kernel matrix was assigned to this Custom kernel\n")

// clear all previous subsets
((CCustomKernel*)k)->remove_all_row_subsets();
Member

Instead of the explicit C-style cast, could we do a dynamic_cast here? That way we clearly get a std::bad_cast exception (for a reference cast) or a null pointer we can check (for a pointer cast) in case of some weird k.
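For instance (a sketch; casting to a reference makes a mismatch throw instead of silently reinterpreting the pointer):

	// throws std::bad_cast if k is not actually a CCustomKernel
	auto& custom = dynamic_cast<CCustomKernel&>(*k);
	custom.remove_all_row_subsets();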

if ( ((CCombinedFeatures*) l)->get_num_feature_obj()<=0 ||
((CCombinedFeatures*) l)->get_num_feature_obj() != ((CCombinedFeatures*) r)->get_num_feature_obj() )
SG_ERROR("CombinedKernel: Number of features/kernels does not match - bailing out\n")
if (((CCombinedFeatures*)l)->get_num_feature_obj() <= 0 ||
Member

You can dynamic_cast these into a variable and then use it. One would of course suppose that the compiler does this under the hood as well, but it might make the code easier to read.
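Something like this (a sketch of the suggestion, reusing the variable naming from the later diff hunk):

	auto* l_combined = dynamic_cast<CCombinedFeatures*>(l);
	auto* r_combined = dynamic_cast<CCombinedFeatures*>(r);

	if (l_combined->get_num_feature_obj() <= 0 ||
	    l_combined->get_num_feature_obj() != r_combined->get_num_feature_obj())
		SG_ERROR("CombinedKernel: Number of features/kernels does not match - bailing out\n")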

vigsterkr (Member)

Thanks for the fix!

@@ -84,6 +84,9 @@ bool CCombinedKernel::init_with_extracted_subsets(
CFeatures* rf = NULL;
CKernel* k = NULL;

auto l_combined = dynamic_cast<CCombinedFeatures*>(l);
Member

imo this will be a formatting error :(

dhelekal (Contributor, Author)

Oh, that's odd; no clue why my editor suddenly switched to spaces instead of tabs. :(


vigsterkr merged commit eed22ff into shogun-toolbox:develop on Jan 29, 2018
dhelekal deleted the feature/customkernel-subset-fix branch on January 29, 2018, 09:26

bool result=true;
CKernel::init(l, r);
REQUIRE(
Member

if you require this, you could have asked for those types in the signature of the method, no?

CKernel::init(l, r);
REQUIRE(
l->get_feature_class() == C_COMBINED,
"%s::init(): LHS features are"
Member

we actually don't do "CLASSNAME::METHODNAME():" anymore

bool result=true;
CKernel::init(l, r);
REQUIRE(
l->get_feature_class() == C_COMBINED,
Member

this segfaults if the caller passes nullptr

CFeatures* rf = NULL;
CKernel* k = NULL;

auto l_combined = dynamic_cast<CCombinedFeatures*>(l);
Member

why dynamic cast if you don't check whether it worked?
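For example (a sketch; REQUIRE aborts with the given message if the cast yielded a null pointer):

	auto l_combined = dynamic_cast<CCombinedFeatures*>(l);
	REQUIRE(l_combined, "LHS features are not of type CCombinedFeatures\n")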

init_subkernel_weights();
}
/*
* The two subsets, we will be passing those to init_with_extracted_subsets
Member

Usually we try not to have comments like this; they are a bit superfluous, the code tells the story :)

SGVector<index_t> rhs_subset;

/*
* We will be passing these features to init_with_extracted_subsets
Member

same as above

CCombinedFeatures* combined_r;

/*
* Extract the subsets so that we can pass them on
Member

and same again

SG_UNREF(l_subset_stack);
SG_UNREF(r_subset_stack);

/* if the specified features are not combined features, but a single other
Member

this is a good comment

else
{
/*
* Otherwise, we just pass l & r straight on
Member

this can go

TEST(CombinedKernelTest, test_subset_mixed)
{

CMeanShiftDataGenerator* gen = new CMeanShiftDataGenerator(0, 2);
Member

If you use auto gen = some<CMeanShiftDataGenerator>(0, 2);, then you don't need to SG_UNREF later on.
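A sketch of the suggestion; some<T>(...) allocates the object and wraps it in shogun's ref-counting Some<T> smart pointer, which releases its reference when it goes out of scope:

	auto gen = some<CMeanShiftDataGenerator>(0, 2);
	CFeatures* feats = gen->get_streamed_features(10);
	// no SG_UNREF(gen) needed at the end of the test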

{
for (index_t k = 0; k < 10; ++k)
{
EXPECT_LE(
Member

EXPECT_NEAR would be much cleaner
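E.g. (a sketch; the matrix names follow the test, the tolerance is assumed):

	EXPECT_NEAR(custom_matrix_1(j, k), custom_matrix_2(j, k), 1e-6);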

SGVector<index_t> inds(10);
inds.range_fill();

for (index_t i = 0; i < 10; ++i)
Member

Why not put the 10 into a variable with a meaningful name?
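For instance (a sketch; the name is hypothetical):

	const index_t num_vectors = 10;
	SGVector<index_t> inds(num_vectors);
	inds.range_fill();

	for (index_t i = 0; i < num_vectors; ++i)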

CMeanShiftDataGenerator* gen = new CMeanShiftDataGenerator(0, 2);
CFeatures* feats = gen->get_streamed_features(10);

CCombinedFeatures* cf = new CCombinedFeatures();
Member

better name: feats_combined (good names avoid the need for code comments)

combined->init(cf, cf);

CKernel* k_g = combined->get_kernel(1);
CKernel* k_0 = combined->get_kernel(0);
Member

these names are confusing me

SGMatrix<float64_t> custom_matrix_1 = k_0->get_kernel_matrix();
SGMatrix<float64_t> custom_matrix_2 = k_3->get_kernel_matrix();

for (index_t j = 0; j < 10; ++j)
Member

I think SGMatrix has an equals method
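If so, the nested comparison loops collapse into a single assertion (a sketch; the exact tolerance semantics of equals() should be checked in SGMatrix.h):

	EXPECT_TRUE(custom_matrix_1.equals(custom_matrix_2));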

karlnapf (Member) left a comment

Thanks for the patch!

I have made some comments that could go into a clean-up PR; that would greatly increase the readability of this code.

karlnapf pushed a commit to karlnapf/shogun that referenced this pull request Feb 4, 2018
Fix issue shogun-toolbox#3612 
adds unit test functions for combined kernel subsetting behaviour. Fixes an issue in which combined kernels would not subset custom kernels if features other than combinedfeatures were passed
The latter is achieved by replacing the previous recursive call with a call to a helper function made aware of any subsets contained.
vigsterkr pushed a commit that referenced this pull request Feb 6, 2018
ktiefe pushed a commit to ktiefe/shogun that referenced this pull request Jul 30, 2019
Fix issue shogun-toolbox#3612 
adds unit test functions for combined kernel subsetting behaviour. Fixes an issue in which combined kernels would not subset custom kernels if features other than combinedfeatures were passed
The latter is achieved by replacing the previous recursive call with a call to a helper function made aware of any subsets contained.
ktiefe pushed a commit to ktiefe/shogun that referenced this pull request Jul 30, 2019