Ref-counting / memory error in CombinedKernel #4416

Open
karlnapf opened this issue Nov 15, 2018 · 10 comments

@karlnapf
Member

The following listing illustrates the problem. If the last statement is removed, Python segfaults on exit as it tries to destroy the already deallocated f object.

import numpy as np
import shogun as sg

x_tr = np.random.randn(10, 2)

# Wrap the data in a feature object and add it to a combined-features container.
f = sg.RealFeatures(x_tr)
feats_train = sg.CombinedFeatures()
feats_train.append_feature_obj(f)

# Build a combined kernel from two sub-kernels.
kernel = sg.CombinedKernel()
kernel.append_kernel(sg.GaussianKernel(3.0))
kernel.append_kernel(sg.PolyKernel(10, 1))

kernel.init(feats_train, feats_train)

# Log level 0 produces the [GCDEBUG] ref-counting output.
sg.get_global_io().set_loglevel(0)
del kernel
print(f)  # segfaults: the object's refcount drops to zero when the kernel is destroyed

See also #4410 for the first report
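
To make the failure mode concrete, here is a minimal toy model of manual reference counting in plain Python. It does not use Shogun and is not a diagnosis of where the imbalance actually occurs; the class `RefCounted` and the function `unbalanced_consumer` are hypothetical names invented for this sketch. It only shows how a single extra `unref()` can destroy an object that another owner still expects to be alive.

```python
class RefCounted:
    """Toy stand-in for a manually ref-counted native object (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.destroyed = False

    def ref(self):
        self._check_alive()
        self.refcount += 1

    def unref(self):
        self._check_alive()
        self.refcount -= 1
        if self.refcount == 0:
            # Real native code would free the object's memory here.
            self.destroyed = True

    def _check_alive(self):
        if self.destroyed:
            # A real process would touch freed memory and could segfault.
            raise RuntimeError(f"use after free: {self.name}")


def unbalanced_consumer(obj):
    """Releases a reference it never acquired (the assumed bug pattern)."""
    obj.unref()


features = RefCounted("features")
features.ref()                 # the reference held by the scripting-side wrapper
unbalanced_consumer(features)  # refcount drops to 0, the object is "freed"

try:
    features.unref()           # wrapper cleanup, e.g. at interpreter exit
except RuntimeError as err:
    print(err)
```

In this toy the second release raises an exception; in a native library the equivalent access to freed memory is undefined behaviour, which is consistent with a crash either at `print(f)` or at interpreter exit.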

@karlnapf
Member Author

Interestingly, the above listing doesn't segfault when only one kernel is added to the combined kernel.

@zym-wade

> Interestingly, the above listing doesn't segfault when only one kernel is added to the combined kernel

Sorry, I only just got back to school. This problem is still not solved. I tried your demo, and it prints the following output before crashing:

[GCDEBUG] unref() refcount 3 obj DenseFeatures (0x16034f0) decreased
[GCDEBUG] unref() refcount 0, obj CombinedKernel (0x1bea950) destroying
[GCDEBUG] ref() refcount 2 obj GaussianKernel (0x1bee9c0) increased
[GCDEBUG] unref() refcount 2 obj DenseFeatures (0x16034f0) decreased
[GCDEBUG] unref() refcount 1 obj GaussianKernel (0x1bee9c0) decreased
[GCDEBUG] ref() refcount 2 obj PolyKernel (0x1bf1980) increased
[GCDEBUG] unref() refcount 1 obj DenseFeatures (0x16034f0) decreased
[GCDEBUG] unref() refcount 1 obj PolyKernel (0x1bf1980) decreased
[GCDEBUG] ref() refcount 2 obj GaussianKernel (0x1bee9c0) increased
[GCDEBUG] unref() refcount 1 obj GaussianKernel (0x1bee9c0) decreased
[GCDEBUG] ref() refcount 2 obj PolyKernel (0x1bf1980) increased
[GCDEBUG] unref() refcount 1 obj PolyKernel (0x1bf1980) decreased
[GCDEBUG] ref() refcount 2 obj GaussianKernel (0x1bee9c0) increased
[GCDEBUG] unref() refcount 1 obj GaussianKernel (0x1bee9c0) decreased
[GCDEBUG] ref() refcount 2 obj PolyKernel (0x1bf1980) increased
[GCDEBUG] unref() refcount 1 obj PolyKernel (0x1bf1980) decreased
[GCDEBUG] ref() refcount 2 obj GaussianKernel (0x1bee9c0) increased
[GCDEBUG] unref() refcount 1 obj GaussianKernel (0x1bee9c0) decreased
[GCDEBUG] ref() refcount 2 obj PolyKernel (0x1bf1980) increased
[GCDEBUG] unref() refcount 1 obj PolyKernel (0x1bf1980) decreased
[GCDEBUG] unref() refcount 1 obj CombinedFeatures (0x1bc7ca0) decreased
[GCDEBUG] unref() refcount 0, obj DynamicObjectArray (0x1bec7e0) destroying
[GCDEBUG] unref() refcount 0, obj GaussianKernel (0x1bee9c0) destroying
[GCDEBUG] unref() refcount 0, obj EuclideanDistance (0x1bf09d0) destroying
[GCDEBUG] unref() refcount 0, obj DenseFeatures (0x16034f0) destroying
[GCDEBUG] unref() refcount 0, obj SubsetStack (0x1b19ac0) destroying
[GCDEBUG] unref() refcount 0, obj DynamicObjectArray (0x1b4c5c0) destroying
[GCDEBUG] SGObject destroyed (0x1b4c5c0)
[GCDEBUG] SGObject destroyed (0x1b19ac0)
[GCDEBUG] unref() refcount 0, obj DynamicObjectArray (0x1b2bc70) destroying
[GCDEBUG] SGObject destroyed (0x1b2bc70)
[GCDEBUG] SGObject destroyed (0x16034f0)
Segmentation fault (core dumped)
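
A log like this is easier to read when the events are grouped per object. The following is a small sketch, not part of Shogun, that parses `[GCDEBUG]` lines and collects the reported refcount after each `ref()`/`unref()` event. The regular expression is an assumption based only on the line format visible in this thread, the function name `refcount_history` is made up for the example, and the sample is an excerpt of the output above.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [GCDEBUG] unref() refcount 3 obj DenseFeatures (0x16034f0) decreased
#   [GCDEBUG] unref() refcount 0, obj CombinedKernel (0x1bea950) destroying
# The pattern is inferred from the log in this thread, not from Shogun's source.
GCDEBUG_LINE = re.compile(
    r"\[GCDEBUG\] (?P<op>ref|unref)\(\) refcount (?P<count>\d+),? "
    r"obj (?P<cls>\w+) \((?P<addr>0x[0-9a-f]+)\)"
)


def refcount_history(log_text):
    """Return {(class_name, address): [refcount reported by each event]}."""
    history = defaultdict(list)
    for line in log_text.splitlines():
        match = GCDEBUG_LINE.search(line)
        if match:
            key = (match["cls"], match["addr"])
            history[key].append(int(match["count"]))
    return dict(history)


# Excerpt of the log posted above.
sample = """\
[GCDEBUG] unref() refcount 3 obj DenseFeatures (0x16034f0) decreased
[GCDEBUG] unref() refcount 0, obj CombinedKernel (0x1bea950) destroying
[GCDEBUG] unref() refcount 2 obj DenseFeatures (0x16034f0) decreased
[GCDEBUG] unref() refcount 1 obj DenseFeatures (0x16034f0) decreased
[GCDEBUG] unref() refcount 0, obj DenseFeatures (0x16034f0) destroying
[GCDEBUG] SGObject destroyed (0x16034f0)
"""

history = refcount_history(sample)
for (cls, addr), counts in history.items():
    print(cls, addr, counts)
```

Run on the excerpt, this shows the `DenseFeatures` object going 3, 2, 1, 0 while the combined kernel is being destroyed, which matches the original report that the feature object is freed before Python is done with it.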

@karlnapf
Member Author

Yes, the bug is still open.

@zym-wade

> Yes the bug is still open.

So binary-classification multiple kernel learning (MKL) can't be used at the moment. Is there another way to use it?

@karlnapf
Member Author

Does this example not work?
http://www.shogun-toolbox.org/examples/latest/examples/regression/multiple_kernel_learning.html
It might be excluded from our test build due to license issues, but it should still work.

And then there is:
http://www.shogun-toolbox.org/notebook/latest/MKL.html

@rrkarim
Contributor

rrkarim commented Jan 8, 2019

I can work on this. Do you already have any idea where the problem comes from, @karlnapf @zym-wade?

@karlnapf
Member Author

You are more than welcome to. The snippet above reproduces the error. We don't know more at the moment.

@rrkarim
Contributor

rrkarim commented Jan 13, 2019

@karlnapf OK, though I've checked that for a C++ snippet like:

int main() {
	init_shogun_with_defaults();

	auto f_feats_train = some<CCSVFile>("../../data/classifier_4class_2d_linear_features_train.dat");

	auto f = some<CDenseFeatures<float64_t>>(f_feats_train); // real_features
	auto features_train = new CCombinedFeatures(); 
	features_train->append_feature_obj(f);

	auto poly_kernel = some<CPolyKernel>(10, 1);
	auto gauss_kernel = some<CGaussianKernel>(3.0);

	auto combined_kernel = some<CCombinedKernel>();
	combined_kernel->append_kernel(gauss_kernel);
	combined_kernel->append_kernel(poly_kernel);

	combined_kernel->init(features_train, features_train);
	delete combined_kernel;

	return 0;
}

no error is raised. It is obvious that the internal C++ object is destroyed and Python just doesn't see it. I don't know the internals well enough yet, so I need more time to analyze it.

@karlnapf
Member Author

There is no need to call delete when you use some<>.

@karlnapf
Member Author

But yes, this is probably related to the language interface, not to the C++ library.
