SystemError: Out of memory error when computing mmd.compute_statistic() #4605

Closed
amir-rahnama opened this issue Apr 10, 2019 · 4 comments


@amir-rahnama

I am following http://shogun-toolbox.org/notebook/latest/mmd_two_sample_testing.html to run a kernel MMD test. I am hitting a memory error when calling compute_statistic on my data, which is a dense array of shape (100, 19666):

I am using

# turn data into Shogun representation (columns vectors)
feat_p=sg.RealFeatures(X.reshape(1, 98330000))
feat_q=sg.RealFeatures(Y.reshape(1, 98330000))

error message:

 biased_statistic=mmd.compute_statistic()
SystemError: Out of memory error, tried to allocate 61880248960000 bytes using malloc

I assume the problem is somewhere else, since Shogun should work with high-dimensional data, no?

@karlnapf
Member

It should, but you may have spotted a problem.
@lambday any ideas?
I’ll try to check this within the next few days.
Thanks for the report.

@karlnapf
Member

karlnapf commented Apr 13, 2019

Just looking at this again. The line sg.RealFeatures(X.reshape(1, 98330000)) creates 98330000 vectors of dimension 1. Building the kernel matrix for that many data points is hopeless. Not sure why you added the reshape, but it should be OK to compute the kernel matrix between 19666 vectors of dimension 100. Could you confirm?
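
To put a rough number on "hopeless": a dense kernel matrix over n pooled vectors has n² entries. The sketch below is a back-of-the-envelope estimate only. `kernel_matrix_bytes` is a hypothetical helper, not a Shogun function, and both the 4-byte entry size and the pooling of the two samples into one matrix are assumptions. Under those assumptions, the 61880248960000 bytes in the error message is what you get for two samples of 100 × 19666 = 1966600 one-dimensional vectors each.

```python
def kernel_matrix_bytes(num_vectors_p, num_vectors_q, bytes_per_entry=4):
    """Estimate memory for a dense kernel matrix over the pooled samples.

    Hypothetical helper for illustration; not part of Shogun.
    Assumes one (n_p + n_q) x (n_p + n_q) matrix with fixed-size entries.
    """
    n = num_vectors_p + num_vectors_q
    return n * n * bytes_per_entry


# Flattened layout: every scalar becomes its own 1-dimensional vector.
flattened = kernel_matrix_bytes(100 * 19666, 100 * 19666)

# Intended layout: 19666 vectors of dimension 100 per sample.
intended = kernel_matrix_bytes(19666, 19666)

print(f"flattened: {flattened} bytes (~{flattened / 1e12:.1f} TB)")
print(f"intended:  {intended} bytes (~{intended / 1e9:.1f} GB)")
```

With these assumed sizes, the flattened layout asks for tens of terabytes, while the intended layout stays in the single-digit gigabyte range.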

@amir-rahnama
Author

Yes, very true. I was misled because in one of your tutorials the data were represented as column vectors, so I assumed that was the format they needed to be in.

Thanks for the help.

@karlnapf
Member

Yes, all Shogun vectors are stored in column-major format, so the number of rows is the dimension and the number of columns is the number of vectors.
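
A small NumPy-only sketch of that convention, assuming the (100, 19666) array from the report already holds one vector per column. `describe_layout` is a hypothetical helper for illustration and no Shogun calls are made here; it only reports how a 2-D array would be read under the rule above.

```python
import numpy as np


def describe_layout(arr):
    """Return (dimension, num_vectors) under the rows=dimension, cols=vectors rule.

    Hypothetical helper for illustration; not part of Shogun.
    """
    dimension, num_vectors = arr.shape
    return dimension, num_vectors


# Stand-in for the data in the report: 19666 vectors of dimension 100.
X = np.zeros((100, 19666))

as_is = describe_layout(X)                     # one vector per column
flattened = describe_layout(X.reshape(1, -1))  # same kind of reshape as in the report

# If an array instead held one sample per row, transposing it would
# move the samples into columns.
samples_in_rows = np.zeros((19666, 100))
transposed = describe_layout(samples_in_rows.T)

print(as_is, flattened, transposed)
```

Read this way, the reshape to a single row turns every scalar into its own one-dimensional vector, which is what inflates the number of vectors.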
