SIGSEGV on MulticlassLibSVM with SubsequenceStringKernel #4157

Closed
bmurauer opened this issue Feb 7, 2018 · 15 comments

Comments

@bmurauer

bmurauer commented Feb 7, 2018

Hi, I was playing around with the Python API of Shogun. My goal is to use SubsequenceStringKernel for multiclass text document classification, so I wrote this code:

import numpy as np
from shogun import SubsequenceStringKernel
from shogun import MulticlassAccuracy
from shogun import MulticlassLabels
from shogun import MulticlassLibSVM
from shogun import StringCharFeatures
from shogun import RAWBYTE
from sklearn.datasets import fetch_20newsgroups
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split

news = fetch_20newsgroups(subset='train')
x = news.data
y = news.target

le = LabelEncoder()
y = le.fit_transform(y)
y = np.array(y, dtype=np.float64)
x_train, y_train, x_test, y_test = train_test_split(x, y)

features_train = StringCharFeatures(x_train, RAWBYTE)
features_test = StringCharFeatures(x_test, RAWBYTE) 

labels_train = MulticlassLabels(y_train)
labels_test = MulticlassLabels(y_test)

C = 1.0
epsilon = 0.0001
kernel = SubsequenceStringKernel(features_train, features_train, 5, 1.0)

svm = MulticlassLibSVM(C, kernel, labels_train)
svm.set_epsilon(epsilon)
svm.train()
labels_predict = svm.apply_multiclass(features_test)
accu = MulticlassAccuracy()
print(accu.evaluate(labels_predict, labels_test))

This causes a SIGSEGV.

@vigsterkr
Member

@bmurauer thanks for reporting! Which version are you using?

@bmurauer
Author

bmurauer commented Feb 7, 2018

Sorry, forgot to mention: latest from Git; the NEWS file says 6.2.0.
Running on Arch Linux, kernel 4.14.15, Python 3.6.4.

@vigsterkr
Member

@bmurauer just checked with Python 2.7 + Shogun and I'm getting a nicer error:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-1-efadaa457eea> in <module>()
     19 x_train, y_train, x_test, y_test = train_test_split(x, y)
     20
---> 21 features_train = StringCharFeatures(x_train, RAWBYTE)
     22 features_test = StringCharFeatures(x_test, RAWBYTE)
     23

NotImplementedError: Wrong number or type of arguments for overloaded function 'new_StringCharFeatures'.
  Possible C/C++ prototypes are:
    shogun::CStringFeatures< char >::CStringFeatures()
    shogun::CStringFeatures< char >::CStringFeatures(shogun::EAlphabet)
    shogun::CStringFeatures< char >::CStringFeatures(shogun::SGStringList< char >,shogun::EAlphabet)
    shogun::CStringFeatures< char >::CStringFeatures(shogun::SGStringList< char >,shogun::CAlphabet *)
    shogun::CStringFeatures< char >::CStringFeatures(shogun::CAlphabet *)
    shogun::CStringFeatures< char >::CStringFeatures(shogun::CStringFeatures< char > const &)
    shogun::CStringFeatures< char >::CStringFeatures(shogun::CFile *,shogun::EAlphabet)
    shogun::CStringFeatures< char >::CStringFeatures(shogun::CFile *)

Let me check why you get a SIGSEGV with py36 :(
Of course, this still doesn't solve your problem :(

@vigsterkr
Member

@bmurauer OK, so I've managed to fix the problem in py27... currently the kernel blows up my memory :)
I'll test it with a smaller dataset first to see what the problem could be, then I'll check what's going wrong on py3.

@vigsterkr
Member

vigsterkr commented Feb 12, 2018

By the way, for the record, there's a bug in your code:

x_train, y_train, x_test, y_test = train_test_split(x, y)

should be:

x_train, x_test, y_train, y_test = train_test_split(x, y)
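
For reference, scikit-learn's train_test_split returns the train and test parts of each input in turn, so with two inputs the order is x_train, x_test, y_train, y_test. The sketch below uses a hypothetical pure-Python stand-in (split_in_order, which is not part of scikit-learn and does no shuffling) that only mimics that return order, to show what the wrong unpacking ends up binding:

```python
def split_in_order(*arrays, n_test=1):
    """Hypothetical stand-in that mimics the return ORDER of
    sklearn.model_selection.train_test_split (no shuffling here):
    for every input it yields its train part, then its test part."""
    result = []
    for arr in arrays:
        result.append(arr[:-n_test])   # train part of this input
        result.append(arr[-n_test:])   # test part of this input
    return result


docs = ["doc a", "doc b", "doc c", "doc d"]
labels = [0.0, 1.0, 2.0, 3.0]

# Wrong unpacking from the original report: the names no longer match the data.
x_train, y_train, x_test, y_test = split_in_order(docs, labels)
wrong_y_train = y_train   # actually the test documents
wrong_x_test = x_test     # actually the training labels

# Correct unpacking: names and contents agree.
x_train, x_test, y_train, y_test = split_in_order(docs, labels)
```

With the wrong order, the variable named y_train holds documents and the one named x_test holds labels, so the string features and the labels are built from the wrong objects.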

@vigsterkr
Member

OK, I've finally managed to reproduce the error... let me see if I can get a fix for it today.

@vigsterkr
Member

OK, so it's basically because of this in the typemaps:

PyBytes_AsString(PyUnicode_AsASCIIString(const_cast<PyObject*>(o)));

One of the strings fails in PyUnicode_AsASCIIString, and since there's no return value check (sic!), the resulting PyBytes_AsString(NULL) call causes a SIGSEGV :) I think we should simply convert it to a UTF-8 string and use that as the feature. :)
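
The same failure mode can be seen from pure Python, without Shogun. This is a minimal sketch, assuming the offending document simply contains a non-ASCII character: encoding it as ASCII raises an error (the C-level counterpart, PyUnicode_AsASCIIString, returns NULL in that case), while encoding it as UTF-8 succeeds. The helper find_non_ascii is hypothetical and only meant for locating such documents before building the features:

```python
def find_non_ascii(docs):
    """Return the indices of documents that cannot be encoded as ASCII."""
    bad = []
    for i, doc in enumerate(docs):
        try:
            doc.encode("ascii")
        except UnicodeEncodeError:
            bad.append(i)
    return bad


# Toy documents; the second one contains a non-ASCII character (e with acute).
docs = ["plain text", "caf\u00e9 au lait", "more plain text"]
bad_indices = find_non_ascii(docs)

# UTF-8 can represent these strings, so this conversion does not fail here.
utf8_docs = [doc.encode("utf-8") for doc in docs]
```

Note that the UTF-8 bytes of a non-ASCII document are longer than its character count, so any length passed alongside the buffer has to be the byte length.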

@vigsterkr
Member

@bmurauer the feature/numpy1.7 branch should have a fix for your problem. I'll see if the Travis CI jobs pass; if they do, I'll merge it into develop. If you could test it, as I haven't tested the Python 3 version yet, I would really appreciate it.

@vigsterkr
Member

PS: please don't try it on the whole dataset, as that'll just take all the RAM in your machine :)
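
One way to follow this advice is to train on a small random subset first. This is a minimal sketch, not something taken from the thread: subsample is a hypothetical helper, the subset size is arbitrary, and the toy data stands in for the document list and labels from the original report:

```python
import random


def subsample(docs, labels, n, seed=0):
    """Pick n (document, label) pairs at random, keeping them aligned."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(docs)), n))
    return [docs[i] for i in idx], [labels[i] for i in idx]


# Toy stand-in data; with the real dataset, pass news.data and news.target.
docs = ["doc %d" % i for i in range(100)]
labels = [float(i % 20) for i in range(100)]

small_docs, small_labels = subsample(docs, labels, n=10)
```

The subset can then be split and passed to StringCharFeatures and MulticlassLabels as before, growing n only once a small run finishes within the available memory.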

@bmurauer
Author

Hi, thanks for your responses!
I just cloned a fresh copy, checked out the feature/numpy1.7 branch and tried to build it with:

mkdir build
cd build
cmake -DINTERFACE_PYTHON=ON -DLICENSE_GPL_SHOGUN=OFF -DUSE_SVMLIGHT=OFF ..
make

which unfortunately yielded this error message at the end of the build:

/opt/shogun/build/src/interfaces/python/shogunPYTHON_wrap.cxx: In function ‘bool string_from_strpy(shogun::SGStringList<ST>&, PyObject*, int)’:
/opt/shogun/build/src/interfaces/python/shogunPYTHON_wrap.cxx:7162:70: error: cannot convert ‘index_t* {aka int*}’ to ‘Py_ssize_t* {aka long int*}’ for argument ‘2’ to ‘char* PyUnicode_AsUTF8AndSize(PyObject*, Py_ssize_t*)’
                     const char* str = PyUnicode_AsUTF8AndSize(o, &len);
                                                                      ^
make[2]: *** [src/interfaces/python/CMakeFiles/_interface_python.dir/build.make:74: src/interfaces/python/CMakeFiles/_interface_python.dir/shogunPYTHON_wrap.cxx.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:464: src/interfaces/python/CMakeFiles/_interface_python.dir/all] Error 2
make: *** [Makefile:152: all] Error 2

Do I have to specify some other make flags for this branch?
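
The error message itself points at a type mismatch rather than a missing flag: a pointer to index_t (an int) is passed where PyUnicode_AsUTF8AndSize expects a pointer to Py_ssize_t (a long int on this machine). These are distinct C types, and on typical 64-bit Linux they also differ in size. A quick way to check the sizes on a given machine from Python, using only the standard ctypes module (the values are platform dependent, so none are stated here):

```python
import ctypes

# Size of a C int (index_t is an int according to the error message).
int_size = ctypes.sizeof(ctypes.c_int)
# Size of the C type behind Py_ssize_t (the signed counterpart of size_t).
ssize_size = ctypes.sizeof(ctypes.c_ssize_t)

print("int:", int_size, "bytes; ssize_t:", ssize_size, "bytes")
```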

@vigsterkr
Member

@bmurauer sorry about that, I was coding blindly as I really hadn't tested it on py3. I'll do it now and force push. Once it's tested and done, I'll ping you here :)

@vigsterkr
Member

@bmurauer OK, now the HEAD of the feature branch (note: I've force pushed) contains the fix that worked for me with Python 3.5. Let me know how it works for you.

@bmurauer
Author

Thank you, no more SIGSEGV :) However, I seem to have another issue with the multiclass handling, which I will post in a separate thread since I don't think it is related.

@vigsterkr
Member

@bmurauer cool, thanks for testing! Let's keep the issue open until I merge the feature branch into develop :)

@vigsterkr
Member

Fix merged, we can close.


2 participants