Segmentation fault (core dumped) when training LMNN #3975

Closed
ealtamir opened this issue Sep 6, 2017 · 10 comments

@ealtamir

ealtamir commented Sep 6, 2017

Hello,

I get a Segmentation fault (core dumped) when training an LMNN using Python. My feature matrix has shape (695, 512) and my label vector has shape (695,). I compiled Shogun on an Ubuntu LTS environment. The cmake command I used was:

cmake -DPYTHON_INCLUDE_DIR=/home/enzo/art_rm/env/include/python3.5m -DPYTHON_EXECUTABLE:FILEPATH=/home/enzo/art_rm/env/bin/python3.5 -DPYTHON_PACKAGES_PATH=/home/enzo/art_rm/env/lib/python3.5/site-packages -DPythonModular=ON -DINTERFACE_PYTHON=ON -DBUILD_META_EXAMPLES=OFF -DBUNDLE_EIGEN=ON ..

I'm able to run the LMNN example shown in this notebook: https://nbviewer.jupyter.org/gist/iglesias/6576096. One important detail might be that I'm using
from shogun import LMNN
instead of
from modshogun import LMNN

which is what I've seen in every example. Any help is greatly appreciated.

@ealtamir
Author

ealtamir commented Sep 6, 2017

After playing with this issue a bit, I realized that one class appeared only once in the dataset, so it conflicted with k=3. After setting k=1 I got a new error:

python3.5: /usr/include/eigen3/Eigen/src/Core/Block.h:123: Eigen::Block<XprType, BlockRows, BlockCols, InnerPanel>::Block(XprType&, Eigen::Index) [with XprType = Eigen::Map<const Eigen::Matrix<double, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >; int BlockRows = -1; int BlockCols = 1; bool InnerPanel = true; Eigen::Index = long int]: Assertion `(i>=0) && ( ((BlockRows==1) && (BlockCols==XprType::ColsAtCompileTime) && i<xpr.rows()) ||((BlockRows==XprType::RowsAtCompileTime) && (BlockCols==1) && i<xpr.cols()))' failed.

@ealtamir
Author

ealtamir commented Sep 6, 2017

I was able to make it work. The problem was in the dataset: some classes didn't have enough data points, so the k I selected wasn't feasible. Apparently k=1 also causes problems. My solution was to clean the dataset so that every class has enough samples.
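
For readers hitting the same crash, the cleaning step described above could look roughly like the sketch below. It is not the code used in this issue: the helper name `drop_rare_classes` and the toy data are hypothetical, and the `k + 1` threshold is an assumption based on each sample needing k same-class neighbours other than itself.

```python
from collections import Counter


def drop_rare_classes(features, labels, k):
    """Keep only samples whose class has at least k + 1 members.

    Assumption of this sketch: every sample needs k neighbours of its
    own class, not counting itself, hence the k + 1 threshold.
    """
    counts = Counter(labels)
    keep = [i for i, label in enumerate(labels) if counts[label] >= k + 1]
    return [features[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy data: class 2 appears only once, so it is dropped even for k=1.
X = [[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0], [5.0, 5.0]]
y = [0, 0, 1, 1, 2]

X_clean, y_clean = drop_rare_classes(X, y, k=1)
print(y_clean)  # [0, 0, 1, 1]
```

The filtered features and labels would then be wrapped in Shogun's feature and label objects before training, as in the notebook linked above.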

@ealtamir ealtamir closed this as completed Sep 6, 2017
@vigsterkr
Member

@ealtamir great! Although I think we should keep the ticket open so we can fix it to raise an assertion error instead of a segmentation fault ;)

@vigsterkr vigsterkr reopened this Sep 6, 2017
@karlnapf
Member

karlnapf commented Sep 14, 2017 via email

@iglesias iglesias self-assigned this Sep 14, 2017
@iglesias
Collaborator

iglesias commented Mar 8, 2018

@vinx13, since you have already touched LMNN and are at least a bit familiar with it, would you like to pick this one up? :-)

@vinx13
Member

vinx13 commented Mar 9, 2018

sure, I will pick it up

@iglesias
Collaborator

iglesias commented Apr 2, 2018

How is it going with this one, @vinx13? I wonder, no chase ;-)

@vinx13
Member

vinx13 commented Apr 3, 2018

I have tried this locally. When I set a k greater than the number of samples, the program crashed, which I think is an issue related to KNN. However, I haven't been able to reproduce the error above.

@karlnapf
Member

karlnapf commented Apr 3, 2018

I'll make this an entrance task:

  • add graceful error messages for when users set wrong parameters or provide data that doesn't make sense
  • fix the k=1 error
  • make sure that LMNN works
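
As an illustration of the first bullet, a pre-training check could be sketched as follows. This is not Shogun's actual implementation (which lives in the C++ library): the function name `check_lmnn_inputs` and the error wording are hypothetical, and the rule that each class needs at least k + 1 examples is an assumption of the sketch.

```python
from collections import Counter


def check_lmnn_inputs(labels, k):
    """Raise ValueError with a readable message instead of crashing later.

    Hypothetical helper: the name, checks and messages are illustrative only.
    """
    if k < 1:
        raise ValueError(f"k must be at least 1, got {k}")
    n = len(labels)
    if k >= n:
        raise ValueError(f"k={k} must be smaller than the number of samples ({n})")
    # Assumed rule: a sample needs k same-class neighbours besides itself.
    too_small = {c: m for c, m in Counter(labels).items() if m < k + 1}
    if too_small:
        raise ValueError(
            f"k={k} needs at least {k + 1} examples per class, "
            f"but these classes have fewer: {too_small}"
        )


# Passes silently: both classes have two examples, enough for k=1.
check_lmnn_inputs([0, 0, 1, 1], k=1)

# A class with a single example is rejected even for k=1.
try:
    check_lmnn_inputs([0, 0, 1, 1, 2], k=1)
except ValueError as err:
    print(err)
```

Failing early with a message that names the offending classes covers both cases reported in this thread: a k larger than the number of samples, and a class with a single example.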

@iglesias iglesias added this to ToDo in Belgrade Sprint via automation Apr 16, 2018
@iglesias iglesias moved this from ToDo to InProgress in Belgrade Sprint Apr 16, 2018
iglesias added a commit to iglesias/shogun that referenced this issue Apr 17, 2018
Add input check and assertion in LMNN regarding k used in KNN and
the number of examples per class.
@iglesias iglesias moved this from InProgress to Finito in Belgrade Sprint Apr 17, 2018
@iglesias
Collaborator

iglesias commented Apr 17, 2018

Thanks a lot for reporting @ealtamir.

The fix explains both issues (including the one with k=1). You were using a data set where one class had only 1 example, which is too few for the method's implementation. As an anecdote, I checked the other public implementations of LMNN that I know of, and they also have this "bug". At least now our implementation stops gracefully and points you at the error 😸

iglesias added a commit to iglesias/shogun that referenced this issue Apr 17, 2018
Add input check and assertion in LMNN regarding k used in KNN and
the number of examples per class.
iglesias added a commit that referenced this issue Apr 20, 2018
Add input check and assertion in LMNN regarding k used in KNN and
the number of examples per class.
ktiefe pushed a commit to ktiefe/shogun that referenced this issue Jul 30, 2019
Add input check and assertion in LMNN regarding k used in KNN and
the number of examples per class.