"SVM set failed" exception #5034

Closed
javiermaldonadoc opened this issue May 14, 2020 · 16 comments
@javiermaldonadoc

Hello,
I'm using MultiClassLibSVM with a 5-class dataset, but at the end of the run it throws an "SVM set failed" exception. The process is also very slow: about two hours to process a training dataset of 125K rows and 41 columns (the NSL-KDD dataset).
As additional information: I tried the same dataset with random forest and CART with no problem.

This is the function with the corresponding output.
SVM-output.txt (https://github.com/shogun-toolbox/shogun/files/4630785/SVM-output.txt)

@karlnapf
Member

karlnapf commented May 14, 2020 via email

Hi,
Shogun stores data in column-major format. This means your data is interpreted as 41 vectors of dimension 125K, which probably causes problems. Try transposing.
Best, H

@javiermaldonadoc
Author

javiermaldonadoc commented May 14, 2020

Thanks for your response.

I've done the transposition and it gives me this error:
Created. Training...[05/14/20 17:31:27 info] 125973 trainlabels, 5 classes
[05/14/20 17:31:27 error] Number of training vectors does not match number of labels
terminate called after throwing an instance of 'shogun::ShogunException'
what(): Number of training vectors does not match number of labels

And when I try to run random forest and CART with this transposition, I get this error:
terminate called after throwing an instance of 'fmt::v6::format_error'
what(): argument index out of range
Aborted (core dumped)

Before transposing, random forest and CART work.
Any idea? Thanks!
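
The column-major convention karlnapf describes can be sketched in plain C++ without Shogun. This is only an illustration under stated assumptions: `ColMajorMatrix` and `transpose` are hypothetical names invented here, not Shogun classes or functions. The sketch shows that in a column-major feature matrix each column is one example, so swapping rows and columns also swaps the reported number of vectors and the feature dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a dense feature matrix (not Shogun's class).
// Storage is column-major: each column is one example (vector),
// each row is one feature.
struct ColMajorMatrix {
    std::size_t num_rows;  // interpreted as the feature dimension
    std::size_t num_cols;  // interpreted as the number of vectors
    std::vector<double> data;

    ColMajorMatrix(std::size_t rows, std::size_t cols, std::vector<double> values)
        : num_rows(rows), num_cols(cols), data(std::move(values)) {
        assert(data.size() == rows * cols);
    }

    // Element at (row, col) in column-major order.
    double at(std::size_t row, std::size_t col) const {
        return data[col * num_rows + row];
    }

    std::size_t num_vectors() const { return num_cols; }
    std::size_t num_features() const { return num_rows; }
};

// Returns a copy with rows and columns swapped.
ColMajorMatrix transpose(const ColMajorMatrix& m) {
    std::vector<double> out(m.data.size());
    for (std::size_t r = 0; r < m.num_rows; ++r) {
        for (std::size_t c = 0; c < m.num_cols; ++c) {
            // Element (r, c) of m becomes element (c, r) of the result.
            out[r * m.num_cols + c] = m.at(r, c);
        }
    }
    return ColMajorMatrix(m.num_cols, m.num_rows, std::move(out));
}
```

With the shapes from this thread, a matrix declared as 125973 rows by 41 columns would report 41 vectors of dimension 125973 in this sketch, and its transpose would report 125973 vectors of dimension 41.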

@javiermaldonadoc
Author

By executing
features_train = create(f_feats_train);
cout<<"Features Train: "<<features_train->get_num_vectors()<<endl;
features_test = create(f_feats_test);
cout<<"Features Test: "<<features_test->get_num_vectors()<<endl;

I got these results:
Transposing:
Features Train: 41
Features Test: 41

Without transposing:
Features Train: 125973
Features Test: 22543

The latter result is consistent with the number of cases (rows), and the former with the number of features (columns). Could this information help?

Thank you very much!

@karlnapf
Member

The error message seems to be pretty clear to me?
Number of training vectors does not match number of labels
How many labels do you pass?
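
The check behind this error message can be sketched as a small pre-training guard. This is a minimal sketch, assuming the two counts have already been read from the feature and label objects; `count_mismatch` and `thread_counts_example` are hypothetical helpers written for illustration, not Shogun functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical pre-training check, not a Shogun function.
// Returns an empty string when the counts agree, otherwise a message.
std::string count_mismatch(std::size_t num_vectors, std::size_t num_labels) {
    if (num_vectors == num_labels) return "";
    return "Number of training vectors (" + std::to_string(num_vectors) +
           ") does not match number of labels (" +
           std::to_string(num_labels) + ")";
}

// Counts taken from this thread: 125973 labels, and either 125973 or 41
// vectors depending on whether the matrix was transposed.
inline void thread_counts_example() {
    assert(count_mismatch(125973, 125973).empty());
    assert(!count_mismatch(41, 125973).empty());
}
```

Printing both counts just before training makes it easy to see which of the two inputs has the unexpected size.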

@karlnapf
Member

@gf712 the error he gets on cart seems to have to do with the fmt lib crashing? :D

@gf712
Member

gf712 commented May 15, 2020

Yes, it seems there is a formatting error. @vigsterkr @theartful can we switch on fmt's compile-time checks to avoid this type of error at runtime? I am talking about https://fmt.dev/latest/api.html#format-api

@gf712
Member

gf712 commented May 15, 2020

@javiermaldonadoc get_num_vectors gets you the number of examples, not the feature count (from the docs "get number of examples/vectors, possibly corresponding to the current subset")

@ghost

ghost commented May 15, 2020

@gf712 FMT_STRING requires a constexpr string literal. I don't know how to make it work without either using a macro or changing every format string at call site to call FMT_STRING. https://gist.github.com/theartful/01060830399985b72c817f39de8be7ed
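
Why a compile-time check needs a constant-expression format string can be illustrated with a toy checker in standard C++17. This is not fmt's implementation and does not use fmt at all: `count_fields`, `has_enough_args`, and `runtime_example` are hypothetical names, and the checker only handles automatically indexed fields plus the `{{` escape. It assumes, as one plausible cause, that an "argument index out of range" error arises when a format string has more fields than arguments were passed.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical toy checker, not fmt's implementation.
// Counts replacement fields such as "{}" and skips the escaped brace "{{".
constexpr std::size_t count_fields(std::string_view format) {
    std::size_t fields = 0;
    for (std::size_t i = 0; i < format.size(); ++i) {
        if (format[i] != '{') continue;
        if (i + 1 < format.size() && format[i + 1] == '{') {
            ++i;  // "{{" is a literal brace, not a field
        } else {
            ++fields;
        }
    }
    return fields;
}

// True when the caller supplies at least one argument per field.
constexpr bool has_enough_args(std::string_view format, std::size_t num_args) {
    return count_fields(format) <= num_args;
}

// The literal is a constant expression, so a mismatch here would fail the
// build instead of surfacing as an exception at runtime.
static_assert(has_enough_args("{} trainlabels, {} classes", 2));

// A string that is only known at runtime can only be checked at runtime.
inline void runtime_example() {
    const char* format = "{} of {} vectors";
    assert(!has_enough_args(format, 1));  // one argument short
}
```

The `static_assert` works only because the string literal is visible to the compiler, which mirrors the constraint described above: the check cannot be applied to a format string that is passed through as an ordinary runtime value.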

@gf712
Member

gf712 commented May 15, 2020

> @gf712 FMT_STRING requires a constexpr string literal. I don't know how to make it work without either using a macro or changing every format string at call site to call FMT_STRING. https://gist.github.com/theartful/01060830399985b72c817f39de8be7ed

Hmmm that is annoying.. I guess we just have to look for the bug then. Or we go back to using macros to do these checks. @karlnapf @vigsterkr ?

@karlnapf
Member

no macros please :)
I guess this should be relatively easy to debug. I have the feeling it probably is due to a really long sequence being requested or something

@javiermaldonadoc
Author

> @javiermaldonadoc get_num_vectors gets you the number of examples, not the feature count (from the docs "get number of examples/vectors, possibly corresponding to the current subset")

Yes, that's correct, and I'm checking that by using this function together with the transpose function. I have 41 features, 125K examples in training, and 22K in testing, which is consistent with the numbers this function reports.

@javiermaldonadoc
Author

> The error message seems to be pretty clear to me?
> Number of training vectors does not match number of labels
> How many labels do you pass?

I checked, and the numbers are the same: 125K in both cases, for labels and for vectors.

@karlnapf
Member

karlnapf commented May 18, 2020

strange. Could you post a (preferably minimal standalone with synthetic data) example to reproduce this issue? Maybe there is a problem in the multiclass codes ...

@javiermaldonadoc
Author

> strange. Could you post a (preferably minimal standalone with synthetic data) example to reproduce this issue? Maybe there is a problem in the multiclass codes ...

Thank you!
Here is a link to the datasets I used: https://drive.google.com/open?id=1Cz0ju3xfpw70Ve8PJ9AeQcZt_RPR3Ziq
You could use the small one (KDD-Test*), which is the smallest of the datasets. My code is at https://pastebin.com/HMAS9dEL; it is a fully functional example (loading the data, creating the feature and label sets, training and testing the SVM). For reference, I'm using the NSL-KDD dataset: https://www.unb.ca/cic/datasets/nsl.html

I hope this works for you! Again, thank you very much. I really appreciate it!

@stale

stale bot commented Nov 14, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 14, 2020
@stale

stale bot commented Nov 21, 2020

This issue is now being closed due to a lack of activity. Feel free to reopen it.

@stale stale bot closed this as completed Nov 21, 2020