
Example for learning/predicting multiclass on own data #1868

Closed
tklein23 opened this issue Feb 13, 2014 · 8 comments

Comments

tklein23 (Contributor) commented Feb 13, 2014

This task is about creating examples in any of the available interfaces to train a multiclass SVM and to predict multiclass labels using the learned SVM:

$ python train_multiclass_svm.py train.data train.labels multiclass-svm.model
$ python predict_multiclass_svm.py multiclass-svm.model eval.data predicted.labels 

The example should be as simple as possible. The goals are:

  • applying algorithms to one's own data without needing to change the scripts/sources
  • providing simple examples for new developers to start their own scripts/experiments
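As a rough illustration of the intended shape of those two scripts, here is a minimal, dependency-free sketch. The file layout (whitespace-separated features in *.data, one label per line in *.labels) and the nearest-centroid "classifier" are assumptions standing in for the real multiclass SVM; only the read/train/predict flow matters here.

```python
def read_data(path):
    """Read one whitespace-separated feature vector per line."""
    with open(path) as f:
        return [[float(v) for v in line.split()] for line in f if line.strip()]

def read_labels(path):
    """Read one label per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

def train(vectors, labels):
    """Placeholder 'training': per-class mean vectors (stand-in for an SVM)."""
    sums, counts = {}, {}
    for x, y in zip(vectors, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, vectors):
    """Assign each vector the label of the nearest class centroid."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda y: dist2(model[y], x)) for x in vectors]
```

The CLI wrappers would then only parse sys.argv and persist the model dict (e.g. via pickle) between the train and predict scripts.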

Additionally, we could provide a script that evaluates the output of the above scripts. For example:

$ python evaluate_multiclass_labels.py eval.label predicted.labels 
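A sketch of what that evaluation script's core could look like; the function name and the simple accuracy/confusion output are my assumptions, not a fixed spec:

```python
def evaluate(true_labels, predicted_labels):
    """Compare two equal-length label sequences; return (accuracy, confusion).

    confusion maps (true_label, predicted_label) -> count, so the script can
    evaluate any multiclass output, not just SVM predictions.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label files differ in length")
    confusion = {}
    correct = 0
    for t, p in zip(true_labels, predicted_labels):
        confusion[(t, p)] = confusion.get((t, p), 0) + 1
        correct += (t == p)
    return correct / len(true_labels), confusion
```

A thin CLI wrapper reading one label per line from eval.label and predicted.labels would complete the script.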

Disclaimer: This task could also be solved in interfaces other than Python and for algorithms other than multiclass SVMs.

PirosB3 (Contributor) commented Mar 7, 2014

I would love to take this task; can I start?
It would also be good experience for me, because it relates somewhat to my SoC proposal.

Thanks

PirosB3 (Contributor) commented Mar 7, 2014

Hi @tklein23
What format would you like for the *.data and *.labels files? I was thinking it would be a good idea to use something standard like svmlight. That way we reduce the files from 2 to 1 (svmlight contains both target and features) and provide a tool that is compatible with standard formats.

<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>

What do you think?

Thanks,
Dan
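For reference, parsing one line of that svmlight format takes only a few lines of code. A minimal sketch (ignoring svmlight extras such as qid and cost factors):

```python
def parse_svmlight_line(line):
    """Parse '<target> <feature>:<value> ... # <info>' into (target, features)."""
    line = line.split('#', 1)[0].strip()  # drop the optional trailing info field
    tokens = line.split()
    target = tokens[0]
    features = {}
    for token in tokens[1:]:
        index, value = token.split(':')
        features[int(index)] = float(value)
    return target, features
```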

tklein23 (Contributor, Author) commented Mar 7, 2014

Hey Dan! Feel free to take this task!

I totally agree that we should use standard formats. I think we can go with the SVMlight format, as shown in data/toy/7class_example4_train.light. No need to split it into two files.

The goal is simply to have something that can be applied easily to one's own data (without touching the source). Btw., you're not limited to Python; feel free to do it in C++ if you like.

PirosB3 added a commit to PirosB3/shogun that referenced this issue Mar 7, 2014
PirosB3 (Contributor) commented Mar 7, 2014

@tklein23 I have started working on this new feature. When you have a second, could you take a look at my initial commit? It's nowhere near a production version, but if I am doing something wrong, please let me know.

Also, wouldn't it be better to have a single script that does training + evaluation (like the evaluate_multiclass_labels.py you mentioned, but without the other two scripts)? There is a lot of reusable functionality between the two.

Let me know,
Dan

vigsterkr (Member) commented Mar 7, 2014
@PirosB3 we would like you to send a PR (pull request) instead of asking people to check your forked repository... it is essential that you start working with PRs, as that's how we do development during the whole GSoC.

Even if your code is not ready, it's OK to send a PR; we'll discuss things in that PR, and then you can make changes and add more commits to it.

PirosB3 (Contributor) commented Mar 7, 2014

Perfect!
I will send a PR


tklein23 (Contributor, Author) commented Mar 7, 2014

You wrote: Wouldn't it be better to have a single script that does training + evaluation, since there is a lot of reusable functionality between the two?

I think it's better to have individual scripts for training, prediction, and evaluation. Evaluation, for example, is not limited to a specific learning algorithm; you can use it to evaluate anything that outputs multiclass labels.

Anyway, wherever you see reusable code, factor it out into methods/includes/whatever code blocks make sense.

karlnapf (Member) commented Mar 7, 2014

As for the format, keep in mind that we can serialise objects in Shogun.
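Shogun's own serialisation machinery isn't shown in this thread; as a generic illustration of handing a trained model from the training script to the prediction script, a plain pickle round-trip works for a pure-Python sketch (the model dict below is hypothetical):

```python
import os
import pickle
import tempfile

# Hypothetical trained-model state; Shogun would serialise its own objects instead.
model = {"classes": ["a", "b"], "centroids": {"a": [0.0, 0.5], "b": [5.5, 5.0]}}

path = os.path.join(tempfile.mkdtemp(), "multiclass-svm.model")
with open(path, "wb") as f:   # train_multiclass_svm.py would end here
    pickle.dump(model, f)
with open(path, "rb") as f:   # predict_multiclass_svm.py would start here
    restored = pickle.load(f)
```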

cameo54321 pushed a commit to cameo54321/shogun that referenced this issue Mar 17, 2014
@tklein23 tklein23 closed this as completed Apr 3, 2014