Added multilabel reader in LibSVMFile. Fixed a bug in so_multiclass.cpp #2062

Closed
wants to merge 111 commits into from

Jiaolong
Contributor

The previous PR, #2039, was closed because of an unrelated test error in the Travis report.

The changes also update the revision of the data submodule, where multilabel datasets were added.

yorkerlin and others added 30 commits March 11, 2014 19:19
This reverts commit 9babe65.
Fixed "bullet lists" errors and a subscript error in Doxygen documentation.
…eHeaderFile

Update RandomKitchenSinksDotFeatures.h
Add more details on RKS algorithm.
@Jiaolong
Contributor Author

I list the changes here again in order to link to the related issues:

(1) This PR is mainly for #1987: add reader/writer for MultilabelLabels. A new function has been added in LibSVMFile.

(2) An example has been added, see io_libsvm_multilabel.cpp. Scene and yeast datasets have been tested.

(3) A tiny bug in so_multiclass.cpp was fixed. It occurred when compiling on Ubuntu with MOSEK installed.
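
For readers unfamiliar with the multilabel variant of the LibSVM text format, here is a minimal, stand-alone sketch of parsing one line. It is not the code from this PR and does not use any Shogun types: `ParsedLine` and `parse_libsvm_line` are hypothetical names, and the sketch assumes labels are comma-separated in the first field and features follow as `index:value` pairs. Malformed numbers will make `std::stoi`/`std::stod` throw.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container for one parsed line; not a Shogun type.
struct ParsedLine
{
    std::vector<int> labels;                       // zero or more class labels
    std::vector<std::pair<int, double>> features;  // (index, value) pairs
};

// Parse one line such as "1,3 2:0.5 7:1.25".
// Assumption: the first whitespace-separated token is the label field unless
// it contains ':', in which case the example is treated as unlabelled.
ParsedLine parse_libsvm_line(const std::string& line)
{
    ParsedLine out;
    std::istringstream tokens(line);
    std::string token;
    bool first = true;

    while (tokens >> token)
    {
        const std::string::size_type colon = token.find(':');
        if (first && colon == std::string::npos)
        {
            // Label field: split on ',' and convert each piece.
            std::istringstream label_field(token);
            std::string label;
            while (std::getline(label_field, label, ','))
                if (!label.empty())
                    out.labels.push_back(std::stoi(label));
        }
        else if (colon != std::string::npos)
        {
            // Sparse feature entry "index:value".
            out.features.emplace_back(std::stoi(token.substr(0, colon)),
                                      std::stod(token.substr(colon + 1)));
        }
        // Tokens after the first that lack ':' are skipped in this sketch.
        first = false;
    }
    return out;
}
```

A single-labelled line is simply the case where `labels` ends up with exactly one entry, so the same routine covers both kinds of file.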

@vigsterkr
Member

@Jiaolong we need to at least add a TODO for the previous discussion, i.e. let the file content decide whether it's multi- or single-labelled, not the API... this means marking it in the file as well, not just in a PR comment that is lost as soon as you delete/merge it.

@Jiaolong
Contributor Author

Yes, I am still working on this branch, but I am not totally clear about the idea. Could you elaborate a little more?

So, for example, can we just use a sparse matrix to output the labels? Whether single or multiple, the output would be unified.

@Jiaolong
Contributor Author

I mean to use a unified function, e.g.,

void get_sparse_matrix(SGSparseVector<float64_t>*& matrix_feat, int32_t& num_feat, int32_t& num_vec, SGSparseVector<float64_t>*& matrix_label, int32_t& num_classes);

Then the user doesn't need to specify the type of the file but gets the labels from matrix_label, for both single and multiple labels.
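
To illustrate the idea of one unified label output, here is a rough, stand-alone sketch. It deliberately avoids Shogun types: `LabelMatrix` and `read_label_matrix` are hypothetical names, rows are plain `std::vector<int>` rather than `SGSparseVector<float64_t>`, and `num_classes` is taken to be the number of distinct labels seen, which is only one possible definition.

```cpp
#include <istream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical unified result: one row of labels per example,
// regardless of whether the file is single- or multi-labelled.
struct LabelMatrix
{
    std::vector<std::vector<int>> rows;
    int num_classes = 0;  // assumption: number of distinct labels seen
};

// Read only the label field of every non-empty line.
// Assumption: labels are comma-separated and precede the first space.
LabelMatrix read_label_matrix(std::istream& in)
{
    LabelMatrix result;
    std::set<int> distinct;
    std::string line;

    while (std::getline(in, line))
    {
        if (line.empty())
            continue;

        // Everything before the first space (or the whole line).
        std::string field = line.substr(0, line.find(' '));
        // A field containing ':' is a feature entry, so there are no labels.
        if (field.find(':') != std::string::npos)
            field.clear();

        std::vector<int> row;
        std::istringstream labels(field);
        std::string label;
        while (std::getline(labels, label, ','))
        {
            if (label.empty())
                continue;
            row.push_back(std::stoi(label));
            distinct.insert(row.back());
        }
        result.rows.push_back(row);
    }

    result.num_classes = static_cast<int>(distinct.size());
    return result;
}
```

With this shape the caller never states the file type up front; whether the data is single-labelled can be read off the rows afterwards.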

@vigsterkr
Member

@Jiaolong mmm yeah, exactly that's what i wrote yesterday....
but we need to provide helper functions so that one can convert SGSparseVector<float64_t>*& matrix_label into float64_t* if it's single-labelled... so that one can actually supply that to CMulticlassLabels or CBinaryLabels

and no, that helper function should not be implemented in LibSVMFile...
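
A free-standing conversion helper of the kind described above could look roughly like the sketch below. It is only an illustration under stated assumptions: `to_dense_labels` is a hypothetical name, the sparse label rows are modelled as `std::vector<int>` instead of `SGSparseVector<float64_t>`, and `double` stands in for `float64_t`. Where such a helper should actually live in Shogun is exactly what this thread is discussing.

```cpp
#include <vector>

// Convert sparse label rows into a dense label vector.
// Returns true and fills `dense` only if every row holds exactly one label;
// otherwise returns false and leaves `dense` unchanged.
bool to_dense_labels(const std::vector<std::vector<int>>& sparse_rows,
                     std::vector<double>& dense)
{
    std::vector<double> converted;
    converted.reserve(sparse_rows.size());

    for (const std::vector<int>& row : sparse_rows)
    {
        if (row.size() != 1)
            return false;  // multi-labelled or unlabelled example
        converted.push_back(static_cast<double>(row.front()));
    }

    dense.swap(converted);
    return true;
}
```

The boolean return lets the caller decide what to do with multi-labelled data instead of silently dropping labels.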

@Jiaolong
Contributor Author

OK, cool! To make sure I have caught your idea, I propose the following changes:

(1) Change the old get_sparse_matrix(float64_t* labels, ...) into get_sparse_matrix(SGSparseVector<float64_t>* mat_labels, ...), so that we have one unified get_sparse_matrix(SGSparseVector<float64_t>* mat_labels, ...) for both single and multiple labels.

(2) We parse the file to determine its type, and if it is single-labeled, we convert the sparse matrix into a float pointer. The conversion (helper) function is implemented in CBinaryLabels and CMulticlassLabels.

(3) So users only call CBinaryLabels and CMulticlassLabels and don't need to know what's going on inside CLibSVMFile.

Am I right?

@Jiaolong Jiaolong closed this Mar 22, 2014
@Jiaolong Jiaolong deleted the io_libsvm_multilable branch March 22, 2014 16:57
@Jiaolong
Contributor Author

I am sorry, the branch was polluted.
