The format of the sparse instances in the input file #656

YaweiZhao · 2016-05-29T09:49:28Z

Hi,

Since the training data in many datasets is organized as a sparse matrix, what format should sparse training instances take in the input files? For example, if a training instance has 1000 dimensions and 100 of them are nonzero, is the instance written in the input file as 900 zeros and 100 nonzero values? The sparse instances in the datasets published on the LibSVM website use the format "dimension_id : value".

Best wishes,

Yawei

rcurtin · 2016-05-30T22:54:57Z

Hi Yawei,

mlpack unfortunately doesn't have any current support for loading sparse matrices from disk. In addition, because of this, the command-line programs only load dense data.

So if you want to use sparse data specifically, I think the best way is to write a C++ program using arma::sp_mat. But to make it harder... Armadillo does not have good documentation for their support for loading sparse matrices. You can load a coordinate list of the form

1 2 10.3
3 1 5.2
3 2 1.3

and this represents a matrix with three nonzero elements. You can load it using the function

arma::sp_mat m;
m.load("file.txt", arma::coord_ascii);

and then you can use that in mlpack methods. I wish that this was documented in the Armadillo docs but currently it is not.
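For data that is already in the LibSVM "dimension_id:value" layout mentioned in the question, one option is to convert it to a coordinate list first. The following is only a minimal sketch under stated assumptions, not part of mlpack or Armadillo: the helper name `LibsvmToCoordList` is hypothetical, it assumes the coordinate list uses zero-based "row column value" indices, it assumes each instance should become one column (with dimensions as rows), and it drops the leading label on each line.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper (not an mlpack or Armadillo function).
// Converts LibSVM-style text ("label idx:val idx:val ...", one instance per
// line, 1-based feature indices) into coordinate-list text, one
// "row col value" triplet per line.
// Assumptions: output indices are zero-based; rows are dimensions and
// columns are instances; the label (first token of each line) is discarded.
std::string LibsvmToCoordList(const std::string& libsvmText)
{
  std::istringstream in(libsvmText);
  std::ostringstream out;
  std::string line;
  std::size_t instance = 0; // Zero-based column index of the current instance.

  while (std::getline(in, line))
  {
    std::istringstream tokens(line);
    std::string token;
    if (!(tokens >> token))
      continue; // Blank line: skip it without consuming a column.

    // The first token was the label; the remaining tokens are idx:val pairs.
    while (tokens >> token)
    {
      const std::size_t colon = token.find(':');
      if (colon == std::string::npos)
        continue; // Not an idx:val pair; ignore it.

      const unsigned long index = std::stoul(token.substr(0, colon));
      if (index == 0)
        continue; // LibSVM indices are assumed to start at 1.

      // Emit the value text unchanged so no precision is lost.
      out << (index - 1) << ' ' << instance << ' '
          << token.substr(colon + 1) << '\n';
    }
    ++instance;
  }
  return out.str();
}
```

Writing the returned string to a file should give something that can be passed to the `m.load("file.txt", arma::coord_ascii)` call above, provided the indexing assumptions hold for your Armadillo version; it is worth checking the loaded matrix's dimensions on a small file first.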

I hope this is helpful... let me know if I can clarify anything.

YaweiZhao · 2016-05-31T02:55:40Z

Hi Ryan,

Thanks for your answer! It really helps me understand how to use MLPACK. Could I (or you) add your suggestion to the MLPACK documentation?

rcurtin · 2016-05-31T18:43:07Z

I updated the documentation in e36eec5; it's online at
http://mlpack.org/docs/mlpack-git/doxygen.php?doc=formatdoc.html

Let me know what you think, if anything can be clarified. I'll mark this as resolved since I've updated the documentation, but let me know if there is anything else to be done.

Thanks for pointing this out!

Ryan

YaweiZhao · 2016-06-04T01:48:15Z

Hi Ryan,

I have read the updated documentation. I think it is clear and understandable. Thanks for your time. Nice work!

Yawei

rcurtin closed this as completed May 31, 2016

rcurtin added s: fixed t: bug report labels May 31, 2016

rcurtin added this to the mlpack 2.0.2 milestone May 31, 2016
