Hi,
Since the input training data in many datasets is organized as a sparse matrix, what format should such sparse training instances take in the input files? For example, if a training instance has 1000 dimensions and 100 of them are nonzero, is the instance written in the input file with 900 zeros and 100 nonzero values? The same sparse instances in the datasets published on the LibSVM website use the format "dimension_id:value".
Best wishes,
Yawei
mlpack unfortunately doesn't currently support loading sparse matrices from disk, and because of this, the command-line programs only load dense data.
So if you want to use sparse data specifically, I think the best way is to write a C++ program using arma::sp_mat. To make things harder, Armadillo's documentation doesn't cover its support for loading sparse matrices. You can load a coordinate list of the form
1 2 10.3
3 1 5.2
3 2 1.3
and this represents a matrix with three nonzero elements. You can load it using the function
arma::sp_mat m;
m.load("file.txt", arma::coord_ascii);
and then you can use that in mlpack methods. I wish that this was documented in the Armadillo docs but currently it is not.
I hope this is helpful... let me know if I can clarify anything.
I'll mark this as resolved since I've updated the documentation, but let me know if there is anything else to be done.