Optimize load csv #678

Closed


stereomatchingkiss
Contributor

Hi, I used boost::spirit to implement the CSV parser; it is faster and more memory efficient.

Parsing a file with 1 million lines (39796 KB):

Spirit version:

transpose: 2151 ms
non-transpose: 4073 ms

Old version:

transpose: 9616 ms
non-transpose: 10131 ms

The non-transpose version is slower; I guess that is because arma::Mat is stored column-major.
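To illustrate that guess: Armadillo stores matrices in column-major order, so writing one parsed CSV line into a matrix column touches adjacent memory, while writing it into a matrix row jumps by the number of rows on every element. Below is a minimal sketch using a plain std::vector rather than arma::Mat; the `ColMajor` struct and the two `Store...` helpers are hypothetical names for illustration only, not code from this pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a column-major matrix (not arma::Mat itself).
struct ColMajor
{
  std::size_t rows, cols;
  std::vector<double> mem;

  ColMajor(std::size_t r, std::size_t c) : rows(r), cols(c), mem(r * c, 0.0) {}

  // Element (r, c) lives at offset r + c * rows.
  std::size_t Offset(std::size_t r, std::size_t c) const { return r + c * rows; }
  double& At(std::size_t r, std::size_t c) { return mem[Offset(r, c)]; }
};

// Transposed load: CSV line `line` becomes column `line`.
// Consecutive writes land on adjacent memory locations.
void StoreAsColumn(ColMajor& m, std::size_t line, const std::vector<double>& values)
{
  for (std::size_t i = 0; i < values.size(); ++i)
    m.At(i, line) = values[i];
}

// Non-transposed load: CSV line `line` becomes row `line`.
// Consecutive writes are m.rows elements apart.
void StoreAsRow(ColMajor& m, std::size_t line, const std::vector<double>& values)
{
  for (std::size_t i = 0; i < values.size(); ++i)
    m.At(line, i) = values[i];
}
```

With 1 million lines, the strided writes in `StoreAsRow` are far less cache friendly, which would be consistent with the timings above, although this sketch does not measure anything itself.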

Uploading for code review; I haven't integrated it into the load function or run the test cases yet.

PS: This is single-threaded only. I do not know whether multithreading would make performance better or worse, since DataSetInfo is not a lock-free data structure. If we want to use multiple threads, I think we could read a batch of lines into a vector, create a thread pool and a vector of DataSetInfo objects, and merge the DataSetInfo objects at the end.
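The multithreaded idea could look roughly like the sketch below. It is only an assumption about how such a design might be laid out: `ChunkResult` is a hypothetical stand-in for a per-thread DataSetInfo plus the parsed values, `ParseChunk` and `ParseParallel` are made-up names, and std::async is used in place of a real thread pool to keep the example short. Each worker owns its own state, so no locking is needed until the final merge.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <exception>
#include <functional>
#include <future>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-worker result; stands in for a per-thread DataSetInfo.
struct ChunkResult
{
  std::vector<std::vector<double>> rows;
  std::map<std::string, std::size_t> tokenCounts; // non-numeric tokens seen
};

// Parse lines [begin, end) into a private ChunkResult.
ChunkResult ParseChunk(const std::vector<std::string>& lines,
                       std::size_t begin, std::size_t end)
{
  ChunkResult result;
  for (std::size_t i = begin; i < end; ++i)
  {
    std::vector<double> row;
    std::istringstream line(lines[i]);
    std::string token;
    while (std::getline(line, token, ','))
    {
      std::size_t used = 0;
      double value = 0.0;
      bool numeric = true;
      try { value = std::stod(token, &used); }
      catch (const std::exception&) { numeric = false; }
      if (!numeric || used != token.size())
      {
        ++result.tokenCounts[token]; // record the token, store 0 as a placeholder
        value = 0.0;
      }
      row.push_back(value);
    }
    result.rows.push_back(std::move(row));
  }
  return result;
}

// Split the lines into contiguous chunks, parse them concurrently,
// then merge the per-chunk results in order on the calling thread.
ChunkResult ParseParallel(const std::vector<std::string>& lines, std::size_t workers)
{
  if (workers == 0)
    workers = 1;
  const std::size_t chunk = (lines.size() + workers - 1) / workers;
  std::vector<std::future<ChunkResult>> futures;
  for (std::size_t begin = 0; begin < lines.size(); begin += chunk)
  {
    const std::size_t end = std::min(begin + chunk, lines.size());
    futures.push_back(std::async(std::launch::async, ParseChunk,
                                 std::cref(lines), begin, end));
  }

  ChunkResult merged;
  for (auto& f : futures)
  {
    ChunkResult part = f.get();
    for (auto& row : part.rows)
      merged.rows.push_back(std::move(row));
    for (const auto& kv : part.tokenCounts)
      merged.tokenCounts[kv.first] += kv.second;
  }
  return merged;
}
```

Whether this is actually faster depends on how expensive the merge step is compared to the parsing; that would need to be benchmarked.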

rcurtin and others added 30 commits May 16, 2016 14:57
…itm for classification models. To be precise, this is is a Variance Reduces classification reinforcement learning rule.
…xtract a retina-like representation of the input image.
 Instead of including: methods/neighbor_search/ns_traversal_info.hpp
 Include the definition in: core/tree/traversal_info.hpp
Properly use Enum type, in rann and range_search.
Remove duplicated code for traversal info.
Deprecated arma function replaced by new arma constant
rcurtin and others added 21 commits June 2, 2016 09:49
…ples.

So we'll have to wait until mlpack 2.0.1 to remove it... :(
add cli executable for data_split
Marcus thinks this will fix the Windows build... let's find out.
2 : fix bug, cannot parse transpose file with correct result
@nilayjain nilayjain force-pushed the master branch 2 times, most recently from fddfc18 to 1f562a1 Compare June 5, 2016 12:00
@stereomatchingkiss
Contributor Author

Why are there suddenly so many commits on this branch?

@keon
Member

keon commented Jun 5, 2016

@stereomatchingkiss hmm... did you amend it with the merge commit?

@rcurtin
Member

rcurtin commented Jun 5, 2016

Someone force pushed to the repo and I am in the process of fixing it. That will fix this problem.

@stereomatchingkiss
Contributor Author

stereomatchingkiss commented Jun 5, 2016

> @stereomatchingkiss hmm... did you amend it with the merge commit?

No, I only merged pull request #650 once.

> Someone force pushed to the repo and I am in the process of fixing it. That will fix this problem.

Thanks for the fix.

Edit: Already integrated into load_impl.hpp; it passes all of the test cases.

@stereomatchingkiss
Contributor Author

@rcurtin Would it make things easier to deal with if I delete this branch and open a new one?
