Turn on data.table support in h2o R package by default #11521
Comments
Erin LeDell commented: We are already forcing it properly here: https://github.com/h2oai/h2o-3/blob/master/h2o-r/h2o-package/R/frame.R#L3232 But here are two places where it relies on user options (which will not be set unless the user explicitly knows to do so). Should we always force the use, even if the user has manually set it to false? I guess we should honor that too.
Hugh Parsonage commented: The option should be I don't think it makes much sense to turn off e.g.
Jan Sterba commented: It was possible to do this for Matrix conversion; however, using data.table for CSV parsing needs to remain optional because of data.table buggy handling of quote-escaping.
Matt Dowle commented: Jan - I find the phrase “data.table buggy handling of” a bit flippant and unhelpful. I thought data.table is one of the best file readers at handling quoting. It would be more helpful if you could point me to a specific issue you have faced. There are bugs yes but I don’t think it warrants that phrase.
Jan Sterba commented: Matt, I am sorry you find my comment flippant; I did not mean to offend anyone. I have spent plenty of time trying to make data.table#fread handle escaped quotes in CSV and was unable to, hence I concluded it's simply impossible.
This makes a single-column CSV whose values contain one quote impossible to parse. For more details, please see my comment in the pull request.
Let's make the h2o R package use data.table by default (if installed) for the as.h2o() and as.data.frame() functions, rather than forcing the user to set
options("h2o.use.data.table"=TRUE)
in their code. This gives a huge speedup!
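As a rough illustration of the proposed behavior, the check could look something like the sketch below. It is an assumption rather than the package's actual code: `use_data_table` is a hypothetical helper name, and the sketch relies only on base R's `getOption()` and `requireNamespace()`. An unset option is treated as TRUE, an explicit FALSE from the user is still honored, and data.table is used only when it is installed.

```r
# Hypothetical helper (not part of the h2o package): decide whether
# as.h2o() / as.data.frame() should go through data.table.
use_data_table <- function() {
  # Treat an unset option as TRUE; an explicit FALSE set by the user is honored.
  opted_in <- isTRUE(getOption("h2o.use.data.table", default = TRUE))
  # Only use data.table when the package is actually installed.
  opted_in && requireNamespace("data.table", quietly = TRUE)
}
```

With this shape, a user who runs options("h2o.use.data.table"=FALSE) still opts out, while everyone else gets the fast path whenever data.table is available.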