Turn on data.table support in h2o R package by default #11521

Closed
exalate-issue-sync bot opened this issue May 12, 2023 · 6 comments

Comments

@exalate-issue-sync

Let's make the h2o R package use data.table by default (if installed) for the as.h2o() and as.data.frame() functions, rather than forcing the user to set options("h2o.use.data.table"=TRUE) in their code. This gives a huge speedup!

@exalate-issue-sync
Author

Erin LeDell commented: We are already forcing it properly here: https://github.com/h2oai/h2o-3/blob/master/h2o-r/h2o-package/R/frame.R#L3232

But here are two places where it relies on user options (which will not be set unless the user explicitly knows to do so).
https://github.com/h2oai/h2o-3/blob/master/h2o-r/h2o-package/R/frame.R#L3203
https://github.com/h2oai/h2o-3/blob/master/h2o-r/h2o-package/R/frame.R#L3367

Should we always force the use, even if the user has manually set it to false? I guess we should honor that too.
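A minimal sketch of what "on by default, but honor an explicit FALSE" could look like. `use_data_table` is a hypothetical helper written for illustration, not a function in the h2o package; only the option name `h2o.use.data.table` comes from this thread.

```r
# Hypothetical helper (not part of the h2o package): decide whether the
# data.table code path should be used.
use_data_table <- function(installed = requireNamespace("data.table", quietly = TRUE)) {
  # An unset option (NULL) falls back to TRUE, i.e. "on by default".
  opt <- getOption("h2o.use.data.table", default = TRUE)
  # Only TRUE (or an unset option) enables data.table, and only when the
  # package is installed. An explicit FALSE from the user is honored.
  isTRUE(as.logical(opt)) && installed
}

# Usage: a user who wants to opt out sets the option once per session.
options("h2o.use.data.table" = FALSE)
use_data_table()   # FALSE, regardless of whether data.table is installed
options("h2o.use.data.table" = NULL)   # unset again
```

With this shape, users who never touch the option get the fast path whenever data.table is available, and nobody loses the ability to switch it off.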

@exalate-issue-sync
Author

Hugh Parsonage commented: The option should be TRUE or NULL by default, otherwise it causes performance regressions (e.g. as.h2o will not work unless a global option is set, which is opaque).

I don't think it makes much sense to turn off e.g. fwrite when it's available.

@exalate-issue-sync
Author

Jan Sterba commented: It was possible to do this for Matrix conversion; however, using data.table for CSV parsing needs to remain optional because of data.table buggy handling of quote-escaping.

@exalate-issue-sync
Author

Matt Dowle commented: Jan - I find the phrase “data.table buggy handling of” a bit flippant and unhelpful. I thought data.table is one of the best file readers at handling quoting. It would be more helpful if you could point me to a specific issue you have faced. There are bugs yes but I don’t think it warrants that phrase.

@exalate-issue-sync
Author

Jan Sterba commented: Matt, I am sorry you find my comment flippant; I did not mean to offend anyone.

I have spent plenty of time trying to make data.table::fread handle escaped quotes in CSV and was unable to, so I concluded it's simply impossible.

  • backslash-escaped quotes are parsed including the backslash
  • quotes escaped by doubling ("") are also read as-is

This makes it impossible to parse a single-column CSV whose values contain one quote character. For more details, please see my comment in the pull request.
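For readers unfamiliar with the two escaping conventions mentioned above, here is a small base R sketch that writes the same single-column value both ways. It uses only base R (`write.table`, `readLines`, `read.csv`); the sample value and variable names are made up for illustration, and the sketch does not assert anything about what fread returns.

```r
# One column, one value containing embedded double quotes.
df <- data.frame(x = 'say "hi"', stringsAsFactors = FALSE)

doubled <- tempfile(fileext = ".csv")
escaped <- tempfile(fileext = ".csv")

# Convention 1: embedded quotes are written doubled ("").
write.table(df, doubled, sep = ",", row.names = FALSE, qmethod = "double")
# Convention 2: embedded quotes are written backslash-escaped (\").
write.table(df, escaped, sep = ",", row.names = FALSE, qmethod = "escape")

# Raw data line of each file (line 1 is the header).
doubled_line <- readLines(doubled)[2]
escaped_line <- readLines(escaped)[2]

# Base R's read.csv undoes the doubling and recovers the original value.
roundtrip <- read.csv(doubled, stringsAsFactors = FALSE)
identical(roundtrip$x, df$x)
```

If data.table is installed, running `data.table::fread(doubled)` and `data.table::fread(escaped)` on the same two files and comparing the result with `df$x` is one way to check the behavior described in this comment on a given data.table version.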

@hasithjp
Member

JIRA Issue Migration Info

Jira Issue: PUBDEV-4639
Assignee: Jan Sterba
Reporter: Erin LeDell
State: Resolved
Fix Version: 3.30.0.1
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#1409
#1453
#4265
