Suggestion - enable option to save tables as parquet rather than csv #73

Closed
iainmwallace opened this issue Oct 9, 2019 · 3 comments

Comments

@iainmwallace

Hi,

Based on this recent blog post, https://ursalabs.org/blog/2019-10-columnar-perf/, it would be beneficial for performance to add an option to store tables in Parquet format instead of CSV:

arrow::write_parquet(iris,"iris.parquet")
arrow::read_parquet("iris.parquet")

Thanks

Iain

@javierluraschi
Contributor

This PR should ease the performance issue a little bit:

#74

It basically lets you pin something with I() to avoid additional files from being created.

The suggestion to use Parquet is terrific; we actually took a look at using Arrow or Feather a couple of months ago. However, we need a way to preview Feather and Arrow datasets in the browser, and at the time we tried this out, there were restrictions on the data types JavaScript could handle.

That said, this is worth considering in the future, along with allowing users to customize how data frames are persisted in RStudio Connect. We'll follow up with another PR to customize how support files are exported in pins.

javierluraschi added a commit that referenced this issue Oct 9, 2019
Support for I() to improve creation of pins and mitigate #73
@javierluraschi
Contributor

Yeah, this is a must-do at some point; however, I think it needs to be opt-in. The challenge is that CSVs are universal while Parquet is not: one might still want to open a pin in Excel or Google Sheets, and I'm not sure when Parquet will be supported there, if ever.

I think the approach I would suggest here would be something like:

board_register(..., format = "parquet")

This would opt in to storing everything as Parquet and lose interoperability with CSVs. Some users would really appreciate that, since some are already creating pins with pin(I(data), board = "") to force R to use RDS serialization and avoid the overhead of CSVs. Users in this situation would, in some cases where performance is not the top priority, be better served by saving in Parquet instead of RDS.
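
To make the CSV / RDS / Parquet trade-off concrete, here is a minimal sketch that writes the same data frame in all three formats and reads each one back. It is not part of pins, and it does not use the proposed format = "parquet" argument (which is only a suggestion in this thread); it assumes the arrow package is installed, and the temp-file paths are purely illustrative.

```r
# Sketch only: compares three on-disk formats for the same data frame.
# Assumes the arrow package is installed; paths are throwaway temp files.
library(arrow)

csv_path     <- tempfile(fileext = ".csv")
rds_path     <- tempfile(fileext = ".rds")
parquet_path <- tempfile(fileext = ".parquet")

write.csv(iris, csv_path, row.names = FALSE)  # universal, text-based
saveRDS(iris, rds_path)                       # R-only serialization
arrow::write_parquet(iris, parquet_path)      # columnar, readable outside R

# Read each file back into a plain data frame
from_csv     <- read.csv(csv_path, stringsAsFactors = TRUE)
from_rds     <- readRDS(rds_path)
from_parquet <- as.data.frame(arrow::read_parquet(parquet_path))

# File sizes in bytes, for a rough comparison
sizes <- c(
  csv     = file.size(csv_path),
  rds     = file.size(rds_path),
  parquet = file.size(parquet_path)
)
print(sizes)
```

On a dataset as small as iris the sizes and timings are not meaningful; larger tables would be needed to see the kind of differences the linked blog post describes.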

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 31, 2022