Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proof-of-concept pluggable i/o #5

Closed
wants to merge 3 commits into from
Closed

Proof-of-concept pluggable i/o #5

wants to merge 3 commits into from

Conversation

richfitz
Copy link
Member

@richfitz richfitz commented Jul 18, 2018

DO NOT MERGE

This is a starting point in lieu of a long description - this is how I'd be inclined to start expanding the i/o to remove the strong readr dependency and to allow non tsv formats. Only formats that allow append will work (so row-wise not column-wise).

In the form as submitted this supports the current behaviour of the package via readr with an entirely untested bit of scaffolding for a base R version.

Pending, should this approach feel useful

  • The name streamable_table should be checked carefully
  • Document and export the functions in R/stream.R
  • Testing throughput
  • Replace stopifnot with reasonable assertions
  • Deal with compression in the read side of things - I think that filename sniffing is the way to go here since this is the inverse function to ark
  • In this version, readr is annoyingly chatty and that should be dialed back

For #3

@cboettig
Copy link
Member

This looks very compelling to me.

Regarding the compression, this is only an issue on the write side of things, right? (i.e. we read .tsv.gz in via the appropriate gzfile() in unark already, but we writing tsv.gz only happens because write_tsv() parses the name. It looks like write_tsv and write.table support a connection just like the read-type functions, so we should probably support that.

This would restrict us to functions that support R connections as well as file names, but at least it would mean opening up all the base read.* / write.* functions as well as the readr ones, which would be a considerable improvement.

@cboettig cboettig mentioned this pull request Jul 21, 2018
12 tasks
@krlmlr
Copy link

krlmlr commented Jul 21, 2018

Could a DBI connection be just another instance of a streamable table then?

@cboettig
Copy link
Member

@krlmlr Yes, that's an excellent suggestion, I imagine it could. Should be relatively straight forward to copy chunk by chunk for a DBI connection, I imagine in that case we would basically just ignore any option regarding compression(?)

I think we do want to generalize beyond tsv, but still need to think about how best to get the abstraction right. In my above comments I was thinking 'anything that can read / write through an R connection (e.g. gzfile() connection), but supporting a DBI connection as the streamable_table makes sense too but doesn't fit that mold. suggestions on how to deal with that would be welcome!

@richfitz
Copy link
Member Author

In #7, @cboettig wrote:

(in #5, looks like @richfitz is advising me to implement something along those lines, so to speak, rather than intending I merge that directly. I do think #5 looks like a solid building block so I'm tempted to merge and deal with the follow-up later...)

Sorry @cboettig - to be clear - this can be made mergeable but you would want to fix the assorted issues flagged in my opening comment I suspect 😀

@cboettig
Copy link
Member

@richfitz thanks for this. I've implemented a version of this based on your proposal in #10. Needed to tweak the base methods a bit to get them to work, but I think this checks the boxes above now. I've made the base methods the default and to drop the hard dependency on readr, though I keep opt-int streamable functions for readr tables as well.

@cboettig cboettig closed this Jul 26, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants