arbitrary connection support #16

Closed
romainfrancois opened this issue Jan 20, 2014 · 5 comments

Comments

@romainfrancois
Member

We need to leverage this: https://gist.github.com/romainfrancois/6119995

so that we can read a stream from an arbitrary connection. We would still use mmap for speed when it makes sense, but otherwise we can process the stream from a buffered connection.

In a threaded world, we could split the work into:

  • buffering data from the connection
  • processing the data

so that the thread(s) processing the data would not have to wait for input.

But before threads, we can simply retrieve data from the connection and process it sequentially.
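
A minimal sketch of the sequential variant, assuming a `std::istream` as a stand-in for an R connection. The name `process_stream` and the `on_line` callback are hypothetical and only illustrate the two steps (fill a buffer, then process it); this is not the package's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read the stream in fixed-size chunks and hand each
// complete line to `on_line`. A line that straddles two chunks is carried
// over in `pending` until its newline arrives.
template <typename OnLine>
std::size_t process_stream(std::istream& in, std::size_t chunk_size, OnLine on_line) {
    assert(chunk_size > 0);
    std::vector<char> buffer(chunk_size);
    std::string pending;      // unfinished line carried across chunks
    std::size_t n_lines = 0;

    while (in) {
        // Step 1: buffer data from the "connection".
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        std::streamsize got = in.gcount();
        if (got <= 0) break;

        // Step 2: process what was buffered.
        for (std::streamsize i = 0; i < got; ++i) {
            char c = buffer[static_cast<std::size_t>(i)];
            if (c == '\n') {
                on_line(pending);
                pending.clear();
                ++n_lines;
            } else {
                pending.push_back(c);
            }
        }
    }

    // Final line without a trailing newline.
    if (!pending.empty()) {
        on_line(pending);
        ++n_lines;
    }
    return n_lines;
}
```

Because the two steps are already separate inside the loop, moving step 1 to its own thread later would mostly be a matter of putting a queue of filled buffers between them.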

The problem, I suppose, is that we can probably only read once from some connections, which would make a two-step algorithm (counting the number of lines, then allocating the data) difficult ...
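
One way around the read-once limitation is to skip the counting pass and grow the storage while reading. The sketch below assumes a single integer column, one value per line, read from a `std::istream` standing in for the connection; `read_int_column` is a hypothetical name, and a real reader would need proper error handling instead of letting `std::stoi` throw.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical single-pass reader: the number of lines is not known upfront,
// so the column grows as data arrives instead of being allocated after a
// separate counting pass.
std::vector<int> read_int_column(std::istream& in, std::size_t initial_capacity = 1024) {
    assert(initial_capacity > 0);
    std::vector<int> column;
    column.reserve(initial_capacity);   // initial guess, not a hard limit

    std::string line;
    while (std::getline(in, line)) {
        if (line.empty()) continue;          // skip blank lines
        column.push_back(std::stoi(line));   // vector reallocates when full
    }

    column.shrink_to_fit();   // non-binding request to release unused capacity
    return column;
}
```

The cost is some reallocation and copying while the vector grows, which is the price for only touching the input once.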

@hadley
Member

hadley commented Jan 20, 2014

Sounds like this is going to be a lot of work, and we'll need to implement quite different strategies for mmap files and connections. I think this means that connections should be at least a 0.2 feature.

@romainfrancois
Member Author

Sure. I'll focus on mmap for now while keeping extensibility in mind. mmap is somewhat orthogonal to the processing: it is just a way to get a char*. The big win of mmap is that we get a pointer to all of the data (or at least a big chunk of it) at once. For other connections, I'll have to do things either char by char or line by line, using Rconn_fgetc or something.
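
To illustrate why the data source can be kept orthogonal to the processing, here is a rough sketch. All names (`Source`, `MemorySource`, `StreamSource`, `count_lines`) are hypothetical. `MemorySource` wraps a plain `char*` range such as the one mmap would provide, and `StreamSource` uses a `std::istream` as a stand-in for a connection read one character at a time; a connection-backed version would presumably pull characters through R's connection API instead.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical interface: the processing code only asks for the next character.
class Source {
public:
    virtual ~Source() {}
    // Returns the next byte as an int, or -1 at end of input (like fgetc).
    virtual int next() = 0;
};

// Backed by a contiguous block of memory, e.g. a mmapped file.
class MemorySource : public Source {
public:
    MemorySource(const char* begin, std::size_t size)
        : p_(begin), end_(begin + size) {
        assert(begin != nullptr);
    }
    int next() override {
        return p_ == end_ ? -1 : static_cast<unsigned char>(*p_++);
    }
private:
    const char* p_;
    const char* end_;
};

// Backed by a stream that is consumed char by char.
class StreamSource : public Source {
public:
    explicit StreamSource(std::istream& in) : in_(in) {}
    int next() override {
        int c = in_.get();
        return c == std::char_traits<char>::eof() ? -1 : c;
    }
private:
    std::istream& in_;
};

// Processing that does not care where the characters come from.
std::size_t count_lines(Source& src) {
    std::size_t n = 0;
    bool in_line = false;   // true if the current line has unterminated content
    for (int c = src.next(); c != -1; c = src.next()) {
        if (c == '\n') { ++n; in_line = false; }
        else           { in_line = true; }
    }
    return in_line ? n + 1 : n;
}
```

The virtual call per character makes the memory-backed path slower than working on the raw pointer directly, so a real design would likely keep a fast path for mmap and use an interface like this only for connections.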

But anyway, if we move this to 0.2, that perhaps gives me time to deal with connections upstream, in the Rcpp implementations.

@hadley
Member

hadley commented Jan 21, 2014

Sounds good to me. I wonder if there's any support for mmapped gz/bz2 files. That's going to be a pretty common use case in my experience.

@romainfrancois
Member Author

I don't think so, which is one reason why I'd want to be able to consume a connection through R's internal API.

@romainfrancois
Member Author

Apart from correct handling of connection pushback (#19), I've leveraged what I need from the connection API.
