Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove all instances of seek() so that we can handle streaming data again #24

Closed
pudo opened this issue Dec 14, 2012 · 2 comments · Fixed by #27
Closed

Remove all instances of seek() so that we can handle streaming data again #24

pudo opened this issue Dec 14, 2012 · 2 comments · Fixed by #27

Comments

@pudo
Copy link
Contributor

pudo commented Dec 14, 2012

After some format detection fixes, we now have a few calls to seek() in the CSV module. Those cannot work on urllib-style http request data. One of the main use cases for messytables is to do streaming web data. We should remove these calls, even if this results in a loss of functionality wrt. type detection.

@rufuspollock
Copy link
Member

Some thoughts:

  • At present urls do not cause a problem because of hack in any.py to make_stream_seekable (which reads entire file into memory). I imagine that hack should go as part of this refactor (?)

Alternative solutions (to removing use of seek):

  • implement a urllib style urlopen that supports seek - behind the scenes it could just make multiple urlopen calls, and if full seek support is needed you could use HTTP Range headers (see this example)
  • A simple alternative given that all we need is seek(0) is to add some kind of intermediate wrapper around the stream that buffers, say, the first 10k/100k bytes and allows seek within that

That said we only seem to have 2 places seek is used (commas.py) and one place in any.py.

@rufuspollock
Copy link
Member

@domoritz is there a status update here - i know you were chatting with @pudo on this (and see you've started some work in a branch).

It would be really nice to get this fixed (in particular it would allow us to upgrade dataproxy which would be super useful!)

/cc @nigelbabu

rufuspollock added a commit that referenced this issue Jan 5, 2013
Enable support for non seekable resources - fixes #24.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants