After some format detection fixes, we now have a few calls to seek() in the CSV module. Those cannot work on urllib-style HTTP response data, which is not seekable. One of the main use cases for messytables is processing streaming web data. We should remove these calls, even if this results in a loss of functionality with respect to type detection.
At present URLs do not cause a problem because of a hack in any.py, make_stream_seekable, which reads the entire file into memory. I imagine that hack should go as part of this refactor (?)
Alternative solutions (to removing use of seek):
Implement a urllib-style urlopen that supports seek. Behind the scenes it could just make multiple urlopen calls, and if full seek support is needed it could use HTTP Range headers (see this example).
A simpler alternative, given that all we need is seek(0), is to add an intermediate wrapper around the stream that buffers, say, the first 10k/100k bytes and allows seeking within that.
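The buffering wrapper could look roughly like the sketch below. This is only an illustration of the idea, not existing messytables code: the class name `BufferedHeadStream` and the `limit` parameter are made up here, and it assumes the underlying stream only needs to offer `read()`. It keeps what has been read so far as long as that stays within `limit` bytes, so seek(0) works during detection, and refuses to seek once more than that has been consumed.

```python
import io


class BufferedHeadStream:
    """Hypothetical wrapper: seek() works only within the first `limit` bytes."""

    def __init__(self, raw, limit=100 * 1024):
        self._raw = raw            # underlying stream; only read() is required
        self._limit = limit        # maximum number of bytes kept for re-reading
        self._head = b""           # bytes read so far (while within the limit)
        self._pos = 0              # logical position inside the head buffer
        self._overflowed = False   # True once more than `limit` bytes were read

    def read(self, size=-1):
        out = b""
        # Serve bytes that were already buffered and then seeked back over.
        if self._pos < len(self._head):
            end = len(self._head) if size < 0 else min(len(self._head), self._pos + size)
            out = self._head[self._pos:end]
            self._pos = end
            if size >= 0:
                size -= len(out)
                if size == 0:
                    return out
        # Fetch the remainder from the underlying stream.
        data = self._raw.read() if size < 0 else self._raw.read(size)
        if not self._overflowed:
            if len(self._head) + len(data) <= self._limit:
                self._head += data
                self._pos = len(self._head)
            else:
                # Past the limit: drop the buffer, seeking is no longer possible.
                self._overflowed = True
                self._head = b""
        return out + data

    def seek(self, offset, whence=io.SEEK_SET):
        if whence != io.SEEK_SET:
            raise io.UnsupportedOperation("only absolute seeks are supported")
        if self._overflowed or not 0 <= offset <= len(self._head):
            raise io.UnsupportedOperation("cannot seek outside the buffered head")
        self._pos = offset
        return offset


# Usage sketch: io.BytesIO stands in for a urllib response object.
stream = BufferedHeadStream(io.BytesIO(b"a,b,c\n1,2,3\n"), limit=1024)
sample = stream.read(5)    # peek at the start for format detection
stream.seek(0)             # rewind, served from the in-memory head
everything = stream.read()
```

The trade-off is that detection code has to finish sniffing within `limit` bytes; after that the stream behaves like a plain forward-only stream and memory use stays bounded.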
That said, seek only seems to be used in 2 places in commas.py and one place in any.py.