Skip to content
This repository has been archived by the owner on Apr 14, 2022. It is now read-only.

Figure out what interface the Response object should expose for streaming data #123

Open
njsmith opened this issue Aug 31, 2019 · 1 comment

Comments

@njsmith
Copy link
Member

njsmith commented Aug 31, 2019

In urllib3, HTTPResponse inherits from io.IOBase and implements the standard Python file interface. This is an interesting issue that discusses a bug in their implementation, and also why people value this feature: urllib3/urllib3#1305

We need to decide what interface our version of HTTPResponse should expose. Constraints:

  • The async version can't inherit from IOBase, because IOBase is a sync interface, and for async we probably want to expose something compatible with trio's ReceiveStream API
  • But our sync version will need to provide some way to expose an IOBase, so people can keeping feeding it into the csv module etc., like they're doing in the issue I linked above
  • We generally try to make sync and async interfaces identical, modulo async markers

You'll notice that these constraints are contradictory :-). Especially since the ReceiveStream and IOBase interfaces actually conflict with each other: they both support iteration, but for ReceiveStream it gives arbitrary chunks of bytes, while for IOBase it gives lines.

Some options:

  • We could have HTTPResponse implement the IOBase API in sync mode, and the ReceiveStream API in async mode.
  • We could have it implement IOBase in sync mode, and an async version of IOBase in async mode, and also provide an as_stream() method
  • We could have it implement ReceiveStream in async mode, a sync-ified version of ReceiveStream in sync mode, and also provide an as_file() method.
    • Further options: in async mode, the as_file method could either be not defined at all, or it could still exist and return an async version of the file interface
  • We could have it implement neither interface directly, but provide both as_stream and as_file methods (and then there's the same question about which methods are available in which modes)
pquentin added a commit to pquentin/hip that referenced this issue Oct 8, 2019
…om-master-2019-07-26

I tried and fail to understand how we should adapt this commit to our
work. Given that the current situation is likely to change (see
python-trio#123), I did not merge the
actual functionality, but only the test of the default behavior (and the
CHANGES/docs as we're not currently touching those for easier merges).
@sethmlarson
Copy link
Contributor

Had more discussions about this, here's the results of that discussion:

  • Having a Response.as_file() method which exposes a file-like object that acts exactly as a usual open() file is desirable for csv readers, data-frames, etc
  • Something which has caused problems for urllib3 in the past is users seeing the interface of 'HTTPResponse' and noticing it inherits IOBase but doesn't exactly behave like a file. Let's not make that mistake again.
  • The as_file() interface shouldn't close once all data is read just like a real file but should close the contained Response object after all data is read. This prevents Responses from being left un-closed but also gives an interface that places nice with interfaces expecting files.

Also discussed the usage of .text()/.data()/.json() as methods that load all data onto the Response.
There were questions about whether you should be able to call .text() and then maybe .text() again.

If we are one-shot only then you'd only be able to call a single "body" function on a response. However @pquentin brought up that this would make things annoying for debugging as you'd have to reissue requests to get another body.

Another thought was that if a response body is completely loaded into memory we could hold onto that body internally on the request to allow calling .text(), .json(), and .data() again if needed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants