Improved decoding support for Response.iter_content and iter_lines #1937

Merged
merged 3 commits into from May 12, 2014

Projects

None yet

4 participants

@jaraco
Contributor
jaraco commented Mar 4, 2014

This PR adds basic documentation for the decode_unicode parameter to iter_content (and implicitly for iter_lines). It also ensures the now-documented behavior is correct and consistent, specifically by honoring the parameter in all cases and not just in some (previously only apparent by reading the source).

@Lukasa
Collaborator
Lukasa commented Mar 4, 2014

Thanks for this!

In general this fix looks great, thanks! Would you mind providing a test? One that confirms that you get back unicode in the cases in question would be good enough.

@sigmavirus24 sigmavirus24 commented on the diff Mar 5, 2014
requests/models.py
@@ -642,12 +641,17 @@ def generate():
self._content_consumed = True
- gen = generate()
+ # simulate reading small chunks of the content
+ reused_chunks = iter_slices(self._content, chunk_size)
@sigmavirus24
sigmavirus24 Mar 5, 2014 Collaborator

Call me crazy but this would look better if it were simply:

if self._content_consumed:
    chunks = iter_slices(self._content, chunk_size)
else:
    chunks = generate()

Ternaries are not used in the requests code base and are generally advised against by PEP8 (one of the few places requests' style and PEP8 happen to coincide). You're also unconditionally creating two iterators which incurs the cost of creation and garbage collection (which has two parts: decrementing the ref count and collecting the object on the next sweep of the garbage collector) as opposed to creating one iterator (and binding) and returning it.

@jaraco
jaraco Mar 5, 2014 Contributor

You're welcome to adjust the style to suit the projects' preferences. I prefer ternary statements for assignment because it communicates better:

  • the left hand argument is referenced exactly once.
  • the assignment is obvious (there's no code path that could avoid assignment).
  • the branch is obvious (one or the other).
  • there's no temptation to inject additional behavior as there might be with a code block.

I did recognize the unconditional construction of the two iterators, but I considered the costs to be negligible. In general, I would only consider costs of function calls and object construction/destruction in highly sensitive tight loops. Perhaps this method falls into that category for some applications, but I suspect not as it's likely to be I/O bound instead. That said, if you wanted to keep the ternary operator but not have the unconditional construction, you could easily:

chunks = iter_slices(self._content, chunk_size) if self._content_consumed else generate()

I chose to create variables in an attempt to create more self-documenting code. Giving names to reused_chunks and stream_chunks adds clarity to the behavior. I believe the clarity of purpose outweighs the performance costs of unconditional construction.

But again, please feel free to accept my contribution as is and then tweak it to suit your preferences.

@kennethreitz kennethreitz merged commit 569cd23 into kennethreitz:master May 12, 2014

1 check passed

default Merged build finished.
Details
@kennethreitz
Owner

🍰

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment