Skip to content

Fix: response.json() call makes unnecessary memory allocation #6016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Aug 31, 2023

Conversation

jxlil
Copy link
Contributor

@jxlil jxlil commented Aug 17, 2023

Now response.json() uses response.body to retrieve the dict object, so the str response is no longer stored in RAM.

Closes #5968

@codecov
Copy link

codecov bot commented Aug 17, 2023

Codecov Report

Merging #6016 (af7cad4) into master (3318971) will increase coverage by 0.01%.
Report is 3 commits behind head on master.
The diff coverage is 100.00%.

❗ Current head af7cad4 differs from pull request most recent head 4dd3267. Consider uploading reports for the commit 4dd3267 to get more accurate results

@@            Coverage Diff             @@
##           master    #6016      +/-   ##
==========================================
+ Coverage   88.85%   88.87%   +0.01%     
==========================================
  Files         162      162              
  Lines       11445    11445              
  Branches     1861     1861              
==========================================
+ Hits        10170    10172       +2     
  Misses        968      968              
+ Partials      307      305       -2     
Files Changed Coverage Δ
scrapy/extensions/corestats.py 100.00% <100.00%> (ø)
scrapy/extensions/feedexport.py 95.05% <100.00%> (+2.32%) ⬆️
scrapy/http/response/text.py 98.42% <100.00%> (ø)

... and 2 files with indirect coverage changes

@jxlil jxlil marked this pull request as ready for review August 17, 2023 18:03
Copy link
Contributor

@GeorgeA92 GeorgeA92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my side It looks OK.

However I am still not sure about.. additional tests that need (or not need) to implement here (I will prefer to wait for reviews from other contributors).

@@ -82,7 +82,7 @@ def json(self):
Deserialize a JSON document to a Python object.
"""
if self._cached_decoded_json is _NONE:
self._cached_decoded_json = json.loads(self.text)
self._cached_decoded_json = json.loads(self.body)
Copy link
Member

@kmike kmike Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior is not exactly the same, if the response is not UTF-8, and properly indicates it.

From https://datatracker.ietf.org/doc/html/rfc7159#section-8.1:

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.

From https://docs.python.org/3/library/json.html#json.loads:

s can now be of type bytes or bytearray. The input encoding should be UTF-8, UTF-16 or UTF-32.

So we should be fine in most cases after this change.

At the same time, SHALL in the RFC is not MUST (see https://datatracker.ietf.org/doc/html/rfc2119#section-3):

SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

I'm fine with making the change, but we should add an entry to the changelog, which explains that the change is backwards incompatible, and that Scrapy stops support some of these weird cases to improve the efficiency for the vast majority of cases. Users can always use json.loads(response.text) in their code, instead of the response.json() shortcut.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I've just added an entry to the Backward-incompatible changes section in the changelog. Please let me know if I need to correct something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SHALL != SHOULD.

MUST This word, or the terms "REQUIRED" or "SHALL", mean that the
definition is an absolute requirement of the specification.

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, provided the release notes fix.

@jxlil jxlil force-pushed the fix/response.json branch from 646d289 to 4dd3267 Compare August 21, 2023 14:30
@jxlil
Copy link
Contributor Author

jxlil commented Aug 23, 2023

A test failed, but I think it's not related to the changes in this PR

@kmike kmike merged commit 0fcb055 into scrapy:master Aug 31, 2023
@kmike
Copy link
Member

kmike commented Aug 31, 2023

Thanks @jxlil!

@jxlil jxlil deleted the fix/response.json branch August 31, 2023 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

response.json() call makes unnecessary memory allocation
4 participants