-
Notifications
You must be signed in to change notification settings - Fork 10.9k
Fix: response.json() call makes unnecessary memory allocation #6016
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #6016 +/- ##
==========================================
+ Coverage 88.85% 88.87% +0.01%
==========================================
Files 162 162
Lines 11445 11445
Branches 1861 1861
==========================================
+ Hits 10170 10172 +2
Misses 968 968
+ Partials 307 305 -2
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my side It looks OK.
However I am still not sure about.. additional tests that need (or not need) to implement here (I will prefer to wait for reviews from other contributors).
@@ -82,7 +82,7 @@ def json(self): | |||
Deserialize a JSON document to a Python object. | |||
""" | |||
if self._cached_decoded_json is _NONE: | |||
self._cached_decoded_json = json.loads(self.text) | |||
self._cached_decoded_json = json.loads(self.body) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior is not exactly the same, if the response is not UTF-8, and properly indicates it.
From https://datatracker.ietf.org/doc/html/rfc7159#section-8.1:
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32.
From https://docs.python.org/3/library/json.html#json.loads:
s can now be of type bytes or bytearray. The input encoding should be UTF-8, UTF-16 or UTF-32.
So we should be fine in most cases after this change.
At the same time, SHALL in the RFC is not MUST (see https://datatracker.ietf.org/doc/html/rfc2119#section-3):
SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.
I'm fine with making the change, but we should add an entry to the changelog, which explains that the change is backwards incompatible, and that Scrapy stops support some of these weird cases to improve the efficiency for the vast majority of cases. Users can always use json.loads(response.text)
in their code, instead of the response.json()
shortcut.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done! I've just added an entry to the Backward-incompatible changes section in the changelog. Please let me know if I need to correct something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SHALL != SHOULD.
MUST This word, or the terms "REQUIRED" or "SHALL", mean that the
definition is an absolute requirement of the specification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, provided the release notes fix.
646d289
to
4dd3267
Compare
A test failed, but I think it's not related to the changes in this PR |
Thanks @jxlil! |
Now
response.json()
usesresponse.body
to retrieve thedict
object, so the str response is no longer stored in RAM.Closes #5968