[WIP] use response selector cache #3157
base: master
Codecov Report
@@ Coverage Diff @@
## master #3157 +/- ##
==========================================
+ Coverage 82.11% 82.11% +<.01%
==========================================
Files 228 228
Lines 9588 9589 +1
Branches 1385 1385
==========================================
+ Hits 7873 7874 +1
Misses 1456 1456
Partials 259 259
scrapy/loader/__init__.py
Outdated
@@ -26,7 +26,10 @@ class ItemLoader(object):

    def __init__(self, item=None, selector=None, response=None, parent=None, **context):
        if selector is None and response is not None:
            selector = self.default_selector_class(response)
            if response.selector.__class__ == Selector:
Hi @nctl144,
There are two issues with this line:
- If a subclass overrides the default_selector_class attribute, the override no longer has any effect after this change.
- As I mentioned in https://github.com/scrapy/scrapy/issues/3125#issuecomment-366316774, this approach has an issue: the first time someone accesses response.selector, Scrapy parses the response body; when this condition is False, we'd be parsing the response body twice (here, and later at the self.default_selector_class(response) line). This looks unnecessary.
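The two issues above can be reproduced with a small stand-alone sketch. It does not import Scrapy: Selector, CustomSelector, Response, OtherResponse, Loader and CustomLoader are simplified stand-ins written for illustration, parse_count is a made-up counter, and the else branch is an assumption about how the fallback to default_selector_class is reached.

```python
class Selector:
    """Stand-in selector; each construction counts as one body parse."""
    parse_count = 0  # hypothetical counter, for illustration only

    def __init__(self, response):
        Selector.parse_count += 1
        self.response = response


class CustomSelector(Selector):
    """A selector subclass that a loader or response might want to use."""


class Response:
    """Stand-in response with a lazily built, cached selector."""
    selector_class = Selector

    def __init__(self, body):
        self.body = body
        self._cached_selector = None

    @property
    def selector(self):
        # First access parses the body; later accesses reuse the result.
        if self._cached_selector is None:
            self._cached_selector = self.selector_class(self)
        return self._cached_selector


class OtherResponse(Response):
    """A response whose cached selector is not a plain Selector."""
    selector_class = CustomSelector


class Loader:
    default_selector_class = Selector

    def __init__(self, selector=None, response=None):
        if selector is None and response is not None:
            # The check under review: evaluating it already parses the body.
            if response.selector.__class__ == Selector:
                selector = response.selector
            else:
                # Assumed fallback: parses the same body a second time.
                selector = self.default_selector_class(response)
        self.selector = selector


class CustomLoader(Loader):
    default_selector_class = CustomSelector


# Issue 1: the subclass override is silently ignored.
loader = CustomLoader(response=Response("<html></html>"))
print(type(loader.selector).__name__)

# Issue 2: when the check is False, the body is parsed twice.
Selector.parse_count = 0
Loader(response=OtherResponse("<html></html>"))
print(Selector.parse_count)
```

In this sketch, CustomLoader ends up holding a plain Selector, and the OtherResponse case builds two selectors for a single loader.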
Hey @kmike, I see what you mean. So I modified the code so that we use response.selector even when response._cached_selector is None. Since response.selector returns a new Selector object (for the current response) when no selector is cached, I think the behavior is still the same, right?
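For comparison, here is a minimal sketch of one way to consult the cache without triggering a parse. It is not the code in this PR: pick_selector is a hypothetical helper, the classes are simplified stand-ins rather than Scrapy's, and the only detail taken from the discussion is that the response keeps its cached selector in _cached_selector.

```python
class Selector:
    """Stand-in selector, not Scrapy's implementation."""

    def __init__(self, response):
        self.response = response


class CustomSelector(Selector):
    """A selector subclass a loader might set as default_selector_class."""


class Response:
    """Stand-in response with a lazily built, cached selector."""

    def __init__(self, body):
        self.body = body
        self._cached_selector = None  # filled in on first .selector access

    @property
    def selector(self):
        if self._cached_selector is None:
            self._cached_selector = Selector(self)
        return self._cached_selector


def pick_selector(response, selector_class):
    """Hypothetical helper: reuse the cache only if it fits, else build once."""
    cached = getattr(response, "_cached_selector", None)
    if cached is not None and type(cached) is selector_class:
        return cached
    # No usable cache: build exactly one selector of the requested class.
    return selector_class(response)


response = Response("<html></html>")
first = pick_selector(response, Selector)         # cache empty, builds one
response.selector                                 # populates the cache
second = pick_selector(response, Selector)        # reuses the cached object
third = pick_selector(response, CustomSelector)   # class differs, builds one
```

Because the helper reads _cached_selector directly instead of response.selector, it never parses the body just to decide which branch to take, and a subclass's selector class is always honored. Whether reaching into a private attribute is acceptable is a separate design question for the maintainers.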
Use the response's cached selector instead of creating a new one every time an ItemLoader object is created, as mentioned in issue #3125. I created this as sample code to continue the discussion :)