
[WIP] use response selector cache #3157

Open
wants to merge 2 commits into master
Conversation

malloxpb (Member) commented Mar 8, 2018

Use the response's cached selector instead of creating a new one every time an ItemLoader object is created, as mentioned in issue #3125. I created this as sample code to continue the discussion :)
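For readers joining from #3125, here is a rough illustration of the behaviour on master that this change targets (a made-up example, not part of the patch): the loader builds its own Selector even when the response has already parsed and cached one via response.selector.

```python
# Rough illustration of the behaviour on master that this PR wants to change.
from scrapy.http import TextResponse
from scrapy.loader import ItemLoader

response = TextResponse(
    url='http://example.com',
    body=b'<html><body><p>hello</p></body></html>',
    encoding='utf-8',
)

cached = response.selector            # parses the body once and caches the Selector
loader = ItemLoader(response=response)

# On master the loader calls self.default_selector_class(response), so the
# body is parsed a second time and a different Selector object is used:
print(loader.selector is cached)      # False before this change
```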

codecov bot commented Mar 8, 2018

Codecov Report

Merging #3157 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3157      +/-   ##
==========================================
+ Coverage   82.11%   82.11%   +<.01%     
==========================================
  Files         228      228              
  Lines        9588     9589       +1     
  Branches     1385     1385              
==========================================
+ Hits         7873     7874       +1     
  Misses       1456     1456              
  Partials      259      259
Impacted Files Coverage Δ
scrapy/loader/__init__.py 94.55% <100%> (+0.03%) ⬆️

@@ -26,7 +26,10 @@ class ItemLoader(object):

     def __init__(self, item=None, selector=None, response=None, parent=None, **context):
         if selector is None and response is not None:
-            selector = self.default_selector_class(response)
+            if response.selector.__class__ == Selector:
kmike (Member) commented on this line:

Hi @nctl144,

There are two issues with this line:

  1. If a subclass overrides the default_selector_class attribute, it will have no effect after this change.
  2. As I mentioned in https://github.com/scrapy/scrapy/issues/3125#issuecomment-366316774, this approach has an issue: when someone accesses response.selector for the first time, Scrapy parses the response body; when this condition is False, we'd be parsing the response body twice (here + later, at the self.default_selector_class(response) line). That looks unnecessary.
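A hedged sketch of one way to address both points (not necessarily what is being suggested in the review; the CachedSelectorItemLoader subclass is hypothetical): only reuse a selector the response has already built, and keep going through default_selector_class otherwise, so the body is never parsed twice and subclass overrides still apply.

```python
# Hypothetical sketch addressing both review points; not the patch itself.
from scrapy.loader import ItemLoader


class CachedSelectorItemLoader(ItemLoader):
    def __init__(self, item=None, selector=None, response=None, parent=None, **context):
        if selector is None and response is not None:
            # Peek at the private cache instead of response.selector, so we
            # never trigger a parse just to inspect the selector's class.
            cached = getattr(response, '_cached_selector', None)
            if isinstance(cached, self.default_selector_class):
                selector = cached  # reuse the already-parsed selector
            else:
                # Respect whatever class the (sub)class asked for; on this
                # path the body is parsed at most once.
                selector = self.default_selector_class(response)
        super().__init__(item=item, selector=selector, response=response,
                         parent=parent, **context)
```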

malloxpb (Member, Author) commented Mar 11, 2018

Hey @kmike, I got what you mean. So I modified the code so that we use response.selector even when response._cached_selector is None. Since response.selector returns a new Selector object (for the current response) when there is no cached selector yet, I think the behaviour is still the same, right?
Please let me know what you think! :)
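For context, the revised approach relies on response.selector being a lazily cached property: the first access builds and caches the Selector, and later accesses (including by the loader) reuse it. A minimal demonstration (made-up response, not part of the patch):

```python
# Minimal demonstration of the lazy caching the revised patch relies on.
from scrapy.http import TextResponse

response = TextResponse(
    url='http://example.com',
    body=b'<html><body><p>hi</p></body></html>',
    encoding='utf-8',
)

print(response._cached_selector)   # None: nothing has been parsed yet
first = response.selector          # first access parses the body and caches the Selector
print(response.selector is first)  # True: later accesses reuse the cached object
```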
