[MRG+1] Preserve request class when converting to/from dicts (utils.reqser) #2510

elacuesta · 2017-01-24T14:34:24Z

Attempting to fix #1890

codecov-io · 2017-01-24T14:51:10Z

Current coverage is 83.46% (diff: 100%)

Merging #2510 into master will increase coverage by <.01%

@@             master      #2510   diff @@
==========================================
  Files           161        161          
  Lines          8780       8784     +4   
  Methods           0          0          
  Messages          0          0          
  Branches       1288       1289     +1   
==========================================
+ Hits           7328       7332     +4   
  Misses         1204       1204          
  Partials        248        248

Powered by Codecov. Last update 4620d2f...53757e5

elacuesta · 2017-01-24T15:03:28Z

scrapy/utils/reqser.py

@@ -20,6 +19,7 @@ def request_to_dict(request, spider=None):
    if callable(eb):
        eb = _find_method(spider, eb)
    d = {
+        '__class__': request.__class__,


This doesn't work well with scrapy.squeues.MarshalFifoDiskQueue and scrapy.squeues.MarshalLifoDiskQueue: the resulting dicts cannot be serialized. I'm working on it.

I think it makes sense not to store the default Request class, and to store the request class only if it is not the default:

it would allow to save some memory and CPU;

by supporting queues without __class__ key we're making this PR compatible with disk queues created by previous Scrapy versions.

That was fast! I made a few modifications to this PR and it now stores a string pointing to the class, which allows the dict to be serialized using Marshal-based queues. It also works well with requests without a __class__ key, falling back to the default Request. It is true, though, that we would save some resources by storing the class only when it's not Request; let me patch that.
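To illustrate the point discussed above, here is a minimal sketch of why a class object breaks Marshal-based queues while an import-path string does not, and of storing the path only for non-default classes. It does not use Scrapy: `Request` and `CustomRequest` are stand-in classes, and `class_path` and `to_dict` are hypothetical helpers, not the PR's actual code. The `_class` key name is taken from the review comment further down.

```python
import marshal


class Request:
    """Stand-in for scrapy.Request (assumption, not the real class)."""

    def __init__(self, url):
        self.url = url


class CustomRequest(Request):
    """Hypothetical subclass, used only for illustration."""


def class_path(cls):
    # Dotted import path of the class, stored as a plain string.
    return cls.__module__ + "." + cls.__name__


def to_dict(request):
    d = {"url": request.url}
    # Store the class path only when the class is not the default one,
    # so plain requests keep the same shape as in older queues.
    if type(request) is not Request:
        d["_class"] = class_path(type(request))
    return d


plain = to_dict(Request("http://example.com"))
custom = to_dict(CustomRequest("http://example.com"))

# A dict holding only strings survives a marshal round trip.
restored = marshal.loads(marshal.dumps(custom))

# A dict holding the class object itself cannot be marshalled.
try:
    marshal.dumps({"_class": CustomRequest})
    class_is_marshalable = True
except ValueError:
    class_is_marshalable = False
```

In this sketch `plain` carries no `_class` key at all, which is what keeps it compatible with dicts written before the key existed.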

@kmike Done, please check again

voith · 2017-01-24T18:47:54Z

scrapy/utils/reqser.py

@@ -47,7 +50,11 @@ def request_from_dict(d, spider=None):
    eb = d['errback']
    if eb and spider:
        eb = _get_method(spider, eb)
-    return Request(
+    try:


What has been the thought process here to use a try except block? Why not:

request_cls = load_object(d['_class']) if '_class' in d else Request

This makes more sense to me, since d['_class'] is only set when type(request) is not Request, and since d['_class'] stores the import path of the request class, load_object(d['_class']) should not raise. Or is there a case where it would raise an error that I'm unaware of?

The load_object call could fail as well. Imagine you have a disk queue containing scrapy_splash.request.SplashRequest and you try to read the queue from an environment which doesn't have scrapy_splash installed.
I'm not saying how often that would happen, just that it could happen :-)

I think in this case it is better to fail loudly with an exception.

@elacuesta If I understood you right, you're saying that the requests were stored to a disk queue by one project and read by another. I don't know if that is a sensible use case. My opinion here goes with Mikhail, i.e. fail in such a case.

I can make it fail, I was just following (what I thought was) the spirit of your comment @kmike.
It would also be slightly backward incompatible: currently it wouldn't fail but would yield a regular Request object.

Done, thanks for the suggestion @voith
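The behaviour agreed on in this thread can be sketched roughly as follows: fall back to the default class when no `_class` key is present, and let the import error propagate when the stored class cannot be loaded. This is a hedged illustration, not the merged code. `Request` is a stand-in class, `from_dict` is a hypothetical helper, `load_object` is a simplified stand-in for `scrapy.utils.misc.load_object`, and `no_such_package_xyz` is a made-up module name simulating a missing dependency.

```python
from importlib import import_module


class Request:
    """Stand-in for scrapy.Request (assumption, not the real class)."""

    def __init__(self, url):
        self.url = url


def load_object(path):
    # Simplified stand-in for scrapy.utils.misc.load_object:
    # import the module part and fetch the named attribute from it.
    module_path, _, name = path.rpartition(".")
    module = import_module(module_path)
    return getattr(module, name)


def from_dict(d):
    # No try/except here: a missing class should fail loudly.
    request_cls = load_object(d["_class"]) if "_class" in d else Request
    return request_cls(url=d["url"])


# Dicts without the key (e.g. from older queues) yield a plain Request.
fallback = from_dict({"url": "http://example.com"})

# A stored path whose package is not installed raises ImportError.
try:
    from_dict({
        "url": "http://example.com",
        "_class": "no_such_package_xyz.request.SplashRequest",
    })
    failed_loudly = False
except ImportError:
    failed_loudly = True
```

The design trade-off is the one raised above: silently returning a plain `Request` would hide a broken environment, whereas raising makes the missing dependency visible at the cost of a small backward incompatibility.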

elacuesta · 2017-02-03T11:58:58Z

Hello there @kmike, looking forward to reading your comments after the latest changes.

kmike · 2017-02-06T19:57:49Z

Looks good! Thanks @elacuesta and @voith.

redapple · 2017-02-08T17:30:29Z

Thanks @elacuesta

elacuesta force-pushed the reqser_request_class branch 4 times, most recently from 93ba1e9 to 01d83c9 Compare January 24, 2017 16:43

voith reviewed Jan 24, 2017

Preserve request class when converting to/from dicts (utils.reqser)

53757e5

elacuesta force-pushed the reqser_request_class branch from 01d83c9 to 53757e5 Compare January 24, 2017 20:13

kmike changed the title ~~Preserve request class when converting to/from dicts (utils.reqser)~~ [MRG+1] Preserve request class when converting to/from dicts (utils.reqser) Feb 6, 2017

kmike added this to the v1.4 milestone Feb 6, 2017

redapple merged commit f32a229 into scrapy:master Feb 8, 2017

elacuesta deleted the reqser_request_class branch February 8, 2017 17:32

kmike mentioned this pull request Feb 15, 2017

using custom request types for sitemap spider #2565

Open

kmike removed this from the v1.4 milestone Feb 22, 2017
