[MRG+1] Preserve request class when converting to/from dicts (utils.reqser) #2510
Current coverage is 83.46% (diff: 100%)
@@             master    #2510   diff @@
==========================================
  Files           161      161
  Lines          8780     8784     +4
  Methods           0        0
  Messages          0        0
  Branches       1288     1289     +1
==========================================
+ Hits           7328     7332     +4
  Misses         1204     1204
  Partials        248      248
@@ -20,6 +19,7 @@ def request_to_dict(request, spider=None):
     if callable(eb):
         eb = _find_method(spider, eb)
     d = {
+        '__class__': request.__class__,
elacuesta
Jan 24, 2017
Author
Member
This doesn't work well with scrapy.squeues.MarshalFifoDiskQueue and scrapy.squeues.MarshalLifoDiskQueue: the resulting dicts cannot be serialized. I'm working on it.
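The serialization problem can be reproduced without Scrapy. The following is a minimal sketch, assuming a stand-in `Request` class and a hypothetical `can_marshal` helper (neither is part of the PR): the standard `marshal` module rejects a dict that holds a class object, while a dict that holds a plain string is accepted.

```python
import marshal


class Request:
    """Stand-in for scrapy.Request (demo only, not the real class)."""


def can_marshal(obj):
    """Hypothetical helper: report whether marshal can serialize obj."""
    try:
        marshal.dumps(obj)
    except ValueError:
        # marshal raises ValueError for objects it cannot serialize
        return False
    return True


# A class object as a dict value cannot be marshaled.
print(can_marshal({"url": "http://example.com", "__class__": Request}))
# A string naming the class can.
print(can_marshal({"url": "http://example.com",
                   "__class__": "scrapy.http.request.Request"}))
```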
kmike
Jan 24, 2017
Member
I think it makes sense not to store the default Request class, and to store the request class only when it is not the default:
- it would save some memory and CPU;
- by supporting queues without the __class__ key, we make this PR compatible with disk queues created by previous Scrapy versions.
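The suggestion above could look roughly like the sketch below. It uses stand-in `Request` and `FormRequest` classes and a hypothetical `class_path` helper rather than the real Scrapy code, and the key name `_class` is an assumption (the thread uses both `__class__` and `_class`).

```python
class Request:
    """Stand-in for scrapy.Request (demo only)."""

    def __init__(self, url):
        self.url = url


class FormRequest(Request):
    """Stand-in for a Request subclass."""


def class_path(cls):
    """Hypothetical helper: dotted import path of a class, as a string."""
    return f"{cls.__module__}.{cls.__qualname__}"


def request_to_dict(request):
    d = {"url": request.url}
    # Exact type check: only non-default classes are recorded, so dicts for
    # plain Request objects keep the same shape as before.
    if type(request) is not Request:
        d["_class"] = class_path(type(request))
    return d


print(request_to_dict(Request("http://example.com")))
print(request_to_dict(FormRequest("http://example.com")))
```

Because the key is omitted for plain requests, a reader has to treat a missing key as "use the default class", which is also what dicts written by older versions would require.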
elacuesta
Jan 24, 2017
Author
Member
That was fast! I made a few modifications to this PR and now it stores a string pointing to the class, which allows the dict to be serialized using Marshal-based queues. It also works well with requests without the __class__ key, falling back to the default Request. It is true, though, that we would save some resources by saving the class only when it's not Request; let me patch that.
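The reading side of this approach might be sketched as follows. Everything here is illustrative: `Request` and `CustomRequest` are stand-ins, `demo_requests` is a throwaway module registered only so the import path resolves, `load_object` is a simplified re-implementation rather than Scrapy's own helper, and the `_class` key name is an assumption.

```python
import sys
import types
from importlib import import_module


class Request:
    """Stand-in for scrapy.Request (demo only)."""

    def __init__(self, url):
        self.url = url


class CustomRequest(Request):
    """Stand-in for a user-defined Request subclass."""


# Register a throwaway module so "demo_requests.CustomRequest" is importable
# regardless of how this snippet is executed.
_demo = types.ModuleType("demo_requests")
_demo.CustomRequest = CustomRequest
sys.modules["demo_requests"] = _demo


def load_object(path):
    """Simplified sketch: resolve 'package.module.Name' to the named object."""
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)


def request_from_dict(d):
    # Dicts without the key (for example, written by an older version)
    # fall back to the default Request class.
    request_cls = load_object(d["_class"]) if "_class" in d else Request
    return request_cls(url=d["url"])


old_style = request_from_dict({"url": "http://example.com"})
custom = request_from_dict({"url": "http://example.com",
                            "_class": "demo_requests.CustomRequest"})
print(type(old_style).__name__, type(custom).__name__)
```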
93ba1e9 to 01d83c9
@@ -47,7 +50,11 @@ def request_from_dict(d, spider=None):
     eb = d['errback']
     if eb and spider:
         eb = _get_method(spider, eb)
-    return Request(
+    try:
voith
Jan 24, 2017
What was the thought process behind using a try/except block here? Why not:
request_cls = load_object(d['_class']) if '_class' in d else Request
This makes more sense to me, since d['_class'] is only set when type(request) is not Request, and since d['_class'] stores the import path of the request class, load_object(d['_class']) should not raise. Or is there a case I'm unaware of where it would raise an error?
elacuesta
Jan 24, 2017
Author
Member
The load_object call could fail as well. Imagine you have a disk queue containing scrapy_splash.request.SplashRequest and you try to read the queue from an environment which doesn't have scrapy_splash installed.
I'm not saying how often that would happen, just that it could happen :-)
kmike
Jan 24, 2017
Member
I think in this case it is better to fail loudly with an exception.
voith
Jan 24, 2017
@elacuesta If I understood you right, you're saying that requests were stored to a disk queue by one project and then read by another. I don't know if that is a sensible use case. My opinion here goes with Mikhail, i.e. fail in such a case.
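To illustrate the "fail loudly" behaviour discussed above, here is a small sketch with no try/except around the class lookup. The names are assumptions: `load_object` is a simplified re-implementation, `resolve_request_class` is a hypothetical helper, and `package_not_installed_here` is a made-up module name standing in for a package that is missing from the reading environment.

```python
from importlib import import_module


class Request:
    """Stand-in for scrapy.Request (demo only)."""


def load_object(path):
    """Simplified sketch: resolve 'package.module.Name' to the named object."""
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)


def resolve_request_class(d, default=Request):
    # No try/except here: if the stored path cannot be imported,
    # the ImportError propagates to the caller instead of being hidden.
    return load_object(d["_class"]) if "_class" in d else default


try:
    resolve_request_class(
        {"_class": "package_not_installed_here.request.SplashRequest"}
    )
except ImportError as exc:
    print("failed loudly:", exc)
```

The alternative, silently falling back to the default class, would turn a missing dependency into requests of the wrong type, which is harder to notice than an exception.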
01d83c9 to 53757e5
Hello there @kmike, looking forward to reading your comments after the latest changes.
Looks good! Thanks @elacuesta and @voith.
Thanks @elacuesta
Attempting to fix #1890