[MRG+1] Preserve request class when converting to/from dicts (utils.reqser) #2510
Current coverage is 83.46% (diff: 100%)
@@             master   #2510   diff @@
==========================================
  Files           161     161
  Lines          8780    8784     +4
  Methods           0       0
  Messages          0       0
  Branches       1288    1289     +1
==========================================
+ Hits           7328    7332     +4
  Misses         1204    1204
  Partials        248     248
@@ -20,6 +19,7 @@ def request_to_dict(request, spider=None):
     if callable(eb):
         eb = _find_method(spider, eb)
     d = {
+        '__class__': request.__class__,
This doesn't work well with scrapy.squeues.MarshalFifoDiskQueue and scrapy.squeues.MarshalLifoDiskQueue: the resulting dicts cannot be serialized. I'm working on it.
I think it makes sense not to store the default Request class, and to store the request class only if it is not the default:
- it would save some memory and CPU;
- by supporting queues without a __class__ key, we're making this PR compatible with disk queues created by previous Scrapy versions.
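A minimal sketch of the suggestion above, for illustration only. The Request and CustomRequest classes are simplified stand-ins rather than Scrapy's real classes, and the `_class` key name is an assumption borrowed from the later comments in this thread.

```python
class Request:
    """Simplified stand-in for the default request class."""

    def __init__(self, url):
        self.url = url


class CustomRequest(Request):
    """Hypothetical Request subclass."""


def request_to_dict(request):
    """Serialize a request, recording its class only when it is not the default."""
    d = {"url": request.url}
    if type(request) is not Request:
        cls = type(request)
        # Store the import path as a string rather than the class object.
        d["_class"] = f"{cls.__module__}.{cls.__name__}"
    return d


print(request_to_dict(Request("http://example.com")))
print(request_to_dict(CustomRequest("http://example.com")))
```

Dicts for plain requests keep the same shape as before, which is why queues written by older versions would remain readable.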
That was fast! I made a few modifications to this PR and now it stores a string pointing to the class, which allows the dict to be serialized using Marshal-based queues. It also works well with requests without a __class__ key, falling back to the default Request, but it is true that we would save some resources by saving the class only when it's not Request; let me patch that.
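To illustrate why a string works where a class object does not, here is a small sketch using only the standard library `marshal` module. CustomRequest and the import path "myproject.requests.CustomRequest" are hypothetical names used for the example.

```python
import marshal


class CustomRequest:
    """Hypothetical stand-in for a Request subclass."""


def can_marshal(obj):
    """Return True if marshal can serialize obj, False otherwise."""
    try:
        marshal.dumps(obj)
    except ValueError:  # marshal raises ValueError for unsupported types
        return False
    return True


# A dict holding the class object itself cannot be marshalled...
with_class_object = {"url": "http://example.com", "_class": CustomRequest}
# ...while a dict holding the import path as a plain string can.
with_class_path = {
    "url": "http://example.com",
    "_class": "myproject.requests.CustomRequest",
}

print(can_marshal(with_class_object))  # False
print(can_marshal(with_class_path))    # True
```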
@kmike Done, please check again
93ba1e9 to 01d83c9
@@ -47,7 +50,11 @@ def request_from_dict(d, spider=None):
     eb = d['errback']
     if eb and spider:
         eb = _get_method(spider, eb)
-    return Request(
+    try:
What was the thought process behind using a try/except block here? Why not:
request_cls = load_object(d['_class']) if '_class' in d else Request
This makes more sense to me, since d['_class'] is only set when type(request) is not Request, and since d['_class'] stores the import path of the request class, load_object(d['_class']) should not error. Or is there a case I'm unaware of where it will throw an error?
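The one-liner proposed above can be sketched in isolation as follows. This is an illustration under stated assumptions: `load_object` here is a minimal stand-in written with `importlib`, not Scrapy's own helper, and Request is a simplified placeholder class.

```python
from importlib import import_module


def load_object(path):
    """Minimal stand-in: import and return the object at a dotted path."""
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)


class Request:
    """Simplified stand-in for the default request class."""

    def __init__(self, url):
        self.url = url


def request_from_dict(d):
    """Rebuild a request, using the stored class path when one is present."""
    request_cls = load_object(d["_class"]) if "_class" in d else Request
    return request_cls(url=d["url"])


# A dict without '_class' (e.g. from an older queue) falls back to Request.
restored = request_from_dict({"url": "http://example.com"})
print(type(restored).__name__)
```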
The load_object call could fail as well. Imagine you have a disk queue containing scrapy_splash.request.SplashRequest and you try to read the queue from an environment which doesn't have scrapy_splash installed.
I'm not saying how often that would happen, just that it could happen :-)
I think in this case it is better to fail loudly with an exception.
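A short sketch of what failing loudly could look like, assuming a minimal `importlib`-based `load_object` stand-in rather than Scrapy's helper. The package name "package_that_is_not_installed" is deliberately fictitious and stands for any request class whose package is missing from the environment reading the queue.

```python
from importlib import import_module


def load_object(path):
    """Minimal stand-in: import and return the object at a dotted path."""
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)


class Request:
    """Simplified stand-in for the default request class."""


def resolve_request_class(d, default=Request):
    """Return the stored class; import errors propagate instead of being hidden."""
    if "_class" not in d:
        return default
    return load_object(d["_class"])  # no try/except: a missing package raises


stored = {"_class": "package_that_is_not_installed.request.CustomRequest"}
try:
    resolve_request_class(stored)
    failed_loudly = False
except ImportError as exc:
    failed_loudly = True
    print(f"cannot restore request: {exc}")
```

Letting the error surface avoids silently rebuilding a specialized request as a plain Request, which could drop behavior the original class provided.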
@elacuesta If I understood you right, you're saying that requests were stored to a disk queue by one project and read by another. I don't know if that is a sensible use case. My opinion here goes with Mikhail, i.e. fail in such a case.
Done, thanks for the suggestion @voith
01d83c9 to 53757e5
Hello there @kmike, looking forward to reading your comments after the latest changes.
Looks good! Thanks @elacuesta and @voith.
Thanks @elacuesta
Attempting to fix #1890