Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Preserve request class when converting to/from dicts (utils.reqser) #2510

Merged
merged 1 commit into from
Feb 8, 2017

Conversation

elacuesta
Copy link
Member

@elacuesta elacuesta commented Jan 24, 2017

Attempting to fix #1890

@codecov-io
Copy link

codecov-io commented Jan 24, 2017

Current coverage is 83.46% (diff: 100%)

Merging #2510 into master will increase coverage by <.01%

@@             master      #2510   diff @@
==========================================
  Files           161        161          
  Lines          8780       8784     +4   
  Methods           0          0          
  Messages          0          0          
  Branches       1288       1289     +1   
==========================================
+ Hits           7328       7332     +4   
  Misses         1204       1204          
  Partials        248        248          

Powered by Codecov. Last update 4620d2f...53757e5

@@ -20,6 +19,7 @@ def request_to_dict(request, spider=None):
if callable(eb):
eb = _find_method(spider, eb)
d = {
'__class__': request.__class__,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work well with scrapy.squeues.MarshalFifoDiskQueue and scrapy.squeues.MarshalLifoDiskQueue, the resulting dicts cannot be serialized. I'm working on it.

Copy link
Member

@kmike kmike Jan 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it make sense not to store default Request class, and store request class only if it is not default:

  1. it would allow to save some memory and CPU;
  2. by supporting queues without __class__ key we're making this PR compatible with disk queues created by previous Scrapy versions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was fast! I made a few modifications to this PR and now it stores a string pointing to the class, this allows the dict to be serialized using Marshal-based queues. It also works well with requests without __class__ key, falling back to the default Request, but it is true that we would save some resources saving the class only when it's not Request; let me patch that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kmike Done, please check again

@elacuesta elacuesta force-pushed the reqser_request_class branch 4 times, most recently from 93ba1e9 to 01d83c9 Compare January 24, 2017 16:43
@@ -47,7 +50,11 @@ def request_from_dict(d, spider=None):
eb = d['errback']
if eb and spider:
eb = _get_method(spider, eb)
return Request(
try:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What has been the thought process here to use a try except block? Why not:

 request_cls = load_object(d['_class']) if '_class' in d else Request

This makes more sense to me since d['_class'] is only set when type(request) is not Request and since d['_class'] stores import path of request, load_object (d['_class']) should not error. Or is there a case when it will throw an error which I'm unaware of?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The load_object call could fail as well. Imagine you have a disk queue containing scrapy_splash.request.SplashRequest and you try to read the queue from an enviroment which doesn't have scrapy_splash installed.
I'm not saying how often that would happen, just that it could happen :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case it is better to fail loudly with an exception.

Copy link

@voith voith Jan 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elacuesta If I understood you right, what you're trying to say is that requests were stored to a disk queue by a separate project and were read by another. I don't know if that is asensible use case. My opinion here goes with Mikhail i.e fail in such a case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make it fail, I was just following (what I thought was) the spirit of your comment @kmike.
It would also be a bit backward incompatible, currently it wouldn't fail but yield a regular Request object.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks for the suggestion @voith

@elacuesta
Copy link
Member Author

Hello there @kmike, looking forward to reading your comments after the latest changes.

@kmike kmike changed the title Preserve request class when converting to/from dicts (utils.reqser) [MRG+1] Preserve request class when converting to/from dicts (utils.reqser) Feb 6, 2017
@kmike
Copy link
Member

kmike commented Feb 6, 2017

Looks good! Thanks @elacuesta and @voith.

@kmike kmike added this to the v1.4 milestone Feb 6, 2017
@redapple redapple merged commit f32a229 into scrapy:master Feb 8, 2017
@redapple
Copy link
Contributor

redapple commented Feb 8, 2017

Thanks @elacuesta

@elacuesta elacuesta deleted the reqser_request_class branch February 8, 2017 17:32
@kmike kmike removed this from the v1.4 milestone Feb 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Disk queues don't preserve Request class
5 participants