Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Preserve request class when converting to/from dicts (utils.reqser) #2510

Merged
merged 1 commit into from Feb 8, 2017

Conversation

@elacuesta
Copy link
Member

@elacuesta elacuesta commented Jan 24, 2017

Attempting to fix #1890

@codecov-io
Copy link

@codecov-io codecov-io commented Jan 24, 2017

Current coverage is 83.46% (diff: 100%)

Merging #2510 into master will increase coverage by <.01%

@@             master      #2510   diff @@
==========================================
  Files           161        161          
  Lines          8780       8784     +4   
  Methods           0          0          
  Messages          0          0          
  Branches       1288       1289     +1   
==========================================
+ Hits           7328       7332     +4   
  Misses         1204       1204          
  Partials        248        248          

Powered by Codecov. Last update 4620d2f...53757e5

@@ -20,6 +19,7 @@ def request_to_dict(request, spider=None):
if callable(eb):
eb = _find_method(spider, eb)
d = {
'__class__': request.__class__,

This comment has been minimized.

@elacuesta

elacuesta Jan 24, 2017
Author Member

This doesn't work well with scrapy.squeues.MarshalFifoDiskQueue and scrapy.squeues.MarshalLifoDiskQueue, the resulting dicts cannot be serialized. I'm working on it.

This comment has been minimized.

@kmike

kmike Jan 24, 2017
Member

I think it make sense not to store default Request class, and store request class only if it is not default:

  1. it would allow to save some memory and CPU;
  2. by supporting queues without __class__ key we're making this PR compatible with disk queues created by previous Scrapy versions.

This comment has been minimized.

@elacuesta

elacuesta Jan 24, 2017
Author Member

That was fast! I made a few modifications to this PR and now it stores a string pointing to the class, this allows the dict to be serialized using Marshal-based queues. It also works well with requests without __class__ key, falling back to the default Request, but it is true that we would save some resources saving the class only when it's not Request; let me patch that.

This comment has been minimized.

@elacuesta

elacuesta Jan 24, 2017
Author Member

@kmike Done, please check again

@elacuesta elacuesta force-pushed the elacuesta:reqser_request_class branch 4 times, most recently from 93ba1e9 to 01d83c9 Jan 24, 2017
@@ -47,7 +50,11 @@ def request_from_dict(d, spider=None):
eb = d['errback']
if eb and spider:
eb = _get_method(spider, eb)
return Request(
try:

This comment has been minimized.

@voith

voith Jan 24, 2017

What has been the thought process here to use a try except block? Why not:

 request_cls = load_object(d['_class']) if '_class' in d else Request

This makes more sense to me since d['_class'] is only set when type(request) is not Request and since d['_class'] stores import path of request, load_object (d['_class']) should not error. Or is there a case when it will throw an error which I'm unaware of?

This comment has been minimized.

@elacuesta

elacuesta Jan 24, 2017
Author Member

The load_object call could fail as well. Imagine you have a disk queue containing scrapy_splash.request.SplashRequest and you try to read the queue from an enviroment which doesn't have scrapy_splash installed.
I'm not saying how often that would happen, just that it could happen :-)

This comment has been minimized.

@kmike

kmike Jan 24, 2017
Member

I think in this case it is better to fail loudly with an exception.

This comment has been minimized.

@voith

voith Jan 24, 2017

@elacuesta If I understood you right, what you're trying to say is that requests were stored to a disk queue by a separate project and were read by another. I don't know if that is asensible use case. My opinion here goes with Mikhail i.e fail in such a case.

This comment has been minimized.

@elacuesta

elacuesta Jan 24, 2017
Author Member

I can make it fail, I was just following (what I thought was) the spirit of your comment @kmike.
It would also be a bit backward incompatible, currently it wouldn't fail but yield a regular Request object.

This comment has been minimized.

@elacuesta

elacuesta Jan 24, 2017
Author Member

Done, thanks for the suggestion @voith

@elacuesta elacuesta force-pushed the elacuesta:reqser_request_class branch from 01d83c9 to 53757e5 Jan 24, 2017
@elacuesta
Copy link
Member Author

@elacuesta elacuesta commented Feb 3, 2017

Hello there @kmike, looking forward to reading your comments after the latest changes.

@kmike kmike changed the title Preserve request class when converting to/from dicts (utils.reqser) [MRG+1] Preserve request class when converting to/from dicts (utils.reqser) Feb 6, 2017
@kmike
Copy link
Member

@kmike kmike commented Feb 6, 2017

Looks good! Thanks @elacuesta and @voith.

@kmike kmike added this to the v1.4 milestone Feb 6, 2017
@redapple redapple merged commit f32a229 into scrapy:master Feb 8, 2017
3 checks passed
3 checks passed
codecov/patch 100% of diff hit (target 83.46%)
Details
codecov/project 83.46% (+<.01%) compared to 4620d2f
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@redapple
Copy link
Contributor

@redapple redapple commented Feb 8, 2017

Thanks @elacuesta

@elacuesta elacuesta deleted the elacuesta:reqser_request_class branch Feb 8, 2017
@kmike kmike removed this from the v1.4 milestone Feb 22, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

5 participants