
restore request's callback after calling custom callback #3129

Open · wants to merge 9 commits into master · 3 participants

Conversation
@nctl144 (Member) commented Feb 17, 2018

Restore the request callback after calling the custom callback. Bug #3095

@codecov commented Feb 17, 2018

Codecov Report

Merging #3129 into master will decrease coverage by 0.01%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #3129      +/-   ##
==========================================
- Coverage    82.1%   82.09%   -0.02%     
==========================================
  Files         228      228              
  Lines        9586     9590       +4     
  Branches     1384     1386       +2     
==========================================
+ Hits         7871     7873       +2     
- Misses       1456     1458       +2     
  Partials      259      259
Impacted Files Coverage Δ
scrapy/commands/parse.py 20.8% <0%> (-0.25%) ⬇️
scrapy/http/response/__init__.py 93.44% <0%> (+0.22%) ⬆️
@@ -205,12 +205,14 @@ def callback(response):
req.meta['_depth'] = depth + 1
req.meta['_callback'] = req.callback
req.callback = callback
req.callback = original_callback

@jdemaeyer (Contributor) commented Feb 21, 2018

Hi @nctl144, thanks for the PR! :)

At this point, the custom callback method has not been called yet. You are simply resetting the request's callback to its original value immediately after it was set to the custom callback. This disables the whole functionality of scrapy parse for all subsequent requests: instead of printing the requests and items of the last request, scrapy will print those of the first request.

You can see this if you yield an item before printing "done", like this:

import scrapy


class HttpBinSpider(scrapy.Spider):

    name = "httpbin.org"

    start_urls = ['https://httpbin.org/']

    def parse(self, response):
        if response.meta.get('retried'):
            self.logger.info("done")
            yield {'some': 'item'}
            return
        response.meta['retried'] = True
        yield response.request.replace(dont_filter=True)

With this PR, the end of the crawl log with scrapy parse "https://httpbin.org/" -d 2 looks like this:

>>> STATUS DEPTH LEVEL 1 <<<
# Scraped Items  ------------------------------------------------------------
[]
# Requests  -----------------------------------------------------------------
[<GET https://httpbin.org/>]

whereas with a correct fix it should look like this:

>>> STATUS DEPTH LEVEL 2 <<<
# Scraped Items  ------------------------------------------------------------
[{'some': 'item'}]
# Requests  -----------------------------------------------------------------
[]

The real call to the original callback is done in line 195, so you should try to restore the original request.callback attribute somewhere around there.
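The save/wrap/restore flow being discussed can be sketched with plain stand-in classes (FakeRequest and FakeResponse below are hypothetical stand-ins, not scrapy's real Request/Response; the actual logic lives in scrapy/commands/parse.py):

```python
# Minimal sketch of the pattern: save the original callback in meta,
# install a wrapper, and restore the original *inside* the wrapper,
# right where the real callback is about to be invoked.

class FakeRequest:
    def __init__(self, callback=None):
        self.callback = callback
        self.meta = {}

class FakeResponse:
    def __init__(self, request):
        self.request = request
        self.meta = request.meta  # scrapy exposes request.meta on the response

def original_parse(response):
    return ['item']

def wrap(request):
    # Save the original callback before replacing it with the wrapper.
    request.meta['_callback'] = request.callback

    def callback(response):
        # Restore the original callback at the point where it is called,
        # not immediately after installing the wrapper.
        response.request.callback = response.meta['_callback']
        cb = response.request.callback or original_parse
        return cb(response)

    request.callback = callback
    return request

req = wrap(FakeRequest(callback=original_parse))
resp = FakeResponse(req)
items = req.callback(resp)      # wrapper runs, then delegates
assert req.callback is original_parse  # restored after the wrapper ran
```

Restoring too early (right after `request.callback = callback`) would undo the wrapping before it ever fires, which is the bug described above.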

return requests

#update request meta if any extra meta was passed through the --meta/-m opts.
if opts.meta:
request.meta.update(opts.meta)

original_callback = request.callback

@jdemaeyer (Contributor) commented Feb 21, 2018

This is already saved in request.meta['_callback'] (see two lines below)

@nctl144 (Member, Author) commented Feb 21, 2018

Hey @jdemaeyer, I've fixed the code the way you suggested. Could you please review it? :) I also added a test case for this, based on the other test cases in the "parse" test file and the demo spider you gave in the bug description.
I also noticed that there is another PR for this bug, so I'm wondering what I should do next. Please let me know!

@nctl144 (Member, Author) commented Feb 24, 2018

Hey, when you have time could you please review this PR? Let me know if there is anything I can improve! :)

@jdemaeyer (Contributor) left a comment

@nctl144 thanks for the update, I've left some comments in the code. :)

Don't worry about the other PR, this is not a race - a little duplication is par for the course in open source. What's more important for GSoC is that we can see how you work, how you deal with feedback, and most importantly, that you want to learn. :)

@@ -192,6 +192,7 @@ def callback(response):
# parse items and requests
depth = response.meta['_depth']

request.callback = cb

@jdemaeyer (Contributor) commented Feb 26, 2018

Nice, this looks like a good spot to reset the callback. :)

I am going to nit-pick once more here: I would rather reset the callback to its original value (the one stored in response.meta) and not to the function it resolved to (cb). This will only have an impact if the original value was None, but it ensures that scrapy parse will behave just like scrapy crawl should you check something like if response.request.callback: (for whatever reason you would do that 😅).
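The distinction can be illustrated with plain variables (all names below are illustrative stand-ins, not scrapy's actual API): when the original callback was None, cb resolves to the spider's parse method, so restoring from cb would make the callback look set when it originally was not.

```python
# Sketch of the nit-pick: restore from the saved original value,
# not from the function it resolved to.

def spider_parse(response):      # stand-in for the spider's default parse
    return []

saved_callback = None                # the request's original callback was None
cb = saved_callback or spider_parse  # what the command resolves it to

restored_from_cb = cb                # callback now appears set
restored_from_meta = saved_callback  # stays None, matching scrapy crawl

assert restored_from_cb is spider_parse
assert restored_from_meta is None
```

Only the second restore keeps a check like `if response.request.callback:` behaving the same under scrapy parse and scrapy crawl.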

@nctl144 (Member, Author) commented Feb 26, 2018

Alright, got it. I fixed it so the request callback is restored only when it is not None, and I also reset the callback to the value saved in response.meta :)

return
response.meta['retried'] = True
yield response.request.replace(dont_filter=True)

@jdemaeyer (Contributor) commented Feb 26, 2018

Maybe you can add another test case that tests non-default callbacks, like the example in this comment?

@nctl144 (Member, Author) commented Feb 26, 2018

Thanks for letting me know @jdemaeyer, maybe I was just a little excited about my first PR getting accepted 😂.
I just added a custom-callback test for this PR and fixed the code in parse.py. It might not be perfect yet, but please let me know if there is anything I can further improve!
And yes, I have actually learned a lot through this PR and some that I made in other scrapy repos. I am enjoying this so far!
And again, thanks for your help, I really appreciate it :)

nctl144 added some commits Feb 26, 2018

@@ -192,6 +192,9 @@ def callback(response):
# parse items and requests
depth = response.meta['_depth']

if response.request.callback:

@gekco commented Mar 2, 2018

Hi,
I am new to the project and to open source development as well, but could you explain why the if response.request.callback: check is necessary? Also, if I am not wrong, you can just use cb instead of request.meta['_callback'].

@nctl144 (Member, Author) commented Mar 2, 2018

Hey @gekco, I believe there's a comment in the review changes with more detail about this!
