
defer.inlineCallbacks in spider ? #4263

Closed
poupryc opened this issue Jan 1, 2020 · 3 comments

poupryc commented Jan 1, 2020

Hi,

I'm trying to use Twisted's inlineCallbacks feature to run the following code:

# -*- coding: utf-8 -*-
import scrapy
from twisted.internet.defer import inlineCallbacks


class PagesSpider(scrapy.spiders.SitemapSpider):
    name = 'pages'
    allowed_domains = ['thing.com']
    sitemap_follow = [r'sitemap_page']

    def __init__(self, site=None, *args, **kwargs):
        super(PagesSpider, self).__init__(*args, **kwargs)

    @inlineCallbacks
    def parse(self, response):
        # things
        response = yield scrapy.Request("https://google.com")
        # Twisted executes the request and resumes the generator here with the response
        print(response.text)

Is this possible? I'm trying to use this so I can do without the inline-requests module.

Thanks
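For context, the mechanism that inlineCallbacks relies on can be illustrated with a small pure-Python stand-in. This is not Twisted code: run_inline and fake_download are hypothetical names, sketched only to show how a driver resumes the decorated generator by sending each result back into it at the yield point.

```python
# Minimal illustration of the generator-driving idea behind
# twisted.internet.defer.inlineCallbacks. NOT Twisted code:
# run_inline and fake_download are hypothetical stand-ins.

def fake_download(url):
    # Stands in for an asynchronous download that eventually produces a result.
    return f"<html>content of {url}</html>"

def run_inline(gen):
    # Drive the generator: each yielded "request" is resolved and its
    # result is sent back into the generator, resuming it at the yield.
    result = None
    try:
        while True:
            url = gen.send(result)       # resume generator, get next request
            result = fake_download(url)  # "perform" the request
    except StopIteration as stop:
        return stop.value                # generator returned: final value

def parse():
    response = yield "https://google.com"
    # the driver resumes the generator here with the response
    return response

print(run_inline(parse()))
```

In real Twisted, the driver waits on a Deferred between steps instead of resolving the result synchronously, but the resume-with-result flow is the same.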


poupryc commented Jan 2, 2020

After reading the code, I think I've found the solution. Perhaps documenting it would be beneficial?

# -*- coding: utf-8 -*-
import scrapy
from twisted.internet.defer import inlineCallbacks


class PagesSpider(scrapy.spiders.SitemapSpider):
    name = 'pages'
    allowed_domains = ['thing.com']
    sitemap_follow = [r'sitemap_page']

    def __init__(self, site=None, *args, **kwargs):
        super(PagesSpider, self).__init__(*args, **kwargs)

    @inlineCallbacks
    def parse(self, response):
        # things
        request = scrapy.Request("https://google.com")
        response = yield self.crawler.engine.download(request, self)
        # Twisted executes the request and resumes the generator here with the response
        print(response.text)

It's a little verbose, but it works. Correct me if I'm wrong.

elacuesta (Member) commented

Relevant: #542 (comment). Not sure if we want to document these ExecutionEngine methods though. Any thoughts @dangra?

If, on the other hand, this issue is motivated by the above one (#3500), the "Scrapy way" of achieving such a result would be "Passing additional data to callback functions".

wRAR (Member) commented Oct 29, 2023

The current way to do the same is to declare the callback as async def, which is already documented. I don't want us to describe any inlineCallbacks uses where alternatives exist, so I'm closing this.
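For reference, the coroutine shape this refers to looks roughly like the sketch below. It uses a hypothetical stub downloader in place of the real engine call so it runs standalone; in actual Scrapy code (2.6+) the awaited expression would be maybe_deferred_to_future(self.crawler.engine.download(request)) from scrapy.utils.defer.

```python
# Sketch of the "async def callback" shape. In real Scrapy code the
# awaited call would be maybe_deferred_to_future(self.crawler.engine.download(...));
# stub_download is a hypothetical stand-in so the flow runs standalone.
import asyncio

async def stub_download(url):
    # Stands in for the engine download; returns a fake response body.
    return f"body of {url}"

class PagesSpider:
    name = "pages"

    async def parse(self, response):
        # In Scrapy this would be:
        #   request = scrapy.Request("https://google.com")
        #   extra = await maybe_deferred_to_future(self.crawler.engine.download(request))
        extra = await stub_download("https://google.com")
        return extra

print(asyncio.run(PagesSpider().parse(response=None)))
```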

wRAR closed this as completed Oct 29, 2023