Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shortcut for idle #1051

Open
wants to merge 10 commits into
base: master
Choose a base branch
from
Open

Conversation

ChienliMa
Copy link

related to issue #740

@Digenis
Copy link
Member

Digenis commented Feb 20, 2015

LGTM.

Only the existence of the idle method
and the confusion it can create concerns me a bit
I think it's not equivalent to spider.close(),
not meant to be called in some other way than signals.

A typo can accidentally define it instead of idled
(Weren't there such false open issues and/or mailing list posts
for occidentally overriding close instead of closed?)
Mangling its name (__idle) can prevent this.
(but it's an extremely rare practice for scrapy's codebase)

@ChienliMa
Copy link
Author

Is there something else need to be done besides changing its name?
Like documentation.( Well, i'll make a PR later )

@nyov
Copy link
Contributor

nyov commented Mar 25, 2015

It doesn't yet fullfill the requirements of #740, it's nothing more than a signal shortcut so far.
The intended goal was to better handle cases such as this: https://gist.github.com/nyov/8720340

As per #740, idled should

  • listen for spider_idle signal (done)
  • verify the signal was called for the current spider before executing it's idled() method
  • handle an iterable of returned requests; schedule requests using engine.crawl()
  • raise DontCloseSpider exception

@ChienliMa
Copy link
Author

Hi, I am new here. Do you mean something like this?
Should the test case also be changed to cover the code?

@nyov
Copy link
Contributor

nyov commented Mar 26, 2015

This looks more like it, in theory. But if you reference self, it can no longer be a staticmethod.
Though I'm not sure I'm up-to-date with current internals. (Do we still have multiple spiders per crawler actually?)

@dangra, @kmike, could you take a look at this? I'd love to see this make the next release if we can make it work.

@kmike
Copy link
Member

kmike commented Apr 2, 2015

This idled method is an improvement over signals because it handles Requests.

  • what happens if idled returned nothing?
  • it should be documented how to stop the spider from idled signal;
  • usage example in docs would be good;
  • there should be more tests, at least for request handling.

I'm also thinking about the API - maybe we can make something even more user-friendly, and leave 'idle' signal for advanced use cases?

For example, we may allow start_requests to yield a special value, 'WhenQueueEmpty' or 'WaitUntilIdle' (too much iIlilI though):

import scrapy

class MySpider(scrapy.Spider):
    def start_requests(self):
        # send some seed urls
        for url in self.start_urls:
            yield scrapy.Request(url, ...)

        yield scrapy.WaitUntilIdle 
        # Scrapy returns the control here when there is no more 
        # requests in queue - when idle signal fires. 

        while True:
            try:
                for url in self.get_batch_from_redis():
                    yield scrapy.Request(url, ...)
                yield WaitUntilIdle
            except NoMoreRequests:
                break

@ChienliMa
Copy link
Author

Sorry for my disappearance.

@kmike for you opinions:

what happens if idled returned nothing?

If users implement the idled() and return nothing, or something other than a generator, nothing will happen.

it should be documented how to stop the spider from idled signal; usage example in docs would be good;

I added a try..except.... block. Now if users can raise CloseSpider in idled() method if they want to close spider. Do you think this is ok?

there should be more tests, at least for request handling.

I add some tests in the test case. But the test failed because the engine of the crawler did not exist. How can I fix this?

Another API

I am not sure which one is better. I think this should let core developers to decide.

@coveralls

This comment was marked as outdated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants