Shortcut for idle #1051

ChienliMa · 2015-02-20T03:24:33Z

related to issue #740

Digenis · 2015-02-20T09:48:25Z

LGTM.

Only the existence of the idle method
and the confusion it can create concern me a bit.
I don't think it's equivalent to spider.close();
it's not meant to be called other than through signals.

A typo can accidentally define it instead of idled.
(Weren't there spurious issues and/or mailing list posts
about accidentally overriding close instead of closed?)
Mangling its name (__idle) can prevent this
(though it's an extremely rare practice in Scrapy's codebase).
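Python's private name mangling, which Digenis suggests, can be shown with a standalone sketch. The classes below are hypothetical stand-ins, not Scrapy's actual Spider. They show that a double-underscore method is stored under a class-qualified name, so a subclass method named `idle` or `__idle` does not replace it:

```python
class Spider:
    def __idle(self, spider):
        # Python stores this as Spider._Spider__idle (name mangling).
        return "built-in idle handler"

    def run_idle_handler(self, spider):
        # Inside the class body, self.__idle is rewritten to self._Spider__idle,
        # so the lookup ignores same-named methods defined in subclasses.
        return self.__idle(spider)


class MySpider(Spider):
    def idle(self, spider):
        # Typo for idled(): with mangling, this no longer replaces the handler.
        return "user code"

    def __idle(self, spider):
        # Mangled to MySpider._MySpider__idle, a separate attribute.
        return "also user code"
```

Calling `MySpider().run_idle_handler(...)` still reaches the base-class handler, which is the protection against accidental overrides described above.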

ChienliMa · 2015-02-20T12:12:23Z

Is there anything else that needs to be done besides changing its name,
like documentation? (Well, I'll make a PR later.)

nyov · 2015-03-25T10:53:12Z

It doesn't fulfill the requirements of #740 yet; so far it's nothing more than a signal shortcut.
The intended goal was to better handle cases such as this: https://gist.github.com/nyov/8720340

As per #740, idled should:

listen for the spider_idle signal (done)
verify the signal was called for the current spider before executing its idled() method
handle an iterable of returned requests; schedule requests using engine.crawl()
raise DontCloseSpider exception
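A framework-free sketch of the four requirements above. Everything here is a hypothetical stand-in: `DontCloseSpider` mimics `scrapy.exceptions.DontCloseSpider`, `FakeEngine.crawl()` plays the role of `engine.crawl()`, and `_handle_spider_idle` is the method that would be connected to the spider_idle signal. Raising `DontCloseSpider` only when at least one request was scheduled is an assumption, not something the list above specifies:

```python
from collections.abc import Iterable


class DontCloseSpider(Exception):
    """Stand-in for scrapy.exceptions.DontCloseSpider."""


class FakeEngine:
    """Hypothetical engine stub that only records scheduled requests."""

    def __init__(self):
        self.scheduled = []

    def crawl(self, request):
        self.scheduled.append(request)


class Spider:
    def __init__(self, engine):
        self.engine = engine

    def idled(self):
        """Override in subclasses; may return an iterable of requests."""
        return None

    def _handle_spider_idle(self, spider):
        # Would be connected to spider_idle; ignore signals for other spiders.
        if spider is not self:
            return
        requests = self.idled()
        if not isinstance(requests, Iterable):
            return  # nothing usable returned: let the spider close normally
        scheduled = 0
        for request in requests:
            self.engine.crawl(request)
            scheduled += 1
        if scheduled:
            # New work was queued, so ask the engine to keep the spider open.
            raise DontCloseSpider
```

A subclass only has to override `idled()`, for example as a generator yielding follow-up requests; the identity check, scheduling and `DontCloseSpider` handling stay in the base class.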

ChienliMa · 2015-03-26T16:04:38Z

Hi, I am new here. Do you mean something like this?
Should the test case also be changed to cover the code?

nyov · 2015-03-26T20:46:26Z

This looks more like it, in theory. But if you reference self, it can no longer be a staticmethod.
Though I'm not sure I'm up-to-date with current internals. (Do we still have multiple spiders per crawler actually?)

@dangra, @kmike, could you take a look at this? I'd love to see this make the next release if we can make it work.

kmike · 2015-04-02T19:50:57Z

This idled method is an improvement over signals because it handles Requests.

What happens if idled returns nothing?
It should be documented how to stop the spider from within idled;
a usage example in the docs would be good;
there should be more tests, at least for request handling.

I'm also thinking about the API: maybe we can make something even more user-friendly, and leave the 'idle' signal for advanced use cases?

For example, we may allow start_requests to yield a special value, 'WhenQueueEmpty' or 'WaitUntilIdle' (too much iIlilI though):

import scrapy

class MySpider(scrapy.Spider):
    def start_requests(self):
        # send some seed urls
        for url in self.start_urls:
            yield scrapy.Request(url, ...)

        yield scrapy.WaitUntilIdle
        # Scrapy returns control here when there are no more
        # requests in the queue, i.e. when the idle signal fires.

        while True:
            try:
                for url in self.get_batch_from_redis():
                    yield scrapy.Request(url, ...)
                yield scrapy.WaitUntilIdle
            except NoMoreRequests:
                break
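The control flow of this proposal can be simulated without Scrapy. In this sketch `WAIT_UNTIL_IDLE` is a hypothetical sentinel (no such object exists in Scrapy), a deque plays the role of the scheduler queue, and `process` stands in for downloading a request and running its callback, which may return follow-up requests:

```python
from collections import deque

WAIT_UNTIL_IDLE = object()  # hypothetical sentinel, not a Scrapy API


def _drain(queue, process, processed):
    """Process queued requests, including follow-ups, until the queue is empty."""
    while queue:
        request = queue.popleft()
        processed.append(request)
        queue.extend(process(request))


def run(start_requests, process):
    """Consume a start_requests-style generator, pausing it at WAIT_UNTIL_IDLE.

    The generator is only resumed after the queue has been drained, which is
    the point where Scrapy would fire the spider_idle signal.
    """
    queue = deque()
    processed = []
    for item in start_requests:
        if item is WAIT_UNTIL_IDLE:
            _drain(queue, process, processed)
        else:
            queue.append(item)
    _drain(queue, process, processed)  # generator exhausted: finish remaining work
    return processed
```

The design relies on the generator being suspended rather than polled: Python generators already provide the "return control here later" behaviour that the proposal needs.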

ChienliMa · 2015-04-25T16:25:01Z

Sorry for my disappearance.

@kmike, regarding your points:

What happens if idled returns nothing?

If users implement idled() and return nothing, or something other than a generator, nothing will happen.

It should be documented how to stop the spider from within idled; a usage example in the docs would be good;

I added a try...except block. Now users can raise CloseSpider in the idled() method if they want to close the spider. Do you think this is OK?
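A sketch of such a try...except, under stated assumptions: `CloseSpider` below is a local stand-in for `scrapy.exceptions.CloseSpider` (which carries a `reason`), and `handle_idle` and the `close_spider` callback are hypothetical names. One subtlety is that when idled() is a generator function, calling it does not run its body, so the iteration must also happen inside the `try` for a raised CloseSpider to be caught:

```python
class CloseSpider(Exception):
    """Stand-in for scrapy.exceptions.CloseSpider, which carries a reason."""

    def __init__(self, reason="cancelled"):
        super().__init__(reason)
        self.reason = reason


def handle_idle(spider, close_spider):
    """Collect requests from spider.idled(); turn CloseSpider into a close call."""
    try:
        # Iterate inside the try: if idled() is a generator function, its body
        # (and any CloseSpider it raises) only runs during iteration.
        requests = list(spider.idled() or [])
    except CloseSpider as exc:
        close_spider(spider, exc.reason)
        return []
    return requests
```

Returning `None` from idled() yields an empty list here, so "nothing returned" and "nothing to schedule" behave the same way.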

there should be more tests, at least for request handling.

I added some tests to the test case, but they fail because the crawler's engine does not exist. How can I fix this?
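One way to test request scheduling without a real engine, sketched under assumptions: the spider reaches the engine through a `crawler` attribute, and the test replaces that crawler with `unittest.mock.Mock`, which creates `engine.crawl` on first access and records its calls. `IdledSpider`, its `spider_idle` method and the local `DontCloseSpider` are hypothetical stand-ins, not Scrapy's classes:

```python
import unittest
from unittest import mock


class DontCloseSpider(Exception):
    """Stand-in for scrapy.exceptions.DontCloseSpider."""


class IdledSpider:
    """Hypothetical spider that reaches the engine through self.crawler."""

    def __init__(self, crawler):
        self.crawler = crawler

    def idled(self):
        return None

    def spider_idle(self, spider):
        if spider is not self:
            return
        requests = list(self.idled() or [])
        for request in requests:
            self.crawler.engine.crawl(request)
        if requests:
            raise DontCloseSpider


class IdledSchedulingTest(unittest.TestCase):
    def test_returned_requests_are_scheduled(self):
        crawler = mock.Mock()  # crawler.engine.crawl is created on first access

        class BatchSpider(IdledSpider):
            def idled(self):
                return ["req-1", "req-2"]

        spider = BatchSpider(crawler)
        with self.assertRaises(DontCloseSpider):
            spider.spider_idle(spider)
        crawler.engine.crawl.assert_has_calls(
            [mock.call("req-1"), mock.call("req-2")]
        )

    def test_nothing_returned_lets_spider_close(self):
        crawler = mock.Mock()
        spider = IdledSpider(crawler)
        spider.spider_idle(spider)  # no exception: the spider may close
        crawler.engine.crawl.assert_not_called()
```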

Another API

I am not sure which one is better; I think the core developers should decide.

ChienliMa added 3 commits February 20, 2015 11:16

add shortcut to spider_idle and its corresponding test

0f3f72d

delete extra import

6f774e0

delete extra line

0f5635d

Change Spider.idle into Spider.__idle()

44b4bf9

Document for Spider.idled()

117a38a

verify spider before calling idled() and schedule requests after it.

f575ecf

changes staticmethod to instance method

f5246d4

ChienliMa added 3 commits April 25, 2015 23:22

Enable users to close spider in idled() method

7f396af

add test for request schedule

dbaacee

Return if idled() does not return an iterable

a1ae321

Digenis mentioned this pull request Mar 15, 2016

Trouble executing code on spider close #1862

Closed

kmike mentioned this pull request Sep 16, 2016

Allow start_requests method running forever #456

Open


kmike mentioned this pull request Mar 10, 2018

GSoC 2018: Intro & questions about adding async/await support #3148

Closed

whalebot-helmsman mentioned this pull request Apr 25, 2018

Ability to control consumption of start_requests from spider #3237

Open

torymur mentioned this pull request Jan 21, 2019

Shortcut method for spider_idle signal #740

Open
