I just read Ben Alman's post on jQuery throttle and debounce and wondered: could a DebounceRequest be added to Scrapy?
The use case: I've recently been scraping a website where I wanted to gather the Facebook Likes for each URL. That data is cheap to get through the Facebook API, particularly the Facebook Query Language (FQL). The problem is that Facebook eventually stops answering my requests because of an undocumented rate limit. What if I could define a way for those requests to be joined and sent only after a certain delay, with all the parameters combined in a user-defined way? I'd expect the callback to be called just once, too (or maybe several times, but with the full response).
I don't actually understand the Facebook-related parts here.
Are you asking for debouncing requests to the same URLs instead of dupe-filtering them?
Or would you mind explaining the use case again?
Let's say my CustomSpider yields DebouncedRequests with a particular parameter, like http://domain.com/getdata.xml?query=1. Instead of firing each one immediately, I set rules for them to wait until at least 10 requests have accumulated (or maybe for at least 10 seconds) and then join them into a single request to http://domain.com/getdata.xml?query=1,2,3,4,5,6,7,8,9,10.
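A minimal, library-free sketch of that batching rule, assuming the buffered URLs share the same scheme, host and path and differ only in the `query` parameter. `join_urls` and `RequestBatcher` are hypothetical names, not Scrapy APIs; the clock is injectable so the 10-second rule can be tested without sleeping.

```python
import time
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def join_urls(urls, param="query"):
    """Merge ...?query=1, ...?query=2 into one ...?query=1,2 URL.

    Assumes every URL shares the scheme, host and path of the first one.
    """
    base = urlsplit(urls[0])
    values = []
    for url in urls:
        values.extend(parse_qs(urlsplit(url).query).get(param, []))
    # safe="," keeps the commas readable instead of encoding them as %2C
    query = urlencode({param: ",".join(values)}, safe=",")
    return urlunsplit((base.scheme, base.netloc, base.path, query, ""))


class RequestBatcher:
    """Hypothetical buffer: flush after max_items URLs or max_wait seconds."""

    def __init__(self, param="query", max_items=10, max_wait=10.0, clock=time.monotonic):
        self.param = param
        self.max_items = max_items
        self.max_wait = max_wait
        self.clock = clock  # injectable so the timing rule can be tested
        self._pending = []
        self._started = None

    def add(self, url):
        """Buffer one URL; return the joined URL if a threshold is reached, else None."""
        if not self._pending:
            self._started = self.clock()
        self._pending.append(url)
        return self.flush() if self.due() else None

    def due(self):
        """True once the batch is full or the oldest pending URL has waited max_wait."""
        if not self._pending:
            return False
        full = len(self._pending) >= self.max_items
        expired = self.clock() - self._started >= self.max_wait
        return full or expired

    def flush(self):
        """Emit whatever is pending as one joined URL (None if nothing is pending)."""
        if not self._pending:
            return None
        joined = join_urls(self._pending, self.param)
        self._pending = []
        self._started = None
        return joined
```

Usage, with the count rule firing on the tenth URL:

```python
batcher = RequestBatcher(max_items=10)
for i in range(1, 11):
    joined = batcher.add(f"http://domain.com/getdata.xml?query={i}")
print(joined)  # http://domain.com/getdata.xml?query=1,2,3,4,5,6,7,8,9,10
```

Note that the time rule is only checked when `add()` or `due()` is called, so something outside the batcher would still have to poll `due()` and call `flush()` for a partial batch.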
I think this should already be possible in a middleware.
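One way the middleware idea might look, sketched under assumptions: it follows the `process_spider_output(response, result, spider)` hook signature of a Scrapy spider middleware, but uses a stand-in `FakeRequest` class instead of `scrapy.Request` so it runs without Scrapy installed. The `meta["debounce"]` flag and the `debounced_urls` meta key are hypothetical conventions, not anything Scrapy defines.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


@dataclass
class FakeRequest:
    """Stand-in for scrapy.Request: only the url and meta attributes are used."""
    url: str
    meta: dict = field(default_factory=dict)


def join_urls(urls, param="query"):
    """Merge ...?query=1, ...?query=2 into ...?query=1,2 (shared base URL assumed)."""
    base = urlsplit(urls[0])
    values = [v for u in urls for v in parse_qs(urlsplit(u).query).get(param, [])]
    query = urlencode({param: ",".join(values)}, safe=",")
    return urlunsplit((base.scheme, base.netloc, base.path, query, ""))


class DebounceMiddleware:
    """Hypothetical middleware: batches requests flagged with meta['debounce']."""

    def __init__(self, batch_size=10, param="query"):
        self.batch_size = batch_size
        self.param = param
        self.pending = []

    def process_spider_output(self, response, result, spider):
        # Same signature as Scrapy's spider-middleware hook; items and
        # unflagged requests pass straight through.
        for obj in result:
            if isinstance(obj, FakeRequest) and obj.meta.get("debounce"):
                self.pending.append(obj)
                if len(self.pending) >= self.batch_size:
                    yield self.flush()
            else:
                yield obj

    def flush(self):
        """Replace the pending requests with one joined request (None if empty)."""
        if not self.pending:
            return None
        urls = [r.url for r in self.pending]
        self.pending = []
        # Keep the original URLs so a callback can split the combined response.
        return FakeRequest(join_urls(urls, self.param), meta={"debounced_urls": urls})
```

A partial batch left in `pending` when the spider runs out of work still has to be flushed from somewhere (Scrapy's `spider_idle` signal is one candidate); that wiring is not shown. Keeping the original URLs in `debounced_urls` would let a single callback map the combined response back to each query, which fits the "callback called once with the full response" idea.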