Scheduler: minimal interface, API docs #3559
Codecov Report
@@            Coverage Diff             @@
##           master    #3559      +/-   ##
==========================================
- Coverage   88.25%   88.07%   -0.18%
==========================================
  Files         162      162
  Lines       10430    10467      +37
  Branches     1514     1517       +3
==========================================
+ Hits         9205     9219      +14
- Misses        951      972      +21
- Partials      274      276       +2
Hey @elacuesta! Thanks, I think we should really document this API, though I'd
That's good @kmike, thanks! I can wait on this, but I wonder how far that is from being merged? The same use case that motivated me to understand the Scheduler would also benefit greatly from the addition of the
Finishing that PR is kind of a priority :)
What's your use case?
Ping! #3520 has been merged 🙂
fdf6dfa to cfd490e
Co-authored-by: Adrián Chaves <adrian@chaves.io>
This reverts commit a2ede1d.
I think the latest changes make it clear that all the queue management stuff that the default scheduler does is not technically essential to perform the scheduler functions. Now this extremely useful example is possible!

from scrapy import Spider


class FriendlyScheduler:
    def __init__(self):
        self.requests = dict()

    def has_pending_requests(self):
        return bool(self.requests)

    def open(self, spider):
        print(f"Hello {spider.__class__.__name__}, thanks for using this scheduler")
        return None

    def close(self, reason):
        print("Farewell my friend")
        return None

    def enqueue_request(self, request):
        if request.url in self.requests:
            return False
        print("By all means, I will store this request for you")
        self.requests[request.url] = request
        return True

    def next_request(self):
        if self.has_pending_requests():
            _, request = self.requests.popitem()
            print(f"Enjoy your next request: {request.url}")
            return request
        return None


class QuotesSpider(Spider):
    name = "quotes"
    start_urls = [
        "http://quotes.toscrape.com/tag/friends/",
        "http://quotes.toscrape.com/tag/life/",
        "http://quotes.toscrape.com/tag/humor/",
    ]
    custom_settings = {
        "SCHEDULER": FriendlyScheduler,
        "LOG_LEVEL": "INFO",
    }

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "author": quote.xpath("span/small/text()").get(),
                "text": quote.css("span.text::text").get(),
            }
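The five methods used in the example above can also be exercised without running a crawl. What follows is only a rough sketch under stated assumptions: `FakeRequest`, `FifoScheduler` and `drain` are hypothetical names that are not part of Scrapy, and the loop in `drain` merely imitates how an engine might call a scheduler. It serves requests in first-in, first-out order, which is one possible choice among many.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Set, Tuple


@dataclass(frozen=True)
class FakeRequest:
    """Hypothetical stand-in for a request object; only the URL matters here."""
    url: str


class FifoScheduler:
    """Illustrative scheduler: first-in, first-out, with URL de-duplication."""

    def __init__(self) -> None:
        self.queue: Deque[FakeRequest] = deque()
        self.seen: Set[str] = set()

    def open(self, spider: object) -> None:
        # Nothing to set up for an in-memory queue.
        return None

    def close(self, reason: str) -> None:
        # Drop whatever is still queued.
        self.queue.clear()
        return None

    def has_pending_requests(self) -> bool:
        return bool(self.queue)

    def enqueue_request(self, request: FakeRequest) -> bool:
        # Returning False tells the caller the request was rejected.
        if request.url in self.seen:
            return False
        self.seen.add(request.url)
        self.queue.append(request)
        return True

    def next_request(self) -> Optional[FakeRequest]:
        # Returning None tells the caller nothing is ready.
        return self.queue.popleft() if self.queue else None


def drain(scheduler: FifoScheduler, urls: List[str]) -> Tuple[List[bool], List[str]]:
    """Imitate an engine loop: enqueue everything, then pull until empty."""
    scheduler.open(spider=None)
    accepted = [scheduler.enqueue_request(FakeRequest(u)) for u in urls]
    served: List[str] = []
    while scheduler.has_pending_requests():
        request = scheduler.next_request()
        if request is not None:
            served.append(request.url)
    scheduler.close(reason="finished")
    return accepted, served
```

Calling `drain(FifoScheduler(), ["a", "b", "a"])` enqueues the first two URLs, rejects the repeated one, and serves the accepted requests in insertion order.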
state = self.dqs.close()
assert isinstance(self.dqdir, str)
This assertion is only to avoid a typing error in the next line (_write_dqs_state expects a str but it gets Optional[str]). At this point, if self.dqs is not None it's only because self.dqdir is a str.
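The narrowing described in this comment can be seen in isolation. This is a sketch with hypothetical names (`write_state`, `Store`), not the scheduler's actual code: a helper that requires a `str` is called with an attribute typed `Optional[str]`, and the assertion lets a static type checker such as mypy treat the attribute as a `str` from that point on.

```python
from typing import List, Optional


def write_state(directory: str, state: List[int]) -> str:
    """Hypothetical helper that, like _write_dqs_state, requires a str."""
    return f"{directory}/active.json <- {state}"


class Store:
    def __init__(self, directory: Optional[str] = None) -> None:
        self.directory = directory
        # The queue only exists when a directory was configured.
        self.queue: Optional[List[int]] = [1, 2] if directory else None

    def close(self) -> Optional[str]:
        if self.queue is None:
            return None
        # Without this assert, a type checker would report Optional[str]
        # where str is expected. At runtime it is not expected to fail,
        # because the queue is only set when a directory is given.
        assert isinstance(self.directory, str)
        return write_state(self.directory, self.queue)
```

The assert costs almost nothing at runtime and documents the invariant next to the call that depends on it; an explicit `if self.directory is None: raise ...` would be an alternative if the check must survive running Python with `-O`.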
❤️
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Fixes #3537