response.follow_all or SelectorList.follow_all shortcut #2582
Comments
An alternative is to implement it on SelectorList: yield from response.css('li.next a').follow_all(self.parse, meta={...})
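As a rough illustration of that alternative (not Scrapy's actual code), a toy SelectorList.follow_all might look like the sketch below; the Request and SelectorList classes are simplified stand-ins invented for this example:

```python
from urllib.parse import urljoin


class Request:
    """Minimal stand-in for scrapy.Request."""

    def __init__(self, url, callback=None, meta=None):
        self.url = url
        self.callback = callback
        self.meta = meta or {}


class SelectorList(list):
    """A list of extracted hrefs standing in for Scrapy's SelectorList."""

    def __init__(self, hrefs, base_url):
        super().__init__(hrefs)
        self.base_url = base_url  # needed to resolve relative links

    def follow_all(self, callback=None, **kwargs):
        # One Request per matched link, with relative URLs resolved
        # against the response URL, so `yield from ...follow_all(...)`
        # works directly inside a spider callback.
        for href in self:
            yield Request(urljoin(self.base_url, href), callback, **kwargs)


links = SelectorList(['page2.html', '/other'], base_url='http://example.com/a/')
requests = list(links.follow_all(meta={'page': 1}))
print([r.url for r in requests])
```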
For Python 3 users we may add a RequestSet container class, so that

page_responses = await response.css('.pages a::attr(href)').follow_all()
# page_responses is a list of scrapy.Response objects for all pages

or

responses = await RequestSet([Request(url1), Request(url2)])

I'm not sure we can guarantee the order of the responses, though.
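The awaitable RequestSet is only a proposal; here is a minimal, self-contained sketch of the idea using plain asyncio (the RequestSet name and behavior are assumptions, and the download is faked with asyncio.sleep). Note that asyncio.gather already guarantees that results come back in input order even when the underlying fetches finish out of order, which is relevant to the ordering concern:

```python
import asyncio


class RequestSet:
    """Hypothetical container: awaiting it fetches all URLs concurrently
    and yields their responses as a list."""

    def __init__(self, urls):
        self.urls = list(urls)

    def __await__(self):
        return self._gather().__await__()

    async def _gather(self):
        # gather preserves input order in its result list,
        # regardless of completion order.
        return await asyncio.gather(*(self._fetch(u) for u in self.urls))

    async def _fetch(self, url):
        await asyncio.sleep(0)  # stand-in for a real download
        return f'response for {url}'


async def main():
    return await RequestSet(['http://a/', 'http://b/'])


print(asyncio.run(main()))
```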
A RequestSet class is useful regardless of whether you are using Python 3. It's useful for tracking state that's shared among a subset of requests, but not all of them. E.g. you are parsing a subsection of a site and you want to track duplicates or some statistics for that subsection only. You need to store some info which becomes irrelevant as soon as the last request from that subset is dealt with. If you store that info on the spider, you need to carefully track the set of active requests using callbacks & errbacks. The implementation is error-prone and hard to get right on the first try, so it would be nice to ship it in a well-tested state with the package.
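To make that concrete, here is a hedged, pure-Python sketch of such subset-local bookkeeping (the RequestSet API below is invented for illustration; a real implementation would hook its done() call into Scrapy's callbacks and errbacks):

```python
class RequestSet:
    """Sketch (not a real Scrapy API) of per-subset shared state:
    duplicate tracking and stats that become irrelevant once the
    last request in the subset has been handled."""

    def __init__(self):
        self.seen = set()                         # subset-local dupefilter
        self.stats = {'scheduled': 0, 'done': 0}  # subset-local statistics
        self.pending = 0

    def add(self, url):
        """Schedule a URL; return False if it's a duplicate in this subset."""
        if url in self.seen:
            return False
        self.seen.add(url)
        self.pending += 1
        self.stats['scheduled'] += 1
        return True

    def done(self, url):
        """Call from the request's callback/errback; return True when the
        subset is finished and its state can be discarded."""
        self.pending -= 1
        self.stats['done'] += 1
        finished = self.pending == 0
        if finished:
            self.seen.clear()  # info is now irrelevant; free it
        return finished


rs = RequestSet()
assert rs.add('http://example.com/a')
assert not rs.add('http://example.com/a')  # deduplicated within the subset
assert rs.add('http://example.com/b')
assert not rs.done('http://example.com/a')
assert rs.done('http://example.com/b')     # last request: state released
print(rs.stats)
```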
@kmike, only a rough one. I think of RequestSet as something that:
One more thing to consider is cross-referencing RequestSets, i.e. when two requests that should belong to one RequestSet are produced by different callbacks and thus have different scopes. Maybe a simple …
@immerrr could you please copy-paste this to a new issue?
What do you think about adding a response.follow_all shortcut, which returns a list of requests? This is inspired by this note in the docs:
So instead of
users would be able to write (in Python 3)
We can also add 'css' and 'xpath' support to it, as keyword arguments; it would shorten the code to this:
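The code snippets referenced above are missing here; as an illustration of the proposal (not Scrapy's actual implementation), the self-contained toy below contrasts the explicit loop from the docs with a follow_all shortcut taking a css keyword argument. The Response and Request classes are simplified stand-ins:

```python
from urllib.parse import urljoin


class Request:
    """Minimal stand-in for scrapy.Request."""

    def __init__(self, url, callback=None):
        self.url, self.callback = url, callback


class Response:
    """Tiny stand-in for scrapy's Response, just enough for this example."""

    def __init__(self, url, hrefs):
        self.url = url
        self._hrefs = hrefs  # what the CSS selector would match

    def css(self, query):
        return list(self._hrefs)

    def follow(self, href, callback=None):
        # resolves relative links against the response URL, like Scrapy does
        return Request(urljoin(self.url, href), callback)

    def follow_all(self, css=None, callback=None):
        # proposed shortcut: one Request per matched link
        for href in self.css(css):
            yield self.follow(href, callback)


response = Response('http://example.com/page1', ['/page2', '/page3'])

# current style: an explicit loop over the selector results
requests = [response.follow(href, callback=None)
            for href in response.css('li.next a::attr(href)')]

# proposed shortcut with the css keyword argument
shortcut = list(response.follow_all(css='li.next a::attr(href)', callback=None))

assert [r.url for r in shortcut] == [r.url for r in requests]
print([r.url for r in shortcut])
```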
(this is a follow-up to #1940 and #2540)