CrawlSpider improvements #781

Closed
dangra opened this issue Jul 2, 2014 · 6 comments
Comments

@dangra
Member

@dangra dangra commented Jul 2, 2014

A ticket to kick off the discussion on CrawlSpider enhancements (if any).

ideas:

  • Simplify rule definitions by using an implicit link extractor instantiated from LxmlLinkExtractor.
  • ... what else?
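
One way to picture the "implicit link extractor" idea is the sketch below. It is only an illustration under stated assumptions: `Rule` and `RegexLinkExtractor` here are hypothetical stand-ins written for this example (they are not Scrapy's classes, and the regex extractor is a simplification in place of LxmlLinkExtractor). The point is just that a rule could accept extractor arguments directly and build the extractor itself.

```python
import re
from dataclasses import dataclass


@dataclass
class RegexLinkExtractor:
    """Hypothetical stand-in extractor (NOT LxmlLinkExtractor): reads href values from raw HTML."""
    allow: tuple = ()
    deny: tuple = ()

    def extract_links(self, html):
        links = re.findall(r'href="([^"]+)"', html)
        if self.allow:
            # Keep only links matching at least one allow pattern.
            links = [u for u in links if any(re.search(p, u) for p in self.allow)]
        # Drop links matching any deny pattern.
        return [u for u in links if not any(re.search(p, u) for p in self.deny)]


class Rule:
    """Hypothetical rule that accepts extractor keyword arguments directly."""

    def __init__(self, link_extractor=None, callback=None, follow=True, **extractor_kwargs):
        if link_extractor is not None and extractor_kwargs:
            raise TypeError("pass either link_extractor or extractor keyword arguments, not both")
        if link_extractor is None:
            # Implicit extractor: built from the keyword arguments.
            link_extractor = RegexLinkExtractor(**extractor_kwargs)
        self.link_extractor = link_extractor
        self.callback = callback
        self.follow = follow


# Explicit form and the shorter implicit form describe the same rule.
explicit = Rule(RegexLinkExtractor(allow=(r"/item/",)), callback="parse_item")
implicit = Rule(allow=(r"/item/",), callback="parse_item")

html = '<a href="/item/1">one</a> <a href="/about">about</a>'
print(implicit.link_extractor.extract_links(html))
```

Rejecting a mix of an explicit extractor and extractor keyword arguments is a design choice made for this sketch, so that a rule never silently ignores one of the two.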
@pablohoffman
Member

@pablohoffman pablohoffman commented Jul 3, 2014

  • any good idea from SEP-014 (if any)
  • support overriding parse? (#712)

@nyov
Contributor

@nyov nyov commented Aug 27, 2014

Regarding "overriding parse", my proposal is at #732 (though it is a bit more generic than CrawlSpider).
In essence, I believe in decoupling the Scraper's entry point from the user-facing parse.

+------------------+-----------------+---------------------+
| Scraper (caller) | Spider (callee) | UserSpider(Spider)  |
+------------------+-----------------+---------------------+
| call_spider() ---> init()/_parse() |                     |
|                  | |`-> important()|                     |
|                  | `--> parse()  <-- parse() w/o super() |
+------------------+-----------------+---------------------+
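
The diagram above can be read roughly as the following sketch. All names are stand-ins chosen to mirror the diagram (`call_spider`, `_parse`, `important`, `parse`); this is not Scrapy's actual code, only an assumption about how such a split might look.

```python
class Spider:
    """Stand-in base class for illustration; not Scrapy's Spider."""

    def _parse(self, response):
        # Framework-facing entry point: the scraper calls this, never parse().
        self.important(response)
        return list(self.parse(response))

    def important(self, response):
        # Bookkeeping the base class relies on; it runs no matter what parse() does.
        self.seen = getattr(self, "seen", []) + [response]

    def parse(self, response):
        # User-facing hook; the default produces nothing.
        return ()


class UserSpider(Spider):
    def parse(self, response):
        # Overridden without calling super(); important() still runs via _parse().
        yield {"url": response}


def call_spider(spider, response):
    """Stand-in for the scraper (caller) column of the diagram."""
    return spider._parse(response)


spider = UserSpider()
items = call_spider(spider, "http://example.com/")
print(items, spider.seen)
```

Because the caller only ever touches `_parse()`, the user is free to override `parse()` without having to remember a `super()` call.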

Another thing I've been thinking about is designing spiders around mixin classes instead. Maybe that doesn't belong here, but the outcome would be that Spider "ideas" are self-contained and combinable, something like MySpider(Spider, Crawl, Init, Csv).

With InitSpider this is probably already possible to some extent, but currently it's not guaranteed that the classes would work together.
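
A minimal sketch of the mixin idea, assuming each piece cooperates through `super()`. The `Base` root and the `start_steps` hook are hypothetical names invented for this example; only the `MySpider(Spider, Crawl, Init, Csv)` shape comes from the comment above.

```python
class Base:
    """Hypothetical common root so every class in the chain can call super() safely."""

    def start_steps(self):
        return []


class Spider(Base):
    def start_steps(self):
        return ["spider"] + super().start_steps()


class Crawl(Base):
    def start_steps(self):
        return ["crawl"] + super().start_steps()


class Init(Base):
    def start_steps(self):
        return ["init"] + super().start_steps()


class Csv(Base):
    def start_steps(self):
        return ["csv"] + super().start_steps()


class MySpider(Spider, Crawl, Init, Csv):
    """Combines the pieces; the method resolution order decides who runs when."""


print(MySpider().start_steps())
print([cls.__name__ for cls in MySpider.__mro__])
```

The pieces only compose because every class calls `super()` and shares one root; a class that skips the call would cut off everything after it in the resolution order, which is the "not guaranteed to work together" problem in miniature.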

@redapple
Contributor

@redapple redapple commented Sep 14, 2016

I would also add something related to #929, namely passing the response to process_request in _requests_to_follow, so that one can tweak the generated requests with some context. (If there is currently another way than overriding _requests_to_follow, I'd be happy for it to be documented :)

Oh, and also adding errback to rules could be handy at times: see http://stackoverflow.com/a/35870000/2572383
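
To make the two suggestions concrete, here is a rough sketch using plain stand-in classes. `Request`, `Rule`, and `requests_to_follow` below are simplified assumptions written for this example, not Scrapy's implementation; the helper names `tag_with_referrer` and `log_failure` are hypothetical too.

```python
class Request:
    """Stand-in request; not scrapy.Request."""

    def __init__(self, url, errback=None):
        self.url = url
        self.errback = errback
        self.meta = {}


class Rule:
    """Stand-in rule carrying the two proposed hooks."""

    def __init__(self, extract, process_request=None, errback=None):
        self.extract = extract
        # Default hook passes the request through unchanged.
        self.process_request = process_request or (lambda request, response: request)
        self.errback = errback


def requests_to_follow(response, rules):
    for rule in rules:
        for url in rule.extract(response):
            request = Request(url, errback=rule.errback)
            # The response is passed along so the hook has context.
            request = rule.process_request(request, response)
            if request is not None:  # in this sketch, a hook drops a request by returning None
                yield request


def tag_with_referrer(request, response):
    if request.url.endswith(".pdf"):
        return None
    request.meta["found_on"] = response["url"]
    return request


def log_failure(failure):
    return f"failed: {failure}"


response = {"url": "http://example.com/", "links": ["http://example.com/a", "http://example.com/b.pdf"]}
rules = [Rule(lambda r: r["links"], process_request=tag_with_referrer, errback=log_failure)]
followed = list(requests_to_follow(response, rules))
print([(r.url, r.meta) for r in followed])
```

With the response available inside the hook, per-page context can be attached to each request without overriding the whole request-generation method.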

@guillaumedsde

@guillaumedsde guillaumedsde commented Jun 13, 2019

> I would also add something related to #929, namely passing the response to process_request in _requests_to_follow, so that one can tweak the generated requests with some context. (If there is currently another way than overriding _requests_to_follow, I'd be happy for it to be documented :)
>
> Oh, and also adding errback to rules could be handy at times: see http://stackoverflow.com/a/35870000/2572383

Is this a planned feature?

@elacuesta
Member

@elacuesta elacuesta commented Sep 13, 2019

> is this a planned feature?

@guillaumedsde Please see #3682 (included in version 1.7.1) and #4000 (in progress).

@elacuesta
Member

@elacuesta elacuesta commented Jan 4, 2021

Seems to me like everything has been addressed 👌

@elacuesta elacuesta closed this Jan 4, 2021