Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Scraper API to call internal _parse method #4254

Merged
merged 10 commits into from Jul 21, 2020

Conversation

elacuesta
Copy link
Member

@elacuesta elacuesta commented Dec 23, 2019

This is #732, cherry-picked on top of the current master branch and updated to add keyword argument support on callbacks.

Related to #781.

nyov and others added 2 commits Dec 23, 2019
A Spider class using internal pre-processing can have first dibs
at this and then call a public `parse` method for subclass hooking.
@codecov
Copy link

codecov bot commented Dec 23, 2019

Codecov Report

Merging #4254 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #4254   +/-   ##
=======================================
  Coverage   84.80%   84.80%           
=======================================
  Files         163      163           
  Lines        9990     9992    +2     
  Branches     1488     1488           
=======================================
+ Hits         8472     8474    +2     
  Misses       1255     1255           
  Partials      263      263           
Impacted Files Coverage Δ
scrapy/core/scraper.py 87.97% <100.00%> (ø)
scrapy/spiders/__init__.py 100.00% <100.00%> (ø)
scrapy/spiders/crawl.py 94.56% <100.00%> (+2.17%) ⬆️
scrapy/spiders/feed.py 66.15% <100.00%> (ø)
scrapy/utils/misc.py 95.45% <0.00%> (-1.82%) ⬇️

@elacuesta
Copy link
Member Author

elacuesta commented Dec 26, 2019

Removed the note about the CrawlSpider.parse method, added a note about explicitly setting the callback for new requests in CrawlSpider/XMLFeedSpider. This isn't a new requirement, just something that is not currently mentioned in the docs: the current implementation send requests without an explicit callback to the spider's parse method, which is supposed not to be overridden; the user doesn't necessarily know exactly what it does.

@elacuesta elacuesta marked this pull request as ready for review Dec 26, 2019
@elacuesta
Copy link
Member Author

elacuesta commented Jan 30, 2020

@scrapy/maintainers any thoughts about this?

@elacuesta elacuesta closed this Feb 6, 2020
@elacuesta elacuesta reopened this Feb 6, 2020
@elacuesta elacuesta added this to the v2.3 milestone Jun 25, 2020
wRAR
wRAR approved these changes Jul 21, 2020
@wRAR wRAR merged commit f3372a3 into scrapy:master Jul 21, 2020
2 checks passed
@elacuesta elacuesta deleted the spider.parse branch Jul 21, 2020
@elacuesta
Copy link
Member Author

elacuesta commented Jul 21, 2020

Thanks @Gallaecio and @wRAR for the reviews, and many thanks @nyov for the initial implementation 😄

nflx added a commit to nflx/typesense-docsearch-scraper that referenced this pull request Sep 19, 2022
nflx added a commit to nflx/typesense-docsearch-scraper that referenced this pull request Sep 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants