
Recommend Common Crawl instead of Google Cache #5432

Merged 1 commit on Mar 11, 2022


farsene (Contributor) commented Mar 1, 2022

Closes #3582

The terms of use allow scraping as long as Scrapy honours the restrictions of robots.txt files and nofollow meta tags.
Common Crawl terms of use: https://commoncrawl.org/terms-of-use/
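
For illustration, the two restrictions mentioned above can be checked with the Python standard library alone. This is only a minimal sketch, not Scrapy's implementation: the helper names `allowed_by_robots_txt`, `has_nofollow`, and `RobotsMetaParser` are hypothetical, and the example URLs and user agent are placeholders. In a Scrapy project, the robots.txt part is normally handled by enabling the `ROBOTSTXT_OBEY` setting; the meta tag check here is an assumption about how one might inspect a page by hand.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def allowed_by_robots_txt(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def has_nofollow(html):
    """Return True if the page's robots meta tag asks crawlers not to follow links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return bool({"nofollow", "none"} & parser.directives)


# Placeholder inputs, not taken from any real site.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"
PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

if __name__ == "__main__":
    print(allowed_by_robots_txt(ROBOTS_TXT, "mybot", "https://example.com/public/"))
    print(allowed_by_robots_txt(ROBOTS_TXT, "mybot", "https://example.com/private/x"))
    print(has_nofollow(PAGE))
```

Parsing the robots.txt text directly, rather than fetching it, keeps the sketch runnable offline; a real crawler would download the file from the target site first.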

codecov bot commented Mar 2, 2022

Codecov Report

Merging #5432 (ccdbb79) into master (50c8bec) will increase coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #5432      +/-   ##
==========================================
+ Coverage   88.75%   88.77%   +0.01%     
==========================================
  Files         163      163              
  Lines       10666    10666              
  Branches     1818     1818              
==========================================
+ Hits         9467     9469       +2     
+ Misses        923      922       -1     
+ Partials      276      275       -1     
| Impacted Files | Coverage Δ |
|---|---|
| scrapy/core/downloader/__init__.py | 92.48% <0.00%> (+1.50%) ⬆️ |

Gallaecio (Member) commented

Thanks!

wRAR merged commit 2d6042b into scrapy:master Mar 11, 2022
Development

Successfully merging this pull request may close these issues.

Do not recommend Google Cache in the documentation