Merge pull request #5432 from Farsene1/master

Recommend Common Crawl instead of Google Cache

wRAR committed Mar 11, 2022
2 parents e264cc3 + ccdbb79, commit 2d6042b
1 changed file, 2 additions and 2 deletions: docs/topics/practices.rst
@@ -262,7 +262,7 @@ Here are some tips to keep in mind when dealing with these kinds of sites:
 * disable cookies (see :setting:`COOKIES_ENABLED`) as some sites may use
   cookies to spot bot behaviour
 * use download delays (2 or higher). See :setting:`DOWNLOAD_DELAY` setting.
-* if possible, use `Google cache`_ to fetch pages, instead of hitting the sites
+* if possible, use `Common Crawl`_ to fetch pages, instead of hitting the sites
   directly
 * use a pool of rotating IPs. For example, the free `Tor project`_ or paid
   services like `ProxyMesh`_. An open source alternative is `scrapoxy`_, a
@@ -277,7 +277,7 @@ If you are still unable to prevent your bot getting banned, consider contacting
 .. _Tor project: https://www.torproject.org/
 .. _commercial support: https://scrapy.org/support/
 .. _ProxyMesh: https://proxymesh.com/
-.. _Google cache: http://www.googleguide.com/cached_pages.html
+.. _Common Crawl: https://commoncrawl.org/
 .. _testspiders: https://github.com/scrapinghub/testspiders
 .. _scrapoxy: https://scrapoxy.io/
 .. _Zyte Smart Proxy Manager: https://www.zyte.com/smart-proxy-manager/
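The first two tips in the hunk above refer to real Scrapy settings, ``COOKIES_ENABLED`` and ``DOWNLOAD_DELAY``. A minimal ``settings.py`` fragment applying them might look like this (the specific values are only the section's suggestions, not project defaults):

```python
# settings.py -- sketch of the anti-ban tips from docs/topics/practices.rst

# Disable cookies, since some sites use them to spot bot behaviour.
COOKIES_ENABLED = False

# Wait at least 2 seconds between requests to the same site
# (the docs recommend a delay of 2 or higher).
DOWNLOAD_DELAY = 2.0
```

With ``RANDOMIZE_DOWNLOAD_DELAY`` left at its default, Scrapy additionally jitters the delay, which makes the request timing look less mechanical.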
