This is a collection of robots.txt templates
- Robots.txt is a file that tells search engine crawlers which pages or files they can or cannot request from your site.
- A crawler is a program that browses the web automatically. It is used by search engines to update their web index.
Lines that start with # are comments and are ignored by crawlers:
# This is a comment
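# Allow every crawler full access to the entire site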
User-agent: *
Allow: /
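# Block every crawler from the entire site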
User-agent: *
Disallow: /
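# Block every crawler from a specific folder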
User-agent: *
Disallow: /folder/
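# Block every crawler from a specific file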
User-agent: *
Disallow: /file.html
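# Block every crawler from all PDF files (the * and $ wildcards are supported by major crawlers such as Googlebot and Bingbot)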
User-agent: *
Disallow: /*.pdf$
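# Block every crawler except Googlebot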
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
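# Block only Googlebot; allow every other crawler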
User-agent: *
Allow: /
User-agent: Googlebot
Disallow: /
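# Tell crawlers where to find your sitemap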
User-agent: *
Sitemap: https://example.com/sitemap.xml
The Sitemap directive tells the crawler where to find your sitemap.
A sitemap is a file that lists the pages of your site. It is used by search engines to index your site.
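# Ask crawlers to wait 10 seconds between requests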
User-agent: *
Crawl-delay: 10
The Crawl-delay directive asks the crawler to wait at least 10 seconds between successive requests to your site. Note that not every crawler honors it; Googlebot, for example, ignores Crawl-delay.
- Googlebot - Used for Google Search
- Bingbot - Used for Bing Search
- Slurp - Yahoo's web crawler
- DuckDuckBot - Used by the DuckDuckGo search engine
- Baiduspider - Used by Baidu, a Chinese search engine
- YandexBot - Used by Yandex, a Russian search engine
- facebot - Used by Facebook
- Pinterestbot - Used by Pinterest
- TwitterBot - Used by Twitter
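Any of these user-agent names can be used in the templates above. For example, a robots.txt that blocks only Baiduspider while allowing every other crawler might look like this (a minimal sketch combining the per-bot templates above):

# Block only Baiduspider; allow every other crawler
User-agent: Baiduspider
Disallow: /

User-agent: *
Allow: /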