# Can I Crawl (this URL)

Hosted robots.txt permissions verifier.

## ENDPOINTS

- `/` serves this page.
- `/check` runs the robots.txt verification check.

## Description

Verifies whether the provided URL may be crawled by your User-Agent. Pass in the destination URL and the service will download, parse, and check the site's robots.txt file for permissions. If you are allowed to continue, the service issues a 3XX redirect to the destination; otherwise a 4XX code is returned.
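
For programmatic use, issue the request yourself and branch on the status code instead of following the redirect, since the 3XX response itself is the "allowed" signal. Below is a minimal client sketch in Go (the service's own runtime language); the agent name `MyCrawler/1.0` is a placeholder, and it assumes the hosted endpoint is live:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"
)

func main() {
	// Don't follow redirects automatically: the 3XX vs 4XX status
	// is the answer we care about, not the destination page.
	client := &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}

	target := "http://google.com/search"
	endpoint := "http://canicrawl.appspot.com/check?url=" + url.QueryEscape(target)

	req, err := http.NewRequest("GET", endpoint, nil)
	if err != nil {
		log.Fatal(err)
	}
	// The service checks this agent name against the site's robots.txt.
	req.Header.Set("User-Agent", "MyCrawler/1.0")

	resp, err := client.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	switch {
	case resp.StatusCode >= 300 && resp.StatusCode < 400:
		fmt.Println("allowed, crawl:", resp.Header.Get("Location"))
	case resp.StatusCode >= 400:
		fmt.Println("disallowed:", resp.Status)
	}
}
```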

## Examples

```
$ curl -v http://canicrawl.appspot.com/check?url=http://google.com/
< HTTP/1.0 302 Found
< Location: http://www.google.com/

$ curl -v http://canicrawl.appspot.com/check?url=http://google.com/search
< HTTP/1.0 403 Forbidden
< Content-Length: 23
{"status":"disallowed"}

$ curl -H'User-Agent: MyCustomAgent' -v http://canicrawl.appspot.com/check?url=http://google.com/
> User-Agent: MyCustomAgent
< HTTP/1.0 302 Found
< Location: http://www.google.com/
```

Note: google.com/robots.txt disallows requests to /search.
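
Under the hood the check mirrors what a well-behaved crawler does itself: fetch the site's robots.txt, find the record matching the requesting User-Agent, and test the path against its Disallow rules. The sketch below is a simplified illustration of that logic, not the service's actual parser; it ignores Allow rules, wildcards, grouped User-agent lines, and caching:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"log"
	"net/http"
	"strings"
)

// allowed reports whether agent may fetch path, using only
// User-agent / Disallow prefix matching on the robots.txt body.
func allowed(robots, agent, path string) bool {
	applies := false // does the current record apply to our agent?
	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if i := strings.Index(line, "#"); i >= 0 { // strip comments
			line = strings.TrimSpace(line[:i])
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		field := strings.ToLower(strings.TrimSpace(parts[0]))
		value := strings.TrimSpace(parts[1])
		switch field {
		case "user-agent":
			// Simplification: each User-agent line starts a new record.
			applies = value == "*" ||
				strings.Contains(strings.ToLower(agent), strings.ToLower(value))
		case "disallow":
			if applies && value != "" && strings.HasPrefix(path, value) {
				return false
			}
		}
	}
	return true
}

func main() {
	resp, err := http.Get("http://google.com/robots.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(allowed(string(body), "MyCustomAgent", "/search")) // expected: false
	fmt.Println(allowed(string(body), "MyCustomAgent", "/"))       // expected: true
}
```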

## License

MIT License - Copyright (c) 2011 Ilya Grigorik
