Can I Crawl (this URL)

Hosted robots.txt permissions verifier.

Endpoints

  • / This page.
  • /check Runs the robots.txt verification check.

Description

Verifies whether the provided URL may be crawled by your User-Agent. Pass in the destination URL and the service will download, parse, and check the site's robots.txt file for permissions. If crawling is allowed, the service issues a 3XX redirect to the URL; otherwise it returns a 4XX code.

Examples

$ curl -v 'http://canicrawl.appspot.com/check?url=http://google.com/'

< HTTP/1.0 302 Found
< Location: http://www.google.com/

$ curl -v 'http://canicrawl.appspot.com/check?url=http://google.com/search'

< HTTP/1.0 403 Forbidden
< Content-Length: 23
{"status":"disallowed"}

$ curl -v -H 'User-Agent: MyCustomAgent' 'http://canicrawl.appspot.com/check?url=http://google.com/'

> User-Agent: MyCustomAgent
< HTTP/1.0 302 Found
< Location: http://www.google.com/

Note: google.com/robots.txt disallows requests to /search.

License

MIT License - Copyright (c) 2011 Ilya Grigorik
