Can I Crawl (this URL)

Hosted robots.txt permissions verifier.

Endpoints

  • / - This page.
  • /check - Runs the robots.txt verification check.

Description

Verifies whether the provided URL may be crawled by your User-Agent. Pass in the destination URL and the service will download, parse, and check the site's robots.txt file for permissions. If you are allowed to continue, it issues a 3XX redirect to the destination; otherwise it returns a 4XX code.

Examples

$ curl -v 'http://canicrawl.appspot.com/check?url=http://google.com/'

< HTTP/1.0 302 Found
< Location: http://www.google.com/

$ curl -v 'http://canicrawl.appspot.com/check?url=http://google.com/search'

< HTTP/1.0 403 Forbidden
< Content-Length: 23
{"status":"disallowed"}

$ curl -H 'User-Agent: MyCustomAgent' -v 'http://canicrawl.appspot.com/check?url=http://google.com/'

> User-Agent: MyCustomAgent
< HTTP/1.0 302 Found
< Location: http://www.google.com/

Note: google.com/robots.txt disallows requests to /search.
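The same check can be driven programmatically before a crawler fetches a page. Below is a minimal Python client sketch, assuming the hosted endpoint at http://canicrawl.appspot.com/check is reachable; the can_crawl helper and the NoRedirect handler are illustrative names, and MyCustomAgent is a placeholder User-Agent. Redirects are disabled so the 3XX "allowed" response can be inspected instead of followed.

import urllib.error
import urllib.parse
import urllib.request

CHECK_ENDPOINT = "http://canicrawl.appspot.com/check"

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Don't follow the 3XX redirect; we only want to inspect the status code.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def can_crawl(url, user_agent="MyCustomAgent"):
    # Returns True if the service answers with a 3XX (crawl allowed),
    # False if it answers with a 4XX (crawl disallowed).
    query = urllib.parse.urlencode({"url": url})
    request = urllib.request.Request(
        CHECK_ENDPOINT + "?" + query,
        headers={"User-Agent": user_agent},
    )
    opener = urllib.request.build_opener(NoRedirect)
    try:
        code = opener.open(request).getcode()
    except urllib.error.HTTPError as err:
        code = err.code  # non-2XX responses surface as HTTPError
    return 300 <= code < 400

print(can_crawl("http://google.com/"))        # expected: True
print(can_crawl("http://google.com/search"))  # expected: False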

License

MIT License - Copyright (c) 2011 Ilya Grigorik
