# Can I Crawl (this URL)

Hosted robots.txt permissions verifier.

## ENDPOINTS

- [`/`](http://canicrawl.appspot.com/) This page.
- [`/check`](http://canicrawl.appspot.com/check) Runs the robots.txt verification check.

## Description

Verifies whether your User-Agent is allowed to crawl the provided URL. Pass in the destination URL and the service will download, parse, and check the site's [robots.txt](http://www.robotstxt.org/) file for permissions. If crawling is allowed, the service issues a **3XX** redirect; otherwise, it returns a **4XX** code.
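
The request/response contract described above can be sketched from the client side. The following is a minimal Python sketch, not part of the service itself. The helper names `build_check_url`, `classify`, and `can_crawl` are hypothetical, percent-encoding the `url` parameter is an assumption (the curl examples in this README pass it unencoded), and mapping any other status to `unknown` is a choice made only for this sketch.

```python
import http.client
from urllib.parse import urlencode, urlsplit

# Endpoint taken from this README; the service may not be reachable today.
CHECK_ENDPOINT = "http://canicrawl.appspot.com/check"


def build_check_url(target_url, endpoint=CHECK_ENDPOINT):
    """Build the /check request URL (percent-encoding is an assumption)."""
    return endpoint + "?" + urlencode({"url": target_url})


def classify(status_code):
    """Map the HTTP status to a verdict: 3XX is allowed, 4XX is disallowed."""
    if 300 <= status_code < 400:
        return "allowed"
    if 400 <= status_code < 500:
        return "disallowed"
    return "unknown"  # anything else is outside the documented contract


def can_crawl(target_url, user_agent="MyCustomAgent"):
    """Query the service without following the redirect.

    Returns a (verdict, location) tuple; location is None unless the
    response carries a Location header.
    """
    parts = urlsplit(build_check_url(target_url))
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request(
            "GET",
            parts.path + "?" + parts.query,
            headers={"User-Agent": user_agent},
        )
        response = conn.getresponse()
        return classify(response.status), response.getheader("Location")
    finally:
        conn.close()
```

`http.client` is used here because it does not follow redirects automatically, so the 3XX status stays visible to the caller. Calling `can_crawl("http://google.com/search")` performs a live request, so its result depends on the service being available.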

## Examples

### $ curl -v http://canicrawl.appspot.com/check?url=http://google.com/
    < HTTP/1.0 302 Found
    < Location: http://www.google.com/

### $ curl -v http://canicrawl.appspot.com/check?url=http://google.com/search
    < HTTP/1.0 403 Forbidden
    < Content-Length: 23
    {"status":"disallowed"}

### $ curl -H'User-Agent: MyCustomAgent' -v http://canicrawl.appspot.com/check?url=http://google.com/
    > User-Agent: MyCustomAgent
    < HTTP/1.0 302 Found
    < Location: http://www.google.com/

Note: [google.com/robots.txt](http://google.com/robots.txt) disallows requests to _/search_.
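
The parse-and-check step behind these results can be approximated locally with Python's standard `urllib.robotparser` module. This is a sketch of the idea, not the service's actual implementation. The inline rules are a simplified stand-in for the real google.com robots.txt, and `is_allowed` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# Simplified stand-in rules (assumption): the real file is much longer.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def is_allowed(robots_txt, user_agent, url):
    """Parse robots.txt text and check whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://google.com/", "http://google.com/search"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "MyCustomAgent", url))
```

With these sample rules, the root URL is reported as allowed and `/search` as disallowed, which mirrors the 302 and 403 responses shown in the examples.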

### License

MIT License - Copyright (c) 2011 [Ilya Grigorik](http://www.igvita.com/)