<!DOCTYPE html>
<meta http-equiv='content-type' content='text/html;charset=utf-8'>
<meta name='generator' content='Ronn/v0.7.3'>
<title>canicrawl(1): Robots.txt Permissions Verifier</title>
<style type='text/css' media='all'>
/* style: man */
body#manpage {margin:0}
.mp {max-width:100ex;padding:0 9ex 1ex 4ex}
.mp p,.mp pre,.mp ul,.mp ol,.mp dl {margin:0 0 20px 0}
.mp h2 {margin:10px 0 0 0}
.mp > p,.mp > pre,.mp > ul,.mp > ol,.mp > dl {margin-left:8ex}
.mp h3 {margin:0 0 0 4ex}
.mp dt {margin:0;clear:left}
.mp dt.flush {float:left;width:8ex}
.mp dd {margin:0 0 0 9ex}
.mp h1,.mp h2,.mp h3,.mp h4 {clear:left}
.mp pre {margin-bottom:20px}
.mp pre+h2,.mp pre+h3 {margin-top:22px}
.mp h2+pre,.mp h3+pre {margin-top:5px}
.mp img {display:block;margin:auto}
.mp h1.man-title {display:none}
.mp,.mp code,.mp pre,.mp tt,.mp kbd,.mp samp,.mp h3,.mp h4 {font-family:monospace;font-size:14px;line-height:1.42857142857143}
.mp h2 {font-size:16px;line-height:1.25}
.mp h1 {font-size:20px;line-height:2}
.mp {text-align:justify;background:#fff}
.mp,.mp code,.mp pre,.mp pre code,.mp tt,.mp kbd,.mp samp {color:#131211}
.mp h1,.mp h2,.mp h3,.mp h4 {color:#030201}
.mp u {text-decoration:underline}
.mp code,.mp strong,.mp b {font-weight:bold;color:#131211}
.mp em,.mp var {font-style:italic;color:#232221;text-decoration:none}
.mp a,.mp a:link,.mp a:hover,.mp a code,.mp a pre,.mp a tt,.mp a kbd,.mp a samp {color:#0000ff}
.mp {font-weight:normal;color:#434241}
.mp pre {padding:0 4ex}
.mp pre code {font-weight:normal;color:#434241}
.mp h2+pre,h3+pre {padding-left:0}
ol.man-decor,ol.man-decor li {margin:3px 0 10px 0;padding:0;float:left;width:33%;list-style-type:none;text-transform:uppercase;color:#999;letter-spacing:1px}
ol.man-decor {width:100%}
ol.man-decor li.tl {text-align:left}
ol.man-decor li.tc {text-align:center;letter-spacing:4px}
ol.man-decor li.tr {text-align:right;float:right}
</style>
<style type='text/css' media='all'>
.mp {max-width:150ex}
ul {list-style: None; margin-left: 1em!important}
.man-navigation {left:151ex}
</style>
<body id='manpage'>
<!-- DOCS -->
<div class='mp'>
<h1>Can I Crawl (this URL)</h1>
<p>Hosted robots.txt permissions verifier.</p>
<ul>
<li><a href=""><code>/</code></a> This page.</li>
<li><a href=""><code>/check</code></a> Runs the robots.txt verification check.</li>
</ul>
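<p>A minimal Python sketch of building a <code>/check</code> request URL. It assumes, as the examples suggest, that the destination goes in a <code>url</code> query parameter on <code>http://canicrawl.appspot.com</code>; <code>CHECK_ENDPOINT</code> and <code>check_url</code> are hypothetical names, and the hosted service may no longer be reachable.</p>
<pre><code>from urllib.parse import urlencode

# Hypothetical endpoint, taken from the examples; the hosted service may be offline
CHECK_ENDPOINT = "http://canicrawl.appspot.com/check"


def check_url(target):
    """Return the /check URL for a destination, passed in the assumed url parameter."""
    # urlencode percent-escapes the destination, so a query string it
    # carries is not mistaken for parameters of /check itself
    return CHECK_ENDPOINT + "?" + urlencode({"url": target})


print(check_url("http://www.google.com/"))
print(check_url("http://www.google.com/search?q=robots"))
</code></pre>
<p>The examples pass the destination unescaped, which works for simple URLs; escaping matters once the destination has a query string of its own.</p>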
<h2 id="Description">Description</h2>
<p>Verifies whether the provided URL may be crawled by your User-Agent. Pass the destination URL to <code>/check</code>, and the service downloads the site's <a href="">robots.txt</a> file, parses it, and checks the rules that apply to the User-Agent header of your request. If crawling is allowed, the service responds with a <strong>3XX</strong> redirect; otherwise it returns a <strong>4XX</strong> code (the examples below show <strong>302 Found</strong> and <strong>403 Forbidden</strong>).</p>
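<p>The check described above can be sketched in Python with the standard library's <code>urllib.robotparser</code>. This is an illustration, not the service's actual code: <code>robots_url</code> and <code>check</code> are hypothetical helpers, the download step is replaced by passing the robots.txt body in directly, and the sample rules (including the <code>ExampleBot</code> agent) are made up to mirror the <em>/search</em> restriction in the examples. The 302 and 403 codes match the example responses.</p>
<pre><code>from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url(target):
    """robots.txt always lives at the root of the target's host."""
    parts = urlsplit(target)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def check(target, user_agent, robots_txt):
    """Return the status this sketch would send: 302 if allowed, 403 if not."""
    parser = RobotFileParser(robots_url(target))
    # The real service downloads robots_url(target); here the body is supplied
    parser.parse(robots_txt.splitlines())
    if parser.can_fetch(user_agent, target):
        return 302  # allowed: answered with a 3XX redirect
    return 403      # disallowed by the rules that apply to user_agent


# Hypothetical rules: ExampleBot is banned everywhere, everyone else only from /search
ROBOTS = "User-agent: ExampleBot\nDisallow: /\n\nUser-agent: *\nDisallow: /search\n"

print(robots_url("http://www.google.com/search"))
print(check("http://www.google.com/", "MyCustomAgent", ROBOTS))
print(check("http://www.google.com/search", "MyCustomAgent", ROBOTS))
print(check("http://www.google.com/", "ExampleBot", ROBOTS))
</code></pre>
<p>The same URL gets different answers for different agents, which is why the service evaluates the User-Agent header you send.</p>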
<h2 id="Examples">Examples</h2>
<h3 id="-curl-v-http-canicrawl-appspot-com-check-url-http-www-google-com-">$ curl -v http://canicrawl.appspot.com/check?url=http://www.google.com/</h3>
<pre><code>&lt; HTTP/1.0 302 Found
&lt; Location:
</code></pre>
<h3 id="-curl-v-http-canicrawl-appspot-com-check-url-http-www-google-com-search">$ curl -v http://canicrawl.appspot.com/check?url=http://www.google.com/search</h3>
<pre><code>&lt; HTTP/1.0 403 Forbidden
&lt; Content-Length: 23
</code></pre>
<h3 id="-curl-H-User-Agent-MyCustomAgent-v-http-canicrawl-appspot-com-check-url-http-www-google-com-">$ curl -H'User-Agent: MyCustomAgent' -v http://canicrawl.appspot.com/check?url=http://www.google.com/</h3>
<pre><code>&gt; User-Agent: MyCustomAgent
&lt; HTTP/1.0 302 Found
&lt; Location:
</code></pre>
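<p>Note: the robots.txt file on <code>www.google.com</code> disallows requests to <em>/search</em>, which is why the second example returns <strong>403 Forbidden</strong>.</p>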
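<p>On the client side, the verdict can be read straight from the status code, and a custom agent is just a request header, as in the third example. A minimal Python sketch; <code>with_user_agent</code> and <code>verdict</code> are hypothetical helpers, and the request is only built here, not sent.</p>
<pre><code>from urllib.request import Request


def with_user_agent(check_url, user_agent):
    """Build (but do not send) a request carrying a custom User-Agent,
    the equivalent of curl -H'User-Agent: ...'."""
    return Request(check_url, headers={"User-Agent": user_agent})


def verdict(status_line):
    """Classify a status line as printed by curl -v, e.g. 'HTTP/1.0 403 Forbidden'."""
    code = int(status_line.split()[1])
    if code // 100 == 3:
        return "allowed"
    if code // 100 == 4:
        return "disallowed"
    raise ValueError("unexpected status: " + status_line)


req = with_user_agent("http://canicrawl.appspot.com/check?url=http://www.google.com/", "MyCustomAgent")
print(req.get_header("User-agent"))  # urllib stores header names capitalized
print(verdict("HTTP/1.0 302 Found"))
print(verdict("HTTP/1.0 403 Forbidden"))
</code></pre>
<p>If you call the service with <code>urllib.request.urlopen</code> instead of curl, keep in mind that it follows 3XX redirects automatically and raises <code>HTTPError</code> for 4XX responses, so the first status line is easiest to inspect with a client that does not follow redirects.</p>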
<h2 id="License">License</h2>
<p>MIT License - Copyright (c) 2011 <a href="">Ilya Grigorik</a></p>
</div>
<!-- END DOCS -->