
Expose current crawler in the scrapy shell. #557

Merged (4 commits) on Jan 28, 2014
38 changes: 21 additions & 17 deletions docs/intro/tutorial.rst
@@ -147,15 +147,17 @@ To put our spider to work, go to the project's top level directory and run::
The ``crawl dmoz`` command runs the spider for the ``dmoz.org`` domain. You
will get an output similar to this::

-2008-08-20 03:51:13-0300 [scrapy] INFO: Started project: dmoz
-2008-08-20 03:51:13-0300 [tutorial] INFO: Enabled extensions: ...
-2008-08-20 03:51:13-0300 [tutorial] INFO: Enabled downloader middlewares: ...
-2008-08-20 03:51:13-0300 [tutorial] INFO: Enabled spider middlewares: ...
-2008-08-20 03:51:13-0300 [tutorial] INFO: Enabled item pipelines: ...
-2008-08-20 03:51:14-0300 [dmoz] INFO: Spider opened
-2008-08-20 03:51:14-0300 [dmoz] DEBUG: Crawled <http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: <None>)
-2008-08-20 03:51:14-0300 [dmoz] DEBUG: Crawled <http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: <None>)
-2008-08-20 03:51:14-0300 [dmoz] INFO: Spider closed (finished)
+2014-01-23 18:13:07-0400 [scrapy] INFO: Scrapy started (bot: tutorial)
+2014-01-23 18:13:07-0400 [scrapy] INFO: Optional features available: ...
+2014-01-23 18:13:07-0400 [scrapy] INFO: Overridden settings: {}
+2014-01-23 18:13:07-0400 [scrapy] INFO: Enabled extensions: ...
+2014-01-23 18:13:07-0400 [scrapy] INFO: Enabled downloader middlewares: ...
+2014-01-23 18:13:07-0400 [scrapy] INFO: Enabled spider middlewares: ...
+2014-01-23 18:13:07-0400 [scrapy] INFO: Enabled item pipelines: ...
+2014-01-23 18:13:07-0400 [dmoz] INFO: Spider opened
+2014-01-23 18:13:08-0400 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
+2014-01-23 18:13:09-0400 [dmoz] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
+2014-01-23 18:13:09-0400 [dmoz] INFO: Closing spider (finished)

Pay attention to the lines containing ``[dmoz]``, which correspond to our
spider. You can see a log line for each URL defined in ``start_urls``. Because
@@ -253,16 +255,18 @@ This is what the shell looks like::

[ ... Scrapy log here ... ]

+2014-01-23 17:11:42-0400 [default] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
[s] Available Scrapy objects:
-[s]   2010-08-19 21:45:59-0300 [default] INFO: Spider closed (finished)
-[s]   sel        <Selector (http://www.dmoz.org/Computers/Programming/Languages/Python/Books/) xpath=None>
-[s]   item       Item()
+[s]   crawler    <scrapy.crawler.Crawler object at 0x3636b50>
+[s]   item       {}
[s]   request    <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
[s]   response   <200 http://www.dmoz.org/Computers/Programming/Languages/Python/Books/>
-[s]   spider     <Spider 'default' at 0x1b6c2d0>
+[s]   sel        <Selector xpath=None data=u'<html>\r\n<head>\r\n<meta http-equiv="Conten'>
+[s]   settings   <CrawlerSettings module=None>
+[s]   spider     <Spider 'default' at 0x3cebf50>
[s] Useful shortcuts:
-[s]   shelp()           Print this help
-[s]   fetch(req_or_url) Fetch a new request or URL and update shell objects
+[s]   shelp()           Shell help (print this help)
+[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

In [1]:
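
The ``crawler`` object is the main addition in this changeset. As a
hypothetical continuation of the session above (``Crawler.settings`` and
``Crawler.stats`` are real attributes, but the values shown here are
illustrative), it lets you inspect the running crawler directly::

    In [1]: crawler.settings.get('BOT_NAME')
    Out[1]: 'tutorial'

    In [2]: crawler.stats.get_stats()
    Out[2]: {'log_count/DEBUG': 2, 'log_count/INFO': 7, ...}
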
@@ -278,13 +282,13 @@ on response's type.
So let's try it::

In [1]: sel.xpath('//title')
-Out[1]: [<Selector (title) xpath=//title>]
+Out[1]: [<Selector xpath='//title' data=u'<title>Open Directory - Computers: Progr'>]

In [2]: sel.xpath('//title').extract()
Out[2]: [u'<title>Open Directory - Computers: Programming: Languages: Python: Books</title>']

In [3]: sel.xpath('//title/text()')
-Out[3]: [<Selector (text) xpath=//title/text()>]
+Out[3]: [<Selector xpath='//title/text()' data=u'Open Directory - Computers: Programming:'>]

In [4]: sel.xpath('//title/text()').extract()
Out[4]: [u'Open Directory - Computers: Programming: Languages: Python: Books']
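
Note that ``extract()`` always returns a list of unicode strings; a
hypothetical next step in the same session (not part of this diff) would be
plain indexing to get a single value::

    In [5]: sel.xpath('//title/text()').extract()[0]
    Out[5]: u'Open Directory - Computers: Programming: Languages: Python: Books'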
73 changes: 43 additions & 30 deletions docs/topics/shell.rst
@@ -71,6 +71,8 @@ content).

Those objects are:

+* ``crawler`` - the current :class:`~scrapy.crawler.Crawler` object.

* ``spider`` - the Spider which is known to handle the URL, or a
:class:`~scrapy.spider.Spider` object if there is no spider found for
the current URL
@@ -110,16 +112,17 @@ Then, the shell fetches the URL (using the Scrapy downloader) and prints the
list of available objects and useful shortcuts (you'll notice that these lines
all start with the ``[s]`` prefix)::

-[s] Available objects
-[s]   sel        <Selector (http://scrapy.org) xpath=None>
-[s]   item       Item()
-[s]   request    <http://scrapy.org>
-[s]   response   <http://scrapy.org>
-[s]   settings   <Settings 'mybot.settings'>
-[s]   spider     <Spider 'default' at 0x2bed9d0>
+[s] Available Scrapy objects:
+[s]   crawler    <scrapy.crawler.Crawler object at 0x1e16b50>
+[s]   item       {}
+[s]   request    <GET http://scrapy.org>
+[s]   response   <200 http://scrapy.org>
+[s]   sel        <Selector xpath=None data=u'<html>\n <head>\n <meta charset="utf-8'>
+[s]   settings   <CrawlerSettings module=None>
+[s]   spider     <Spider 'default' at 0x20c6f50>
[s] Useful shortcuts:
-[s]   shelp()           Prints this help.
-[s]   fetch(req_or_url) Fetch a new request or URL and update objects
+[s]   shelp()           Shell help (print this help)
+[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

>>>
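
All of these objects are live references, not copies. For instance (a
hypothetical check, not part of this diff), the ``settings`` shortcut is the
same object as ``crawler.settings``, since the shell populates it from the
crawler::

    >>> settings is crawler.settings
    True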
@@ -131,24 +134,27 @@ After that, we can start playing with the objects::

>>> fetch("http://slashdot.org")
[s] Available Scrapy objects:
-[s]   sel        <Selector (http://slashdot.org) xpath=None>
-[s]   item       JobItem()
+[s]   crawler    <scrapy.crawler.Crawler object at 0x1a13b50>
+[s]   item       {}
[s]   request    <GET http://slashdot.org>
[s]   response   <200 http://slashdot.org>
-[s]   settings   <Settings 'jobsbot.settings'>
-[s]   spider     <Spider 'default' at 0x3c44a10>
+[s]   sel        <Selector xpath=None data=u'<html lang="en">\n<head>\n\n\n\n\n<script id="'>
+[s]   settings   <CrawlerSettings module=None>
+[s]   spider     <Spider 'default' at 0x20c6f50>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser

>>> sel.xpath("//h2/text()").extract()
[u'News for nerds, stuff that matters']
>>> sel.xpath('//title/text()').extract()
[u'Slashdot: News for nerds, stuff that matters']

>>> request = request.replace(method="POST")

>>> fetch(request)
-2009-04-03 00:57:39-0300 [default] ERROR: Downloading <http://slashdot.org> from <None>: 405 Method Not Allowed
+[s] Available Scrapy objects:
+[s]   crawler    <scrapy.crawler.Crawler object at 0x1e16b50>
+...

>>>

@@ -165,47 +171,54 @@ This can be achieved by using the ``scrapy.shell.inspect_response`` function.

Here's an example of how you would call it from your spider::

+from scrapy.spider import Spider
+
+
class MySpider(Spider):
-    ...
+    name = "myspider"
+    start_urls = [
+        "http://example.com",
+        "http://example.org",
+        "http://example.net",
+    ]

    def parse(self, response):
-        if response.url == 'http://www.example.com/products.php':
+        # We want to inspect one specific response.
+        if ".org" in response.url:
            from scrapy.shell import inspect_response
            inspect_response(response)

-        # ... your parsing code ..
+        # Rest of parsing code.

When you run the spider, you will get something similar to this::

-2009-08-27 19:15:25-0300 [example.com] DEBUG: Crawled <http://www.example.com/> (referer: <None>)
-2009-08-27 19:15:26-0300 [example.com] DEBUG: Crawled <http://www.example.com/products.php> (referer: <http://www.example.com/>)
-[s] Available objects
-[s]   sel        <Selector (http://www.example.com/products.php) xpath=None>
+2014-01-23 17:48:31-0400 [myspider] DEBUG: Crawled (200) <GET http://example.com> (referer: None)
+2014-01-23 17:48:31-0400 [myspider] DEBUG: Crawled (200) <GET http://example.org> (referer: None)
+[s] Available Scrapy objects:
+[s]   crawler    <scrapy.crawler.Crawler object at 0x1e16b50>
+...

>>> response.url
-'http://www.example.com/products.php'
+'http://example.org'

Then, you can check if the extraction code is working::

->>> sel.xpath('//h1')
+>>> sel.xpath('//h1[@class="fn"]')
[]

Nope, it doesn't. So you can open the response in your web browser and see if
it's the response you were expecting::

>>> view(response)
->>>
+True

Finally you hit Ctrl-D (or Ctrl-Z in Windows) to exit the shell and resume the
crawling::

>>> ^D
-2009-08-27 19:15:25-0300 [example.com] DEBUG: Crawled <http://www.example.com/product.php?id=1> (referer: <None>)
-2009-08-27 19:15:25-0300 [example.com] DEBUG: Crawled <http://www.example.com/product.php?id=2> (referer: <None>)
-# ...
+2014-01-23 17:50:03-0400 [myspider] DEBUG: Crawled (200) <GET http://example.net> (referer: None)
+...

Note that you can't use the ``fetch`` shortcut here since the Scrapy engine is
blocked by the shell. However, after you leave the shell, the spider will
continue crawling where it stopped, as shown above.
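
If you do need ``fetch``, a standalone shell started from the command line is
not subject to this limitation, because there is no crawl in progress to
block::

    scrapy shell "http://example.org"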

21 changes: 12 additions & 9 deletions scrapy/shell.py
@@ -1,30 +1,32 @@
"""
Scrapy Shell
"""Scrapy Shell

See documentation in docs/topics/shell.rst

"""
from __future__ import print_function

import signal

from twisted.internet import reactor, threads, defer
from twisted.python import threadable
from w3lib.url import any_to_uri

from scrapy.crawler import Crawler
from scrapy.exceptions import IgnoreRequest
from scrapy.http import Request, Response
from scrapy.item import BaseItem
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.utils.spider import create_spider_for_request
from scrapy.settings import Settings
from scrapy.spider import Spider
from scrapy.utils.console import start_python_console
from scrapy.utils.misc import load_object
from scrapy.utils.response import open_in_browser
from scrapy.utils.console import start_python_console
from scrapy.settings import Settings
from scrapy.http import Request, Response
from scrapy.exceptions import IgnoreRequest
from scrapy.utils.spider import create_spider_for_request


class Shell(object):

-    relevant_classes = (Spider, Request, Response, BaseItem,
+    relevant_classes = (Crawler, Spider, Request, Response, BaseItem,
                        Selector, Settings)

    def __init__(self, crawler, update_vars=None, code=None):
@@ -91,6 +93,7 @@ def fetch(self, request_or_url, spider=None):
        self.populate_vars(response, request, spider)

    def populate_vars(self, response=None, request=None, spider=None):
+        self.vars['crawler'] = self.crawler
        self.vars['item'] = self.item_class()
        self.vars['settings'] = self.crawler.settings
        self.vars['spider'] = spider
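
This new ``crawler`` entry is what ultimately shows up in the ``[s]``
listings above. A minimal sketch of the behaviour (hypothetical test code,
not part of this diff; it assumes a ``Crawler`` can be constructed from
default ``Settings`` alone)::

    from scrapy.crawler import Crawler
    from scrapy.settings import Settings
    from scrapy.shell import Shell

    crawler = Crawler(Settings())  # assumption: default settings suffice
    shell = Shell(crawler)
    shell.populate_vars()          # nothing fetched yet
    assert shell.vars['crawler'] is crawler
    assert shell.vars['settings'] is crawler.settings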