
inspect_response(response) yields incorrect response in IPython shell #396

Closed
xEtherealx opened this issue Sep 24, 2013 · 12 comments
Labels: bug

@xEtherealx xEtherealx commented Sep 24, 2013

Example case (requires registration at example site, and even then would be hard to use as a use-case; modify to suit your needs): http://pastebin.com/GT8N893q

In the above example, the response.meta printout in the after_submit callback does not match the one inside the inspect_response shell on the second iteration (the first is correct). It appears that inspect_response has a stale response the second time.

@dangra dangra commented Sep 24, 2013

Bug confirmed.

SPIDER:

from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.shell import inspect_response

class I396Spider(BaseSpider):

    name = 'i396'
    start_urls = ('http://httpbin.org/stream/1', 'http://httpbin.org/stream/2')

    def parse(self, response):
        print response.body
        inspect_response(response)

OUTPUT:

$ scrapy runspider i396.py 
2013-09-24 16:23:12-0300 [scrapy] INFO: Scrapy 0.19.0 started (bot: scrapybot)
2013-09-24 16:23:12-0300 [scrapy] DEBUG: Optional features available: ssl, http11, boto, django
2013-09-24 16:23:12-0300 [scrapy] DEBUG: Overridden settings: {}
2013-09-24 16:23:12-0300 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2013-09-24 16:23:13-0300 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-09-24 16:23:13-0300 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-09-24 16:23:13-0300 [scrapy] DEBUG: Enabled item pipelines: 
2013-09-24 16:23:13-0300 [i396] INFO: Spider opened
2013-09-24 16:23:13-0300 [i396] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2013-09-24 16:23:13-0300 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6024
2013-09-24 16:23:13-0300 [scrapy] DEBUG: Web service listening on 0.0.0.0:6081
2013-09-24 16:23:14-0300 [i396] DEBUG: Crawled (200) <GET http://httpbin.org/stream/2> (referer: None)
{"origin": "186.8.225.21", "id": 0, "url": "http://httpbin.org/stream/2", "args": {}, "headers": {"User-Agent": "Scrapy/0.19.0 (+http://scrapy.org)", "Accept-Language": "en", "Accept-Encoding": "x-gzip,gzip,deflate", "Connection": "close", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Host": "httpbin.org"}}
{"origin": "186.8.225.21", "id": 1, "url": "http://httpbin.org/stream/2", "args": {}, "headers": {"User-Agent": "Scrapy/0.19.0 (+http://scrapy.org)", "Accept-Language": "en", "Accept-Encoding": "x-gzip,gzip,deflate", "Connection": "close", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Host": "httpbin.org"}}

[s] Available Scrapy objects:
[s]   item       {}
[s]   request    <GET http://httpbin.org/stream/2>
[s]   response   <200 http://httpbin.org/stream/2>
[s]   settings   <CrawlerSettings module=None>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
Python 2.7.3 (default, Sep 26 2012, 21:51:14) 
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
>>> print res
%reset            %reset_selective  response          
>>> print response.bo
response.body             response.body_as_unicode  
>>> print response.body
{"origin": "186.8.225.21", "id": 0, "url": "http://httpbin.org/stream/2", "args": {}, "headers": {"User-Agent": "Scrapy/0.19.0 (+http://scrapy.org)", "Accept-Language": "en", "Accept-Encoding": "x-gzip,gzip,deflate", "Connection": "close", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Host": "httpbin.org"}}
{"origin": "186.8.225.21", "id": 1, "url": "http://httpbin.org/stream/2", "args": {}, "headers": {"User-Agent": "Scrapy/0.19.0 (+http://scrapy.org)", "Accept-Language": "en", "Accept-Encoding": "x-gzip,gzip,deflate", "Connection": "close", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Host": "httpbin.org"}}

>>> 
Do you really want to exit ([y]/n)? 

2013-09-24 16:23:30-0300 [i396] DEBUG: Crawled (200) <GET http://httpbin.org/stream/1> (referer: None)
{"args": {}, "url": "http://httpbin.org/stream/1", "id": 0, "headers": {"User-Agent": "Scrapy/0.19.0 (+http://scrapy.org)", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Host": "httpbin.org", "Connection": "close", "Accept-Encoding": "x-gzip,gzip,deflate", "Accept-Language": "en"}, "origin": "186.8.225.21"}

[s] Available Scrapy objects:
[s]   item       {}
[s]   request    <GET http://httpbin.org/stream/1>
[s]   response   <200 http://httpbin.org/stream/1>
[s]   settings   <CrawlerSettings module=None>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
Python 2.7.3 (default, Sep 26 2012, 21:51:14) 
Type "copyright", "credits" or "license" for more information.

IPython 0.13.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
>>> print response.body
{"origin": "186.8.225.21", "id": 0, "url": "http://httpbin.org/stream/2", "args": {}, "headers": {"User-Agent": "Scrapy/0.19.0 (+http://scrapy.org)", "Accept-Language": "en", "Accept-Encoding": "x-gzip,gzip,deflate", "Connection": "close", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Host": "httpbin.org"}}
{"origin": "186.8.225.21", "id": 1, "url": "http://httpbin.org/stream/2", "args": {}, "headers": {"User-Agent": "Scrapy/0.19.0 (+http://scrapy.org)", "Accept-Language": "en", "Accept-Encoding": "x-gzip,gzip,deflate", "Connection": "close", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "Host": "httpbin.org"}}

>>> 
@dangra dangra closed this in aa6fb7d Oct 10, 2013
dangra added a commit that referenced this issue Oct 10, 2013
IPython embedding code borrowed from pallets/werkzeug#85
gcmalloc added a commit to gcmalloc/scrapy that referenced this issue Oct 10, 2013
@Tarliton Tarliton commented Nov 28, 2016

This bug is still happening. With the same spider, just change:

inspect_response(response)

to

inspect_response(response, self)

Even when I inspect the second response, inspect_response returns data from the first one.

scrapy version 1.2.1

output:

{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US))"}, "args": {}, "id": 0, "origin": "189.26.136.17"}
{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US))"}, "args": {}, "id": 1, "origin": "189.26.136.17"}

2016-11-28 18:14:00 [traitlets] DEBUG: Using default logger
2016-11-28 18:14:00 [traitlets] DEBUG: Using default logger
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7fabe3810a50>
[s]   item       {}
[s]   request    <GET http://httpbin.org/stream/1>
[s]   response   <200 http://httpbin.org/stream/1>
[s]   settings   <scrapy.settings.Settings object at 0x7fabe38107d0>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [2]: view(response)
Out[2]: True

In [3]: view(response)
Out[3]: True

In [4]: view(response)
Out[4]: True

In [5]: response.body
Out[5]: '{"url": "http://httpbin.org/stream/1", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Mozilla/5.0 (Windows; U; MSIE 9.0; WIndows NT 9.0; en-US))"}, "args": {}, "id": 0, "origin": "189.26.136.17"}\n'
@redapple redapple commented Nov 29, 2016

@Tarliton , thank you for reporting.
I can reproduce this with Python2 and Jupyter 1.0 / IPython 5.1

$ scrapy version -v
Scrapy    : 1.2.1
lxml      : 3.6.4.0
libxml2   : 2.9.4
Twisted   : 16.6.0
Python    : 2.7.12 (default, Nov 19 2016, 06:48:10) - [GCC 5.4.0 20160609]
pyOpenSSL : 16.2.0 (OpenSSL 1.0.2g  1 Mar 2016)
Platform  : Linux-4.4.0-47-generic-x86_64-with-Ubuntu-16.04-xenial

$ pip freeze (redacted)
ipykernel==4.5.1
ipython==5.1.0
ipython-genutils==0.1.0
ipywidgets==5.2.2
jupyter==1.0.0
jupyter-client==4.4.0
jupyter-console==5.0.0
jupyter-core==4.2.0

When running the test spider, after the 2nd Ctrl+D, response is still pointing to the 1st received response:

$ cat test.py
import scrapy
from scrapy.shell import inspect_response

class I396Spider(scrapy.Spider):

    name = 'i396'
    start_urls = ('http://httpbin.org/stream/1', 'http://httpbin.org/stream/2')

    def parse(self, response):
        print(response.body)
        inspect_response(response, self)

$ scrapy runspider test.py 
2016-11-29 11:13:37 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot)
(...)
2016-11-29 11:13:37 [scrapy] INFO: Spider opened
2016-11-29 11:13:37 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-11-29 11:13:37 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-11-29 11:13:37 [scrapy] DEBUG: Crawled (200) <GET http://httpbin.org/stream/2> (referer: None)
2016-11-29 11:13:37 [scrapy] DEBUG: Crawled (200) <GET http://httpbin.org/stream/1> (referer: None)
{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}
{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 1, "origin": "89.84.122.217"}

2016-11-29 11:13:38 [traitlets] DEBUG: Using default logger
2016-11-29 11:13:38 [traitlets] DEBUG: Using default logger
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f63db8d4c90>
[s]   item       {}
[s]   request    <GET http://httpbin.org/stream/2>
[s]   response   <200 http://httpbin.org/stream/2>
[s]   settings   <scrapy.settings.Settings object at 0x7f63db8d4b90>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [1]: response
Out[1]: <200 http://httpbin.org/stream/2>

In [2]: response.body
Out[2]: '{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}\n{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 1, "origin": "89.84.122.217"}\n'

In [3]:                                                                                                                                                                                                                                                                        
Do you really want to exit ([y]/n)? y

{"url": "http://httpbin.org/stream/1", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}

2016-11-29 11:13:55 [traitlets] DEBUG: Using default logger
2016-11-29 11:13:55 [traitlets] DEBUG: Using default logger
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f63db8d4c90>
[s]   item       {}
[s]   request    <GET http://httpbin.org/stream/2>
[s]   response   <200 http://httpbin.org/stream/2>
[s]   settings   <scrapy.settings.Settings object at 0x7f63db8d4b90>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [3]: response
Out[3]: <200 http://httpbin.org/stream/2>

In [4]: response.body
Out[4]: '{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}\n{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 1, "origin": "89.84.122.217"}\n'

In [5]: 

In [5]:                                                                                                                                                                                                                                                                        
Do you really want to exit ([y]/n)? 

2016-11-29 11:14:17 [scrapy] INFO: Closing spider (finished)
2016-11-29 11:14:17 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 434,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 1283,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 11, 29, 10, 14, 17, 467605),
 'log_count/DEBUG': 7,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2016, 11, 29, 10, 13, 37, 388619)}
2016-11-29 11:14:17 [scrapy] INFO: Spider closed (finished)

Note that with the standard Python shell, it works fine:

$ cat ~/.scrapy.cfg
[settings]
shell = python

$ scrapy runspider test.py 
2016-11-29 11:20:17 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot)
(...)
2016-11-29 11:20:17 [scrapy] INFO: Spider opened
2016-11-29 11:20:17 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-11-29 11:20:17 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-11-29 11:20:17 [scrapy] DEBUG: Crawled (200) <GET http://httpbin.org/stream/1> (referer: None)
2016-11-29 11:20:17 [scrapy] DEBUG: Crawled (200) <GET http://httpbin.org/stream/2> (referer: None)
{"url": "http://httpbin.org/stream/1", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}

[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f370ae24c90>
[s]   item       {}
[s]   request    <GET http://httpbin.org/stream/1>
[s]   response   <200 http://httpbin.org/stream/1>
[s]   settings   <scrapy.settings.Settings object at 0x7f370ae24b90>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
>>> response, response.body
(<200 http://httpbin.org/stream/1>, '{"url": "http://httpbin.org/stream/1", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}\n')
>>> 
{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}
{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 1, "origin": "89.84.122.217"}

[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f370ae24c90>
[s]   item       {}
[s]   request    <GET http://httpbin.org/stream/2>
[s]   response   <200 http://httpbin.org/stream/2>
[s]   settings   <scrapy.settings.Settings object at 0x7f370ae24b90>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
>>> response, response.body
(<200 http://httpbin.org/stream/2>, '{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}\n{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 1, "origin": "89.84.122.217"}\n')
>>> 
2016-11-29 11:21:44 [scrapy] INFO: Crawled 2 pages (at 2 pages/min), scraped 0 items (at 0 items/min)
2016-11-29 11:21:44 [scrapy] INFO: Closing spider (finished)
2016-11-29 11:21:44 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 434,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 1283,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 11, 29, 10, 21, 44, 331105),
 'log_count/DEBUG': 3,
 'log_count/INFO': 8,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2016, 11, 29, 10, 20, 17, 72430)}
2016-11-29 11:21:44 [scrapy] INFO: Spider closed (finished)
@redapple redapple reopened this Nov 29, 2016
@redapple redapple removed this from the Scrapy 0.20 milestone Nov 29, 2016
@redapple redapple changed the title inspect_response(response) yields incorrect response inspect_response(response) yields incorrect response in IPython shell Nov 29, 2016
@redapple redapple commented Nov 29, 2016

Same with Python 3:

$ scrapy version -v
Scrapy    : 1.2.1
lxml      : 3.6.4.0
libxml2   : 2.9.4
Twisted   : 16.6.0
Python    : 3.5.2 (default, Nov 17 2016, 17:05:23) - [GCC 5.4.0 20160609]
pyOpenSSL : 16.2.0 (OpenSSL 1.0.2g  1 Mar 2016)
Platform  : Linux-4.4.0-47-generic-x86_64-with-Ubuntu-16.04-xenial

$ scrapy runspider test.py 
2016-11-29 11:38:57 [scrapy] INFO: Scrapy 1.2.1 started (bot: scrapybot)
(...)
2016-11-29 11:38:57 [scrapy] INFO: Spider opened
2016-11-29 11:38:57 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-11-29 11:38:57 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-11-29 11:38:57 [scrapy] DEBUG: Crawled (200) <GET http://httpbin.org/stream/1> (referer: None)
2016-11-29 11:38:57 [scrapy] DEBUG: Crawled (200) <GET http://httpbin.org/stream/2> (referer: None)
b'{"url": "http://httpbin.org/stream/1", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}\n'
2016-11-29 11:38:58 [traitlets] DEBUG: Using default logger
2016-11-29 11:38:58 [traitlets] DEBUG: Using default logger
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f33fa368828>
[s]   item       {}
[s]   request    <GET http://httpbin.org/stream/1>
[s]   response   <200 http://httpbin.org/stream/1>
[s]   settings   <scrapy.settings.Settings object at 0x7f33f8f44a58>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [1]: response, response.text
Out[1]: 
(<200 http://httpbin.org/stream/1>,
 '{"url": "http://httpbin.org/stream/1", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}\n')

In [2]:                                                                                                                                                                                                                                                                        
Do you really want to exit ([y]/n)? y

b'{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}\n{"url": "http://httpbin.org/stream/2", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 1, "origin": "89.84.122.217"}\n'
2016-11-29 11:39:09 [traitlets] DEBUG: Using default logger
2016-11-29 11:39:09 [traitlets] DEBUG: Using default logger
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f33fa368828>
[s]   item       {}
[s]   request    <GET http://httpbin.org/stream/1>
[s]   response   <200 http://httpbin.org/stream/1>
[s]   settings   <scrapy.settings.Settings object at 0x7f33f8f44a58>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [2]: response, response.text
Out[2]: 
(<200 http://httpbin.org/stream/1>,
 '{"url": "http://httpbin.org/stream/1", "headers": {"Host": "httpbin.org", "Accept-Language": "en", "Accept-Encoding": "gzip,deflate", "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "User-Agent": "Scrapy/1.2.1 (+http://scrapy.org)"}, "args": {}, "id": 0, "origin": "89.84.122.217"}\n')

In [3]:                                                                                                                                                                                                                                                                        
Do you really want to exit ([y]/n)? 

2016-11-29 11:39:20 [scrapy] INFO: Closing spider (finished)
2016-11-29 11:39:20 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 434,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 1283,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 2,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2016, 11, 29, 10, 39, 20, 860343),
 'log_count/DEBUG': 7,
 'log_count/INFO': 7,
 'response_received_count': 2,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2016, 11, 29, 10, 38, 57, 538573)}
2016-11-29 11:39:20 [scrapy] INFO: Spider closed (finished)
@Tarliton Tarliton commented Nov 29, 2016

@redapple as we can see, this only happens with IPython. I debugged for a while and found this:

file: scrapy/utils/console.py

    @wraps(_embed_ipython_shell)
    def wrapper(namespace=namespace, banner=''):
        config = load_default_config()
        # Always use .instace() to ensure _instance propagation to all parents
        # this is needed for <TAB> completion works well for new imports
        shell = InteractiveShellEmbed.instance(
            banner1=banner, user_ns=namespace, config=config)
        shell()
    return wrapper

So the wrapper for ipython calls the instance method. If we check the instance method, the doc string says this:

file: site-packages/traitlets/config/configurable.py

    @classmethod
    def instance(cls, *args, **kwargs):
        """Returns a global instance of this class.

        This method create a new instance if none have previously been created
        and returns a previously created instance is one already exists.

        The arguments and keyword arguments passed to this method are passed
        on to the :meth:`__init__` method of the class upon instantiation.

        Examples
        --------

        Create a singleton class using instance, and retrieve it::

            >>> from traitlets.config.configurable import SingletonConfigurable
            >>> class Foo(SingletonConfigurable): pass
            >>> foo = Foo.instance()
            >>> foo == Foo.instance()
            True

        Create a subclass that is retrived using the base class instance::

            >>> class Bar(SingletonConfigurable): pass
            >>> class Bam(Bar): pass
            >>> bam = Bam.instance()
            >>> bam == Bar.instance()
            True
        """

It returns a global instance, so the first inspect_response creates the shell and subsequent calls reuse that same shell with stale variables. Is this the expected behavior?
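The pitfall described above can be sketched with a minimal pure-Python analogue (a hypothetical toy class, not the actual traitlets/IPython code): a classmethod that caches the first instance silently discards the constructor arguments of every later call.

```python
class SingletonShell:
    _instance = None

    @classmethod
    def instance(cls, **kwargs):
        # Only the FIRST call's kwargs take effect; later calls return
        # the cached object and silently ignore their arguments.
        if cls._instance is None:
            cls._instance = cls(**kwargs)
        return cls._instance

    def __init__(self, user_ns=None):
        self.user_ns = dict(user_ns or {})

first = SingletonShell.instance(user_ns={"response": "<200 stream/2>"})
second = SingletonShell.instance(user_ns={"response": "<200 stream/1>"})

assert second is first                                    # same cached shell
assert second.user_ns == {"response": "<200 stream/2>"}   # stale namespace
```

This mirrors why the second inspect_response still shows the first response: the second call's user_ns never reaches the cached shell.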

@redapple redapple commented Nov 29, 2016

Hm, I'm pretty new to embedded shells and all.
@eliasdorneles worked around the IPython bits you reference (#856 (comment)), so he may have an idea.

@redapple redapple commented Nov 29, 2016

I think it's working with this change:

~/src/scrapy$ git diff
diff --git a/scrapy/utils/console.py b/scrapy/utils/console.py
index 567fd51..aa12346 100644
--- a/scrapy/utils/console.py
+++ b/scrapy/utils/console.py
@@ -17,6 +17,8 @@ def _embed_ipython_shell(namespace={}, banner=''):
         # this is needed for <TAB> completion works well for new imports
         shell = InteractiveShellEmbed.instance(
             banner1=banner, user_ns=namespace, config=config)
+        shell.banner1 = banner
+        shell.push(namespace, interactive=False)
         shell()
     return wrapper

I'll continue checking.
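The idea behind the diff, sketched on the same kind of toy singleton (hypothetical names, not the real IPython classes): since instance() ignores constructor arguments after the first call, the fresh namespace has to be pushed into the cached shell explicitly on every call.

```python
class SingletonShell:
    _instance = None

    @classmethod
    def instance(cls, **kwargs):
        if cls._instance is None:
            cls._instance = cls(**kwargs)
        return cls._instance

    def __init__(self, user_ns=None):
        self.user_ns = dict(user_ns or {})

    def push(self, variables):
        # Overwrite stale entries with the caller's fresh namespace,
        # mirroring what shell.push(namespace) does in the patch above.
        self.user_ns.update(variables)

shell = SingletonShell.instance(user_ns={"response": "<200 stream/2>"})
# A later inspect_response: instance() ignores the new user_ns ...
shell = SingletonShell.instance(user_ns={"response": "<200 stream/1>"})
# ... so the fresh namespace must be pushed in explicitly.
shell.push({"response": "<200 stream/1>"})
assert shell.user_ns["response"] == "<200 stream/1>"
```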

@Tarliton Tarliton commented Nov 29, 2016

That instance part was added in #2229.

@redapple redapple commented Nov 29, 2016

@Tarliton, indeed. It works in Scrapy 1.1.3; the regression appears in Scrapy 1.2.

ahlinc added a commit to ahlinc/scrapy that referenced this issue Nov 30, 2016
The InteractiveShellEmbed class is a singleton
and we need to drop the instance with its clear_instance() method
to rebuild the instance from scratch with fresh environment
for each subsequent drop in.
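The alternative the commit describes, again sketched on a toy singleton (hypothetical names): drop the cached instance before each embed so that instance() rebuilds the shell from scratch with a fresh namespace.

```python
class SingletonShell:
    _instance = None

    @classmethod
    def instance(cls, **kwargs):
        if cls._instance is None:
            cls._instance = cls(**kwargs)
        return cls._instance

    @classmethod
    def clear_instance(cls):
        # Forget the cached instance so the next instance() call
        # builds a brand-new shell, as the fix's commit message says.
        cls._instance = None

    def __init__(self, user_ns=None):
        self.user_ns = dict(user_ns or {})

SingletonShell.instance(user_ns={"response": "<200 stream/2>"})
SingletonShell.clear_instance()  # drop before the next drop-in
fresh = SingletonShell.instance(user_ns={"response": "<200 stream/1>"})
assert fresh.user_ns["response"] == "<200 stream/1>"
```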
@ahlinc ahlinc commented Nov 30, 2016

@Tarliton @redapple I've fixed it in #2418; please verify with the fix.

@redapple redapple commented Nov 30, 2016

Thanks @ahlinc , I'll check on my end.

@redapple redapple commented Nov 30, 2016

@ahlinc, #2418 works for me, thanks!
And thanks @Tarliton for spotting this.

@redapple redapple added this to the v1.2.2 milestone Nov 30, 2016
eliasdorneles added a commit that referenced this issue Nov 30, 2016
[MRG+1] Fix #396 re-triggered issue