
Scrapy Shell Always Raises RuntimeError but Works Fine #5742

Closed
CodingMoeButa opened this issue Dec 1, 2022 · 5 comments

Comments

@CodingMoeButa

Description

Scrapy shell always raises a RuntimeError, whatever URL I request, even though the result is fine.

Steps to Reproduce

Run scrapy shell 'https://any.url'. The output is:

2022-12-01 21:30:51 [scrapy.utils.log] INFO: Versions: lxml 4.9.1.0, libxml2 2.9.12, cssselect 1.1.0, parsel 1.7.0, w3lib 2.0.1, Twisted 22.10.0, Python 3.10.5 (tags/v3.10.5:f377153, Jun  6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)], pyOpenSSL 22.1.0 (OpenSSL 3.0.7 1 Nov 2022), cryptography 38.0.3, Platform Windows-10-10.0.19045-SP0
2022-12-01 21:30:51 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
 'AUTOTHROTTLE_START_DELAY': 1,
 'AUTOTHROTTLE_TARGET_CONCURRENCY': 8.0,
 'BOT_NAME': 'lightnovel',
 'CONCURRENT_REQUESTS': 32,
 'CONCURRENT_REQUESTS_PER_IP': 16,
 'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
 'LOGSTATS_INTERVAL': 0,
 'LOG_LEVEL': 'INFO',
 'NEWSPIDER_MODULE': 'lightnovel.spiders',
 'REQUEST_FINGERPRINTER_IMPLEMENTATION': '2.7',
 'SPIDER_MODULES': ['lightnovel.spiders'],
 'TWISTED_REACTOR': 'twisted.internet.asyncioreactor.AsyncioSelectorReactor',
 'USER_AGENT': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
               '(KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36 '
               'Edg/107.0.1418.52 Scrapy/2.7.1 (+https://scrapy.org) '
               'LightnovelSpider/3.0'}
2022-12-01 21:30:51 [scrapy.extensions.telnet] INFO: Telnet Password: 797751ad72cc1c35
2022-12-01 21:30:51 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.throttle.AutoThrottle']
2022-12-01 21:30:51 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'lightnovel.middlewares.LightnovelDownloaderMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2022-12-01 21:30:51 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-12-01 21:30:51 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-12-01 21:30:51 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2022-12-01 21:30:51 [scrapy.core.engine] INFO: Spider opened
2022-12-01 21:30:52 [default] INFO: Spider opened: default
2022-12-01 21:30:52 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.baidu.com> (referer: None)
Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\twisted\internet\defer.py", line 892, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "C:\Python310\lib\site-packages\scrapy\utils\defer.py", line 285, in f
    return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
  File "C:\Python310\lib\site-packages\scrapy\utils\defer.py", line 272, in deferred_from_coro
    event_loop = get_asyncio_event_loop_policy().get_event_loop()
  File "C:\Python310\lib\asyncio\events.py", line 656, in get_event_loop
    raise RuntimeError('There is no current event loop in thread %r.'
RuntimeError: There is no current event loop in thread 'Thread-1 (start)'.
2022-12-01 21:30:52 [py.warnings] WARNING: C:\Python310\lib\site-packages\twisted\internet\defer.py:892: RuntimeWarning: coroutine 'SpiderMiddlewareManager.scrape_response.<locals>.process_callback_output' was never awaited
  current.result = callback(  # type: ignore[misc]

[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x000002EFC594F6D0>
[s]   item       {}
[s]   request    <GET https://www.baidu.com>
[s]   response   <200 https://www.baidu.com>
[s]   settings   <scrapy.settings.Settings object at 0x000002EFC594F670>
[s]   spider     <DefaultSpider 'default' at 0x2efc5e09450>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
>>>
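The traceback shows that in Scrapy 2.7.1 deferred_from_coro (scrapy/utils/defer.py) calls get_asyncio_event_loop_policy().get_event_loop(), and that the call fails in a thread named 'Thread-1 (start)'. In Python 3.10, asyncio's default policy only creates an event loop implicitly in the main thread, so the same call in any other thread with no registered loop raises. The sketch below reproduces the error without Scrapy, assuming that get_asyncio_event_loop_policy() behaves like asyncio.get_event_loop_policy() for this purpose; the thread name is copied from the traceback purely for illustration.

```python
import asyncio
import threading


def try_get_loop(results: dict) -> None:
    """Make the same policy call as deferred_from_coro, but from a worker thread."""
    try:
        asyncio.get_event_loop_policy().get_event_loop()
        results["error"] = None
    except RuntimeError as exc:
        # The default policy has no loop registered for this (non-main) thread.
        results["error"] = str(exc)


results: dict = {}
# Illustrative name only, mirroring the thread name shown in the traceback.
worker = threading.Thread(target=try_get_loop, args=(results,), name="Thread-1 (start)")
worker.start()
worker.join()
print(results["error"])
```

Running it should print the same "There is no current event loop in thread ..." message as the Scrapy log.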

Expected behavior: the ERROR entry "Spider error processing <GET https://www.baidu.com>" with its RuntimeError traceback ("There is no current event loop in thread 'Thread-1 (start)'.") and the following RuntimeWarning that coroutine 'SpiderMiddlewareManager.scrape_response.<locals>.process_callback_output' was never awaited should not be shown.

Reproduces how often: Always.

Versions

Scrapy : 2.7.1
lxml : 4.9.1.0
libxml2 : 2.9.12
cssselect : 1.1.0
parsel : 1.7.0
w3lib : 2.0.1
Twisted : 22.10.0
Python : 3.10.5 (tags/v3.10.5:f377153, Jun 6 2022, 16:14:13) [MSC v.1929 64 bit (AMD64)]
pyOpenSSL : 22.1.0 (OpenSSL 3.0.7 1 Nov 2022)
cryptography : 38.0.3
Platform : Windows-10-10.0.19045-SP0

@wRAR
Member

wRAR commented Dec 1, 2022

This is certainly the same problem as in #5740. However, what do you mean by working fine? Are you able to access the response?

Also, is the result different when you use fetch(url) instead of passing it as a command line argument?

@CodingMoeButa
Author

Yes, I mean that I am able to access the response.
The results are the same whether I use fetch(url) or pass the URL as a command-line argument.

@wRAR
Member

wRAR commented Dec 1, 2022

Interesting, thanks!

Are you able to run spiders with scrapy crawl in this project?

@CodingMoeButa
Author

Both scrapy crawl and scrapy runspider work fine, without any exceptions.

@wRAR
Copy link
Member

wRAR commented Dec 1, 2022

Great, then it looks like the problem is specific to scrapy shell. I'm going to close this issue and add the information you provided to the previous one (#5740). Thanks for your assistance!
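One plausible explanation for why only scrapy shell is affected (an assumption, not confirmed in this thread): the shell appears to start the crawler process, and with it the reactor's asyncio loop, in a separate thread. Python 3.10 names such threads "Thread-N (target)", which would explain 'Thread-1 (start)'. By contrast, scrapy crawl and scrapy runspider run the reactor in the main thread. The Scrapy-free sketch below runs an event loop in a worker thread. Inside a loop callback, asyncio.get_running_loop() finds the loop, while the policy's get_event_loop() still raises, because the loop was never registered for that thread with set_event_loop().

```python
import asyncio
import threading


def probe(results: dict) -> None:
    """Loop callback standing in for a Twisted callback under the asyncio reactor."""
    try:
        # The call made in deferred_from_coro according to the traceback.
        asyncio.get_event_loop_policy().get_event_loop()
        results["policy"] = "ok"
    except RuntimeError as exc:
        results["policy"] = f"RuntimeError: {exc}"
    # get_running_loop() returns the loop executing this callback, independent of the policy.
    running = asyncio.get_running_loop()
    results["running_is_loop"] = running is results["loop"]
    running.stop()


def run_loop_in_thread(loop: asyncio.AbstractEventLoop, results: dict) -> None:
    loop.call_soon(probe, results)
    loop.run_forever()  # returns once probe() calls stop()


results: dict = {}
# Created in the main thread and never registered for the worker thread via set_event_loop().
loop = asyncio.new_event_loop()
results["loop"] = loop
worker = threading.Thread(target=run_loop_in_thread, args=(loop, results))
worker.start()
worker.join()
loop.close()
print(results["policy"])
print(results["running_is_loop"])
```

This only illustrates the asyncio behavior involved. It is not Scrapy's fix for #5740, and whether the shell's thread matches this model exactly is an assumption.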
