If one attempts to use scrapy-playwright with scrapy shell, simply doing a fetch results in a hang.
Steps to Reproduce
install scrapy-playwright,
launch scrapy shell with the AsyncioSelectorReactor,
run fetch('https://www.google.com', meta={'playwright': True, 'playwright_include_page': True})
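The setup behind these steps can be sketched roughly as follows. This is only an illustration, assuming the usual scrapy-playwright configuration: the handler path is inferred from the module named in the traceback, and SETTINGS, FETCH_META and shell_fetch_call are hypothetical names that do not come from the report.

```python
# Hypothetical settings sketch; the report does not show the exact settings used.
PLAYWRIGHT_HANDLER = "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler"

SETTINGS = {
    # Reactor named in the reproduction steps.
    "TWISTED_REACTOR": "twisted.internet.asyncioreactor.AsyncioSelectorReactor",
    # Route both schemes through the scrapy-playwright download handler.
    "DOWNLOAD_HANDLERS": {
        "http": PLAYWRIGHT_HANDLER,
        "https": PLAYWRIGHT_HANDLER,
    },
}

# Request meta used in the report.
FETCH_META = {"playwright": True, "playwright_include_page": True}


def shell_fetch_call(url: str) -> str:
    """Build the text of the fetch() call to type inside scrapy shell."""
    return f"fetch({url!r}, meta={FETCH_META!r})"


if __name__ == "__main__":
    print(shell_fetch_call("https://www.google.com"))
```

With settings like these in the project's settings.py, typing the printed call into scrapy shell should be enough to trigger the hang described here.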
Expected behavior:
The request should be handled by scrapy-playwright and result in a response.
Actual behavior:
The request blocks soon after the second thread is created, right before the browser page is created. Interrupting with Ctrl-C gives a traceback showing that execution is blocked in blockingCallFromThread. On closing the shell, the following error message appears:
asyncio: Task was destroyed but it is pending!
task: <Task pending name='Task-163' coro=<ScrapyPlaywrightDownloadHandler._download_request() running at /home/petros/.local/lib/python3.10/site-packages/scrapy_playwright/handler.py:272> cb=[Deferred.fromFuture.<locals>.adapt() at /usr/lib/python3.10/site-packages/twisted/internet/defer.py:1063]>
wRAR changed the title from "Asyncio event loop handling results in hang" to "Asyncio event loop handling results in hang in scrapy shell with scrapy-playwright" on Feb 16, 2023.
Reproduces how often:
Always.
Versions
Scrapy : 2.8.0
lxml : 4.9.2.0
libxml2 : 2.9.14
cssselect : 1.2.0
parsel : 1.7.0
w3lib : 1.22.0
Twisted : 22.10.0
Python : 3.10.9 (main, Dec 08 2022, 14:49:06) [GCC]
pyOpenSSL : 23.0.0 (OpenSSL 3.0.7 1 Nov 2022)
cryptography : 39.0.1
Platform : Linux-6.1.8-1-default-x86_64-with-glibc2.36
Additional context
There is a long thread about the root cause at https://discord.com/channels/851364676688543744/1073689007927590932. In summary, set_asyncio_event_loop seems to have the following issues:
It calls get_asyncio_event_loop_policy().get_event_loop(), which results in the edge case described in "asyncio Policies documentation needs clarification" (python/cpython#96377, comment), namely that set_asyncio_event_loop creates an event loop even though another already exists.
It relies on asyncio.get_event_loop_policy(), which will probably be deprecated, according to "asyncio Policies documentation needs clarification" (python/cpython#96377, comment).
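One way such an edge case can arise with plain asyncio, independent of Scrapy, is sketched below. It assumes a loop that is running but was never registered with the policy through set_event_loop(); whether this is exactly the situation inside scrapy shell is an assumption. The function name compare_policy_loop_with_running_loop is hypothetical, and the outcome depends on the Python version and on the state of the policy in the current process.

```python
import asyncio
import warnings


def compare_policy_loop_with_running_loop() -> str:
    """Return 'same', 'different' or 'error' for the policy-level lookup."""
    # Created but deliberately NOT registered via asyncio.set_event_loop().
    loop = asyncio.new_event_loop()
    stray_loops = []

    async def probe() -> str:
        running = asyncio.get_running_loop()
        try:
            with warnings.catch_warnings():
                # Newer Python versions deprecate parts of the policy API.
                warnings.simplefilter("ignore", DeprecationWarning)
                policy = asyncio.get_event_loop_policy()
                # The policy lookup does not consult the running loop.
                policy_loop = policy.get_event_loop()
        except RuntimeError:
            # Raised when the policy has no loop and will not create one.
            return "error"
        if policy_loop is running:
            return "same"
        # A second loop exists alongside the one that is running.
        stray_loops.append(policy_loop)
        return "different"

    try:
        return loop.run_until_complete(probe())
    finally:
        for extra in stray_loops:
            extra.close()
        loop.close()


if __name__ == "__main__":
    print(compare_policy_loop_with_running_loop())
```

By contrast, asyncio.get_running_loop() always returns the loop that is actually executing the coroutine, so it does not depend on what the policy has stored.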