Asyncio event loop handling results in hang in scrapy shell with scrapy-playwright #5831

Closed
auxsvr opened this issue Feb 16, 2023 · 0 comments · Fixed by #5832

auxsvr commented Feb 16, 2023

Description

When scrapy-playwright is used with scrapy shell, a plain fetch() call hangs.

Steps to Reproduce

  1. install scrapy-playwright,
  2. launch scrapy shell with the AsyncioSelectorReactor,
  3. run fetch('https://www.google.com', meta={'playwright': True, 'playwright_include_page': True})
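
The issue does not show the project settings used for step 2. As a sketch, assuming the configuration documented in the scrapy-playwright README, the relevant settings.py entries would look roughly like this:

```python
# settings.py (sketch, not taken from the issue): enable the asyncio-based
# Twisted reactor and route HTTP(S) downloads through scrapy-playwright.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
```

With these in place, running scrapy shell from the project directory and issuing the fetch() call from step 3 should be enough to trigger the hang.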

Expected behavior:
The request should be handled by scrapy-playwright and result in a response.

Actual behavior:
The request blocks soon after the second thread is created, right before the browser page is created. The traceback after Ctrl-C shows that execution is blocked in blockingCallFromThread. On closing the shell, the following error message appears:

asyncio: Task was destroyed but it is pending!
task: <Task pending name='Task-163' coro=<ScrapyPlaywrightDownloadHandler._download_request() running at /home/petros/.local/lib/python3.10/site-packages/scrapy_playwright/handler.py:272> cb=[Deferred.fromFuture.<locals>.adapt() at /usr/lib/python3.10/site-packages/twisted/internet/defer.py:1063]>

Reproduces how often:
Always.

Versions

Scrapy : 2.8.0
lxml : 4.9.2.0
libxml2 : 2.9.14
cssselect : 1.2.0
parsel : 1.7.0
w3lib : 1.22.0
Twisted : 22.10.0
Python : 3.10.9 (main, Dec 08 2022, 14:49:06) [GCC]
pyOpenSSL : 23.0.0 (OpenSSL 3.0.7 1 Nov 2022)
cryptography : 39.0.1
Platform : Linux-6.1.8-1-default-x86_64-with-glibc2.36

Additional context

There is a long thread about the root cause in https://discord.com/channels/851364676688543744/1073689007927590932. The summary is that set_asyncio_event_loop seems to have the following issues:

  1. it uses get_asyncio_event_loop_policy().get_event_loop(), which hits the edge case described in a comment on python/cpython#96377 ("asyncio Policies documentation needs clarification"): set_asyncio_event_loop creates a new event loop even though another one already exists.
  2. it uses asyncio.get_event_loop_policy(), which will probably be deprecated, according to a comment on the same issue (python/cpython#96377).
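
To illustrate the first point, here is a minimal sketch of the alternative behaviour: reuse the loop that is already running and create a new one only when none exists. This is not Scrapy's code and not necessarily what the fix does; pick_event_loop is a hypothetical helper built only on standard asyncio calls.

```python
import asyncio


def pick_event_loop():
    """Hypothetical helper: return (loop, created).

    Prefers the loop that is already running in this thread and only
    creates a fresh loop when there is none.
    """
    try:
        return asyncio.get_running_loop(), False
    except RuntimeError:
        # No loop is running in this thread, so create one explicitly.
        return asyncio.new_event_loop(), True


async def inside():
    # Called while a loop is running: the helper must hand back that loop.
    loop, created = pick_event_loop()
    return loop is asyncio.get_running_loop(), created


# A loop that is running but was never registered via set_event_loop().
outer = asyncio.new_event_loop()
try:
    same_loop, created_inside = outer.run_until_complete(inside())
finally:
    outer.close()

# Called with no running loop: the helper creates a new one.
standalone, created_outside = pick_event_loop()
standalone.close()
```

Inside the running loop the helper returns the existing loop rather than a second one, which is the behaviour item 1 says set_asyncio_event_loop lacks.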

wRAR changed the title from "Asyncio event loop handling results in hang" to "Asyncio event loop handling results in hang in scrapy shell with scrapy-playwright" on Feb 16, 2023