Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent edge case that creates extra event loop #5832

Merged
merged 6 commits into from
Mar 14, 2023
Merged

Conversation

auxsvr
Copy link
Contributor

@auxsvr auxsvr commented Feb 16, 2023

Fixes #5831.

asyncio.get_event_loop_policy().get_event_loop() is asyncio.get_running_loop() is not always true, as mentioned in python/cpython#96377 (comment). The AsyncioSelectorReactor runs in the second thread and uses the already initialised event loop in the main thread to run the crawler, so a single event loop will be running code from two threads.

According to python/cpython#100160, we might need to replace asyncio.get_event_loop() with asyncio.get_running_loop() if semantics change. Currently, some low level aspects of asyncio are in flux.

`asyncio.get_event_loop_policy().get_event_loop() is asyncio.get_running_loop()` is not always true, as mentioned in python/cpython#96377 (comment). The AsyncioSelectorReactor runs in the second thread and uses the already initialised event loop in the main thread to run the crawler, so a single event loop will be running code from two threads.
@Gallaecio
Copy link
Member

I expect tests to break, we had to change this recently for a reason. Let’s see…

scrapy/utils/reactor.py Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Feb 16, 2023

Codecov Report

Merging #5832 (aabd084) into master (69d6894) will decrease coverage by 0.11%.
The diff coverage is 86.55%.

❗ Current head aabd084 differs from pull request most recent head e240bee. Consider uploading reports for the commit e240bee to get more accurate results

@@            Coverage Diff             @@
##           master    #5832      +/-   ##
==========================================
- Coverage   88.94%   88.84%   -0.11%     
==========================================
  Files         162      162              
  Lines       11002    11055      +53     
  Branches     1798     1800       +2     
==========================================
+ Hits         9786     9822      +36     
- Misses        937      954      +17     
  Partials      279      279              
Impacted Files Coverage Δ
scrapy/commands/parse.py 29.26% <42.10%> (+1.07%) ⬆️
scrapy/utils/reactor.py 79.78% <70.00%> (-4.31%) ⬇️
scrapy/core/http2/agent.py 96.42% <83.33%> (+0.04%) ⬆️
scrapy/core/http2/protocol.py 83.49% <85.71%> (+0.07%) ⬆️
scrapy/commands/genspider.py 86.32% <100.00%> (-1.07%) ⬇️
scrapy/core/downloader/__init__.py 92.75% <100.00%> (+0.27%) ⬆️
scrapy/core/engine.py 84.19% <100.00%> (+0.17%) ⬆️
scrapy/core/http2/stream.py 91.90% <100.00%> (ø)
scrapy/core/scraper.py 84.61% <100.00%> (+0.08%) ⬆️
scrapy/core/spidermw.py 99.44% <100.00%> (ø)
... and 7 more

Replace `get_asyncio_event_loop_policy` with `set_asyncio_event_loop_policy` to minimise usage of `asyncio.get_event_loop_policy`. If the issues with the default loop on Windows are resolved, `set_asyncio_event_loop_policy` will be redundant and so will all the policy functions of asyncio.
@auxsvr auxsvr requested a review from wRAR February 20, 2023 09:36
Copy link
Member

@wRAR wRAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving, though I wonder if this can break some other rare use case.

Copy link
Member

@Gallaecio Gallaecio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance you can add a test that only passes after your changes? To make sure we do not break what you are fixing here later on.

scrapy/utils/reactor.py Show resolved Hide resolved
@Gallaecio Gallaecio merged commit d60b4ed into scrapy:master Mar 14, 2023
@Gallaecio
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Asyncio event loop handling results in hang in scrapy shell with scrapy-playwright
3 participants