Fixes issue with calling fetch in scrapy shell. #5748
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5748 +/- ##
==========================================
- Coverage 88.91% 84.78% -4.14%
==========================================
Files 162 162
Lines 10963 10968 +5
Branches 1794 1794
==========================================
- Hits 9748 9299 -449
- Misses 932 1398 +466
+ Partials 283 271 -12
It passes all tests locally on both Linux and Windows, so I am not certain why it's failing here.
Can you please also add a test to test_command_shell.py that checks that there are no exceptions printed? I assume it's possible to write one that fails with the current Scrapy on Windows.
Sure thing.
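A test along those lines might look like the following sketch, assuming it is added to the ProcessTest-based ShellTest class in tests/test_command_shell.py; the method name, the -s override, and the assertions are illustrative rather than the test that actually landed:

```python
from twisted.internet import defer


# Sketch of a method for the existing ProcessTest/SiteTest-based ShellTest
# class in tests/test_command_shell.py (names here are assumptions).
@defer.inlineCallbacks
def test_fetch_with_asyncio_reactor(self):
    # Run fetch() inside the shell with the asyncio reactor enabled and
    # check that no exception text leaks into stderr.
    _, _, err = yield self.execute([
        '-s', 'TWISTED_REACTOR=twisted.internet.asyncioreactor.AsyncioSelectorReactor',
        self.url('/html'), '-c', 'fetch(response.url)',
    ])
    self.assertNotIn(b'RuntimeError', err)
    self.assertNotIn(b'Traceback', err)
```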
Thanks!
# create and set new event loop for this thread
event_loop = policy.new_event_loop()
asyncio.set_event_loop(event_loop)
I don’t think this accounts for the ASYNCIO_EVENT_LOOP setting.
See the code where we set the event loop initially:
scrapy/scrapy/utils/reactor.py
Lines 67 to 80 in 3a34fa8
def install_reactor(reactor_path, event_loop_path=None):
    """Installs the :mod:`~twisted.internet.reactor` with the specified
    import path. Also installs the asyncio event loop with the specified import
    path if the asyncio reactor is enabled"""
    reactor_class = load_object(reactor_path)
    if reactor_class is asyncioreactor.AsyncioSelectorReactor:
        with suppress(error.ReactorAlreadyInstalledError):
            policy = get_asyncio_event_loop_policy()
            if event_loop_path is not None:
                event_loop_class = load_object(event_loop_path)
                event_loop = event_loop_class()
                asyncio.set_event_loop(event_loop)
            else:
                event_loop = policy.get_event_loop()
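For reference, ASYNCIO_EVENT_LOOP is the setting that provides the event_loop_path loaded above; a project typically combines it with the asyncio reactor in its settings, roughly like this (a sketch; uvloop.Loop is only an illustrative loop class and not something this PR touches):

```python
# settings.py (illustrative sketch)
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
ASYNCIO_EVENT_LOOP = "uvloop.Loop"  # import path of a custom asyncio loop class
```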
I don’t see a clear way to address this, so I suggest:
- We document, in both the documentation of the setting (ASYNCIO_EVENT_LOOP) and that of the command (scrapy shell), that they are not compatible with each other.
- We create a task to make them compatible at some point in the future.
- We include a test here that demonstrates how ASYNCIO_EVENT_LOOP is not respected by scrapy shell. Hopefully the existing test for ASYNCIO_EVENT_LOOP helps with that. Funny enough, that test cannot run on Windows, as we use a Linux-specific alternative loop in those tests.
You are correct, I didn't even think about that. I made a note in the settings docs for ASYNCIO_EVENT_LOOP.
Where should the test go for showing that the setting isn't respected?
> Where should the test go for showing that the setting isn't respected?
I was hoping there would be a way to test this, but after looking into it, I am not sure how to do it myself 😓 Our ASYNCIO_EVENT_LOOP tests are based on logs, but for the same reason it is not easy to use ASYNCIO_EVENT_LOOP in the first place, we do not log a message about whether or not it is being used.
I think we can skip the testing part.
I wonder if we could go a different way here: rather than creating the loop when it turns out to be missing, create it as we create the thread that causes the issue in the first place.
scrapy/scrapy/commands/shell.py
Lines 71 to 80 in 26ebdbf
For example, what if we modify CrawlerProcess.start so that it takes care of initializing the loop if needed? From there we should have access to settings, i.e. be able to respect ASYNCIO_EVENT_LOOP.
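As a rough sketch of that idea (not part of this PR; the helper name and its exact call site in CrawlerProcess.start are assumptions), the loop initialization could look roughly like:

```python
import asyncio

from scrapy.utils.misc import load_object


def _ensure_asyncio_event_loop(settings):
    """Hypothetical helper CrawlerProcess.start could call: make sure the
    current thread has an asyncio event loop, honouring ASYNCIO_EVENT_LOOP."""
    try:
        asyncio.get_event_loop()
    except RuntimeError:
        # No loop exists in this (non-main) thread yet, so create one,
        # using the configured loop class if ASYNCIO_EVENT_LOOP is set.
        loop_path = settings.get("ASYNCIO_EVENT_LOOP")
        loop = load_object(loop_path)() if loop_path else asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
```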
I had that same thought when you first made the observation. But I thought the thread call that causes the issue is the one called from the fetch command, in
Line 110 in 26ebdbf
I opened #5760, which implements the alternative approach of explicitly setting the event loop upon starting the new thread created by
I'm going to close this PR. Thanks
Reference Issue: Fixes #5740, #5742
You can recreate the issue with the following script.
I was able to recreate this issue using scrapy shell and fetch on both Windows and Linux. However, it only occurs inside a project with the TWISTED_REACTOR setting set to AsyncioSelectorReactor. What causes the issue is that when get_asyncio_event_loop_policy().get_event_loop() is called, there is no event loop in the thread that the function is called from, which raises an exception.
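The underlying asyncio behaviour can be reproduced outside Scrapy with a minimal snippet (not part of the PR): the default policy only creates a loop implicitly in the main thread, so calling get_event_loop() from a freshly spawned thread raises RuntimeError.

```python
import asyncio
import threading


def worker():
    policy = asyncio.get_event_loop_policy()
    try:
        # No event loop has been created or set for this thread, and
        # implicit loop creation only happens in the main thread.
        policy.get_event_loop()
    except RuntimeError as exc:
        # e.g. "There is no current event loop in thread 'Thread-1'."
        print(f"RuntimeError: {exc}")


t = threading.Thread(target=worker)
t.start()
t.join()
```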