Unable to use on Windows? NotImplementedError of _make_subprocess_transport for SelectorEventLoop #7

Closed
yadalik opened this issue Mar 25, 2021 · 7 comments

@yadalik

yadalik commented Mar 25, 2021

Mostly to make sure I'm out of options other than migrating to Linux (or using WSL2).
Windows 10, Python 3.8.5, Scrapy 2.4.1, playwright-1.9.2, scrapy-playwright 0.0.3
TL;DR: Twisted's asyncio reactor (AsyncioSelectorReactor) is built on top of SelectorEventLoop and by design needs add_reader (or maybe something else) from it, so it won't work with ProactorEventLoop. On the other hand, subprocesses on Windows are supported only by ProactorEventLoop and are not implemented in SelectorEventLoop.
The reasons are mostly described here: https://docs.python.org/3/library/asyncio-platforms.html#asyncio-windows-subprocess
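To make the conflict in the TL;DR concrete, here is a minimal diagnostic sketch that uses only the standard library. It is not part of Scrapy, Twisted or Playwright, and the helper names (probe_loop, _spawn_child) are hypothetical. It checks the two capabilities at stake: add_reader(), which the Twisted 20.3.0 traceback below shows AsyncioSelectorReactor calling when it installs its waker, and asyncio.create_subprocess_exec(), which Playwright's transport uses to start its driver.

```python
import asyncio
import socket
import sys


async def _spawn_child() -> None:
    # Start a trivial child process: the same kind of call
    # (asyncio.create_subprocess_exec) that Playwright's transport makes.
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", "pass")
    await proc.wait()


def probe_loop(loop_factory) -> dict:
    """Report whether a loop class supports add_reader() and subprocesses."""
    loop = loop_factory()
    result = {"add_reader": False, "subprocess": False}
    a, b = socket.socketpair()
    try:
        # Twisted's AsyncioSelectorReactor registers its waker via add_reader().
        try:
            loop.add_reader(a.fileno(), lambda: None)
            loop.remove_reader(a.fileno())
            result["add_reader"] = True
        except NotImplementedError:
            pass
        # Playwright needs the running loop to support subprocesses.
        try:
            loop.run_until_complete(_spawn_child())
            result["subprocess"] = True
        except NotImplementedError:
            pass
    finally:
        a.close()
        b.close()
        loop.close()
    return result


if __name__ == "__main__":
    candidates = {"SelectorEventLoop": asyncio.SelectorEventLoop}
    if sys.platform == "win32":
        candidates["ProactorEventLoop"] = asyncio.ProactorEventLoop
    for name, factory in candidates.items():
        print(name, probe_loop(factory))
```

Going by the Python documentation linked above, on Windows one would expect SelectorEventLoop to support add_reader but not subprocesses, and ProactorEventLoop the reverse, so no single standard loop satisfies both Twisted's reactor and Playwright. On Linux (including WSL2), SelectorEventLoop supports both.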

With process = CrawlerProcess(get_project_settings()) in starter.py:

from scrapy.utils.reactor import install_reactor

install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')

In settings.py:

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

For Twisted == 20.3.0:

starter.py", line 8, in <module>
    install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')
  File "C:\Users\i\miniconda3\envs\yu\lib\site-packages\scrapy\utils\reactor.py", line 66, in install_reactor
    asyncioreactor.install(eventloop=event_loop)
  File "C:\Users\i\miniconda3\envs\yu\lib\site-packages\twisted\internet\asyncioreactor.py", line 320, in install
    reactor = AsyncioSelectorReactor(eventloop)
  File "C:\Users\i\miniconda3\envs\yu\lib\site-packages\twisted\internet\asyncioreactor.py", line 69, in __init__
    super().__init__()
  File "C:\Users\i\miniconda3\envs\yu\lib\site-packages\twisted\internet\base.py", line 571, in __init__
    self.installWaker()
  File "C:\Users\i\miniconda3\envs\yu\lib\site-packages\twisted\internet\posixbase.py", line 286, in installWaker
    self.addReader(self.waker)
  File "C:\Users\i\miniconda3\envs\yu\lib\site-packages\twisted\internet\asyncioreactor.py", line 151, in addReader
    self._asyncioEventloop.add_reader(fd, callWithLogger, reader,
  File "C:\Users\i\miniconda3\envs\yu\lib\asyncio\events.py", line 501, in add_reader
    raise NotImplementedError
NotImplementedError

For Twisted == 21.2.0:

starter.py", line 8, in <module>
    install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor')
  File "C:\Users\i\miniconda3\envs\yu\lib\site-packages\scrapy\utils\reactor.py", line 66, in install_reactor
    asyncioreactor.install(eventloop=event_loop)
  File "C:\Users\i\miniconda3\envs\yu\lib\site-packages\twisted\internet\asyncioreactor.py", line 307, in install
    reactor = AsyncioSelectorReactor(eventloop)
  File "C:\Users\i\miniconda3\envs\yu\lib\site-packages\twisted\internet\asyncioreactor.py", line 60, in __init__
    raise TypeError(
TypeError: SelectorEventLoop required, instead got: <ProactorEventLoop running=False closed=False debug=False>

(The following is included mainly so others can find these errors by searching; of course, these attempts do not help.)

Also, if in starter.py we call asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy()) before installing the reactor, or simply pass SelectorEventLoop to install_reactor:
install_reactor('twisted.internet.asyncioreactor.AsyncioSelectorReactor', event_loop_path='asyncio.SelectorEventLoop')
we get NotImplementedError.

Even without the starter script, starting the spider from the terminal with scrapy crawl spider_name and
ASYNCIO_EVENT_LOOP = "asyncio.SelectorEventLoop" in settings.py gives:

future: <Task finished name='Task-4' coro=<Connection.run() done, defined at c:\users\i\miniconda3\envs\yu\lib\site-packages\playwright\_impl\_connection.py:163> exception=NotImplementedError()>
Traceback (most recent call last):
  File "c:\users\i\miniconda3\envs\yu\lib\site-packages\playwright\_impl\_connection.py", line 166, in run
    await self._transport.run()
  File "c:\users\i\miniconda3\envs\yu\lib\site-packages\playwright\_impl\_transport.py", line 60, in run
    proc = await asyncio.create_subprocess_exec(
  File "c:\users\i\miniconda3\envs\yu\lib\asyncio\subprocess.py", line 236, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
  File "c:\users\i\miniconda3\envs\yu\lib\asyncio\base_events.py", line 1630, in subprocess_exec
    transport = await self._make_subprocess_transport(
  File "c:\users\i\miniconda3\envs\yu\lib\asyncio\base_events.py", line 491, in _make_subprocess_transport
    raise NotImplementedError
NotImplementedError
@elacuesta
Member

elacuesta commented Mar 28, 2021

Indeed, it seems to me like it's currently not possible to run scrapy-playwright on Windows.

In the link to the Python docs that you posted, it says that

On Windows, the default event loop ProactorEventLoop supports subprocesses, whereas SelectorEventLoop does not

Playwright does make use of subprocesses, but the only available Twisted reactor based on asyncio event loops is AsyncioSelectorReactor. Here is where Twisted checks the event loop, and raises one of the exceptions you are getting.
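The Twisted 21.2.0 traceback in the issue ends in a type check on the event loop. Below is a rough, hypothetical approximation of such a guard (not Twisted's actual source; require_selector_loop is an illustrative name). It shows why the reactor now rejects ProactorEventLoop up front instead of failing later in add_reader, as it does with Twisted 20.3.0.

```python
import asyncio


def require_selector_loop(loop: asyncio.AbstractEventLoop) -> asyncio.AbstractEventLoop:
    """Hypothetical approximation of the check behind the 21.2.0 TypeError.

    A selector-based reactor depends on add_reader()/add_writer(), which
    ProactorEventLoop does not implement, so other loop types are refused.
    """
    if not isinstance(loop, asyncio.SelectorEventLoop):
        raise TypeError(f"SelectorEventLoop required, instead got: {loop}")
    return loop


if __name__ == "__main__":
    # On Windows with Python 3.8+, new_event_loop() returns a
    # ProactorEventLoop by default, so this takes the "Rejected" branch.
    loop = asyncio.new_event_loop()
    try:
        require_selector_loop(loop)
        print("OK:", type(loop).__name__)
    except TypeError as exc:
        print("Rejected:", exc)
    finally:
        loop.close()
```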

Regarding the Windows Subsystem for Linux, I have zero experience with that, and I don't have access to a Windows machine to try it. If you do go that way, I'd very much appreciate you coming back and reporting your findings.

@yadalik
Author

yadalik commented Apr 11, 2021

Hi, WSL2 works quite well, at least WSL2 with Ubuntu 20.04 LTS. It does not need any special settings (such as event_loop_path or ASYNCIO_EVENT_LOOP); the only change is rewriting file paths from the Windows-specific drive_letter:\ prefix to /mnt/drive_letter/.
On first start there will also be self-explanatory errors explaining how to install Playwright and its browser binaries.
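As a minimal sketch of that path change, assuming WSL's default /mnt automount root, the hypothetical helper below rewrites an absolute drive-letter path to its WSL form using only pathlib. (Inside WSL, the wslpath utility performs the same kind of conversion.)

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_wsl_path(windows_path: str, mount_root: str = "/mnt") -> str:
    """Map e.g. C:\\Users\\i\\data.json to /mnt/c/Users/i/data.json.

    Hypothetical helper: assumes WSL's default automount root (/mnt) and
    handles only absolute drive-letter paths, not UNC shares or relative paths.
    """
    path = PureWindowsPath(windows_path)
    # For absolute paths, drive is e.g. "C:" and root is "\\"; reject anything else.
    if not (path.drive.endswith(":") and path.root):
        raise ValueError(f"not an absolute drive-letter path: {windows_path!r}")
    drive_letter = path.drive[0].lower()
    # parts[0] is the anchor ("C:\\"); the remaining parts are path components.
    return str(PurePosixPath(mount_root, drive_letter, *path.parts[1:]))


if __name__ == "__main__":
    print(to_wsl_path(r"C:\Users\i\project\urls.txt"))
```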

These links can help if you choose to use WSL2 instead of installing Linux:

  1. https://docs.microsoft.com/en-us/windows/wsl/install-win10 - Microsoft's guide to installing WSL2.
  2. https://www.jetbrains.com/help/pycharm/using-wsl-as-a-remote-interpreter.html - PyCharm guide to running and debugging with WSL2 (Professional edition or 30-day evaluation only).
  3. https://code.visualstudio.com/docs/remote/wsl-tutorial - guide for Visual Studio Code (free).

@BruceLee569

BruceLee569 commented May 18, 2022

Hi, I need to run both Scrapy and Playwright on a Windows computer. I don't know much about event loops, but after reading the only explanation of ProactorEventLoop in the official Python documentation and Twisted's own Windows IOCP reactor (iocpreactor), I still want to ask: is it really impossible to make the two work together? Also, this is the only related PR I found in the Twisted project.

@elacuesta
Member

AFAICT there hasn't been any development on this; my original comment still applies.
Playwright needs subprocesses, SelectorEventLoop does not support subprocesses on Windows (I don't know the technical reasons behind this), and Twisted's asyncio reactor is based on SelectorEventLoop.

@BruceLee569

BruceLee569 commented May 19, 2022

@elacuesta OK, thank you very much for your reply! BTW, the official Playwright documentation also mentions this here.

@elacuesta
Member

That's good to know, I'll add a link to that on the README. Thanks!

@elacuesta
Member

Updated the readme: 1c5f96e
