Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Does not shutdown cleanly on SIGINT (cmd+c) #62

Open
samwillis opened this issue Feb 23, 2022 · 2 comments
Open

Does not shutdown cleanly on SIGINT (cmd+c) #62

samwillis opened this issue Feb 23, 2022 · 2 comments

Comments

@samwillis
Copy link

samwillis commented Feb 23, 2022

It seems that when using scrapy-playwright Scrapy will not shut down cleanly on SIGINT (cmd+c), and you have to force a shutdown with a second cmd+c. If you use the telnet client to run engine.stop() it does seem to be shutting down cleanly. This is needed in order to save the current state for resuming, an unclean shutdown does not save the current state and cannot be resumed. It stops scraping but continues to log its stats every minute.

On some investigation Playwright has a launch arg handle_sigint which I believe indicates to forward a SIGINT to the browser process. I thought it may have been this forwarding of the SIGINT that was causing the hang in the handling of shutdown. When I set this to false (it defaults to true) with PLAYWRIGHT_LAUNCH_OPTIONS it still doesn't shutdown cleanly, however all chromium processes are stoped but Scrapy does not shut down and there are no errors, even during a forced shutdown. It just continues to report its stats every minute showing 0 more pages scraped.

I'm happy to continue investigating how to fix this but would appreciate any pointers in the right direction as to where the issue may be arising.

Edit:

Just want to add I am confident that what is happening is that the SIGINT is still passed on to the browser. With engine.stop() via telnet I see a very graceful shut down (up to a couple of minutes long) with pages currently being processed in my parse call-backs all finishing. SIGINT/cmd+c just immediately kills the browser processes.

@samwillis samwillis changed the title Does not shutdown cleanly on SIGINT (^C) Does not shutdown cleanly on SIGINT (cmd+c) Feb 23, 2022
@samwillis
Copy link
Author

So to answer my own question, I have solved it.

This is a problem upstream with Playwright, currently there isn't a way to prevent a sigint being passed down to playwright. handle_sigint=False stops playwright from detecting the sigint and doing some of its own handling, that probably wants turning of when used with Scrapy, I have set it false.

The other thing I have done is monkey patch playwright to stop passing down the sigint:

# Monkeypatch Playwright
from playwright._impl._transport import PipeTransport, _get_stderr_fileno

def new_pipe_transport_connect_preexec(): # Don't forward signals.
    os.setpgrp()

async def new_pipe_transport_connect(self):
    self._stopped_future: asyncio.Future = asyncio.Future()
    # Hide the command-line window on Windows when using Pythonw.exe
    creationflags = 0
    if sys.platform == "win32" and sys.stdout is None:
        creationflags = subprocess.CREATE_NO_WINDOW

    try:
        # For pyinstaller
        env = os.environ.copy()
        if getattr(sys, "frozen", False):
            env.setdefault("PLAYWRIGHT_BROWSERS_PATH", "0")

        self._proc = await asyncio.create_subprocess_exec(
            str(self._driver_executable),
            "run-driver",
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
            stderr=_get_stderr_fileno(),
            limit=32768,
            creationflags=creationflags,
            env=env,
            preexec_fn=new_pipe_transport_connect_preexec,  # My new line!!
        )
    except Exception as exc:
        self.on_error_future.set_exception(exc)
        raise exc

    self._output = self._proc.stdin

PipeTransport.connect = new_pipe_transport_connect

This is inspired by https://stackoverflow.com/a/5446982/18244376 and only worked on Posix.

Going to file a bug with Playwright asking for an api to enable this.

@elacuesta
Copy link
Member

Thank you for the research and for opening the upstream feature request, let's wait and see what they suggest.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants