Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to how signals are caught and handled in awatch #136

Merged
merged 11 commits into from
May 16, 2022

Conversation

justvanrossum
Copy link
Contributor

@justvanrossum justvanrossum commented Apr 30, 2022

For anyone coming to this in future, this is a more correct and elegant way of fixing #128 than my attempt (#132).

However it does bring some changes, in particuarly we can no longer optionally suppress KeyboardInterrupt inside awatch, so you might need to catch it where you call asyncio.run() or similar.


I'm am not very confident about this PR, but I think this would be the correct fix for #128, at least for asyncio:

You don't need to handle SIGINT explicitly, you just have to catch CancelledError, and set the stop even from there.

For me, this fixes the Ctr-C problem when using multiple watchers without the user code having to bother with an explicit stop event.

@justvanrossum
Copy link
Contributor Author

Hm, now I need to learn how the tests work and see which is wrong: the code or the tests... Any advise would be greatly appreciated.

@codecov
Copy link

codecov bot commented Apr 30, 2022

Codecov Report

Merging #136 (2011a71) into main (a87b815) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #136   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          372       368    -4     
  Branches        79        76    -3     
=========================================
- Hits           372       368    -4     
Impacted Files Coverage Δ
watchfiles/main.py 100.00% <100.00%> (ø)
watchfiles/run.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a87b815...2011a71. Read the comment docs.

@justvanrossum
Copy link
Contributor Author

There are some obvious oversights in my PR (like the now unused interrupted variable), and I'd be happy to clean up, but let's first see whether you agree at all with this approach.

@samuelcolvin
Copy link
Owner

This doesn't work because the rust code won't hear the signal.

If you have rust_timeout=None the coroutine will not exit on a signal until the next filesystem event is received/detected... Or never if no filesystem change is received.

@samuelcolvin
Copy link
Owner

Unless I'm wrong somehow... But I'm pretty sure I'm not.

Remember inside rust we have threading, threads can't generally detect signals.

@justvanrossum
Copy link
Contributor Author

justvanrossum commented Apr 30, 2022

It's quite possible I misunderstand how things work.

But in the current code, where do you tell the Rust code to stop?

The signal handler just responds to a SIGINT, and sets the stop event, which is handled by anyio.

In this PR, I don't listen to SIGINT: Python/asyncio is already doing that, and the asyncio loop cancels all pending tasks, causing CancelledError to be raised, upon which I set the stop event. It's not all that different from the current code.

@samuelcolvin
Copy link
Owner

Let me look again when I'm at my laptop.

@samuelcolvin samuelcolvin reopened this Apr 30, 2022
@samuelcolvin
Copy link
Owner

samuelcolvin commented Apr 30, 2022

I still don't think this works - it doesn't set the event when a signal is received, so the rust code can't know to return.

Have you tried it? If compiling the binary is difficult, just copy it from a regular install.

Also I went to a talk yesterday at PyCon about how you should never catch CancelledError - it "breaks the contract of asyncio".

@justvanrossum
Copy link
Contributor Author

I tried it, and it does fix my problem.

This PR just moves setting the stop event to a different place in the code, one that works better with asyncio.

@justvanrossum
Copy link
Contributor Author

CancelledError: didn't it add "... without reraising it" ?

@samuelcolvin
Copy link
Owner

Let me look again when I'm at my laptop.

My bad, I shouldn't reply on my phone without reading it properly.

@justvanrossum
Copy link
Contributor Author

justvanrossum commented Apr 30, 2022

This is my test code:

import asyncio
import watchfiles


async def watcher(path):
    async for changes in watchfiles.awatch(path, debug=True):
        print(changes)


async def watch_multiple(paths):
    await asyncio.gather(*(watcher(p) for p in paths))
    print("done?")


def main():
    asyncio.run(watch_multiple(["dir_A", "dir_B"]))


if __name__ == "__main__":
    main()

Without this PR, I have to do Ctl-C twice before the process exists.
With the PR, it exits cleanly with a KeyboardInterrupt upon a single Ctl-C.

@justvanrossum
Copy link
Contributor Author

Btw. this PR also ensures that cancelling an asyncio task containing an awatch call works correctly.

I wrote a test for it, but it is asyncio-specific. I bet the same could work with anyio, but I've never used it so I don't know how.

async def test_awatch_cancel_task(mocker, mock_rust_notify: 'MockRustType'):
    import asyncio

    async def watcher():
        async for _ in awatch('.'):
            pass

    async def delayed_cancel():
        await asyncio.sleep(0.1)
        task.cancel()

    task = asyncio.create_task(watcher())
    cancel_task = asyncio.create_task(delayed_cancel())

    with pytest.raises(asyncio.CancelledError):
        await task

    await cancel_task

Copy link
Owner

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks to me like this is almost right, but raise_interrupt isn't correctly handled anymore.

Either way, I'll have to wait until I'm back home and have access to linux to test it properly.

elif raw_changes == 'signal':
# in theory the watch thread should never get a signal
if raise_interrupt:
raise KeyboardInterrupt
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this will work, since we raise the KeyboardInterrupt on line 202 above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code should never be hittable, as py.check_signals() calls PyErr_CheckSignals which always returns 0 if not on the main thread.

So I think raise_interrupt==False might make sense to suppress a KeyboardInterrupt (and maybe a CancelledError but that would be somewhat surprising) after setting the stop_event_ above. This just makes it a short-hand for with contextlib.suppress though, as it is in watch.

I also suspect that the equivalent call in watch is faulty for a different reason: if PyErr_CheckSignals returns -1, then the signal has been handled (and an exception raised and (I assume captured) and returned) and so raising KeyboardInterrupt would confuse a user who had installed a different SIGINT handler, or if the signal that raised the exception was not SIGINT at all. I guess that return Ok("signal".to_object(py)) should probably return the value it got from py.check_signals() instead, and then a KeyboardInterrupt or whatever exception the installed signal handler specified would appear naturally as an exception from the watcher.watch(debounce, step, rust_timeout, stop_event) call.

@TBBle
Copy link
Contributor

TBBle commented May 7, 2022

I was having the double-control-C-needed issue described above, and also having issues where SIGTERM (i.e. from docker stop) was going weird. I was also for a time having issues where the 'clean-up all threads' code implicit in asyncio.run would see a KeyboardInterrupt, even though I got there by handling a KeyboardInterrupt already.

The change in #136 looks reasonable to me, but I haven't tested it. It seems to mirror the changes I made locally (trimmed to the relevant parts)

class Runner:

    def stop(self):
        logger.info("Process terminating")
        self.stop_event.set()


    async def run_until_cancelled(self):
        """
        Launches tasks and awaits their completion, or until we get SIGINT (control-C) or SIGTERM (kill -TERM)
        """
        asyncio.get_event_loop().add_signal_handler(signal.SIGINT, self.stop)
        asyncio.get_event_loop().add_signal_handler(signal.SIGTERM, self.stop)

        tasks = [
            self.watch_for_new_inputs(),
            self.init_time_task(),
            self.ongoing_task_1(),
            self.ongoing_task_2(),
        ]
        for coro in asyncio.as_completed(tasks):
            try:
                await coro
            except (KeyboardInterrupt, asyncio.CancelledError):
                logger.exception("Process cancelled, terminating")
                self.stop()
                continue

    async def watch_for_new_inputs(self):
        def input_filter(_: Change, path: str) -> bool:
            return self.is_interesting(Path(path))

        if self.force_polling:
            logger.warning(
                "Development-only option: Filesystem polling implementation selected"
            )

        async for change_set in awatch(
            self.input_dir,
            watch_filter=input_filter,
            stop_event=self.stop_event,
            exit_on_signal=False,
            force_polling=self.force_polling,
        ):
            for file_change in change_set:
                logger.debug("Change received: %s", file_change)
                ...

All my other long-running tasks are bounded with while not self.stop_event.is_set(): so whichever task raises KeyboardInterrupt (or CancelledError but I can probably remove that now, as I no longer cancel any of the long-running tasks; that was from before I added self.stop_event and was instead cancelling all the outstanding tasks in this case) causes orderly shutdown of the other tasks, and hence clean termination.

Co-incidentally, I'd be interested in knowing more about the PyCon talk that said "never catch CancelledError", as I am catching it in other places too (because I cancel tasks spawned from the above and then wait for them to clean-up, but don't want to cancel my entire program in this case). Was that If an asyncio.Task fails in the woods and nobody is around to see it, does it still page you at 3am.? Sounds like I might have an anti-pattern to fix. (Edit: https://www.youtube.com/watch?v=XW7yv6HuWTE, I guess)

@samuelcolvin samuelcolvin changed the title A different attempt to fix #128 Changes to how signals are caught and handled in awatch May 13, 2022
@samuelcolvin
Copy link
Owner

@justvanrossum @TBBle could you try the changes I've proposed in 1e32ded and let me know how it works for you.

It's basically what @justvanrossum did with raise_interrupt removed since it doesn't work anymore.

@samuelcolvin
Copy link
Owner

I've tried this a fair bit on ubuntu and it seems to work well, but it's virtually impossible to think of and try every edge case.

@justvanrossum
Copy link
Contributor Author

Seems to work fine for me, thanks!

@samuelcolvin samuelcolvin merged commit 7ab3cb9 into samuelcolvin:main May 16, 2022
@justvanrossum
Copy link
Contributor Author

Thank you so much for properly finishing this! Much appreciated.

@samuelcolvin
Copy link
Owner

No problem.

@TBBle
Copy link
Contributor

TBBle commented May 17, 2022

Sorry for not getting back to you earlier. I've upgraded our local awatch-using app to 0.14, and after removing the exit_on_signal=False from my call, it all seems to be working fine.

If I don't install my own SIGINT handler to set the stop_event, I still see what appears to be a double KeyboardInterrupt (both chaining into Cancellation), but at this point I reckon that's either a flaw in my code, or Docker is somehow interfering and actually sending two SIGINTs for one control-C just to confuse me.

Anyway, thank you both for pushing these changes through. ^_^

@samuelcolvin
Copy link
Owner

Great, thanks for the feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants