ENH: f2py wrappers should respect SIGINT (CTRL-C) #20148

Open
HaoZeke opened this issue Oct 20, 2021 · 32 comments
@HaoZeke
Member

HaoZeke commented Oct 20, 2021

Describe the issue:

Consider the following Fortran subroutine with an infinite loop. Please note that no one in their right mind would write this, but it illustrates the point.

! bla.f90
subroutine main
  do
   print*,"zzz..."
  end do
end subroutine main

Now this can be compiled into an extension module with f2py -m sloop -c bla.f90, as used in the reproducing code example below.

Handling custom signals set via Python's signal library is out of scope for this issue.

Reproduce the code example:

import sloop
sloop.main()

Error message:

This now ignores CTRL-C and other signals entirely. Note that, as implemented, callback functions do handle signals in a sense (i.e. they are mostly Python, which does work with signals, and they avoid executing Fortran when a signal is caught).

NumPy/Python version information:

1.21.2 3.9.6 | packaged by conda-forge | (default, Jul 11 2021, 03:36:15)
[Clang 11.1.0 ]
@HaoZeke
Member Author

HaoZeke commented Oct 20, 2021

Note that in other instances f2py "appears" to handle interrupts but that's mostly because of where the control flow is. For example, when using the gcc extension allowing POSIX subroutines like sleep, adapted from Philippe Engel's Modern Fortran example:

subroutine main
    implicit none
    integer :: i

    do i = 1, 10
        print '(a)', 'zzz ...'
        call sleep(2)
    end do
end subroutine main

This, compiled with f2py -c bla.f90 -m sle, sort of works, but only because sleep is implemented in C.

@seberg
Member

seberg commented Oct 20, 2021

As far as I know, there is no solution for this, except checking for interrupts regularly using the Python API (and there is currently no good public API for doing that if you want to release the GIL).

So while correct, this is just how things are. NumPy doesn't check for keyboard interrupts either. (There used to be code in fft that tried to check, but it was horrible and led to hard crashes in a threaded environment.)

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

I'm not sure I fully understand. The problem here is that the Fortran code ignores signals, which can be fixed by calling signal(SIGINT, SIG_DFL) or installing a custom handler. It wouldn't return to the Python interpreter in that situation, but it is still better than the existing behavior. It should be possible to have the module itself check PyErr_CheckSignals(), the way pybind11 handles these.


@seberg
Member

seberg commented Oct 21, 2021

@HaoZeke did you try running the fortran code for a long time rather than in an infinite loop? I assume the Python interrupt handler gets called even if you are "inside fortran". You just never find out. That is exactly the same as your sleep example, just there you didn't make the loop infinite.

@seberg
Member

seberg commented Oct 21, 2021

As to why using signal(SIGINT, SIG_DFL) is a bad idea: If you replace the signal handler you have to also restore it again. That probably works, until you run in a threaded environment where you would need logic to decide which old one to restore (it may have changed while you were running!).

I think Python could solve it with some mechanism, and maybe there are some smart ways, but you can look at numpy/core/include/numpy/npy_interrupt.h for a way that doesn't work (i.e. it regularly killed people's whole sessions in fft).

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

Ah, that makes sense @seberg. I was actually thinking of running the C extension code in a while PyErr_CheckSignals() loop and calling PyErr_SetInterruptEx (described here) from the signal handler registered by the Fortran code. I think that should be OK, but it still wouldn't help with the threaded environment problems.
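For illustration, a minimal hand-written sketch of that idea (hedged: this is not the actual f2py-generated code; wrap_main, sigint_to_python, and the main_ symbol are made-up names, and it uses PyErr_SetInterrupt() plus a single PyErr_CheckSignals() check once the Fortran call returns):

#include <Python.h>
#include <signal.h>

extern void main_(void);              /* the wrapped Fortran subroutine (gfortran name mangling assumed) */

static void sigint_to_python(int signo)
{
    (void)signo;
    PyErr_SetInterrupt();             /* async-signal-safe; only flags the pending SIGINT for Python */
}

static PyObject *wrap_main(PyObject *self, PyObject *args)
{
    void (*old_handler)(int) = signal(SIGINT, sigint_to_python);
    main_();                          /* long-running Fortran call */
    signal(SIGINT, old_handler);      /* naive restore; the threading caveats discussed below still apply */
    if (PyErr_CheckSignals() != 0) {  /* raises KeyboardInterrupt if a SIGINT was flagged meanwhile */
        return NULL;
    }
    Py_RETURN_NONE;
}

With this, a pending CTRL-C surfaces as a KeyboardInterrupt as soon as the Fortran subroutine returns (the finite-loop case); it still does nothing for a loop that never returns.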

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

@HaoZeke did you try running the fortran code for a long time rather than in an infinite loop? I assume the Python interrupt handler gets called even if you are "inside fortran". You just never find out. That is exactly the same as your sleep example, just there you didn't make the loop infinite.

Well, if the Fortran call replaces the handler (as in the example) it doesn't make a difference.

subroutine main
  integer :: i
  call register_kill_signal_handler()
  do i=1,100000
     print*, "zzz..."
  end do
end subroutine main

With the handler defined as:

#include <signal.h>
#include <stdlib.h>

void register_kill_signal_handler_() {
  if (signal(SIGINT, SIG_DFL) == SIG_ERR) {
    exit(EXIT_FAILURE);
  }
}

This terminates in a more friendly manner (though it doesn't return to the Python interpreter).

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

I guess the core problem of this issue is that, as it stands, long-running or buggy f2py codes have to be killed externally. Would this sort of "exit to shell because the handler was reset to default" kind of behaviour make more sense than the current infinite, non-interruptible situation?

Also I was initially eyeing numpy/core/include/numpy/npy_interrupt.h unfortunately 😅

@seberg
Member

seberg commented Oct 21, 2021

Well, using PyErr_CheckSignals() or PyOS_InterruptOccurred() makes sense (and e.g. NumPy could do that once in a while in ufuncs, although it might require artificially chunking some ufunc calls). However, I think even PyErr_CheckSignals() requires holding the GIL (this is a public API issue in Python; there is private API that does not require this, but I am not sure it can be made public: https://bugs.python.org/issue41037).

So, if you do hold the GIL, even signal handler replacement may be OK (but maybe using the checking API is fine). Maybe that is fine for f2py even?!
If you do release the GIL: I think it is the wild west and I don't have much hope beyond digging and modifying Python, but happy to be surprised ;).

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

Well for Python 3.10 this seems to do exactly what we need:

#include <Python.h>
#include <signal.h>
#include <stdlib.h>

static void sigint_handler(int signo){
    PyErr_SetInterrupt();
}
void register_kill_signal_handler_() {
  if (signal(SIGINT, sigint_handler) == SIG_ERR) {
    exit(EXIT_FAILURE);
  }
}


@seberg
Member

seberg commented Oct 21, 2021

Yes, but what happens if you replace the handler twice from different places?

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

I was thinking that, with this, the Fortran call itself will reset the handler because of the call register_kill_signal_handler() within the subroutine, so the only possible handler for the Fortran function will always be the one that executes PyErr_SetInterrupt(). Won't it still just be a KeyboardInterrupt then? Unless I'm missing something.

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

Ah I think I get it, so the concern is if:

Fortran code sets handler, executes
Concurrently (during the execution of the function body) something else sets handler
Fortran code errors out?

For this I'm not sure.

@seberg
Member

seberg commented Oct 21, 2021

The problem is not getting the interrupt to work in your code, the problem is only about restoring behaviour once you are done.

Now, maybe you can just always restore the default Python behaviour (and assume that is always right), no matter what the state was when you replaced it with your handler. In a multi-threaded context that may mean we go back to the default while threads are still running: OK.
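A minimal sketch of that "always restore the Python handler" approach, using CPython's PyOS_getsig/PyOS_setsig (illustrative only: remember_python_handler and run_wrapped are made-up names, and it assumes the handler Python installed at import time is always the one to go back to):

#include <Python.h>
#include <signal.h>

static PyOS_sighandler_t python_sigint_handler;

/* call once at module import, while Python's own SIGINT handler is still installed */
static void remember_python_handler(void)
{
    python_sigint_handler = PyOS_getsig(SIGINT);
}

static void run_wrapped(void (*fortran_entry)(void), PyOS_sighandler_t our_handler)
{
    PyOS_setsig(SIGINT, our_handler);
    fortran_entry();
    /* restore Python's handler unconditionally, whatever happened in between */
    PyOS_setsig(SIGINT, python_sigint_handler);
}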

npy_interrupt.h tries to be smarter: it tries to reset it to the previous handler. If two people replace the handler, there is no guarantee they finish in the right order though.
Whatever you do, unless you invent some scheme for "chaining" handlers or so, and all of the Python ecosystem agrees on it, I don't think you can really solve this. You can only hope that always restoring plain Python is fine and that nobody does anything bad (i.e. uses things like npy_interrupt.h).

For this I'm not sure.

Yes, basically:

  1. Fortran code 1 sets handler
  2. Fortran code 2 sets handler
  3. Fortran code 1 finishes -> resets to python handler?
  4. Fortran code 2 finishes -> resets to what? (npy_interrupt.h resets to the code 1 handler)

user presses ctrl+c AFTER everything is done: handler 1 gets called and BOOM

@seberg
Member

seberg commented Oct 21, 2021

I thought I added a scary comment to npy_interrupt.h :).

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

Perhaps it would be simplest if the f2py generated handlers cannot be modified? Then it simplifies to:

  1. Fortran code 1 sets handler
  2. Fortran code 2 sets handler (same one as code 1 though)
  3. Fortran code 1 finishes -> resets to python handler?
  4. Fortran code 2 finishes -> resets to python handler (redundant though)

user presses ctrl+c AFTER everything is done -> still gets the python handler

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

The way I'm looking at this is: as long as it is impossible for two f2py-generated functions to have different handlers set, it should be OK?

@seberg
Member

seberg commented Oct 21, 2021

Fortran code 2 sets handler (same one as code 1 though)

Doesn't the handler set a longjump into the function code stack to abort the iteration?

But yeah, if we know the Python handler and reset to it generously, I expect things are OK (so long as nobody else does dangerous stuff because they use npy_interrupt.h-style logic in a threaded environment).

I do think there may be solutions to this, i.e. keep a linked list of all handlers around and each context can remove itself from there in a thread-safe way (the magic of linked lists :)). Or maybe some Python context-var... It just feels that without a process-wide unified ABI it is all pretty doomed to have traps in a threaded environment (but, if you wait long enough that you want a snappy ctrl+c, you probably also want to release the GIL).
Although, I don't know: there may be a way to know that Python is single-threaded at the time of interest.

@pearu
Contributor

pearu commented Oct 21, 2021

nit: "f2py does not respect SIGINT" does not make much sense. f2py just creates Python C/API wrapper functions to Fortran/C functions. Exactly the same structure would be used when writing these wrapper functions manually. f2py has nothing to do with how Python or Fortran/C functions behave with respect to handling signals. Recall: f2py is not a compiler.

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

nit: "f2py does not respect SIGINT" does not make much sense. f2py just creates Python C/API wrapper functions to Fortran/C functions. Exactly the same structure would be used when writing these wrapper functions manually. f2py has nothing to do with how Python or Fortran/C functions behave with respect to handling signals. Recall: f2py is not a compiler.

I agree in general, but in this case the issue is that the wrapper generated by f2py does not handle signals correctly, and the proposed fix would essentially be to generate the .{c,h} file defining the handler and to insert the call that registers it into the generated wrapper when f2py is invoked.

Though I think it is perhaps less a bug and more an ENH?

@HaoZeke HaoZeke changed the title BUG: f2py does not respect SIGINT (CTRL-C) ENH: f2py wrappers should respect SIGINT (CTRL-C) Oct 21, 2021
@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

Fortran code 2 sets handler (same one as code 1 though)

Doesn't the handler set a longjump into the function code stack to abort the iteration?

But yeah, if we know the Python handler and reset to it generously, I expect things are OK (so long nobody else does dangerous stuff because they use npy_interrupt.h style logic in a threaded environment.)

I do think there may be solutions to this, i.e. keep a linked list of all handlers around and each context can remove itself from there in a thread-safe way (the magic of linked lists :)). Or maybe some Python context-var... It just feels that without a process-wide unified ABI it is all pretty doomed to have traps in a threaded environment (but, if you wait long enough that you want a snappy ctrl+c, you probably also want to release the GIL). Although, I don't know: there may be a way to know that Python is single-threaded at the time of interest.

I don't think it uses a longjmp. Not as far as I can tell from the implementation.

In fact, I was surprised by the usage of longjmp in the signal handler anyway, since I was somehow under the impression that using longjmp from a signal handler is undefined.

@pearu
Contributor

pearu commented Oct 21, 2021

For testing, you can use the usercode statement to insert custom C code containing signal handling hooks into the generated wrapper functions. See https://numpy.org/devdocs/f2py/signature-file.html#f2py-statements .

If the enhancement is implemented, use the same approach as for threadsafe to enable the feature.

@seberg
Member

seberg commented Oct 21, 2021

I agree with Pearu, I don't really understand. Maybe f2py wrappers can check if an interrupt has occurred while Fortran was running, but it doesn't really seem to matter, since Python checks all the time anyway?

Rather, you want to add API to allow authors to write interruptable Fortran code. And we don't even have API for that in the NumPy C-API?!

This all seems like running around in circles; you don't really want to register this one:

  if (signal(SIGINT, SIG_DFL) == SIG_ERR) {
    exit(EXIT_FAILURE);
  }

right? It just kills the whole program!

Look at what npy_interrupt.h does: It stores a longjump point to the end of the execution block (i.e. the slow loop; EDIT: I guess the longjump goes to the start, but whatever) and then sets a signal handler which does the long jump to the end of that loop! That works, but means that the long-jump must always point to the correct code (i.e. the one currently being executed).

I suppose there may be ways to do that in a thread-safe manner, but I don't know. But we will probably only run in circles until we are clear what you even want here.

Maybe, we can store the long-jump points in a linked-list or thread-local, so that "our" (i.e. the NumPy handler) will clean up correctly and only if it is restoring something sensible there... But I honestly don't know, and presumably thread-local doesn't work because the signal handler runs in the main thread (or its own thread on Windows) or so.
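For reference, here is the npy_interrupt.h pattern spelled out with plain signal/setjmp (an illustrative sketch, not NumPy's exact code; interruptible_loop and jump_on_sigint are made-up names). The handler longjmps back to the saved point, the loop is abandoned, and the previous handler is restored at the end, which is precisely the step that becomes unreliable with threads:

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static jmp_buf interrupt_buf;

static void jump_on_sigint(int signo)
{
    longjmp(interrupt_buf, signo);     /* jump out of whatever was running */
}

static int interruptible_loop(long n)
{
    void (*old)(int) = signal(SIGINT, jump_on_sigint);
    int interrupted = 0;
    if (setjmp(interrupt_buf) == 0) {
        for (long i = 0; i < n; i++) {
            printf("step %ld\n", i);   /* stand-in for the expensive work */
        }
    } else {
        interrupted = 1;               /* SIGINT arrived mid-loop */
    }
    signal(SIGINT, old);               /* the restore step that is fragile with threads */
    return interrupted;
}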

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

I agree with Pearu, I don't really understand. Maybe f2py wrappers can check if an interrupt has occurred while Fortran was running, but it doesn't really seem to matter, since Python checks all the time anyway?

Rather, you want to add API to allow authors to write interruptable Fortran code. And we don't even have API for that in the NumPy C-API?!

This seems all pretty running around in circles, you don't really want to register, this one:

  if (signal(SIGINT, SIG_DFL) == SIG_ERR) {
    exit(EXIT_FAILURE);
  }

right? It just kills the whole program!

Look at what npy_interrupt.h does: It stores a longjump point to the end of the execution block (i.e. the slow loop) and then sets a signal handler which does the long jump to the end of that loop! That works, but means that the long-jump must always point to the correct code (i.e. the one currently being executed).

I suppose there may be ways to do that in a thread-safe manner, but I don't know. But we will probably only run in circles until we are clear what you even want here.

Maybe, we can store the long-jump points in a linked-list or thread-local, so that "our" (i.e. the NumPy handler) will clean up correctly and only if it is restoring something sensible there... But I honestly don't know and presumably thread-local doesn't work because the signal handler runs in a main thread (or its own thread on windows) or so.

I was thinking this might be a reasonable default:

#include <Python.h>
#include <signal.h>
#include <stdlib.h>

static void sigint_handler(int signo){
    PyErr_SetInterrupt(); // Tells the Python interpreter that SIGINT occurred; works with or without the GIL
}
void register_kill_signal_handler_() {
  if (signal(SIGINT, sigint_handler) == SIG_ERR) {
    exit(EXIT_FAILURE);
  }
}

Which essentially lets the regular Python handler deal with it. The problem right now is that even though Python is checking all the time, the signal is being ignored, e.g. with the original infinite loop or even the non-infinite version:

subroutine main
  integer :: i
  call register_kill_signal_handler()
  do i=1,100000
     print*, "zzz..."
  end do
end subroutine main

As it stands, this currently runs until the loop finishes, ignoring any signals, which seems to be unintended behavior; it should instead return to the interpreter. I'm not sure why we need a longjmp in the first place though; we should simply raise an error to the interpreter, right? Additionally, the Fortran standard mentions:

A procedure defined by means of Fortran shall not invoke setjmp or longjmp (ISO/IEC 9899:2011, 7.13). If a procedure defined by means other than Fortran invokes setjmp or longjmp, that procedure shall not cause any procedure defined by means of Fortran to be invoked. A procedure defined by means of Fortran shall not be invoked as a signal handler (ISO/IEC 9899:2011, 7.14.1).

Which I interpreted to mean that setjmp and longjmp should not be used when Fortran functions are called in between.

@pearu
Contributor

pearu commented Oct 21, 2021

Is the underlying issue summarized by "This now ignores all CTRL-C and other signals."?

If this is the case and the aim is to stop long-running Fortran jobs, then one can use the following workaround (requires bash):

  • Hit CTRL-Z, which will stop the python process as a background job.
  • Run kill %1, which will terminate the python process.

@HaoZeke
Member Author

HaoZeke commented Oct 21, 2021

Is the underlying issue summarized by "This now ignores all CTRL-C and other signals."?

If this is the case and the aim is to stop long-running Fortran jobs, then one can use the following workaround (requires bash):

  • Hit CTRL-Z that will stop python process as a background job.
  • Hit kill %1 that will terminate the python process.

Yup, and equally sending SIGKILL works. My first not-so-bright idea was simply to register SIGINT with its default handler, which terminates the process; however, calling PyErr_SetInterrupt() stops the Fortran function and returns to the interpreter, which seems to be the right thing to do.

@pearu
Contributor

pearu commented Oct 21, 2021

Here follows a working example of catching SIGINT from f2py generated wrappers.

Consider the following Fortran code:

! File: signal_example.f90
subroutine main
  integer :: i
  do i=1,100000000
     print*, "zzz...", i
  end do
end subroutine main

Create a signature file using:

f2py -m signal_example -h signal_example.pyf signal_example.f90

Now edit signal_example.pyf so that it will contain:

python module signal_example ! in
    interface  ! in :signal_example
       subroutine main ! in :signal_example:signal_example.f90
         usercode '''
jmp_buf jump_buffer;
int val = setjmp( jump_buffer );
if (val == SIGINT) {
  PyErr_SetString(PyExc_KeyboardInterrupt, "interrupted main");
  return NULL;
}
void sigint_handler(int signo){
    longjmp(jump_buffer, signo);
}
if (signal(SIGINT, sigint_handler) == SIG_ERR) {
  return NULL;
}
'''
        end subroutine main
    end interface 
end python module signal_example

Build the extension module using:

f2py  signal_example.pyf signal_example.f90 -c

Consider the following Python script:

import signal_example

try:
    signal_example.main()
except KeyboardInterrupt as msg:
    print("Received:", msg)
print('Exiting Python')

Run it and hit CTRL-C at some point. A result is:

<snip>
zzz...      236245
 zzz...      236246
 zzz...      236247
 zzz...      236248
^CReceived: interrupted main
Exiting Python

@seberg
Member

seberg commented Oct 21, 2021

Here follows a working example of catching SIGINT from f2py generated wrappers.

Yes, but that code doesn't offer a solution for the FFT problem! It is basically a worse version of NPY_SIGINT_(ON|OFF), no?!
For one, modify the script to run in finite time, run it, and hit ctrl+c while something new is running (like for i in range(10000): print(i)), and you will get random crashes.
Further, moving this into C means any Fortran cleanup will never run, although I suppose the author could indicate that this is OK or that they just don't care.

I expect we can formulate solutions to the FFT problem under the assumption that no one else does the dangerous stuff that NPY_SIGINT_(ON|OFF) does. But I am not sure that assumption is reasonable.

Maybe what I should be saying is: I think this needs a more serious ground-up effort and understanding if you really want to fix it. For FFT, luckily, it just didn't really matter anymore because of Bluestein's algorithm (it never really runs forever, so ctrl+c isn't really needed).
For other algorithms it still matters, but even then I think you should start with the C problem and ignore f2py right now. If we don't have a solution in C, we can't offer it to Fortran.

Right now, the only tool that is obviously safe is running the following once in a while and aborting if it returns 1:

#include <Python.h>

int
check_interrupt() {
    int res;
    Py_BEGIN_ALLOW_THREADS;
    res = PyOS_InterruptOccurred();
    Py_END_ALLOW_THREADS;
    return res;
}
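For illustration, a hedged usage sketch of polling that helper from a long computation (heavy_step and run_chunks are hypothetical stand-ins; the caller is assumed to hold the GIL when it enters):

extern void heavy_step(long i);   /* hypothetical unit of expensive work */

/* returns -1 if interrupted, so the caller can clean up and raise KeyboardInterrupt */
int run_chunks(long n)
{
    for (long i = 0; i < n; i++) {
        heavy_step(i);
        if (i % 10000 == 0 && check_interrupt()) {
            return -1;
        }
    }
    return 0;
}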

@pearu
Contributor

pearu commented Oct 21, 2021

@seberg, the just-cooked-up example above should be taken as it is. There is no guarantee that it will work under a multithreaded setup cleanly.

I think the scope of this issue is a little scattered. For instance, I am not familiar with the FFT problem which is not mentioned in the description, and I think the resolution of this issue may depend on an application as discussed below.

First, the expected behavior of using CTRL-C may have various correctness levels:

  1. The aim of CTRL-C is to stop a running python program which is "hanging" somewhere in C/Fortran code. In this case, a user does not expect any correctness in shutting down a (possibly multithreaded) python program as long as it terminates more-or-less immediately after Ctrl-C.
  2. The aim of CTRL-C is to gracefully stop a computation in C/Fortran code (executed within try-except-KeyboardInterrupt block) and continue with other tasks without shutting down the python process.

Another part of the issue is about a realistic resolution for a particular use case. For instance, we can have the following use cases:

  • user owns the Fortran/C code, so that s/he can insert check_interrupt into the code. Although, it will work only when used at proper places in a computational program, and this we cannot control anyway.
  • user wraps external Fortran/C libraries for Python while changing the computational parts of the code is just not feasible (no sources are available, etc), hence the check_interrupt approach cannot be applied.

These use cases illustrate that the behavior of Ctrl-C will largely depend on the user of f2py: we can only provide tools for users to shoot or not shoot their left legs.

From the f2py perspective, I think this issue should not have a resolution that is applied by default to all generated wrapper functions because Ctrl-C usage is just irrelevant for many cases. This is similar to threadsafe not being enabled by default. So, I disagree with the title.

However, for correctness level 1, I see a practical use in introducing a new f2py statement, say, catchsigint, that inserts the code implemented in the above example (but after a better design) into the generated wrapper functions, and in making a note in the documentation that the intended usage of catchsigint is to terminate the python program, although it also works for correctness level 2 when functions are executed in a single-threaded setup.

@seberg
Member

seberg commented Oct 21, 2021

@pearu, the FFT problem is the problem of multiple threads setting up a sigint handler but not being able to restore the correct behaviour cleanly. It was very easy to observe in old NumPy versions, because even np.test() would trigger it reliably.

NPY_SIGINT_(ON|OFF) is what the FFT code used to use, and it was unable to clean up correctly in general. That means that ctrl+c crashes the process even after all FFTs finished, just as your example code causes undefined behaviour after the f2py function finishes. Except NPY_SIGINT_(ON|OFF) works fine single threaded, it only starts blowing things up if you ran multiple FFTs threaded.

My whole argument is that I don't think this issue is even actionable. At least not without spending some more thought on solving the issue of restoring the correct signal handler in a multi-threaded environment (or finding another solution).

Unless you have a use-case where you prefer risking crashing the interpreter (even after your task is complete) to make ctrl+c work promptly? Because my knee jerk reaction is that it is worse than not being able to interrupt the Fortran code using ctrl+c.

@seberg
Member

seberg commented Oct 21, 2021

But, yes, if f2py does not use threading (does not release the GIL), then you can add an option to also wrap it in NPY_SIGINT_ON and NPY_SIGINT_OFF, and I think that would be fine.

@pearu
Contributor

pearu commented Oct 22, 2021

My whole argument is that I don't think this issue is even actionable. At least not without spending some more thought on solving the issue of restoring the correct signal handler in a multi-threaded environment (or finding another solution).

I agree. It sounds like a nicely complicated problem that could be fun to tackle :)
