Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check for exceptions before returning from C extension #95

Closed
wants to merge 1 commit into from

Conversation

dirk-thomas
Copy link
Member

Some background on Python C extensions: a C function being called from Python can only do either of the two:

  • return a non-NULL value or
  • set an error (with PyErr_) or exception.

If both is done the call will trigger a SystemError: <built-in function your-name-here> returned a result with an error set.

Now consider the following case: a user presses Ctrl-C which sets the KeyboardInterrupt exception while a C function is running. If the function tries to return a non-NULL value a SystemError is the consequence. Instead before returning from the C function the code should check if an error is set (using PyErr_Occurred) and if that is the case return NULL (the standard rule applies to avoid leaking memory in that case).

The benefit of the current patch can be reproduced with the following example which has a high chance of hitting Ctrl-C while this C function is running:

  • Invoke ros2 run image_tools cam2image in one terminal
  • Invoke ros2 topic echo /image in a second terminal and while it is printing messages hit Ctrl-C

Without the patch you have a high chance of seeing the SystemError. With the patch the program should gracefully exit without a visible exception.

This PR currently only patches the function rclpy_take. The same would need to be done in all C function exposed in Python for each non-NULL return (which is in quite a few places). Before replicating the change for all other cases I would like to have feedback if this is viable or if there is a different way (or we don't care about the way this "crashes").

@dirk-thomas dirk-thomas added the in progress Actively being worked on (Kanban column) label Jun 28, 2017
@dirk-thomas dirk-thomas self-assigned this Jun 28, 2017
Copy link
Member

@wjwwood wjwwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

As discussed offline, I think there is still a race condition in this scenario:

  • This code is run:
if (PyErr_Occurred()) {
  return NULL;
}
  • continues because no error set
  • keyboardinterrupt set from signal (SIGINT / ctrl-c)
  • return non-NULL from C extension because no error was set when checked

And in that case we're still in the same boat as now.

This patch reduces the chance of that happening, but it seems like a conceptual problem. Maybe there is a better solution, but +1 for this today as an incremental improvement. I think it's worth looking into in more detail later.

@dirk-thomas
Copy link
Member Author

This shouldn't be merged for beta 2 since it has too much potential for side effects.

@wjwwood
Copy link
Member

wjwwood commented Jun 28, 2017

Actually, after doing some reading I think we should be ok. Apparently the KeyboardInterrupt can only be set when PyErr_CheckSignals() (https://docs.python.org/3.6/c-api/exceptions.html#c.PyErr_CheckSignals) is explicitly called. The issue is that this function is called within many of the built-in Python C functions.

So my conclusion is that:

  • there is no race condition as I indicated above because the error is not set in the signal handler directly
  • we should not need to check if an exception is set at the end of our functions, if that helps then it means we're not checking the return code of all of the Python functions we call (one of which my be calling PyErr_CheckSignals())

@clalancette
Copy link
Contributor

Given @wjwwood 's research, it seems like we may be able to merge this now. I've fired off a CI job for it (which doesn't guarantee this is correct, but at least ensures that there are no obvious regressions):

  • Linux Build Status
  • Linux-aarch64 Build Status
  • macOS Build Status
  • Windows Build Status

@mikaelarguedas
Copy link
Member

mikaelarguedas commented Aug 4, 2017

as discussed in the waffle triage meetings of the last two weeks we decided to hold this until we look into the root of the problem rather than fixing the symptom

This PR currently only patches the function rclpy_take. The same would need to be done in all C function exposed in Python for each non-NULL return (which is in quite a few places).

This is still adressing only that function and is still labeled as in progress

@dirk-thomas
Copy link
Member Author

ros2/rosidl#240 addresses the actual problem. During the convert_to_py call two functions could fail in case a SIGINT was triggered in the meantime. With the patch in rosidl the convert_to_py returns NULL in those cases which is already being handled correctly by rclpy.

@dirk-thomas dirk-thomas deleted the check_for_exceptions_in_c branch October 16, 2017 21:40
@dirk-thomas dirk-thomas removed the in progress Actively being worked on (Kanban column) label Oct 16, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants