Simplify error handling in failure mode or during finalization #269

hidmic · 2020-07-27T20:07:00Z

Rationale

Error handling support in rcutils (and by extension, in every ROS 2 C library implementation or client code) falls short at propagating errors in failure mode or during finalization. As discussed in ros2/rmw_fastrtps#414 and ros2/rmw_cyclonedds#210, ensuring that the initial error gets propagated, while no subsequent error goes unnoticed, clutters client code significantly.

Proposal

Introduce a mechanism for rcutils to distinguish between an error overwrite and a nested error, which should be handled differently (e.g. by logging to stderr directly). To that end, use a thread-local integer to track nested error handling scopes and a boolean in local storage to ensure a single error handling scope per function.

With an API like:

typedef struct rcutils_error_handling_scope_t {
    bool active;
} rcutils_error_handling_scope_t;

void rcutils_error_handling_scope_init(rcutils_error_handling_scope_t * scope);
void rcutils_error_handling_scope_enter(rcutils_error_handling_scope_t * scope);
void rcutils_error_handling_scope_leave(rcutils_error_handling_scope_t * scope);

a function may request rcutils_set_error_state to behave differently between the _enter and _leave calls, both for its own code and for the call chain below it. In C++, RAII may be used to simplify API use even further.
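A minimal sketch of how such scopes might behave, assuming a thread-local depth counter and a policy of printing nested errors to stderr instead of overwriting the first one. Every scope_demo_* name below is hypothetical and only stands in for the proposed API; none of it exists in rcutils, and the real error state lives elsewhere.

```c
#include <stdbool.h>
#include <stdio.h>

#define SCOPE_DEMO_MSG_LEN 256

/* Hypothetical per-thread state, standing in for the rcutils error state. */
static _Thread_local int scope_demo_depth = 0;
static _Thread_local bool scope_demo_error_is_set = false;
static _Thread_local char scope_demo_error_msg[SCOPE_DEMO_MSG_LEN];
static _Thread_local int scope_demo_nested_count = 0;

/* Mirrors the proposed struct: one flag kept in the caller's local storage. */
typedef struct scope_demo_t
{
  bool active;
} scope_demo_t;

void scope_demo_init(scope_demo_t * scope)
{
  scope->active = false;
}

/* Entering twice through the same scope object counts only once. */
void scope_demo_enter(scope_demo_t * scope)
{
  if (!scope->active) {
    scope->active = true;
    ++scope_demo_depth;
  }
}

void scope_demo_leave(scope_demo_t * scope)
{
  if (scope->active) {
    scope->active = false;
    --scope_demo_depth;
  }
}

/* Stand-in for rcutils_set_error_state: inside a scope, an error raised while
 * another one is pending is printed rather than stored. */
void scope_demo_set_error(const char * msg)
{
  if (scope_demo_depth > 0 && scope_demo_error_is_set) {
    fprintf(stderr, "error while handling '%s': %s\n", scope_demo_error_msg, msg);
    ++scope_demo_nested_count;
    return;
  }
  snprintf(scope_demo_error_msg, sizeof(scope_demo_error_msg), "%s", msg);
  scope_demo_error_is_set = true;
}
```

With this policy, a finalization function would call scope_demo_enter() once after the first failure, keep running its cleanup steps, and call scope_demo_leave() before returning; the first error stays in place and later ones are reported on stderr.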

hidmic · 2020-07-27T20:08:42Z

FYI @clalancette @ivanpauno

wjwwood · 2020-08-18T17:42:23Z

My problem with a call chain is that you cannot know how deep it needs to be. If we have one then, in my opinion, it should be preallocated per thread and have a fixed depth, similar to how the error message itself has a fixed length and is preallocated per thread.
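To illustrate the constraint, here is a minimal sketch of a per-thread stack with a fixed depth and fixed-length messages, assuming no dynamic allocation is acceptable. The errstack_demo_* names and both size constants are hypothetical and not part of rcutils.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define ERRSTACK_DEMO_DEPTH 4      /* hypothetical fixed depth */
#define ERRSTACK_DEMO_MSG_LEN 128  /* hypothetical fixed message length */

typedef struct errstack_demo_entry_t
{
  char message[ERRSTACK_DEMO_MSG_LEN];
} errstack_demo_entry_t;

/* Storage is reserved per thread up front; nothing is allocated at run time. */
static _Thread_local errstack_demo_entry_t errstack_demo_entries[ERRSTACK_DEMO_DEPTH];
static _Thread_local size_t errstack_demo_size = 0;

/* Returns false when the stack is exhausted; the caller decides what to do. */
bool errstack_demo_push(const char * message)
{
  if (errstack_demo_size == ERRSTACK_DEMO_DEPTH) {
    return false;
  }
  snprintf(
    errstack_demo_entries[errstack_demo_size].message,
    ERRSTACK_DEMO_MSG_LEN, "%s", message);
  ++errstack_demo_size;
  return true;
}

/* Copies the most recent message into out; returns false if the stack is empty. */
bool errstack_demo_pop(char * out, size_t out_len)
{
  if (errstack_demo_size == 0) {
    return false;
  }
  --errstack_demo_size;
  snprintf(out, out_len, "%s", errstack_demo_entries[errstack_demo_size].message);
  return true;
}
```

The trade-off shows up in errstack_demo_push(): once the fixed depth is reached, further errors cannot be stored and have to be dropped or reported some other way.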

hidmic · 2020-08-18T19:09:29Z

I'm not sure I follow. The purpose of rcutils_error_handling_scope_t is not to store anything, but to aid changing error handling macros (e.g. RCUTILS_SET_ERROR_MSG) behavior whenever errors occur while handling another error. I guess we could generate full backtraces within each error handling "scope", but we can also start small and just print to stderr (like we already do explicitly in quite a few places).

gbiggs · 2020-08-19T00:22:29Z

My problem with a call chain is that you cannot know how deep it needs to be

Wouldn't it be possible to make a reasonable guess for how much we need, at least inside the ROS API? We can take a look at our call graph and see how deep it goes.

wjwwood · 2020-08-19T00:43:09Z

The purpose of rcutils_error_handling_scope_t is not to store anything, but to aid changing error handling macros (e.g. RCUTILS_SET_ERROR_MSG) behavior whenever errors occur while handling another error.

I think the goal should be to avoid that in the first place.

The error state is trivially copied, so I'd just recommend storing the error state locally before starting cleanup that might set the error state again. For example:

rcutils_ret_t rcutils_ret = rcutils_failing_function(...);
if (RCUTILS_RET_OK != rcutils_ret) {
  rcutils_error_state_t error_state = *rcutils_get_error_state();
  rcutils_reset_error();  // cannot fail
  rcl_ret_t ret = rcl_cleanup_function(...);
  if (RCL_RET_OK != ret) {
    RCUTILS_SAFE_FWRITE_TO_STDERR("[some context:42] failed to clean up foo while handling error: ");
    new_rcutils_print_error_state_to_stderr(&error_state);  // cannot fail, this is the first error state
  } else {
    rcutils_set_error_state(error_state.message, error_state.file, error_state.line_number);
  }
}

Now, that's obviously awful to write, and I think we should try to improve that workflow, but I don't think we should be pushing behavior into the error handling code, to be honest.

Maybe I would be more convinced if I saw your proposed API additions in action, with a before-and-after comparison showing how they improve a concrete use case.

hidmic · 2020-08-19T14:55:33Z

The error state is trivially copied, so I'd just recommend storing the error state locally before starting cleanup that might set the error state again.

That's precisely the kind of code that I'd like to avoid. Code gets cluttered very quickly, particularly for procedures that have to keep going after an error while not losing track of it (e.g. finalization functions).

Now, that's obviously awful to write, and I think we should try to improve that workflow, but I don't think we should be pushing behavior into the error handling code, to be honest.

That's fair. Thinking out loud, push/pop/print APIs for error states could be a more verbose alternative. Having an easy way to pop the current error state into a local variable OR to print it to stderr if it has already been popped into that local variable before would keep LOC count low for _fini() functions too.
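A rough sketch of what such a pop-or-print helper might look like, assuming a simplified error state with a fixed-length message. The errstate_demo_* names are hypothetical, and the thread-local variable only stands in for the real rcutils error state.

```c
#include <stdbool.h>
#include <stdio.h>

#define ERRSTATE_DEMO_MSG_LEN 256

typedef struct errstate_demo_t
{
  bool is_set;
  char message[ERRSTATE_DEMO_MSG_LEN];
} errstate_demo_t;

/* Hypothetical stand-in for the thread-local rcutils error state. */
static _Thread_local errstate_demo_t errstate_demo_current;

void errstate_demo_set(const char * msg)
{
  snprintf(
    errstate_demo_current.message, sizeof(errstate_demo_current.message), "%s", msg);
  errstate_demo_current.is_set = true;
}

/* Moves the current error into *saved if that slot is still empty; otherwise
 * prints the current error to stderr. The current error is cleared either way.
 * Returns true if there was an error to consume. */
bool errstate_demo_pop_or_print(errstate_demo_t * saved)
{
  if (!errstate_demo_current.is_set) {
    return false;
  }
  if (!saved->is_set) {
    *saved = errstate_demo_current;
  } else {
    fprintf(stderr, "additional error: %s\n", errstate_demo_current.message);
  }
  errstate_demo_current.is_set = false;
  return true;
}

/* Puts a previously saved error back so the caller of _fini() can see it. */
void errstate_demo_restore(const errstate_demo_t * saved)
{
  if (saved->is_set) {
    errstate_demo_current = *saved;
  }
}
```

A _fini() function could then declare one zero-initialized errstate_demo_t, call errstate_demo_pop_or_print() after each failing cleanup step, and call errstate_demo_restore() once before returning, which is one line per step rather than a save/reset/restore block.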

wjwwood · 2020-08-19T17:29:21Z

Thinking out loud, push/pop/print APIs for error states could be a more verbose alternative.

Print makes sense to me, but push/pop implies a stack, and a stack has to have a size, which has to be either fixed or dynamically allocated. If it is fixed, it can be exhausted, which may be OK. If it's dynamic, then that's a non-starter in my opinion, because these error handling functions need to be usable in places where memory allocations are not desired, and not allowed at all if we want to use them in real-time situations.

Having an easy way to pop the current error state into a local variable OR to print it to stderr if it has already been popped into that local variable before would keep LOC count low for _fini() functions too.

I have a hard time imagining how this would be used.

Like I said, a specific example where the code is clunky, showing how it could be improved with a new API, would help me understand.

hidmic added the enhancement (New feature or request) label Jul 27, 2020

hidmic added this to To do in Galactic via automation Jul 27, 2020

hidmic mentioned this issue Aug 18, 2020: Error handling often overwrites errors already set ros2/rcl#740 (Open)

hidmic mentioned this issue Oct 1, 2020: Ensure rmw_destroy_node() completes despite run-time errors. ros2/rmw_fastrtps#458 (Merged)

wjwwood removed this from To do in Galactic Mar 29, 2021

wjwwood added this to To do in Humble Hawksbill via automation Mar 29, 2021

clalancette removed this from To do in Humble Hawksbill Mar 28, 2022

clalancette added the backlog label Mar 28, 2022

clalancette mentioned this issue Apr 10, 2023: Add convenience error handling macros #421 (Merged)

christophebedard mentioned this issue Feb 28, 2024: Improve error message handling for spdlog ros2/rcl_logging#110 (Draft)
