errors do not cancel computation -> endless loop #44

Closed
MaximilianPi opened this issue Jan 19, 2024 · 5 comments
@MaximilianPi

Hi @mihaiconstantin,

Errors in the workers do not seem to abort the computation and instead result in an infinite loop (tested on macOS and Linux):

backend = parabar::start_backend(3L)
parabar::configure_bar(type = "modern", format = ":percent :eta", width = round(getOption("width") / 2), clear = FALSE)
results_tuning <- parabar::par_lapply(backend, 1:10, function(i) {
  print() # first error
  stop("Error...") 
  return(0)
})
print("End")
parabar::stop_backend(backend)

It works if you interrupt and rerun it a second time with the same workers/backend.
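Until a fix is available, one possible workaround is to make sure a task never throws, by catching the error inside the task and returning it as a value. The sketch below is only an illustration under that assumption: `safe_task` and the `task_failure` class are hypothetical names, and base `lapply` stands in for `parabar::par_lapply(backend, ...)` so the snippet runs without a backend. Whether this fully avoids the hang with a real backend has not been verified in this thread.

```r
# Hypothetical helper: wrap a task so that errors are returned as values
# instead of being thrown.
safe_task <- function(task) {
    function(x) {
        tryCatch(
            task(x),
            error = function(e) structure(
                list(input = x, message = conditionMessage(e)),
                class = "task_failure"
            )
        )
    }
}

# A task that fails for one particular input.
task <- function(x) {
    if (x == 3) stop("Error...")
    x * 2
}

# Base `lapply` stands in for `parabar::par_lapply(backend, 1:5, ...)`.
results <- lapply(1:5, safe_task(task))

# After the run has finished, find out which inputs failed.
failed <- vapply(results, inherits, logical(1), what = "task_failure")
which(failed)
```

Because every task returns normally, each one can still be counted as processed, and the failures can be inspected once the run is complete.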

@mihaiconstantin added the bug label on Jan 30, 2024
@mihaiconstantin self-assigned this on Jan 30, 2024
@mihaiconstantin
Owner

Thanks for reporting this! Do you know if this behavior also occurs when using the R6Class API?

@mihaiconstantin
Owner

Consider for a moment the code below:

# Load the package (the classes used below are exported by parabar).
library(parabar)

# Specification instance.
specification <- Specification$new()

# Specification details.
specification$set_cores(cores = 3)
specification$set_type(type = "psock")

# Backend instance.
backend <- AsyncBackend$new()

# Start the backend.
backend$start(specification)

# Run the task.
backend$sapply(1:10, function(x) {
    stop("First intended error.")
    stop("Second intended error.")
    return(0)
})

# Read the output.
backend$get_output(wait = TRUE)

# Stop it.
backend$stop()

The call backend$get_output(wait = TRUE) successfully reports that an error has occurred in the sub-session:

Error: ! in callr subprocess.
Caused by error in `checkForRemoteErrors(val)`:
! 3 nodes produced errors; first error: First intended error.

This is what we expect to see because in AsyncBackend.R we check the sub-session for errors and raise them in the interactive session (i.e., as seen in the lines below):

parabar/R/AsyncBackend.R

Lines 269 to 273 in 8bbeaab

# If an error occurred in the session.
if (!is.null(output$error)) {
    # Throw error in the main session.
    Exception$async_task_error(output$error)
}
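The pattern in that excerpt (inspect the output of the background session and re-raise any error in the main session) can be sketched in isolation as follows. This is a simplified illustration, not the package's code: `read_output` is a hypothetical function, and a plain list with `result` and `error` fields stands in for the real sub-session output.

```r
# Hypothetical stand-in for reading the output of a background session.
read_output <- function(output) {
    # If an error occurred in the session, re-raise it in the main session.
    if (!is.null(output$error)) {
        stop(
            "Task failed in background session: ",
            conditionMessage(output$error),
            call. = FALSE
        )
    }

    # Otherwise, hand back the result.
    output$result
}

# Simulated outputs: one successful run and one failed run.
ok <- list(result = c(0, 0, 0), error = NULL)
bad <- list(result = NULL, error = simpleError("First intended error."))

# The successful output is returned as is.
read_output(ok)

# The failed output surfaces as an ordinary error that can be caught.
message_seen <- tryCatch(
    read_output(bad),
    error = function(e) conditionMessage(e)
)
```

The point of the sketch is that the error is only raised when the output is read, which is why the backend alone reports it correctly.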

Since the backend works as intended, I tend to believe the issue is with the context classes in which this backend operates (i.e., maybe ProgressTrackingContext.R).

@mihaiconstantin
Owner

It looks like the problem is, indeed, with the progress tracking, and not with the backend.

While the tasks are being executed, each worker reports its progress. The interactive session monitors that progress through the .show_progress method of ProgressTrackingContext.R and displays it (e.g., as a progress bar). However, since each worker throws an error after its first task execution, subsequent executions are stopped and, consequently, no more progress is reported. Despite this, .show_progress keeps waiting for the number of processed tasks to reach the total, without knowing that no further tasks will be executed, i.e.:

# While there are still tasks to be processed.
while (tasks_processed < total) {
    # Get the current number of tasks processed.
    current_tasks_processed <- length(readLines(log, warn = FALSE))

We need to let .show_progress know when the tasks stop executing. Otherwise, the progress bar would just get stuck at the point in time where an error occurs. In your example this happens right at the beginning, but it can also happen later, e.g.:

function(x) {
    Sys.sleep(0.01)
    if (x == 50) {
        stop("First intended error.")
        stop("Second intended error.")
    }
    return(0)
}
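One way to let the monitoring loop know that the tasks have stopped is to check, on every iteration, whether the task is still running, and to leave the loop when it is not. The sketch below only illustrates that idea with base R; `monitor_progress` and the `is_running` callback are hypothetical names, and the thread does not show how #49 actually implements the fix.

```r
# Hypothetical monitor: poll a progress log until either all tasks are
# logged or the task is reported as no longer running.
monitor_progress <- function(log, total, is_running, wait = 0.01) {
    tasks_processed <- 0

    repeat {
        # Check liveness before reading, so that lines written right before
        # the task ended are still counted by the read below.
        running <- is_running()

        # Get the current number of tasks processed.
        tasks_processed <- length(readLines(log, warn = FALSE))

        # Stop when everything is done or nothing more can arrive.
        if (tasks_processed >= total || !running) break

        Sys.sleep(wait)
    }

    tasks_processed
}

# Simulate a run of 10 tasks that logged 4 completions and then failed.
log <- tempfile()
writeLines(rep("task", 4), log)

processed <- monitor_progress(log, total = 10, is_running = function() FALSE)
unlink(log)

processed
```

With the liveness check in place the loop returns after the failure instead of waiting for the remaining six tasks, and the caller can then read the output to surface the error.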

@mihaiconstantin
Owner

@MaximilianPi, this is now fixed in #49 and will be in the next release.

@MaximilianPi
Author

Great, thanks! @mihaiconstantin
