Provide an error throw class #79

Closed
crtrott opened this issue Sep 4, 2015 · 8 comments
Labels
Feature Request Create new capability; will potentially require voting

Comments

@crtrott
Member

crtrott commented Sep 4, 2015

Based on discussions with production teams, we have a use case for an error catch class.
The problem is that these codes rely heavily on throw/catch mechanisms, not just to abort but also to do things like reruns. They also want to catch more than a single error per run. Imagine, for example, that a model runs into trouble because of a faulty setup. In that case, all problematic elements should be identified in a single run, rather than only the first one.

One idea is to have a class into which one can push_back errors. It could be based on a linked list, which can be made thread-safe. A catch mechanism after the parallel_for can then walk that list and process all recorded errors.
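A minimal sketch of the linked-list idea, assuming a push-only list in which each `push_back` links a new node in at the head with an atomic compare-and-swap (a Treiber-style stack). `ErrorList` and `find_bad_elements` are hypothetical names, not Kokkos API, and plain `std::thread` workers stand in for a `parallel_for`.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical push-only error list (not a Kokkos API). Each push links a new
// node in at the head with a compare-and-swap, so many threads can record
// errors at the same time without a lock.
class ErrorList {
 public:
  struct Node {
    std::string message;
    Node* next;
  };

  ErrorList() = default;
  ErrorList(const ErrorList&) = delete;
  ErrorList& operator=(const ErrorList&) = delete;
  ~ErrorList() { clear(); }

  // Safe to call concurrently from any number of threads.
  void push_back(std::string message) {
    Node* node = new Node{std::move(message), head_.load(std::memory_order_relaxed)};
    // On failure, compare_exchange_weak reloads the current head into
    // node->next, so we simply retry until our node is linked in.
    while (!head_.compare_exchange_weak(node->next, node,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
    }
  }

  // Call only after the parallel region has finished. Order is newest first.
  std::vector<std::string> collect() const {
    std::vector<std::string> out;
    for (Node* n = head_.load(std::memory_order_acquire); n != nullptr; n = n->next)
      out.push_back(n->message);
    return out;
  }

  // Frees all nodes; must not run concurrently with push_back.
  void clear() {
    Node* n = head_.exchange(nullptr, std::memory_order_acquire);
    while (n != nullptr) {
      Node* next = n->next;
      delete n;
      n = next;
    }
  }

 private:
  std::atomic<Node*> head_{nullptr};
};

// Usage: worker threads (standing in for a parallel_for) flag every bad element.
std::vector<std::string> find_bad_elements(const std::vector<int>& values,
                                           std::size_t num_threads) {
  assert(num_threads > 0);
  ErrorList errors;
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      for (std::size_t i = t; i < values.size(); i += num_threads)
        if (values[i] < 0)
          errors.push_back("element " + std::to_string(i) + " is negative");
    });
  }
  for (auto& w : workers) w.join();
  return errors.collect();  // every failure, not just the first one
}
```

Because nodes are only freed after the loop has finished, the usual ABA and memory-reclamation problems of lock-free stacks do not arise for a push-only list like this.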

@crtrott crtrott added the Feature Request Create new capability; will potentially require voting label Sep 4, 2015
@mhoemmen
Contributor

mhoemmen commented Sep 5, 2015

I would be interested in learning more about the use case. In Tpetra, I usually handle cases like this with custom code that collects all (or some subset) of the "bad" values inside the loop, but reports them outside the loop. If one of the goals is to stop processing early, then what people really want is early termination of a parallel kernel.
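To make the "collect inside, report outside" pattern concrete, here is a minimal sketch under stated assumptions: plain `std::thread` workers stand in for a parallel kernel, "bad" is taken to mean a non-finite `double`, and `collect_bad_indices` is a hypothetical name (this is not Tpetra code).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// "Collect inside the loop, report outside the loop": each thread appends
// the indices of bad values to its own vector, so the loop body needs no
// locks or atomics. The per-thread results are merged after join().
std::vector<std::size_t> collect_bad_indices(const std::vector<double>& x,
                                             std::size_t num_threads) {
  assert(num_threads > 0);
  std::vector<std::vector<std::size_t>> per_thread(num_threads);
  std::vector<std::thread> workers;
  const std::size_t chunk = (x.size() + num_threads - 1) / num_threads;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      const std::size_t begin = std::min(x.size(), t * chunk);
      const std::size_t end = std::min(x.size(), begin + chunk);
      for (std::size_t i = begin; i < end; ++i)
        if (!std::isfinite(x[i]))  // "bad" means NaN or +/-Inf in this sketch
          per_thread[t].push_back(i);
    });
  }
  for (auto& w : workers) w.join();

  // Outside the parallel loop: concatenate in thread order. Because each
  // thread owns one contiguous chunk, the result is sorted by index.
  std::vector<std::size_t> bad;
  for (const auto& v : per_thread) bad.insert(bad.end(), v.begin(), v.end());
  return bad;
}
```

Since each thread writes only to its own vector, nothing is shared inside the loop; the price is a single merge step after the kernel.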

@crtrott
Member Author

crtrott commented Sep 5, 2015

We already have a function that comes as close as we can get to aborting a kernel. I believe this class is meant as a way to collect multiple errors. What I envision is something similar to this:

Kokkos::parallel_for("MyKernel", n, KOKKOS_LAMBDA(const int i) {
  // ...
  if (error_condition) {
    Kokkos::collect_error("My Error Message");  // proposed API, does not exist yet
    // do something because of the error
  }
});

Internally, parallel_for would check the state of the error collector after the kernel; if any errors were recorded, it could print them and throw, and the exception could carry the kernel name.
For the error messages themselves, we might even be able to determine internally which thread reported each one.
Furthermore, there should be a function that you can use after the kernel to iterate over all errors.
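One way to read this proposal, sketched with hypothetical names (`ErrorCollector`, `checked_parallel_for`; neither is Kokkos API) and `std::thread` standing in for a Kokkos backend: the wrapper tags each report with the calling thread's index, and after the kernel it throws a single `std::runtime_error` that names the kernel and lists every error. A mutex guards the list here purely for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <sstream>
#include <stdexcept>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical error record and collector (not a Kokkos API).
struct ErrorRecord {
  std::size_t thread_id;  // filled in by the wrapper, not by the user
  std::string message;
};

class ErrorCollector {
 public:
  void collect_error(std::size_t thread_id, std::string msg) {
    std::lock_guard<std::mutex> lock(mutex_);
    errors_.push_back(ErrorRecord{thread_id, std::move(msg)});
  }
  // For use after the kernel, once all threads have finished.
  const std::vector<ErrorRecord>& errors() const { return errors_; }

 private:
  std::mutex mutex_;
  std::vector<ErrorRecord> errors_;
};

// Hypothetical stand-in for parallel_for: runs body(i, report) for i in
// [0, n) on num_threads threads. `report` tags each message with the calling
// thread's index, so the user only passes a string. After the kernel, any
// recorded errors are turned into a single exception naming the kernel.
template <class Body>
void checked_parallel_for(const std::string& kernel_name, std::size_t n,
                          std::size_t num_threads, Body body) {
  assert(num_threads > 0);
  ErrorCollector collector;
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      auto report = [&collector, t](std::string msg) {
        collector.collect_error(t, std::move(msg));
      };
      for (std::size_t i = t; i < n; i += num_threads) body(i, report);
    });
  }
  for (auto& w : workers) w.join();

  if (!collector.errors().empty()) {
    std::ostringstream os;
    os << "kernel '" << kernel_name << "' reported "
       << collector.errors().size() << " error(s)";
    for (const ErrorRecord& e : collector.errors())
      os << "\n  [thread " << e.thread_id << "] " << e.message;
    throw std::runtime_error(os.str());
  }
}
```

Whether the wrapper should throw at all, or merely expose `errors()` to the caller, is exactly the design question the following comments debate; the throw here is just one option.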

@hcedwar
Contributor

hcedwar commented Sep 8, 2015

Using exceptions for expected (non-error / non-exceptional) conditions is a really bad design idiom. Exceptions should be restricted to "something better than abort" usage. We can discuss a feature with respect to requirements, but immediately jumping to exceptions is a bad path.

@nmhamster
Contributor

The use case here is calling into virtual functions (which may include wrapped calls to Fortran). The components being called may need to report errors back to the caller, but do so in unstructured ways (i.e., there are no common return codes). The customer who presented this issue currently throws an exception when the first error is received; this causes a fix-up routine to run, and then the whole computation is rerun. What is desired is to collect all of the "throwing" cases. In OpenMP, exceptions cannot propagate across thread boundaries, so this cannot be parallelized effectively in the customer's prototype code. The proposal is to have a Kokkos class that lets them collect the errors as they occur and then inspect them at the end of the kernel. I think the inspection of the collected errors should let the developer decide what to do, i.e., whether /they/ want to throw an exception, rerun code, etc. That decision doesn't need to be in Kokkos.
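A minimal sketch of the "catch locally, inspect later" pattern using standard C++ only. Assumptions: `std::thread` stands in for OpenMP threads, `evaluate_element` is a hypothetical component that signals problems by throwing, and `run_all` / `count_runtime_errors` are illustrative names. Each worker catches inside its own thread and stores the exception as a `std::exception_ptr`, leaving the policy decision to the caller.

```cpp
#include <cassert>
#include <exception>
#include <mutex>
#include <stdexcept>
#include <string>
#include <thread>
#include <vector>

// Hypothetical component the kernel calls (e.g. a virtual function that
// wraps Fortran); it signals problems by throwing.
void evaluate_element(int i) {
  if (i % 3 == 0) throw std::runtime_error("bad setup in element " + std::to_string(i));
}

// An exception must not escape a worker thread (std::thread calls
// std::terminate; OpenMP requires it to be caught by the same thread inside
// the parallel region). So each worker catches locally and stores the
// exception; the caller inspects all of them after the loop.
std::vector<std::exception_ptr> run_all(int n, int num_threads) {
  assert(num_threads > 0);
  std::vector<std::exception_ptr> failures;
  std::mutex m;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      for (int i = t; i < n; i += num_threads) {
        try {
          evaluate_element(i);
        } catch (...) {
          std::lock_guard<std::mutex> lock(m);
          failures.push_back(std::current_exception());
        }
      }
    });
  }
  for (auto& w : workers) w.join();
  return failures;
}

// Caller-side policy, outside the parallel region. Here we only count the
// runtime_errors; a real code might run a fix-up routine and rerun instead.
int count_runtime_errors(const std::vector<std::exception_ptr>& failures) {
  int count = 0;
  for (const auto& p : failures) {
    try {
      std::rethrow_exception(p);
    } catch (const std::runtime_error&) {
      ++count;
    } catch (...) {
    }
  }
  return count;
}
```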

@crtrott
Member Author

crtrott commented Sep 8, 2015

Yup. My use of the word "throw" was a bit careless. I didn't mean that Kokkos should actually trigger an exception. This is mainly about collecting errors in a parallel environment and letting users outside that parallel region inspect them and decide what to do.

@mrtupek

mrtupek commented Oct 8, 2015

We would be OK with having a class that we can push back errors to. In fact, we would even be OK if that class were allocated and sized up front, with error codes/strings pushed back until that space is used up. Any errors after that can be ignored. Outside the loop, we would then process those error messages and do what we will with them.
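The preallocated variant maps naturally onto a fixed array plus an atomic counter. The sketch below assumes integer error codes (strings would work the same way) and uses hypothetical names (`FixedErrorBuffer`, `flag_negatives`); it is an illustration of the idea, not the class that was eventually merged.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical fixed-capacity error buffer (not a Kokkos API). Storage is
// allocated once, before the parallel loop; each report claims a slot with a
// single atomic fetch_add. Reports that arrive after the buffer is full are
// dropped, but still counted, so the caller knows how many were lost.
struct ErrorEntry {
  int code;
  std::size_t index;  // loop iteration that reported the error
};

class FixedErrorBuffer {
 public:
  explicit FixedErrorBuffer(std::size_t capacity) : slots_(capacity) {}

  // Thread-safe. Returns false if the error was dropped because the buffer is full.
  bool push_back(int code, std::size_t index) {
    const std::size_t slot = next_.fetch_add(1, std::memory_order_relaxed);
    if (slot >= slots_.size()) return false;
    slots_[slot] = ErrorEntry{code, index};  // each slot has exactly one writer
    return true;
  }

  // The accessors below are only meaningful after all writers have joined.
  std::size_t num_reported() const { return next_.load(); }
  std::size_t num_stored() const { return std::min(num_reported(), slots_.size()); }
  const ErrorEntry& operator[](std::size_t k) const {
    assert(k < num_stored());
    return slots_[k];
  }

 private:
  std::vector<ErrorEntry> slots_;
  std::atomic<std::size_t> next_{0};
};

// Usage: flag every negative value; at most the buffer's capacity is kept.
void flag_negatives(const std::vector<int>& v, std::size_t num_threads,
                    FixedErrorBuffer& errors) {
  assert(num_threads > 0);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&, t] {
      for (std::size_t i = t; i < v.size(); i += num_threads)
        if (v[i] < 0) errors.push_back(/*code=*/1, i);
    });
  }
  for (auto& w : workers) w.join();
}
```

A single atomic increment per report keeps the loop body cheap, and comparing `num_reported()` with `num_stored()` afterwards tells the caller whether any errors were dropped.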

@crtrott crtrott added this to the Backlog milestone Nov 23, 2015
@hcedwar
Contributor

hcedwar commented Nov 8, 2016

ISO C++ Standards Committee decision on exception handling within C++17 parallel algorithms:
How exceptions thrown within the body of a parallel algorithm are handled is a trait/property of the execution policy.

@crtrott crtrott self-assigned this Nov 29, 2016
@crtrott crtrott modified the milestones: END 2016, Backlog Nov 29, 2016
@crtrott
Member Author

crtrott commented Nov 29, 2016

OK, I just merged in Patrick's error reporter class, which should get us reasonably far. It is based on pull request #518.


5 participants