Provide an error throw class #79
Comments
I would be interested in learning more about the use case. I usually handle stuff like this in Tpetra with custom code that collects all (or some subset) of the "bad" values inside of the loop, but reports them outside of the loop. If one of the goals is to stop processing early, then what people really want is early termination of a parallel kernel.
We already have a function that comes as close as we can to aborting a kernel. I believe this class is meant as a way to collect multiple errors. What I would envision is something similar to this: parallel_for(..) { And internally, in the parallel_for, we would check the state of the error collector; if any errors were recorded in the kernel, we can print them and throw there. The throw could use the kernel name.
Using exceptions for expected (non-error / non-exceptional) conditions is a really bad design idiom. Exceptions should be restricted to "something better than abort" usage. We can discuss a feature with respect to requirements, but immediately jumping to exceptions is a bad path.
The use case here is calling into virtual functions (which may include wrapped calls to Fortran). The components being called may then need to report errors to the caller, but will do so in unstructured ways (i.e. no common return codes). The customer who presented this issue currently throws on the first error; this causes a fix-up routine to run, and then the whole computation is re-run. What is desired is to collect all the "throwing" cases. Exceptions cannot propagate across thread boundaries in OpenMP, so this cannot be effectively parallelized in the prototype code the customer is working on. The proposal is to have a Kokkos class which allows them to collect the errors as they occur and then inspect them at the end of the kernel. I think the inspection of the collected errors should allow the developer to decide what to do, i.e. whether /they/ want to throw an exception, re-run code, etc. That decision doesn't need to be in Kokkos.
Yup. My use of the word "throw" was a bit careless. I didn't mean that Kokkos should actually trigger an exception. This is mainly about collecting errors in a parallel environment and letting users inspect them outside of the parallel region and decide what to do.
We would be OK with having a class that we can push_back errors to. In fact, we'd even be OK if that class was allocated/sized up front, with error codes/strings pushed back until that space is used up. Any error after that can be ignored. Outside the loop we'd then process those error messages and do what we will with them.
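A minimal sketch of the preallocated variant described in this comment, written in plain standard C++ rather than Kokkos. `ErrorRecord`, `ErrorCollector`, and `run_demo` are hypothetical names made up for illustration and are not Kokkos API; `std::thread` stands in for the parallel kernel. An atomic counter hands out slots, and anything past the capacity is dropped.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical record: which element failed and an application-defined code.
struct ErrorRecord {
  int element;
  int code;
};

// Hypothetical fixed-capacity collector (an illustration, not a Kokkos class).
class ErrorCollector {
 public:
  explicit ErrorCollector(std::size_t capacity)
      : slots_(capacity), attempts_(0) {}

  // Safe to call from several threads at once. Each call claims a unique
  // slot index; returns false if the buffer was full and the error dropped.
  bool push_back(const ErrorRecord& e) {
    const std::size_t i = attempts_.fetch_add(1, std::memory_order_relaxed);
    if (i >= slots_.size()) return false;
    slots_[i] = e;
    return true;
  }

  // Number of errors reported, including the dropped ones.
  std::size_t attempts() const { return attempts_.load(); }

  // Number of errors actually stored.
  std::size_t size() const { return std::min(attempts(), slots_.size()); }

  // Only read after the parallel region has finished (threads joined).
  const ErrorRecord& operator[](std::size_t i) const {
    assert(i < size());
    return slots_[i];
  }

 private:
  std::vector<ErrorRecord> slots_;
  std::atomic<std::size_t> attempts_;
};

// Stand-in for a parallel loop: every 10th element is treated as "bad".
std::size_t run_demo(int num_elements, int num_threads, ErrorCollector& errors) {
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&errors, t, num_elements, num_threads] {
      for (int i = t; i < num_elements; i += num_threads) {
        if (i % 10 == 0) errors.push_back(ErrorRecord{i, 1});
      }
    });
  }
  for (auto& w : workers) w.join();  // join makes the stored records visible
  return errors.size();
}
```

After `run_demo` returns, the caller can loop over the stored records and decide what to do (print, re-run, throw). Comparing `attempts()` with `size()` tells the caller whether any errors were dropped because the buffer filled up.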
ISO/C++ Standard committee decision on exception handling within C++17 parallel algorithms:
OK, I just merged in Patrick's error reporter class, which should get us reasonably far. This is based on pull request #518.
Based on discussions with production teams, we have a use case for an error catch class.
The problem is that the codes heavily use throw/catch mechanisms not just to abort but also to do things like reruns. They also want to catch more than a single error in a run. Imagine, for example, that a model runs into trouble because of a faulty setup. In this case, all problematic elements should be identified in a single run instead of just the first one.
One idea is to have a class into which one can push_back errors. This could be based on a linked list, which can be implemented in a thread-safe way. A catch mechanism after the parallel_for can then go through that list and process all thrown errors.
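One way the linked-list idea could look, as a host-only sketch in standard C++ (not Kokkos code). `ErrorNode`, `ErrorList`, `report`, and `collect_bad_elements` are hypothetical names for illustration. New nodes are pushed onto the head with a compare-and-swap loop, so the list order is unspecified; traversal happens only after the threads have been joined. It relies on dynamic allocation inside the loop, which is an assumption that would not hold on every backend.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical list node holding one reported error.
struct ErrorNode {
  int element;
  std::string message;
  ErrorNode* next;
};

// Hypothetical lock-free, push-only error list (not a Kokkos class).
class ErrorList {
 public:
  ErrorList() : head_(nullptr) {}
  ErrorList(const ErrorList&) = delete;
  ErrorList& operator=(const ErrorList&) = delete;

  ~ErrorList() {
    ErrorNode* n = head_.load();
    while (n != nullptr) {
      ErrorNode* next = n->next;
      delete n;
      n = next;
    }
  }

  // Safe to call concurrently: link the new node in front of the current
  // head and retry if another thread changed the head in the meantime.
  void report(int element, std::string message) {
    ErrorNode* node = new ErrorNode{element, std::move(message),
                                    head_.load(std::memory_order_relaxed)};
    while (!head_.compare_exchange_weak(node->next, node,
                                        std::memory_order_release,
                                        std::memory_order_relaxed)) {
      // On failure, node->next has been updated to the current head.
    }
  }

  // Call only after the parallel region has finished.
  std::vector<int> failed_elements() const {
    std::vector<int> out;
    for (const ErrorNode* n = head_.load(std::memory_order_acquire);
         n != nullptr; n = n->next) {
      out.push_back(n->element);
    }
    return out;
  }

 private:
  std::atomic<ErrorNode*> head_;
};

// Stand-in for parallel_for plus the "catch" step: every 10th element is
// treated as bad; returns the failed element indices in sorted order.
std::vector<int> collect_bad_elements(int num_elements, int num_threads) {
  assert(num_threads > 0);
  ErrorList errors;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&errors, t, num_elements, num_threads] {
      for (int i = t; i < num_elements; i += num_threads) {
        if (i % 10 == 0) errors.report(i, "faulty setup");
      }
    });
  }
  for (auto& w : workers) w.join();
  std::vector<int> bad = errors.failed_elements();
  std::sort(bad.begin(), bad.end());  // insertion order is not deterministic
  return bad;
}
```

Because nothing is removed from the list while threads are still pushing, the usual ABA concern with lock-free stacks does not arise in this sketch. Unlike a preallocated buffer, the list never drops errors, at the cost of one allocation per report.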