This commit changes the semantics for lost movers, to avoid a segfault #121
Conversation
Previously, if movers were lost the following would happen:
1) The user would get a warning.
2) The particle->i value would be shifted by move_p to be out of bounds.
3) The particle never made it to boundary_p to be processed.
This either teleported the particle across the domain or caused a segfault in the next timestep.
This PR:
1) Adds notes telling the user how serious that warning is.
2) Undoes the shift, so instead of segfaulting the code now "holds" the particle at the cell boundary.
This breaks the physics, but to a lesser extent than the previous teleporting of a particle. It is still crucial that the user supplies a correct value for nm.
To recreate these issues, use a deck where the initial nm is set to 1 during define_species in the input deck.
Codecov Report
@@ Coverage Diff @@
## devel #121 +/- ##
==========================================
- Coverage 83.14% 83.13% -0.02%
==========================================
Files 118 118
Lines 7287 7287
Branches 1124 1124
==========================================
- Hits 6059 6058 -1
- Misses 752 753 +1
Partials 476 476
Question to reviewers: Fixing this segfault makes it more likely we'll burn CPU time on invalid simulations. Do we want to add some sort of optional detection that causes an ERROR instead? If so, do we want that exit to happen after diagnostics etc. have happened?
I advocate making this an error rather than a warning, even though that is not the traditional behavior of VPIC.

It is possible that if only a few particles are dropped, they won't significantly affect the physics, and you could "save" a simulation by continuing. But that is unlikely, because it requires a "just right" mover size that is slightly less than the minimum needed. If you're that good at picking mover sizes, you should have no problem always having enough. More likely than saving a simulation, I think, is wasting compute time on a simulation that is already wrong. Further, users probably don't search their output for warnings, and so might not even know there is a problem. I think this makes it way too easy for wrong results to be trusted.

More philosophically, I think that when VPIC errs in its calculation of the physics, it should present an error to the user. Holding the particles back instead of dropping them is probably massively better for the physics, but it's still wrong. This could be useful for test simulations that won't resolve the physics anyway, but I don't think continuing should be allowed unless the user specifically enables a physics-breaking option.
Ideally the user wouldn't have to worry about such an arcane thing as how many particles might be passed among ranks in any given timestep, but resizing arrays inside …
The most important thing to me is that we do not segfault, because at that point a lot of debugging starts. I am totally OK with making this an error.

I am not sure if we want to leave in the unshifting code for the adventurous user who might want to downgrade the error to a warning if they are caught in a strange corner, or if we should keep the code simple and just die, telling the user to arrange for more movers.

Ideally the user would never have to worry about picking the right number of movers, but I have seen several cases where the automatic number of "-1" doesn't work well. Is there at least a way to suggest a better number of movers when we error out? So that the user gets an error message like "Pipeline %i (species = %s) ran out of storage for %i movers. Try increasing the movers in define_species to %i. If you don't have sufficient memory to do that, increase the number of nodes."

If we do stick with the warning, we should probably mention it at the end of the simulation, saying "stuck particles" instead of "normal exit" or something.
OK, it sounds like we all agree. Plan:
@petschge @SVLuedtke-LANL can one of you give me a sanity check on this from a low-level code perspective please?
Other than those two small errors, it looks sane to me.