Signal-based checkpoint sometimes outputs unconverged solution #25755
Labels
C: Framework
P: normal
A defect affecting operation with a low possibility of significantly affects.
T: defect
An anomaly, which is anything that deviates from expectations.
Comments
pbehne added the P: normal and T: defect labels on Oct 16, 2023
nmnobre
added a commit
to farscape-project/moose
that referenced
this issue
Oct 16, 2023
@pbehne what are you charging for this work? The indirect number?
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Oct 18, 2023
This commit patches the issue (idaholab#25755) where a checkpoint can write an unconverged solution depending on when the USR1 signal is received, thereby affecting the accuracy of recovered simulations. The fix 1) adds logic that ensures a checkpoint is not output unless _current_execute_flag is contained in _execute_on, and 2) enforces that Checkpoint’s ‘execute_on’ parameter may only be set to ‘TIMESTEP_END’ so that only converged solutions are output. Exodiff tests are added to ensure that simulations recovered from signal-based checkpoints result in the same solutions as the uninterrupted simulations.
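The two parts of the fix described in this commit message can be sketched in isolation as follows. This is only a minimal, self-contained illustration and not the actual MOOSE code: the struct `CheckpointSketch`, its members, and the use of plain strings for execute flags are all hypothetical stand-ins for `_current_execute_flag` and `_execute_on`, and the sketch only models signal-triggered output.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for an execute flag; strings are used only for illustration.
using ExecFlag = std::string;

// Hypothetical model of a signal-triggered checkpoint output (not the MOOSE class).
struct CheckpointSketch
{
  std::set<ExecFlag> execute_on{"TIMESTEP_END"};
  bool signal_pending = false; // set when a USR1 signal has been received

  // Part 2 of the fix: only TIMESTEP_END is an acceptable execute_on value.
  static bool validExecuteOn(const std::set<ExecFlag> & flags)
  {
    return flags == std::set<ExecFlag>{"TIMESTEP_END"};
  }

  // The assert stands in for the parameter error a real framework would raise.
  void setExecuteOn(const std::set<ExecFlag> & flags)
  {
    assert(validExecuteOn(flags));
    execute_on = flags;
  }

  // Part 1 of the fix: write nothing unless the current flag is in execute_on.
  // A pending signal is kept until a permitted flag comes around.
  bool shouldOutput(const ExecFlag & current_execute_flag)
  {
    if (execute_on.count(current_execute_flag) == 0)
      return false;
    if (!signal_pending)
      return false;
    signal_pending = false;
    return true;
  }
};
```

With this guard, a request that arrives while the hypothetical current flag is `NONLINEAR` stays pending and is only honored once `shouldOutput("TIMESTEP_END")` is called.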
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Oct 19, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Oct 19, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Oct 23, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Oct 27, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Oct 27, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Oct 30, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Oct 31, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Nov 30, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Dec 1, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Dec 14, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Dec 14, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Dec 18, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Dec 19, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Dec 19, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Dec 19, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Dec 20, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Dec 20, 2023
pbehne
added a commit
to pbehne/moose
that referenced
this issue
Dec 20, 2023
@lindsayad: yes.
maxnezdyur
pushed a commit
to maxnezdyur/moose
that referenced
this issue
Jan 5, 2024
Please assign this issue to me, as I have a fix and am updating tests.
Bug Description
This bug is regarding MOOSE's ability to output a checkpoint when receiving a signal via the command
kill -s USR1 <PID>
If the signal is sent while MOOSE is busy solving a time step, a checkpoint of an unconverged solution is output. When the problem is restarted using this checkpoint, MOOSE assumes that the checkpoint contains the converged solution for the time step at the time of checkpoint writing, and proceeds to solve the next time step using the unconverged solution as the solution for the preceding time step. This results in an incorrect solution for all time steps after the checkpoint.
Steps to Reproduce
The attached test.txt input file has a postprocessor that reports the average value of the solution at each time step. This input should first be run in its entirety to obtain the correct pp values. During this initial run, a signal-based checkpoint should be created using the kill command specified above. To observe the incorrect behavior, the signal should be sent while MOOSE is in the middle of solving a time step. Next, the problem should be restarted using the --recover command line input. If the checkpoint recorded an unconverged solution, the pp values will be different from the initial run for time steps after the signal was sent. If they are not, then the signal was not sent at the "correct" time. Try again, and if not successful, increase the mesh refinement to slow down the problem to time the signal better.
Impact
This bug has the effect that restarting simulations from signal-based checkpoints can result in incorrect results, depending on when the signal was sent.
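The timing dependence described in the bug report can be illustrated with a toy time-stepping loop. The sketch below is not MOOSE code and makes several assumptions: a POSIX system where `SIGUSR1` is available, a made-up relaxation "solve" whose converged value at each step equals the step number, and hypothetical names (`onCheckpointSignal`, `run`, `Checkpoint`). The handler only records that a checkpoint was requested; the checkpoint itself is written after the inner solve loop finishes, so it never captures a partially converged value.

```cpp
#include <cassert>
#include <csignal>
#include <vector>

// Set by the handler, read by the time loop. Hypothetical name.
static volatile std::sig_atomic_t g_checkpoint_requested = 0;

// The handler does nothing except record the request.
extern "C" void onCheckpointSignal(int) { g_checkpoint_requested = 1; }

struct Checkpoint
{
  int step;
  double solution;
};

// Toy transient run. The signal is raised from inside the solve of
// `signal_step` at iteration `signal_iter` to emulate `kill -s USR1 <PID>`
// arriving mid-solve.
std::vector<Checkpoint> run(int num_steps, int signal_step, int signal_iter)
{
  assert(num_steps > 0);
  std::signal(SIGUSR1, onCheckpointSignal);

  std::vector<Checkpoint> written;
  double u = 0.0;
  for (int step = 1; step <= num_steps; ++step)
  {
    const double target = step; // converged value for this toy step
    for (int it = 0; it < 50; ++it)
    {
      if (step == signal_step && it == signal_iter)
        std::raise(SIGUSR1);
      u += 0.5 * (target - u); // one relaxation iteration
      // Deliberately no checkpoint here: u is not converged yet.
    }
    // End of the time step: u is converged, so honoring the request is safe.
    if (g_checkpoint_requested)
    {
      g_checkpoint_requested = 0;
      written.push_back({step, u});
    }
  }
  return written;
}
```

Calling `run(3, 2, 5)` requests a checkpoint during the sixth iteration of step 2, but the stored value is the one reached at the end of that step. Writing the checkpoint at the point where the signal arrives would instead store an intermediate value, which is the failure mode described above.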
test.txt