restarting killed depletion simulations improperly saving h5 data #3387

@lewisgross1296


Bug Description

SEE TL;DR at end for concise summary

In #3272, I added the capability for depletion runs to continue an existing run, given that the initial timesteps/source rates found in depletion_results.h5 match those provided to the integrator requesting a continue run.

I found that the continue feature runs "successfully" once, but a second attempt at continuing the same simulation makes the error handling conclude that the continue run is improperly defined, i.e. it hits this block:

if not np.array_equal(prev_timesteps, timesteps[:num_prev]):
    raise ValueError(
        "You are attempting to continue a run in which the previous timesteps "
        "do not have the same initial timesteps as those provided to the "
        "Integrator. Please make sure you are using the correct timesteps."
    )

This happened despite using identical data in the first, second and third run... perplexing. I was getting this issue every time I attempted a second continue.

I think I've narrowed down the issue to the writing of the depletion_results.h5 file. I modified the Integrator class source code locally to print out what the class thought all the data was and discovered that one of the timesteps in the depletion_results.h5 did not match the actual dt supplied at that step.

For example, here are the timesteps passed to the integrator in my depletion run script:

This simulation will deplete at 225000.0 W with the following time steps (units of days)
[1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 15.0, 15.0, 15.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0]

When attempting the second continue run, I printed some timestep checks inside of openmc/deplete/abc.py. Here is the printed value of prev_timesteps, which is declared here:

# Get timesteps and source rates from previous results
prev_times = operator.prev_res.get_times(timestep_units)
prev_source_rates = operator.prev_res.get_source_rates()
prev_timesteps = np.diff(prev_times)

along with timesteps[:num_prev] from the if condition in the earlier snippet:

prev_timesteps = 
 [  1.   1.   1.   1.   1.   5.   5.   5.  15.  15.  15.  60.  60. 120.  60.  60.  60.  60.  60.  60.  60.  60.  60.  60.  60.  60.  60.  60.]
timesteps[:num_prev] = 
 [1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 15.0, 15.0, 15.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0, 60.0]

Notice that there is a 120 in the timesteps from depletion_results.h5 where there is a 60 in the timesteps provided to the integrator.
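The mismatch can also be located programmatically rather than by eye. The following is a minimal sketch, assuming only NumPy; the two arrays are transcribed from the printed output above, and the variable names `requested` and `mismatch` are mine, not names from the OpenMC source.

```python
import numpy as np

# Timesteps recovered from depletion_results.h5 (days), transcribed from the printout
prev_timesteps = np.array(
    [1, 1, 1, 1, 1, 5, 5, 5, 15, 15, 15, 60, 60, 120] + [60] * 14, dtype=float
)
# Matching slice of the timesteps passed to the integrator (days)
requested = np.array([1.0] * 5 + [5.0] * 3 + [15.0] * 3 + [60.0] * 17)

# Same comparison the continue check performs
arrays_match = np.array_equal(prev_timesteps, requested)

# 0-based indices where the stored step differs from the requested step
mismatch = np.flatnonzero(prev_timesteps != requested)

print("arrays match:", arrays_match)
for i in mismatch:
    print(f"step {i}: stored {prev_timesteps[i]} d, requested {requested[i]} d")
```

With the data above, the only differing entry is at 0-based index 13, i.e. the 14th step.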

Looking at the HPC run, I noticed that the 120 falls exactly on the last step of the first run. This makes me suspect that the first continue run mistakenly increments the time at that step during its first write to the h5 file.
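To illustrate how such an extra increment would produce the observed array, here is a small NumPy-only sketch. It is a toy model of the hypothesis, not OpenMC code: it assumes the first run covered the first 14 steps (the position of the 120) and that every time point stored after that boundary is shifted by one extra 60-day step. The names `n_first_run`, `expected_times`, and `stored_times` are hypothetical.

```python
import numpy as np

# First 28 timesteps passed to the integrator (days)
requested = np.array([1.0] * 5 + [5.0] * 3 + [15.0] * 3 + [60.0] * 17)

# Assumption: the first run ended after 14 steps, where the 120 appears
n_first_run = 14

# Cumulative time points a correct results file would hold
expected_times = np.concatenate(([0.0], np.cumsum(requested)))

# Hypothesized corruption: time points written by the continue run are
# offset by one extra copy of the last step of the first run
stored_times = expected_times.copy()
stored_times[n_first_run:] += requested[n_first_run - 1]

# This mirrors how prev_timesteps is derived from the stored times
recovered = np.diff(stored_times)
print(recovered)
```

Under these assumptions, `recovered` has a single 120 at the boundary step and matches the requested timesteps everywhere else, which is the pattern seen in the printout. This only shows the symptom is consistent with the suspicion; it does not identify where in the write path the extra increment would occur.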

Steps to Reproduce

The fastest way to reproduce is to launch a depletion simulation with two or more steps. Then launch a continue run with the same initial steps plus two additional steps. Launching another properly defined continue run with any number of new steps should then trigger the error message about continuing a run with invalid steps. I can add an MWE for this soon.

Environment

Initially found in an HPC environment using OpenMC v0.15.2. I reproduced it locally with this commit
