New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Obs] 4.4 - Checkpointing #4352
Conversation
else: | ||
checkpoint_other_fn = f'{checkpoint_dir}/{chk_basename}.prev.json' | ||
|
||
return checkpoint_fn, checkpoint_other_fn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If two separate files are required, we should return an error if these are equal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. As written, it will still "work" as expected if you use the same filename for both. You just may get corruption if the process dies during the checkpointing process. Let me think about this.
- it's confusing and the user may be making a mistake
- a power user may not want two different files around and specifically wants to roll the dice with non-atomic checkpointing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test that uses the same name for both. Let me know if you think I should disallow this behavior. I don't have a strong opinion either way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is case (2) expected to be common? e.g. are these files particularly large, or are users expected to create many of them at once? Otherwise, I think the risk of confusion from (1) outweighs the utility of this behavior.
If case (2) is common, it may still be useful to guard it with an "are you sure?"-type flag, similar to the permit_terminal_measurements
guard on simulate_expectation_values
:
Cirq/cirq-core/cirq/sim/sparse_simulator.py
Lines 223 to 228 in fb43b84
if not permit_terminal_measurements and program.are_any_measurements_terminal(): | |
raise ValueError( | |
'Provided circuit has terminal measurements, which may ' | |
'skew expectation values. If this is intentional, set ' | |
'permit_terminal_measurements=True.' | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I banned it
chk_basename = os.path.basename(checkpoint_fn) | ||
chk_basename, _, ext = chk_basename.rpartition('.') | ||
if ext != 'json': | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we prefer an error for this instead of e.g. f'{chk_basename}.prev.{ext}'
below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original intent was to limit the amount of "magic" it would do for you, and if you're not following normal filename semantics then there's no telling what's going on. Specifically: if you don't have a file extension rpartition
will give you ('', '', 'filename'
which would give a weird automatic checkpoint_other_fn
.
If you had something like .tar.gz
the automatic filename would be basename.tar.prev.gz
which is also weird.
Do you think I should relax it? I could meet halfway and accept other file extensions but reject the case where there's no file extension, i.e. there's no '.'
in the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some validation that it follows filname.ext
pattern. I left in the json extension check. Let me know what you think and I can remove the json extension check. I don't have a strong opinion either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinions on my end either, but your comment helped clarify the reasoning for me. I think the current behavior should be fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok keeping the current behavior. We can loosen the json extension check further if the need arises
return None, None | ||
|
||
if checkpoint_fn is None: | ||
checkpoint_dir = tempfile.mkdtemp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we call mkdtemp
in tests, we should clean up the directories it creates to ensure tests are hermetic. TemporaryDirectory may help with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you're right. I added the pytest tmpdir
fixture to the test. For the actual code, we want the checkpoint files to stick around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would os.mkdir
be preferable, given that the files are meant to outlive the Python process (i.e. they are non-temporary)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mkdir
makes a directory but you'd still need to choose its location. This will give you a file in /tmp
which gets cleaned up after like 30 days or whatever your computer's policy is.
chk_basename = os.path.basename(checkpoint_fn) | ||
chk_basename, _, ext = chk_basename.rpartition('.') | ||
if ext != 'json': | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinions on my end either, but your comment helped clarify the reasoning for me. I think the current behavior should be fine.
else: | ||
checkpoint_other_fn = f'{checkpoint_dir}/{chk_basename}.prev.json' | ||
|
||
return checkpoint_fn, checkpoint_other_fn |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is case (2) expected to be common? e.g. are these files particularly large, or are users expected to create many of them at once? Otherwise, I think the risk of confusion from (1) outweighs the utility of this behavior.
If case (2) is common, it may still be useful to guard it with an "are you sure?"-type flag, similar to the permit_terminal_measurements
guard on simulate_expectation_values
:
Cirq/cirq-core/cirq/sim/sparse_simulator.py
Lines 223 to 228 in fb43b84
if not permit_terminal_measurements and program.are_any_measurements_terminal(): | |
raise ValueError( | |
'Provided circuit has terminal measurements, which may ' | |
'skew expectation values. If this is intentional, set ' | |
'permit_terminal_measurements=True.' | |
) |
return None, None | ||
|
||
if checkpoint_fn is None: | ||
checkpoint_dir = tempfile.mkdtemp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would os.mkdir
be preferable, given that the files are meant to outlive the Python process (i.e. they are non-temporary)?
@95-martin-orion back at you |
Automerge cancelled: A status check is failing. |
This will not raise an exception if the file exists (on windows)
Save "checkpoint" files during observable estimation. With enough observables or enough samples or low enough variables you can construct long running calls to this functionality. These options will (optionally) make sure data is not lost in those scenarios. - It's off by default - If you just toggle it to True, it will save data in a temporary directory. The use case envisaged here is to guard against data loss in an unforseen interruption - You can provide your own filenames. The use case here can be part of the nominal operation where you use that file as the saved results for a given run - We need two filenames so we can do an atomic `mv` so errors during serialization won't result in data loss. The two filenames should be on the same disk or `mv` isn't atomic. We don't enforce that.
Save "checkpoint" files during observable estimation.
With enough observables or enough samples or low enough variables you can construct long running calls to this functionality. These options will (optionally) make sure data is not lost in those scenarios.
mv
so errors during serialization won't result in data loss. The two filenames should be on the same disk ormv
isn't atomic. We don't enforce that.