[Obs] 4.4 - Checkpointing #4352

Merged
merged 7 commits into from Aug 4, 2021

mpharrigan
Collaborator

@mpharrigan mpharrigan commented Jul 23, 2021

Save "checkpoint" files during observable estimation.

With enough observables, enough samples, or low enough variables, you can construct long-running calls to this functionality. These options (optionally) make sure data is not lost in those scenarios.

  • It's off by default.
  • If you just toggle it to True, it will save data in a temporary directory. The use case envisaged here is to guard against data loss from an unforeseen interruption.
  • You can provide your own filenames. The use case here can be part of nominal operation, where you use that file as the saved results for a given run.
  • We need two filenames so we can do an atomic mv, so that errors during serialization won't result in data loss. The two filenames should be on the same disk (filesystem) or mv isn't atomic. We don't enforce that.
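The two-filename scheme in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `save_checkpoint` is a hypothetical helper, plain `json` stands in for Cirq's serialization, and the rotate-then-write order is an assumption.

```python
import json
import os


def save_checkpoint(data, checkpoint_fn, checkpoint_other_fn):
    """Hypothetical helper: keep the last good checkpoint while writing a new one."""
    # Move the existing checkpoint aside first. os.replace is atomic on POSIX
    # when both paths are on the same filesystem.
    if os.path.exists(checkpoint_fn):
        os.replace(checkpoint_fn, checkpoint_other_fn)
    # If serialization fails partway through, checkpoint_other_fn still holds
    # the previous complete checkpoint.
    with open(checkpoint_fn, 'w') as f:
        json.dump(data, f)
```

After an interruption, a caller would try `checkpoint_fn` first and fall back to `checkpoint_other_fn` if the primary file is missing or truncated.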

@mpharrigan mpharrigan requested review from cduck, vtomole and a team as code owners July 23, 2021 21:51
@google-cla google-cla bot added the cla: yes Makes googlebot stop complaining. label Jul 23, 2021
    else:
        checkpoint_other_fn = f'{checkpoint_dir}/{chk_basename}.prev.json'

    return checkpoint_fn, checkpoint_other_fn
Collaborator

If two separate files are required, we should return an error if these are equal.

Collaborator Author

Good point. As written, it will still "work" as expected if you use the same filename for both. You just may get corruption if the process dies during the checkpointing process. Let me think about this.

  1. it's confusing and the user may be making a mistake
  2. a power user may not want two different files around and specifically wants to roll the dice with non-atomic checkpointing

Collaborator Author

I added a test that uses the same name for both. Let me know if you think I should disallow this behavior. I don't have a strong opinion either way

Collaborator

Is case (2) expected to be common? e.g. are these files particularly large, or are users expected to create many of them at once? Otherwise, I think the risk of confusion from (1) outweighs the utility of this behavior.

If case (2) is common, it may still be useful to guard it with an "are you sure?"-type flag, similar to the permit_terminal_measurements guard on simulate_expectation_values:

if not permit_terminal_measurements and program.are_any_measurements_terminal():
    raise ValueError(
        'Provided circuit has terminal measurements, which may '
        'skew expectation values. If this is intentional, set '
        'permit_terminal_measurements=True.'
    )

Collaborator Author

I banned it
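A ban like this could look roughly like the check below. It is a sketch only: `check_distinct_checkpoint_files` and the error text are hypothetical, not the code merged in this PR.

```python
import os


def check_distinct_checkpoint_files(checkpoint_fn, checkpoint_other_fn):
    """Hypothetical guard: reject identical primary and backup checkpoint paths."""
    # Compare absolute paths so 'chk.json' and './chk.json' count as the same file.
    if os.path.abspath(checkpoint_fn) == os.path.abspath(checkpoint_other_fn):
        raise ValueError(
            'checkpoint_fn and checkpoint_other_fn must be different files, '
            f'but both are {checkpoint_fn!r}.'
        )
```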

chk_basename = os.path.basename(checkpoint_fn)
chk_basename, _, ext = chk_basename.rpartition('.')
if ext != 'json':
    raise ValueError(
Collaborator

Any reason we prefer an error for this instead of e.g. f'{chk_basename}.prev.{ext}' below?

Collaborator Author

The original intent was to limit the amount of "magic" it would do for you; if you're not following normal filename semantics, there's no telling what's going on. Specifically: if you don't have a file extension, rpartition will give you ('', '', 'filename'), which would give a weird automatic checkpoint_other_fn.

If you had something like .tar.gz the automatic filename would be basename.tar.prev.gz which is also weird.

Do you think I should relax it? I could meet halfway and accept other file extensions but reject the case where there's no file extension, i.e. there's no '.' in the name.
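The two edge cases described here can be checked directly with `str.rpartition`. In this small sketch, `automatic_other_fn` is a hypothetical helper that applies the suggested `f'{chk_basename}.prev.{ext}'` pattern with no extension check, and the filenames are made-up examples.

```python
def automatic_other_fn(basename_with_ext):
    """Illustrative only: build the '.prev' name without any extension check."""
    chk_basename, _, ext = basename_with_ext.rpartition('.')
    return f'{chk_basename}.prev.{ext}'


# No '.' in the name: everything lands in the last slot of the tuple.
print('filename'.rpartition('.'))            # ('', '', 'filename')
print(automatic_other_fn('filename'))        # .prev.filename

# Compound extension: '.prev' ends up in the middle of it.
print(automatic_other_fn('results.tar.gz'))  # results.tar.prev.gz

# The ordinary case behaves as intended.
print(automatic_other_fn('results.json'))    # results.prev.json
```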

Collaborator Author

I added some validation that it follows the filename.ext pattern. I left in the json extension check. Let me know what you think and I can remove the json extension check. I don't have a strong opinion either way.

Collaborator

No strong opinions on my end either, but your comment helped clarify the reasoning for me. I think the current behavior should be fine.

Collaborator Author

ok keeping the current behavior. We can loosen the json extension check further if the need arises
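For reference, the behavior settled on in this thread (require the filename.ext pattern and a json extension) might be expressed along these lines. This is a sketch under assumptions: `validate_checkpoint_fn` and its messages are hypothetical, not the merged implementation.

```python
import os


def validate_checkpoint_fn(checkpoint_fn):
    """Hypothetical validation: require a 'name.json' basename; return the name part."""
    basename = os.path.basename(checkpoint_fn)
    name, sep, ext = basename.rpartition('.')
    # An empty separator means there was no '.', and an empty name means the
    # basename started with the only '.', e.g. '.json'.
    if not sep or not name:
        raise ValueError(f'{checkpoint_fn!r} does not follow the filename.ext pattern.')
    if ext != 'json':
        raise ValueError(f'{checkpoint_fn!r} must have a .json extension.')
    return name
```

Loosening the check later would only mean dropping the second `if`, while the pattern check would still reject extension-less names.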

return None, None

if checkpoint_fn is None:
    checkpoint_dir = tempfile.mkdtemp()
Collaborator

If we call mkdtemp in tests, we should clean up the directories it creates to ensure tests are hermetic. TemporaryDirectory may help with this.

Collaborator Author

you're right. I added the pytest tmpdir fixture to the test. For the actual code, we want the checkpoint files to stick around

Collaborator

Would os.mkdir be preferable, given that the files are meant to outlive the Python process (i.e. they are non-temporary)?

Collaborator Author

mkdir makes a directory but you'd still need to choose its location. This will give you a file in /tmp which gets cleaned up after like 30 days or whatever your computer's policy is.
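The difference between the two `tempfile` options discussed in this thread can be seen in a few lines. This is a standalone sketch using only the standard library; the variable names are illustrative.

```python
import os
import shutil
import tempfile

# mkdtemp creates a directory and leaves cleanup to the caller (or the OS),
# which suits checkpoint files that should outlive the Python process.
persistent_dir = tempfile.mkdtemp()

# TemporaryDirectory removes the directory when the with-block exits,
# which suits hermetic tests.
with tempfile.TemporaryDirectory() as scratch_dir:
    scratch_existed = os.path.isdir(scratch_dir)

persistent_survives = os.path.isdir(persistent_dir)
scratch_survives = os.path.isdir(scratch_dir)

# Tidy up by hand so this example leaves nothing behind.
shutil.rmtree(persistent_dir)
```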

@mpharrigan
Collaborator Author

@95-martin-orion back at you

@mpharrigan mpharrigan added the automerge Tells CirqBot to sync and merge this PR. (If it's running.) label Aug 2, 2021
@CirqBot
Collaborator

CirqBot commented Aug 2, 2021

Automerge cancelled: A status check is failing.

@CirqBot CirqBot removed the automerge Tells CirqBot to sync and merge this PR. (If it's running.) label Aug 2, 2021
@mpharrigan mpharrigan added the automerge Tells CirqBot to sync and merge this PR. (If it's running.) label Aug 4, 2021
@CirqBot CirqBot added the front_of_queue_automerge CirqBot uses this label to indicate (and remember) what's being merged next. label Aug 4, 2021
@CirqBot CirqBot merged commit 120eb87 into quantumlib:master Aug 4, 2021
@CirqBot CirqBot removed automerge Tells CirqBot to sync and merge this PR. (If it's running.) front_of_queue_automerge CirqBot uses this label to indicate (and remember) what's being merged next. labels Aug 4, 2021
rht pushed a commit to rht/Cirq that referenced this pull request May 1, 2023
Labels
area/expectation-value cla: yes Makes googlebot stop complaining.

3 participants