[Obs] 4.4 - Checkpointing #4352

mpharrigan · 2021-07-23T21:51:17Z

Save "checkpoint" files during observable estimation.

With enough observables, enough samples, or low enough variables, you can construct long-running calls to this functionality. These options (optionally) make sure data is not lost in those scenarios.

- It's off by default.
- If you just toggle it to True, it will save data in a temporary directory. The use case envisaged here is to guard against data loss in an unforeseen interruption.
- You can provide your own filenames. The use case here can be part of nominal operation, where you use that file as the saved results for a given run.
- We need two filenames so we can do an atomic `mv`, so that errors during serialization won't result in data loss. The two filenames should be on the same disk or `mv` isn't atomic. We don't enforce that.
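
The two-filename scheme in the last bullet can be sketched roughly as follows. This is a minimal illustration and not necessarily the code in this PR: `save_checkpoint` is a hypothetical helper, and plain `json.dump` stands in for whatever serializer is actually used. The idea is that the previous checkpoint is moved aside with `os.replace` before the new one is written, so a failure during serialization leaves the older file intact.

```python
import json
import os
import tempfile


def save_checkpoint(data, checkpoint_fn, checkpoint_other_fn):
    """Hypothetical helper: keep the previous checkpoint while writing a new one."""
    if os.path.exists(checkpoint_fn):
        # os.replace overwrites an existing destination. On POSIX the rename
        # is atomic, provided both paths are on the same filesystem.
        os.replace(checkpoint_fn, checkpoint_other_fn)
    # If this write is interrupted, checkpoint_other_fn still holds the
    # last complete checkpoint.
    with open(checkpoint_fn, 'w') as f:
        json.dump(data, f)


# Usage: two filenames in the same directory (names are illustrative).
chk_dir = tempfile.mkdtemp()
chk_fn = os.path.join(chk_dir, 'observables.json')
chk_other_fn = os.path.join(chk_dir, 'observables.prev.json')
save_checkpoint({'n_repetitions': 100}, chk_fn, chk_other_fn)
save_checkpoint({'n_repetitions': 200}, chk_fn, chk_other_fn)
```

After the second call, `chk_fn` holds the newest data and `chk_other_fn` holds the checkpoint from the previous call.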

95-martin-orion · 2021-07-23T22:29:51Z

cirq-core/cirq/work/observable_measurement.py

+        else:
+            checkpoint_other_fn = f'{checkpoint_dir}/{chk_basename}.prev.json'
+
+    return checkpoint_fn, checkpoint_other_fn


If two separate files are required, we should return an error if these are equal.

mpharrigan · 2021-07-26

Good point. As written, it will still "work" as expected if you use the same filename for both. You just may get corruption if the process dies during checkpointing. Let me think about this.

1. It's confusing, and the user may be making a mistake.
2. A power user may not want two different files around and may specifically want to roll the dice with non-atomic checkpointing.

mpharrigan · 2021-07-26

I added a test that uses the same name for both. Let me know if you think I should disallow this behavior. I don't have a strong opinion either way.

95-martin-orion · 2021-07-26

Is case (2) expected to be common? For example, are these files particularly large, or are users expected to create many of them at once? Otherwise, I think the risk of confusion from (1) outweighs the utility of this behavior.

If case (2) is common, it may still be useful to guard it with an "are you sure?"-type flag, similar to the `permit_terminal_measurements` guard on `simulate_expectation_values`:

Cirq/cirq-core/cirq/sim/sparse_simulator.py, lines 223 to 228 in fb43b84:

    if not permit_terminal_measurements and program.are_any_measurements_terminal():
        raise ValueError(
            'Provided circuit has terminal measurements, which may '
            'skew expectation values. If this is intentional, set '
            'permit_terminal_measurements=True.'
        )

mpharrigan · 2021-08-02

I banned it.
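
A guard of the kind settled on in this thread might look like the sketch below. The function name and the error message are assumptions made for illustration; only the behavior (rejecting identical filenames) comes from the discussion above.

```python
def check_distinct_checkpoint_fns(checkpoint_fn, checkpoint_other_fn):
    """Hypothetical validation: reject identical checkpoint filenames."""
    if checkpoint_fn == checkpoint_other_fn:
        # With a single file there is nothing to fall back on if the
        # process dies while the checkpoint is being written.
        raise ValueError(
            f"checkpoint_fn and checkpoint_other_fn are both set to "
            f"{checkpoint_fn!r}. Please use two different filenames."
        )


# Distinct names pass silently; identical names raise ValueError.
check_distinct_checkpoint_fns('results.json', 'results.prev.json')
```

Note that a plain string comparison does not catch two different spellings of the same path (for example a relative and an absolute path).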

95-martin-orion · 2021-07-23T22:40:26Z

cirq-core/cirq/work/observable_measurement.py

+        chk_basename = os.path.basename(checkpoint_fn)
+        chk_basename, _, ext = chk_basename.rpartition('.')
+        if ext != 'json':
+            raise ValueError(


Any reason we prefer an error for this instead of e.g. f'{chk_basename}.prev.{ext}' below?

mpharrigan · 2021-07-26

The original intent was to limit the amount of "magic" it would do for you, and if you're not following normal filename semantics then there's no telling what's going on. Specifically: if you don't have a file extension, `rpartition` will give you `('', '', 'filename')`, which would give a weird automatic `checkpoint_other_fn`.

If you had something like `.tar.gz`, the automatic filename would be `basename.tar.prev.gz`, which is also weird.

Do you think I should relax it? I could meet halfway and accept other file extensions but reject the case where there's no file extension, i.e. there's no '.' in the name.

mpharrigan · 2021-07-26

I added some validation that it follows the `filename.ext` pattern. I left in the json extension check. Let me know what you think and I can remove the json extension check. I don't have a strong opinion either way.

95-martin-orion · 2021-07-26

No strong opinions on my end either, but your comment helped clarify the reasoning for me. I think the current behavior should be fine.

mpharrigan · 2021-08-02

OK, keeping the current behavior. We can loosen the json extension check further if the need arises.
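
The edge cases raised in this thread are easy to reproduce. The sketch below uses a hypothetical `naive_prev_name` helper that applies the suggested `f'{chk_basename}.prev.{ext}'` pattern to any filename, with no extension check, to show why the stricter validation was kept.

```python
def naive_prev_name(filename):
    """Hypothetical: derive a '.prev' sibling name for any extension."""
    base, _, ext = filename.rpartition('.')
    return f'{base}.prev.{ext}'


print(naive_prev_name('results.json'))    # results.prev.json
print(naive_prev_name('results.tar.gz'))  # results.tar.prev.gz
print(naive_prev_name('results'))         # .prev.results
```

With no '.' in the name, `rpartition` puts the whole string in the last slot, so the derived name becomes a hidden file called `.prev.results`.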

95-martin-orion · 2021-07-23T22:58:08Z

cirq-core/cirq/work/observable_measurement.py

+        return None, None
+
+    if checkpoint_fn is None:
+        checkpoint_dir = tempfile.mkdtemp()


If we call mkdtemp in tests, we should clean up the directories it creates to ensure tests are hermetic. TemporaryDirectory may help with this.

mpharrigan · 2021-07-26

You're right. I added the pytest `tmpdir` fixture to the test. For the actual code, we want the checkpoint files to stick around.

95-martin-orion · 2021-07-26

Would `os.mkdir` be preferable, given that the files are meant to outlive the Python process (i.e. they are non-temporary)?

mpharrigan · 2021-07-26

`mkdir` makes a directory, but you'd still need to choose its location. This will give you a file in /tmp, which gets cleaned up after like 30 days or whatever your computer's policy is.
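
The difference between the two standard-library options discussed here can be seen in a short sketch (variable names are illustrative): `tempfile.mkdtemp` leaves cleanup to the caller, so the directory survives, while `tempfile.TemporaryDirectory` removes it when the context exits, which is the hermetic behavior wanted in tests.

```python
import os
import tempfile

# mkdtemp: the directory persists until someone deletes it.
persistent_dir = tempfile.mkdtemp()

# TemporaryDirectory: the directory is removed on leaving the `with` block.
with tempfile.TemporaryDirectory() as scratch_dir:
    existed_inside = os.path.isdir(scratch_dir)

persists = os.path.isdir(persistent_dir)
removed = not os.path.isdir(scratch_dir)

# Clean up the mkdtemp directory explicitly so this example leaves nothing behind.
os.rmdir(persistent_dir)
```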

mpharrigan · 2021-08-02T16:34:58Z

@95-martin-orion back at you

CirqBot · 2021-08-02T23:06:04Z

Automerge cancelled: A status check is failing.

[Obs] 4.4 - Checkpointing

9f4412e

mpharrigan added the area/expectation-value label Jul 23, 2021

mpharrigan requested a review from 95-martin-orion July 23, 2021 21:51

mpharrigan assigned 95-martin-orion Jul 23, 2021

mpharrigan requested review from cduck, vtomole and a team as code owners July 23, 2021 21:51

google-cla bot added the cla: yes label Jul 23, 2021

95-martin-orion requested changes Jul 23, 2021

mpharrigan added 2 commits July 26, 2021 09:51

Review comments

8b50b6f

Merge remote-tracking branch 'origin/master' into obs-4.4-checkpointing

fbb3f95

95-martin-orion requested changes Jul 26, 2021

disallow duplicate fns

d42060a

95-martin-orion approved these changes Aug 2, 2021

mpharrigan added the automerge label Aug 2, 2021

CirqBot removed the automerge label Aug 2, 2021

mpharrigan added 2 commits August 2, 2021 16:11

Merge remote-tracking branch 'origin/master' into obs-4.4-checkpointing

9c78f8d

os.rename -> os.replace

3c0829d

This will not raise an exception if the file exists (on windows)

mpharrigan added the automerge label Aug 4, 2021

CirqBot added the front_of_queue_automerge label Aug 4, 2021

Merge branch 'master' into obs-4.4-checkpointing

292e16f

CirqBot merged commit 120eb87 into quantumlib:master Aug 4, 2021

CirqBot removed the automerge and front_of_queue_automerge labels Aug 4, 2021

mpharrigan mentioned this pull request Aug 16, 2021

Observable Measurement - 4 - Sampling Loop #3647

Closed
