
Metropolis algorithm for sampling shots #332

Merged: 44 commits into measurements on Mar 5, 2021

Conversation

stavros11
Member

Implements a custom operator for calculating measurement frequencies based on the Metropolis algorithm. Specifically, given a probability distribution p(s) over bitstrings (usually the squared modulus of the wavefunction), we start from s = argmax p(s) and perform single bit-flips (XOR-ing s with random powers of 2), accepting a move from s to s' if p(s') / p(s) > random[0, 1]. All bitstrings visited by this procedure are added to the frequencies; that is, we do not use any additional moves to decorrelate the samples.
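The procedure can be sketched in plain NumPy (the actual operator is implemented in C++; the function name and signature below are illustrative, not the real op):

```python
import numpy as np

def metropolis_frequencies(probs, nshots, seed=1234):
    """Metropolis sampling of measurement frequencies (illustrative sketch).

    probs: 1D array of length 2**nqubits that sums to one.
    Returns integer counts per bitstring index.
    """
    rng = np.random.default_rng(seed)
    nqubits = int(np.log2(len(probs)))
    frequencies = np.zeros(len(probs), dtype=np.int64)
    s = int(np.argmax(probs))  # start from the most probable bitstring
    for _ in range(nshots):
        # propose flipping one random bit of s
        proposal = s ^ (1 << rng.integers(nqubits))
        # accept if p(s') / p(s) exceeds a uniform random number
        if probs[proposal] / probs[s] > rng.random():
            s = proposal
        frequencies[s] += 1  # every step contributes a shot
    return frequencies
```

Note that the current state is counted on every step, whether or not the proposal is accepted, which is what makes consecutive samples correlated.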

Validity

I checked the statistics of this procedure by calculating the KL divergence between the target probability distribution p(s) and the one that corresponds to the measured frequencies q(s) = (number of times s appears) / nshots. Lower KL means better agreement between the samples and the target distribution. I compare the Metropolis approach implemented here with the rejection sampling implemented in #330.
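Concretely, the check can be written as follows (a sketch; `kl_divergence` is an illustrative name, and the mask assumes the comparison is restricted to states where both p(s) and q(s) are nonzero so the sum stays finite):

```python
import numpy as np

def kl_divergence(target, frequencies):
    """KL(p || q) where q(s) = frequencies[s] / nshots."""
    q = frequencies / frequencies.sum()
    # restrict to states with p(s) > 0 and q(s) > 0 to keep the sum finite
    mask = (target > 0) & (q > 0)
    return float(np.sum(target[mask] * np.log(target[mask] / q[mask])))
```

A KL of zero means the sampled frequencies reproduce the target distribution exactly; larger values mean worse agreement.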

[Figure kl_both: KL divergence vs nshots for rejection sampling and Metropolis]

  • Left: Using a uniform distribution for p(s), for example the one obtained with the following circuit:

from qibo import gates
from qibo.models import Circuit

c = Circuit(10)
c.add((gates.H(i) for i in range(10)))
c.add(gates.M(*range(10)))
  • Right: Averaging the KL over 30 random distributions p(s) (also for 10 qubits, that is of size 1024).

Rejection sampling indeed performs slightly better in this test; however, I believe Metropolis, even with correlated samples, is also acceptable.
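For reference, the rejection-sampling alternative can be sketched as follows (a generic sketch, not the actual #330 implementation; a bitstring is proposed uniformly and accepted with probability p(s) / max(p)):

```python
import numpy as np

def rejection_frequencies(probs, nshots, seed=1234):
    """Rejection sampling sketch: uniform proposals, accept with p(s) / max(p)."""
    rng = np.random.default_rng(seed)
    pmax = probs.max()
    frequencies = np.zeros(len(probs), dtype=np.int64)
    accepted = 0
    while accepted < nshots:
        s = rng.integers(len(probs))  # propose a bitstring uniformly
        if probs[s] / pmax > rng.random():  # accept with probability p(s)/pmax
            frequencies[s] += 1
            accepted += 1
    return frequencies
```

Unlike Metropolis, accepted samples here are independent, but many proposals can be rejected when the distribution is peaked, which is one reason for the timing gap in the benchmarks below.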

Performance

I performed the following benchmark using this branch and #330 (rejection sampling) on the DGX CPU:

import time

from qibo import K

nqubits = 10
nshots = int(1e8)  # varied as in the table below

probs = K.ones(2 ** nqubits)
probs = probs / K.sum(probs)

start_time = time.time()
frequencies = K.sample_frequencies(probs, nshots)
print(time.time() - start_time)
nshots | Rejection sampling (sec) | Metropolis (sec)
------ | ------------------------ | ----------------
1e8    | 3.215017                 | 0.333904
2e8    | 6.362053                 | 0.565742
4e8    | 12.60637                 | 1.085155
6e8    | 18.87215                 | 1.566486
8e8    | 25.160118                | 2.056643
1e9    | 31.419534                | 2.562747
2e9    | 62.782110                | 5.049047
4e9    | 124.32199                | 9.682504
6e9    | 180.82613                | 15.061134

@codecov

codecov bot commented Feb 28, 2021

Codecov Report

Merging #332 (dc2c88a) into measurements (6fd90b4) will not change coverage.
The diff coverage is 100.00%.


@@              Coverage Diff               @@
##           measurements      #332   +/-   ##
==============================================
  Coverage        100.00%   100.00%           
==============================================
  Files                75        75           
  Lines             12187     12214   +27     
==============================================
+ Hits              12187     12214   +27     
Flag      | Coverage Δ
unittests | 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files | Coverage Δ
src/qibo/backends/tensorflow.py | 100.00% <100.00%> (ø)
src/qibo/config.py | 100.00% <100.00%> (ø)
src/qibo/tensorflow/custom_operators/__init__.py | 100.00% <100.00%> (ø)
...m_operators/python/ops/qibo_tf_custom_operators.py | 100.00% <100.00%> (ø)
...o/tests_new/test_measurement_gate_probabilistic.py | 100.00% <100.00%> (ø)
...qibo/tests_new/test_tensorflow_custom_operators.py | 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 6fd90b4...dc2c88a.

@@ -47,7 +47,8 @@ def __init__(self):
self.op = None
if op._custom_operators_loaded:
self.op = op

self._seed = 1234 # seed to use in the measurement frequency custom op
Member

Could we set the current seed used by tf instead?

Member Author

That was my original plan; however, I could not find a "get_seed" method in TensorFlow. I will have a second look though. An alternative solution would be to set the seed randomly, for example via time, and set the same seed for TensorFlow, like:

self._seed = int(time.time())
self.backend.random.set_seed(self._seed)

{
int64 nstates = 1 << nqubits;
srand(user_seed);
unsigned thread_seed[omp_get_max_threads()];
Member

@scarrazza scarrazza Feb 28, 2021

I would avoid using C99 features (variable-length arrays) here; I believe these omp_* functions return int, not a compile-time constant, so we may prefer to allocate dynamically or use a std::vector.

@scarrazza
Member

@stavros11 thanks for this implementation, it looks quite promising. I have a couple of questions concerning the quality of this approach:

  • how does the KL scale with nqubits?
  • if I understand correctly, you are comparing the uniform approach vs rejection/Metropolis. What about using an analytic expression, e.g. the QFT, as a reference?

@stavros11
Member Author

stavros11 commented Feb 28, 2021

Thank you for the comments.

  • how does the KL scale with nqubits?

I have been doing some tests regarding this and here is a plot that does the same test for 20 qubits:

[Figure kl20qubits: the same KL test for 20 qubits]

It appears that more shots are required to get a low KL, but this is true both for the Metropolis and the rejection algorithm. This plot is again for a uniform p(s), but the same happens when using a random p(s). I will try to produce some KL vs nqubits plots.

  • if I understand correctly, you are comparing the uniform approach vs rejection/Metropolis. What about using an analytic expression, e.g. the QFT, as a reference?

No, the KL is calculated by comparing the distribution of the rejection/Metropolis samples (q(s) = frequency(s) / nshots) with the exact target distribution p(s). Specifically KL = sum over all s of p(s) log(p(s) / q(s)), and this we can calculate since we have p(s) for each s (just the wavefunction squared).

In the first post p(s) is indeed uniform (that is, p(s) = 1 / nstates) for the left plot, but it is a random distribution p(s) = tf.random.uniform(nstates) (normalized) for the right plot. I can do it for the QFT, but I would expect similar behavior. For example, the QFT with |000...0> as the initial state actually gives the uniform p(s), so it would be the same as the left plot.
What I haven't compared is the KL of the tf.random.categorical approach that is currently used in Qibo. This would be a third line in the above plots, but I would expect it to be close to the rejection one.

@scarrazza
Member

Great, thanks for the test and clarification.
I think we can consider this approach as the default for sampling.

@stavros11
Member Author

Great, thanks for the test and clarification.
I think we can consider this approach as the default for sampling.

In principle we could keep both sampling approaches, but I agree that this seems to be the better choice since it is much faster and not much different in terms of statistical correctness. I made the two changes you suggested in the comments above, and I will do a few more plots on the nqubits scaling of the KL. I am not sure what the issue with CI is; perhaps it is not related to this PR?

@scarrazza
Member

@stavros11, thanks I have fixed CI (was a temporary failure). I think this PR is ready to be merged if you are not planning other changes. Let me know.

@stavros11
Member Author

@stavros11, thanks I have fixed CI (was a temporary failure). I think this PR is ready to be merged if you are not planning other changes. Let me know.

I agree with merging. I have checked up to 30 qubits, and up to 10^10 shots with 10 qubits, and everything seems to work without any precision issues. My only concern is that this runs even on a machine with a GPU: the circuit execution is done on the GPU, but it automatically falls back to CPU for the measurement since we do not have the CUDA kernel for this yet. I am not sure if this is the expected behavior.

Here are some additional KL results including the categorical kernel:

  • Scaling with number of shots for 10 qubits.

[Figure kl_vs_shots: KL vs number of shots for 10 qubits]

  • Scaling with number of qubits for 10^6 shots.

[Figure kl_vs_qubits: KL vs number of qubits for 10^6 shots]

Execution times for these runs are the following:

  • For 10 qubits:
nshots | Metropolis (sec) | Rejection (sec) | tf.random.categorical (sec)
------ | ---------------- | --------------- | ---------------------------
100    | 0.001926         | 0.0018654       | 0.001610
1000   | 0.001919         | 0.001943        | 0.001789
1e4    | 0.002068         | 0.0028894       | 0.003031
1e5    | 0.002682         | 0.0101371       | 0.013999
1e6    | 0.006419         | 0.0565665       | 0.113498
1e7    | 0.029231         | 0.3882477       | 1.033740
1e8    | 0.217336         | 3.2174122       | 10.416167
3e8    | 0.578732         | 9.4760861       |
6e8    | 1.517557         | 18.883093       |
1e9    | 2.488320         | 31.408768       |
  • For 10^6 shots:
nqubits | Metropolis (sec) | Rejection (sec) | tf.random.categorical (sec)
------- | ---------------- | --------------- | ---------------------------
6       | 0.005672         | 0.009360        | 0.078742
8       | 0.004703         | 0.019943        | 0.094087
10      | 0.00725          | 0.046534        | 0.111727
12      | 0.00912          | 0.183703        | 0.137952
14      | 0.01830          | 0.535228        | 0.169207
16      | 0.040208         | 1.9285371       | 0.252327
18      | 0.046489         | 7.4501459       | 0.362962
20      | 0.062015         | 29.454006       | 0.466291
22      | 0.084872         | 118.180581      | 0.743794
24      | 0.144699         | 565.525054      | 1.596632
26      | 0.372883         | 2964.00242      | 4.021918

@scarrazza
Member

My only concern is that this runs even on a machine with a GPU. I think the execution is done on the GPU but it automatically falls back to CPU for the measurement since we do not have the CUDA kernel for this yet. I am not sure if this is the expected behaviour.

What do you mean exactly? By default, the custom operator is accessible for CPU and GPU, so it will try to run MeasureFrequenciesOp. If you remember, some time ago, we had checks in order to raise an error if the operator was not available for GPU.

// grab the input tensor
Tensor frequencies = context->input(0);
const Tensor& cumprobs = context->input(1);
const Tensor& nshots = context->input(2);
Member

@stavros11, for the GPU it would be great if we could consider nshots as an attribute; otherwise we cannot access its value for the GPU block/thread computation.

Member Author

The issue is that if we define nshots as an attribute:

REGISTER_OP("MeasureFrequencies")
    .Attr("Tfloat: {float32, float64}")
    .Attr("Tint: {int32, int64}")
    .Input("frequencies: Tint")
    .Input("probs: Tfloat")
    .Attr("nshots: int")
    ...

then its type is int32, and this creates problems for many shots (like 10^10). I am not sure if Tensorflow supports int64 for attributes, but I think not, according to their custom op guide.

Member

Right, so I think a potential solution is to store the nshots value as the shape of the input tensor, so we can access this information from CPU. What do you think?

Member Author

There is no tensor whose shape involves nshots; do you mean creating a new tensor of that shape as input? Wouldn't this cause memory problems when nshots is very large?

Member

Yes, this would cause memory issues. What about always using float/double and then casting to int64?

Member

In terms of range, float should work: its range is from 1.2E-38 to 3.4E+38.

Maybe a more elegant solution is to always copy nshots to the CPU by allocating a new variable using with tf.device('CPU'): and then calling the measurement operator. This should allow us to keep using int64 while also accessing kernels from custom operators.

Member Author

Maybe a more elegant solution is to always copy the nshots to CPU by allocating a new variable using with tf.device('CPU'): and then calling the measurement operator. This should allow us to keep using the int64 but also access kernels from custom operators.

Would this require any change in the custom operator, or do we leave nshots as it is (an input) and just cast it on CPU in the Python code? Would there be any performance issues on the GPU due to copying nshots from the CPU?

Member

@scarrazza scarrazza Mar 1, 2021

Yes, we keep the op unchanged but overload the Python method with the CPU cast. I don't think we will see memory issues; if I understand the code, nshots will always be a CPU tensor. Could you please give it a try?

Member Author

@stavros11 stavros11 Mar 2, 2021

I am not sure I understood correctly how this will help the GPU kernel, but can you please check whether what I pushed in the last commit is correct? The reason I did not overload the Python method in qibo_tf_custom_operators.py is that I don't want to import K there (it leads to a circular import, so we would have to import it from within the function).

The measure_frequencies op is only used by sample_frequencies in the Tensorflow backend, so what I did should be equivalent to overloading the method.

Member

Thanks. I have tested it; however, nshots is copied back to the GPU after the cast, so this approach doesn't work.

@stavros11
Member Author

What do you mean exactly? By default, the custom operator is accessible for CPU and GPU, so it will try to run MeasureFrequenciesOp. If you remember, some time ago, we had checks in order to raise an error if the operator was not available for GPU.

If I run a circuit with measurements on a GPU using this branch, the circuit is executed on the GPU but the measurement op runs on CPU. I am not sure if this is the expected default behavior or if it should raise an error, since this op does not exist for GPU.

@scarrazza
Member

scarrazza commented Mar 1, 2021

Yes, this behaviour is expected: given that we did not register GPU operators, all tensors are cast to CPU and executed by the custom operator.

@scarrazza scarrazza merged commit 549652a into measurements Mar 5, 2021
@scarrazza scarrazza deleted the metropolisop branch March 6, 2021 09:26