Conversation
Also, qsimcirq needs to provide hooks for this behavior. |
Added an arbitrary-state test; the rest of the TFQ tests seem to have parity with
Spun off #65 to address this - will likely require changes to the pybind module. |
sergeisakov left a comment
I think that this state sampler is biased because the state's squared amplitudes don't sum exactly to one.
MichaelBroughton left a comment
This is a good start. I would say that if you intend to fuzz test this function via a cirq.Sampler interface later on then this amount of testing is probably fine. If you aren't going to do that then it might be worthwhile to make an automated test that can generate many large random states and compare that you are getting the right distributions when you are sampling from them.
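A minimal sketch of what such an automated test could look like. Here std::discrete_distribution is only a stand-in for the sampler under test, and the register size, sample count, and tolerance are arbitrary choices for illustration, not values from the PR:

```cpp
// Sketch of a randomized sampling test: build a random normalized state,
// draw many samples, and check each empirical frequency against |c_i|^2.
#include <cassert>
#include <cmath>
#include <complex>
#include <random>
#include <vector>

int main() {
  const unsigned num_qubits = 6;
  const std::size_t dim = std::size_t(1) << num_qubits;
  const std::size_t num_samples = 1000000;

  std::mt19937_64 rgen(12345);
  std::normal_distribution<double> gauss(0.0, 1.0);

  // Random normalized state.
  std::vector<std::complex<double>> state(dim);
  double norm = 0.0;
  for (auto& c : state) {
    c = {gauss(rgen), gauss(rgen)};
    norm += std::norm(c);
  }
  for (auto& c : state) c /= std::sqrt(norm);

  // Target probabilities; the distribution below stands in for the sampler
  // under test, which would replace it in a real test.
  std::vector<double> probs(dim);
  for (std::size_t k = 0; k < dim; ++k) probs[k] = std::norm(state[k]);
  std::discrete_distribution<std::size_t> dist(probs.begin(), probs.end());

  std::vector<std::size_t> counts(dim, 0);
  for (std::size_t i = 0; i < num_samples; ++i) ++counts[dist(rgen)];

  // Each frequency should match its probability to within a few standard errors.
  for (std::size_t k = 0; k < dim; ++k) {
    const double freq = static_cast<double>(counts[k]) / num_samples;
    const double sigma = std::sqrt(probs[k] * (1.0 - probs[k]) / num_samples);
    assert(std::abs(freq - probs[k]) < 5.0 * sigma + 1e-9);
  }
  return 0;
}
```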
Why is this the case? As I understand it, |
b512212 to 412c9ef
The reason is that sum_i |c_i|^2 != 1. For instance, if sum_i |c_i|^2 < 1 then the largest bitstring will be overrepresented. I have been working on a state sampler today. I ended up with the following. The method is basically the same.
```cpp
template <typename StateSpace>
std::vector<uint64_t> SampleFromState(
    uint64_t seed, uint64_t num_samples, const StateSpace& state_space,
    const typename StateSpace::State& state) {
  std::vector<uint64_t> bitstrings;
  if (num_samples > 0) {
    bitstrings.reserve(num_samples);
    // Compute the actual norm of the state.
    double norm = 0;
    for (uint64_t k = 0; k < state_space.Size(state); ++k) {
      norm += std::norm(state_space.GetAmpl(state, k));
    }
    // Draw num_samples thresholds in [0, norm] and sort them.
    std::mt19937_64 rgen(seed);
    double rgen_norm = norm / std::mt19937_64::max();
    std::vector<double> rs;
    rs.reserve(num_samples);
    for (uint64_t i = 0; i < num_samples; ++i) {
      rs.emplace_back(rgen() * rgen_norm);
    }
    std::sort(rs.begin(), rs.end());
    // A single pass over the cumulative distribution assigns each sorted
    // threshold to a bitstring.
    uint64_t m = 0;
    double csum = 0;
    for (uint64_t k = 0; k < state_space.Size(state); ++k) {
      csum += std::norm(state_space.GetAmpl(state, k));
      while (m < num_samples && rs[m] <= csum) {
        bitstrings.emplace_back(k);
        ++m;
      }
    }
  }
  return bitstrings;
}
```
EDIT: as noted further down in the thread, the optimization described in this comment is not feasible. As a preface: one upside to the current sampling method is that it doesn't need to iterate over all states (although it currently does). This can greatly reduce sampling runtime, e.g. for Porter-Thomas distributions over many-qubit circuits.
What are the sources and scale of error here? If it's only floating-point underflow, I don't think we need special handling for the sum; even for large numbers of qubits such effects should be negligible (double has a 52-bit mantissa -> max underflow is on the order of 2^-52).
As noted above, this requires iterating over all states, which could take a while for many-qubit circuits. |
How do you propose to iterate over only a fraction of states? One can just exit the for loop when j == m. However, this greatly reduces sampling runtime only when m is small (m < 10).

There are basically two sources of error: single-precision arithmetic and round-off error. The effect is quite severe. For instance, I have 1 - sum_i |c_i|^2 = 2.89e-06 for circuits/circuit_q30 of depth 8. So if you need, say, 1e7 samples then sampling with the TFQ code will be significantly biased towards some particular bitstring (the last one that gets picked up in the for loop). Iterating over all states is typically quite fast compared to circuit simulation when using the same number of threads. We iterate over all states many times to simulate a large circuit. |
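To illustrate those two error sources, here is a small self-contained sketch; the state size, seed, and resulting magnitudes are illustrative only and are not taken from circuit_q30. It stores a random normalized state in single precision and sums |c_i|^2 once with a float accumulator and once with a double accumulator:

```cpp
// Sketch: error in sum_i |c_i|^2 from single-precision storage alone (double
// accumulator) versus storage plus single-precision accumulation (float
// accumulator).
#include <cmath>
#include <complex>
#include <cstdio>
#include <random>
#include <vector>

int main() {
  const std::size_t dim = std::size_t(1) << 22;  // illustrative size only
  std::mt19937_64 rgen(42);
  std::normal_distribution<double> gauss(0.0, 1.0);

  // Random state: amplitudes drawn and normalized in double precision, then
  // stored in single precision (as the simulator stores them).
  std::vector<std::complex<float>> state(dim);
  double norm = 0.0;
  for (auto& c : state) {
    const std::complex<double> a(gauss(rgen), gauss(rgen));
    norm += std::norm(a);
    c = std::complex<float>(a);
  }
  const float scale = static_cast<float>(1.0 / std::sqrt(norm));
  for (auto& c : state) c *= scale;

  float sum_f = 0.0f;   // accumulates round-off on top of storage error
  double sum_d = 0.0;   // shows storage (single-precision) error alone
  for (const auto& c : state) {
    sum_f += std::norm(c);
    sum_d += std::norm(c);
  }
  std::printf("1 - float  sum = %.3e\n", 1.0 - static_cast<double>(sum_f));
  std::printf("1 - double sum = %.3e\n", 1.0 - sum_d);
  return 0;
}
```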
For the record that is 30 bitstrings that would come up out of 10 million. If you are concerned about the last bitstring having a probability of almost 0 and then having 30 extra samples of it showing up, maybe instead we can just pick the index in the wavefunction that has the highest probability and add the 30 extra samples to that entry in the case of norm underflow. This would move the bias towards the highest-probability bitstring instead. It would also allow sampling to happen in one pass over the wavefunction instead of two. That is, change

```cpp
while (j < m) {
  samples->push_back(samples->at(samples->size() - 1));
  j++;
}
```

to be

```cpp
while (j < m) {
  samples->push_back(samples->at(highest_prob_index));
  j++;
}
```

What do people think? |
Disregard my previous comment; I had made an assumption (that the states were ordered by probability) that is incorrect. |
Given that iterating over all elements is unavoidable, it seems reasonable to me to use Sergei's approach - it won't significantly impact runtime, and it gives us more confidence in the resulting samples. I'll send another commit with the fix. |
That is true. Although when the states are this large, iterating over them once vs twice could lead to performance differences. |
Manual tests on a 30-qubit, depth-8 circuit using 8 threads show a noticeable difference in sampling runtime with the calculated sum-of-norms: with the current code, sampling takes ~10% of the simulation time; with the calculated sum-of-norms, it's closer to ~20%. (Note, however, that these are rough measures, and that the impact of sampling is less pronounced at larger circuit depths.) There are ways around this; of note is the use of precomputed sums, as described here. This would make the initial pass O(n), with subsequent passes taking O(n/m).
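The linked description is not reproduced above, but here is a rough sketch of one bucketed variant of precomputed sums. The function name, its signature, and the plain std::vector state are stand-ins for illustration, not qsim's StateSpace interface or the approach at the link:

```cpp
// Sketch of bucketed sampling: one O(n) pass records, for each bucket, the
// running sum of |c_i|^2 through that bucket's last element. Each sample is
// then resolved with a binary search over bucket sums plus a scan of at most
// one bucket.
#include <algorithm>
#include <complex>
#include <cstdint>
#include <random>
#include <vector>

std::vector<uint64_t> SampleWithBuckets(
    uint64_t seed, uint64_t num_samples,
    const std::vector<std::complex<float>>& state, uint64_t num_buckets) {
  std::vector<uint64_t> bitstrings;
  const uint64_t n = state.size();
  if (n == 0 || num_samples == 0 || num_buckets == 0) return bitstrings;

  const uint64_t bucket_size = (n + num_buckets - 1) / num_buckets;
  const uint64_t used_buckets = (n + bucket_size - 1) / bucket_size;

  // Pass 1: cumulative sums per bucket (non-decreasing across buckets).
  std::vector<double> bucket_csum(used_buckets, 0.0);
  double total = 0.0;
  for (uint64_t k = 0; k < n; ++k) {
    total += std::norm(state[k]);
    bucket_csum[k / bucket_size] = total;
  }

  std::mt19937_64 rgen(seed);
  std::uniform_real_distribution<double> dist(0.0, total);

  bitstrings.reserve(num_samples);
  for (uint64_t i = 0; i < num_samples; ++i) {
    const double r = dist(rgen);
    // First bucket whose cumulative sum exceeds r.
    const uint64_t b = static_cast<uint64_t>(
        std::upper_bound(bucket_csum.begin(), bucket_csum.end(), r) -
        bucket_csum.begin());
    // Replay the running sum inside that one bucket to find the bitstring.
    double csum = (b == 0) ? 0.0 : bucket_csum[b - 1];
    uint64_t k = std::min(b * bucket_size, n - 1);
    while (k + 1 < n && csum + std::norm(state[k]) <= r) {
      csum += std::norm(state[k]);
      ++k;
    }
    bitstrings.push_back(k);
  }
  return bitstrings;
}
```

With num_buckets on the order of sqrt(n), each sample after the first pass costs a binary search plus a scan of a single bucket rather than a scan over the whole state.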
I think it doesn't fix the bias. Though this bias (with or without your fix) should have negligible effect for many applications. |
You should use one thread because sampling is not parallelized.
I doubt it will be much faster. |
Given that the runtime of sampling is O(n) in either case, I think avoiding bias should take priority here. I recommend that we submit Sergei's implementation (now included in this PR) and push runtime improvements to a future PR. |
```cpp
    std::mt19937_64 rgen(seed);
    double rgen_norm = norm / std::mt19937_64::max();
```
It is probably better to use std::uniform_real_distribution(0.0, norm) as in your initial commit.
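For concreteness, a sketch of that substitution as a fragment; it assumes the seed, norm, and num_samples variables from the snippet above:

```cpp
// Draw thresholds directly in [0, norm) instead of scaling raw rgen() output.
std::mt19937_64 rgen(seed);
std::uniform_real_distribution<double> dist(0.0, norm);

std::vector<double> rs;
rs.reserve(num_samples);
for (uint64_t i = 0; i < num_samples; ++i) {
  rs.emplace_back(dist(rgen));
}
std::sort(rs.begin(), rs.end());
```

The distribution object replaces the manual norm / max() scaling and keeps the thresholds in [0, norm).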
```cpp
    for (uint64_t k = 0; k < state_space.Size(state); ++k) {
      csum += std::norm(state_space.GetAmpl(state, k));
      while (m < num_samples && rs[m] <= csum) {
```
This can be replaced by while (rs[m] <= csum && m < num_samples) if you replace rs.reserve(num_samples) with rs.reserve(num_samples + 1) above. Note that there is a bug in the original TFQ code: an array access outside its bounds. If you use std::uniform_real_distribution, then replace rs[m] <= csum with rs[m] < csum.
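One way to realize that safely, as a sketch and not necessarily the exact change intended, is to append a sentinel element after sorting so that rs[num_samples] is a defined value the cumulative sum can never reach. This assumes rs was filled via std::uniform_real_distribution as suggested above (hence the strict comparison) and that <limits> is included:

```cpp
// After filling and sorting rs (size == num_samples), append a sentinel so
// that reading rs[m] with m == num_samples is well defined and never matches.
rs.push_back(std::numeric_limits<double>::infinity());

uint64_t m = 0;
double csum = 0;
for (uint64_t k = 0; k < state_space.Size(state); ++k) {
  csum += std::norm(state_space.GetAmpl(state, k));
  while (rs[m] < csum && m < num_samples) {
    bitstrings.emplace_back(k);
    ++m;
  }
}
```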
Closing this PR in favor of #68. |
Adds a sampler for quantum states, (mostly) copied from TFQuantum. See #39.
Remaining work before review: add tests to match those in TFQ.