Add state sampler #63

Closed

95-martin-orion wants to merge 6 commits into master from sampler

Conversation

@95-martin-orion
Collaborator

Adds a sampler for quantum states, (mostly) copied from TFQuantum. See #39.

Remaining work before review: add tests to match those in TFQ.

@95-martin-orion
Collaborator Author

> Remaining work before review: add tests to match those in TFQ.

Also, qsimcirq needs to provide hooks for this behavior.

@95-martin-orion
Collaborator Author

95-martin-orion commented Mar 26, 2020

> Remaining work before review: add tests to match those in TFQ.

Added an arbitrary-state test; the rest of the TFQ tests seem to have parity with simulator_avx_test.

> Also, qsimcirq needs to provide hooks for this behavior.

Spun off #65 to address this; it will likely require changes to the pybind module.

@95-martin-orion 95-martin-orion marked this pull request as ready for review March 26, 2020 15:52
Collaborator

@sergeisakov left a comment

I think that this state sampler is biased because the state norm doesn't sum to one.

Collaborator

@MichaelBroughton left a comment

This is a good start. I would say that if you intend to fuzz test this function via a cirq.Sampler interface later on then this amount of testing is probably fine. If you aren't going to do that then it might be worthwhile to make an automated test that can generate many large random states and compare that you are getting the right distributions when you are sampling from them.

@95-martin-orion
Collaborator Author

> I think that this state sampler is biased because the state norm doesn't sum to one.

Why is this the case? As I understand it, std::norm(ampl) returns |ampl|^2. For a superposition of states with amplitudes ampl[], I would expect the norms of each to sum to one.
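As a quick sanity check of this reasoning, here is a minimal sketch (illustrative amplitudes only, not qsim code): `std::norm` on a complex amplitude returns |ampl|^2, the squared magnitude, so for a normalized state these values sum to one in exact arithmetic.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <vector>

// std::norm(a) returns |a|^2 (the squared magnitude, not the magnitude),
// so summing it over a normalized state vector should give 1.
double SumOfNorms(const std::vector<std::complex<float>>& amplitudes) {
  double sum = 0.0;
  for (const auto& a : amplitudes) {
    sum += std::norm(a);
  }
  return sum;
}
```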

Collaborator

@sergeisakov left a comment

The reason is that sum_i |c_i|^2 != 1. For instance, if sum_i |c_i|^2 < 1 then the largest bitstring will be overrepresented. I have been working on a state sampler today. I ended up with the following. The method is basically the same.

```cpp
template <typename StateSpace>
std::vector<uint64_t> SampleFromState(
    uint64_t seed, uint64_t num_samples, const StateSpace& state_space,
    const typename StateSpace::State& state) {
  std::vector<uint64_t> bitstrings;

  if (num_samples > 0) {
    bitstrings.reserve(num_samples);

    double norm = 0;
    for (uint64_t k = 0; k < state_space.Size(state); ++k) {
      norm += std::norm(state_space.GetAmpl(state, k));
    }

    std::mt19937_64 rgen(seed);
    double rgen_norm = norm / std::mt19937_64::max();

    std::vector<double> rs;
    rs.reserve(num_samples);

    for (uint64_t i = 0; i < num_samples; ++i) {
      rs.emplace_back(rgen() * rgen_norm);
    }

    std::sort(rs.begin(), rs.end());

    uint64_t m = 0;
    double csum = 0;

    for (uint64_t k = 0; k < state_space.Size(state); ++k) {
      csum += std::norm(state_space.GetAmpl(state, k));

      while (m < num_samples && rs[m] <= csum) {
        bitstrings.emplace_back(k);
        ++m;
      }
    }
  }

  return bitstrings;
}
```

@95-martin-orion
Collaborator Author

95-martin-orion commented Mar 26, 2020

EDIT: as noted below, the optimization described here is not feasible.

As a preface: one upside to the current sampling method is that it doesn't need to iterate over all states (although it does so currently). This can greatly reduce sampling runtime for e.g. Porter-Thomas distributions over many-qubit circuits.

> The reason is that sum_i |c_i|^2 != 1. For instance, if sum_i |c_i|^2 < 1 then the largest bitstring will be overrepresented.

What are the sources and scale of error here? If it's only floating-point underflow, I don't think we need special handling for the sum; even for large numbers of qubits such effects should be negligible (a double has 52 mantissa bits, so the maximum underflow is on the order of 2^(qubits-52), or 2^-15 ~= 1/32,000 for 37 qubits).
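For context on the scale question, here is a toy sketch (hypothetical, not code from this PR) of what naive single-precision accumulation does when summing a uniform distribution; in exact arithmetic the result is 1:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Naively accumulate n equal probabilities (1/n each) in single
// precision; iterative round-off makes the result drift from 1 by
// far more than representation error alone would suggest.
float SumUniformProbs(uint64_t n) {
  const float p = 1.0f / static_cast<float>(n);
  float sum = 0.0f;
  for (uint64_t k = 0; k < n; ++k) {
    sum += p;
  }
  return sum;
}
```

For n = 10^6 the drift is on the order of 1e-2, illustrating that accumulated round-off, not just underflow, can matter.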

> I have been working on a state sampler today {...}

    double norm = 0;
    for (uint64_t k = 0; k < state_space.Size(state); ++k) {
      norm += std::norm(state_space.GetAmpl(state, k));
    }

As noted above, this requires iterating over all states, which could take a while for many-qubit circuits.

@sergeisakov
Collaborator

How do you propose to iterate over only a fraction of states? One can just exit the for loop when j == m. However, this greatly reduces sampling runtime only when m is small (m < 10).

There are basically two sources of error: single-precision representation and round-off. The effect is quite severe. For instance, I get 1 - sum_i |c_i|^2 = 2.89e-06 for circuits/circuit_q30 at depth 8. So if you need, say, 1e7 samples, then sampling with the TFQ code will be significantly biased towards one particular bitstring (the last one that gets picked up in the for loop).

Iterating over all states is typically quite fast compared to circuit simulation when using the same number of threads. We iterate over all states many times to simulate a large circuit.
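The scale of that bias can be checked with a quick calculation (illustrative helper, not code from this PR): a norm deficit of 2.89e-6 over 1e7 samples means roughly 29 draws land past the total cumulative sum and all get mapped to a single bitstring.

```cpp
#include <cmath>
#include <cassert>

// Expected number of samples whose random threshold lands in the
// "missing" probability mass (1 - sum of |c_i|^2).
long ExpectedExtraSamples(double norm_deficit, long num_samples) {
  return std::lround(norm_deficit * num_samples);
}
```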

@MichaelBroughton
Collaborator

MichaelBroughton commented Mar 26, 2020

For the record that is 30 bitstrings that would come up out of 10 million. If you are concerned about the last bitstring having a probability of almost 0 and then having 30 extra samples of it showing up, maybe instead we can just pick the index in the wavefunction that has the highest probability and then add 30 more samples to that entry in the case of norm underflow. This would move the bias towards the highest probability bitstring instead. It would also allow sampling to happen in one pass over the wavefunction instead of two.
Basically, these lines would change from

while (j < m) {
      samples->push_back(samples->at(samples->size() - 1));
      j++;
}

to

while (j < m) {
      samples->push_back(samples->at(highest_prob_index));
      j++;
}

What do people think?

@95-martin-orion
Collaborator Author

> How do you propose to iterate over only a fraction of states? One can just exit the for loop when j == m. However, this greatly reduces sampling runtime only when m is small (m < 10).

Disregard my previous comment; I had made an assumption (that the states were ordered by probability) that is incorrect.

@95-martin-orion
Collaborator Author

Given that iterating over all elements is unavoidable, it seems reasonable to me to use Sergei's approach - it won't significantly impact runtime, and it gives us more confidence in the resulting samples. I'll send another commit with the fix.

@MichaelBroughton
Collaborator

That is true. Although when the states are this large, iterating over them once vs twice could lead to performance differences.

@95-martin-orion
Collaborator Author

> That is true. Although when the states are this large, iterating over them once vs twice could lead to performance differences.

Manual tests on a 30-qubit, depth-8 circuit using 8 threads show a noticeable difference in sampling runtime with the calculated sum-of-norms: using the current code, sampling takes ~10% the time of the simulation; with calculated sum-of-norms, it's closer to ~20%. (Note however that these are rough measures, and that the impact of sampling is less pronounced for larger circuit depth)

There are ways around this; of note is the use of precomputed sums, as described here. This would make the initial pass O(n), with subsequent passes taking O(n/m) for m stored sums.
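One way the precomputed-sums idea could look is sketched below (a hypothetical illustration over a plain probability vector, not the qsim `StateSpace` API; `BucketedCdf`, `BuildBuckets`, and `SampleOne` are made-up names): one O(n) pass builds m cumulative bucket sums, after which each sample binary-searches the buckets and scans only ~n/m entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch of sampling with precomputed bucket sums.
struct BucketedCdf {
  std::vector<double> bucket_csum;  // cumulative sum at each bucket end
  std::vector<double> probs;        // per-bitstring probabilities |c_k|^2
  uint64_t bucket_size;
};

// One O(n) pass: record the cumulative sum at the end of each bucket.
BucketedCdf BuildBuckets(const std::vector<double>& probs, uint64_t m) {
  BucketedCdf b{{}, probs, (probs.size() + m - 1) / m};
  double csum = 0.0;
  for (uint64_t k = 0; k < probs.size(); ++k) {
    csum += probs[k];
    if ((k + 1) % b.bucket_size == 0 || k + 1 == probs.size()) {
      b.bucket_csum.push_back(csum);
    }
  }
  return b;
}

uint64_t SampleOne(const BucketedCdf& b, double r) {
  // Locate the first bucket whose cumulative sum exceeds r...
  auto it = std::upper_bound(b.bucket_csum.begin(), b.bucket_csum.end(), r);
  uint64_t bucket = it - b.bucket_csum.begin();
  double csum = bucket == 0 ? 0.0 : b.bucket_csum[bucket - 1];
  // ...then scan within it; the loop exits as soon as csum passes r.
  for (uint64_t k = bucket * b.bucket_size; k < b.probs.size(); ++k) {
    csum += b.probs[k];
    if (r < csum) return k;
  }
  return b.probs.size() - 1;
}
```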

@sergeisakov
Collaborator

> For the record that is 30 bitstrings that would come up out of 10 million. If you are concerned about the last bitstring having a probability of almost 0 and then having 30 extra samples of it showing up, maybe instead we can just pick the index in the wavefunction that has the highest probability and then add 30 more samples to that entry in the case of norm underflow. This would move the bias towards the highest probability bitstring instead. It would also allow sampling to happen in one pass over the wavefunction instead of two.
>
> What do people think?

I think it doesn't fix the bias, though the bias (with or without your fix) should have a negligible effect for many applications.

@sergeisakov
Collaborator

sergeisakov commented Mar 27, 2020

> Manual tests on a 30-qubit, depth-8 circuit using 8 threads show a noticeable difference in sampling runtime with the calculated sum-of-norms:

You should use one thread because sampling is not parallelized.

> There are ways around this; of note is the use of precomputed sums, as described here. This would make the initial pass O(n), with subsequent passes taking O(n/m) for m stored sums.

I doubt it will be much faster.

@95-martin-orion
Collaborator Author

Given that the runtime of sampling is O(n) in either case, I think avoiding bias should take priority here. I recommend that we submit Sergei's implementation (now included in this PR) and push runtime improvements to a future PR.

```cpp
std::mt19937_64 rgen(seed);
double rgen_norm = norm / std::mt19937_64::max();
```
Collaborator

It is probably better to use std::uniform_real_distribution(0.0, norm) as in your initial commit.
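A minimal sketch of the suggested change (illustrative only, not the PR diff; `DrawThreshold` is a made-up helper name):

```cpp
#include <cassert>
#include <random>

// Draw a sampling threshold directly in [0, norm) rather than
// rescaling raw mt19937_64 output by norm / mt19937_64::max().
double DrawThreshold(std::mt19937_64& rgen, double norm) {
  std::uniform_real_distribution<double> dist(0.0, norm);
  return dist(rgen);
}
```

Unlike the manual rescaling, `std::uniform_real_distribution` guarantees the result lies in [0, norm).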

```cpp
for (uint64_t k = 0; k < state_space.Size(state); ++k) {
  csum += std::norm(state_space.GetAmpl(state, k));

  while (m < num_samples && rs[m] <= csum) {
```
Collaborator

This can be replaced by while (rs[m] <= csum && m < num_samples) if you replace rs.reserve(num_samples) with rs.reserve(num_samples + 1) above. Note that there is a bug in the original TFQ code: an out-of-bounds array access. If you use std::uniform_real_distribution, then replace rs[m] <= csum with rs[m] < csum.

@95-martin-orion
Collaborator Author

Closing this PR in favor of #68.
