Add run_batch to Sampler and QuantumEngineSampler #3265
Conversation
cirq/work/sampler.py (Outdated)

    params_list: Parameter sweeps to use with the circuits. The number
        of sweeps should match the number of circuits and will be
        paired in order with the circuits.
    repetitions: Number of circuit repetitions to run. Each sweep value
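The pairing rule described in this excerpt can be sketched in plain Python. This is a hypothetical helper (the name `pair_circuits_with_sweeps` and the string stand-ins are illustrative only, not the actual cirq implementation):

```python
def pair_circuits_with_sweeps(circuits, params_list):
    # Each sweep is paired in order with its circuit; the counts must match.
    if len(params_list) != len(circuits):
        raise ValueError(
            f'Expected {len(circuits)} sweeps, got {len(params_list)}.')
    return list(zip(circuits, params_list))

# Placeholder stand-ins for circuits and sweeps:
pairs = pair_circuits_with_sweeps(['circuit_a', 'circuit_b'],
                                  ['sweep_a', 'sweep_b'])
```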
Quick question here: what is the reasoning for keeping the number of repetitions fixed across all circuits, rather than allowing it to be specified per circuit? I imagine there are situations where a user might want to get a different number of samples from each circuit. For example, in equation 60 of this paper there is an expected number of samples required to achieve a desired precision that differs for each circuit.
Perhaps @dstrain115 or @maffoo can answer that question. Here I'm just following the interface of the existing method cirq.google.Engine.run_batch.
Currently, you do not receive any speed up if you specify different numbers of repetitions per circuit since the hardware cannot batch them together. This is specific to quantum engine, which makes me think that maybe we should not add this to Sampler.
@dstrain115, I agree that the batching is very specific currently to Quantum Engine, and it seems that it might be too early to create this abstraction as we simply don't know how/whether other NISQ devices will utilize batching (we should probably ask the Pasqal/AQT/IQM folks!).
On the other hand, I see two drivers for this:
- Simulator/hardware and hardware/hardware gaps: switching from simulation to hardware currently has a friction point. At the extreme, a user would have to maintain five different circuit execution strategies: simulation, Pasqal, AQT, IQM, and Google Quantum Engine. It would be great to increase the overlap and reduce the amount of special code required for each architecture.
- A similar question comes up for downstream users: e.g., how will TFQ leverage batching in QE? The Sampler interface seems to be the main entry point, so using it would simplify maintenance for them. If we keep run_batch at the QE level, they will have to implement logic that goes below the Sampler abstraction level.
@kevinsung what is your main use case for this?
I was planning to use this to speed up XEB benchmarking: https://github.com/quantumlib/Cirq/blob/master/cirq/experiments/grid_parallel_two_qubit_xeb.py.
As is typical of the code in the experiments/ directory, that code works at the Sampler abstraction level.
Sending batches of circuits is also useful for simulators: knowing all circuits upfront, rather than seeing them one at a time, allows various optimizations to be made. For example the batch of circuits can be parallelized: in TFQ, in order to use the cirq density matrix simulator (which inherits from cirq.Sampler), we use the code here to parallelize over batches of circuits. If there were a run_batch call in the API of cirq.Sampler, then this parallelization logic could be moved into the implementation of the simulator.
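The parallelization idea described here can be sketched with the standard library. This is a minimal illustration only: `simulate_one` is a hypothetical stand-in for a single-circuit simulator call, not the TFQ or cirq code referenced above.

```python
from concurrent.futures import ThreadPoolExecutor

def simulate_one(circuit):
    # Stand-in for a single-circuit simulation call (hypothetical).
    return f'result({circuit})'

def run_batch_parallel(circuits, max_workers=4):
    # Knowing the whole batch upfront lets the implementation fan the
    # circuits out to a pool of workers instead of running them serially.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate_one, circuits))
```

`pool.map` preserves input order, so results come back paired with their circuits just as in the serial case.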
We also use batching for cirq-backend wavefunction simulations (for things that inherit from cirq.SimulatesFinalState), and similar logic would lead to the creation of a new API call, simulate_batch.
So in our use case, an answer to the question "What is batching for?" is: to take advantage of multi-threaded machines to speed up simulation. The reason to do it in the simulator itself is to reduce code duplication, in case users besides us want to take advantage of this parallelization across circuits.
I wanted to clear up confusion around the phrase "it could be interpreted as a sweep over repetitions": I think it is not quite a sweep over repetitions, rather I meant that each circuit in the batch would be paired with a number of samples to extract from it; so that len(repetitions) == len(programs). The API could also accept a single number, in which case it pads out repetitions to contain len(programs) copies of the passed number.
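The padding behavior described here (a single number expanded to one entry per circuit) can be sketched as follows. The helper name `normalize_repetitions` is hypothetical, not the merged cirq code:

```python
def normalize_repetitions(repetitions, num_programs):
    # A single int is padded out to one repetition count per circuit,
    # so that len(repetitions) == len(programs) always holds downstream.
    if isinstance(repetitions, int):
        return [repetitions] * num_programs
    reps = list(repetitions)
    if len(reps) != num_programs:
        raise ValueError(
            f'Expected {num_programs} repetition counts, got {len(reps)}.')
    return reps
```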
Sorry for the many comments; I just want to be sure our use case is understood. If it still seems too specific to merit the API change, I understand.
Thanks for your comments and providing the TFQ/simulator perspective @zaqqwerty and sorry for the confusion about repetitions.
balopat
left a comment
I would recommend closing this. See my previous comment. Sorry for not thinking this through upfront on the issue in the first place.
> How do you plan to support a development flow where you can test code on a simulator and then jump to "the real thing" while using batching?
Great question. My main hypothesis is that in the NISQ era we will not be able to invent or enforce a perfect abstraction that works across all simulators and all quantum platforms while also allowing optimal execution on every backend.

To your question: anywhere outside the boundaries of "common functionality", the user will have to drop down to backend-specific logic to leverage the specific features of that backend. I might be completely off track here, but the user code will have to explicitly handle the cases of the different platforms. Now, the Google code could offer a simulated version to try out. That might be a good idea, so that you can test the batching logic in your code without having to run it against the real service.

The question is where to draw that boundary and how. I think the boundary is around "unoptimized, generic workflows and/or universally applicable optimizations". If we can do an optimization in a generic way, we should include it in our abstract classes. If we cannot, we should keep it in the subclasses. If we see that a certain optimization appears in most of our backends (e.g. batching turns out to be exactly the same for all our platforms), then it's probably worthwhile to pull it up the stack. But before we have that confidence, it will just create confusion.
I think specifying a batch of circuits that can't be expressed as a sweep is a pretty natural thing to want to do in many NISQ experiments, so I would suggest that we include support for this in the standard Sampler interface. An alternative to this kind of "explicit batching" is to have the sampler implementation implicitly batch things and execute them in an optimal way on the hardware. This is what we do internally, but it requires being able to submit multiple run calls without waiting, which means we need to work out the model for async execution, which is not trivial. Explicit batching is something we can add now without all the async complications.
Yes, as @maffoo said, there is no extra cost to using […]
@maffoo's comment suggests another answer to the question "What is batching for?", which is that […]
Thank you for all the comments; I'm almost convinced to introduce this. The final question is still around the apparent difference between the backends: the repetitions. Should it be a single number, or one per circuit? Would TFQ batching benefit from a single number, or is that an unnecessary limitation for the user? I would go with the more generic case in the interface; the user can then discover that, in order to get any speed advantage on a given platform, they'll have to parametrize the batching accordingly, which in the QE case means the same repetition count for each circuit.
I agree. |
I also agree with going with the more generic case in the interface, since for the TFQ use case at least it will be most useful if repetitions can be specified per circuit in the batch.
balopat
left a comment
Updated review: based on the discussion, let's make this work in the Sampler interface, but let's change the repetitions to be per program. We should also add ample explanation in the docstring to encourage users to check the docs of the actual device or simulator they are using, to figure out what parametrization will lead to an actual speedup.
Cool. It would be great if I could also address #3285 here. The proposal is that the user should not be forced to pass in a list of parameter resolvers; it should just work if no resolvers are passed.
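The "just works with no resolvers" proposal could look roughly like this. A sketch only, under the assumption that a missing sweep is represented by a trivial `None` resolver per circuit (the helper name is hypothetical):

```python
def default_params_list(params_list, num_programs):
    # If the caller passes no sweeps, fall back to one trivial (None)
    # sweep per circuit, so batched runs of unparameterized circuits
    # work without an explicit list of parameter resolvers.
    if params_list is None:
        return [None] * num_programs
    return list(params_list)
```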
I have updated this PR to allow the number of repetitions to be a list and documented that child classes may have requirements that must be met to obtain a speedup. I have also updated the behavior of the method so that it works if no sweeps are provided. However, I didn't modify the behavior of […]
Sounds good. I forgot to reply: let's do that in a separate PR to keep things clean and small.
This looks good to me. @dstrain115, can you please have a look at the final result too?
balopat
left a comment
LGTM
@dstrain115 I've addressed your comments; could you take another look? |
dstrain115
left a comment
LGTM with a few small nits.
Resolves #3224.