Users have noted that calling qsim_simulator.simulate() from Cirq takes considerably longer than the qsim simulation itself. This is likely due to the added cost of copying results from C++ to Python, which in theory can be avoided.
To resolve this issue, the pybind layer should ensure that results produced in C++ are exposed to Python objects without copying.
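
As a rough illustration of the difference between a copying and a non-copying hand-off, here is a minimal pure-Python sketch. It is not qsim code: `state`, `export_copy`, and `export_view` are hypothetical names, and an `array.array` stands in for a buffer owned by the C++ simulator. The general idea it assumes is that pybind11 can expose C++ memory through the Python buffer protocol, so the Python side could receive a view rather than a fresh copy.

```python
from array import array

# Hypothetical stand-in for a state vector owned by the C++ simulator.
state = array("f", [0.0] * 8)
state[0] = 1.0


def export_copy(buf):
    """Copying export: builds a new Python list, O(n) work on every call."""
    return list(buf)


def export_view(buf):
    """Zero-copy export: the memoryview shares the underlying memory."""
    return memoryview(buf)


copied = export_copy(state)
view = export_view(state)

# A later write to the source is visible through the view but not the copy,
# which shows that only the copy duplicated the data.
state[1] = 0.5
print(view[1], copied[1])
```

The trade-off with a shared view is lifetime management: the owner of the memory has to outlive every view handed to Python, which is something the binding layer would need to guarantee.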