Further SIMD optimization and thread number optimization for Pauli operations #219
Overview
This PR bundles several independent optimization updates.
Refactor source code
Some source files in the csim folder had grown too long, so I've split them into smaller files for readability.
SIMD optimization for 2-qubit gate
PR #181 brought several SIMD optimizations to the public branch.
However, 2-qubit gates were left without SIMD optimization because they require handling many conditions.
This PR optimizes 2-qubit gates with AVX2 instructions.
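To illustrate why 2-qubit gates need case handling before SIMD can be applied, here is a minimal sketch for a CZ gate. It is not the code from this PR: the function name `cz_gate`, the `CTYPE` alias, and the choice of CZ are assumptions made for illustration. When both qubit indices are at least 1, the amplitudes to negate come in adjacent pairs, so one 256-bit register covers two complex doubles; otherwise the sketch falls back to a scalar loop. The intrinsics used are AVX-level ones that are available on any AVX2 build, and the vector path is only compiled when `__AVX2__` is defined.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <vector>
#ifdef __AVX2__
#include <immintrin.h>
#endif

using CTYPE = std::complex<double>;  // hypothetical alias for this sketch

// Illustrative CZ gate: negate every amplitude whose index has both
// bit q0 and bit q1 set. Not the implementation from the PR.
void cz_gate(std::vector<CTYPE>& state, unsigned q0, unsigned q1) {
    assert(q0 != q1);
    const std::uint64_t dim = state.size();
    const std::uint64_t mask = (1ULL << q0) | (1ULL << q1);
    assert(dim >= 4 && (dim & (dim - 1)) == 0 && mask < dim);

#ifdef __AVX2__
    // Condition for the vector path: with both qubits >= 1, indices i and
    // i + 1 (i even) agree on bits q0 and q1, so they are negated together.
    if (q0 >= 1 && q1 >= 1) {
        double* data = reinterpret_cast<double*>(state.data());
        const __m256d sign = _mm256_set1_pd(-0.0);  // sign bit in each lane
        for (std::uint64_t i = 0; i < dim; i += 2) {
            if ((i & mask) != mask) continue;
            // Two complex doubles = four doubles = one 256-bit register.
            const __m256d v = _mm256_loadu_pd(data + 2 * i);
            _mm256_storeu_pd(data + 2 * i, _mm256_xor_pd(v, sign));
        }
        return;
    }
#endif
    // Scalar fallback: used when a qubit index is 0 or AVX2 is unavailable.
    for (std::uint64_t i = 0; i < dim; ++i) {
        if ((i & mask) == mask) state[i] = -state[i];
    }
}
```

Both paths produce the same result; only the memory access pattern differs. A production version would also skip over index ranges that cannot match instead of testing every pair.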
Light quantum circuit optimizer
The previous quantum circuit optimizer performs detailed circuit optimization using the commutation relations of quantum gates. Although its running time scales polynomially with the number of quantum gates, in practice it takes time comparable to the quantum circuit simulation itself. Furthermore, typical ansatz circuits only require simple circuit optimization.
Thus, I've implemented a lightweight version of the quantum circuit optimizer, which lets gate A absorb gate B only if the following conditions are all satisfied.
This optimization runs in time linear in the number of gates.
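A single-pass merge of this kind can be sketched as follows. The absorption rule used here is an assumption for illustration only and is not taken from this PR: gate B is absorbed into an earlier gate A when A is the most recent gate on every qubit that B acts on. The `Gate` struct and `light_merge` function are hypothetical names, and the sketch only records which gates were merged rather than multiplying their matrices.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical gate record: a label and the qubits the gate acts on.
struct Gate {
    std::string name;
    std::vector<unsigned> qubits;
};

// One pass over the circuit. last[q] holds the index (in the output list)
// of the most recent gate acting on qubit q, or -1 if there is none.
// Assumed rule: B is absorbed into A if A is the latest gate on all of
// B's qubits, so nothing sits between them on those qubits.
std::vector<Gate> light_merge(const std::vector<Gate>& circuit,
                              unsigned n_qubits) {
    std::vector<Gate> out;
    std::vector<int> last(n_qubits, -1);
    for (const Gate& b : circuit) {
        assert(!b.qubits.empty());
        for (unsigned q : b.qubits) assert(q < n_qubits);

        const int candidate = last[b.qubits.front()];
        bool absorb = candidate >= 0;
        for (unsigned q : b.qubits) absorb = absorb && (last[q] == candidate);

        if (absorb) {
            // Record the merge; a real optimizer would multiply matrices here.
            out[candidate].name += "*" + b.name;
        } else {
            const int idx = static_cast<int>(out.size());
            out.push_back(b);
            for (unsigned q : b.qubits) last[q] = idx;
        }
    }
    return out;
}
```

Each gate is visited once and the work per gate is proportional to the number of qubits it touches, so the pass is linear in the number of gates when gate sizes are bounded.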
Tuning parallelization conditions
When quantum circuits are sufficiently small, the overhead of parallelization exceeds the speedup it provides. In such cases, we should not parallelize even if qulacs is compiled with OpenMP flags.
Several basic gates are already forced to run on a single thread, but multi-qubit Pauli gates and their rotations were left untuned. I've tuned the parallelization condition for these gates.
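The idea of gating parallelism on problem size can be sketched with the OpenMP `if` clause. This is an illustration, not the code from this PR: the threshold value `PARALLEL_QUBIT_THRESHOLD`, the helper `use_parallel`, and the function `apply_pauli_z_string` are all hypothetical, and the actual tuned condition is not given in this description. The example applies a multi-qubit Pauli-Z string, which flips the sign of each amplitude whose index has odd parity under the mask.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <vector>

using CTYPE = std::complex<double>;  // hypothetical alias for this sketch

// Hypothetical cutoff: below this qubit count, stay on one thread.
constexpr unsigned PARALLEL_QUBIT_THRESHOLD = 14;

inline bool use_parallel(unsigned n_qubits) {
    return n_qubits >= PARALLEL_QUBIT_THRESHOLD;
}

// Parity (0 or 1) of the number of set bits in x.
inline unsigned parity64(std::uint64_t x) {
    x ^= x >> 32; x ^= x >> 16; x ^= x >> 8;
    x ^= x >> 4;  x ^= x >> 2;  x ^= x >> 1;
    return static_cast<unsigned>(x & 1u);
}

// Apply a Pauli-Z string selected by z_mask: amplitude i is negated when
// popcount(i & z_mask) is odd.
void apply_pauli_z_string(std::vector<CTYPE>& state, std::uint64_t z_mask) {
    const std::int64_t dim = static_cast<std::int64_t>(state.size());
    assert(dim > 0 && (dim & (dim - 1)) == 0);
    unsigned n_qubits = 0;
    while ((std::int64_t{1} << n_qubits) < dim) ++n_qubits;

    const bool parallel = use_parallel(n_qubits);
    (void)parallel;  // unused when compiled without OpenMP
    // With OpenMP enabled, the if clause keeps small states single-threaded.
#pragma omp parallel for if (parallel)
    for (std::int64_t i = 0; i < dim; ++i) {
        if (parity64(static_cast<std::uint64_t>(i) & z_mask)) {
            state[i] = -state[i];
        }
    }
}
```

Without OpenMP flags the pragma is ignored and the loop runs serially, so the result is the same in either build; only the threading decision changes.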