This repository was archived by the owner on Jan 12, 2024. It is now read-only.
Consider using passive OMP_WAIT_POLICY in native simulator #457

@IrinaYatsenko

Description

By default (at least on Windows 10), OpenMP appears to be configured to spin-wait on the worker threads. This can make high-throughput workloads faster, since the workers are always ready, but it can also waste CPU cycles spinning while waiting for real work, with no benefit to the wall-time performance of the scenario. Profiling the QML benchmarks (see attached) at 20 qubits shows the worker threads spending about two thirds of their time in SwitchToThread, wasting power and hogging CPU resources on the machine.

We should investigate what kinds of workloads are typical and consider setting OMP_WAIT_POLICY=passive. Another way to tackle this is to understand why we are not achieving the desired load on the worker threads: running the benchmark on 8 threads (rather than the 16 the simulator allocates by default on my 16-core/32-thread machine) showed a small regression in wall time per gate with slightly higher load per thread, but the load was still below full capacity.
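For anyone reproducing the experiment: `OMP_WAIT_POLICY` is read once, when the OpenMP runtime initializes, so it has to be set before the native simulator library is loaded. A minimal sketch for a Python-driven benchmark run (the simulator import is hypothetical; the two environment variables are standard OpenMP controls):

```python
import os

# OMP_WAIT_POLICY is consumed at OpenMP runtime startup, so set it
# before importing anything that loads the native simulator.
os.environ["OMP_WAIT_POLICY"] = "passive"  # idle workers sleep instead of spin-waiting
os.environ["OMP_NUM_THREADS"] = "8"        # optional: cap the worker count, as in the 8-thread run

# import the native simulator here (hypothetical), e.g.:
# import qsharp

print(os.environ["OMP_WAIT_POLICY"])
```

The same effect can be had by exporting the variables in the shell before launching the benchmark; the key constraint in either case is that they are set before the first OpenMP parallel region runs.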

Note 1: discussed with @thomashaener; he suggested profiling at 24+ qubits, since at 20 qubits the problem may still be too small to load all 16 threads.

Note 2: we should also look into profiling cache accesses.

Attachment: QML_benchmark.zip

(Screenshot: omp_threads_spinning)


Labels: bug
