Please install QOKit from source. If you run into issues with urllib3, downgrade it after installing QOKit by running `pip install urllib3==1.26.6`.

First, we check that all the simulators are loaded properly:

In [1]:
from qokit.fur import get_available_simulator_names

print(get_available_simulator_names("x"))

['gpu', 'python']


The output should be `['gpu', 'c', 'python']`. The `'c'` simulator is optional and is not relevant to this benchmark, so an output without it (as above) is fine. If it is missing and you want it, you can compile it manually by running `make -C qokit/fur/c/csim/src/` in the root directory of the QOKit repository.
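If later cells should adapt to whichever backends are present, the returned list can be filtered by preference. The following is a minimal sketch: `pick_simulator` and its default preference order are hypothetical and not part of QOKit. It works on a plain list of names, so it can be tried without QOKit installed.

```python
from typing import Sequence


def pick_simulator(
    available: Sequence[str],
    preference: Sequence[str] = ("gpu", "c", "python"),
) -> str:
    """Return the first preferred simulator name that is available.

    Hypothetical helper for illustration; not part of QOKit.
    """
    for name in preference:
        if name in available:
            return name
    raise RuntimeError(f"None of {list(preference)} found in {list(available)}")


# The list printed by the cell above
print(pick_simulator(["gpu", "python"]))
# A machine without a GPU but with the compiled C simulator
print(pick_simulator(["c", "python"]))
```

In the notebook you would pass `get_available_simulator_names("x")` instead of a literal list. Since this benchmark targets the GPU, raising an error when `'gpu'` is absent (by passing `preference=("gpu",)`) may be the more useful behavior.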

In [2]:
import numpy as np
from tqdm import tqdm
import networkx as nx
import timeit 
from qokit.qaoa_objective_labs import get_qaoa_labs_objective
from qokit.qaoa_objective_maxcut import get_qaoa_maxcut_objective
from qokit.qaoa_objective import get_qaoa_objective

Note that `f_labs` requires a precomputed diagonal for higher `N`.
You can precompute the diagonal once and save it to disk under `qokit/assets/precomputed_merit_factors` using
```
from qokit.labs import negative_merit_factor_from_bitstring
from qokit.utils import precompute_energies

# N is the number of qubits
ens = precompute_energies(negative_merit_factor_from_bitstring, N)
outpath = f"../qokit/assets/precomputed_merit_factors/precomputed_energies_{N}.npy"
np.save(outpath, ens, allow_pickle=False)
```
Note that the precomputation can take a while.
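To make concrete what the "diagonal" holds, here is a brute-force NumPy sketch that evaluates the negative merit factor of every length-`n` sequence for a small `n`. It uses the standard LABS definitions (sidelobe energy `E` as the sum of squared aperiodic autocorrelations, merit factor `n^2 / (2E)`) and is independent of QOKit. The function names and the bit ordering are assumptions made for illustration, and QOKit's own ordering may differ.

```python
import numpy as np


def labs_energy(s: np.ndarray) -> int:
    """Sidelobe energy E(s) = sum_{k=1}^{n-1} C_k^2, where C_k = sum_i s_i * s_{i+k}."""
    n = len(s)
    return sum(int(np.dot(s[: n - k], s[k:])) ** 2 for k in range(1, n))


def brute_force_negative_merit_factors(n: int) -> np.ndarray:
    """Return -n^2 / (2 E) for all 2^n sequences of +1/-1 spins.

    Assumption: bit j of the index x encodes spin j (0 -> +1, 1 -> -1).
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that the energy is nonzero")
    out = np.empty(2**n)
    for x in range(2**n):
        bits = (x >> np.arange(n)) & 1  # binary digits of x, least significant first
        s = 1 - 2 * bits                # map 0/1 to +1/-1
        out[x] = -(n**2) / (2 * labs_energy(s))
    return out


ens_small = brute_force_negative_merit_factors(7)
print(ens_small.shape, ens_small.min())
```

The array has `2**n` entries, so both its size and the time to fill it grow exponentially with `n`. That is why the result is worth computing once and caching on disk for the values of `N` used in the benchmark.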

In [3]:
# number of qubits
for N in [24, 26, 28]:
    print(f"N={N}")
    # QAOA depth
    p = 6

    theta = np.random.uniform(0,1,2*p)
    G = nx.random_regular_graph(4, N, seed=42)

    # Function initialization may not be fast
    f_maxcut = get_qaoa_maxcut_objective(N, p, G)
    f_labs = get_qaoa_labs_objective(N, p)

    # Function evaluation is fast
    for f, label in [(f_labs, "LABS"), (f_maxcut, "MaxCut")]:
        f(theta)  # warm-up call: do not count the first evaluation
        times = []
        for _ in tqdm(range(10)):
            start = timeit.default_timer()
            f(theta)
            end = timeit.default_timer()
            times.append(end-start)
        print(f"\t{label} finished in {np.mean(times):.4f} on average, min: {np.min(times):.4f}, max: {np.max(times):.4f}")

N=24
100%|██████████| 10/10 [00:00<00:00, 12.12it/s]
	LABS finished in 0.0821 on average, min: 0.0806, max: 0.0885
100%|██████████| 10/10 [00:00<00:00, 12.87it/s]
	MaxCut finished in 0.0773 on average, min: 0.0759, max: 0.0836
N=26
100%|██████████| 10/10 [00:03<00:00,  2.84it/s]
	LABS finished in 0.3509 on average, min: 0.3481, max: 0.3601
100%|██████████| 10/10 [00:03<00:00,  3.02it/s]
	MaxCut finished in 0.3308 on average, min: 0.3289, max: 0.3393
N=28
100%|██████████| 10/10 [00:13<00:00,  1.39s/it]
	LABS finished in 1.3889 on average, min: 1.3856, max: 1.3953
100%|██████████| 10/10 [00:13<00:00,  1.31s/it]
	MaxCut finished in 1.3112 on average, min: 1.3088, max: 1.3201

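The benchmark loop follows a warm-up-then-time pattern: one untimed call, then ten timed calls summarized by mean, min, and max. A minimal sketch of that pattern as a reusable function is shown here. `time_calls` is a hypothetical helper, and the workload is a stand-in so the example runs without QOKit or a GPU.

```python
import timeit
from statistics import mean
from typing import Callable, Dict


def time_calls(fn: Callable[[], object], repeats: int = 10, warmup: int = 1) -> Dict[str, float]:
    """Call `fn` `warmup` times untimed, then time `repeats` calls.

    Returns the mean, min, and max wall-clock time in seconds.
    """
    for _ in range(warmup):
        fn()
    times = []
    for _ in range(repeats):
        start = timeit.default_timer()
        fn()
        times.append(timeit.default_timer() - start)
    return {"mean": mean(times), "min": min(times), "max": max(times)}


# Stand-in workload; in the notebook this would be `lambda: f_labs(theta)`
stats = time_calls(lambda: sum(i * i for i in range(10_000)))
print(f"mean: {stats['mean']:.4f} s, min: {stats['min']:.4f} s, max: {stats['max']:.4f} s")
```

Excluding the first call matters because it can include one-time costs such as memory allocation or kernel loading, which would otherwise inflate the average.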




This is what I measured on `g4dn.2xlarge` (NVIDIA T4 GPU) as the time, in seconds, to evaluate `f_labs(theta)` and `f_maxcut(theta)` (commit `f6f6f565`):

```
N=24
	LABS finished in 0.1823 on average, min: 0.1802, max: 0.1902
	MaxCut finished in 0.1676 on average, min: 0.1637, max: 0.1758
N=26
	LABS finished in 0.8143 on average, min: 0.8102, max: 0.8229
	MaxCut finished in 0.7606 on average, min: 0.7571, max: 0.7692
N=28
	LABS finished in 3.2480 on average, min: 3.2361, max: 3.2598
	MaxCut finished in 2.9858 on average, min: 2.9793, max: 2.9949
```

Same benchmark on `g5.2xlarge` (NVIDIA A10G):

```
N=24
	LABS finished in 0.0821 on average, min: 0.0806, max: 0.0885
	MaxCut finished in 0.0773 on average, min: 0.0759, max: 0.0836
N=26
	LABS finished in 0.3509 on average, min: 0.3481, max: 0.3601
	MaxCut finished in 0.3308 on average, min: 0.3289, max: 0.3393
N=28
	LABS finished in 1.3889 on average, min: 1.3856, max: 1.3953
	MaxCut finished in 1.3112 on average, min: 1.3088, max: 1.3201
```
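A quick way to read these two tables is to look at how the time grows with `N` and how the two GPUs compare. The sketch below uses only the mean times listed above; the helper name is hypothetical. As a rough expectation, a state-vector simulation holds `2^N` amplitudes, so adding two qubits quadruples the state and a growth factor near 4 per step would be unsurprising.

```python
# Mean evaluation times in seconds, copied from the two tables above
t4 = {
    "LABS": {24: 0.1823, 26: 0.8143, 28: 3.2480},
    "MaxCut": {24: 0.1676, 26: 0.7606, 28: 2.9858},
}
a10g = {
    "LABS": {24: 0.0821, 26: 0.3509, 28: 1.3889},
    "MaxCut": {24: 0.0773, 26: 0.3308, 28: 1.3112},
}


def growth_ratios(times):
    """Ratio of the mean time at each N to the mean time at the previous N."""
    ns = sorted(times)
    return {(a, b): times[b] / times[a] for a, b in zip(ns, ns[1:])}


for problem in ("LABS", "MaxCut"):
    for gpu_name, table in (("T4", t4), ("A10G", a10g)):
        ratios = growth_ratios(table[problem])
        text = ", ".join(f"N={a}->{b}: {r:.2f}x" for (a, b), r in ratios.items())
        print(f"{problem} on {gpu_name}: {text}")
    speedup = {n: t4[problem][n] / a10g[problem][n] for n in sorted(t4[problem])}
    text = ", ".join(f"N={n}: {s:.2f}x" for n, s in speedup.items())
    print(f"{problem} A10G speedup over T4: {text}")
```

These are single-machine measurements, so the ratios describe this particular setup and should not be read as a general performance claim.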