## Deep learning quantum DD compiler

## Methodology

1. **Model training.** Two deep learning models were trained on 1,000 random 5-qubit circuits to predict the difference between two equivalent random circuits' Hamming distances to the ideal output. One model was trained on IBM Q Burlington, and the other on IBM Q London. After compiling at maximum optimization, these supremacy-style random circuits had many gaps, because the 2-qubit gates that compensate for partial connectivity leave other qubits idle. We filled the gaps completely at random with sequences of U3 gates equivalent to the identity (i.e. random dynamical decoupling sequences).

2. **Compiling.** We ran the trained model on a new set of 500 random circuits. For each random circuit, we generated 1,000 randomly padded circuits. The deep learning model was then used to predict the relative noise between pairs of circuits in 50 tournaments of 20 competitors each, selecting the circuit expected to be the least noisy as the "compiled" circuit.

3. **Testing.** The compiled circuits were run on IBM Q Burlington, Essex, London, Ourense, Vigo (which have the same architecture) and Yorktown (which has a different architecture). When the compiled circuit was run on the device for which the noise model was trained, we observed a 12% improvement (95% CI [12%, 13%]) over the IBM Qiskit compiler. On all other 5-qubit devices, the deep learning compiler also performed better than the IBM Qiskit compiler, but only by 5.2% (95% CI [4.9%, 5.6%]). This is shown to be significantly better than random dynamical decoupling sequences and the commonly used XYXY sequence.
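The tournament selection in the compiling step can be sketched as follows. This is a minimal illustration, not the authors' implementation: `predict_diff` is a hypothetical stand-in for the trained model (it returns the predicted noise of its first argument minus that of its second), the knockout order inside each tournament is an assumption, and so is the final round between tournament winners, which the text does not describe.

```python
import random

def run_tournament(competitors, predict_diff):
    """Sequential knockout: keep whichever circuit is predicted to be less noisy."""
    champion = competitors[0]
    for challenger in competitors[1:]:
        # A negative predicted difference means the challenger is expected to be less noisy.
        if predict_diff(challenger, champion) < 0:
            champion = challenger
    return champion

def select_compiled(candidates, predict_diff, n_tournaments=50, size=20, seed=0):
    """Split the candidates into tournaments, then compare the tournament winners.

    The final round between winners is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    shuffled = list(candidates)
    rng.shuffle(shuffled)
    winners = [
        run_tournament(shuffled[i * size:(i + 1) * size], predict_diff)
        for i in range(n_tournaments)
    ]
    return run_tournament(winners, predict_diff)

# Toy usage: each "circuit" is represented only by a made-up noise level,
# and the hypothetical predictor knows that level exactly.
toy_rng = random.Random(42)
toy_candidates = [toy_rng.random() for _ in range(1000)]
best = select_compiled(toy_candidates, lambda a, b: a - b)
```

With a perfect predictor the selected candidate is simply the least noisy one; with a learned predictor the tournament only needs pairwise comparisons, which matches what the model was trained to output.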

In [1]:
import numpy as np
import bootstrapped.bootstrap as bs
import bootstrapped.stats_functions as bs_stats
import glob

Average impact of random DD sequences on the training set:

In [7]:
def average_dd(file):
    a = np.load(file)
    free = np.expand_dims(a[:, 0], 1)
    compiled = a[:, 1:]
    return bs.bootstrap(((compiled - free)/free).flatten(), stat_func=bs_stats.mean)

print('Burlington', average_dd('supremacy_all_5_unique/burlington_noise.npy'))
print('London', average_dd('supremacy_all_5_unique/london_noise.npy'))

Burlington -0.04472841360952699    (-0.045877976501480885, -0.043586562159673985)
London -0.07317828321117872    (-0.07469895131131679, -0.07161936271596131)
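Each output above is a point estimate followed by a 95% confidence interval. As a rough illustration of how such an interval can arise, here is a percentile bootstrap written directly in NumPy. It is a sketch under stated assumptions: the `bootstrapped` package may construct its interval differently, and `toy_changes` is synthetic data, not a result from any device.

```python
import numpy as np

def percentile_bootstrap_mean(values, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) using a percentile bootstrap of the mean."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement: one row of indices per bootstrap replicate.
    idx = rng.integers(0, len(values), size=(n_resamples, len(values)))
    means = values[idx].mean(axis=1)
    lower, upper = np.quantile(means, [alpha / 2, 1 - alpha / 2])
    return values.mean(), lower, upper

# Synthetic relative changes in Hamming distance (illustrative only).
toy_rng = np.random.default_rng(1)
toy_changes = toy_rng.normal(loc=-0.05, scale=0.1, size=500)
estimate, lower, upper = percentile_bootstrap_mean(toy_changes)
print(estimate, (lower, upper))
```

An interval that lies entirely below zero, as for both devices above, indicates that the padding reduced the Hamming distance on average.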


We evaluate the change in noise, measured as the relative change in Hamming distance between the compiled and identity circuits, for all computers. Here we also show the difference in noise between a given device and the device on which the noise model was trained (either IBM Q Burlington or IBM Q London). This analysis demonstrates that the noise model is device-specific with over 95% confidence for all comparisons to other devices.

In [9]:
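def diff(files):
    # files = [compiled_paths, identity_paths]. Returns the relative change
    # (compiled - identity) / identity, or None if either list is empty.
    data = []
    for paths in files:
        arrays = [np.load(f) for f in sorted(paths)]
        if len(arrays) == 0:
            return None
        data.append(np.concatenate(arrays))
    # Truncate the compiled results to the number of identity results before comparing.
    return (data[0][:len(data[1])] - data[1]) / data[1]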
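To make the metric concrete, the following sketch computes a mean Hamming distance from measurement counts and then the same relative change that `diff` returns. The averaging over shots, the helper names, and the counts are all assumptions for illustration; the notebook only loads precomputed noise arrays.

```python
def mean_hamming_distance(counts, ideal):
    """Shot-weighted mean Hamming distance between measured bitstrings and the ideal one."""
    total_shots = sum(counts.values())
    weighted = sum(
        shots * sum(a != b for a, b in zip(bits, ideal))
        for bits, shots in counts.items()
    )
    return weighted / total_shots

def relative_change(compiled, identity):
    """Same formula as diff(): negative values mean the compiled circuit is less noisy."""
    return (compiled - identity) / identity

# Made-up counts for a 5-qubit circuit whose ideal output is '00000'.
ideal = '00000'
identity_counts = {'00000': 600, '00001': 300, '00011': 100}
compiled_counts = {'00000': 700, '00001': 200, '00011': 100}

identity_noise = mean_hamming_distance(identity_counts, ideal)  # 0.5
compiled_noise = mean_hamming_distance(compiled_counts, ideal)  # 0.4
change = relative_change(compiled_noise, identity_noise)        # about -0.2
```

In this toy case the compiled circuit's mean Hamming distance is about 20% lower than the identity circuit's, which is the kind of quantity the bootstrap outputs in this notebook summarize.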

In [52]:
def compare(computer):
    home = 'test_noise_5_' + computer + '/'
    home_files = [glob.glob(home + computer + '_compiled*'), glob.glob(home + computer + '_identity*')]
    home_diff = diff(home_files)

    print('---------- MODEL TRAINED ON', computer, '----------')
    print(computer, 'change in Hamming distance', bs.bootstrap(home_diff, stat_func=bs_stats.mean))
    print()

    comparisons = ['burlington', 'essex', 'london', 'ourense', 'vigo', 'yorktown']
    comparisons.remove(computer)
    all_diffs = []
    for c in comparisons:
        files = [glob.glob(home + c + '_compiled*'), glob.glob(home + c + '_identity*')]
        dr_diff = diff(files)
        print('Device:', c)
        if dr_diff is None:
            print('no data')
        else:
            # Only keep devices that have data, so the concatenation below never sees None.
            all_diffs.append(dr_diff)
            print(c + ' change in noise', bs.bootstrap(dr_diff, stat_func=bs_stats.median))
            # Label the comparison with the device the model was trained on (not always burlington).
            print('noise difference from ' + computer, bs.bootstrap(dr_diff - home_diff, stat_func=bs_stats.median))
        print()
    print()

    return home_diff, np.concatenate(all_diffs)
    
b_home, b_diffs = compare('burlington')
l_home, l_diffs = compare('london')

---------- MODEL TRAINED ON burlington ----------
burlington change in Hamming distance -0.11140341171900789    (-0.12424162473869524, -0.09785457534663462)

Device: essex
essex change in noise -0.05544860532039153    (-0.07066875273166959, -0.0426917043367816)
noise difference from burlington 0.0495800995168085    (0.03855165660877469, 0.06416576698036144)

Device: london
london change in noise -0.05561113504815794    (-0.05954463235886391, -0.04937955851956487)
noise difference from burlington 0.04336468345014906    (0.028038757397818553, 0.054271579530622165)

Device: ourense
ourense change in noise -0.06148141545139791    (-0.07027990407352763, -0.05256197981566858)
noise difference from burlington 0.03574368774382461    (0.024658910206309573, 0.04752435282444325)

Device: vigo
vigo change in noise -0.047762428196207886    (-0.056711075319585455, -0.03707863334554757)
noise difference from burlington 0.04772189166971301    (0.025048129892091833, 0.06569951600202092)

Device: yorkto

We can aggregate these individual device results into two numbers:
* 12.3% (95% CI [11.5%, 13.0%]): the percent reduction in Hamming distance between the observed and ideal outputs for the deep learning compiled circuit relative to the Qiskit compiled circuit, _on the device for which the deep learning model was trained_
* 5.2% (95% CI [4.9%, 5.6%]): the same percent reduction, _on devices for which the deep learning model was **not** trained_

In [28]:
print('total deep learning noise', bs.bootstrap(np.concatenate((b_home, l_home)), stat_func=bs_stats.mean))
print('average improvement on different device', bs.bootstrap(np.concatenate((b_diffs, l_diffs)), stat_func=bs_stats.mean))

total deep learning noise -0.12280828589343883    (-0.13044564003565168, -0.11521732448599387)
average improvement on different device -0.05224541626483875    (-0.05577127586622978, -0.04875028977743232)


For the compiled circuits in which every gap length is a multiple of 4, we can compare deep learning compilation with inserting XYXY sequences, as is common in dynamical decoupling. Deep learning is found to reduce noise by 6.5 percentage points (95% CI [2.1, 10.6]) more than XYXY padding.

In [56]:
b_xyxy = np.load('test_noise_xyxy_subset/burlington_compiled_00000.npy') - np.load('test_noise_xyxy_subset/burlington_identity_00000.npy')
l_xyxy = np.load('test_noise_xyxy_subset/london_compiled_00000.npy') - np.load('test_noise_xyxy_subset/london_identity_00000.npy')
b_xyxy_candidates = np.array([16, 182, 246, 263, 264, 303, 354, 396, 419, 482])
l_xyxy_candidates = np.array([182, 246, 264, 303, 354, 396, 419])
print(bs.bootstrap(np.concatenate((l_xyxy - l_home[l_xyxy_candidates], b_xyxy - b_home[b_xyxy_candidates])), stat_func=bs_stats.mean))

0.06467127855852302    (0.021068539533414254, 0.10577344875083491)
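Both the XYXY padding and the random U3 padding from the methodology rely on the inserted gates composing to the identity. The sketch below checks this numerically. The U3 matrix follows the standard Qiskit convention. Building a random sequence by appending the inverses in reverse order is an assumption made here for illustration, not necessarily how the padded circuits in this work were generated.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)

def u3(theta, phi, lam):
    """Single-qubit U3 gate matrix (standard Qiskit convention)."""
    return np.array([
        [np.cos(theta / 2), -np.exp(1j * lam) * np.sin(theta / 2)],
        [np.exp(1j * phi) * np.sin(theta / 2), np.exp(1j * (phi + lam)) * np.cos(theta / 2)],
    ])

def compose(gates):
    """Multiply gates in time order (the first gate in the list acts first)."""
    total = np.eye(2, dtype=complex)
    for gate in gates:
        total = gate @ total
    return total

def equals_identity_up_to_phase(unitary):
    """True if the unitary is the identity times a global phase."""
    phase = unitary[0, 0]
    return bool(np.isclose(abs(phase), 1.0) and np.allclose(unitary, phase * np.eye(2)))

def random_identity_sequence(gap_length, rng):
    """Random U3 angle triples whose product is the identity (illustrative construction)."""
    half = gap_length // 2
    angles = [tuple(rng.uniform(0, 2 * np.pi, 3)) for _ in range(half)]
    # The inverse of U3(theta, phi, lam) is U3(-theta, -lam, -phi).
    inverses = [(-t, -l, -p) for (t, p, l) in reversed(angles)]
    filler = [(0.0, 0.0, 0.0)] if gap_length % 2 else []  # U3(0, 0, 0) is the identity
    return angles + inverses + filler

xyxy = compose([X, Y, X, Y])
seq = random_identity_sequence(6, np.random.default_rng(0))
random_dd = compose([u3(*a) for a in seq])
print(equals_identity_up_to_phase(xyxy), np.allclose(random_dd, np.eye(2)))
```

XYXY multiplies out to the identity times a global phase of -1, which has no observable effect. It takes four gate slots, which is why the comparison above is restricted to circuits whose gaps are multiples of 4.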
