We want to run a basic test to check whether running QOSE on AWS is quicker than running it locally. First, we try this without parallelizing gradients.

In [1]:
import pennylane as qml
from pennylane import numpy as np
import networkx as nx

In [2]:
from subarchitecture_tree_search import run_tree_architecture_search


In [3]:
import os
import pickle
import time

In [4]:
    # Create a unique name for your experiment
    EXPERIMENT_NAME = 'LocalvsRemoteTree'

    # Create the directories that store the data (no error if they already exist)
    data_path = f'data/{EXPERIMENT_NAME}'
    os.makedirs(data_path, exist_ok=True)

In [5]:
    config = {'nqubits': 3,
              'min_tree_depth': 2,
              'max_tree_depth': 6,
              'prune_rate': 0.3,
              'prune_step': 3,
              'plot_trees': False,
              'data_set': 'moons',
              'nsteps': 20,
              'opt': qml.AdamOptimizer,
              'batch_size': 25,
              'n_samples': 1500,
              'learning_rate': 0.01,
              'save_frequency': 1,
              'save_path': data_path
              }

In [6]:
    with open(data_path + '/config.pickle', 'wb') as f:
        pickle.dump(config, f)

In [8]:

t_0_remote = time.time()
run_tree_architecture_search(config, "remote")
t_1_remote = time.time()


NoRegionError: You must specify a region.
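
The ``NoRegionError`` above is raised by botocore when no AWS region can be resolved. One common fix, sketched here, is to supply the region through the ``AWS_DEFAULT_REGION`` environment variable before the remote device is created. The value ``us-east-1`` is only a placeholder (an assumption); replace it with the region that hosts your Braket device and S3 bucket.

```python
import os

# Placeholder region (an assumption): use the region of your Braket device and S3 bucket.
ASSUMED_REGION = "us-east-1"

# boto3/botocore read AWS_DEFAULT_REGION when a session is created, so this
# must run before the remote device is constructed. setdefault keeps any
# value that is already present in the environment.
os.environ.setdefault("AWS_DEFAULT_REGION", ASSUMED_REGION)

print("AWS region in use:", os.environ["AWS_DEFAULT_REGION"])
```

Setting a region in ``~/.aws/config`` is another way to do this.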

In [7]:
t_0_local = time.time()
run_tree_architecture_search(config, "local")
t_1_local = time.time()


Depth = 1
Depth = 2
Prune Tree
Grow Pruned Tree
Depth = 3
Grow Tree
Depth = 4
Grow Tree
Depth = 5
Prune Tree
Grow Pruned Tree


In [None]:
print("Execution time on remote device (seconds):", t_1_remote - t_0_remote)
print("Execution time on local device (seconds):", t_1_local - t_0_local)
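
If one of the runs fails, as the remote run did above, its ``t_1`` variable is never assigned and the print statement raises a ``NameError``. A small timing helper avoids the paired ``t_0``/``t_1`` variables. In this sketch the name ``timed`` and the ``timings`` dictionary are illustrative and not part of QOSE.

```python
import time
from contextlib import contextmanager

timings = {}  # hypothetical container: label -> elapsed seconds


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so a failed run still gets a timing.
        timings[label] = time.perf_counter() - start


# Stand-in workload; in the notebook this would be the tree search call.
with timed("local"):
    sum(i * i for i in range(100_000))

for label, seconds in timings.items():
    print(f"Execution time on {label} device (seconds): {seconds:.3f}")
```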

In [None]:
with open(data_path + '/tree_depth_2.pickle', 'rb') as f:
    results = pickle.load(f)

In [None]:

restore_depth = 3
with open(data_path + f'/tree_depth_{restore_depth}.pickle', 'rb') as f:
    G = pickle.load(f)


In [None]:
nx.get_node_attributes(G, 'W')

In [None]:
d_qnode_remote = qml.grad(qnode_remote)

t_0_remote_grad = time.time()
d_qnode_remote(params)
t_1_remote_grad = time.time()

print("Gradient calculation time on remote device (seconds):", t_1_remote_grad - t_0_remote_grad)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

     Gradient calculation time on remote device (seconds): 20.92005863400118

Now, the local device:

<div class="alert alert-danger"><h4>Warning</h4><p>Evaluating the gradient with ``default.qubit`` will take a long time. Consider
    commenting out the following lines unless you are happy to wait.</p></div>



In [None]:
d_qnode_local = qml.grad(qnode_local)

t_0_local_grad = time.time()
d_qnode_local(params)
t_1_local_grad = time.time()

print("Gradient calculation time on local device (seconds):", t_1_local_grad - t_0_local_grad)

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

     Gradient calculation time on local device (seconds): 941.8518133479993

Wow, the local device needs almost 16 minutes (about 942 seconds)! Compare this to the roughly
21 seconds spent calculating the gradient on SV1. This provides a powerful lesson in parallelization.
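
To put a number on the comparison, the two timings printed above can be divided directly. The sketch below reuses only those two measured values.

```python
# Gradient timings copied from the outputs above (seconds).
remote_seconds = 20.92005863400118
local_seconds = 941.8518133479993

speedup = local_seconds / remote_seconds
print(f"Local run: {local_seconds / 60:.1f} minutes")
print(f"Speedup from SV1 with parallel execution: {speedup:.0f}x")
```

For this single measurement, the remote gradient is roughly 45 times faster than the local one.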

What if we had run on SV1 with ``parallel=False``? It would have taken around 3 minutes: still
faster than the local device, but much slower than running SV1 in parallel.
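
Parallel execution helps because a gradient needs many independent circuit evaluations. The sketch below illustrates the idea, assuming a parameter-shift style gradient with two shifted evaluations per parameter, and uses a thread pool in place of the remote device. ``fake_circuit`` is a hypothetical stand-in for a circuit expectation value, not a Braket or PennyLane call.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fake_circuit(params):
    """Hypothetical stand-in for an expectation value: a product of cosines."""
    return math.prod(math.cos(p) for p in params)


def shifted(params, index, shift):
    """Return a copy of params with one entry shifted."""
    out = list(params)
    out[index] += shift
    return out


def parallel_parameter_shift(f, params, max_workers=4):
    """Evaluate all 2 * len(params) shifted circuits concurrently."""
    jobs = []
    for i in range(len(params)):
        jobs.append(shifted(params, i, +math.pi / 2))
        jobs.append(shifted(params, i, -math.pi / 2))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        values = list(pool.map(f, jobs))  # results keep the order of jobs
    # Parameter-shift rule: df/dx_i = (f(x + pi/2) - f(x - pi/2)) / 2
    return [(values[2 * i] - values[2 * i + 1]) / 2 for i in range(len(params))]


params = [0.1, 0.2, 0.3]
grad = parallel_parameter_shift(fake_circuit, params)
print(grad)
```

When each evaluation is dominated by waiting for a remote job to return, dispatching them together means the wall-clock time is closer to that of the slowest evaluation than to the sum of all of them.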

Scaling up QAOA for larger graphs
---------------------------------

The quantum approximate optimization algorithm (QAOA) is a candidate algorithm for near-term
quantum hardware that can find approximate solutions to combinatorial optimization
problems such as graph-based problems. We have seen in the main
:doc:`QAOA tutorial<tutorial_qaoa_intro>` how QAOA successfully solves the minimum vertex
cover problem on a four-node graph.

Here, let's be ambitious and try to solve the maximum cut problem on a twenty-node graph! In
maximum cut, the objective is to partition the graph's nodes into two groups so that the number
of edges crossed or 'cut' by the partition is maximized (see the diagram below). This problem is
NP-hard, so we expect it to be tough as we increase the number of graph nodes.
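
As a concrete illustration of the objective, the sketch below brute-forces maximum cut on a small hypothetical four-node graph (not the twenty-node graph used later). Exhaustive search visits all 2^n assignments of nodes to groups, which is what becomes infeasible as the number of nodes grows.

```python
from itertools import product

# Hypothetical toy graph: a square 0-1-2-3 plus the diagonal 0-2.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n_nodes = 4


def cut_size(assignment, edges):
    """Count edges whose endpoints fall in different groups."""
    return sum(assignment[u] != assignment[v] for u, v in edges)


# Try every assignment of nodes to group 0 or group 1.
best = max(product([0, 1], repeat=n_nodes), key=lambda a: cut_size(a, toy_edges))
print("Best partition:", best, "cuts", cut_size(best, toy_edges), "edges")
```

The toy graph contains a triangle, so no partition can cut all five edges. The best achievable cut is four.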

.. figure:: ../_static/max-cut.png
    :align: center
    :scale: 100%
    :alt: The maximum cut problem
    :target: javascript:void(0);

Let's first define the graph:



In [None]:
import networkx as nx

nodes = n_wires = 20
edges = 60
seed = 1967

g = nx.gnm_random_graph(nodes, edges, seed=seed)
positions = nx.spring_layout(g, seed=seed)

nx.draw(g, with_labels=True, pos=positions)

.. figure:: ../_static/20_node_graph.png
    :align: center
    :scale: 100%
    :target: javascript:void(0);

We will use the remote SV1 device to help us optimize our QAOA circuit as quickly as possible.
First, we load the device again, this time for 20 qubits:



In [None]:
dev = qml.device(
    "braket.aws.qubit",
    device_arn=device_arn,
    wires=n_wires,
    s3_destination_folder=s3_folder,
    parallel=True,
    max_parallel=20,
    poll_timeout_seconds=30,
)

Note the specification of ``max_parallel=20``. This means that up to ``20`` circuits will be
executed in parallel on SV1 (the default value is ``10``).

<div class="alert alert-danger"><h4>Warning</h4><p>Increasing the maximum number of parallel executions can result in a greater rate of
    spending on simulation fees on Amazon Braket. The value must also be set bearing in mind your
    service `quota <https://docs.aws.amazon.com/braket/latest/developerguide/braket-quotas.html>`__.</p></div>

The QAOA problem can then be set up following the standard pattern, as discussed in detail in
the :doc:`QAOA tutorial<tutorial_qaoa_intro>`.



In [None]:
cost_h, mixer_h = qml.qaoa.maxcut(g)
n_layers = 2


def qaoa_layer(gamma, alpha):
    qml.qaoa.cost_layer(gamma, cost_h)
    qml.qaoa.mixer_layer(alpha, mixer_h)


def circuit(params, **kwargs):
    for i in range(n_wires):  # Prepare an equal superposition over all qubits
        qml.Hadamard(wires=i)

    qml.layer(qaoa_layer, n_layers, params[0], params[1])


cost_function = qml.ExpvalCost(circuit, cost_h, dev, optimize=True)
optimizer = qml.AdagradOptimizer(stepsize=0.1)

We're now set up to train the circuit! Note, if you are training this circuit yourself, you may
want to increase the number of iterations in the optimization loop and also investigate changing
the number of QAOA layers.

<div class="alert alert-danger"><h4>Warning</h4><p>The following lines are computationally intensive. Remember that running them will result in
    simulation fees charged to your AWS account. We recommend monitoring your usage on the AWS
    dashboard.</p></div>



In [None]:
import time

np.random.seed(1967)
params = 0.01 * np.random.uniform(size=[2, n_layers])
iterations = 10

for i in range(iterations):
    t0 = time.time()

    params, cost_before = optimizer.step_and_cost(cost_function, params)

    t1 = time.time()

    if i == 0:
        print("Initial cost:", cost_before)
    else:
        print(f"Cost at step {i}:", cost_before)

    print(f"Completed iteration {i + 1}")
    print(f"Time to complete iteration: {t1 - t0} seconds")

print(f"Cost at step {iterations}:", cost_function(params))

np.save("params.npy", params)
print("Parameters saved to params.npy")

.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

   Initial cost: -29.98570234095951
   Completed iteration 1
   Time to complete iteration: 93.96246099472046 seconds
   Cost at step 1: -27.154071768632154
   Completed iteration 2
   Time to complete iteration: 84.80994844436646 seconds
   Cost at step 2: -29.98726230006233
   Completed iteration 3
   Time to complete iteration: 83.13504934310913 seconds
   Cost at step 3: -29.999163153600062
   Completed iteration 4
   Time to complete iteration: 85.61391234397888 seconds
   Cost at step 4: -30.002158646044307
   Completed iteration 5
   Time to complete iteration: 86.70688223838806 seconds
   Cost at step 5: -30.012058444011906
   Completed iteration 6
   Time to complete iteration: 83.26341080665588 seconds
   Cost at step 6: -30.063709712612443
   Completed iteration 7
   Time to complete iteration: 85.25566911697388 seconds
   Cost at step 7: -30.32522304705352
   Completed iteration 8
   Time to complete iteration: 83.55433392524719 seconds
   Cost at step 8: -31.411030331978186
   Completed iteration 9
   Time to complete iteration: 84.08745908737183 seconds
   Cost at step 9: -33.87153965616938
   Completed iteration 10
   Time to complete iteration: 87.4032838344574 seconds
   Cost at step 10: -36.05424874438809
   Parameters saved to params.npy

This example shows us that a 20-qubit QAOA problem can be trained in roughly 83 to 94 seconds per
iteration by using parallel executions on the Amazon Braket SV1 device to speed up gradient
calculations. If this problem were run on ``default.qubit`` without parallelization, we would
expect training to take much longer.

The results of this optimization can be investigated by saving the parameters
:download:`here </demonstrations/braket/params.npy>` to your working directory. See if you can
analyze the performance of this optimized circuit following a similar strategy to the
:doc:`QAOA tutorial<tutorial_qaoa_intro>`. Did we find a large graph cut?
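
One way to start that analysis is sketched below, assuming the cost Hamiltonian has the form documented for ``qml.qaoa.maxcut``, :math:`H_C = \frac{1}{2}\sum_{(i,j)\in E}(Z_i Z_j - 1)`. On a computational basis state, each cut edge contributes :math:`-1` and each uncut edge contributes :math:`0`, so the cost equals minus the number of cut edges. The helper names and the toy graph are illustrative.

```python
def classical_cost(bits, edges):
    """Energy of a bitstring under H = 1/2 * sum over edges of (Z_i Z_j - 1)."""
    total = 0.0
    for i, j in edges:
        z_i = 1 - 2 * bits[i]  # bit 0 -> Z = +1, bit 1 -> Z = -1
        z_j = 1 - 2 * bits[j]
        total += 0.5 * (z_i * z_j - 1)
    return total


def cut_size(bits, edges):
    """Count edges whose endpoints carry different bit values."""
    return sum(bits[i] != bits[j] for i, j in edges)


# Hypothetical toy graph: a square 0-1-2-3 plus the diagonal 0-2.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
bits = (0, 1, 0, 1)
print(classical_cost(bits, toy_edges), cut_size(bits, toy_edges))
```

Read this way, the initial cost of about -30 corresponds to cutting half of the 60 edges on average, and the final cost of about -36 corresponds to an average cut of roughly 36 edges.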

