# Tutorial 1: using a Quantum Device to extract machine-learning features

This notebook reproduces the first part of the [QEK paper](https://journals.aps.org/pra/abstract/10.1103/PhysRevA.107.042615) using the library's low-level API.

By the end of this notebook, you will know how to:

1. Import a molecular dataset (the library supports other types of graphs, of course).
2. Compile a register and a sequence of pulses from each graph.
3. Launch the execution of this compiled sequence on a quantum emulator or a physical QPU.
4. Use the result to extract the relevant machine-learning features.

A [companion notebook](./tutorial%202%20-%20Machine%20Learning%20with%20QEK.ipynb) reproduces the machine-learning part of the QEK paper.

If you are not interested in quantum-level details, you may prefer the companion [high-level notebook](./tutorial%201%20-%20Using%20a%20Quantum%20Device%20to%20Extract%20Machine-Learning%20Features%20copy.ipynb), which mirrors this notebook but uses a higher-level API that takes care of these details for you.

## Dataset preparation

As in any machine learning task, we first need to load and prepare data. QEK can work with many types of graphs, including molecular graphs. For this tutorial, we will use the PTC-FM dataset, which contains such molecular graphs.

In [None]:
# Load the original PTC-FM dataset
import torch_geometric.datasets as pyg_dataset
og_ptcfm = pyg_dataset.TUDataset(root="dataset", name="PTC_FM")

display("Loaded %s samples" % (len(og_ptcfm), ))

This package lets researchers embed _graphs_ on Analog Quantum Devices. To do this, we need to give these graphs a geometry (the positions of their nodes in
space) and to confirm that the geometry is compatible with a Quantum Device.

This package builds upon the [Pulser framework](https://pulser.readthedocs.io/). Our objective, in this notebook, is to _compile_ graphs
into _Pulser Sequences_, the format understood by our Quantum Devices. In turn, a Pulser Sequence is defined by a target Quantum Device,
a _Pulser Register_ (the position of qubits) and _Pulser Pulses_ (the laser impulses controlling the evolution of the analog device).

As the geometry depends on the Quantum Device, we need to specify a device to use. For the time being, we'll use Pulser's `AnalogDevice`, which is
a reasonable default device. Later in this notebook, we'll show how to use another device.

In this example, our graphs are representations of molecules. To simplify things, we'll use the dedicated class
`qek.data.graphs.PTCFMGraph`, which uses bio-chemical tools to compute a reasonable geometry from molecular data, following the PTC-FM conventions, for a specific
Quantum Device. For other classes of graphs, you will need to decide how to compute the geometry and use `qek.data.graphs.BaseGraph`.



In [None]:
from tqdm import tqdm
import pulser as pl
import qek.data.graphs as qek_graphs


graphs_to_compile = []

for i, data in enumerate(tqdm(og_ptcfm)):
    graph = qek_graphs.PTCFMGraph(data=data, device=pl.AnalogDevice, id=i)
    graphs_to_compile.append(graph)


## Compile a Register and a Pulse

Once the embedding is found, we compile a Register (the position of atoms on the Quantum Device) and a Pulse (the lasers applied to these atoms).

Note that not all graphs can be embedded on a given device. In this notebook, for the sake of simplicity, we simply discard graphs that cannot be trivially embedded. Future versions of this library may succeed at embedding more graphs.
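To give an intuition of why some graphs cannot be embedded, here is a minimal sketch of the kind of geometric constraint a device imposes. It is not the library's actual check: the helper `is_embeddable` and the two limits are hypothetical, chosen for illustration only. Real limits come from the device specification (for instance `pl.AnalogDevice.min_atom_distance`, which is used later in this notebook).

```python
import itertools
import math

# Hypothetical device limits, in micrometers (illustration only).
MIN_ATOM_DISTANCE = 5.0
MAX_RADIAL_DISTANCE = 38.0


def is_embeddable(positions, min_distance=MIN_ATOM_DISTANCE, max_radius=MAX_RADIAL_DISTANCE):
    """Check a 2D layout against two simple geometric constraints.

    `positions` is a list of (x, y) coordinates, one per node of the graph.
    """
    # Every atom must sit close enough to the center of the device.
    for x, y in positions:
        if math.hypot(x, y) > max_radius:
            return False
    # Every pair of atoms must be far enough apart.
    for (x1, y1), (x2, y2) in itertools.combinations(positions, 2):
        if math.hypot(x1 - x2, y1 - y2) < min_distance:
            return False
    return True


ok_layout = [(0.0, 0.0), (6.0, 0.0), (0.0, 6.0)]
too_dense = [(0.0, 0.0), (3.0, 0.0)]
print(is_embeddable(ok_layout))
print(is_embeddable(too_dense))
```

A graph whose computed geometry fails such a check is the kind of graph we discard in the cell below.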

In [None]:
from qek.shared.error import CompilationError

compiled = [] 

for graph in tqdm(graphs_to_compile):
    try:
        register = graph.compile_register()
        pulse = graph.compile_pulse()
    except CompilationError:
        # Skip graphs that cannot be compiled for this device.
        print("Graph %s cannot be compiled for this device" % (graph.id, ))
        continue
    compiled.append((graph, register, pulse))
print("Compiled %s graphs into registers/pulses" % (len(compiled), ))

Let's take a look at some of these registers and pulses.

In [None]:
example_graph, example_register, example_pulse = compiled[64]

# The molecule, as laid out on the Quantum Device.
example_register.draw(blockade_radius=pl.AnalogDevice.min_atom_distance + 0.01)

# The laser pulse used to control its state evolution.
example_pulse.draw()

## Experimenting with registers and pulses

You can, of course, adopt different registers or pulses.

In [None]:
import pulser

example_register = pulser.Register({"q0": (0, 0)})
example_pulse = pulser.Pulse.ArbitraryPhase(
    amplitude=pulser.waveforms.RampWaveform(duration=150, start=100, stop=300),
    phase=pulser.waveforms.ConstantWaveform(duration=150, value=15),
    post_phase_shift=5
)

example_register.draw()
example_pulse.draw()

For this, you'll probably want to take a look at [the documentation of Pulser](https://pulser.readthedocs.io/).


# Executing the compiled sequences on an emulator

While our objective is to run the sequences on a physical QPU, it is generally a good idea to test out some of these sequences on an emulator first. For this example, we'll use the QutipEmulator, the simplest emulator provided with Pulser.

In [None]:
from qek.data.dataset import ProcessedData
from qek.backends import QutipBackend

# In this tutorial, to make things faster, we'll only run the sequences that require 5 qubits or fewer.
# If you wish to run more entries, feel free to increase this value.
#
# Warning
#
# Emulating a Quantum Device takes an amount of resources and time that grows exponentially with the
# number of qubits! If you set MAX_QUBITS too high, you can bring your computer to its knees and/or crash this notebook.
MAX_QUBITS = 5

processed_dataset = []
executor = QutipBackend(device=pl.AnalogDevice)
for graph, register, pulse in tqdm(compiled):
    if len(register.qubits) > MAX_QUBITS:
        continue
    states = await executor.run(register=register, pulse=pulse)
    processed_dataset.append(ProcessedData.from_register(register=register, pulse=pulse, device=pl.AnalogDevice, state_dict=states, target=graph.target))

As mentioned, there are limits to what an emulator can do.
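To make the exponential cost concrete, here is a back-of-the-envelope sketch. It assumes that the emulator stores a dense state vector with one double-precision complex amplitude (16 bytes) per basis state. This is a simplifying assumption, and the helper `state_vector_bytes` is hypothetical: actual memory use depends on the emulator and on whether noise is simulated.

```python
def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory needed for a dense state vector of `n_qubits` qubits.

    A register of n qubits has 2**n basis states, hence 2**n amplitudes.
    """
    return (2 ** n_qubits) * bytes_per_amplitude


for n in (5, 10, 20, 30):
    mib = state_vector_bytes(n) / 2**20
    print(f"{n} qubits: {mib:g} MiB")
```

Under this assumption, each additional qubit doubles the memory footprint, which is why `MAX_QUBITS` is kept small in this tutorial.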

Pasqal has also developed an emulator called emu-mps, which generally provides much better performance and resource usage, so if you hit resource limits, don't hesitate to [check it out](https://github.com/pasqal-io/emulators)!

# Executing compiled sequences on a QPU

Once you have checked that the pulses work on an emulator, you will probably want to move to a QPU. Execution on a QPU takes
resources polynomial in the number of qubits, which hopefully means an almost exponential speedup for large numbers of qubits.

To experiment with a QPU, you will need either physical access to a QPU, or an account with [PASQAL Cloud](https://docs.pasqal.cloud), which provides you remote access to QPUs built and hosted by Pasqal. In this section, we'll see how to use the latter.

If you don't have an account, just skip to the next section!

In [None]:
HAVE_PASQAL_ACCOUNT = False # If you have a PASQAL Cloud account, fill in the details and set this to `True`.

if HAVE_PASQAL_ACCOUNT: 
    from qek.backends import RemoteQPUBackend
    processed_dataset = []

    # Initialize connection

    my_project_id = "your_project_id"  # Replace this value with your project_id on the PASQAL platform.
    my_username   = "your_username"  # Replace this value with your username or email on the PASQAL platform.
    my_password   = "your_password"  # Replace this value with your password on the PASQAL platform.
        # Security note: In real life, you probably don't want to write your password in the code.
        # See the documentation of PASQAL Cloud for other ways to provide your password.

    # Initialize the cloud client
    executor = RemoteQPUBackend(username=my_username, project_id=my_project_id, password=my_password)

    # Fetch the specification of our QPU
    device = await executor.device()

    # As previously, create the list of graphs and embed them.
    graphs_to_compile = []
    for i, data in enumerate(tqdm(og_ptcfm)):
        graph = qek_graphs.PTCFMGraph(data=data, device=device, id=i)
        graphs_to_compile.append(graph)

    compiled = []
    for graph in tqdm(graphs_to_compile):
        try:
            register = graph.compile_register()
            pulse = graph.compile_pulse()
        except CompilationError:
            # Skip graphs that cannot be compiled for this device.
            print("Graph %s cannot be compiled for this device" % (graph.id, ))
            continue
        # Only reached when both the register and the pulse compiled successfully.
        compiled.append((graph, register, pulse))

    # Now that the connection is initialized, we just have to send the work
    # to the QPU and wait for the results.
    for graph, register, pulse in tqdm(compiled):

        # Send the work to the QPU and await the result
        states = await executor.run(register=register, pulse=pulse)
        processed_dataset.append(ProcessedData.from_register(register=register, pulse=pulse, device=device, state_dict=states, target=graph.target))

There are other ways to use the SDK. For instance, you can enqueue a job and check later whether it has completed. Also, to work around the long waiting lines, Pasqal provides high-performance distributed and hardware-accelerated emulators, which you can access through the SDK.

For more details, [take a look at the documentation of the SDK](https://docs.pasqal.cloud/).


## ...or using the provided dataset

For this notebook, instead of spending hours running the simulator on your computer, we're going to cheat
and load the results, which are conveniently stored in `ptcfm_processed_dataset.json`.

In [None]:
import qek.data.dataset as qek_dataset
processed_dataset = qek_dataset.load_dataset(file_path="ptcfm_processed_dataset.json")
print(f"Size of the quantum compatible dataset = {len(processed_dataset)}")

## A look at the results

Let's take a look at one of our samples:

In [None]:
from qek.data.dataset import ProcessedData

# The geometry we compiled from this graph for execution on the Quantum Device.
dataset_example: ProcessedData = processed_dataset[64]
dataset_example.draw_register()

In [None]:
# The laser pulses we used to drive the execution on the Quantum Device.
dataset_example.draw_pulse()

The results of executing the embedding on the Quantum Device are in field `state_dict`:

In [None]:
display(dataset_example.state_dict)
print(f"Total number of samples: {sum(dataset_example.state_dict.values())}")

This dictionary represents an approximation of the quantum state of the device for this graph after completion of the algorithm.

- each of the keys represents one possible state for the register (which represents the graph), with each qubit (which represents a single node) being in state `0` or `1`;
- the corresponding value is the number of samples observed with this specific state of the register.

In this example, for instance, we can see that the state observed most frequently is `10000001010`, with 43/1000 samples.
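The same reading can be sketched on a small, made-up dictionary. The counts below are hypothetical and describe a 3-qubit register. They are not taken from the dataset.

```python
from collections import Counter

# Hypothetical measurement counts for a 3-qubit register (illustration only).
state_dict = Counter({"000": 510, "100": 240, "010": 150, "110": 100})

# Total number of samples taken on the device.
total = sum(state_dict.values())

# The state observed most frequently, and how many times it was observed.
most_common_state, most_common_count = state_dict.most_common(1)[0]

# Observed frequency of each state, an estimate of its probability.
frequencies = {state: count / total for state, count in state_dict.items()}

print(f"{most_common_state} observed in {most_common_count}/{total} samples")
```

The more samples you take, the closer the observed frequencies get to the actual probabilities of each state.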

Note: Since Quantum Devices are inherently non-deterministic, you will probably obtain different samples if you run this on a Quantum Device instead of loading the dataset.


## Machine-learning features

From the state dictionary, we derive our machine-learning feature: the _distribution of excitation_. We'll use this in the next notebook to define our kernel.

In [None]:
dataset_example.draw_excitation()
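As a rough sketch of what such a feature may look like, the code below assumes that the distribution of excitation is the observed probability of finding exactly `k` qubits in state `1`, for each `k` from 0 to the number of qubits. This interpretation, the helper `excitation_distribution` and the sample counts are assumptions made for illustration. The library's own implementation may differ.

```python
def excitation_distribution(state_dict, n_qubits):
    """Histogram of the number of excited qubits, normalized to sum to 1.

    `state_dict` maps bitstrings (e.g. "0110") to the number of samples
    in which that bitstring was observed.
    """
    total = sum(state_dict.values())
    histogram = [0.0] * (n_qubits + 1)
    for bitstring, count in state_dict.items():
        # The number of "1" characters is the number of excited qubits.
        histogram[bitstring.count("1")] += count / total
    return histogram


# Hypothetical measurement counts for a 3-qubit register (illustration only).
state_dict = {"000": 510, "100": 240, "010": 150, "110": 100}
dist = excitation_distribution(state_dict, n_qubits=3)
for k, probability in enumerate(dist):
    print(f"{k} excited qubit(s): {probability:.2f}")
```

Note that the resulting vector has `n_qubits + 1` entries, so graphs of different sizes yield vectors of different lengths. Any comparison between two graphs needs to account for this.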

# What now?

What we have seen so far covers the use of a Quantum Device to extract machine-learning features.

For the next step, we'll see [how to use these features for machine learning](./tutorial%202%20-%20Machine-Learning%20with%20the%20Quantum%20EvolutionKernel.ipynb).