# HW 3 - Quantum state preparation with RL

For this homework we will use the Ion Trap environment developed by Hendrik Poulsen Nautrup, available [here](https://github.com/HendrikPN/rl-ion-trap-tutorial). The environment simulates the preparation of quantum states in a qudit ion-trap quantum computer using a restricted set of quantum gates. The goal is to prepare a target state with a given Schmidt rank vector (SRV), starting from an initial state, by applying a sequence of quantum gates (actions). Let's start by defining the environment:

In [3]:
from ion_trap import IonTrapEnv
import numpy as np


# the SRV defines the entanglement of the goal state
srv = [3,3,3]

KWARGS = {'phases': {'pulse_angles': [np.pi/2], 'pulse_phases': [np.pi/2], 'ms_phases': [-np.pi/2]}, # Gates available
          'num_ions': 3, 
          'goal': [srv],
          'max_steps': 10 # This is already default for the environment
         } 

env = IonTrapEnv(**KWARGS)
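
To make the SRV more concrete: for a pure state of three qutrits, each SRV entry is the rank of the state when one ion is split off from the other two. The sketch below computes this with NumPy. It is an illustration only, not the environment's own routine: the helper name `schmidt_rank_vector`, the tolerance, and the assumption that the state vector is a flattened `(3, 3, 3)` tensor are ours.

```python
import numpy as np

def schmidt_rank_vector(state, num_ions=3, dim=3, tol=1e-9):
    """Hypothetical helper: SRV of a pure state of `num_ions` qudits."""
    # Assumption: the state vector is a flattened tensor with one axis per ion
    psi = np.asarray(state).reshape([dim] * num_ions)
    srv = []
    for ion in range(num_ions):
        # Split this ion from the rest and take the rank of the resulting matrix
        mat = np.moveaxis(psi, ion, 0).reshape(dim, -1)
        srv.append(int(np.linalg.matrix_rank(mat, tol=tol)))
    return srv

# Product state |000>: no entanglement
product_state = np.zeros(27, dtype=complex)
product_state[0] = 1.0

# (|000> + |111> + |222>) / sqrt(3): maximally entangled across all three ions
ghz = np.zeros(27, dtype=complex)
ghz[[0, 13, 26]] = 1 / np.sqrt(3)

# |0> (|00> + |11>) / sqrt(2): only the last two ions are entangled
partial = np.zeros(27, dtype=complex)
partial[[0, 4]] = 1 / np.sqrt(2)

print(schmidt_rank_vector(product_state), schmidt_rank_vector(ghz), schmidt_rank_vector(partial))
```

Under these assumptions the three example states have SRVs `[1, 1, 1]`, `[3, 3, 3]`, and `[1, 2, 2]`.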

This environment has a discrete action space, where each action corresponds to applying a specific quantum gate to the current state. The number of possible actions is given by:

In [4]:
env.num_actions

7

To perform an action, we just use:

In [5]:
action = 5
observation, reward, done = env.step(action)

The observation is the current state of the quantum system:

In [6]:
observation

array([[ 0.+0.000000e+00j],
       [-1.-6.123234e-17j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j],
       [ 0.+0.000000e+00j]])
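
The observation above has 27 entries because three ions with three levels each span a space of dimension 3^3 = 27. The sketch below rebuilds that vector by hand (so it runs without the environment) and checks its size and norm. Reading the flat index as one level per ion assumes a row-major basis ordering, which is our assumption and not something the environment documents here.

```python
import numpy as np

dim, num_ions = 3, 3

# Rebuild the observation shown above: amplitude -1 at index 1, zeros elsewhere
state = np.zeros((dim ** num_ions, 1), dtype=complex)
state[1, 0] = -1.0

# A valid pure state is normalized
norm = float(np.linalg.norm(state))

# Locate the populated basis state and decode it into one level per ion
# (assumption: row-major ordering, i.e. the last ion is the fastest index)
index = int(np.argmax(np.abs(state)))
levels = tuple(int(l) for l in np.unravel_index(index, (dim,) * num_ions))

print(state.shape, norm, index, levels)
```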

The reward is 1 if the goal state is reached. The `done` flag is `True` once the episode has ended, either because the goal state has been reached or because the maximum number of steps has been taken.
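
The interplay of `reward` and `done` described above can be sketched as a small episode loop. `ToyEnv` below is a hypothetical stand-in that only mimics the `(observation, reward, done)` return signature of `env.step`; it is not the real `IonTrapEnv` and contains no physics. `run_episode` is likewise an illustrative helper of ours.

```python
class ToyEnv:
    """Hypothetical stand-in: reward 1 when a chosen 'goal' action is played."""

    def __init__(self, goal_action=2, max_steps=10):
        self.goal_action = goal_action
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.steps

    def step(self, action):
        self.steps += 1
        reward = 1 if action == self.goal_action else 0
        # The episode ends on success or when the step budget is exhausted
        done = reward == 1 or self.steps >= self.max_steps
        return self.steps, reward, done


def run_episode(env, actions):
    """Play `actions` until `done`; return the last reward and the steps used."""
    env.reset()
    reward, used = 0, 0
    for action in actions:
        _, reward, done = env.step(action)
        used += 1
        if done:
            break
    return reward, used


print(run_episode(ToyEnv(), [0, 1, 2, 3]))                 # stops at the goal action
print(run_episode(ToyEnv(max_steps=3), [0, 0, 0, 0, 0]))   # stops at the step limit
```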

For this homework, we will consider an enhanced version of this problem and train an agent to prepare quantum states with different target SRVs. The agent will receive as input both the current state of the quantum system and the target SRV, and will have to learn a policy that generalizes across different target states. We will consider the SRVs accepted by the function `is_valid_srv`, plus the separable case `(1, 1, 1)`, as listed below:

In [8]:
from utils import is_valid_srv
from itertools import product
import torch

all_srv = list(product([1,2,3], repeat = env.num_ions))

valid_srv = torch.tensor([srv for srv in all_srv if srv == (1, 1, 1) or is_valid_srv(srv)[0]])
valid_srv

tensor([[1, 1, 1],
        [1, 2, 2],
        [2, 1, 2],
        [2, 2, 1],
        [2, 2, 2],
        [2, 2, 3],
        [2, 3, 2],
        [2, 3, 3],
        [3, 2, 2],
        [3, 2, 3],
        [3, 3, 2],
        [3, 3, 3]])
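
Since the agent has to see both the current state and the target SRV, one possible input encoding is sketched below. This is only one option, not a prescribed format: the helper name `encode_input` and the choice of concatenating real parts, imaginary parts, and a one-hot code of each SRV entry are our assumptions.

```python
import numpy as np

def encode_input(state, target_srv, dim=3):
    """Hypothetical encoding: [Re(state), Im(state), one-hot(target SRV)]."""
    state = np.asarray(state).ravel()
    # Neural networks usually expect real inputs, so split the complex amplitudes
    state_features = np.concatenate([state.real, state.imag])
    # SRV entries take values 1..dim; shift to 0..dim-1 for one-hot encoding
    srv = np.asarray(target_srv, dtype=int)
    srv_features = np.eye(dim)[srv - 1].ravel()
    return np.concatenate([state_features, srv_features]).astype(np.float32)

# Example with the observation shown earlier and the target SRV (2, 3, 3)
state = np.zeros((27, 1), dtype=complex)
state[1, 0] = -1.0
x = encode_input(state, (2, 3, 3))
print(x.shape)
```

With 27 complex amplitudes and three SRV entries, this encoding yields a vector of length 2 * 27 + 3 * 3 = 63.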

## Dummy implementation
Your submission to Codabench works just as before: you need to submit a model that returns a gate sequence for each of the given SRVs. Here is a dummy implementation of how it should work:

In [9]:
class model:

    def __init__(self):
        pass    
    
    def pred(self, samples):

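        # This agent just creates a random gate sequence for each target SRV
        preds = []
        for s in samples:
            # Random length between 3 and 9 gates (the upper bound is exclusive)
            num_gates = torch.randint(3, 10, (1,)).item()
            # Random actions drawn from the 7 available gates (indices 0 to 6)
            gate_sequence = torch.randint(0, 7, (num_gates,))
            preds.append(gate_sequence)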
            
        return preds

Then, the ingestion program in Codabench will run the following code to evaluate your model:

In [10]:
agent = model()
preds = agent.pred(valid_srv)
preds

[tensor([2, 2, 0, 3, 5, 6]),
 tensor([2, 4, 3, 5, 1, 5]),
 tensor([4, 0, 3, 5]),
 tensor([3, 0, 2, 2]),
 tensor([3, 3, 5, 5, 1, 3, 0, 4, 5]),
 tensor([2, 0, 6, 2, 1, 6, 2]),
 tensor([5, 2, 3, 0, 1, 1, 1]),
 tensor([0, 6, 0, 3, 3, 5, 3, 3, 4]),
 tensor([0, 4, 6]),
 tensor([5, 0, 5, 5, 5, 6, 2]),
 tensor([1, 0, 4]),
 tensor([6, 1, 6, 6, 6, 1, 4])]
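
Before submitting, it may help to sanity-check the format of your predictions. The sketch below is a hypothetical helper of ours, not part of the Codabench programs. It assumes one sequence per target SRV, action indices in `0..6` (matching `env.num_actions == 7`), and at most 10 gates per sequence (the environment's `max_steps`). It works on plain lists as well as on tensors, since it only converts each entry with `int`.

```python
def check_predictions(preds, num_targets, num_actions=7, max_steps=10):
    """Return a list of problems found; an empty list means the format looks fine."""
    problems = []
    if len(preds) != num_targets:
        problems.append(f"expected {num_targets} sequences, got {len(preds)}")
    for i, seq in enumerate(preds):
        actions = [int(a) for a in seq]
        if not actions:
            problems.append(f"sequence {i} is empty")
        if len(actions) > max_steps:
            problems.append(f"sequence {i} has {len(actions)} gates (max {max_steps})")
        # Every action must index one of the available gates
        invalid = [a for a in actions if not 0 <= a < num_actions]
        if invalid:
            problems.append(f"sequence {i} has invalid actions {invalid}")
    return problems

print(check_predictions([[2, 2, 0, 3, 5, 6], [1, 0, 4]], num_targets=2))
print(check_predictions([[7], []], num_targets=3))
```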

From here, the scoring program will evaluate your predictions as shown below. The score in this homework is the average gate-sequence length achieved by your agent (i.e., smaller is better). However, only gate sequences that lead to the correct target state (`reward == 1`) count with their actual length; every other sequence is penalized with the maximum sequence length allowed by the environment (10 in this case).


In [11]:
seq_length = []
for srv, pred in zip(valid_srv, preds):

    KWARGS = {'phases': {'pulse_angles': [np.pi/2], 'pulse_phases': [np.pi/2], 'ms_phases': [-np.pi/2]}, 'num_ions': 3, 'goal': [list(srv)]}
    env = IonTrapEnv(**KWARGS) 
    
    env.reset()
    for action in pred:
        # Apply the gate and receive the new observation, reward, and done flag
        observation, reward, done = env.step(action)
    # Only the reward returned after the last gate of the sequence is checked
    if reward == 1:
        seq_length.append(len(pred))
    else:
        seq_length.append(env.max_steps)

# The random agent never reaches a target state here, so every sequence gets the penalty and the score is 10
np.mean(seq_length)

np.float64(10.0)
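
The scoring rule can be condensed into a few lines, which makes the arithmetic easy to check by hand. This is a sketch of the rule as described above, not the actual Codabench scoring program; the function name `average_length` is ours.

```python
def average_length(lengths, rewards, max_steps=10):
    """Mean sequence length, where failed sequences count as `max_steps`."""
    scores = [n if r == 1 else max_steps for n, r in zip(lengths, rewards)]
    return sum(scores) / len(scores)

# All sequences fail: every entry is replaced by the penalty of 10
print(average_length([6, 4, 3], [0, 0, 0]))

# One success with 4 gates and one failure: (4 + 10) / 2
print(average_length([4, 6], [1, 0]))
```

For instance, an agent that solves all 12 targets with 5 gates each would score 5.0, while a single failure among them would raise the score to (11 * 5 + 10) / 12.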