# Variational Classifier with PennyLane

(necessary to pass the project) To familiarize yourself with the basic workflow in Quantum Machine Learning, work through the tutorial on Variational Classifier. Implement and present the usual steps in this workflow and explain in your own words the purpose of each step.

This task follows the [Variational Classifier tutorial](https://pennylane.ai/qml/demos/tutorial_variational_classifier/) by PennyLane. Building on what is taught there, we create a QML model for the full iris dataset, involving 3 species of iris (instead of only 2) and all 4 features (instead of only the first 2).

In [1]:
import pennylane as qml
from pennylane import numpy as np
from pennylane.optimize import NesterovMomentumOptimizer

np.random.seed(0)

First we download [the iris dataset](https://www.kaggle.com/datasets/uciml/iris) from Kaggle and load it with some processing. The expectation value returned by our quantum circuit lies between -1 and 1. Since the iris species is a categorical variable, it is not ideal to assign the three species sequential values. Instead, we encode the species in the last 2 columns of our data structure as two binary labels, in the style of one-hot encoding: the first is 1 for Iris-setosa and -1 otherwise, and the second is 1 for Iris-versicolor and -1 otherwise.

In [3]:
import csv

data = []
with open('variational_classifier/data/Iris.csv', newline='') as csvfile:
    spamreader = csv.DictReader(csvfile, delimiter=',')
    for row in spamreader:
        data_row = [float(row['SepalLengthCm']), float(row['SepalWidthCm']), 
                float(row['PetalLengthCm']), float(row['PetalWidthCm']), 
                1 if row['Species'] == 'Iris-setosa' else -1,
                1 if row['Species'] == 'Iris-versicolor' else -1]

        data.append(data_row)

data = np.array(data)
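
To make the label encoding concrete, the sketch below mirrors the two conditional expressions in the loader and prints the pair of labels each species receives. The helper name encode_species is hypothetical and not part of the notebook.

```python
# Hypothetical helper (not in the notebook) mirroring the two label columns.
def encode_species(species):
    """Return (is_setosa, is_versicolor) as +1/-1 labels."""
    is_setosa = 1 if species == "Iris-setosa" else -1
    is_versicolor = 1 if species == "Iris-versicolor" else -1
    return is_setosa, is_versicolor


for name in ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]:
    print(f"{name:16s} -> {encode_species(name)}")
```

Iris-virginica is the only species for which both labels are -1, which is why two binary labels are enough to distinguish three classes.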

Similar to the tutorial, we need to pad our X data so that the relative differences between data points (the rows of our data) are still preserved after normalisation. Since we already have 4 features (which on their own would need 2 qubits), we add an extra qubit for the padding, so that our vector size is $2^3 = 8$.

In [4]:
Y1 = data[:, 4]
Y2 = data[:, 5]

X = data[:, 0:4]
print(f"First X sample (original)  : {X[0]}")

# pad the vectors to size 2^3=8 with constant values
padding = np.ones((len(X), 4)) * 0.1
X_pad = np.c_[X, padding]
print(f"First X sample (padded)    : {X_pad[0]}")

# normalize each input
normalization = np.sqrt(np.sum(X_pad**2, 1)) # finds euclidean length
X_norm = (X_pad.T / normalization).T
print(f"First X sample (normalized): {X_norm[0]}")

First X sample (original)  : [5.1 3.5 1.4 0.2]
First X sample (padded)    : [5.1 3.5 1.4 0.2 0.1 0.1 0.1 0.1]
First X sample (normalized): [0.80337378 0.55133495 0.22053398 0.03150485 0.01575243 0.01575243
 0.01575243 0.01575243]
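
The effect of the padding can be checked on a toy example. The sketch below uses plain Python and two made-up feature vectors, one exactly twice the other. Without padding, both normalize to the same unit vector, so the information about their length is lost; with the constant padding they remain distinguishable. The helper names are assumptions for illustration.

```python
import math


def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    length = math.sqrt(sum(v * v for v in vec))
    return [v / length for v in vec]


def pad(vec, size=8, value=0.1):
    """Append a constant value until the vector has `size` entries."""
    return list(vec) + [value] * (size - len(vec))


# Made-up samples for illustration: b is exactly 2 * a.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]

same_without_padding = all(
    math.isclose(x, y) for x, y in zip(normalize(a), normalize(b))
)
same_with_padding = all(
    math.isclose(x, y) for x, y in zip(normalize(pad(a)), normalize(pad(b)))
)

print("identical without padding:", same_without_padding)  # True
print("identical with padding:   ", same_with_padding)     # False
```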


Here we define some helper functions. We use PennyLane's AmplitudeEmbedding template to directly prepare the states corresponding to our data points. For simplicity, we also reuse the quantum layers from the tutorial, although they can be altered as long as they still entangle the qubits.

From this circuit, we return the expectation value of PauliZ on the first qubit and add a trainable bias to capture any offset in our model or in the dataset. We use the same functions given in the tutorial to calculate the cost of our circuit, which we define to be the square loss of our model. The accuracy function allows us to assess our model.

In [None]:
dev = qml.device('default.qubit')

def state_preparation(x):
    qml.AmplitudeEmbedding(features=x, wires=range(3), normalize=True)


def layer(layer_weights):
    for wire in range(3):
        qml.Rot(*layer_weights[wire], wires=wire)
    for pair in [[0, 1], [1, 2], [2, 0]]:
        qml.CNOT(wires=pair)


@qml.qnode(dev)
def circuit(weights, x):
    state_preparation(x)

    for layer_weights in weights:
        layer(layer_weights)

    return qml.expval(qml.PauliZ(0))
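
As a sanity check on what the circuit returns, the expectation value of PauliZ on wire 0 can be computed by hand for the embedded state alone, that is, with no variational layers applied (this is not the trained model). The sketch assumes the convention that wire 0 is the most significant bit of the basis-state index, so the first half of the 8 amplitudes has wire 0 in state 0 (eigenvalue +1) and the second half has it in state 1 (eigenvalue -1). The function name is hypothetical.

```python
import math


def expval_z_on_wire0(amplitudes):
    """<Z> on wire 0 for real amplitudes; wire 0 is the most significant bit."""
    half = len(amplitudes) // 2
    prob_zero = sum(a * a for a in amplitudes[:half])
    prob_one = sum(a * a for a in amplitudes[half:])
    return prob_zero - prob_one


# First padded sample from the notebook output, normalized to unit length.
x_pad = [5.1, 3.5, 1.4, 0.2, 0.1, 0.1, 0.1, 0.1]
norm = math.sqrt(sum(v * v for v in x_pad))
state = [v / norm for v in x_pad]

print(round(expval_z_on_wire0(state), 4))  # 0.998
```

For this sample the padding amplitudes carry very little weight, so the bare embedded state gives a value close to +1; it is the trainable layers that must move the output toward the correct sign.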

In [None]:
def variational_classifier(weights, bias, x):
    return circuit(weights, x) + bias

def square_loss(labels, predictions):
    # We use a call to qml.math.stack to allow subtracting the arrays directly
    return np.mean((labels - qml.math.stack(predictions)) ** 2)

def cost(weights, bias, X, Y):
    predictions = [variational_classifier(weights, bias, x) for x in X]
    return square_loss(Y, predictions)

def accuracy(labels, predictions):
    acc = sum(abs(l - p) < 1e-5 for l, p in zip(labels, predictions))
    acc = acc / len(labels)
    return acc
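
A small worked example may help separate the two metrics: the cost is computed on the raw, continuous model outputs, whereas the accuracy is computed after taking the sign. The outputs below are made-up numbers, and the helpers are plain-Python stand-ins for the square_loss and accuracy functions above, not the notebook's own definitions.

```python
def square_loss(labels, predictions):
    """Mean squared difference between labels and raw model outputs."""
    return sum((l - p) ** 2 for l, p in zip(labels, predictions)) / len(labels)


def accuracy(labels, predictions):
    """Fraction of predictions that match their label (within 1e-5)."""
    return sum(abs(l - p) < 1e-5 for l, p in zip(labels, predictions)) / len(labels)


def sign(value):
    """Return -1, 0 or +1, like np.sign for a scalar."""
    return (value > 0) - (value < 0)


labels = [1, -1, -1, 1]
raw_outputs = [0.75, -0.5, 0.25, 0.5]  # made-up circuit outputs plus bias

print(square_loss(labels, raw_outputs))                  # 0.53125
print(accuracy(labels, [sign(v) for v in raw_outputs]))  # 0.75
```

The third sample has the wrong sign, so it counts as a misclassification and also dominates the square loss.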

Since we employed one-hot encoding with two variables (whether the species is Iris-setosa or not and whether the species is Iris-versicolor or not), we will create two QML models to classify for each variable. 

These two QML models are meant to be sequential and considered together to form a larger classifier. We will directly use the outputs in the first model when processing the input data for the second model.
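
One way to read the two models as a single three-class classifier is sketched below. The function combine_predictions is hypothetical and takes the signs produced by the two models; the second sign is only consulted when the first model says "not Iris-setosa", mirroring the sequential design described above.

```python
def combine_predictions(setosa_sign, versicolor_sign):
    """Map the signs of the two binary models to a species name.

    Hypothetical helper: the notebook itself does not define this step.
    """
    if setosa_sign == 1:
        return "Iris-setosa"
    if versicolor_sign == 1:
        return "Iris-versicolor"
    return "Iris-virginica"


print(combine_predictions(1, -1))   # Iris-setosa
print(combine_predictions(-1, 1))   # Iris-versicolor
print(combine_predictions(-1, -1))  # Iris-virginica
```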

In [5]:

num_data = len(Y1)
num_train = int(0.75 * num_data)
index = np.random.permutation(range(num_data))
X_train = X_norm[index[:num_train]]
Y1_train = Y1[index[:num_train]]
X_val = X_norm[index[num_train:]]
Y1_val = Y1[index[num_train:]]

In [9]:
num_qubits = 3
num_layers = 6

weights_init = 0.01 * np.random.randn(num_layers, num_qubits, 3, requires_grad=True)
bias_init = np.array(0.0, requires_grad=True)

In [287]:
opt = NesterovMomentumOptimizer(0.01)
batch_size = 5

# train the variational classifier
weights = weights_init
bias = bias_init
for it in range(60):
    # Update the weights by one optimizer step
    batch_index = np.random.randint(0, num_train, (batch_size,))
    X_train_batch = X_train[batch_index]
    Y1_train_batch = Y1_train[batch_index]
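    # Pass the batch as keyword arguments (as in the tutorial) so that only
    # weights and bias are treated as trainable by the optimizer
    weights, bias = opt.step(cost, weights, bias, X=X_train_batch, Y=Y1_train_batch)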

    # Compute predictions on train and validation set
    predictions_train = [np.sign(variational_classifier(weights, bias, X_train[i])) for i in range(len(X_train))]
    predictions_val = [np.sign(variational_classifier(weights, bias, X_val[i])) for i in range(len(X_val))]

    # Compute accuracy on train and validation set
    acc_train = accuracy(Y1_train, predictions_train)
    acc_val = accuracy(Y1_val, predictions_val)

    if (it + 1) % 2 == 0:
        _cost = cost(weights, bias, X_norm, Y1)
        print(
            f"Iter: {it + 1:5d} | Cost: {_cost:0.7f} | "
            f"Acc train: {acc_train:0.7f} | Acc validation: {acc_val:0.7f}"
        )

weights1 = weights
bias1 = bias

Iter:     2 | Cost: 1.9806145 | Acc train: 0.3392857 | Acc validation: 0.3157895
Iter:     4 | Cost: 1.7449219 | Acc train: 0.3392857 | Acc validation: 0.3157895
Iter:     6 | Cost: 1.4196121 | Acc train: 0.0089286 | Acc validation: 0.0000000
Iter:     8 | Cost: 1.2482700 | Acc train: 0.6071429 | Acc validation: 0.5789474
Iter:    10 | Cost: 1.2433723 | Acc train: 0.6607143 | Acc validation: 0.6842105
Iter:    12 | Cost: 1.3268026 | Acc train: 0.6607143 | Acc validation: 0.6842105
Iter:    14 | Cost: 1.2540876 | Acc train: 0.6607143 | Acc validation: 0.6842105
Iter:    16 | Cost: 1.1414321 | Acc train: 0.6607143 | Acc validation: 0.6842105
Iter:    18 | Cost: 1.0441755 | Acc train: 0.6607143 | Acc validation: 0.6842105
Iter:    20 | Cost: 0.9486916 | Acc train: 0.6607143 | Acc validation: 0.6842105
Iter:    22 | Cost: 0.8696014 | Acc train: 0.6607143 | Acc validation: 0.6842105
Iter:    24 | Cost: 0.7674455 | Acc train: 0.6607143 | Acc validation: 0.6842105
Iter:    26 | Cost: 0.678641

By the end of training, our model reaches 100% accuracy when classifying whether the species is Iris-setosa! We save our model's trained weights and bias in the variables weights1 and bias1 for later prediction.

Our next model classifies whether a data point belongs to the species Iris-versicolor. Since the first model has already classified some data points as Iris-setosa, we can exclude them from the second model. The following code shows the data processing for our second model.

In [290]:
# data processing after the first QML model

# Predict the Iris-setosa label for every sample with the first trained model
predictions = [np.sign(variational_classifier(weights1, bias1, x)) for x in X_norm]

# Keep only the samples that the first model classified as "not Iris-setosa"
not_setosa = [bool(prediction == -1) for prediction in predictions]
X2 = X_norm[not_setosa]
print(np.shape(X2))
# Index the original label column so that this cell can be re-run safely
Y2 = data[:, 5][not_setosa]

# Split the remaining samples into 75% training and 25% validation data
num_data = len(Y2)
num_train = int(0.75 * num_data)
index = np.random.permutation(range(num_data))
X2_train = X2[index[:num_train]]
X2_val = X2[index[num_train:]]
Y2_train = Y2[index[:num_train]]
Y2_val = Y2[index[num_train:]]


(100, 8)


Unfortunately, the gradient descent optimizers struggled on this second task. Iris-versicolor and the remaining Iris-virginica are much closer to each other in their characteristics, whereas Iris-setosa differs drastically from the other two in its features.

We tested the optimizers AdagradOptimizer(0.05), SPSAOptimizer() and NesterovMomentumOptimizer(0.03), but the best results came from NesterovMomentumOptimizer(0.01) and NesterovMomentumOptimizer(0.015), which were able to reach 70-80% validation accuracy.
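
For readers unfamiliar with the optimizer, the sketch below applies the Nesterov momentum update rule to a one-dimensional toy function $f(x) = x^2$ in plain Python. It is neither the PennyLane implementation nor the classifier's cost; the momentum of 0.9, the starting point and the function names are assumptions chosen for illustration.

```python
def nesterov_minimize(grad, x0, stepsize, momentum=0.9, steps=60):
    """Run Nesterov momentum updates and return the visited points."""
    x, accumulation = x0, 0.0
    history = [x]
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point, then update.
        lookahead = x - momentum * accumulation
        accumulation = momentum * accumulation + stepsize * grad(lookahead)
        x = x - accumulation
        history.append(x)
    return history


def toy_grad(x):
    """Gradient of the toy function f(x) = x**2."""
    return 2.0 * x


for stepsize in (0.01, 0.015, 0.03):
    final = nesterov_minimize(toy_grad, x0=1.0, stepsize=stepsize)[-1]
    print(f"stepsize={stepsize}: x after 60 steps = {final:.4f}")
```

The accumulated term carries information from earlier steps, so the step size and the momentum together determine how quickly the iterate settles; on the real cost function the behaviour also depends on the random mini-batches.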

In [302]:
opt = NesterovMomentumOptimizer(0.01)
batch_size = 5

# train the variational classifier
weights = weights_init
bias = bias_init
for it in range(60):
    # Update the weights by one optimizer step
    batch_index = np.random.randint(0, num_train, (batch_size,))
    X2_train_batch = X2_train[batch_index]
    Y2_train_batch = Y2_train[batch_index]
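    # Pass the batch as keyword arguments (as in the tutorial) so that only
    # weights and bias are treated as trainable by the optimizer
    weights, bias = opt.step(cost, weights, bias, X=X2_train_batch, Y=Y2_train_batch)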

    # Compute predictions on train and validation set
    predictions_train = [np.sign(variational_classifier(weights, bias, X2_train[i])) for i in range(len(X2_train))]
    predictions_val = [np.sign(variational_classifier(weights, bias, X2_val[i])) for i in range(len(X2_val))]

    # Compute accuracy on train and validation set
    acc_train = accuracy(Y2_train, predictions_train)
    acc_val = accuracy(Y2_val, predictions_val)

    if (it + 1) % 2 == 0:
        _cost = cost(weights, bias, X2, Y2)
        print(
            f"Iter: {it + 1:5d} | Cost: {_cost:0.7f} | "
            f"Acc train: {acc_train:0.7f} | Acc validation: {acc_val:0.7f}"
        )

weights2 = weights
bias2 = bias

Iter:     2 | Cost: 1.4526274 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter:     4 | Cost: 1.3249414 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter:     6 | Cost: 1.1259806 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter:     8 | Cost: 1.0547436 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter:    10 | Cost: 1.0172980 | Acc train: 0.4800000 | Acc validation: 0.6000000
Iter:    12 | Cost: 1.0037268 | Acc train: 0.5200000 | Acc validation: 0.5200000
Iter:    14 | Cost: 1.0004456 | Acc train: 0.5200000 | Acc validation: 0.3200000
Iter:    16 | Cost: 1.0179078 | Acc train: 0.5200000 | Acc validation: 0.4400000
Iter:    18 | Cost: 1.0169162 | Acc train: 0.5200000 | Acc validation: 0.4400000
Iter:    20 | Cost: 0.9969315 | Acc train: 0.5333333 | Acc validation: 0.4400000
Iter:    22 | Cost: 0.9912207 | Acc train: 0.5333333 | Acc validation: 0.4400000
Iter:    24 | Cost: 0.9875108 | Acc train: 0.5333333 | Acc validation: 0.4400000
Iter:    26 | Cost: 0.976119

In [303]:
opt = NesterovMomentumOptimizer(0.015)
batch_size = 5

# train the variational classifier
weights = weights_init
bias = bias_init
for it in range(60):
    # Update the weights by one optimizer step
    batch_index = np.random.randint(0, num_train, (batch_size,))
    X2_train_batch = X2_train[batch_index]
    Y2_train_batch = Y2_train[batch_index]
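    # Pass the batch as keyword arguments (as in the tutorial) so that only
    # weights and bias are treated as trainable by the optimizer
    weights, bias = opt.step(cost, weights, bias, X=X2_train_batch, Y=Y2_train_batch)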

    # Compute predictions on train and validation set
    predictions_train = [np.sign(variational_classifier(weights, bias, X2_train[i])) for i in range(len(X2_train))]
    predictions_val = [np.sign(variational_classifier(weights, bias, X2_val[i])) for i in range(len(X2_val))]

    # Compute accuracy on train and validation set
    acc_train = accuracy(Y2_train, predictions_train)
    acc_val = accuracy(Y2_val, predictions_val)

    if (it + 1) % 2 == 0:
        _cost = cost(weights, bias, X2, Y2)
        print(
            f"Iter: {it + 1:5d} | Cost: {_cost:0.7f} | "
            f"Acc train: {acc_train:0.7f} | Acc validation: {acc_val:0.7f}"
        )

Iter:     2 | Cost: 1.4322826 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter:     4 | Cost: 1.1172829 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter:     6 | Cost: 1.0256163 | Acc train: 0.5066667 | Acc validation: 0.3200000
Iter:     8 | Cost: 1.2844049 | Acc train: 0.5200000 | Acc validation: 0.4400000
Iter:    10 | Cost: 1.6085753 | Acc train: 0.5200000 | Acc validation: 0.4400000
Iter:    12 | Cost: 1.7212959 | Acc train: 0.5200000 | Acc validation: 0.4400000
Iter:    14 | Cost: 1.5175456 | Acc train: 0.5200000 | Acc validation: 0.4400000
Iter:    16 | Cost: 1.2334350 | Acc train: 0.5200000 | Acc validation: 0.4400000
Iter:    18 | Cost: 1.0613616 | Acc train: 0.5200000 | Acc validation: 0.4400000
Iter:    20 | Cost: 0.9811421 | Acc train: 0.5200000 | Acc validation: 0.3600000
Iter:    22 | Cost: 0.9665639 | Acc train: 0.5466667 | Acc validation: 0.6800000
Iter:    24 | Cost: 0.9743451 | Acc train: 0.4800000 | Acc validation: 0.5600000
Iter:    26 | Cost: 0.975711