## A Quantum Variational Classifier

We will apply a Variational Quantum Classifier (VQC) to tackle the Titanic challenge. Although VQCs are generally not expected to outperform classical models on classical datasets, there are certain problems where quantum machine learning models are believed to offer an advantage over classical approaches. Regardless, this provides a valuable opportunity to explore quantum machine learning and understand how it can be applied to real-world use cases.

For a detailed course on quantum machine learning, including VQC methods, you can refer to [Qiskit's textbook](https://qiskit.org/learn/course/machine-learning-course/).

In [None]:
!pip install --root-user-action=ignore -q qiskit;
!pip install --root-user-action=ignore -q qiskit-machine-learning;
!pip install --root-user-action=ignore -q qiskit-ibm-runtime
!pip install --root-user-action=ignore -q pylatexenc;

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import warnings
warnings.filterwarnings('ignore')

### Load the train and test datasets

In [None]:
train = pd.read_csv('../input/titanic/train.csv')
test = pd.read_csv("../input/titanic/test.csv")

### Explore the dataset

In [None]:
train.info()

In [None]:
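# Replace female/male with 0/1 (assign back instead of a chained in-place replace,
# which may silently leave the DataFrame unchanged on newer pandas versions)
train['Sex'] = train['Sex'].map({'female': 0, 'male': 1})
test['Sex'] = test['Sex'].map({'female': 0, 'male': 1})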

In [None]:
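# Count explicitly by encoded value (0 = female, 1 = male) rather than relying on sort order
men = (train.Sex == 1).sum()
women = (train.Sex == 0).sum()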
print(f'No. of men onboard: {men}')
print(f'No. of women onboard: {women}')

In [None]:
survivors = train[train.Survived == 1]
men_perc = (survivors.Sex == 1).sum() / men * 100
women_perc = (survivors.Sex == 0).sum() / women * 100
print(f'Percentage of men that survived: {round(men_perc, 2)}%')
print(f'Percentage of women that survived: {round(women_perc, 2)}%')

In [None]:
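# Correlate only the numeric columns with the target; text columns such as Name cannot be correlated
train.select_dtypes(include='number').corrwith(train.Survived)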

### Extract training features

Here we consider only the *Ticket class*, *Sex*, *Age*, *Number of siblings/spouses aboard* (`SibSp`), *Number of parents/children aboard* (`Parch`) and the *Fare paid* as features to train our quantum classifier. We also fill the `NaN` values in the `Age` column with the mean value of the column.

In [None]:
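# Work on an explicit copy so the assignments below do not act on a view of `train`
train_features = train[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']].copy()
train_features['Age'] = train_features['Age'].fillna(train_features['Age'].mean())
train_features['Age'] = train_features['Age'].astype(int)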
train_labels = train['Survived']
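
To make the imputation step concrete, here is a minimal sketch on a toy column. The ages are made up (not Titanic data) and `fill_with_mean` is a hypothetical helper written in plain Python so the arithmetic is visible. It mirrors what filling with the column mean does, given that the mean is computed over the non-missing entries only.

```python
import math


def fill_with_mean(values):
    """Replace NaN entries with the mean of the non-missing entries."""
    known = [v for v in values if not math.isnan(v)]
    mean = sum(known) / len(known)
    return [mean if math.isnan(v) else v for v in values]


# Toy ages (illustrative only); NaN marks a missing entry.
ages = [22.0, float("nan"), 38.0, 30.0]
filled_ages = fill_with_mean(ages)
print(filled_ages)
```

The missing entry takes the average of the three known ages, so the column mean itself is unchanged by the fill.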

We also normalize the values so everything falls in the range $[0,1]$.

In [None]:
from sklearn.preprocessing import MinMaxScaler

train_features = MinMaxScaler().fit_transform(train_features)
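
As a rough illustration of what min-max scaling computes, the sketch below applies $x' = (x - \min) / (\max - \min)$ per column on toy rows. The helper names `fit_min_max` and `apply_min_max` and all numbers are hypothetical; the code is plain Python and not scikit-learn's implementation.

```python
def fit_min_max(rows):
    """Return per-column (min, max) pairs learned from the given rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def apply_min_max(rows, ranges):
    """Scale each column with (x - min) / (max - min) using the stored ranges.

    Assumes no column is constant (max > min), otherwise this divides by zero.
    """
    return [
        [(x - lo) / (hi - lo) for x, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]


# Toy (ticket class, fare) rows, illustrative only.
toy_train = [[1, 10.0], [3, 30.0], [2, 50.0]]
toy_test = [[2, 20.0]]

ranges = fit_min_max(toy_train)
scaled_train = apply_min_max(toy_train, ranges)
scaled_test = apply_min_max(toy_test, ranges)
print(scaled_train)
print(scaled_test)
```

Keeping the learned ranges separate from the scaling step makes it possible to scale new data with the ranges seen during training. With scikit-learn this corresponds to calling `fit` once and `transform` afterwards; values in new data can then fall slightly outside $[0,1]$.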

### Load the data to the circuit

We use a `ZZFeatureMap` to encode our data into the quantum circuit. Other options include `ZFeatureMap`, `PauliFeatureMap`, or a custom feature map.

In [None]:
from qiskit.circuit.library import ZZFeatureMap

num_features = train_features.shape[1]
feature_map = ZZFeatureMap(feature_dimension=num_features, reps=1)
feature_map.decompose().draw(output="mpl", fold=20)
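
To give a feel for what the feature map does with a data point, the sketch below computes the phase angles it would use for one repetition. It assumes the default data mapping documented for `ZZFeatureMap` (a phase of $2x_i$ on each qubit and $2(\pi - x_i)(\pi - x_j)$ on each qubit pair, with every pair entangled). `zz_feature_map_angles` and the sample values are hypothetical, and the code does not call Qiskit.

```python
import math
from itertools import combinations


def zz_feature_map_angles(x):
    """Phase angles for one repetition, assuming the default ZZ data mapping."""
    # One single-qubit phase per feature.
    single = [2 * xi for xi in x]
    # One two-qubit phase per unordered pair of features (full entanglement assumed).
    pair = {
        (i, j): 2 * (math.pi - x[i]) * (math.pi - x[j])
        for i, j in combinations(range(len(x)), 2)
    }
    return single, pair


# A made-up, already normalized sample with six features.
sample = [0.0, 1.0, 0.5, 0.0, 0.0, 0.25]
single_angles, pair_angles = zz_feature_map_angles(sample)
print(len(single_angles), len(pair_angles))
```

Under these assumptions six features give six single-qubit phases and fifteen pairwise phases per repetition, which is why the drawn circuit grows quickly with the number of features.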

To complete the VQC circuit we need an ansatz: a parameterized circuit whose parameters are optimized later to achieve high accuracy. Here we use Qiskit's built-in `RealAmplitudes` ansatz. Again, there are plenty of other options that may or may not yield better results.

In [None]:
from qiskit.circuit.library import RealAmplitudes

ansatz = RealAmplitudes(num_qubits=num_features, reps=2)
ansatz.decompose().draw(output="mpl", fold=20)
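
It helps to know how many weights the optimizer will have to tune. As a back-of-the-envelope sketch, assuming the standard `RealAmplitudes` layout of one RY rotation per qubit in each of `reps + 1` rotation layers, the count can be computed as below. `real_amplitudes_num_parameters` is a hypothetical helper; on the real circuit the same number should be available as `ansatz.num_parameters`.

```python
def real_amplitudes_num_parameters(num_qubits, reps):
    """One RY angle per qubit in each of the reps + 1 rotation layers."""
    return num_qubits * (reps + 1)


# Six features (one qubit each) and reps=2, as in the cell above.
num_weights = real_amplitudes_num_parameters(num_qubits=6, reps=2)
print(num_weights)
```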

### Train the VQC

Finally, we have to train our VQC circuit. This is done by optimizing the parameters so that the predictions on the training data are as accurate as possible. Here we use the gradient-free COBYLA optimizer for a maximum of 100 iterations. While increasing the number of iterations and/or the depth of the circuit can boost the accuracy, it will also lead to longer training times.

In [None]:
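try:
    from qiskit.algorithms.optimizers import COBYLA  # Qiskit releases before 1.0
except ImportError:
    # qiskit.algorithms was removed in Qiskit 1.0; the optimizers moved to the
    # separate qiskit-algorithms package (assumes that package is installed)
    from qiskit_algorithms.optimizers import COBYLA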

optimizer = COBYLA(maxiter=100)
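
To illustrate what "gradient-free with an iteration budget" means, here is a small stand-in optimizer in plain Python. It is a simple coordinate pattern search, not COBYLA (which builds linear approximations of the objective), and `pattern_search` and `toy_loss` are hypothetical names. The point is only that the optimizer sees objective values, never gradients, and stops once the evaluation budget is used up.

```python
def pattern_search(objective, start, step=0.5, maxiter=100):
    """Gradient-free minimization: probe +/- step along each coordinate."""
    best = list(start)
    best_val = objective(best)
    evaluations = 1
    while evaluations < maxiter and step > 1e-6:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                if evaluations >= maxiter:
                    break
                trial = list(best)
                trial[i] += delta
                val = objective(trial)
                evaluations += 1
                if val < best_val:
                    best, best_val, improved = trial, val, True
                    break
        if not improved:
            step /= 2  # shrink the probe when no neighbor is better
    return best, best_val, evaluations


def toy_loss(w):
    """A made-up two-parameter loss with its minimum at (1, -2)."""
    return (w[0] - 1.0) ** 2 + (w[1] + 2.0) ** 2


weights, loss, n_evals = pattern_search(toy_loss, start=[0.0, 0.0], maxiter=100)
print(weights, loss, n_evals)
```

In the VQC setting each objective evaluation means running the circuit on the training data, so the iteration budget directly controls training time.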

In [None]:
from qiskit.primitives import Sampler

sampler = Sampler(options={"shots": 512})

In [None]:
from matplotlib import pyplot as plt
from IPython.display import clear_output

objective_func_vals = []
plt.rcParams["figure.figsize"] = (12, 6)


def callback_graph(weights, obj_func_eval):
    clear_output(wait=True)
    objective_func_vals.append(obj_func_eval)
    plt.title("Objective function value against iteration")
    plt.xlabel("Iteration")
    plt.ylabel("Objective function value")
    plt.plot(range(len(objective_func_vals)), objective_func_vals)
    plt.show()

In [None]:
import time
from qiskit_machine_learning.algorithms.classifiers import VQC

vqc = VQC(
    sampler=sampler,
    feature_map=feature_map,
    ansatz=ansatz,
    optimizer=optimizer,
    callback=callback_graph,
)

# clear objective value history
objective_func_vals = []

start = time.time()
vqc.fit(train_features, train_labels.values)
elapsed = time.time() - start

print(f"Training time: {round(elapsed)} seconds")

In [None]:
train_acc = vqc.score(train_features, train_labels.values)
# Format as a percentage directly; round(x, 2) * 100 can print floating-point noise
print(f'Accuracy on the training data: {train_acc:.0%}')
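
For reference, the score reported here is the mean accuracy, that is, the fraction of samples whose predicted label matches the true label. A minimal sketch with made-up labels follows; `accuracy` is a hypothetical helper, not part of Qiskit.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(p == t for p, t in zip(predictions, labels))
    return matches / len(labels)


# Made-up predictions and labels, illustrative only.
toy_predictions = [1, 0, 0, 1, 1]
toy_labels = [1, 0, 1, 1, 0]
toy_acc = accuracy(toy_predictions, toy_labels)
print(f"{toy_acc:.0%}")
```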

### Make predictions

With our circuit optimized, we can now make predictions on the test data. We first apply to the test dataset the same transformations we applied to the training dataset (select the appropriate features, fill `NaN`s and normalize). Then we use Qiskit's built-in `VQC.predict` method to make predictions on the test data.

In [None]:
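# Work on an explicit copy so the assignments below do not act on a view of `test`
test_features = test[['Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']].copy()
test_features['Age'] = test_features['Age'].fillna(test_features['Age'].mean())
test_features['Age'] = test_features['Age'].astype(int)
test_features['Fare'] = test_features['Fare'].fillna(test_features['Fare'].mean())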

In [None]:
test_features = MinMaxScaler().fit_transform(test_features)

In [None]:
y = vqc.predict(test_features) # predictions vector

Finally, we write our predictions to a new DataFrame and export it to CSV. We are now ready to submit our predictions!

In [None]:
output = pd.DataFrame({"PassengerId": test.PassengerId, "Survived": y})
output

In [None]:
output.to_csv('submission.csv', index=False)