# Titanic example

This notebook illustrates a toy example with three passengers. It uses a squared-distance classifier, inspired by *Machine Learning with Quantum Computers* by Schuld and Petruccione, to predict whether a passenger will survive the 1912 Titanic disaster.

Two of the three passengers form the training set: one survived and one died. The aim is to predict the fate of the third passenger, who forms the mini test set.

Each of the three passengers is described by a cabin number, assumed to be between 1 and 2,500, and a ticket price, assumed to be between £1 and £10,000. These two features are stored in a vector $\bf{x^m}$ for each training passenger and $\bf{x}$ for the test passenger. A squared-distance classifier is used to classify the third passenger, with

$$p(y=1|\bf{x}) = \frac{1}{\chi} \frac{1}{M_1} \sum_{m|y^m = 1}\left( 1 - \frac{1}{c}\|\bf{x} - \bf{x^m} \|^2\right)$$

where $M_1$ is the number of training inputs labelled $y^m = 1$, $c$ is an arbitrary constant, and $\chi$ is a normalisation factor that ensures $p(y = 0|\bf{x}) + p(y = 1|\bf{x}) = 1$.
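As a minimal sketch of the formula above, the snippet below evaluates both class probabilities with NumPy. The function name `squared_distance_probs`, the feature values and the choice `c=4.0` are illustrative assumptions; they are not taken from the project's helper modules or data file.

```python
import numpy as np


def squared_distance_probs(x, X_train, y_train, c):
    """Return (p0, p1) for test point x using the squared-distance rule."""
    x = np.asarray(x, dtype=float)
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # gamma_m = 1 - ||x - x^m||^2 / c for every training point
    gamma = 1.0 - np.sum((X_train - x) ** 2, axis=1) / c
    # Average gamma_m within each class: the (1 / M_y) * sum term
    scores = np.array([gamma[y_train == label].mean() for label in (0, 1)])
    # chi is the sum of the two class scores, so the result sums to one
    return tuple(scores / scores.sum())


# Hypothetical, already-scaled features: one survivor (1), one non-survivor (0)
X_train = [[0.85, 0.36], [0.12, 0.84]]
y_train = [1, 0]
x_test = [0.78, 0.45]

p0, p1 = squared_distance_probs(x_test, X_train, y_train, c=4.0)
print(f'p(y=0|x) = {p0:.3f}, p(y=1|x) = {p1:.3f}')
```

Because the hypothetical test point lies much closer to the survivor than to the non-survivor, the sketch assigns the larger probability to survival.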

Import modules needed:

In [None]:
from pathlib import Path
import numpy as np
import math
import pennylane as qml
from functools import partial

HOME_DIR = '..'
BASE_DIR = Path(HOME_DIR)

import sys
sys.path.append(HOME_DIR)

from config.config import DATA, SHOTS, C
PROJECT = '01_titanic'
FOLDER = 'processed'
FILE = 'processed_data.csv'

from src.modules.data_helper_functions import (read_csv, 
                                              clean_and_print_data,
                                              find_gamma_m, 
                                              normalise,
                                              find_norm,
                                              find_test_data,
                                              pre_process_feature_vector,
                                              prepare_quantum_feature_vector,
                                              normalise_feature_vector,
                                              )

from src.modules.graph_functions import plot_simple_scatter

from src.modules.quantum_helper_functions import (make_wires,
                                                  my_amplitude_encoding,
                                                  )


Load the data, clean and print

In [None]:
file_path = BASE_DIR.joinpath(DATA).joinpath(PROJECT).joinpath(FOLDER).joinpath(FILE)
print(f'Data will be loaded from {file_path}')
data = read_csv(file_path)
print(f'\nThe raw data is:')
for items in data:
    print(items)

labels, x1, x2, y = clean_and_print_data(data)

Plot data

In [None]:
plot_simple_scatter(x1, x2, labels, y)

## Calculate the squared-distance classifier

Normalise and print the data:

In [None]:
x1, x2 = normalise(x1, x2)
print(f'\nThe normalised feature values are:')
# Use different quote styles for the nested f-strings (required before Python 3.12)
print(f"x1={[f'{v:.3f}' for v in x1]}")
print(f"x2={[f'{v:.3f}' for v in x2]}")
print(f'y={y}')
plot_simple_scatter(x1, x2, labels, y)
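The body of `normalise` is not shown here. One plausible reading, given the ranges stated in the introduction, is that each feature is divided by its assumed maximum so that both end up in $[0, 1]$. The sketch below illustrates that idea only; the function `scale_features`, the constants and the passenger values are hypothetical, and the real helper may work differently (for example by scaling each passenger's vector to unit length).

```python
import numpy as np

CABIN_MAX = 2_500    # assumed upper bound on the cabin number
PRICE_MAX = 10_000   # assumed upper bound on the ticket price in pounds


def scale_features(cabin, price):
    """Scale cabin numbers and ticket prices into the range [0, 1]."""
    cabin = np.asarray(cabin, dtype=float) / CABIN_MAX
    price = np.asarray(price, dtype=float) / PRICE_MAX
    return cabin, price


# Illustrative values for three passengers, not read from the data file
cabin_scaled, price_scaled = scale_features([910, 2105, 1121], [8500, 1200, 7800])
print(f'cabin: {cabin_scaled}')
print(f'price: {price_scaled}')
```

Scaling both features to a common range stops the ticket price, which has the larger raw values, from dominating the squared distance.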

Find the test data:

In [None]:
x = find_test_data(x1, x2, y)

Calculate $p(y = 0|\bf{x})$ and $p(y = 1|\bf{x})$, and print results

In [None]:
M0, M1, p0, p1 = 0, 0, 0, 0
for i, item in enumerate(y):
    x_m = np.array([x1[i],x2[i]])
    print(f'\nProcessing training point {x_m} with label {item}, {i=}:')
    if item == 1:
        M1 += 1
        gamma_m = find_gamma_m(x, x_m, C)
        p1 += gamma_m
        # Report this point's own gamma_m, not the running total
        print(f'gamma_m between test point {x} and the passenger who survived at {x_m} is {gamma_m:.3f}.')
    elif item == 0:
        M0 += 1
        gamma_m = find_gamma_m(x, x_m, C)
        p0 += gamma_m
        print(f'gamma_m between test point {x} and the passenger who died at {x_m} is {gamma_m:.3f}.')
    elif item != '':
        raise ValueError(f'Value of y is {item} which is not allowed')

p0, p1 = p0 / M0, p1 / M1  # average gamma_m within each class
# normalise so that p0 + p1 = 1
chi = p0 + p1
p0, p1 = p0/chi, p1/chi

print(f'The probability that the test passenger dies is {p0:.1%}.')
print(f'The probability that the test passenger survives is {p1:.1%}.')

if p1 > p0:
    print('The classifier predicts survival!')
else:
    print('The classifier predicts death!')

## Quantum classifier

Prepare the data for loading into a quantum computer:
- add an extra copy of the features of Passenger 3, and
- tidy up y so that it holds integers.

In [None]:
x1, x2, y = pre_process_feature_vector(x1, x2, y)
alpha = prepare_quantum_feature_vector(x1, x2, y)
features = normalise_feature_vector(alpha)

Check that the complete feature vector is normalised

In [None]:
norm = float(find_norm(features))
# Compare with a tolerance: an exact float comparison can fail through rounding
if not math.isclose(norm, 1.0):
    raise ValueError(f'Normalisation failed, norm = {norm}')

Find the number of qubits required.

In [None]:
n_qubits = math.ceil(math.log2(len(features)))
if n_qubits != math.log2(len(features)):
    raise Exception(f'The number of features must be a power of 2, not {len(features)}, n_qubits = {n_qubits}')
print(f'Number of qubits required = {n_qubits}')

In [None]:
my_wires = make_wires(n_qubits)
dev_unique_wires = qml.device('default.qubit', wires=my_wires)

In [None]:
@partial(qml.set_shots, shots=SHOTS)
@qml.qnode(dev_unique_wires)
def circuit(features):
    #qml.AmplitudeEmbedding(features, wires=my_wires, normalize=False)
    my_amplitude_encoding(features, wires=my_wires)

    # Apply Hadamard on the first qubit (wire 'q1')
    qml.Hadamard(wires='q1')

    # Post-select on measuring 0 on wire 'q1'
    m_0 = qml.measure('q1', postselect=0)

    return qml.probs(wires=['q4'])

In [None]:
result = circuit(features)
print(f'The result of the quantum circuit is {result}.')
print('\nThe results of the quantum classifier are:')
print(f'The probability of survival is {result[1]:.1%}, and of non-survival is {result[0]:.1%}.')
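To see why a Hadamard gate followed by post-selection yields a squared-distance classifier, the sketch below reproduces the same steps with plain NumPy, with no sampling noise. It assumes the amplitude vector is laid out as (ancilla, training index, feature, label), with `q1` as the ancilla and `q4` as the label qubit, as in the circuit above; the roles of the two middle qubits, the feature values and the helper `unit` are assumptions made for illustration. It also assumes each feature vector is scaled to unit length, in which case the post-selected label probabilities match the classical formula with $c = 4$.

```python
import numpy as np


def unit(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


# Hypothetical feature vectors, each scaled to unit length
x = unit([0.78, 0.45])                               # test passenger
X_train = [unit([0.85, 0.36]), unit([0.12, 0.84])]   # training passengers
y_train = [1, 0]
M = len(X_train)

# Amplitudes indexed as (ancilla, training index m, feature, label)
state = np.zeros((2, M, 2, 2))
for m, (x_m, y_m) in enumerate(zip(X_train, y_train)):
    state[0, m, :, y_m] = x      # ancilla 0: a copy of the test vector
    state[1, m, :, y_m] = x_m    # ancilla 1: the m-th training vector
state /= np.sqrt(2 * M)
assert np.isclose(np.sum(state ** 2), 1.0)

# Hadamard on the ancilla, then keep only the ancilla = 0 branch
branch = (state[0] + state[1]) / np.sqrt(2)
p_accept = np.sum(branch ** 2)

# Probability of each label, conditioned on the post-selection succeeding
label_probs = np.sum(branch ** 2, axis=(0, 1)) / p_accept

# Classical squared-distance classifier with c = 4 for comparison
classical = np.zeros(2)
for x_m, y_m in zip(X_train, y_train):
    classical[y_m] += 1.0 - np.sum((x - x_m) ** 2) / 4.0
classical /= classical.sum()

print(f'post-selected label probabilities: {label_probs}')
print(f'classical probabilities (c = 4):    {classical}')
```

The agreement follows from $\|\bf{x} + \bf{x^m}\|^2 = 4 - \|\bf{x} - \bf{x^m}\|^2$ for unit vectors: the interference created by the Hadamard gate turns each training point's contribution into $1 - \frac{1}{4}\|\bf{x} - \bf{x^m}\|^2$, up to a common factor that the normalisation removes. The match with the per-class average in the classical formula relies on both classes having the same number of training points, as they do here.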


In [None]:
qml.drawer.use_style("black_white")
fig, ax = qml.draw_mpl(circuit)(features)