# Helstrom Quantum Centroid Classifier

This binary classifier is based on the concept of distinguishability between quantum states. It acts on density matrices, called density patterns, which are the quantum encodings of the classical patterns in a dataset. The input vectors are encoded into density matrices using either amplitude encoding or the inverse of the standard stereographic projection, and the HQC model is trained on the encoded values.

Ref[1]: Sergioli G, Giuntini R, Freytes H (2019) A new quantum approach to binary classification. PLoS ONE 14(5): e0216224. https://doi.org/10.1371/journal.pone.0216224

Ref[2]: https://helstrom-quantum-centroid-classifier.readthedocs.io/en/latest/user_guide.html
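
The two encodings mentioned above can be sketched with NumPy. This is a minimal illustration, not the `hqc` package's own code: the function names are hypothetical, and it assumes that amplitude encoding appends a constant 1 before normalizing and that several copies of a state are combined with a Kronecker product.

```python
import numpy as np

def amplitude_encode(x):
    """Append a constant 1 to x and normalize to unit length (assumed convention)."""
    v = np.append(np.asarray(x, dtype=float), 1.0)
    return v / np.linalg.norm(v)

def inverse_stereographic_encode(x):
    """Map x in R^n onto the unit sphere in R^(n+1)."""
    x = np.asarray(x, dtype=float)
    s = np.dot(x, x)
    return np.append(2.0 * x, s - 1.0) / (s + 1.0)

def density_pattern(v, n_copies=1):
    """Pure-state density matrix v v^T, tensored with itself n_copies times."""
    rho = np.outer(v, v)
    out = rho
    for _ in range(n_copies - 1):
        out = np.kron(out, rho)
    return out

x = np.array([0.5, -1.2, 2.0])
rho_amp = density_pattern(amplitude_encode(x))
rho_ste = density_pattern(inverse_stereographic_encode(x), n_copies=2)

print(rho_amp.shape, rho_ste.shape)  # (4, 4) (16, 16)
print(np.trace(rho_amp), np.trace(rho_ste))  # both traces should be close to 1
```

Each additional copy squares the matrix dimension, which is one reason the notebook reduces the 784 pixel features to a handful of components before fitting.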

In [1]:
import tensorflow as tf
import os, glob, cv2
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

In [2]:
# Install hqc, or upgrade an existing installation
#!pip install hqc
#!pip install hqc --upgrade

In [3]:
import hqc

In [4]:
model1 = hqc.HQC(rescale=1.0, encoding='stereo', n_copies=2, class_wgt='equi', n_jobs=4, n_splits=2)

In [5]:
#To get your HQC classification model, fit the features matrix X and binary target vector y
#model1.fit(X,y)
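
What fitting does conceptually can be sketched as follows. This is a simplified illustration under stated assumptions (single copy, equal class weights, inverse stereographic encoding, toy Gaussian data), not the `hqc` implementation; `fit_helstrom` and `predict_helstrom` are hypothetical names.

```python
import numpy as np

def encode(X):
    """Inverse stereographic encoding of each row, returned as unit vectors."""
    s = np.sum(X**2, axis=1, keepdims=True)
    return np.hstack([2.0 * X, s - 1.0]) / (s + 1.0)

def fit_helstrom(X, y):
    V = encode(X)
    # Quantum centroids: the average density pattern of each class.
    rho0 = np.mean([np.outer(v, v) for v in V[y == 0]], axis=0)
    rho1 = np.mean([np.outer(v, v) for v in V[y == 1]], axis=0)
    # Helstrom observable with equal class weights (assumption).
    hels = 0.5 * rho0 - 0.5 * rho1
    eigvals, eigvecs = np.linalg.eigh(hels)
    pos = eigvecs[:, eigvals > 0]
    neg = eigvecs[:, eigvals <= 0]
    # Projectors onto the positive and non-positive eigenspaces.
    return pos @ pos.T, neg @ neg.T

def predict_helstrom(X, proj_pos, proj_neg):
    V = encode(X)
    # Tr(P rho_x) = v^T P v for a pure state rho_x = v v^T.
    score0 = np.einsum('ij,jk,ik->i', V, proj_pos, V)
    score1 = np.einsum('ij,jk,ik->i', V, proj_neg, V)
    return np.where(score0 >= score1, 0, 1)

# Toy data: two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(-2.0, 0.3, size=(50, 2)),
                   rng.normal(+2.0, 0.3, size=(50, 2))])
y_toy = np.array([0] * 50 + [1] * 50)

P_pos, P_neg = fit_helstrom(X_toy, y_toy)
pred = predict_helstrom(X_toy, P_pos, P_neg)
acc = np.mean(pred == y_toy)
print("training accuracy on toy data:", acc)
```

A sample is assigned to the class whose projector gives the larger expectation value, which is the sense in which the classifier measures distinguishability between the two class centroids.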

In [6]:
#Make X and y

In [7]:
SEED = 42
IMG_HEIGHT = 28
IMG_WIDTH = 28

# eye_diseases_classification dataset
# Backslashes are escaped so that Python reads the Windows path literally
IMG_ROOT = 'D:\\Womanium2023\\GlobalQuantumProject\\Datasets\\eye_diseases_classification\\Proc\\'
IMG_DIR = [IMG_ROOT+'normal',
           IMG_ROOT+'cataract']

In [8]:
IMG_DIR

['D:\\Womanium2023\\GlobalQuantumProject\\Datasets\\eye_diseases_classification\\Proc\\normal',
 'D:\\Womanium2023\\GlobalQuantumProject\\Datasets\\eye_diseases_classification\\Proc\\cataract']

In [9]:
df = pd.DataFrame(0,
                  columns=['paths',
                           'cataract'],
                  index=range(2500))

filepaths = glob.glob(IMG_ROOT + '*/*')

# Record each image path and label it by its parent folder
for i, filepath in enumerate(filepaths):
    filepath = os.path.split(filepath)  # (folder, file name)
    df.iloc[i, 0] = filepath[0] + '/' + filepath[1]

    if filepath[0] == IMG_DIR[0]:    # normal
        df.iloc[i, 1] = 0
    elif filepath[0] == IMG_DIR[1]:  # cataract
        df.iloc[i, 1] = 1

In [12]:
# Drop the pre-allocated rows that were never filled with a path
df = df[df.paths != 0]

In [13]:
print('Number of normal and cataract images')
print(df['cataract'].value_counts())

Number of normal and cataract images
cataract
0    1074
1     938
Name: count, dtype: int64


In [14]:
train_df, test_df = train_test_split(df,
                                     test_size=0.2,
                                     random_state=SEED,
                                     stratify=df['cataract'])

train_df, val_df = train_test_split(train_df,
                                    test_size=0.15,
                                    random_state=SEED,
                                    stratify=train_df['cataract'])
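
The sizes of the three subsets follow from the two split fractions. The arithmetic below is a quick check, assuming scikit-learn's rule of rounding the held-out fraction up and leaving the remainder for training.

```python
import math

n_total = 1074 + 938                       # normal + cataract counts printed above
n_test = math.ceil(0.2 * n_total)          # first split: 20% held out for testing
n_remaining = n_total - n_test
n_val = math.ceil(0.15 * n_remaining)      # second split: 15% of the rest for validation
n_train = n_remaining - n_val

print(n_train, n_val, n_test)  # 1367 242 403
```

So roughly 68% of the images are used for training, 12% for validation, and 20% for testing.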

In [15]:
from tqdm import tqdm

def create_datasets(df):
    imgs = []

    for path in tqdm(df['paths']):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)    # load as a grayscale image
        img = cv2.resize(img, (IMG_WIDTH, IMG_HEIGHT))  # cv2 expects (width, height)
        img = img.flatten()                             # convert the 2D image to a 1D array
        imgs.append(img)

    imgs = np.array(imgs, dtype='float32')

    labels = df.cataract
    return imgs, labels


train_imgs, train_labels = create_datasets(train_df)
val_imgs, val_labels = create_datasets(val_df)
test_imgs, test_labels = create_datasets(test_df)

train_labels = train_labels.astype(np.uint8)
val_labels = val_labels.astype(np.uint8)
test_labels = test_labels.astype(np.uint8)

#train_imgs = train_imgs / 255.0
#val_imgs = val_imgs / 255.0
#test_imgs = test_imgs / 255.0



100%|████████████████████████████████████████████████████████████████████████████| 1367/1367 [00:00<00:00, 3081.67it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 242/242 [00:00<00:00, 3612.43it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 403/403 [00:00<00:00, 3636.73it/s]


In [35]:
print('trainnumber =', train_imgs.shape,'testnumber = ', test_imgs.shape,'valnumber =', val_imgs.shape)

trainnumber = (1367, 784) testnumber =  (403, 784) valnumber = (242, 784)


In [61]:
from sklearn.decomposition import KernelPCA
from sklearn import preprocessing

kernel_pca = KernelPCA(
    n_components=6, kernel="rbf", gamma=None, fit_inverse_transform=True, alpha=0.1
)

# Rescale the data to zero mean and unit standard deviation. Fit the scaler on the training set only.
scaler = preprocessing.StandardScaler().fit(train_imgs)
train_scaled = scaler.transform(train_imgs)
val_scaled = scaler.transform(val_imgs)
test_scaled = scaler.transform(test_imgs)
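
The reason for fitting the scaler on the training images only can be seen in a small NumPy sketch. The toy arrays are made up for illustration; the logic mirrors what `StandardScaler` does when it is fitted on one set and applied to another.

```python
import numpy as np

rng = np.random.default_rng(0)
toy_train = rng.normal(5.0, 2.0, size=(100, 3))  # hypothetical training features
toy_test = rng.normal(5.0, 2.0, size=(20, 3))    # hypothetical held-out features

# Statistics come from the training set only.
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)
std[std == 0] = 1.0  # leave constant features unscaled

toy_train_scaled = (toy_train - mean) / std
toy_test_scaled = (toy_test - mean) / std  # reuse the training statistics

print(toy_train_scaled.mean(axis=0).round(6))  # approximately zero by construction
print(toy_test_scaled.mean(axis=0).round(3))   # close to zero, but not exactly
```

Reusing the training statistics keeps information about the validation and test images out of the preprocessing step.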

In [62]:
# Reduce dataset size if required
n_train = 1367
n_test = 403
train_images = train_scaled[:n_train]
train_labels = train_labels[:n_train]
test_images = test_scaled[:n_test]
test_labels = test_labels[:n_test]

In [63]:
train_images.dtype

dtype('float32')

In [64]:
# Apply kernel PCA to reduce dimensionality (fitted on the training set only)
X_train = kernel_pca.fit(train_images).transform(train_images).astype(np.float32)
testfit = kernel_pca.transform(test_images).astype(np.float32)
valfit = kernel_pca.transform(val_scaled).astype(np.float32)

In [65]:
model1.fit(X_train, train_labels)

In [None]:
#model1.predict_proba(X_train)

In [None]:
#model1.predict(testfit)

In [66]:
model1.score(testfit, test_labels)

0.7419354838709677

In [67]:
model1.score(valfit, val_labels)

0.7768595041322314
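
The scores above are plain accuracies. A confusion matrix shows how the errors split between the two classes. The sketch below uses made-up toy labels and a hypothetical helper `confusion_counts`; in the notebook the inputs would be `test_labels` and the output of `model1.predict(testfit)`, assuming it returns 0/1 labels.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels (0 = normal, 1 = cataract)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    return np.array([[tn, fp], [fn, tp]])

# Toy labels for illustration only.
y_true_toy = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_pred_toy = np.array([0, 1, 0, 1, 0, 1, 1, 0])

cm = confusion_counts(y_true_toy, y_pred_toy)
accuracy = np.trace(cm) / cm.sum()
sensitivity = cm[1, 1] / cm[1].sum()  # fraction of cataract cases detected
specificity = cm[0, 0] / cm[0].sum()  # fraction of normal cases recognized

print(cm)
print(accuracy, sensitivity, specificity)  # 0.75 0.75 0.75
```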

To search for the best hyperparameters, use GridSearchCV:

from sklearn.model_selection import GridSearchCV

param_grid = {
    'rescale': [0.5, 1, 1.5],
    'encoding': ['amplit', 'stereo'],
    'n_copies': [1, 2],
    'class_wgt': ['equi', 'weighted'],
}
models = GridSearchCV(hqc.HQC(), param_grid).fit(X_train,train_labels)

pd.DataFrame(models.cv_results_)
models.best_score_
models.best_params_
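
The grid search can also cover the preprocessing steps, so that scaling and kernel PCA are refitted inside every cross-validation fold. The sketch below is self-contained and therefore uses synthetic data and `LogisticRegression` as a stand-in classifier; replacing it with `hqc.HQC()` is an assumption that relies on HQC following the scikit-learn estimator interface, as its use with `GridSearchCV` above suggests.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the flattened image features.
X_toy, y_toy = make_classification(n_samples=200, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("kpca", KernelPCA(kernel="rbf")),
    ("clf", LogisticRegression()),  # stand-in classifier, not HQC
])

# Step parameters are addressed as <step name>__<parameter>.
grid = {"kpca__n_components": [4, 6], "clf__C": [0.1, 1.0]}
search = GridSearchCV(pipe, grid, cv=3).fit(X_tr, y_tr)

print(search.best_params_)
print(search.score(X_te, y_te))  # accuracy of the refitted best pipeline on held-out data
```

Scoring the refitted best estimator on data that played no part in the search gives a less optimistic estimate than `best_score_`, which is an average over the cross-validation folds.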