Uses the Elastic Eland and Elasticsearch-DSL Python libraries together with IBM Qiskit

Can run on a real quantum computer (the default backend below is a simulator)

Runs both Qiskit's qSVM and scikit-learn's classical SVM with data accessed via Eland

Eland and DataFrames do not have an SVM yet

Classifying the nyc_restaurants dataset stored in Elasticsearch with a qSVM running on Qiskit

This is an example of a classification problem that requires a feature map for which computing the kernel is not efficient classically.

This means that the required computational resources are expected to scale exponentially with the size of the problem.

We show how this can be solved on a quantum processor by direct estimation of the kernel in the feature space.
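To make "estimating the kernel in feature space" concrete, here is a minimal NumPy sketch, assuming a toy feature map that encodes each feature as a single-qubit RY rotation. This toy map is easy to simulate classically and is not the ZZFeatureMap used later; the function names are made up for illustration. On a quantum device the overlap would be estimated from measurement counts instead of computed exactly.

```python
import numpy as np

def toy_feature_state(x):
    """Encode each feature as RY(x_i)|0> and return the product state vector."""
    state = np.array([1.0])
    for angle in x:
        qubit = np.array([np.cos(angle / 2.0), np.sin(angle / 2.0)])
        state = np.kron(state, qubit)
    return state

def toy_kernel(x, y):
    """Kernel entry = squared overlap |<phi(x)|phi(y)>|^2 of the encoded states."""
    overlap = np.dot(toy_feature_state(x), toy_feature_state(y))
    return float(overlap ** 2)

a = [0.3, 1.2]
b = [0.5, 0.1]
k_aa = toy_kernel(a, a)  # a point compared with itself
k_ab = toy_kernel(a, b)
k_ba = toy_kernel(b, a)  # the kernel is symmetric
```

Each kernel entry lies between 0 and 1, and a point compared with itself gives 1 (up to floating-point rounding).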

In [None]:
import numpy as np
import matplotlib.pyplot as plt

In [None]:
#"regular" qiskit tools
from qiskit import BasicAer
from qiskit import IBMQ
from qiskit.utils import QuantumInstance, algorithm_globals 

In [None]:
#special svm tools
from qiskit.circuit.library import ZZFeatureMap
from qiskit_machine_learning.algorithms import QSVC
from qiskit_machine_learning.kernels import QuantumKernel

In [None]:
from sklearn.svm import SVC
from sklearn.cluster import SpectralClustering
from sklearn.metrics import normalized_mutual_info_score

In [2]:
import eland as ed
from elasticsearch import Elasticsearch
from elasticsearch_dsl import A  #aggregation
from elasticsearch_dsl import Search

In [3]:
#Load the Elastic Cloud ID and password
#for connecting to Elasticsearch Cloud
#strip() drops any trailing newline left in the files
with open("elastic_cloud_password.txt") as f:
    ELASTIC_CLOUD_PSSWD = f.read().strip()
with open("elastic_cloud_id.txt") as g:
    ELASTIC_CLOUD_ID = g.read().strip()
#Connect to an Elastic Cloud Instance
es = Elasticsearch(cloud_id=ELASTIC_CLOUD_ID,
                   http_auth=("elastic",ELASTIC_CLOUD_PSSWD),
                   sniff_on_start=True,
                   sniff_on_connection_fail=True,
                   sniffer_timeout=60)

In [None]:
#Function for pretty_printing JSON
def json_pretty(x):
    import json
    print(json.dumps(x, indent=2, sort_keys=True))

In [None]:
#little blurb for Jupyter Notebook to fill width of the browser
#from IPython.core.display import  display, HTML
from IPython.display import  display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

In [None]:
###some cat commands on Elasticsearch
#https://elasticsearch-py.readthedocs.io/en/v7.16.2/api.html#elasticsearch
#es.cat.nodes(h="name").split("\n")
#es.cat.indices(v=True).split("\n")
#es.cat.indices(s="index",h="index").split("\n")

In [None]:
#es.search(index="nyc_restaurants")
es.count(index="nyc_restaurants")

My own dataset is nyc_restaurants.

It probably came from this link: https://github.com/elastic/examples/tree/master/Exploring%20Public%20Datasets/nyc_restaurants

The goal is to classify nyc_restaurants by the "grade" field (A, B, C, P, G, N, Z).

The nyc_restaurants data needs to be munged to look like the "ad_hoc" dataset used by the Qiskit example.

In [None]:
#terms aggregation defined
terms_agg = A('terms', field='grade')

In [None]:
#search that the terms agg (terms_agg) will be attached to
#s = Search().params(size=0).aggs.bucket('grade_terms',terms_agg)
s = Search(using=es,index='nyc_restaurants').params(size=0)

In [None]:
# add the aggregation to the search: bucket the 'grade' field under the name 'grade_terms'.
s.aggs.bucket('grade_terms', terms_agg)

In [None]:
response = s.execute()

In [None]:
print(response)

In [None]:
# What are the grades?
json_pretty(response.to_dict())

So the possible grades are A,B,C,P,Z,N,G
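Rather than reading the grades off the printed JSON, the bucket keys can be pulled out programmatically. This is a small sketch that works on the dictionary form of a terms-aggregation response; the helper name and the counts in the sample are made up for illustration.

```python
def grade_counts(response_dict, agg_name="grade_terms"):
    """Return {bucket key: doc_count} for a terms aggregation in a search response."""
    buckets = (
        response_dict.get("aggregations", {})
        .get(agg_name, {})
        .get("buckets", [])
    )
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}

# Hand-written stand-in for response.to_dict(); the counts are invented.
sample_response = {
    "aggregations": {
        "grade_terms": {
            "buckets": [
                {"key": "A", "doc_count": 80},
                {"key": "B", "doc_count": 15},
                {"key": "Z", "doc_count": 5},
            ]
        }
    }
}
counts = grade_counts(sample_response)
```

With the real response this would be called as grade_counts(response.to_dict()). A terms aggregation returns only the top 10 buckets by default, which is enough for the seven grades here.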

Elasticsearch references
https://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html
https://www.programcreek.com/python/example/117133/elasticsearch_dsl.A
https://kb.objectrocket.com/elasticsearch/how-to-use-the-search-api-for-the-python-elasticsearch-client-265

First we need to get our training and testing datasets ready. After that we can set up the QuantumKernel class to calculate a kernel matrix using the ZZFeatureMap, running on the BasicAer qasm_simulator with 1024 shots.

The scikit-learn SVC algorithm allows us to define a custom kernel in two ways: by providing the kernel as a callable function or by precomputing the kernel matrix. We can do either of these using the QuantumKernel class in Qiskit.
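The two ways of handing a custom kernel to SVC can be tried without any quantum backend. This sketch uses a classical RBF kernel as a stand-in for QuantumKernel.evaluate; the data and the helper name are invented for illustration. The shapes are the important part: the training matrix is (n_train, n_train) and the test matrix is (n_test, n_train).

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel_matrix(X, Y):
    """Stand-in kernel: returns the (len(X), len(Y)) Gram matrix."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists)

# Small made-up dataset: label 1 when the first coordinate is positive.
X_train = np.array([[x, y] for x in (-2.0, -1.0, 1.0, 2.0) for y in (-1.0, 0.0, 1.0)])
y_train = (X_train[:, 0] > 0).astype(int)
X_test = np.array([[-1.5, 0.5], [1.5, -0.5], [-0.5, 0.2], [0.7, 0.9]])

# Way 1: pass the kernel as a callable.
svc_callable = SVC(kernel=rbf_kernel_matrix).fit(X_train, y_train)
pred_callable = svc_callable.predict(X_test)

# Way 2: precompute the matrices and pass kernel="precomputed".
K_train = rbf_kernel_matrix(X_train, X_train)
K_test = rbf_kernel_matrix(X_test, X_train)
svc_precomputed = SVC(kernel="precomputed").fit(K_train, y_train)
pred_precomputed = svc_precomputed.predict(K_test)
```

With a deterministic kernel both routes give the same predictions. With a shot-based quantum kernel the two runs can differ slightly because every evaluation is re-sampled.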

In [None]:
!curl -XGET "https://705e35228a67459698fbdf4618a84821.us-central1.gcp.cloud.es.io:9243/nyc_restaurants/_search?pretty" -u elastic:{ELASTIC_CLOUD_PSSWD}

In [None]:
#how many fields?
!curl -s -XGET "https://705e35228a67459698fbdf4618a84821.us-central1.gcp.cloud.es.io:9243/nyc_restaurants/_mapping?pretty" -u elastic:{ELASTIC_CLOUD_PSSWD} | grep type | wc -l

In [None]:
## ZZFeatureMap is the feature-map circuit that encodes the data into a higher-dimensional quantum state space
## https://qiskit.org/documentation/stubs/qiskit.circuit.library.ZZFeatureMap.html
nyc_feature_map = ZZFeatureMap(feature_dimension=2, reps=2, entanglement="linear")

In [None]:
seed = 12345
algorithm_globals.random_seed = seed
#print(algorithm_globals.random_seed)
nyc_backend = QuantumInstance(BasicAer.get_backend("qasm_simulator"), shots=1024, seed_simulator=seed, seed_transpiler=seed)

In [None]:
print(seed)

In [None]:
print(algorithm_globals.random_seed)

Data prep:

train_features in the Qiskit sample ad_hoc dataset are pairs of floats in a list (a list of 40 lists).
The nyc_restaurants documents have a pair of floats in the "location" field. That will do.

By searching Elasticsearch, bringing back _source, and using filter_path to get to the location field, I get JSON of the form "location": "float1,float2" (a comma-separated string, which is why the code below splits on ",").

From there the string can be parsed into a Python list of floats.

In [None]:
jpairs=es.search(index='nyc_restaurants',filter_path=['hits.hits._source.location'],size=50)

In [None]:
jcount=es.count(index='nyc_restaurants')
print(jcount)

In [None]:
training_features = []

In [None]:
for hit in jpairs['hits']['hits']:
    # split the "location" string on "," and convert both parts to floats
    w = [float(x) for x in hit['_source']['location'].split(',')]
    training_features.append(w)

OK, training_features is a list of lists. Each inner list is a pair of floats, just like train_features in ad_hoc.

In [None]:
print(training_features)

In [None]:
len(training_features)
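One thing that may be worth checking before training: as far as I understand, ZZFeatureMap uses the feature values as gate angles, so raw coordinates of around 40 and -74 wrap around many times. A minimal sketch of rescaling each column to the range 0 to pi follows. The helper names and the sample coordinates are made up, and whether this range is the best choice for this dataset is an assumption, not something tested here.

```python
import numpy as np

def fit_scaler(train_features):
    """Learn per-column minimum and span from the training features only."""
    X = np.asarray(train_features, dtype=float)
    col_min = X.min(axis=0)
    span = X.max(axis=0) - col_min
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return col_min, span

def apply_scaler(features, col_min, span, high=np.pi):
    """Map features so that the training range becomes [0, high] per column."""
    X = np.asarray(features, dtype=float)
    return (X - col_min) / span * high

# Invented latitude/longitude pairs standing in for training_features.
sample_train = [[40.60, -74.01], [40.85, -73.90], [40.70, -73.80]]
col_min, span = fit_scaler(sample_train)
scaled_train = apply_scaler(sample_train, col_min, span)
```

The same col_min and span learned from the training set would then be applied to the test features, so both sets share one scale.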

train_labels is a simpler vector with one integer per sample. Since I forced training_features to a fixed number of entries, the labels are the 'grade' field values for those same entries, so I need to search for grade and do the same thing.
The grades are letters, so they probably need to be converted to numbers.
I don't see any "D" or "F" in the sample, but I did find a "P" and a "Z".

Realizing later that the train labels are the matching labels for train_features, so the two need the same number of entries.
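Because features and labels have to line up entry for entry, one option is to pull both fields from the same hits in a single pass, for instance from one search that lists both hits.hits._source.location and hits.hits._source.grade in filter_path. The sketch below works on the dictionary form of such a response. The helper name and the sample documents are invented, and skipping incomplete documents is my assumption about the desired behavior.

```python
def extract_pairs(response):
    """Return (features, labels) built from the same hits, skipping incomplete ones."""
    features, labels = [], []
    for hit in response.get("hits", {}).get("hits", []):
        source = hit.get("_source", {})
        location = source.get("location")
        grade = source.get("grade")
        if location is None or grade is None:
            continue  # a document missing either field would misalign the lists
        features.append([float(x) for x in location.split(",")])
        labels.append(grade)
    return features, labels

# Hand-written stand-in for an es.search(...) response.
fake_response = {
    "hits": {
        "hits": [
            {"_source": {"location": "40.71,-74.00", "grade": "A"}},
            {"_source": {"location": "40.65,-73.95"}},  # no grade: skipped
            {"_source": {"location": "40.80,-73.96", "grade": "B"}},
        ]
    }
}
feats, labs = extract_pairs(fake_response)
```

This keeps len(feats) equal to len(labs) by construction, which two separate searches do not guarantee if some documents lack one of the fields.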

In [None]:
tl_pairs=es.search(index='nyc_restaurants',filter_path=['hits.hits._source.grade'],size=50)

In [None]:
json_pretty(tl_pairs)

In [None]:
def flatten_list(_2d_list):
    """Flatten one level of nesting; non-list elements are kept as they are."""
    flat_list = []
    # Iterate through the outer list
    for element in _2d_list:
        if isinstance(element, list):
            # If the element is a list, add each of its items
            flat_list.extend(element)
        else:
            flat_list.append(element)
    return flat_list

In [None]:
el2 = []

In [None]:
for hit in tl_pairs['hits']['hits']:
    # wrap each grade string in a list of its characters, e.g. 'A' -> ['A'];
    # flatten_list() undoes the wrapping later
    el2.append(list(hit['_source']['grade']))

In [None]:
possibleGrades = ['A','B','C','P','Z','N','G']

In [None]:
from sklearn import preprocessing

In [None]:
le = preprocessing.LabelEncoder()

In [None]:
el_2 = flatten_list(el2)

End of data prep

follow https://scikit-learn.org/stable/modules/preprocessing_targets.html#preprocessing-targets

In [None]:
attempt=le.fit(possibleGrades)

In [None]:
le.classes_

In [None]:
training_labels=le.transform(el_2)

In [None]:
len(training_labels)

So training_labels can act as train_labels.
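It helps to know which integer each grade ends up as. LabelEncoder stores its classes in sorted order, so the codes follow alphabetical order and not the order of possibleGrades. A plain-Python sketch of the same mapping (the variable names are made up for illustration):

```python
possible_grades = ['A', 'B', 'C', 'P', 'Z', 'N', 'G']

# Sorted unique classes, mirroring what le.classes_ holds after fitting.
classes = sorted(set(possible_grades))
grade_to_code = {grade: code for code, grade in enumerate(classes)}
code_to_grade = {code: grade for grade, code in grade_to_code.items()}

encoded = [grade_to_code[g] for g in ['A', 'P', 'Z', 'A']]
decoded = [code_to_grade[c] for c in encoded]
```

With the fitted encoder itself, le.inverse_transform turns the integer codes back into letters.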

Next is test_features.
It is just like train_features, just a different batch of samples, maybe just 40.

In [None]:
#I'll use "from & size" to grab a different bunch
tpairs=es.search(index='nyc_restaurants',filter_path=['hits.hits._source.location'],size=40,from_=1000)

In [None]:
test_features=[]

In [None]:
for hit in tpairs['hits']['hits']:
    # split the "location" string on "," and convert both parts to floats
    w = [float(x) for x in hit['_source']['location'].split(',')]
    test_features.append(w)

In [None]:
len(test_features)

test_features is a list of 40 lists and can act as our test features.

Next is test_labels.
They are built very much like train_labels, just from a different batch. test_features has 40 entries, so we need 40 test_labels.

In [None]:
test_pairs=es.search(index='nyc_restaurants',filter_path=['hits.hits._source.grade'],size=40,from_=1000)

In [None]:
el6=[]

In [None]:
for hit in test_pairs['hits']['hits']:
    # wrap each grade string in a list of its characters, e.g. 'A' -> ['A'];
    # flatten_list() undoes the wrapping later
    el6.append(list(hit['_source']['grade']))

In [None]:
el7 = flatten_list(el6)

In [None]:
test_labels=le.transform(el7)

In [None]:
len(test_labels)

test_labels now holds the encoded labels for the test set.

In [None]:
IBMQ.load_account()

In [None]:
IBMQ.providers()

In [None]:
provider = IBMQ.get_provider('ibm-q')
provider.backends()

In [None]:
#to run on a real quantum computer...
from qiskit.providers.ibmq import least_busy
#... find the least busy backend q machines
device = least_busy(provider.backends(filters=lambda x: x.configuration().n_qubits >= 3 and
                                   not x.configuration().simulator and x.status().operational))
print("If I use a real quantum computer, it will run on the current least busy device: ", device)

In [None]:
nyc_backend2 = QuantumInstance(backend=device, shots=1024, seed_simulator=seed, seed_transpiler=seed)
#job = execute(grover_circuit, backend=device, shots=1024, optimization_level=3)


The following code gives the kernel as a callable function:

In [None]:
#nyc_backend is a simulator while nyc_backend2 is a real quantum computer
nyc_kernel = QuantumKernel(feature_map=nyc_feature_map, quantum_instance=nyc_backend)

Draw the circuit

In [None]:
type(training_features[0:0])

In [None]:
training_features[1]

In [None]:
type(training_features[0:2])

In [None]:
[item for item in training_features[0:20] ]

In [None]:
zz_circuit = nyc_kernel.construct_circuit(training_features[0],training_features[1])

In [None]:
zz_circuit.decompose().decompose().decompose().draw(output='mpl')

In [None]:
dir(nyc_kernel)

The code below completes with 500 training samples and 40 test samples.

In [None]:
nyc_svc = SVC(kernel=nyc_kernel.evaluate)

In [None]:
nyc_svc.fit(training_features, training_labels)

In [None]:
nyc_score = nyc_svc.score(test_features, test_labels)

In [None]:
print(f"Callable kernel classification test score: {nyc_score}")
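When most restaurants share one grade, a high score is easier to judge next to a simple baseline: always predicting the most common training label. A minimal sketch follows; the helper name and the demo labels are made up and are not the notebook's data.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy obtained by always predicting the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == most_common_label)
    return hits / len(test_labels)

# Invented encoded labels for illustration.
demo_train = [0, 0, 0, 1, 0, 2, 0]
demo_test = [0, 0, 1, 0]
baseline = majority_baseline_accuracy(demo_train, demo_test)
```

With the notebook's arrays this could be called as majority_baseline_accuracy(training_labels.tolist(), test_labels.tolist()). If the classifier's score is not clearly above this baseline, it may just be predicting the majority grade.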

In [None]:
print(training_features)

In [None]:
#The callable-kernel run above gave a classification test score of 1.0
#The following code precomputes and plots the training and testing kernel matrices
#before providing them to the scikit-learn SVC algorithm:

nyc_matrix_train = nyc_kernel.evaluate(x_vec=training_features)
nyc_matrix_test = nyc_kernel.evaluate(x_vec=test_features, y_vec=training_features)

fig, axs = plt.subplots(1, 2, figsize=(10, 5))
axs[0].imshow(
    np.asmatrix(nyc_matrix_train), interpolation="nearest", origin="upper", cmap="Blues"
)
axs[0].set_title("NYC_Restaurant training kernel matrix")
axs[1].imshow(np.asmatrix(nyc_matrix_test), interpolation="nearest", origin="upper", cmap="Reds")
axs[1].set_title("NYC_Restaurant testing kernel matrix")
plt.show()

nyc_svc = SVC(kernel="precomputed")

In [None]:
nyc_svc.fit(nyc_matrix_train, training_labels)

In [None]:
nyc_score = nyc_svc.score(nyc_matrix_test, test_labels)
print(f"Precomputed kernel classification test score: {nyc_score}")

In [None]:
qsvc = QSVC(quantum_kernel=nyc_kernel)
qsvc.fit(training_features, training_labels)
qsvc_score = qsvc.score(test_features, test_labels)

print(f"QSVC classification test score: {qsvc_score}")

This is supervised learning: the kernel is calculated in the training phase, where the support vectors are obtained, and again in the test (classification) phase, where new unlabeled data is classified according to the solution found in the training phase.