# Q2B Hackathon 2024

## Track 1 Hackathon Prompt

---

### Overview:

Last year, researchers began exploring problems that are difficult to simulate with brute-force classical methods, thanks to utility-scale quantum computers available on IBM Quantum™ Platform. But those experiments required a deep understanding of not just the quantum processing unit (QPU), but also the various error suppression and mitigation methods required to scale each individual problem.

Now, as we embark on a mission to make utility more accessible, we’re launching a platform called the **Qiskit Functions Catalog**. Using the Qiskit Functions Catalog, developers can release Qiskit Functions that unlock those capabilities for enterprise developers and quantum computational scientists.

---

### Your Goal:

The Track 1 Hackathon is about creating a business use case proposal built on a feature of Qiskit Functions or Qiskit addons. You may build your use case on an existing feature or create a new function. In either case, your proposal must include as much real code as possible.

This hackathon prompt is aimed at the near future: 1-2 years from now. After all, Qiskit Functions exist today and are already being used by industry professionals and researchers. Please do not create a business use case around aspects of quantum computing that are more than 1-2 years away (e.g., fault-tolerant devices or undiscovered algorithms).

You will present your business use case to a panel of judges. The presentation is a graded part of your overall project score, so prepare accordingly.

---

### Project Requirements:

Your business use case must include at least 3 of the following, but in general, the more you are able to include, the better your team will score:

- A clear and well-detailed business use case
- Real Qiskit code snippets or examples
- Direct references to existing Qiskit functions or addons
- Consideration for devices currently on the market, both from IBM Quantum and others
- Thoughts on the market or types of customers
- Thoughts on what type of work or research your use case would empower
- Details about current limitations
- Case studies or business reports which reinforce your use case
- Roadblocks encountered while putting this use case together, and how your team overcame them

---

### Judging Criteria

#### **Technical Aspects (30 total points):**
- How well implemented is the Qiskit Function or Qiskit addon being discussed?
- Is it new and novel?
- Can the architecture serve users at a reasonable scale?
- How accessible is the end-user application? Is it easy to use and intuitive for end users?

#### **Business Aspects (25 total points):**
- Does the use case include market fit, target audience, and business goals?
- Does it properly introduce the use case and explain the interaction between business and user?

#### **Usefulness and Complexity (25 total points):**
- How useful is the project and how well-designed is it?
- Can it be used in real-world business applications or serve as a valuable tool for individuals?
- Are there ways this project could be further built out and refined?

#### **Presentation (20 total points):**
- How well did the team present their project?
- Were they able to explain their decisions?
- Did the entire team have a chance to speak?
- Did they tell a cohesive story?
- Does the business use case make sense when presented?

# Quantum Search on a Semantic Hilbert Space
**Team F**  
Joy Hwang Woodworth, Alex Vargas, Khouloud Alkhammassi

Install dependencies

In [1]:
# # First time downloads
# %pip install huggingface_hub
# %pip install datasets
# %pip install nltk
# import nltk
# nltk.download('averaged_perceptron_tagger')
# nltk.download('wordnet')
# nltk.download('omw-1.4')

In [2]:
import re
from datasets import load_dataset
from IPython.display import display, Latex
from datetime import datetime
from shss import SHS, SHS_Search_Custom, SHS_Search_Native, Corpus, SemanticHilbertSpace, SHSQuery, SemanticHilbertSpaceSearchCustom, SemanticHilbertSpaceSearchNative
from shss.semnet import *
from shss.utils import *

## 1. Load Data
### Load corpus dataset

In [3]:
DATASET = load_dataset('community-datasets/generics_kb', 'generics_kb_best')

### Load semnet

In [4]:
semnet = meta_semnet()

Semnet size: 171580
Converted semnet size: 75533
Semnet size: 423249
Converted semnet size: 383095
Semnet created with 440095 words


In [5]:
# semnet sample for reference
semnet_mini = { k:v for k,v in list(semnet.items())[110000:110005] }
print(semnet_mini)

{('dorymenia', 'n'): {'mollusk', 'worm', 'shell'}, ('dorymetaecus', 'n'): {'island', 'spider'}, ('dorymyrmex', 'n'): {'cone', 'subfamily', 'ant'}, ('doryonychus', 'n'): {'long', 'island'}, ('doryopteris', 'n'): {'fern', 'family', 'tropical'}}
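
The sample above suggests that the semnet maps a `(word, part-of-speech)` key to a set of sememes. A minimal sketch of how such a mapping could be queried for a tagged sentence, assuming a toy dictionary and a hypothetical `sememes_for` helper (neither is part of `shss`):

```python
# Toy semnet in the same shape as the sample above: (word, pos) -> set of sememes.
toy_semnet = {
    ('dorymyrmex', 'n'): {'cone', 'subfamily', 'ant'},
    ('doryopteris', 'n'): {'fern', 'family', 'tropical'},
}

def sememes_for(tagged_words, semnet):
    """Union of the sememes of every (word, pos) pair found in the semnet."""
    found = set()
    for key in tagged_words:
        found |= semnet.get(key, set())   # unknown words contribute nothing
    return found

# Hypothetical tagged document; the last word is deliberately missing from the semnet.
doc = [('dorymyrmex', 'n'), ('doryopteris', 'n'), ('unknownword', 'n')]
doc_sememes = sememes_for(doc, toy_semnet)
print(sorted(doc_sememes))
```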


***

## 2. Process dataset

### Filter dataset and extract random sample

In [6]:
# Filter for quality and length
filtered = DATASET.filter( lambda row: row['score'] > 0.6 and len(row['generic_sentence']) <= 40 )
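
The filter keeps rows that pass both a quality threshold and a length cap. A small stand-alone sketch of the same predicate on hand-written rows (the rows and the `keep_row` name are illustrative; the thresholds mirror the cell above):

```python
def keep_row(row, min_score=0.6, max_len=40):
    """True when the score exceeds min_score and the sentence has at most max_len characters."""
    return row['score'] > min_score and len(row['generic_sentence']) <= max_len

# Illustrative rows, not taken from the dataset.
rows = [
    {'generic_sentence': 'Ants live in colonies.', 'score': 0.9},   # kept
    {'generic_sentence': 'Ants live in colonies.', 'score': 0.5},   # score too low
    {'generic_sentence': 'A' * 41, 'score': 0.9},                   # too long
]
kept = [r for r in rows if keep_row(r)]
print(len(kept))
```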

In [7]:
# #### OPTIONAL FILTERS ####
# # Filter for specific word
# word = 'ship'
# def contains(word, sentence):
#     reg = '\\b' + word
#     m = re.search(reg, sentence)
#     return True if m else False
# filtered = DATASET.filter(lambda row: contains(word, row['generic_sentence']))

# # Filter for sentence length
# filtered = DATASET.filter( lambda row: len(row['generic_sentence']) <= 100 )

In [8]:
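# NOTE: this samples from the unfiltered DATASET; use `filtered.shuffle()` to apply the filter from In [6]
shuf = DATASET.shuffle()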
sample = shuf['train'][0:5]
sentences = sample['generic_sentence']

# Print randomly selected sentences
print('DID\tTEXT')
for i,s in enumerate(sentences):
    print(f'{i+1}\t{s}')

DID	TEXT
1	Short fasts are usually safe for people in good health.
2	Agricultural fertilizers are the main source of nutrient pollution in downstate Illinois.
3	Human wastes are an extensively used resource in many parts of the world.
4	Obedience is the proof of love Love proves itself through actions.
5	Courtship varies between different birds and happens during migration for birds that migrate.


### Process sample
#### Convert sample to corpus data structure

In [9]:
corpus = Corpus(sentences, semnet)

#### Create Semantic Hilbert Space
Construct a Semantic Hilbert Space (SHS) based on the words in the corpus.

In [10]:
shs = SemanticHilbertSpace(corpus, semnet)
ts = datetime.now().strftime('%m_%d_%H%M%S')
save_shs_data(sentences, corpus, shs, ts)   # writes SHS corpus to file
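
One way to picture the construction: each distinct sememe in the corpus is given an integer index, and that index labels a computational basis state. The sketch below is an assumption about the encoding, not the `SemanticHilbertSpace` implementation; `build_index` and `qubits_needed` are hypothetical names.

```python
import math

def build_index(doc_sememes):
    """Assign a stable integer index to every distinct sememe in the corpus."""
    vocabulary = sorted(set().union(*doc_sememes))
    return {sememe: i for i, sememe in enumerate(vocabulary)}

def qubits_needed(num_sememes):
    """Smallest register whose 2**n basis states can label every sememe."""
    return max(1, math.ceil(math.log2(num_sememes)))

# Illustrative per-document sememe sets.
docs = [{'person', 'work'}, {'form', 'object', 'person'}, {'feeling'}]
index = build_index(docs)
print(index)
print(qubits_needed(len(index)))
```

Under this rule, any vocabulary of 129 to 256 sememes needs 8 qubits, which is consistent with the 8-qubit runs reported later in the notebook, though the actual encoding may differ.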

***

## 3. Experiment: Query SHS Corpus for Word

### Construct queries

In [11]:
target_word = 'face'
target_query = SHSQuery(target_word, shs, semnet)
print(f'Query "{target_query.query}" consists of sememes {target_query.sememes} with target vector {target_query.state}')

Query "face" consists of sememes {'person', 'form', 'object', 'feeling', 'human', 'work'} with target vector {202, 11, 140, 114, 51, 21}
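
The expected matches for a document can be read as the overlap between the query's target indices and the sememe indices present in that document. A minimal sketch under that assumption, using made-up document index sets (not the corpus above) and a hypothetical `expected_matches` helper:

```python
def expected_matches(query_state, doc_states):
    """Map each document ID (1-based) to the query indices it contains."""
    return {
        did: sorted(query_state & state)
        for did, state in enumerate(doc_states, start=1)
    }

query_state = {202, 11, 140, 114, 51, 21}      # target vector printed above
doc_states = [                                  # illustrative values only
    {114, 7, 30},
    {11, 140, 99},
    {5, 6},
]
for did, hits in expected_matches(query_state, doc_states).items():
    print(did, hits)
```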


### Create Semantic Hilbert Space Search (SHSS) Circuits

In [12]:
# QUERY
shss = SemanticHilbertSpaceSearchCustom(shs.num_sememes)
shss.set_query(shs.corpus, target_query.state)
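
The search circuits follow a Grover-style pattern: an oracle marks the target states, a diffusion step amplifies them, and the pair is repeated several times. The sketch below simulates that pattern on a plain list of amplitudes so the effect of the iterations is visible without a quantum backend. It illustrates textbook Grover search, not the `shss` circuit construction; `optimal_iterations` and `grover_probabilities` are hypothetical names, and the iteration rule is the standard estimate, which may differ from the one the library uses.

```python
import math

def optimal_iterations(num_states, num_marked):
    """Textbook estimate: floor(pi/4 * sqrt(N/M)), with at least one iteration."""
    return max(1, math.floor(math.pi / 4 * math.sqrt(num_states / num_marked)))

def grover_probabilities(num_qubits, marked, iterations):
    """Return measurement probabilities after the given number of iterations."""
    n_states = 2 ** num_qubits
    amps = [1 / math.sqrt(n_states)] * n_states        # uniform superposition
    for _ in range(iterations):
        for m in marked:                               # oracle: phase-flip the targets
            amps[m] = -amps[m]
        mean = sum(amps) / n_states                    # diffusion: reflect about the mean
        amps = [2 * mean - a for a in amps]
    return [a * a for a in amps]

marked = {11, 140}                                     # illustrative target indices
k = optimal_iterations(2 ** 8, len(marked))
probs = grover_probabilities(8, marked, k)
print(k, round(sum(probs[m] for m in marked), 3))
```

After `k` iterations nearly all of the probability sits on the marked indices; running more iterations than the estimate starts to reduce it again, which is why the iteration count matters.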

In [13]:
print('Sample circuit for query')
shss.circuits[0].decompose().draw(fold=-1)

Sample circuit for query


### Run program

In [14]:
NUM_COUNTS = 25		# show top x hits
SHOTS = 1000
results_custom = shss.run(shots=SHOTS, backend='qasm_simulator')

Backend: qasm_simulator | Num qubits: 8 | Num iterations: 11
DOCUMENT 1: transpiling circuit
DOCUMENT 1: running simulation
DOCUMENT 2: transpiling circuit
DOCUMENT 2: running simulation
DOCUMENT 3: transpiling circuit
DOCUMENT 3: running simulation
DOCUMENT 4: transpiling circuit
DOCUMENT 4: running simulation
DOCUMENT 5: transpiling circuit
DOCUMENT 5: running simulation


#### Print Results

In [15]:
print(f'Expected results for query "{target_query.query}":\n')
print_expected(shs.corpus, target_query.state)

print(f'\nExperiment results for custom gates - query "{target_query.query}":\n')
dc = get_decimal_counts(results_custom)
answers = get_answers(dc, SHOTS, len(target_query.state))
print_answers(answers)

Expected results for query "face":

DID    MATCHES
1      114
2      114, 11, 140
3      202, 11, 114
4      114, 11, 140, 21
5      51, 11

Experiment results for custom gates - query "face":

DOC 1:  []
DOC 2:  [140]
DOC 3:  [202]
DOC 4:  [140]
DOC 5:  []
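
Qiskit returns measurement counts keyed by bitstring, so post-processing typically converts each key to an integer index and keeps the most frequent outcomes. A rough sketch of that step for a single document follows. `to_decimal_counts` and `top_answers` are hypothetical stand-ins, not the notebook's `get_decimal_counts` and `get_answers` helpers, and the counts and the 10% cut-off are invented for illustration.

```python
def to_decimal_counts(counts):
    """Convert {'10001100': 412, ...} to {140: 412, ...}."""
    return {int(bits.replace(' ', ''), 2): n for bits, n in counts.items()}

def top_answers(decimal_counts, shots, min_share=0.1):
    """Outcomes whose share of the shots is at least min_share, most frequent first."""
    ranked = sorted(decimal_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [state for state, n in ranked if n / shots >= min_share]

# Invented counts for one document (1000 shots in total).
raw = {'10001100': 412, '00001011': 395, '01110010': 60, '00000001': 133}
toy_dc = to_decimal_counts(raw)
print(top_answers(toy_dc, shots=1000))
```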


In [16]:
plot_exp_counts(dec_counts=dc, top_num=NUM_COUNTS)

***

## 4. Experiment on Qiskit Native Gates
### Implement QSA without custom unitary gates

In [17]:
shss_native = SemanticHilbertSpaceSearchNative(shs.num_sememes)
shss_native.set_query(shs.corpus, target_query.state)
print('Sample circuit for sememe query')
shss_native.circuits[0].draw(fold=-1)

Sample circuit for sememe query


In [18]:
results_native = shss_native.run(shots=SHOTS, backend='qasm_simulator')

Backend: qasm_simulator | Num qubits: 8
DOCUMENT 1: transpiling circuit
DOCUMENT 1: running simulation
DOCUMENT 2: transpiling circuit
DOCUMENT 2: running simulation
DOCUMENT 3: transpiling circuit
DOCUMENT 3: running simulation
DOCUMENT 4: transpiling circuit
DOCUMENT 4: running simulation
DOCUMENT 5: transpiling circuit
DOCUMENT 5: running simulation


In [19]:
print(f'Expected results for query "{target_query.query}":\n')
print_expected(shs.corpus, target_query.state)

print(f'\nExperiment results for native gates - query "{target_query.query}":\n')
dc = get_decimal_counts(results_native)
answers = get_answers_top(dc, SHOTS)
print_answers(answers)

Expected results for query "face":

DID    MATCHES
1      114
2      114, 11, 140
3      202, 11, 114
4      114, 11, 140, 21
5      51, 11

Experiment results for native gates - query "face":

DOC 1:  {78, 87}
DOC 2:  {103}
DOC 3:  {100}
DOC 4:  {9, 1}
DOC 5:  {53}


In [20]:
plot_exp_counts(dec_counts=dc, top_num=NUM_COUNTS)

### Qiskit Functions
The code segment below demonstrates one possible way to integrate the IBM Circuit function into the Semantic Hilbert Space (SHS) search process. It could allow for capabilities such as automatic circuit optimization and error mitigation during the retrieval of semantic matches.

In the code, the prepared circuits (`circuits_to_run`) are paired with observables (`observables`) to form primitive unified blocs (PUBs), which represent the measurement tasks. When we submitted the job we received a 404 error, which we suspect was caused by a malformed PUB. Once that is resolved, this approach could let the semantic search run at scale, benefiting from the runtime environment's services and the underlying quantum hardware.

```python
from qiskit_ibm_runtime import QiskitRuntimeService
from qiskit.quantum_info import SparsePauliOp
from qiskit_ibm_catalog import QiskitFunctionsCatalog

# Load the IBM Circuit function from the Qiskit Functions Catalog
catalog = QiskitFunctionsCatalog()
function = catalog.load("ibm/circuit-function")

# Pick the least busy operational QPU available to the account
service = QiskitRuntimeService()
backend = service.least_busy(operational=True, simulator=False)

circuits_to_run = shss.circuits  # These are QuantumCircuit objects

# Define observables to measure expectation values (Optional)
n_qubits = shss.Q
observables = [SparsePauliOp.from_sparse_list([("Z", [i], 1.0)], num_qubits=n_qubits) for i in range(n_qubits)]

# Each PUB pairs a single circuit with its observables, so build one PUB per circuit
pubs = [(circuit, observables) for circuit in circuits_to_run]

job = function.run(
    backend_name=backend.name,
    pubs=pubs,
)
```
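
For an Estimator-style workload, each PUB pairs exactly one circuit with the observables to evaluate on it, so a list of circuits becomes a list of PUBs rather than a single tuple. The sketch below shows that shape with placeholder strings standing in for circuits and Pauli labels standing in for `SparsePauliOp` objects, so it runs without Qiskit installed. `z_label` and `build_pubs` are hypothetical helpers; the label layout assumes Qiskit's convention that qubit 0 is the rightmost character.

```python
def z_label(qubit, num_qubits):
    """Pauli label with Z on one qubit and identity elsewhere (qubit 0 is rightmost)."""
    return 'I' * (num_qubits - 1 - qubit) + 'Z' + 'I' * qubit

def build_pubs(circuits, observables):
    """One (circuit, observables) PUB per circuit."""
    return [(circuit, observables) for circuit in circuits]

num_qubits = 3
labels = [z_label(i, num_qubits) for i in range(num_qubits)]
# Placeholder names stand in for QuantumCircuit objects.
pubs_sketch = build_pubs(['circuit_doc_1', 'circuit_doc_2'], labels)
print(labels)
print(len(pubs_sketch))
```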

### Qiskit Addons

* We were unable to implement this in the notebook, but we believe that, beyond the Circuit function, Qiskit addons could further improve the SHS approach.
* For instance, the AQC-Tensor addon (approximate quantum compilation) might be used to compress part of the Grover-like search circuit into a shallower approximation, which could yield higher-fidelity results on noisy hardware.
* Similarly, the OBP addon (operator backpropagation) could reduce circuit depth by trimming operations from the end of the circuit, at the cost of measuring more observables.


## 5. Experiment on Cloud Compute Resources
### Run experiments on IonQ and IBM simulators and devices

#### Run on IonQ Cloud Simulator

In [22]:
# results = shss.run(shots=SHOTS, backend='ionq_simulator')

In [23]:
# dc = get_decimal_counts(results)
# answers  = get_answers_top(dc, SHOTS)
# print_answers(answers)
# plot_exp_counts(dec_counts=dc, top_num=NUM_COUNTS, answers=answers)

#### Run on IonQ Quantum Device

In [24]:
# results = shss.run(shots=SHOTS, backend='ionq_qpu')

In [25]:
# dc = get_decimal_counts(results)
# answers  = get_answers_top(dc, SHOTS)
# print_answers(answers)
# plot_exp_counts(dec_counts=dc, top_num=NUM_COUNTS, answers=answers)

#### Run on IBM Simulator

In [26]:
# results = shss.run(shots=SHOTS, backend='ibmq_qasm_simulator')

In [27]:
# dc = get_decimal_counts(results)
# answers  = get_answers_top(dc, SHOTS)
# print_answers(answers)
# plot_exp_counts(dec_counts=dc, top_num=NUM_COUNTS, answers=answers)