## Step 2. Index the data in a vector database
Set up the Marqo client. Make sure the Marqo server is running in Docker and listening on the correct port (8882).

In [1]:
import marqo
import json

mq = marqo.Client(url='http://localhost:8882')
index_name = "book-index"

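# Delete the index if it already exists so each run starts from a clean state
try:
    mq.index(index_name).delete()
except Exception:
    # Most likely the index does not exist yet, so there is nothing to delete
    pass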

mq.create_index(index_name, model="hf/e5-base-v2")

# Load pre-cleaned data
with open(r'docs\QGenda Whitepaper_cleaned.json', 'r', encoding='utf-8') as file:
    data = json.load(file)

# Index to marqo
mq.index(index_name).add_documents(
    data,
    tensor_fields=["text"]
)

{'errors': False,
 'processingTimeMs': 5576.420437000024,
 'index_name': 'book-index',
 'items': [{'status': 200, '_id': 'b18a8f2f-7d74-453b-b05b-82439d5f674e'},
  {'status': 200, '_id': '5077d240-5ab5-4451-aa4a-6f67c5448dda'},
  {'status': 200, '_id': '299de6dd-131a-4093-bd7b-e5b999ed6e7a'},
  {'status': 200, '_id': '1acbd09a-f7cb-49f6-af59-0afac5738c3e'},
  {'status': 200, '_id': 'ef1110d9-64b6-47c2-8ae9-b52fcd4e1980'},
  {'status': 200, '_id': 'ff8a7277-a301-4020-ae8a-1d4bffe8325a'}]}
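The response above reports a top-level `errors` flag and a per-document `status`. A minimal sketch of how failed documents could be picked out of such a response follows. The helper name `failed_items`, the sample response, and the `message` key are illustrative assumptions and are not taken from the notebook.

```python
def failed_items(response):
    """Return the items of an add_documents-style response whose status is not 2xx."""
    return [
        item
        for item in response.get("items", [])
        if not 200 <= item.get("status", 0) < 300
    ]


# Illustrative response in the same shape as the output above.
# The IDs and the 'message' key are made up for this example.
sample_response = {
    "errors": True,
    "index_name": "book-index",
    "items": [
        {"status": 200, "_id": "doc-a"},
        {"status": 400, "_id": "doc-b", "message": "invalid field"},
    ],
}

for item in failed_items(sample_response):
    print(item["_id"], item["status"], item.get("message", ""))
```

In the notebook, the same check could be applied to the value returned by `add_documents` before moving on to search.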

## Step 3. Set up search prompt and query with LLM
Observe the results of a basic tensor search. This should return the text most relevant to the question, based on semantic similarity.

In [2]:
question = "What utilization gap do executives need to focus on?"

# Return just two results, reusing the index name defined in Step 2
results = mq.index(index_name).search(
    q=question,
    limit=2
)
print(results)

{'hits': [{'index': 1, '_id': '5077d240-5ab5-4451-aa4a-6f67c5448dda', 'text': 'Copyright © 2024 QGenda, LLC All rights reserved. As healthcare leaders reimagine patient access and care delivery in a post-pandemic world, there is an emerging story around the need to optimize clinic exam room utilization. Better utilization can improve operating efficiency, patient satisfaction, and revenue capture. The 2020 Porter Research study of 100 health system executive leaders identified the many challenges faced today with exam room scheduling, future expectations for optimizing exam rooms, and the impact that proper exam room scheduling can have on a health system\'s P&L. Increasing Scrutiny on Clinic Exam Rooms With the limited funds available for health system capital expenditures, executives must optimize their existing physical space while simultaneously remaining adaptable to fluctuations in patient demand. In the short term, health systems may experience an influx of patients who delayed 
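Printing the raw result dictionary is hard to read because each hit carries the full passage text. The sketch below shows one way to print a compact summary per hit instead. It assumes each hit has `_id` and `text` keys as in the output above, plus a `_score` key; the helper name `summarize_hits` and the sample data are hypothetical.

```python
def summarize_hits(results, preview_chars=80):
    """Build one short line per hit: rank, score, id, and a text preview."""
    lines = []
    for rank, hit in enumerate(results.get("hits", []), start=1):
        score = hit.get("_score")
        # Fall back to "n/a" when the hit carries no numeric score.
        score_text = f"{score:.3f}" if isinstance(score, (int, float)) else "n/a"
        preview = hit.get("text", "")[:preview_chars].replace("\n", " ")
        lines.append(f"{rank}. score={score_text} id={hit.get('_id')} {preview}")
    return lines


# Made-up results in the same general shape as a search response.
sample_results = {
    "hits": [
        {"_id": "a", "_score": 0.912, "text": "Exam room utilization is below target."},
        {"_id": "b", "text": "Clinics report scheduling challenges."},
    ]
}

for line in summarize_hits(sample_results):
    print(line)
```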

In [3]:
# Set up prompt for GPT
from langchain.prompts import PromptTemplate

template = """
Given the following extracted parts of a long document ("SOURCES") and a question ("QUESTION"), create a concise answer followed by a list of supporting facts in bullet point format.
- Start with a brief, one- or two-sentence answer addressing the question directly.
- Then, list key supporting facts and figures derived strictly from the SOURCES in markdown format.
- Do not make up an answer or include information not found in the SOURCES. If the SOURCES do not provide enough information to answer, state "I don't know" and explain that the SOURCES lack sufficient details.
QUESTION: {question}
=========
SOURCES:
{summaries}
=========
ANSWER:
"""
prompt = PromptTemplate(template=template, input_variables=["summaries", "question"])

In [4]:
# Run LLM
from langchain_openai import ChatOpenAI
from langchain.docstore.document import Document
from utilities import extract_text_from_highlights, get_openai_key

api_key = get_openai_key()

highlights, texts = extract_text_from_highlights(results, token_limit=150)
docs = [Document(page_content=f"Source [{ind}]: {t}") for ind, t in enumerate(texts)]
llm = ChatOpenAI(model_name='gpt-4o-mini', temperature=0.8)
# Chain the prompt into the model; the | operator builds a RunnableSequence
chain_qa = prompt | llm
# Pass the sources as plain text rather than as a list of Document objects
summaries = "\n\n".join(doc.page_content for doc in docs)
llm_results = chain_qa.invoke({"summaries": summaries, "question": results['query']})
ans = llm_results.content
print(ans)

OpenAI key accessed.
Executives need to focus on closing a 20-point utilization gap in exam room usage to align with optimal levels of 80 to 89%. 

- Health system executives indicated that their clinics are operating significantly below optimal utilization, with 72% reporting lower levels.
- The identified utilization gap averages 20 points from the optimal usage.
- Closing this gap could dramatically impact health system revenue; for instance, the Orthopedic Director noted potential lost revenue from being able to see more patients daily (35 to 40 instead of 25 to 30).
- Increasing exam room utilization by 20% could lead to savings of $200,000 per clinic, translating to $2.8 million in annual savings for systems with 14 clinics.
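The `extract_text_from_highlights` helper is imported from a local `utilities` module that is not shown here. As a rough sketch only, a helper of that kind might collect the highlight snippet for each hit and cap the passage text at a token budget. Everything below is an assumption: the function name `extract_highlight_texts`, the `_highlights` key and its shape, and the use of whitespace-separated words as a stand-in for tokens.

```python
def truncate_words(text, limit):
    """Keep at most `limit` whitespace-separated words (a crude token proxy)."""
    return " ".join(text.split()[:limit])


def extract_highlight_texts(results, token_limit=150):
    """Return (highlights, texts) for the hits in a search-style result dict."""
    highlights, texts = [], []
    for hit in results.get("hits", []):
        raw = hit.get("_highlights") or []
        # Accept either a single dict or a list of dicts for the highlights.
        if isinstance(raw, dict):
            raw = [raw]
        snippet = " ".join(str(value) for entry in raw for value in entry.values())
        highlights.append(snippet)
        texts.append(truncate_words(hit.get("text", ""), token_limit))
    return highlights, texts


# Made-up hits for illustration.
sample_results = {
    "hits": [
        {"text": "one two three four five", "_highlights": [{"text": "two three"}]},
        {"text": "alpha beta", "_highlights": {"text": "beta"}},
    ]
}

print(extract_highlight_texts(sample_results, token_limit=3))
```

A real implementation would likely count tokens with the model's tokenizer rather than by words, since the two can differ noticeably.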


## Step 4. Evaluate model performance
See [Model Grading](https://cookbook.openai.com/examples/evaluation/getting_started_with_openai_evals) for background. First, we set up an eval prompt, then run a second model on the source data and the `ans` variable from the previous step.

In [5]:
eval_template = """
SYSTEM: You are a meticulous evaluator tasked with grading the performance of an AI model in responding to a specific task. Your role is to evaluate the accuracy, relevance, and factuality of the model’s response based on the provided source material. Follow these instructions carefully:

1. Fact-Checking:
    Compare the response (ans) against the source material (results).
    Determine if the response is factually correct based on the source. Reason step-by-step before concluding with a clear "Yes" or "No."

2. Relevance of Facts:
    Assess whether the model used the most pertinent facts from the source material to answer the question effectively. Provide reasoning to support your judgment.

3. Detection of Hallucinations:
    Look for evidence of hallucinations—information or details in the response that are not supported by the source material.
    If hallucinations are present, clearly highlight them and explain why they are incorrect or unsupported.
RESULTS: {results}
=========
ANSWER:
{ans}
"""
eval_prompt = PromptTemplate(template=eval_template, input_variables=["results", "ans"])
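The eval prompt asks the grader to end its fact-check with a clear "Yes" or "No". If the grader writes that verdict on a line beginning with "Conclusion:", which is an assumption about its output format and not something the prompt enforces, the verdict could be pulled out programmatically. The helper name `parse_verdict` is hypothetical.

```python
import re


def parse_verdict(eval_text):
    """Return True for 'Conclusion: Yes', False for 'Conclusion: No', else None.

    Only the first 'Conclusion:' line is considered.
    """
    match = re.search(r"Conclusion:\s*(Yes|No)\b", eval_text, flags=re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


# Illustrative grader outputs.
print(parse_verdict("Conclusion: Yes, the response is factually correct."))
print(parse_verdict("Conclusion: No, one figure is unsupported."))
print(parse_verdict("The grader did not give a verdict."))
```

Returning `None` when no verdict is found keeps an unparseable evaluation distinct from a failed one, which matters if results are aggregated over many questions.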

In [7]:
eval_llm = ChatOpenAI(model_name='gpt-4o', temperature=0.7)
chain_eval = eval_prompt | eval_llm
full_eval_results = chain_eval.invoke({"results": results, "ans": ans})
eval_content = full_eval_results.content
print(eval_content)

1. Fact-Checking:
   - The response states that executives need to focus on closing a 20-point utilization gap in exam room usage, aiming for optimal levels of 80 to 89%. This is supported by the source, which mentions that executives identified a 20-point gap between current utilization and optimal levels, defined as 80 to 89%.
   - The statement that 72% of clinics operate below optimal utilization is correct as per the source material.
   - The response mentions potential lost revenue due to underutilization, citing the Orthopedic Director's comment about seeing fewer patients than possible, which aligns with the source.
   - The claim about saving $200,000 per clinic by increasing utilization by 20% is supported by the source, quoting the VP of Oncology Services.

   Conclusion: Yes, the response is factually correct based on the source material.

2. Relevance of Facts:
   - The model selects key facts from the source that directly address the utilization gap, such as the 20-point 