Set up the Marqo client. Before running this cell, make sure the Marqo server is running in Docker and listening on port 8882.

In [2]:
import marqo
import json

mq = marqo.Client(url='http://localhost:8882')
index_name = "book-index"

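# Delete the index if it already exists so the notebook starts from a clean state
try:
    mq.index(index_name).delete()
except Exception:
    # Most likely the index did not exist yet; carry on and create it
    pass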

mq.create_index(index_name, model="hf/e5-base-v2")

# Load pre-cleaned data
with open(r'docs\QGenda Whitepaper_cleaned.json', 'r', encoding='utf-8') as file:
    data = json.load(file)

# Index the documents in Marqo, using "text" as the tensor (embedded) field
mq.index(index_name).add_documents(
    data,
    tensor_fields=["text"]
)

{'errors': False,
 'processingTimeMs': 6632.005779996689,
 'index_name': 'book-index',
 'items': [{'status': 200, '_id': '46ca8746-65ce-4ed8-994b-6a2676893916'},
  {'status': 200, '_id': '5f0a63b5-8ef1-433d-973f-75faa648b973'},
  {'status': 200, '_id': 'dad5ee64-cb01-4118-8f38-5c143ceb0d7a'},
  {'status': 200, '_id': 'db74951b-e0f9-45e9-8d8a-d8ac1d77574b'},
  {'status': 200, '_id': 'f95597cb-fb5b-403b-8242-440b6b492827'},
  {'status': 200, '_id': 'bb682333-bfcf-47ad-b503-1b89344e720b'}]}
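
The response above reports `'errors': False` and a status of 200 for each of the six documents. For larger batches it can help to check this programmatically. The following is a minimal sketch, assuming the response has the shape shown above; `failed_items` is a hypothetical helper, and treating any status of 400 or above as a failure is an assumption rather than documented behavior.

```python
def failed_items(response):
    """Return the per-document entries whose status signals a failure."""
    # Assumption: any status of 400 or above means the document was not indexed.
    return [
        item
        for item in response.get("items", [])
        if item.get("status", 0) >= 400
    ]


# Sample response trimmed from the output above
sample_response = {
    "errors": False,
    "index_name": "book-index",
    "items": [
        {"status": 200, "_id": "46ca8746-65ce-4ed8-994b-6a2676893916"},
        {"status": 200, "_id": "5f0a63b5-8ef1-433d-973f-75faa648b973"},
    ],
}

failures = failed_items(sample_response)
print(f"{len(failures)} of {len(sample_response['items'])} documents failed")
```

In the notebook, the same function could be called on the value returned by `add_documents` instead of the sample dictionary.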

Observe the results of a basic tensor search. It should return the passages most closely related to the question, ranked by semantic similarity.

In [3]:
question = "What utilization gap do executives need to focus on?"

# Return only the top two hits
results = mq.index(index_name).search(
    q=question,
    limit=2
)
print(results)

{'hits': [{'index': 3, '_id': 'db74951b-e0f9-45e9-8d8a-d8ac1d77574b', 'text': "\nAs a result, over half of our respondents reported experiencing major difficulties with their current exam room scheduling solutions. Management of exam rooms is considered time-intensive, and the technology is difficult to use and error-prone. The result is outdated or incorrect information and the inability to positively impact utilization. Transparency is also a key roadblock. Executives routinely cite the lack of analytics required to even understand their current exam room utilization. These issues are compounded when each department uses a separate scheduling system, further limiting transparency and a clinic's ability to fully utilize exam rooms across departments. As a result, only 31% of health system executives report being happy with their current solution. This lack of proper scheduling causes unnecessary delays and limits patient throughput. If significant wait times pervade a health system at
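
Printing the raw response produces long output that is hard to scan. The following is a minimal sketch of a more compact view, assuming each hit carries `_id` and `text` as in the output above. `summarize_hits` is a hypothetical helper, and `_score` is read with `.get` because it does not appear in the truncated output shown here. The sample hit is illustrative, not taken from the index.

```python
def summarize_hits(results, width=80):
    """Return one short line per hit: id, optional score, and a text preview."""
    lines = []
    for hit in results.get("hits", []):
        # Collapse newlines and runs of whitespace so each preview fits on one line
        preview = " ".join(hit.get("text", "").split())
        if len(preview) > width:
            preview = preview[:width] + "..."
        # Assumption: a relevance score, if present, is stored under "_score"
        score = hit.get("_score")
        score_part = f" score={score:.3f}" if score is not None else ""
        lines.append(f"{hit.get('_id')}{score_part}: {preview}")
    return lines


# Illustrative hit with a made-up id and text
sample_results = {"hits": [{"_id": "doc-1", "text": "\nExam room   scheduling is time-intensive."}]}
for line in summarize_hits(sample_results):
    print(line)
```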

In [4]:
# Set up prompt for GPT
from langchain.prompts import PromptTemplate

template = """
Given the following extracted parts of a long document ("SOURCES") and a question ("QUESTION"), create a final answer one paragraph long.
Don't try to make up an answer and use the text in the SOURCES only for the answer. If you don't know the answer, just say that you don't know.
QUESTION: {question}
=========
SOURCES:
{summaries}
=========
ANSWER:
"""
prompt = PromptTemplate(template=template, input_variables=["summaries", "question"])
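
To see what the model will actually receive, the template can be filled in by hand. The sketch below uses plain `str.format` as a stand-in for `PromptTemplate`, which substitutes the same `{question}` and `{summaries}` placeholders. `render_prompt` is a hypothetical helper, the template is copied here so the block runs on its own, and the sample sources are placeholders.

```python
# Copy of the template from the cell above so this sketch is self-contained
template = """
Given the following extracted parts of a long document ("SOURCES") and a question ("QUESTION"), create a final answer one paragraph long.
Don't try to make up an answer and use the text in the SOURCES only for the answer. If you don't know the answer, just say that you don't know.
QUESTION: {question}
=========
SOURCES:
{summaries}
=========
ANSWER:
"""


def render_prompt(question, sources):
    """Fill the template with a question and a list of source strings."""
    # Label each source "Source [0]:", "Source [1]:", ... and put one per line
    summaries = "\n".join(f"Source [{i}]:{text}" for i, text in enumerate(sources))
    return template.format(question=question, summaries=summaries)


# Placeholder sources, for illustration only
rendered = render_prompt(
    "What utilization gap do executives need to focus on?",
    ["first extracted passage", "second extracted passage"],
)
print(rendered)
```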

In [6]:
# Run LLM
from langchain_openai import OpenAI
from langchain.docstore.document import Document
from utilities import extract_text_from_highlights, get_openai_key

api_key = get_openai_key()

highlights, texts = extract_text_from_highlights(results, token_limit=150)
docs = [Document(page_content=f"Source [{ind}]:" + t) for ind, t in enumerate(texts)]
llm = OpenAI(temperature=0.9)
# Pipe the prompt into the LLM; the | operator builds a RunnableSequence
chain_qa = prompt | llm
# Join the sources into plain text so the prompt does not contain the repr of a Document list
summaries = "\n".join(doc.page_content for doc in docs)
llm_results = chain_qa.invoke({"summaries": summaries, "question": results['query']})
print(llm_results)

OpenAI key accessed.
According to the sources, healthcare executives should focus on closing a 20-point utilization gap in the long term, with a target utilization level of above 80%. This gap is important to address as it can lead to patient dissatisfaction, loss of revenue, and patients seeking care elsewhere. In the short term, executives should also prioritize optimizing exam room utilization to handle an influx of patients due to COVID-19. By investing in new processes and technology, and setting measurable goals, clinics can better manage patient throughput and ensure a high level of utilization.
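
The `utilities` module is not shown in this notebook. As a rough idea of what `extract_text_from_highlights` could look like, here is a hypothetical stand-in, not the actual helper. It assumes each hit may carry a `_highlights` entry (either a dict or a list of dicts keyed by field name), falls back to the full `text` field when no highlight is present, and approximates tokens by whitespace-separated words.

```python
def extract_text_from_highlights_sketch(results, token_limit=150, field="text"):
    """Hypothetical stand-in: return (highlights, truncated texts), one per hit."""
    highlights, texts = [], []
    for hit in results.get("hits", []):
        raw = hit.get("_highlights") or {}
        # Assumption: highlights arrive either as one dict or as a list of dicts
        entries = raw if isinstance(raw, list) else [raw]
        highlight = next(
            (e[field] for e in entries if isinstance(e, dict) and field in e),
            "",
        )
        # Prefer the highlighted passage; fall back to the whole field
        source = highlight or hit.get(field, "")
        # Assumption: one whitespace-separated word counts as one token
        words = source.split()
        texts.append(" ".join(words[:token_limit]))
        highlights.append(highlight)
    return highlights, texts


# Illustrative hits, not taken from the index
sample = {
    "hits": [
        {"text": "a b c d e", "_highlights": {"text": "b c d"}},
        {"text": "x y z"},
    ]
}
sample_highlights, sample_texts = extract_text_from_highlights_sketch(sample, token_limit=2)
print(sample_highlights, sample_texts)
```

Capping each source this way keeps the prompt short, at the cost of possibly cutting a passage mid-sentence.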
