# Answering questions using ChatGPT 3.5 and retrieval-augmented generation to embed new knowledge into the language model

In [1]:
from langchain.prompts import ChatPromptTemplate
from dotenv import dotenv_values
from langchain.chat_models import ChatOpenAI
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA

## Creating a prompt and loading Embedded Data
Now we load the data that was previously retrieved and embedded into a vector database.

In [14]:
assistant = """
You are an autoregressive language model that has been fine-tuned with instruction-tuning and RLHF. You are helping to answer questions based on provided information.
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", assistant),
        ("human",
         "Use the information provided to answer the following question: {question}")
    ]
)

In [15]:
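def get_openai_key(path=""):
    """Gets the OpenAI API key from a .env file.

    Uses `path` if given; otherwise tries a list of known locations.
    Returns None if no key is found.
    """
    paths = ["/Users/samisaf/openai.env", "C:/Users/samis/openai.env", "C:/Users/samisaf/openai.env"]
    # An explicit path takes precedence over the default locations
    candidates = [path] if path else paths
    for p in candidates:
        values = dotenv_values(p)  # empty mapping if the file does not exist
        if 'OPENAI_API_KEY' in values:
            return values['OPENAI_API_KEY']
    return None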
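
The lookup above depends on machine-specific paths. One possible variation, sketched below with the standard library only, checks the process environment first and falls back to candidate `.env` files. The names `resolve_api_key` and `parse_env_file`, their parameters, and the simplified parser are hypothetical and not part of the original notebook, which uses `dotenv_values`.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=VALUE lines; a simplified stand-in for dotenv_values."""
    values = {}
    file = Path(path)
    if not file.is_file():
        return values  # missing file: behave like an empty .env
    for line in file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(candidates=(), env=None, name="OPENAI_API_KEY"):
    """Return the key from the environment, else from the first file defining it."""
    env = os.environ if env is None else env
    if env.get(name):
        return env[name]
    for path in candidates:
        key = parse_env_file(path).get(name)
        if key:
            return key
    return None
```

In the notebook this could be called as `resolve_api_key(candidates=paths)`, assuming the same list of candidate paths.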

In [16]:
openai_api_key = get_openai_key()
model = "gpt-3.5-turbo"
temperature = 0  # deterministic answers
max_tokens = 1000

# Open the persisted Chroma store using the same embedding model that built it
embedding = OpenAIEmbeddings(openai_api_key=openai_api_key)
db = Chroma(persist_directory='./db-kpmp-oct-23', embedding_function=embedding)
llm = ChatOpenAI(openai_api_key=openai_api_key, model=model, temperature=temperature, max_tokens=max_tokens)
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever(), return_source_documents=True)
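
To make the roles of the retriever and the `"stuff"` chain type concrete, here is a toy sketch using only the standard library. It ranks a few text chunks by cosine similarity to the question, then "stuffs" the top matches into a single prompt. The bag-of-words `embed` function, the `retrieve` and `stuff_prompt` helpers, and the sample chunks (short paraphrases for illustration) are hypothetical stand-ins. The notebook itself relies on OpenAI embeddings, a Chroma store, and LangChain's own prompt wording.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def stuff_prompt(question, docs):
    """Concatenate all retrieved chunks into one context block, then append the question."""
    context = "\n\n".join(docs)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Illustrative chunks only; not the actual contents of the vector database
chunks = [
    "Hemoglobin is measured at 4 hours and 8 hours post biopsy.",
    "Participants are registered in REDCap after eligibility assessment.",
    "Imaging is repeated within 12-24 hours if a hematoma is seen.",
]
question = "When is hemoglobin measured after the biopsy?"
top = retrieve(question, chunks, k=2)
prompt_text = stuff_prompt(question, top)
print(prompt_text)
```

Because every retrieved chunk is placed in one prompt, this approach is simple but limited by the model's context window.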

## Answering Questions with ChatGPT 3.5
Now we use the prompt created in the previous step to answer questions with ChatGPT 3.5 (`gpt-3.5-turbo`), grounding each answer in the retrieved documents.

In [17]:
# Be careful running this cell: it sends the prompt to OpenAI, which takes time and costs money
question1 = "At what hemoglobin drop after the biopsy does the patient need to be observed?"
result1 = qa(prompt.format(question=question1))
result1['result']


'The patient needs to be observed if there is a fall in hemoglobin of more than 2 grams/dL or more than 1 gram/dL to less than 9 grams/dL after the biopsy. In such cases, extended observation is required, with follow-up measurement of hemoglobin after an interval of at least 2 hours. Additional observation, including overnight observation or hospital admission, will be done if clinically indicated.'

In [18]:
result1['source_documents']

[Document(page_content='7.2.3. Post-Operative Monitoring\n\nHemoglobin concentration is serially measured for all participants at 4 hours and 8 hours post biopsy and then daily. A fall in hemoglobin of >2 grams/dL - OR - >1 gram/dL to less than 9 grams/dL requires a repeat measurement if attributable to bleeding from the biopsy site\n\nIn the event that significant bleeding (>2g/L drop in Hgb from baseline concentration) is identified, an ultrasound or non-contrast CT is obtained emergently, with the choice of imaging procedure at the discretion of the primary surgical service or critical care services.\n\nIf imaging shows perinephric fluid consistent with a hematoma, the biopsied kidney is further investigated with a repeat imaging within 12-24 hours. Hemoglobin concentrations are serially measured every six hours until stable.\n\nThe following circumstances prompt interventional radiology consultation for angioembolization.\n\nVolumetric enlargement of hematoma >25% from baseline mea
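
Raw `Document` output like the above is hard to scan. A small helper can print one short line per source instead. The sketch below is runnable on its own: `Doc` is a minimal stand-in for a LangChain `Document` (which exposes `page_content` and `metadata`), and `preview_sources`, the `width` parameter, and the `"source"` metadata key are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def preview_sources(docs, width=80):
    """Return one short line per document: index, source label, and text preview."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        text = " ".join(doc.page_content.split())  # collapse newlines and extra spaces
        if len(text) > width:
            text = text[: width - 3] + "..."  # truncate to at most `width` characters
        source = doc.metadata.get("source", "unknown")  # assumed metadata key
        lines.append(f"[{i}] ({source}) {text}")
    return lines


# Abbreviated sample based on the first retrieved passage
sample = [Doc("7.2.3. Post-Operative Monitoring\n\nHemoglobin concentration is serially measured ...")]
for line in preview_sources(sample):
    print(line)
```

In the notebook, the same helper could be applied as `preview_sources(result1['source_documents'])`.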

In [19]:
question2 = "What are the inclusion criteria?"
result2 = qa(prompt.format(question=question2))
result2['result']

'Based on the information provided, the specific inclusion criteria for participants in the study are not mentioned. However, it is stated that eligibility assessment is conducted, and lab tests may be needed to confirm eligibility prior to enrollment. The New Participant CRF is used to document pre-screening eligibility information, and participants who meet the inclusion criteria are added to REDCap. For AKI, potential participants are added to REDCap if they meet inclusion criteria, and for CKD, potential participants are added if they meet inclusion/exclusion criteria and are considered by the site investigator. Unfortunately, the exact inclusion criteria are not provided in the given context.'

In [21]:
qa(prompt.format(question="what is the creatinine threshold to enroll patient with CKD?"))['result']

'Based on the information provided, there are two criteria for enrolling a patient with CKD:\n\n1. Elevated serum creatinine: The patient can be enrolled if their serum creatinine is greater than or equal to 1.5 times the baseline value. This can be determined by either a repeat serum creatinine within 6 to 48 hours of the initial measurement that is still greater than or equal to 1.5 times the baseline, or by positive kidney injury urine biomarkers measured at the recruitment site, or by urine microscopy suggestive of acute tubular necrosis.\n\n2. High risk for acute kidney injury: The patient can also be enrolled if they meet two or more criteria for high risk of acute kidney injury. This can be determined by positive kidney injury urine biomarkers, such as NGAL level greater than or equal to 150 ng/mL, KIM1 level greater than or equal to 2.8 ng/mL, or TIMP2 x IGFBP7 greater than or equal to 2.0.\n\nPlease note that these criteria are based on the provided information and may be subj