![Query Expansion](assets/query_expansion.png "Query Expansion")

In [1]:
from helper_utils import word_wrap, project_embeddings
from pinecone import Pinecone
import os
import openai
from openai import OpenAI
from dotenv import load_dotenv
import umap


In [2]:
load_dotenv()  # load PINECONE_API_KEY from .env before it is read below
pc = Pinecone(api_key=os.environ['PINECONE_API_KEY'])
index = pc.Index("rag-kak")

In [3]:
load_dotenv()
# Pass the key explicitly so a missing OPENAI_API_KEY fails here with a KeyError.
openai_client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])

In [4]:
def embedding_function(text, model="text-embedding-ada-002"):
    """Return the OpenAI embedding vector for `text`."""
    response = openai_client.embeddings.create(
        input=text,
        model=model
    )
    # A single input string yields a single embedding.
    return response.data[0].embedding

In [9]:
def augment_query_generated(query, model="gpt-4o"):
    """Ask the chat model for a hypothetical answer to `query`.

    No document text is passed in; the reply is only used to enrich the
    retrieval query, not as a grounded answer.
    """
    messages = [
        {
            "role": "system",
            "content": "You are a knowledgeable healthcare research assistant. Provide an example answer to the given question, of the kind that might be found in a healthcare document."
        },
        {"role": "user", "content": query}
    ]

    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
    )
    content = response.choices[0].message.content
    return content

In [10]:
original_query = "How can PEGylation improve recombinant drugs?"
hypothetical_answer = augment_query_generated(original_query)

joint_query = f"{original_query} {hypothetical_answer}"
print(word_wrap(joint_query))

How can PEGylation improve recombinant drugs? PEGylation can improve
recombinant drugs by increasing their solubility, decreasing
immunogenicity, protecting from rapid clearance in vivo, and reducing
proteolysis.
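
The step above (generate an answer, then append it to the query) can be wrapped in a small helper. This is only a sketch: `expand_query_with_answer` and `fake_generate` are hypothetical names that do not appear in the notebook, and the generator is passed in as a callable so the example runs without an API call. The fallback covers the case where the model returns an empty reply or `None`.

```python
def expand_query_with_answer(query, generate_answer):
    """Return the query followed by a generated hypothetical answer.

    generate_answer is any callable that maps a question to a string,
    for example augment_query_generated from the cell above.
    """
    answer = (generate_answer(query) or "").strip()
    # Fall back to the bare query if the model returned nothing usable.
    return f"{query} {answer}" if answer else query


# Stand-in for the LLM call so the sketch is self-contained (hypothetical).
def fake_generate(question):
    return "PEGylation can reduce immunogenicity and slow clearance."


print(expand_query_with_answer(
    "How can PEGylation improve recombinant drugs?", fake_generate
))
```

In the notebook, `augment_query_generated` would be passed in place of `fake_generate`.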


In [11]:
results = index.query(
    vector=embedding_function(joint_query),
    top_k=5,
    include_metadata=True
)['matches']

for match in results:
    print(word_wrap(match['metadata']['text']))
    print('')

. recently, small molecule pharmacological chaperones have been shown
to increase protein stability and cellular levels for mutant lysosomal
enzymes and have emerged as a new therapeutic strategy for the
treatment of lsds.

appropriate patient enrichment combination with chemotherapy may
ultimately prove successful in improving overall survival and novel
agents targeting multiple proangiogenic pathways may prove effective.

conclusions : pegylated liposomal doxorubicin administered alone or in
combination with tamoxifen is safe and moderately effective in patients
with recurrent high - grade glioma.

therapeutics that have proven to be highly effective include the
immunomodulatory drug thalidomide and its newer analogs, lenalidomide
and pomalidomide, as well as the proteasome inhibitors bortezomib and
carfilzomib

anti - cd3 teplizumab and anti - cd3 otelixizumab have been shown to
provide c - peptide preservation.



# Multiple Query Expansion

![Query Expansion](assets/query_expansion_multi.png "Query Expansion")

In [14]:
def augment_multiple_query(query, model="gpt-4o"):
    messages = [
        {
            "role": "system",
            # Adjacent string literals are concatenated, so each piece ends with a space.
            "content": "You are a knowledgeable healthcare research assistant. Your users are asking questions about information contained in a healthcare document. "
            "Suggest up to four additional related questions to help them find the information they need, for the provided question. "
            "Suggest only short questions without compound sentences. Suggest a variety of questions that cover different aspects of the topic. "
            "Make sure they are complete questions, and that they are related to the original question. "
            "Output one question per line. Do not number the questions."
        },
        {"role": "user", "content": query}
    ]

    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
    )
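    content = response.choices[0].message.content
    # Keep one question per non-empty line; blank lines would otherwise be embedded as queries.
    questions = [line.strip() for line in content.splitlines() if line.strip()]
    return questions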
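
The prompt asks for unnumbered questions, one per line, but the model may not always comply (the run below returns five questions despite the "up to four" instruction). A slightly more defensive parser is sketched here; `parse_question_lines` is a hypothetical helper that is not part of the notebook, and the set of prefixes it strips is an assumption about what the model might emit.

```python
import re


def parse_question_lines(content):
    """Split an LLM reply into clean question strings, one per non-empty line."""
    questions = []
    for line in content.splitlines():
        # Drop a leading bullet or number such as "1.", "2)", "-", "*" or "•".
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", line).strip()
        if cleaned:
            questions.append(cleaned)
    return questions


reply = (
    "1. What is PEGylation?\n"
    "\n"
    "- How are PEGylated drugs cleared?\n"
    "Which drugs are PEGylated?\n"
)
for question in parse_question_lines(reply):
    print(question)
```

A helper like this could replace the line-splitting step at the end of `augment_multiple_query`.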

In [15]:
original_query = "How can PEGylation improve recombinant drugs?"
augmented_queries = augment_multiple_query(original_query)

for query in augmented_queries:
    print(query)

What are the common methods of PEGylation?
What are the potential side effects of PEGylated drugs?
How does PEGylation enhance the stability of recombinant drugs?
Can PEGylation impact the immunogenicity of recombinant drugs?
What specific diseases can be treated more effectively with PEGylated recombinant drugs?


In [39]:
queries = [original_query] + augmented_queries
results = set()

for query in queries:
    response = index.query(
        vector=embedding_function(query),
        top_k=2,
        include_metadata=True,
    )['matches']
    
    # Collect the distinct passages returned for this query.
    texts = {match['metadata']['text'] for match in response}

    # Join them into one block; the outer set then drops identical blocks.
    results.add("".join(text + "\n" for text in texts))
    
for doc in results:
    print(word_wrap(doc))
    print('-' * 200)

increased antigenicity following selumetinib and ifn treatment warrants
further study for immunotherapy of progressive ptc.
. the effects of
rg7112 and peg - ifnα 2a on mpn progenitor cells were dependent on
blocking p53 - mdm2 interactions and activating the p53 pathway,
thereby increasing mpn cd34 ( + ) cell apoptosis.
with envelope
proteins engineered to bind to this therapeutic antibody
the effects of
rg7112 and peg - ifnα 2a on mpn progenitor cells were dependent on
blocking p53 - mdm2 interactions and activating the p53 pathway,
thereby increasing mpn cd34 ( + ) cell apoptosis
we here delineated the
molecular and cellular mechanisms underlying novel immunomodulatory
effects triggered by bcma pyrrolobenzodiazepine ( pbd ) antibody drug
conjugate ( adc ) medi2228 which can augment efficacy of these
immunotherapies.

------------------------------------------------------------------------------------------------------------------------------------------------------------------------
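
The cell above joins each query's passages into one block before adding it to the set, so a passage returned for two different queries can still appear twice. The printed output also contains the "effects of rg7112" passage in two variants that differ only in leading and trailing punctuation. One way to deduplicate per passage across all queries is sketched below. `merge_unique_matches` is a hypothetical helper, the normalisation rule is an assumption, and the sample data is made up for illustration.

```python
def merge_unique_matches(matches_per_query):
    """Flatten per-query match lists into one ordered list of unique passages.

    Each match is assumed to be indexable as match["metadata"]["text"],
    as in the cells above.
    """
    seen = set()
    unique_texts = []
    for matches in matches_per_query:
        for match in matches:
            text = match["metadata"]["text"]
            # Light normalisation so copies differing only in surrounding
            # dots, spaces or case collapse to one key.
            key = text.strip(" .\n").lower()
            if key not in seen:
                seen.add(key)
                unique_texts.append(text)
    return unique_texts


# Illustrative data only, not real index output.
sample = [
    [{"metadata": {"text": ". the effects of x were dependent on y."}},
     {"metadata": {"text": "a second passage"}}],
    [{"metadata": {"text": "the effects of x were dependent on y"}},
     {"metadata": {"text": "a third passage"}}],
]
for passage in merge_unique_matches(sample):
    print(passage)
```

To use it in the notebook, collect the `['matches']` list from each `index.query` call into a list and pass that list in. Results keep the order in which they were first retrieved.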