<a href="https://colab.research.google.com/github/camilasalinasc/Group-30-Brightside-Health/blob/main/Group_30_Brightside_Health.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Initial setup

In [None]:
pip install openai llama_index pyvis

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Import the directory with all the PDF files

In [None]:
from llama_index.core import SimpleDirectoryReader
import nest_asyncio

nest_asyncio.apply()

documents = SimpleDirectoryReader("/content/drive/Shareddrives/BrightSide/papers/").load_data()

Set up schema-based extraction model

In [None]:
# Import necessary modules
import openai
from typing import Literal
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor, PropertyGraphIndex
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
import os

# Read the OpenAI API key from the environment; never hard-code secrets in a notebook
openai.api_key = os.environ["OPENAI_API_KEY"]

# Define models and temperature settings
LLM_MODEL = "gpt-4o-mini"
TEMPERATURE = 0.3
EMBEDDING_MODEL = "text-embedding-ada-002"

# Define entities and relations for schema-based knowledge extraction
entities = Literal["CONDITION", "SYMPTOM", "TREATMENT", "SIDE EFFECT"]
relations = Literal["CAUSES", "TREATS", "TARGETS", "INTERACTS WITH", "RECOMMENDED FOR", "IS COMORBID WITH"]

schema = {
    "CONDITION": ["CAUSES", "IS COMORBID WITH"],
    "SYMPTOM": [],
    "TREATMENT": ["CAUSES", "TREATS", "TARGETS", "INTERACTS WITH", "RECOMMENDED FOR"],
    "SIDE EFFECT": ["CAUSES"],
}

# Initialize Schema-based extractor
schema_kg_extractor = SchemaLLMPathExtractor(
    llm=OpenAI(model=LLM_MODEL, temperature=TEMPERATURE),
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=schema,
    strict=True,
)

# Build the PropertyGraphIndex from the `documents` loaded above,
# using the schema-based extractor
schema_index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name=EMBEDDING_MODEL),
    show_progress=True,
    kg_extractors=[schema_kg_extractor],
)

Parsing nodes:   0%|          | 0/42 [00:00<?, ?it/s]

Extracting paths from text with schema: 100%|██████████| 103/103 [03:21<00:00,  1.96s/it]
Generating embeddings: 100%|██████████| 2/2 [00:01<00:00,  1.07it/s]
Generating embeddings: 100%|██████████| 14/14 [00:02<00:00,  5.10it/s]
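
To make the role of the validation schema concrete, here is a minimal pure-Python sketch of the idea behind strict schema checking: a triple is kept only if its relation is listed for the subject's entity type. This is an illustration under that assumption, not LlamaIndex's internal logic, and the example triples are hypothetical rather than extracted from the papers.

```python
from typing import Dict, List, Tuple

# (subject, subject_type, relation, object, object_type)
TypedTriple = Tuple[str, str, str, str, str]

# Same shape as the `schema` dict defined in the cell above
SCHEMA: Dict[str, List[str]] = {
    "CONDITION": ["CAUSES", "IS COMORBID WITH"],
    "SYMPTOM": [],
    "TREATMENT": ["CAUSES", "TREATS", "TARGETS", "INTERACTS WITH", "RECOMMENDED FOR"],
    "SIDE EFFECT": ["CAUSES"],
}


def filter_triples(triples: List[TypedTriple], schema: Dict[str, List[str]]) -> List[TypedTriple]:
    """Keep triples whose entity types are known and whose relation is
    allowed for the subject's type (assumed rule, for illustration only)."""
    kept = []
    for triple in triples:
        _, subj_type, relation, _, obj_type = triple
        if subj_type not in schema or obj_type not in schema:
            continue  # unknown entity type
        if relation not in schema[subj_type]:
            continue  # relation not permitted for this subject type
        kept.append(triple)
    return kept


# Hypothetical candidate triples, as an LLM might propose them
candidates = [
    ("sertraline", "TREATMENT", "TREATS", "depression", "CONDITION"),
    ("insomnia", "SYMPTOM", "CAUSES", "fatigue", "SYMPTOM"),
    ("depression", "CONDITION", "TREATS", "anxiety", "CONDITION"),
]

kept = filter_triples(candidates, SCHEMA)
for triple in kept:
    print(triple)
```

Only the first candidate survives here: `SYMPTOM` has no allowed relations, and `TREATS` is not listed for `CONDITION`.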


Set up free-form extraction model

In [None]:
from llama_index.core.indices.property_graph import SimpleLLMPathExtractor

# Create free form extractor
free_form_kg_extractor = SimpleLLMPathExtractor(
    llm=OpenAI(model=LLM_MODEL, temperature=TEMPERATURE)
)

# Initialize the PropertyGraphIndex with documents and extractor
free_form_index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=OpenAIEmbedding(model_name=EMBEDDING_MODEL),
    show_progress=True,
    kg_extractors=[free_form_kg_extractor],
)

Parsing nodes:   0%|          | 0/42 [00:00<?, ?it/s]

Extracting paths from text: 100%|██████████| 103/103 [01:12<00:00,  1.42it/s]
Generating embeddings: 100%|██████████| 2/2 [00:01<00:00,  1.18it/s]
Generating embeddings: 100%|██████████| 20/20 [00:04<00:00,  4.59it/s]
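
One simple way to compare the two extractors is to count how often each relation label appears in each graph. The sketch below assumes the triples have already been exported as plain `(subject, relation, object)` tuples; the export step and the sample triples are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def relation_counts(triples: Iterable[Triple]) -> Counter:
    """Count how often each relation label appears in the triples."""
    return Counter(relation for _, relation, _ in triples)


# Hypothetical triples standing in for the two extracted graphs
schema_triples = [
    ("sertraline", "TREATS", "depression"),
    ("ketamine", "TREATS", "depression"),
    ("sertraline", "CAUSES", "nausea"),
]
free_form_triples = [
    ("sertraline", "is prescribed for", "depression"),
    ("ketamine", "is used for", "depression"),
    ("sertraline", "may cause", "nausea"),
]

schema_counts = relation_counts(schema_triples)
free_form_counts = relation_counts(free_form_triples)

print("schema relations:", dict(schema_counts))
print("free-form relations:", dict(free_form_counts))
```

A schema-constrained graph would be expected to reuse a small fixed vocabulary of relations, while a free-form graph tends to produce many near-synonymous labels.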


Save both models' knowledge graphs as HTML files

In [None]:
path_output_storage = "/content/drive/Shareddrives/BrightSide/knowledge_graphs"
kw_extractor_names = ["schema_kg_extractor", "free_form_kg_extractor"]

# Set up one output directory per extractor (created if it does not exist)
path_output_storage_kg_extractor_schema = f"{path_output_storage}/{kw_extractor_names[0]}/"
os.makedirs(path_output_storage_kg_extractor_schema, exist_ok=True)
path_output_storage_kg_extractor_free_form = f"{path_output_storage}/{kw_extractor_names[1]}/"
os.makedirs(path_output_storage_kg_extractor_free_form, exist_ok=True)

# Persist the indexes
schema_index.storage_context.persist(persist_dir=path_output_storage_kg_extractor_schema)
free_form_index.storage_context.persist(persist_dir=path_output_storage_kg_extractor_free_form)

# Render each knowledge graph (via NetworkX) to an interactive HTML file
schema_index.property_graph_store.save_networkx_graph(name=f"{path_output_storage}/schema_knowledge_graph.html")
free_form_index.property_graph_store.save_networkx_graph(name=f"{path_output_storage}/free_form_knowledge_graph.html")

Testing the query on the API without uploading the documents

In [None]:
import os

import openai

# Read the API key from the environment instead of hard-coding it
client = openai.OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)
modeltype = 'gpt-4o-mini'

# Define the assistant and thread content without any files
assistant_name = "Research Assistant"
instructions = """
You are an expert clinician that is using the provided data to give treatment advice related to depression and anxiety.
"""
content = """
Treating patients with anxious depression poses challenges due to the potential for poorer treatment outcomes with antidepressant monotherapy. Patients with anxious depression may have a higher physical illness burden, lower socioeconomic status, greater severity of depression, and later onset of depression, which can contribute to a more difficult treatment process. In terms of medication performance, for nonanxious depression, citalopram, venlafaxine, sertraline, and bupropion have been identified as effective treatments. On the other hand, for anxious depression, the remission rates with bupropion, sertraline, and venlafaxine were lower compared to nonanxious depression, indicating a difference in treatment response between the two groups.
"""

# Create the assistant
assistant = client.beta.assistants.create(
    name=assistant_name,
    instructions=instructions,
    model=modeltype,
    tools=[]
)

# Create a thread without attaching any files
thread = client.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": content,
        }
    ]
)

# Run and get the assistant's answer to your message
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)

messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))
message_content = messages[0].content[0].text

# Strip citation markers from the response and collect any file citations
annotations = message_content.annotations
citations = []
response_text = message_content.value
for index, annotation in enumerate(annotations):
    response_text = response_text.replace(annotation.text, "")
    if file_citation := getattr(annotation, "file_citation", None):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f"[{index}] {cited_file.filename}")

# Final output (sentence punctuation and decimal points are left intact)
print(response_text.strip())

When treating patients with anxious depression, it is crucial to adopt a tailored approach due to the complex interplay of anxiety and depressive symptoms, which can affect treatment efficacy Here are some considerations and strategies based on the information provided:

1 **Combination Therapy**: Given that antidepressant monotherapy may not be as effective for anxious depression, consider a combination of antidepressants and anxiolytics (eg, SSRIs like sertraline or venlafaxine with benzodiazepines or buspirone, if appropriate) to address both depressive and anxiety symptoms

2 **Prioritize Response Over Remission**: Since remission rates for certain antidepressants like bupropion, sertraline, and venlafaxine are lower in anxious depression, focus on managing symptoms and improving overall functioning rather than solely aiming for complete remission 

3 **Medication Selection**: 
   - **SSRIs like Sertraline and Escitalopram**: These are often first-line treatments and have been show
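
The client above needs an API key, and a key pasted into a notebook is exposed to everyone the notebook is shared with. A minimal sketch of loading it from an environment variable instead; the helper name `load_api_key` is hypothetical, and `OPENAI_API_KEY` is assumed to be set before the cell runs.

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, failing loudly if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running this cell."
        )
    return key
```

The result can then be passed to the client, for example `openai.OpenAI(api_key=load_api_key())`. Any key that has already appeared in a shared notebook should be treated as compromised and rotated.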

Testing the query on the API with uploading the documents

In [None]:
import os

import openai

# Read the API key from the environment instead of hard-coding it
client = openai.OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
)
modeltype = 'gpt-4o-mini'


assistant_name = "Research Assistant"


instructions = """
You are an expert clinician that is using the uploaded documents to give treatment advice related to depression and anxiety.
"""


content = """
What are the challenges in treating patients with anxious depression compared to nonanxious depression, and how do different medications perform across these groups?
"""

# Define a list of file paths you want to upload
file_paths = [
    "/content/drive/Shareddrives/BrightSide/papers/WJCC-9-9350.pdf",
    "/content/drive/Shareddrives/BrightSide/papers/100-Papers-in-Clinical-Psychiatry-Depressive-Disorders-Comparative-efficacy-and-acceptability-of-12-new-generation-antidepressants-a-multiple-treatments-meta-analysis.pdf",
    "/content/drive/Shareddrives/BrightSide/papers/fava-et-al-2008-difference-in-treatment-outcome-in-outpatients-with-anxious-versus-nonanxious-depression-a-star_d-report.pdf"
]

# Create the assistant
assistant = client.beta.assistants.create(
    name=assistant_name,
    instructions=instructions,
    model=modeltype,
    tools=[{"type": "file_search"}]
)

# Upload each file and store the file IDs
attachments = []
for path in file_paths:
    # Use a context manager so each file handle is closed after the upload
    with open(path, "rb") as pdf_file:
        message_file = client.files.create(
            file=pdf_file,
            purpose="assistants"
        )
    attachments.append({
        "file_id": message_file.id,
        "tools": [{"type": "file_search"}]
    })

# Create a thread and attach all files to the message
thread = client.beta.threads.create(
    messages=[
        {
            "role": "user",
            "content": content,
            # Attach the files to the message.
            "attachments": attachments
        }
    ]
)

# Run and get the assistant's answer
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id, assistant_id=assistant.id
)

messages = list(client.beta.threads.messages.list(thread_id=thread.id, run_id=run.id))
message_content = messages[0].content[0].text

# Strip citation markers from the response and collect any file citations
annotations = message_content.annotations
citations = []
response_text = message_content.value
for index, annotation in enumerate(annotations):
    response_text = response_text.replace(annotation.text, "")
    if file_citation := getattr(annotation, "file_citation", None):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f"[{index}] {cited_file.filename}")

# Final output (sentence punctuation and decimal points are left intact)
print(response_text.strip())

Treating patients with anxious depression presents specific challenges when compared to those with nonanxious depression, as evidenced by findings from the STAR*D study

### Challenges in Treatment

1 **Poorer Treatment Outcomes**:
   Patients with anxious depression tend to have a significantly lower likelihood of achieving remission from depression compared to those without anxiety symptoms For instance, in the STAR*D study, patients with anxious depression had a remission rate of 222%, compared to 334% for nonanxious depression Additionally, these patients often take longer to achieve remission and experience more severe depressive symptoms

2 **Higher Rates of Side Effects**:
   Anxious depression is associated with a greater intensity and burden of side effects from antidepressant treatment Patients often report more severe side effects and a higher incidence of serious adverse events, leading to difficulties in adhering to treatment

3 **Comorbid Conditions**:
   Individuals with
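
When files are attached, the response text can carry citation markers. Rather than deleting them, each marker can be swapped for a numbered reference and the source files listed afterwards. The sketch below uses a small stand-in class instead of the real API objects, so `FakeAnnotation`, the marker format, and the file name are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FakeAnnotation:
    """Stand-in for an API annotation: the marker text plus the cited file name."""
    text: str
    file_name: Optional[str] = None


def number_citations(value: str, annotations: List[FakeAnnotation]) -> Tuple[str, List[str]]:
    """Replace each marker with ' [index]' and return the text plus a citation list."""
    citations = []
    for index, annotation in enumerate(annotations):
        value = value.replace(annotation.text, f" [{index}]")
        if annotation.file_name is not None:
            citations.append(f"[{index}] {annotation.file_name}")
    return value, citations


# Hypothetical response text and annotation
raw_text = "Remission was less likely in anxious depression【4:0†source】."
sample_annotations = [FakeAnnotation(text="【4:0†source】", file_name="paper.pdf")]

clean_text, citation_list = number_citations(raw_text, sample_annotations)
print(clean_text)
print("\n".join(citation_list))
```

Only the marker text is replaced, so periods and decimal points in the answer are preserved.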

Retrieve responses using a vector context retriever on both models

In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

from llama_index.core.indices.property_graph import VectorContextRetriever

# Create a sub-retriever using VectorContextRetriever.
# It uses the property graph store and vector store from the schema index;
# embed_model specifies the model used to embed queries.
sub_retriever = VectorContextRetriever(
    schema_index.property_graph_store,
    vector_store=schema_index.vector_store,
    embed_model=OpenAIEmbedding(model_name=EMBEDDING_MODEL),
)

# Create a retriever from the index using the sub-retriever defined above
retriever = schema_index.as_retriever(sub_retrievers=[sub_retriever])
# Initialize the query engine with the same sub-retriever.
# The query engine uses it to fetch graph context and respond to queries.
query_engine = schema_index.as_query_engine(
    sub_retrievers=[sub_retriever]
)

print(
    query_engine.query("What are the long-term considerations for using a treatment like ketamine, and how often should I assess the patient for potential dependency or side effects?").response
)

Long-term considerations for using a treatment like ketamine include defining the most effective dose, determining the optimal administration route, and establishing guidelines for therapeutic monitoring. It is important to carefully monitor patients for potential dependency and side effects regularly due to the associated risks of drug abuse and addiction.
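
Conceptually, a vector context retriever embeds the query, finds the graph nodes most similar to it, and returns the triples attached to those nodes. The toy sketch below illustrates that two-step idea with hand-written 2-D vectors. It is not LlamaIndex's implementation, and the vectors and triples are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    node_vecs: Dict[str, Sequence[float]],
    triples: List[Triple],
    top_k: int = 1,
) -> List[Triple]:
    """Pick the top_k nodes closest to the query, then return every triple
    that touches one of them (a one-hop neighbourhood)."""
    ranked = sorted(
        node_vecs, key=lambda name: cosine(query_vec, node_vecs[name]), reverse=True
    )
    seeds = set(ranked[:top_k])
    return [t for t in triples if t[0] in seeds or t[2] in seeds]


# Toy embeddings and graph (hypothetical)
node_vecs = {
    "ketamine": [1.0, 0.0],
    "sertraline": [0.0, 1.0],
    "dependency": [0.7, 0.7],
}
triples = [
    ("ketamine", "CAUSES", "dependency"),
    ("sertraline", "TREATS", "depression"),
]

hits = retrieve([0.9, 0.1], node_vecs, triples, top_k=1)
print(hits)
```

With the query vector pointing mostly along the first axis, `ketamine` is the closest node, so only the triple touching it is returned as context.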


In [None]:
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

from llama_index.core.indices.property_graph import VectorContextRetriever

# Create a sub-retriever using VectorContextRetriever.
# It uses the property graph store and vector store from the free-form index;
# embed_model specifies the model used to embed queries.
sub_retriever_free_form = VectorContextRetriever(
    free_form_index.property_graph_store,
    vector_store=free_form_index.vector_store,
    embed_model=OpenAIEmbedding(model_name=EMBEDDING_MODEL),
)

# Create a retriever from the free-form index using the sub-retriever defined above
retriever_free_form = free_form_index.as_retriever(sub_retrievers=[sub_retriever_free_form])
# Initialize the query engine with the free-form sub-retriever
# (not the schema index's retriever), so answers come from the free-form graph.
query_engine_free_form = free_form_index.as_query_engine(
    sub_retrievers=[sub_retriever_free_form]
)

print(
    query_engine_free_form.query("Which antidepressants are associated with higher risks of severe side effects, particularly in patients with anxious depression?").response
)

Monoamine oxidase inhibitors (MAOIs) are associated with higher risks of severe side effects, particularly in patients with anxious depression.
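
The two cells above ask each engine a different question, which makes the graphs hard to compare. A minimal sketch of a helper that sends the same question to several engines and collects the answers side by side. It assumes only that each engine has a `query()` method returning an object with a `.response` attribute; the stub classes stand in for the real query engines so the example runs on its own.

```python
from typing import Any, Dict


def compare_engines(engines: Dict[str, Any], question: str) -> Dict[str, str]:
    """Ask every engine the same question and return {engine name: answer text}."""
    return {name: str(engine.query(question).response) for name, engine in engines.items()}


class StubResponse:
    """Hypothetical stand-in for a query engine response."""
    def __init__(self, response: str) -> None:
        self.response = response


class StubEngine:
    """Hypothetical stand-in for a query engine."""
    def __init__(self, label: str) -> None:
        self.label = label

    def query(self, question: str) -> StubResponse:
        return StubResponse(f"{self.label} answer to: {question}")


answers = compare_engines(
    {"schema": StubEngine("schema"), "free_form": StubEngine("free-form")},
    "Which treatments target anxiety?",
)
for name, answer in answers.items():
    print(f"{name}: {answer}")
```

In the notebook, the stubs would be replaced by `query_engine` and `query_engine_free_form`.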
