
Current issue needing attention: the field lists for the structured guideline JSON and the patient schema are repeated in several places, both here and in llm_makes_json. They need parallel updates and are vulnerable to errors. Could one list of keys be compiled and shared across the following?



*   llm_makes_json -> def construct_prompt()
*   llm_makes_json -> def orchestrate_create_json()
*   the_state_manager -> def extract_patient_facts()
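
One possible way to share the keys is sketched below, assuming a small common module that both notebooks import. The module name `schema_fields.py`, the `FIELD_SPECS` dictionary and the helper functions are all hypothetical, and only a few of the fields are shown.

```python
# Hypothetical shared module, e.g. schema_fields.py (the name is an assumption)

# Field name -> allowed values. None means free text or a non-enumerated type.
# Only a few fields are shown; the rest would follow the same pattern.
FIELD_SPECS = {
    "age": None,
    "gender": ["male", "female", "other"],
    "manic_episode_history": ["none", "one", "multiple"],
    "mania_severity": ["mild", "moderate", "severe"],
    "patient_preferences": None,
}

def field_names():
    """Return the keys in a fixed order, for the 'exactly these keys' part of a prompt."""
    return list(FIELD_SPECS)

def prompt_rules():
    """Render one rule line per enumerated field, so prompts cannot drift from the spec."""
    lines = []
    for name, allowed in FIELD_SPECS.items():
        if allowed is not None:
            options = ", ".join(repr(v) for v in allowed)
            lines.append(f"- '{name}' must be one of: [{options}, null]")
    return "\n".join(lines)

def key_list():
    """Render the bullet list of keys that closes the prompt."""
    return "\n".join(f"- {name}" for name in FIELD_SPECS)

print(prompt_rules())
print(key_list())
```

Each of the three functions listed above could then build its prompt text from `prompt_rules()` and `key_list()` instead of carrying its own copy.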



In [2]:
# Mount Google Drive

from google.colab import drive
drive.mount('/content/drive', force_remount = True)

Mounted at /content/drive


In [4]:
# Import packages

!pip install -q streamlit
!pip install -q streamlit-chat

import os
import json
from transformers import AutoModelForCausalLM, AutoTokenizer
import streamlit as st
from streamlit_chat import message

AttributeError: partially initialized module 'torch' has no attribute 'fx' (most likely due to a circular import)

In [5]:
# Global parameters

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"

LOAD_PATH = "/content/drive/My Drive/Colab Notebooks/Dissertation/JSON"
LOAD_GUIDELINE_STRUCTURED = os.path.join(LOAD_PATH, "guideline_structured.json")
LOAD_PATIENT_SCHEMA = os.path.join(LOAD_PATH, "patient_schema.json")

SAVE_PATH = os.path.join(LOAD_PATH, "patient_facts") # Placeholder directory name; must not be the path of the patient_schema.json file loaded above
os.makedirs(SAVE_PATH, exist_ok=True)
SAVE_FILE = os.path.join(SAVE_PATH, "DEFINE") # DEFINE BEFORE RUNNING

In [None]:
# Load LLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype="auto",
    device_map="auto")

In [None]:
# Load JSON

def load_json(file_path):
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        raise FileNotFoundError(f'JSON file not found: {file_path}')

guideline_structured = load_json(LOAD_GUIDELINE_STRUCTURED)

print(type(guideline_structured))
print(len(guideline_structured))
print(guideline_structured)

patient_schema = load_json(LOAD_PATIENT_SCHEMA)

print(type(patient_schema))
print(len(patient_schema))
print(patient_schema[0])

In [None]:
# Create record of exact guideline version extracted with metadata
# In case of future runs with different guidelines and structure,
# problems with a run can then be traced precisely
# - guideline name
# - html
# - scrape date
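
A minimal sketch of such a record, assuming the guideline is wrapped together with its metadata in one dictionary. The function name, the metadata keys and the content hash are assumptions, "html" is taken here to mean the source URL, and the values in the example are placeholders.

```python
import hashlib
import json
from datetime import date

def build_guideline_record(guideline, name, source_url, scrape_date):
    """Wrap the structured guideline with provenance metadata."""
    # Hash a canonical serialisation so that any change to the content changes the hash
    canonical = json.dumps(guideline, sort_keys=True, ensure_ascii=False)
    return {
        "metadata": {
            "guideline_name": name,
            "source_url": source_url,
            "scrape_date": scrape_date,
            "sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
        },
        "guideline": guideline,
    }

# Placeholder values for illustration only
record = build_guideline_record(
    guideline=[{"heading": "example", "recommendation": "example text"}],
    name="EXAMPLE GUIDELINE",
    source_url="https://example.org/guideline",
    scrape_date=date(2025, 1, 1).isoformat(),
)
print(record["metadata"])
```

Storing the hash alongside the name and date means a later run can confirm it is reading exactly the same guideline content.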

In [None]:
# Save local copy of JSON with metadata
# - protect the file somehow from being overwritten or edited?
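
One option, sketched here under assumptions, is to open the file in exclusive-creation mode so that an existing copy is never overwritten, and then to remove write permission. The function name and file name are hypothetical, and permission bits may not be honoured on every filesystem (a mounted drive, for example), so the read-only step should be treated as advisory.

```python
import json
import os
import stat
import tempfile

def save_json_once(data, file_path):
    """Write data to file_path, refusing to overwrite, then mark the file read-only."""
    # Mode "x" raises FileExistsError if the file already exists
    with open(file_path, "x", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)
    # Remove write permission for everyone (advisory: the owner can change it back)
    os.chmod(file_path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)

# Demonstration in a temporary directory
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "guideline_with_metadata.json")
save_json_once({"example": True}, demo_file)

try:
    save_json_once({"example": False}, demo_file)
    overwritten = True
except FileExistsError:
    overwritten = False

print("Overwritten:", overwritten)
```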

In [None]:
# Create an index or summary for the JSON to enable more reliable searching by the state manager?
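
A minimal sketch of one kind of index, assuming `guideline_structured` is a list of dictionaries, one per recommendation, with the full text under a `recommendation` key. Both assumptions would need checking against the real file, and the example data is invented for illustration.

```python
from collections import defaultdict

def build_index(recommendations):
    """Map (field, value) pairs to the positions of the recommendations that carry them."""
    index = defaultdict(list)
    for position, rec in enumerate(recommendations):
        for field, value in rec.items():
            if field == "recommendation":
                continue  # skip the free-text field
            # Treat scalars and lists uniformly
            values = value if isinstance(value, list) else [value]
            for v in values:
                if v is not None:  # nulls carry no information to index
                    index[(field, v)].append(position)
    return dict(index)

example = [
    {"recommendation": "text A", "mania_severity": "severe", "current_psychosis": "present"},
    {"recommendation": "text B", "mania_severity": ["moderate", "severe"], "current_psychosis": None},
]
index = build_index(example)
print(index[("mania_severity", "severe")])
```

The state manager could then look up a patient field and value directly rather than scanning every recommendation.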

In [None]:
#### NOTE
"""
If the question being asked maps onto a particular section (e.g., a medication),
it may be worth flagging other areas (e.g., carer support, physical health).
The model may need to give the user a list of "have you also considered..." prompts based on these headings,
to prevent a narrow focus on the main content of the question
and ensure that other recommendation sections are also considered.
"""

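The idea in the note above could be sketched as follows, assuming each recommendation is a dictionary with a `heading` key. That key name, the function name and the example headings are illustrative assumptions.

```python
def other_headings(recommendations, matched_positions):
    """Return headings present in the guideline but absent from the matched recommendations."""
    matched = {recommendations[i]["heading"] for i in matched_positions}
    remaining = []
    for rec in recommendations:
        heading = rec["heading"]
        if heading not in matched and heading not in remaining:
            remaining.append(heading)  # keep guideline order, drop duplicates
    return remaining

example = [
    {"heading": "Medication", "recommendation": "text A"},
    {"heading": "Carer support", "recommendation": "text B"},
    {"heading": "Physical health", "recommendation": "text C"},
    {"heading": "Medication", "recommendation": "text D"},
]

# Suppose only the first recommendation matched the user's question
suggestions = other_headings(example, matched_positions=[0])
print("Have you also considered:", ", ".join(suggestions))
```
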
In [None]:
def get_user_message():
    """
    Uses a UI like Streamlit or an API to send and receive messages
    """

    # From https://www.geeksforgeeks.org/python/create-a-chatbot-with-openai-and-streamlit-in-python/
    st.title("NICE GUIDELINE CHATBOT")
    if 'user_input' not in st.session_state:
        st.session_state['user_input'] = []

    if 'openai_response' not in st.session_state:
        st.session_state['openai_response'] = []

    def get_text():
        input_text = st.text_input("Enter your question and patient information here:", key="input")
        return input_text

    user_input = get_text()

    if user_input:
        output = call_llm(user_input) # call_llm() is defined further down; it stands in for the tutorial's api_calling()
        output = output.lstrip("\n")

        # Store the user message and the model output under their own keys
        st.session_state.user_input.append(user_input)
        st.session_state.openai_response.append(output)


In [None]:
def call_llm(prompt):
    """
    General function for call to the LLM
    What is passed to the LLM ("prompt") is decided by other functions
    """

    inputs = tokenizer(prompt,
                       return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs,
                             max_new_tokens=500,
                             do_sample=False) # deterministic decoding without random sampling
                                            # if removed, reinstate temperature / top_p / top_k

    llm_response = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:],
                                          skip_special_tokens=True)

    return llm_response[0]

In [37]:
def extract_patient_facts(user_input):

    # At some point need to decide whether this should output patient_facts variable
    # ... or store the output in a predefined and loaded patient_schema.json

    """
    Use LLM to decompose message into fields captured by JSON-keys
    """

    prompt = f"""You are extracting structured information from USER INPUT.

                RULES:
                - output must be valid JSON only (no markdown)
                - do not invent or infer clinical information; use only what is present in the user input
                - For ALL clinical descriptor fields (e.g. phase, severity, medication): populate a value ONLY if it is explicitly stated in the user input. Do NOT infer information that is not directly stated. If not explicit, use null.
                - 'age' must be an integer
                - 'gender' must be one of: ['male', 'female', 'other', null]
                - Extract 'urgency' as 'True' if the text includes 'urgent', 'urgently', 'immediate' or 'immediately', otherwise 'False'
                - 'manic_episode_history' must be one of: ['none', 'one', 'multiple', null]
                - 'current_manic_phase' must be one of: ['mania', 'hypomania', 'bipolar_depression', 'mixed', 'rapid_cycling', 'euthymic', null]
                - 'mania_severity' must be one of: ['mild', 'moderate', 'severe', null]
                - 'current_psychosis' must be one of: ['present', 'absent', null]
                - 'diagnoses' must be one or more comorbid mental health diagnoses. If more than one diagnosis is mentioned, record all as a list of strings.
                - 'current_medication' must be a medication name or null. If more than one medication is mentioned, record all as a list of strings.
                - 'medication_adherence' must be one of: ['good', 'poor', null]
                - 'physical_health_longterm' must be the name of a physical disease diagnosis that affects a person for more than six months. If more than one diagnosis is mentioned, record all as a list of strings.
                - 'physical_health_recent' must be the name of a transient disease (less than six months) or physical health event from the last six months. If more than one diagnosis is mentioned, record all as a list of strings.
                - 'risk' must be one of: ['self_harm', 'risk_to_others', null]
                - 'psychological_therapy' must be one of: ['offered', null]
                - 'care_coordination' must be one of: ['current', 'offered', null]
                - 'patient_preferences' is a string describing what the patient has requested or expressed a preference for. If no preference is stated, record null.
                - you MUST use 'null' if the information for any field is not explicit in the user input
                - if there is more than one value for any field, retain all as a list of strings

                Produce JSON with exactly these keys:
                - age
                - gender
                - urgency
                - manic_episode_history
                - current_manic_phase
                - mania_severity
                - current_psychosis
                - diagnoses
                - current_medication
                - medication_adherence
                - physical_health_longterm
                - physical_health_recent
                - risk
                - psychological_therapy
                - care_coordination
                - patient_preferences

                USER INPUT: {user_input}
                """

    inputs = tokenizer(prompt,
                       return_tensors="pt").to(model.device)

    outputs = model.generate(**inputs,
                             max_new_tokens=250, # RESET TO 500 AFTER TEST RUN?
                             do_sample=False) # deterministic decoding without random sampling
                                            # if removed, reinstate temperature / top_p / top_k

    patient_facts = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:],
                                          skip_special_tokens=True)

    return patient_facts[0]
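
Because the function returns raw model text, a parsing and validation step is likely to be needed before the facts can be used. The sketch below assumes the model may wrap the JSON object in extra text. `parse_patient_facts` and `EXPECTED_KEYS` are hypothetical names, and the model output is simulated.

```python
import json

# Keys copied from the "exactly these keys" list in the prompt
EXPECTED_KEYS = [
    "age", "gender", "urgency", "manic_episode_history", "current_manic_phase",
    "mania_severity", "current_psychosis", "diagnoses", "current_medication",
    "medication_adherence", "physical_health_longterm", "physical_health_recent",
    "risk", "psychological_therapy", "care_coordination", "patient_preferences",
]

def parse_patient_facts(raw_output):
    """Parse the model output into a dict and check that it has exactly the expected keys."""
    # Keep only the outermost {...} span in case the model adds surrounding text
    start, end = raw_output.find("{"), raw_output.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in model output")
    facts = json.loads(raw_output[start:end + 1])  # JSONDecodeError is a ValueError
    missing = [k for k in EXPECTED_KEYS if k not in facts]
    extra = [k for k in facts if k not in EXPECTED_KEYS]
    if missing or extra:
        raise ValueError(f"Unexpected keys. Missing: {missing}; extra: {extra}")
    return facts

# Simulated model output with text before the JSON object
simulated = "Here is the JSON:\n" + json.dumps({**dict.fromkeys(EXPECTED_KEYS), "age": 23})
facts = parse_patient_facts(simulated)
print(facts["age"], facts["gender"])
```

Raising on missing or extra keys makes a malformed response visible immediately instead of letting it surface later during matching.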

In [38]:
# Test of conversion to patient facts

user_input_test = "The patient is a 23-year-old woman with suspected bipolar disorder - though we are still waiting for the duty psychiatrist to confirm this - as well as previous diagnoses of borderline personality disorder and ADHD. She is currently under the local secondary care team after an admission eight months ago, when she was presenting as aggressive and with grandiose ideas about being a religious prophet. She was prescribed lithium but, according to the team, she has not been taking it consistently because she doesn't like the drowsiness. We're concerned about her because, recently, she has presented as more irritable again and once threatened our deputy manager with kidnap, saying that she could tell he was possessed by a demon. She says she hasn't been sleeping for one week. The GP reported that she smelt strongly of alcohol when seen there. The patient says she doesn't want to go back to hospital. Should we consider increasing her lithium?"

patient_facts_test = extract_patient_facts(user_input_test)

print(patient_facts_test)

KeyboardInterrupt: 

In [None]:
def select_relevant_recommendations(patient_facts):
    """
    Based on matches between decomposed user input and guideline_structured.json
    """

    # extract_patient_facts() returns a JSON string, so parse it into a dict first
    if isinstance(patient_facts, str):
        patient_facts = json.loads(patient_facts)

    matching_fields = [] # A data object capturing all fields shared between decomposed user input and the structured guideline

    relevant_recommendations = [] # A place to store the full text of original recommendations (not yet populated)

    for k, v in patient_facts.items(): # Iterate over each key-value pair in patient_facts
        if v in guideline_structured: # Top-level membership test only; depends on the structure of guideline_structured
            matching_fields.append((k, v)) # Add that key-value pair to "matching_fields"

    return matching_fields, relevant_recommendations
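
A fuller matching step might compare the patient facts with each recommendation field by field. The sketch below assumes that recommendations are dictionaries using the same keys as the patient facts, and that values may be scalars, lists or null. The function names and the example data are illustrative only.

```python
def as_set(value):
    """Normalise a scalar, list or null into a set of values."""
    if value is None:
        return set()
    return set(value) if isinstance(value, list) else {value}

def score_recommendations(patient_facts, recommendations):
    """Count, for each recommendation, how many fields share a value with the patient."""
    scored = []
    for rec in recommendations:
        shared = [
            field for field, value in patient_facts.items()
            if as_set(value) & as_set(rec.get(field))
        ]
        if shared:
            scored.append((len(shared), shared, rec))
    # Highest number of shared fields first
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

patient = {"mania_severity": "severe", "current_medication": ["lithium"], "risk": None}
recs = [
    {"recommendation": "text A", "mania_severity": ["moderate", "severe"], "current_medication": "lithium"},
    {"recommendation": "text B", "mania_severity": "mild"},
    {"recommendation": "text C", "current_medication": ["lithium", "other_medication"]},
]
results = score_recommendations(patient, recs)
for count, shared, rec in results:
    print(count, shared, rec["recommendation"])
```

Null patient fields never match anything here, which follows the prompt's rule that unknown information is recorded as null rather than inferred.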



In [None]:
# Test

select_relevant_recommendations(patient_facts_test)

In [None]:
def decide_next_question_or_answer():
    """
    - must record history of decisions taken for audit
    """

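A minimal sketch of the audit requirement, assuming an in-memory list that is appended to each time a decision is taken. The function name, the entry fields and the example values are assumptions, and a real run would probably also write the log to disk.

```python
from datetime import datetime, timezone

decision_log = []

def record_decision(log, action, reason, patient_facts):
    """Append one audit entry describing a decision and the facts it was based on."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,   # e.g. "ask_question" or "give_answer" (illustrative labels)
        "reason": reason,
        "patient_facts": dict(patient_facts),  # shallow copy: a snapshot of the top-level fields
    }
    log.append(entry)
    return entry

facts = {"age": 23, "mania_severity": None}
record_decision(decision_log, "ask_question", "mania_severity is unknown", facts)

# Later changes to the live facts do not alter the logged snapshot
facts["mania_severity"] = "severe"
print(decision_log[0]["patient_facts"])
```
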

In [None]:
def generate_question():
    # Needed as separate step?
    pass

In [None]:
def update_patient_dict():
    """
    - what is known about the "patient" object
    - history of questions asked and user answers, or is this in the UI?
    """