# Ollama Usage (Running LLMs on your local machine)

## Setup

1. Install Ollama (https://ollama.com/download).
2. From the terminal, run `ollama run llama3.2` to launch llama3.2.
3. To use the Ollama Python API, run `pip install ollama`.

In [1]:
!pip install ollama transformers==4.28.1 sentencepiece datasets ipywidgets



In [2]:
import transformers

## Usage

In [3]:
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='llama3.2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])  # or: print(response.message.content)

The sky appears blue because of a phenomenon called Rayleigh scattering, named after the British physicist Lord Rayleigh. He discovered that when sunlight enters Earth's atmosphere, it encounters tiny molecules of gases such as nitrogen and oxygen.

These tiny molecules scatter the light in all directions, but they scatter shorter (blue) wavelengths more than longer (red) wavelengths. This is because the smaller molecules are more effective at scattering the shorter wavelengths.

As a result, when we look up at the sky, our eyes see mostly the blue light that has been scattered by the tiny molecules, giving the sky its blue appearance. The other wavelengths of light, such as red and orange, continue to travel in a straight line and reach our eyes directly from the sun, but they are less prominent because of the scattering effect.

It's worth noting that the color of the sky can change depending on various factors, such as:

* Time of day: During sunrise and sunset, the sky can take on 

In [4]:
from ollama import embeddings
response = embeddings(model='llama3.2', prompt='Why is the sky blue?')
response

EmbeddingsResponse(embedding=[-1.245145559310913, 0.6557098031044006, -2.2436630725860596, -0.9488217830657959, 0.8453716039657593, 1.267472743988037, 1.4656982421875, -0.6348537802696228, 0.021695148199796677, -1.2844456434249878, -1.1576210260391235, 1.1311951875686646, 1.018588662147522, 2.2110514640808105, -2.4925174713134766, 2.2690701484680176, 0.8534305095672607, 0.5113649964332581, 0.9391489624977112, -1.3037784099578857, -0.31355684995651245, 0.03764760494232178, 0.07037448137998581, 1.6618024110794067, 1.479368805885315, 0.5887148380279541, -0.6719256043434143, -0.887116551399231, 3.249343156814575, 0.06196758523583412, -1.4638837575912476, -0.4953858554363251, 0.31518876552581787, 1.8472554683685303, 1.5910197496414185, -1.1464338302612305, 0.8791815042495728, 0.9113344550132751, -0.6553483009338379, -0.2979761064052582, -1.1542118787765503, 0.26300621032714844, 0.9444526433944702, 0.3614429235458374, -0.7953523993492126, -1.7572691440582275, 2.146158456802368, 0.88850766420
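Embedding vectors like the one above are usually compared with cosine similarity. The following is a minimal sketch using only the standard library; the vectors are toy values chosen for illustration, not real model output, and `cosine_similarity` is a helper name introduced here. With real data you would pass the `embedding` field of two responses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors for illustration only (assumption: not produced by llama3.2).
v1 = [1.0, 2.0, 3.0]
v2 = [2.0, 4.0, 6.0]    # same direction as v1, so similarity is close to 1
v3 = [-3.0, 0.0, 1.0]   # orthogonal to v1, so similarity is close to 0

print(cosine_similarity(v1, v2))
print(cosine_similarity(v1, v3))
```

Values near 1 indicate texts the model places close together; values near 0 indicate unrelated directions.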

# Dataset
Synthea sample data: https://synthea.mitre.org/downloads (7 MB of data with 100 patients)

# Data preparation

In [5]:
import pandas as pd
import numpy as np
from datetime import date

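import glob
import os

# Load the Synthea dataset from multiple CSV files
def load_synthea_data(csv_folder_path):
    csv_files = glob.glob(os.path.join(csv_folder_path, '*.csv'))
    dfs = {}
    for file in csv_files:
        # Key each DataFrame by the file name without directory or extension,
        # e.g. 'patients' for '.../patients.csv' (works on Windows and POSIX)
        file_name = os.path.splitext(os.path.basename(file))[0]
        dfs[file_name] = pd.read_csv(file)
    return dfs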

# Load Synthea dataset
csv_folder_path = 'synthea_sample_data_csv_latest'
dfs = load_synthea_data(csv_folder_path)

print(f"File names={dfs.keys()}")

File names=dict_keys(['allergies', 'careplans', 'claims', 'claims_transactions', 'conditions', 'devices', 'encounters', 'imaging_studies', 'immunizations', 'medications', 'observations', 'organizations', 'patients', 'payers', 'payer_transitions', 'procedures', 'providers', 'supplies'])
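The dictionary keys above are the CSV file names without directory or extension. As a small sketch of how such a key can be derived regardless of the path separator, the hypothetical helper `table_name` below uses `pathlib.PureWindowsPath`, which accepts both `\` and `/`. One caveat: on POSIX a backslash is a legal file-name character, so this is an assumption that suits Synthea's simple file names rather than a general rule.

```python
from pathlib import PureWindowsPath

def table_name(path_str):
    """Return the file name without directory or extension, e.g. 'patients'."""
    # PureWindowsPath treats both '\\' and '/' as separators and never touches the disk.
    return PureWindowsPath(path_str).stem

# Paths as glob might return them on Windows and on Linux/macOS.
windows_style = "synthea_sample_data_csv_latest\\patients.csv"
posix_style = "synthea_sample_data_csv_latest/payer_transitions.csv"

print(table_name(windows_style))
print(table_name(posix_style))
```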


In [6]:
dfs['patients'].columns

Index(['Id', 'BIRTHDATE', 'DEATHDATE', 'SSN', 'DRIVERS', 'PASSPORT', 'PREFIX',
       'FIRST', 'MIDDLE', 'LAST', 'SUFFIX', 'MAIDEN', 'MARITAL', 'RACE',
       'ETHNICITY', 'GENDER', 'BIRTHPLACE', 'ADDRESS', 'CITY', 'STATE',
       'COUNTY', 'FIPS', 'ZIP', 'LAT', 'LON', 'HEALTHCARE_EXPENSES',
       'HEALTHCARE_COVERAGE', 'INCOME'],
      dtype='object')

In [7]:
dfs['observations'].columns  

Index(['DATE', 'PATIENT', 'ENCOUNTER', 'CATEGORY', 'CODE', 'DESCRIPTION',
       'VALUE', 'UNITS', 'TYPE'],
      dtype='object')

In [8]:
dfs['allergies'].columns

Index(['START', 'STOP', 'PATIENT', 'ENCOUNTER', 'CODE', 'SYSTEM',
       'DESCRIPTION', 'TYPE', 'CATEGORY', 'REACTION1', 'DESCRIPTION1',
       'SEVERITY1', 'REACTION2', 'DESCRIPTION2', 'SEVERITY2'],
      dtype='object')

In [9]:
dfs['conditions'].columns

Index(['START', 'STOP', 'PATIENT', 'ENCOUNTER', 'SYSTEM', 'CODE',
       'DESCRIPTION'],
      dtype='object')

In [10]:
dfs['procedures'].columns

Index(['START', 'STOP', 'PATIENT', 'ENCOUNTER', 'SYSTEM', 'CODE',
       'DESCRIPTION', 'BASE_COST', 'REASONCODE', 'REASONDESCRIPTION'],
      dtype='object')

In [11]:
dfs['encounters'].columns 

Index(['Id', 'START', 'STOP', 'PATIENT', 'ORGANIZATION', 'PROVIDER', 'PAYER',
       'ENCOUNTERCLASS', 'CODE', 'DESCRIPTION', 'BASE_ENCOUNTER_COST',
       'TOTAL_CLAIM_COST', 'PAYER_COVERAGE', 'REASONCODE',
       'REASONDESCRIPTION'],
      dtype='object')

In [12]:
merged_df = pd.merge(dfs['patients'], dfs['allergies'], left_on='Id', right_on='PATIENT', how='inner')
len(merged_df)

92

In [13]:
merged_df = pd.merge(merged_df, dfs['conditions'], left_on='Id', right_on='PATIENT', how='inner')
len(merged_df)

5376

In [14]:
merged_df = pd.merge(merged_df, dfs['procedures'], left_on='Id', right_on='PATIENT', how='inner')
len(merged_df)

3741530
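The jump from 92 to 5376 to 3741530 rows follows from how an inner join works: for each patient, every allergy row is paired with every condition row and then with every procedure row, so the per-patient counts multiply. The sketch below reproduces that arithmetic with the standard library only; the key lists are made-up patient IDs and `joined_row_count` is a helper name introduced for illustration.

```python
from collections import Counter

def joined_row_count(*key_columns):
    """Row count of an inner join on a shared key, given each table's key column."""
    counts = [Counter(keys) for keys in key_columns]
    # Only keys present in every table survive an inner join.
    shared = set(counts[0])
    for c in counts[1:]:
        shared &= set(c)
    total = 0
    for key in shared:
        rows = 1
        for c in counts:
            rows *= c[key]  # each table multiplies the number of combinations
        total += rows
    return total

# Hypothetical key columns (one patient ID per row in each table).
allergies = ["p1", "p1", "p2"]
conditions = ["p1", "p1", "p1", "p2"]
procedures = ["p1", "p1", "p2", "p2", "p3"]

# p1 contributes 2 * 3 * 2 = 12 rows, p2 contributes 1 * 1 * 2 = 2 rows, p3 is dropped.
print(joined_row_count(allergies, conditions, procedures))  # 14
```

If one row per patient (or per event) is what the later steps need, aggregating each table per patient before merging avoids this growth.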

In [15]:
merged_df.head()

Unnamed: 0,Id,BIRTHDATE,DEATHDATE,SSN,DRIVERS,PASSPORT,PREFIX,FIRST,MIDDLE,LAST,...,START,STOP,PATIENT,ENCOUNTER,SYSTEM,CODE,DESCRIPTION,BASE_COST,REASONCODE,REASONDESCRIPTION
0,4569671e-ed39-055f-8e78-422b96c9896b,2013-08-10,,999-40-7708,,,,Caryl47,Lelia627,Kassulke119,...,2015-07-11T15:21:41Z,2015-07-11T15:36:41Z,4569671e-ed39-055f-8e78-422b96c9896b,84bd9826-4d63-2eda-97d4-4b9e72b99af5,http://snomed.info/sct,430193006,Medication reconciliation (procedure),516.11,,
1,4569671e-ed39-055f-8e78-422b96c9896b,2013-08-10,,999-40-7708,,,,Caryl47,Lelia627,Kassulke119,...,2016-01-16T15:21:41Z,2016-01-16T15:33:28Z,4569671e-ed39-055f-8e78-422b96c9896b,d3567fed-5d28-7c34-2571-83dff0dca791,http://snomed.info/sct,103697008,Patient referral for dental care (procedure),431.4,,
2,4569671e-ed39-055f-8e78-422b96c9896b,2013-08-10,,999-40-7708,,,,Caryl47,Lelia627,Kassulke119,...,2016-01-23T15:21:41Z,2016-01-23T15:49:37Z,4569671e-ed39-055f-8e78-422b96c9896b,683b1746-f326-9578-a172-798a96f53146,http://snomed.info/sct,34043003,Dental consultation and report (procedure),431.4,103697008.0,Patient referral for dental care (procedure)
3,4569671e-ed39-055f-8e78-422b96c9896b,2013-08-10,,999-40-7708,,,,Caryl47,Lelia627,Kassulke119,...,2016-01-23T15:49:37Z,2016-01-23T16:21:38Z,4569671e-ed39-055f-8e78-422b96c9896b,683b1746-f326-9578-a172-798a96f53146,http://snomed.info/sct,225362009,Dental care (regime/therapy),431.4,103697008.0,Patient referral for dental care (procedure)
4,4569671e-ed39-055f-8e78-422b96c9896b,2013-08-10,,999-40-7708,,,,Caryl47,Lelia627,Kassulke119,...,2016-01-23T16:21:38Z,2016-01-23T16:49:01Z,4569671e-ed39-055f-8e78-422b96c9896b,683b1746-f326-9578-a172-798a96f53146,http://snomed.info/sct,1260009003,Removal of supragingival plaque and calculus f...,431.4,103697008.0,Patient referral for dental care (procedure)


In [16]:
merged_df['BIRTHDATE'] = pd.to_datetime(merged_df['BIRTHDATE'])

# Calculate the age
merged_df['AGE'] = merged_df['BIRTHDATE'].apply(lambda x: (pd.Timestamp(date.today()) - x).days // 365)
merged_df['AGE']

0          11
1          11
2          11
3          11
4          11
           ..
3741525    64
3741526    64
3741527    64
3741528    64
3741529    64
Name: AGE, Length: 3741530, dtype: int64
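Dividing the day count by 365 ignores leap days, so the result can be one year too high shortly before a birthday. A hedged alternative is to compare calendar fields directly; `age_in_years` is a helper name introduced here, and the reference dates are chosen only to show the difference.

```python
from datetime import date

def age_in_years(birth, today):
    """Completed years between birth and today."""
    # Subtract one year if this year's birthday has not been reached yet.
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    return today.year - birth.year - (0 if had_birthday else 1)

birth = date(2013, 8, 10)        # BIRTHDATE of the first patient shown above
day_before = date(2024, 8, 9)    # illustrative reference date, one day before the 11th birthday

print(age_in_years(birth, day_before))      # 10
print((day_before - birth).days // 365)     # 11, because three leap days are counted as extra
```

For a 29 February birth date this sketch treats the birthday as reached on 1 March in non-leap years.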

In [17]:
from ollama import Client

# Initialize the Ollama client (the local server listens on port 11434 by default)
llama_api = Client(host="http://localhost:11434")

# Define the problem
problem = "A 35-year-old patient presents with symptoms of fever, headache, and fatigue. What is the most likely diagnosis?"

# Break down the problem into simpler steps
steps = [
    "What are the common causes of fever, headache, and fatigue?",
    "What are the distinguishing features of each possible diagnosis?",
    "Which diagnosis is most consistent with the patient's symptoms?"
]

# Ask the model about each step (steps[:1] runs only the first step to keep the demo short)
for step in steps[:1]:
    response = llama_api.generate(model="llama3.2", prompt=step)
    print(f"Step: {step} \n\n\n")
    print(f"Response: {response.response}")
    print()

Step: What are the common causes of fever, headache, and fatigue? 



Response: Fever, headache, and fatigue are common symptoms that can be caused by a variety of factors. Here are some common causes:

**Fever:**

1. Infections: Bacterial infections such as pneumonia, urinary tract infections, and skin infections can cause fever.
2. Viral infections: Common cold, flu, mononucleosis, and other viral infections can lead to fever.
3. Allergic reactions: Allergies to medications, foods, or insect bites can trigger a fever response.
4. Autoimmune disorders: Certain conditions like rheumatoid arthritis, lupus, and multiple sclerosis can cause recurring fevers.

**Headache:**

1. Tension headaches: Tight muscles in the neck and scalp can lead to tension headaches.
2. Migraines: A type of headache that can be triggered by hormonal changes, stress, or certain foods.
3. Sinus infections: Inflammation of the sinuses can cause pain and pressure, leading to a headache.
4. Dehydration: Not drinking
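In the cell above each step is sent to the model on its own, so a later step would not see the earlier answers. One possible way to chain the steps is to prepend the previous questions and answers to each new prompt. The sketch below is an assumption about how this could be done, not the notebook's method; `run_chain` and `fake_generate` are hypothetical names, and the stub stands in for a call such as `llama_api.generate(model="llama3.2", prompt=p).response`.

```python
def run_chain(steps, generate):
    """Run prompts in order, passing earlier questions and answers as context."""
    transcript = []  # list of (question, answer) pairs
    for step in steps:
        context = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in transcript)
        prompt = f"{context}Q: {step}\nA:"
        answer = generate(prompt)
        transcript.append((step, answer))
    return transcript

seen_prompts = []

def fake_generate(prompt):
    """Stand-in for the model call: records the prompt and returns a canned reply."""
    seen_prompts.append(prompt)
    return f"stub answer {len(seen_prompts)}"

steps = [
    "What are the common causes of fever, headache, and fatigue?",
    "What are the distinguishing features of each possible diagnosis?",
    "Which diagnosis is most consistent with the patient's symptoms?",
]

transcript = run_chain(steps, fake_generate)
print(seen_prompts[-1])  # the last prompt contains the two earlier Q/A pairs
```

Passing the model call in as `generate` keeps the chaining logic testable without a running Ollama server.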

In [18]:
merged_df = merged_df.dropna(axis=1, how='all')
# Save the merged DataFrame to a single CSV file
merged_df.to_csv("synthea_data.csv", index=False)

In [19]:
len(list(merged_df.columns))

58

In [20]:
merged_df.columns

Index(['Id', 'BIRTHDATE', 'SSN', 'DRIVERS', 'PASSPORT', 'PREFIX', 'FIRST',
       'MIDDLE', 'LAST', 'MAIDEN', 'MARITAL', 'RACE', 'ETHNICITY', 'GENDER',
       'BIRTHPLACE', 'ADDRESS', 'CITY', 'STATE', 'COUNTY', 'FIPS', 'ZIP',
       'LAT', 'LON', 'HEALTHCARE_EXPENSES', 'HEALTHCARE_COVERAGE', 'INCOME',
       'START_x', 'PATIENT_x', 'ENCOUNTER_x', 'CODE_x', 'SYSTEM_x',
       'DESCRIPTION_x', 'TYPE', 'CATEGORY', 'REACTION1', 'DESCRIPTION1',
       'SEVERITY1', 'REACTION2', 'DESCRIPTION2', 'SEVERITY2', 'START_y',
       'STOP_y', 'PATIENT_y', 'ENCOUNTER_y', 'SYSTEM_y', 'CODE_y',
       'DESCRIPTION_y', 'START', 'STOP', 'PATIENT', 'ENCOUNTER', 'SYSTEM',
       'CODE', 'DESCRIPTION', 'BASE_COST', 'REASONCODE', 'REASONDESCRIPTION',
       'AGE'],
      dtype='object')

In [21]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3741530 entries, 0 to 3741529
Data columns (total 58 columns):
 #   Column               Dtype         
---  ------               -----         
 0   Id                   object        
 1   BIRTHDATE            datetime64[ns]
 2   SSN                  object        
 3   DRIVERS              object        
 4   PASSPORT             object        
 5   PREFIX               object        
 6   FIRST                object        
 7   MIDDLE               object        
 8   LAST                 object        
 9   MAIDEN               object        
 10  MARITAL              object        
 11  RACE                 object        
 12  ETHNICITY            object        
 13  GENDER               object        
 14  BIRTHPLACE           object        
 15  ADDRESS              object        
 16  CITY                 object        
 17  STATE                object        
 18  COUNTY               object        
 19  FIPS                 

In [22]:
import pandas as pd
import numpy as np
from ollama import Client

client = Client(host="http://localhost:11434")

In [23]:
# Define a function to generate prompts
def generate_prompts(df):
    prompts = []
    for index, row in df.iterrows():
        prompt = "Here are the details about a patient:\n\n"
        prompt += f"Patient Name: {row['FIRST']}  {row['LAST']}\n"
        prompt += f"GENDER: {row['GENDER']}\n"
        prompt += f"Age: {row['AGE']}\n"
        prompt += f"Description: {row['DESCRIPTION']} CODE: {row['CODE']}\n"
        # REASONDESCRIPTION is the text and REASONCODE is the code
        prompt += f"Reason: {row['REASONDESCRIPTION']} CODE: {row['REASONCODE']}\n"
        prompt += f"REACTIONs: REACTION: {row['REACTION1']}, DESCRIPTION: {row['DESCRIPTION1']}, SEVERITY1: {row['SEVERITY1']} \n"
        prompt += f"REACTIONs: REACTION: {row['REACTION2']}, DESCRIPTION: {row['DESCRIPTION2']}, SEVERITY2: {row['SEVERITY2']} \n"

        prompt += "Given the medical diagnosis codes, what do you infer about the patient?"
        prompts.append(prompt)
    return prompts

df = merged_df
new_df = df[df['REACTION1'].notnull()]
# Generate prompts
prompts = generate_prompts(new_df[10:11])
print(f"Prompt: {prompts} \n\n")
# Generate text summaries
for prompt in prompts:
    response = client.generate(model="llama3.2", prompt=prompt)
    print(response.response)
print("response completed")

Prompt: ['Here are the details about a patient:\n\nPatient Name: Caryl47  Kassulke119\nGENDER: F\nAge: 11\nDescription: Patient referral for dental care (procedure) CODE: 103697008\nReason: nan CODE: nan\nREACTIONs: REACTION: 878820003.0, DESCRIPTION: Rhinoconjunctivitis (disorder), SEVERITY1: MODERATE \nREACTIONs: REACTION: 247472004.0, DESCRIPTION: Wheal (finding), SEVERITY2: MILD \nGiven the medical diagnosis codes, what do you infer about the patient?'] 


Based on the provided information, I can make some inferences about the patient:

1. The patient is an 11-year-old female.
2. She requires dental care for a procedure with code 103697008, but the reason for this procedure is not specified.
3. She has two medical diagnoses:
   * Rhinoconjunctivitis (disorder) with moderate severity (REACTION: 878820003.0)
   * Wheal (finding) with mild severity (REACTION: 247472004.0)

The presence of both conditions, particularly the allergic reaction symptoms indicated by "Rhinoconjunctivitis
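The prompt above contains `nan` for the missing reason fields, which the model then has to interpret. A possible refinement, sketched here under the assumption that missing cells arrive as `None` or float NaN, is to leave such fields out of the prompt. `is_missing`, `build_prompt`, and the label/column pairs are names introduced for illustration; the sample values are taken from the prompt printed above.

```python
import math

def is_missing(value):
    """True for None and float NaN (how pandas represents empty cells)."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def build_prompt(row, fields):
    """Format 'Label: value' lines, skipping fields whose value is missing."""
    lines = ["Here are the details about a patient:", ""]
    for label, key in fields:
        value = row.get(key)
        if not is_missing(value):
            lines.append(f"{label}: {value}")
    lines.append("Given the medical diagnosis codes, what do you infer about the patient?")
    return "\n".join(lines)

# A plain dict stands in for one DataFrame row.
row = {
    "AGE": 11,
    "GENDER": "F",
    "DESCRIPTION": "Patient referral for dental care (procedure)",
    "REASONCODE": float("nan"),
    "REASONDESCRIPTION": float("nan"),
}
fields = [
    ("Age", "AGE"),
    ("Gender", "GENDER"),
    ("Procedure", "DESCRIPTION"),
    ("Reason code", "REASONCODE"),
    ("Reason", "REASONDESCRIPTION"),
]

prompt = build_prompt(row, fields)
print(prompt)
```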

In [24]:
def zero_shot_learning(df):
    # Implement zero-shot learning
    # Generate a prompt for each row in the dataframe
    prompts = [f"What is the likely outcome for a patient with {row['DESCRIPTION']}?" for index, row in df.iterrows()]
    
    print(f"Prompt:{prompts} \n\n")
    # Generate a response for each prompt
    responses = [client.generate(model="llama3.2", prompt=prompt).response for prompt in prompts]
    
    # Return the responses
    return "\n".join(responses)

In [25]:
def analogical_reasoning(df):
    # Implement analogical reasoning
    # Generate a prompt for each row in the dataframe
    prompts = [f"What is the analogy between {row['DESCRIPTION']} and a similar medical condition?" for index, row in df.iterrows()]
    print(f"Prompt:{prompts} \n\n")
    # Generate a response for each prompt
    responses = [client.generate(model="llama3.2", prompt=prompt).response for prompt in prompts]
    
    # Return the responses
    return "\n".join(responses)

In [26]:
def common_sense_reasoning(df):
    # Implement common sense reasoning
    # Generate a prompt for each row in the dataframe
    prompts = [f"What is the common sense approach to treating a patient with {row['DESCRIPTION']}?" for index, row in df.iterrows()]
    print(f"Prompt:{prompts} \n\n")
    # Generate a response for each prompt
    responses = [client.generate(model="llama3.2", prompt=prompt).response for prompt in prompts]
    
    # Return the responses
    return "\n".join(responses)

In [27]:
def chain_of_thought_reasoning(df):
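    # Implement chain-of-thought reasoning
    # Generate a prompt for each row in the dataframe, asking the model to reason step by step
    prompts = [f"What is the likely outcome for a patient with {row['DESCRIPTION']}? Think through the problem step by step before giving your final answer." for index, row in df.iterrows()]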
    print(f"Prompt:{prompts} \n\n")
    # Generate a response for each prompt
    responses = [client.generate(model="llama3.2", prompt=prompt).response for prompt in prompts]
    
    # Return the responses
    return "\n".join(responses)

In [28]:
def combine_reasoning_and_in_context_learning(df, reasoning_method):
    if reasoning_method == "zero_shot":
        # Implement zero-shot learning
        response = zero_shot_learning(df)
    elif reasoning_method == "analogical_reasoning":
        # Implement analogical reasoning
        response = analogical_reasoning(df)
    elif reasoning_method == "common_sense_reasoning":
        # Implement common sense reasoning
        response = common_sense_reasoning(df)
    else:
        # Default to chain-of-thought reasoning
        response = chain_of_thought_reasoning(df)

    return response
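The reasoning functions above differ only in their prompt template, so the `if`/`elif` dispatch can also be expressed as a dictionary lookup. The sketch below is one possible arrangement, not the notebook's implementation: `PROMPT_TEMPLATES`, `run_reasoning`, and `fake_generate` are hypothetical names, the chain-of-thought wording is an assumption, and the stub replaces a call such as `client.generate(model="llama3.2", prompt=p).response`.

```python
PROMPT_TEMPLATES = {
    "zero_shot": "What is the likely outcome for a patient with {description}?",
    "analogical_reasoning": "What is the analogy between {description} and a similar medical condition?",
    "common_sense_reasoning": "What is the common sense approach to treating a patient with {description}?",
    # Assumed wording: a chain-of-thought prompt asks for step-by-step reasoning.
    "chain_of_thought": "What is the likely outcome for a patient with {description}? Think step by step.",
}

def run_reasoning(descriptions, method, generate):
    """Build one prompt per description and join the model's answers."""
    # Unknown method names fall back to chain-of-thought, mirroring the else branch above.
    template = PROMPT_TEMPLATES.get(method, PROMPT_TEMPLATES["chain_of_thought"])
    prompts = [template.format(description=d) for d in descriptions]
    return "\n".join(generate(p) for p in prompts)

def fake_generate(prompt):
    """Stand-in for the model call so the sketch runs without a server."""
    return f"[stub response to: {prompt}]"

descriptions = ["Medication reconciliation (procedure)"]
print(run_reasoning(descriptions, "analogical_reasoning", fake_generate))
```

Adding a new reasoning style then only requires a new dictionary entry.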

In [29]:
response = combine_reasoning_and_in_context_learning(df[:2], "zero_shot")
print(response)

Prompt:['What is the likely outcome for a patient with Medication reconciliation (procedure)?', 'What is the likely outcome for a patient with Patient referral for dental care (procedure)?'] 


The likely outcome for a patient undergoing medication reconciliation, also known as Medication Reconciliation (MR), is improved safety and quality of care. The goal of MR is to verify the accuracy of a patient's medication regimen and identify any discrepancies or errors.

By comparing the patient's current medications with their previous medications, medical professionals can:

1. Identify potential medication errors or omissions.
2. Verify dosages, frequencies, and routes of administration.
3. Update medication lists to ensure consistency across all healthcare providers.
4. Reduce medication-related adverse events, such as overdose or underdose.

Studies have shown that MR can lead to significant improvements in patient safety, including:

1. Reduced medication errors: A study published in th

In [30]:
response = combine_reasoning_and_in_context_learning(df[:2], "analogical_reasoning")
print(response)

Prompt:['What is the analogy between Medication reconciliation (procedure) and a similar medical condition?', 'What is the analogy between Patient referral for dental care (procedure) and a similar medical condition?'] 


One possible analogy between Medication Reconciliation (MR) procedure and a similar medical condition is the concept of "Misadventure" in Critical Care.

In Critical Care, Misadventure refers to an adverse event or near miss that occurs during patient care, often due to a combination of factors such as human error, system failures, or equipment malfunctions. Like Medication Reconciliation, Misadventure can have significant consequences for patients, including harm, injury, or even death.

In both Medication Reconciliation and Misadventure, the following elements are present:

1. **Identification of errors**: In MR, errors in medication orders or administration are identified through a systematic process. Similarly, in Misadventure, errors or near misses are identified

In [31]:
response = combine_reasoning_and_in_context_learning(df[:2], "common_sense_reasoning")
print(response)

Prompt:['What is the common sense approach to treating a patient with Medication reconciliation (procedure)?', 'What is the common sense approach to treating a patient with Patient referral for dental care (procedure)?'] 


The common sense approach to treating a patient with medication reconciliation involves several steps:

1. **Review the patient's medical history**: Gather all relevant information about the patient's current and past medications, including dosages, frequencies, and potential allergies or sensitivities.
2. **Identify discrepancies**: Compare the patient's current medication list with their previous lists, laboratory results, and other healthcare records to identify any discrepancies or gaps in care.
3. **Prioritize the most critical medications**: Focus on identifying and resolving issues related to high-risk medications, such as warfarin or anticoagulants, opioids, or sedatives.
4. **Use a standardized process**: Follow a systematic approach to medication reconcili

In [32]:
# Define the Chain of Thought method
def chain_of_thought(patient_data):
    # Step 1: Analyze medical history
    # (after the merges above, DESCRIPTION_x holds the allergy description)
    medical_history = patient_data["DESCRIPTION_x"]
    prompt = f"Analyze the medical history: {medical_history}. Determine the risk of adverse reaction."
    response = client.generate(model="llama3.2", prompt=prompt)
    result = response.response
    
    # Step 2: Analyze demographic information
    age = patient_data["AGE"]
    prompt = f"Analyze the demographic information: age {age}. Determine the risk of adverse reaction."
    response = client.generate(model="llama3.2", prompt=prompt)
    result += " " + response.response
    
    # Step 3: Analyze medication history
    # (after the merges above, DESCRIPTION_y holds the condition description, not a medication)
    medication_history = patient_data["DESCRIPTION_y"]
    prompt = f"Analyze the medication history: {medication_history}. Determine the risk of adverse reaction."
    response = client.generate(model="llama3.2", prompt=prompt)
    result += " " + response.response
    
    return result

# Define the Tree of Thought method
def tree_of_thought(patient_data):
    # Node 1: Medical history
    # (after the merges above, DESCRIPTION_x holds the allergy description)
    medical_history = patient_data["DESCRIPTION_x"]
    prompt = f"Analyze the medical history: {medical_history}. Determine the risk of adverse reaction."
    response = client.generate(model="llama3.2", prompt=prompt)
    result = response.response
    
    # Node 1.1: Allergy type
    allergy_type = patient_data["DESCRIPTION1"]
    if pd.notna(allergy_type):
        prompt = f"Analyze the allergy type: {allergy_type}. Determine the risk of adverse reaction."
        response = client.generate(model="llama3.2", prompt=prompt)
        result += " " + response.response
    
    # Node 2: Demographic information
    age = patient_data["AGE"]
    prompt = f"Analyze the demographic information: age {age}. Determine the risk of adverse reaction."
    response = client.generate(model="llama3.2", prompt=prompt)
    result += " " + response.response
    
    return result

In [33]:
df = merged_df[:2].copy()  # two rows to speed up execution; .copy() avoids SettingWithCopyWarning when adding columns
len(df)

2

In [34]:
# Apply the Chain of Thought method
df.loc[:, "CHAIN_OF_THOUGHT_RISK"] = df.apply(chain_of_thought, axis=1)

In [35]:
# Apply the Tree of Thought method
df.loc[:, "TREE_OF_THOUGHT_RISK"] = df.apply(tree_of_thought, axis=1)


In [36]:
pd.set_option('display.max_colwidth', None)  # Set to None

# Print the results
print(df[["CHAIN_OF_THOUGHT_RISK"]].head())

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        

In [37]:
print(df[["TREE_OF_THOUGHT_RISK"]].head())

                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        

# Bonus

In [38]:
df = merged_df[merged_df['DESCRIPTION'].notnull()][:100].copy()  # copy so columns can be added safely later
len(df)

100

In [39]:
pd.set_option('display.max_colwidth', None)

In [40]:
df[['AGE','REASONDESCRIPTION', 'DESCRIPTION', 'REASONCODE']]

Unnamed: 0,AGE,REASONDESCRIPTION,DESCRIPTION,REASONCODE
0,11,,Medication reconciliation (procedure),
1,11,,Patient referral for dental care (procedure),
2,11,Patient referral for dental care (procedure),Dental consultation and report (procedure),103697008.0
3,11,Patient referral for dental care (procedure),Dental care (regime/therapy),103697008.0
4,11,Patient referral for dental care (procedure),Removal of supragingival plaque and calculus from all teeth using dental instrument (procedure),103697008.0
...,...,...,...,...
95,11,,Patient referral for dental care (procedure),
96,11,Patient referral for dental care (procedure),Dental consultation and report (procedure),103697008.0
97,11,Patient referral for dental care (procedure),Dental care (regime/therapy),103697008.0
98,11,Patient referral for dental care (procedure),Removal of supragingival plaque and calculus from all teeth using dental instrument (procedure),103697008.0


In [41]:
from ollama import Client

# Initialize the Ollama client
client = Client(host="http://localhost:11434")

def classify_patient(patient_data):
    # Build a prompt from the patient's age, procedure, and reason fields
    prompt = f"You are a medical risk evaluator. Given a patient age: {patient_data['AGE']}, procedure description: {patient_data['DESCRIPTION']}, "
    prompt += f"reason code: {patient_data['REASONCODE']}, reason description: {patient_data['REASONDESCRIPTION']},"
    prompt += " classify the patient as 'high-risk' or 'low-risk'. To keep responses fast, limit your response to 'high-risk' or 'low-risk'."
    response = client.generate(model="llama3.2", prompt=prompt)

    # Classify by looking for the label in the lowercased response text
    generated_text = response.response.lower()
    if "high-risk" in generated_text:
        return "high-risk"
    elif "low-risk" in generated_text:
        return "low-risk"
    else:
        return "unknown"

# Apply the classification function
df.loc[:, "CLASSIFICATION"] = df.apply(classify_patient, axis=1)

# Print the results
print(df[["AGE", "DESCRIPTION", "REASONCODE","REASONDESCRIPTION", "CLASSIFICATION"]].head())

   AGE  \
0   11   
1   11   
2   11   
3   11   
4   11   

                                                                                       DESCRIPTION  \
0                                                            Medication reconciliation (procedure)   
1                                                     Patient referral for dental care (procedure)   
2                                                       Dental consultation and report (procedure)   
3                                                                     Dental care (regime/therapy)   
4  Removal of supragingival plaque and calculus from all teeth using dental instrument (procedure)   

    REASONCODE                             REASONDESCRIPTION CLASSIFICATION  
0          NaN                                           NaN      high-risk  
1          NaN                                           NaN       low-risk  
2  103697008.0  Patient referral for dental care (procedure)       low-risk  
3  103697008.0

In [42]:
df['CLASSIFICATION'].value_counts()

CLASSIFICATION
low-risk     58
high-risk    42
Name: count, dtype: int64
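The substring check in `classify_patient` returns "high-risk" whenever that phrase appears, even in a reply such as "not high-risk, so low-risk", and it misses spellings like "high risk". A slightly stricter parser is sketched below as an assumption, not as the notebook's method; `parse_risk_label` is a hypothetical helper that returns "unknown" when the reply contains no label or both labels.

```python
import re

# Matches "high-risk", "high risk", "low-risk", "low risk" (case-insensitive).
LABEL_PATTERN = re.compile(r"\b(high|low)[\s-]?risk\b", re.IGNORECASE)

def parse_risk_label(text):
    """Return 'high-risk', 'low-risk', or 'unknown' (no label or conflicting labels)."""
    found = {match.group(1).lower() + "-risk" for match in LABEL_PATTERN.finditer(text)}
    if len(found) == 1:
        return found.pop()
    return "unknown"

print(parse_risk_label("High risk."))
print(parse_risk_label("This is not high-risk; I would say low-risk."))
print(parse_risk_label("I cannot determine that."))
```

Rows that come back as "unknown" can then be re-prompted or reviewed instead of being silently mislabeled.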

In [43]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.impute import SimpleImputer
import numpy as np

# Handle missing values
df['REASONCODE'] = df['REASONCODE'].fillna('Unknown')
df['REASONDESCRIPTION'] = df['REASONDESCRIPTION'].fillna('Unknown')

# Split the data into training and validation sets
# (test_size=0.8 holds out 80% of the rows for validation, so only 20% are used for training)
train_df, val_df = train_test_split(df, test_size=0.8, random_state=42)

# Define the feature columns
numeric_features = ['AGE']
text_features = ['DESCRIPTION', 'REASONCODE', 'REASONDESCRIPTION']
target_variable = 'CLASSIFICATION'

# Create separate dataframes for numeric and text features
numeric_train_df = train_df[numeric_features]
text_train_df = train_df[text_features]

numeric_val_df = val_df[numeric_features]
text_val_df = val_df[text_features]

# Create a preprocessing pipeline for numeric features
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),  # Impute missing values with the median
    ('scaler', StandardScaler())
])

# Create a preprocessing pipeline for text features
text_transformer = Pipeline(steps=[
    ('vectorizer', TfidfVectorizer()),
])

# Create a pipeline that includes the preprocessing step and the logistic regression model
numeric_model = Pipeline(steps=[
    ('preprocessor', numeric_transformer),
    ('classifier', LogisticRegression())
])

text_model = Pipeline(steps=[
    ('preprocessor', text_transformer),
    ('classifier', LogisticRegression())
])

# Ensure that text_train_df and y_train have the same number of rows
text_train_df = text_train_df.dropna(subset=text_features)  # Drop rows with missing values
y_train = train_df.loc[text_train_df.index, target_variable]  # Select rows with matching index

# Train the model on the training data
X_text_train = text_train_df.values.tolist()
X_text_train = [' '.join(map(str, row)) for row in X_text_train]

numeric_model.fit(numeric_train_df, y_train)
text_model.fit(X_text_train, y_train)

# Make predictions on the validation data
y_val = val_df[target_variable]
numeric_pred = numeric_model.predict(numeric_val_df)

X_text_val = text_val_df.values.tolist()
X_text_val = [' '.join(map(str, row)) for row in X_text_val]
text_pred = text_model.predict(X_text_val)

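# Combine the predictions from the numeric and text models
# (both branches of np.where return numeric_pred, so y_pred is always the numeric model's prediction
# and the text model does not influence the result)
y_pred = np.where(numeric_pred == text_pred, numeric_pred, numeric_pred)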

# Evaluate the model's performance
accuracy = accuracy_score(y_val, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_val, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_val, y_pred))

Accuracy: 0.5625
Classification Report:
              precision    recall  f1-score   support

   high-risk       0.00      0.00      0.00        35
    low-risk       0.56      1.00      0.72        45

    accuracy                           0.56        80
   macro avg       0.28      0.50      0.36        80
weighted avg       0.32      0.56      0.40        80

Confusion Matrix:
[[ 0 35]
 [ 0 45]]
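The confusion matrix shows that every validation row was predicted as low-risk, so the reported accuracy equals the share of the majority class. The arithmetic can be checked by hand; the sketch below recomputes it from the matrix above with plain Python, assuming scikit-learn's convention of rows for true labels and columns for predicted labels.

```python
labels = ["high-risk", "low-risk"]
# Rows are true labels, columns are predicted labels (values copied from the output above).
cm = [
    [0, 35],
    [0, 45],
]

total = sum(sum(row) for row in cm)
accuracy = sum(cm[i][i] for i in range(len(labels))) / total
majority_baseline = max(sum(row) for row in cm) / total

print(f"accuracy = {accuracy}")                  # 45 / 80 = 0.5625
print(f"majority baseline = {majority_baseline}")  # also 0.5625

for i, label in enumerate(labels):
    predicted = sum(cm[r][i] for r in range(len(labels)))  # column sum
    actual = sum(cm[i])                                     # row sum
    precision = cm[i][i] / predicted if predicted else 0.0
    recall = cm[i][i] / actual if actual else 0.0
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

Because the accuracy matches the majority baseline, this run gives no evidence that the classifier learned anything beyond the class frequencies.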


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
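
Because the `np.where` call in the cell above returns `numeric_pred` in both branches, the text model never influences `y_pred`. One common alternative is soft voting: average the per-class probabilities of the two models and pick the most likely class. The sketch below shows the idea on made-up probabilities. The helper name `soft_vote` and the sample numbers are hypothetical; in this notebook the inputs would presumably come from `numeric_model.predict_proba(numeric_val_df)`, `text_model.predict_proba(X_text_val)` and `numeric_model.classes_`, which is an assumption about the wiring and not the author's method.

```python
def soft_vote(proba_a, proba_b, classes):
    """Average two per-class probability tables row by row and return the top class per row."""
    predictions = []
    for row_a, row_b in zip(proba_a, proba_b):
        averaged = [(a + b) / 2 for a, b in zip(row_a, row_b)]
        best = max(range(len(averaged)), key=averaged.__getitem__)
        predictions.append(classes[best])
    return predictions


# Made-up probabilities for three samples; columns follow the order of `classes`.
classes = ["high-risk", "low-risk"]
numeric_proba = [[0.45, 0.55], [0.30, 0.70], [0.48, 0.52]]
text_proba = [[0.80, 0.20], [0.40, 0.60], [0.40, 0.60]]

combined = soft_vote(numeric_proba, text_proba, classes)
print(combined)
```

In the first sample the numeric model leans slightly towards `low-risk`, but the text model's confident `high-risk` vote outweighs it, which is exactly the kind of disagreement the original combination step discards.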


# PMC_LLaMA_13B

In [2]:
import transformers
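# force_download=True re-downloads every file even if a cached copy exists;
# drop it once the first download has succeeded.
tokenizer = transformers.LlamaTokenizer.from_pretrained('axiong/PMC_LLaMA_13B', force_download=True)
model = transformers.LlamaForCausalLM.from_pretrained('axiong/PMC_LLaMA_13B', force_download=True)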




In [3]:
prompt_input = (
    'Below is an instruction that describes a task, paired with an input that provides further context.'
    'Write a response that appropriately completes the request.\n\n'
    '### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:'
)

example = {
    "instruction": "You're a doctor, kindly address the medical queries according to the patient's account. Answer with the best option directly.",
    "input": (
        "###Question: A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. "
        "She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. "
        "She otherwise feels well and is followed by a doctor for her pregnancy. "
        "Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air."
        "Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. "
        "Which of the following is the best treatment for this patient?"
        "###Options: A. Ampicillin B. Ceftriaxone C. Doxycycline D. Nitrofurantoin"
    )
}

input_str = [prompt_input.format_map(example)]

model_inputs = tokenizer(
    input_str,
    return_tensors='pt',
    padding=True,
)
print( f"\033[32mmodel_inputs\033[0m: { model_inputs }" )

model_inputs: {'input_ids': tensor([[    1, 13866,   338,   385, 15278,   393, 16612,   263,  3414, 29892,
          3300,  2859,   411,   385,  1881,   393,  8128,  4340,  3030, 29889,
          6113,   263,  2933,   393,  7128,  2486,  1614,  2167,   278,  2009,
         29889,    13,    13,  2277, 29937,  2799,  4080, 29901,    13,  3492,
         29915,   276,   263, 11619, 29892, 25036,  3211,   278, 16083,  9365,
          5034,   304,   278, 16500, 29915, 29879,  3633, 29889,   673,   411,
           278,  1900,  2984,  4153, 29889,    13,    13,  2277, 29937, 10567,
         29901,    13,  2277, 29937, 16492, 29901,   319, 29871, 29906, 29941,
         29899,  6360, 29899,  1025,   758,  5138,   424,  6114,   472, 29871,
         29906, 29906, 11405,  7737,   362, 22981,   411, 25535,  2501,  5065,
          3381, 29889,  2296,  5922,   372,  4687, 29871, 29896,  2462,  8020,
           322,   756,  1063,   281,   943,  8333, 15020, 13748,   292,   901,
          4094,
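
The template in `In [3]` is built from adjacent string literals, which Python joins with no separator, so the rendered prompt reads `context.Write` with no space between the two sentences. A minimal sketch of that behavior with a shortened template; the template text and example values are illustrative stand-ins, not the notebook's prompt.

```python
# Shortened stand-in for the notebook's template (the wording here is hypothetical).
template = (
    'Sentence one ends here.'          # no trailing space, so the next literal is glued on
    'Sentence two starts here.\n\n'
    '### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:'
)

example = {
    "instruction": "Answer with the best option directly.",
    "input": "Which option is correct?",
}

# format_map fills the {instruction} and {input} placeholders from the dict.
prompt = template.format_map(example)
print(prompt)
```

Whether to add the missing spaces is a judgment call: the fine-tuned model may expect the prompt exactly as written, so it is worth comparing outputs before changing the template.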

In [4]:
%%time
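# top_k only affects decoding when sampling is enabled (do_sample=True),
# either here or in the model's generation config; otherwise it is ignored.
topk_output = model.generate(
    model_inputs.input_ids,  # use model_inputs.input_ids.cuda() if the model is on a GPU
    max_new_tokens=1000,
    top_k=50
)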
output_str = tokenizer.batch_decode(topk_output)
print('model predict: ', output_str[0])



model predict:  <s> Below is an instruction that describes a task, paired with an input that provides further context.Write a response that appropriately completes the request.

### Instruction:
You're a doctor, kindly address the medical queries according to the patient's account. Answer with the best option directly.

### Input:
###Question: A 23-year-old pregnant woman at 22 weeks gestation presents with burning upon urination. She states it started 1 day ago and has been worsening despite drinking more water and taking cranberry extract. She otherwise feels well and is followed by a doctor for her pregnancy. Her temperature is 97.7°F (36.5°C), blood pressure is 122/77 mmHg, pulse is 80/min, respirations are 19/min, and oxygen saturation is 98% on room air.Physical exam is notable for an absence of costovertebral angle tenderness and a gravid uterus. Which of the following is the best treatment for this patient?###Options: A. Ampicillin B. Ceftriaxone C. Doxycycline D. Nitrofurantoi