## Extract medications

This example uses LM Studio and an off-the-shelf LLM to extract medications from a patient-doctor dialogue into a structured, machine-readable format (i.e., JSON).

### Load data

In [1]:
import pandas as pd

# Load the MTS-Dialog training set; the first column (ID) becomes the index
ds = pd.read_csv(
    filepath_or_buffer="clinical_notes_corpus/data/mts-dialog/MTS_Dataset_TrainingSet.csv",
    index_col=0
)
ds.head(5)


| ID | section_header | section_text | dialogue |
|---:|---|---|---|
| 0 | GENHX | The patient is a 76-year-old white female who ... | Doctor: What brings you back into the clinic t... |
| 1 | GENHX | The patient is a 25-year-old right-handed Cauc... | Doctor: How're you feeling today? \nPatient: ... |
| 2 | GENHX | This is a 22-year-old female, who presented to... | Doctor: Hello, miss. What is the reason for yo... |
| 3 | MEDICATIONS | Prescribed medications were Salmeterol inhaler... | Doctor: Are you taking any over the counter me... |
| 4 | CC | Burn, right arm. | Doctor: Hi, how are you? \nPatient: I burned m... |


In [2]:
ds["section_header"].unique()


array(['GENHX', 'MEDICATIONS', 'CC', 'PASTMEDICALHX', 'ALLERGY',
       'FAM/SOCHX', 'PASTSURGICAL', 'OTHER_HISTORY', 'ASSESSMENT', 'ROS',
       'DISPOSITION', 'EXAM', 'PLAN', 'DIAGNOSIS', 'EDCOURSE',
       'IMMUNIZATIONS', 'LABS', 'IMAGING', 'PROCEDURES', 'GYNHX'],
      dtype=object)

In [3]:
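header = "MEDICATIONS"
record = 4  # position within the MEDICATIONS rows, not the global ID
# Select the dialogue of the chosen MEDICATIONS record
dialogue = ds.query(f"section_header=='{header}'")["dialogue"].iloc[record]
# Print each dialogue line with a zero-based, three-digit line number
for i, line in enumerate(dialogue.splitlines()):
    print(f"{i:03d} {line}")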


000 Doctor: I am looking at the list of your medication.
001 Patient: Okay.
002 Doctor: All I can see is Risperdal and Invega.
003 Patient: I think that is correct.
004 Doctor: Are you not taking anything else?
005 Patient: Yes, I take some multivitamins, but do I need to mention it?
006 Doctor: Yes, you should. We need to know everything.
007 Patient: Okay.
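The model receives the raw dialogue without the `000`-style prefixes printed above, so it has to count the lines itself, and nothing in the prompt says whether counting starts at 0 or 1. If you are free to change the prompt, one option is to send the numbered lines instead. The sketch below assumes that; `number_lines` is a hypothetical helper, not part of pandas or LM Studio.

```python
def number_lines(text: str) -> str:
    """Hypothetical helper: prefix each line with a zero-based, three-digit index,
    using the same format as the printout above."""
    return "\n".join(f"{i:03d} {line}" for i, line in enumerate(text.splitlines()))


sample = "Doctor: I am looking at the list of your medication.\nPatient: Okay."
print(number_lines(sample))
# 000 Doctor: I am looking at the list of your medication.
# 001 Patient: Okay.
```

The numbered text could then take the place of the raw dialogue in the prompt, for example `f"[DIALOGUE]\n{number_lines(dialogue)}"`. The model can then copy the line numbers instead of counting them.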


### Load the LLM

In [4]:
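import lmstudio as lms

# Get a handle to the model from the locally running LM Studio instance
# (the model must already be downloaded in LM Studio)
model = lms.llm("llama-3.3-70b-instruct")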


### Query the LLM

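The `system_prompt` string holds the instructions that tell the LLM to extract medications in JSON format. Despite its name, it is not sent as a separate system message. It is prepended to the dialogue in the same user prompt.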

In [5]:
system_prompt = (
    "Your job is to extract medication information from a conversation between a Doctor and a Patient "
    "(tagged [DIALOGUE]) into structured JSON format. "
    "For each medication found, populate the keys 'medication_name' and 'line', where 'line' is "
    "the dialogue line number on which the medication was mentioned. "
    "Return only the JSON string without additional tags."
)


Next, we send the instructions together with the dialogue to the LLM and stream back its response.

In [6]:
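# Combine the instructions and the tagged dialogue into a single prompt
prompt = (
    f"{system_prompt}\n"
    f"[DIALOGUE]\n{dialogue}"
)

# Print fragments as they arrive and accumulate the full response text
output = ""
for fragment in model.respond_stream(prompt):
    print(fragment.content, end="", flush=True)
    output += fragment.content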


[
  {
    "medication_name": "Risperdal",
    "line": 2
  },
  {
    "medication_name": "Invega",
    "line": 2
  },
  {
    "medication_name": "multivitamins",
    "line": 5
  }
]
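LLMs can misreport line numbers or return names that never appear in the text. Before relying on the extracted items, it may be worth checking each one against the dialogue. The sketch below uses literal copies of this example's dialogue and model reply. `check_mentions` is a hypothetical helper. Its case-insensitive substring match is a deliberately simple assumption and will not catch misspellings or brand/generic synonyms.

```python
import json

# Literal copies of this example's dialogue and model reply (assumed inputs for the sketch)
sample_dialogue = (
    "Doctor: I am looking at the list of your medication.\n"
    "Patient: Okay.\n"
    "Doctor: All I can see is Risperdal and Invega.\n"
    "Patient: I think that is correct.\n"
    "Doctor: Are you not taking anything else?\n"
    "Patient: Yes, I take some multivitamins, but do I need to mention it?\n"
    "Doctor: Yes, you should. We need to know everything.\n"
    "Patient: Okay."
)
sample_output = """[
  {"medication_name": "Risperdal", "line": 2},
  {"medication_name": "Invega", "line": 2},
  {"medication_name": "multivitamins", "line": 5}
]"""


def check_mentions(items, text):
    """Hypothetical helper: for each item, report whether its name occurs
    (case-insensitively) on the zero-based line the model gave."""
    lines = text.splitlines()
    results = []
    for item in items:
        name, idx = item["medication_name"], item["line"]
        found = (
            isinstance(idx, int)
            and 0 <= idx < len(lines)
            and name.lower() in lines[idx].lower()
        )
        results.append((name, idx, found))
    return results


for name, idx, found in check_mentions(json.loads(sample_output), sample_dialogue):
    print(f"{name!r} on line {idx}: {'ok' if found else 'NOT FOUND'}")
# 'Risperdal' on line 2: ok
# 'Invega' on line 2: ok
# 'multivitamins' on line 5: ok
```

A mismatch does not necessarily mean the extraction is wrong (the model may have counted from 1, for instance), but it flags items worth a manual look.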

The final step is to parse the JSON text into Python objects (here, a list of dictionaries).

In [7]:
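import json
from pprint import pprint

# Parse the accumulated response text into Python objects
output_json = json.loads(output)
# pprint sorts dictionary keys by default, so 'line' is printed before 'medication_name'
pprint(output_json)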


[{'line': 2, 'medication_name': 'Risperdal'},
 {'line': 2, 'medication_name': 'Invega'},
 {'line': 5, 'medication_name': 'multivitamins'}]
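`json.loads` raises `json.JSONDecodeError` if the model ignores the instruction and wraps its answer in a Markdown code fence, which local models sometimes do. The sketch below parses more defensively. `parse_medications` is a hypothetical helper. Stripping one surrounding fence and checking for exactly the two requested keys are assumptions about which failure modes matter, not a complete guard.

```python
import json
import re

# Matches a reply wrapped in a single ```json ... ``` (or plain ```) fence
FENCE = re.compile(r"^\s*`{3}(?:json)?\s*\n(.*?)\n\s*`{3}\s*$", re.DOTALL)


def parse_medications(text: str) -> list[dict]:
    """Hypothetical helper: parse the model's reply into a list of
    {'medication_name': str, 'line': int} dictionaries."""
    match = FENCE.match(text)
    if match:  # drop the surrounding code fence if present
        text = match.group(1)
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError(f"expected a JSON array, got {type(data).__name__}")
    for item in data:
        if not isinstance(item, dict) or set(item) != {"medication_name", "line"}:
            raise ValueError(f"unexpected item: {item!r}")
        if not isinstance(item["medication_name"], str) or not isinstance(item["line"], int):
            raise ValueError(f"wrong value types: {item!r}")
    return data


reply = "`" * 3 + 'json\n[{"medication_name": "Invega", "line": 2}]\n' + "`" * 3
print(parse_medications(reply))
# [{'medication_name': 'Invega', 'line': 2}]
```

`json.JSONDecodeError` is a subclass of `ValueError`, so a single `except ValueError` around the call handles both malformed JSON and an unexpected structure.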
