## Extract medications and match to SNOMED CT-AU (AMT)

This example uses LM Studio and an off-the-shelf large language model (LLM) to extract medications from a patient-doctor dialogue into a structured, machine-readable format (i.e. JSON).

Building on the previous notebook [02_extract_medications](examples/02_extract_medications.ipynb), we then attempt to match the extracted medications to SNOMED CT-AU concept codes.

Tools used:
- Python Data Analysis Library (Pandas) ([link](https://pandas.pydata.org/))
- LM Studio Python SDK ([link](https://lmstudio.ai/docs/python)). Note that the LM Studio desktop application must be running with its local server enabled.

Data used:
- Clinical visit note summarisation corpus ([link](https://github.com/microsoft/clinical_visit_note_summarization_corpus))
- SNOMED CT-AU with AMT ([link](https://www.digitalhealth.gov.au/healthcare-providers/product-releases/snomed-ct-au-with-australian-medicines-terminology-amt-march-2025-release))

New to SNOMED CT-AU? Check out CSIRO's Shrimp Browser ([link](https://ontoserver.csiro.au/shrimp/?concept=138875005&valueset=http://snomed.info/sct?fhir_vs&fhir=https://tx.ontoserver.csiro.au/fhir))

In [1]:
# This cell loads required packages
import pandas as pd
import lmstudio as lms


### Step 1: Load data as a Pandas DataFrame

In [2]:
df = pd.read_csv(
    filepath_or_buffer="clinical_notes_corpus/data/mts-dialog/MTS_Dataset_TrainingSet.csv",
    index_col=0
)
df.head(5)


ID,section_header,section_text,dialogue
0,GENHX,The patient is a 76-year-old white female who ...,Doctor: What brings you back into the clinic t...
1,GENHX,The patient is a 25-year-old right-handed Cauc...,Doctor: How're you feeling today? \nPatient: ...
2,GENHX,"This is a 22-year-old female, who presented to...","Doctor: Hello, miss. What is the reason for yo..."
3,MEDICATIONS,Prescribed medications were Salmeterol inhaler...,Doctor: Are you taking any over the counter me...
4,CC,"Burn, right arm.","Doctor: Hi, how are you? \nPatient: I burned m..."


### Step 2: Select data

The dataset contains dialogues from 20 different section types, identified by the `section_header` column.

In [3]:
df["section_header"].unique()


array(['GENHX', 'MEDICATIONS', 'CC', 'PASTMEDICALHX', 'ALLERGY',
       'FAM/SOCHX', 'PASTSURGICAL', 'OTHER_HISTORY', 'ASSESSMENT', 'ROS',
       'DISPOSITION', 'EXAM', 'PLAN', 'DIAGNOSIS', 'EDCOURSE',
       'IMMUNIZATIONS', 'LABS', 'IMAGING', 'PROCEDURES', 'GYNHX'],
      dtype=object)

In this cell, we select an example dialogue from the dataset.

In [4]:
# Section type to select
header = "MEDICATIONS"

# Zero-based position of the record among the rows of that section type
record = 4

# Fetch the dialogue text for that record
dialogue = df.query(f"section_header=='{header}'")["dialogue"].iloc[record]

# Print the dialogue line by line, prefixed with zero-based line numbers
for i, line in enumerate(dialogue.splitlines()):
    print(f"{i:03d} {line}")


000 Doctor: I am looking at the list of your medication.
001 Patient: Okay.
002 Doctor: All I can see is Risperdal and Invega.
003 Patient: I think that is correct.
004 Doctor: Are you not taking anything else?
005 Patient: Yes, I take some multivitamins, but do I need to mention it?
006 Doctor: Yes, you should. We need to know everything.
007 Patient: Okay.


### Step 3: Load the LLM

In this cell, we connect to a model that has already been loaded in the LM Studio desktop application. Make sure that the LM Studio application is running and its server is turned on!

In [5]:
model = lms.llm()


### Step 4: Use the LLM to extract medications

The **system_prompt** instructs the LLM to extract the medications mentioned in the dialogue and return them in JSON format.

In [6]:
system_prompt = "Your job is to extract information of medications from a conversation between a Doctor and Patient (tagged [DIALOGUE]) into structured JSON format. For each medication found, populate the keys 'medication_name', 'source'. 'source' is the line number from the dialogue in the format '(xx line yy)' where xx is either Patient or Doctor and yy is the line number in the dialogue where the medication was found. Return only the JSON string without additional tags."
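The `'(xx line yy)'` format requested in the prompt is regular enough to be parsed back into a speaker and a line number. The following is a minimal sketch, assuming the model follows the format exactly; `parse_source` is a hypothetical helper and is not part of this notebook or of any library.

```python
import re

# Pattern for the '(xx line yy)' format requested in the system prompt
SOURCE_PATTERN = re.compile(r"^\((Doctor|Patient) line (\d+)\)$")


def parse_source(source):
    """Return (speaker, line_number), or None if the string does not match."""
    match = SOURCE_PATTERN.match(source.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


print(parse_source("(Doctor line 3)"))  # a well-formed value
print(parse_source("Doctor, line 3"))   # a malformed value gives None
```

Returning `None` rather than raising lets a caller decide how to treat records whose `source` the model formatted differently.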


Here we send the instructions and the dialogue to the LLM, then stream the response and collect it as a JSON string.

In [7]:
prompt = (
    f"{system_prompt}\n"
    f"[DIALOGUE]\n{dialogue}"
)

output = ""
for fragment in model.respond_stream(prompt, config={"temperature": 0.0}):
    print(fragment.content, end="", flush=True)
    output += fragment.content


[
  {
    "medication_name": "Risperdal",
    "source": "(Doctor line 3)"
  },
  {
    "medication_name": "Invega",
    "source": "(Doctor line 3)"
  },
  {
    "medication_name": "multivitamins",
    "source": "(Patient line 6)"
  }
]
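The `source` values in this output can be checked against the dialogue. Note that the dialogue printed in Step 2 is numbered from 0, while the model's references line up with counting from 1 (Risperdal is printed on line 002 and cited as line 3). The sketch below assumes one-based references; `is_grounded` is a hypothetical helper, and the dialogue and records are copied from the cells above.

```python
import re

dialogue_lines = [
    "Doctor: I am looking at the list of your medication.",
    "Patient: Okay.",
    "Doctor: All I can see is Risperdal and Invega.",
    "Patient: I think that is correct.",
    "Doctor: Are you not taking anything else?",
    "Patient: Yes, I take some multivitamins, but do I need to mention it?",
    "Doctor: Yes, you should. We need to know everything.",
    "Patient: Okay.",
]
extracted = [
    {"medication_name": "Risperdal", "source": "(Doctor line 3)"},
    {"medication_name": "Invega", "source": "(Doctor line 3)"},
    {"medication_name": "multivitamins", "source": "(Patient line 6)"},
]


def is_grounded(record, lines):
    """True if the cited line exists, has the cited speaker, and mentions the name."""
    match = re.fullmatch(r"\((Doctor|Patient) line (\d+)\)", record["source"])
    if match is None:
        return False
    speaker, number = match.group(1), int(match.group(2))
    if not 1 <= number <= len(lines):
        return False
    line = lines[number - 1]  # assumption: the model counts lines from 1
    name = record["medication_name"].lower()
    return line.startswith(f"{speaker}:") and name in line.lower()


for record in extracted:
    print(record["medication_name"], is_grounded(record, dialogue_lines))
```

A record that fails this check is not necessarily wrong, but it is a reasonable candidate for manual review.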

Next, we convert the JSON text into a machine-readable format (i.e. a list of dictionaries).

In [8]:
import json
from pprint import pprint

medications = json.loads(output)
pprint(medications)


[{'medication_name': 'Risperdal', 'source': '(Doctor line 3)'},
 {'medication_name': 'Invega', 'source': '(Doctor line 3)'},
 {'medication_name': 'multivitamins', 'source': '(Patient line 6)'}]
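`json.loads` raises `json.JSONDecodeError` when the reply is not bare JSON. Some models wrap their answer in a Markdown code fence even when told not to, so a more tolerant parser may be useful. The sketch below is one possible approach, not the method used in this notebook; `parse_llm_json` is a hypothetical helper.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to avoid nesting fences


def parse_llm_json(text):
    """Parse JSON from an LLM reply, tolerating an optional Markdown fence."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (the fence plus an optional language tag)
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        # Drop the closing fence, if present
        if cleaned.rstrip().endswith(FENCE):
            cleaned = cleaned.rstrip()[: -len(FENCE)]
    return json.loads(cleaned)


reply = FENCE + 'json\n[{"medication_name": "Invega", "source": "(Doctor line 3)"}]\n' + FENCE
print(parse_llm_json(reply))
```

Bare JSON passes through unchanged, so the helper could stand in for the plain `json.loads(output)` call.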


### Step 5: Load SNOMED concept codes

In [9]:
from src.snomed_ct import get_concept_codes, CONCEPT_GROUP

# We use the same process as described in the notebook:
# examples/03_snomed_ct_amt.ipynb
concept_codes = get_concept_codes(
    desc_path="ncts_sct_rf2/Full/Terminology/sct2_Description_Full-en-au_AU1000036_20250331.txt",
    rel_path="ncts_sct_rf2/Full/Terminology/sct2_Relationship_Full_AU1000036_20250331.txt",
    group=CONCEPT_GROUP.PRODUCT_NAME,
    limit=None  # Use None to return all concepts (no limit)
)

print(concept_codes.shape)
print(concept_codes.columns)
concept_codes.head(5)


(13684, 2)
Index(['conceptId', 'term'], dtype='object')


,conceptId,term
39007292,391000036103,Anastrozole (Chemmart) (trade product)
39007484,401000036100,Letrozole (Chemmart) (trade product)
39007634,421000036105,Quetiapine (Chemmart) (trade product)
39007696,431000036107,Anzole (trade product)
39007770,421000168103,Wart Removal System (Scholl) (trade product)
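Every term shown above ends with what appears to be the semantic tag `(trade product)`. Removing it leaves just the product name, which can make later comparisons easier to read. This is a minimal sketch under that assumption; `strip_semantic_tag` is a hypothetical helper, and the sample terms are taken from the table above.

```python
import re

# Matches a trailing " (trade product)" tag, including the space before it
SEMANTIC_TAG = re.compile(r"\s*\(trade product\)$")


def strip_semantic_tag(term):
    """Remove a trailing '(trade product)' tag; other parentheses are kept."""
    return SEMANTIC_TAG.sub("", term)


terms = [
    "Anastrozole (Chemmart) (trade product)",
    "Anzole (trade product)",
]
print([strip_semantic_tag(term) for term in terms])
```

Only the final tag is removed, so a manufacturer name in parentheses such as `(Chemmart)` is preserved.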


### Step 6: Match medication names to concept codes

In this cell, we:
- Loop through each of the extracted medications,
- Use fuzzy matching to find the five most similar concept terms, then
- Look up the concept ID.
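To illustrate the idea behind these steps without any third-party dependency, the sketch below ranks candidate terms with the standard library's `difflib.SequenceMatcher`. This is a stand-in for illustration only: its scores are computed differently from those of `rapidfuzz`, so the numbers will not match the output further down. `top_matches` is a hypothetical helper that mimics the `(choice, score, index)` tuples used in the cell below.

```python
from difflib import SequenceMatcher


def top_matches(query, choices, limit=5):
    """Return (choice, score, index) tuples, highest score first."""
    scored = [
        (choice, 100 * SequenceMatcher(None, query.lower(), choice.lower()).ratio(), index)
        for index, choice in enumerate(choices)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]


choices = [
    "Invast (trade product)",
    "Invega (trade product)",
    "Quetiapine (Chemmart) (trade product)",
]
best = top_matches("Invega", choices, limit=1)[0]
print(best[0], best[2])  # best-matching term and its position in `choices`
```

The returned index is a position in the list of choices, which is why the notebook looks up the concept ID by position.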

In [10]:
from rapidfuzz.process import extract

# Loop through each medication
for medication in medications:
    query = medication["medication_name"]
    print(f"Extracted medication: {query}")

    # Use fuzzy matching to find the five most similar concept terms
    matches = extract(
        query=query,
        choices=concept_codes["term"].tolist(),  # Match against the concept terms only
        limit=5
    )
    # Each match is a tuple: (choice, similarity score, index of choice)

    # Match to concept code
    matches = pd.DataFrame([
        {
            "term": term,
            "similarity_score": similarity_score,
            "concept_code": str(concept_codes.iloc[index]["conceptId"])
        }
        for term, similarity_score, index in matches
    ])

    display(matches)
    print()


Extracted medication: Risperdal


,term,similarity_score,concept_code
0,Risperdal Quicklet (trade product),90.0,10371000168103
1,Risperdal (trade product),90.0,15481000168100
2,Risperdal Consta (trade product),90.0,37741000168104
3,Risperidone (Apo) (trade product),74.117647,631000168102
4,Risperidone (Actavis) (trade product),74.117647,9261000168107



Extracted medication: Invega


,term,similarity_score,concept_code
0,Invega (trade product),90.0,2081000168101
1,Invega Sustenna (trade product),90.0,26461000168105
2,Invega Trinza (trade product),90.0,857421000168101
3,Invega Hafyera (trade product),90.0,1625181000168106
4,Invast (trade product),72.0,945591000168100



Extracted medication: multivitamins


,term,similarity_score,concept_code
0,Penta-vite Multivitamins with Iron for Kids 1 ...,83.076923,750311000168100
1,Penta-vite Multivitamins for Infants 0 to 3 Ye...,83.076923,751271000168106
2,Multivitamin (Health Plus Vitamins) (trade pro...,79.2,1113431000168107
3,Multivitamin (Voost) (trade product),79.2,1444461000168105
4,Multivitamin plus Iron Kids 2 to 12 Years (Pen...,79.2,1523181000168101
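In the results above, several candidates tie at 90.0, and for Risperdal the first row is "Risperdal Quicklet" rather than plain "Risperdal". Taking the top row would therefore select a more specific product than the dialogue mentions. The sketch below shows one possible selection rule: it prefers a candidate whose name equals the query once the `(trade product)` tag is removed, and returns `None` when the choice is ambiguous or too weak. `pick_match` and the `min_score` threshold of 85.0 are hypothetical choices for illustration, not values from the notebook.

```python
SUFFIX = " (trade product)"


def pick_match(query, matches, min_score=85.0):
    """Pick one (term, score, code) match, or None if the choice is unclear."""
    strong = [m for m in matches if m[1] >= min_score]
    # Prefer a candidate whose name, without the tag, equals the query
    exact = [
        m for m in strong
        if (m[0][: -len(SUFFIX)] if m[0].endswith(SUFFIX) else m[0]).lower() == query.lower()
    ]
    if len(exact) == 1:
        return exact[0]
    # Otherwise accept only an unambiguous single strong candidate
    return strong[0] if len(strong) == 1 else None


risperdal_matches = [
    ("Risperdal Quicklet (trade product)", 90.0, "10371000168103"),
    ("Risperdal (trade product)", 90.0, "15481000168100"),
    ("Risperdal Consta (trade product)", 90.0, "37741000168104"),
    ("Risperidone (Apo) (trade product)", 74.117647, "631000168102"),
]
print(pick_match("Risperdal", risperdal_matches))
```

With this rule, a generic mention such as "multivitamins", whose best scores above are below the threshold, would be returned as `None` and left for manual review.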



