![electronic_medical_records](electronic_medical_records.png)

Medical professionals often summarize patient encounters in natural-language transcripts that record symptoms, diagnoses, and treatments. These transcripts can be reused for other medical documentation, such as insurance claims, but because they are densely packed with medical information, extracting the key data accurately can be challenging.

You and your team at Lakeside Healthcare Network have decided to use the OpenAI API to extract medical information from these transcripts automatically and match it to the appropriate ICD-10 codes. ICD-10 is a standardized coding system used worldwide for diagnostic and billing purposes, such as insurance claims processing.

## The Data
The dataset contains anonymized medical transcriptions categorized by specialty.

## transcriptions.csv
| Column     | Description              |
|------------|--------------------------|
| `"medical_specialty"` | The medical specialty associated with each transcription.  |
| `"transcription"` | Detailed medical transcription texts, with insights into the medical case. |

In [5]:
# Import the necessary libraries
import pandas as pd
from openai import OpenAI
import json

In [6]:
# Load the data
df = pd.read_csv("data/transcriptions.csv")
df.head()

| Unnamed: 0 | medical_specialty | transcription |
|------------|-------------------|---------------|
| 0 | Allergy / Immunology | SUBJECTIVE:, This 23-year-old white female pr... |
| 1 | Orthopedic | CHIEF COMPLAINT:, Achilles ruptured tendon.,H... |
| 2 | Bariatrics | PREOPERATIVE DIAGNOSIS: , Morbid obesity.,POST... |
| 3 | Cardiovascular / Pulmonary | PREOPERATIVE DIAGNOSES,Airway obstruction seco... |
| 4 | Urology | CHIEF COMPLAINT:, Urinary retention.,HISTORY ... |
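
Before any rows are sent to the API, it may be worth filtering out missing or blank transcriptions, since formatting a missing value into a prompt would send the literal text `nan`. The following is a minimal sketch under stated assumptions: the helper name `drop_empty_transcriptions` and the three sample rows are hypothetical and are not taken from `transcriptions.csv`.

```python
import pandas as pd


def drop_empty_transcriptions(frame: pd.DataFrame) -> pd.DataFrame:
    """Return only rows whose 'transcription' is present and not blank."""
    text = frame["transcription"]
    # Treat missing values and whitespace-only strings as empty
    keep = text.notna() & (text.fillna("").astype(str).str.strip() != "")
    return frame.loc[keep].reset_index(drop=True)


# Hypothetical sample rows, used only to illustrate the filter
sample = pd.DataFrame({
    "medical_specialty": ["Urology", "Orthopedic", "Bariatrics"],
    "transcription": ["CHIEF COMPLAINT:, Urinary retention.", None, "   "],
})

clean = drop_empty_transcriptions(sample)
print(clean["medical_specialty"].tolist())
```

Applied to the real data, the same call would be `df = drop_empty_transcriptions(df)`. Whether any rows are actually missing depends on the dataset.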


In [7]:
# Initialize the OpenAI client
client = OpenAI()
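
Called with no arguments, `OpenAI()` reads the API key from the `OPENAI_API_KEY` environment variable. A small pre-flight check can turn a missing key into a clearer error before the client is created. This is a sketch only: the helper `get_api_key` is a hypothetical name, and the key value shown is a placeholder.

```python
import os
from typing import Mapping


def get_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key from the given environment mapping or raise."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before creating the client."
        )
    return key


# Stand-in mapping so the sketch runs without a real key (placeholder value)
fake_env = {"OPENAI_API_KEY": "sk-placeholder"}
print(len(get_api_key(fake_env)) > 0)
```

In the notebook, calling `get_api_key()` with no arguments just before `client = OpenAI()` would check the real environment.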


In [8]:
# Define the tool schema used to extract the patient's age and recommended treatment
extract_data_tool = [
    {
        "type": "function",
        "function": {
            "name": "extract_patient_info",
            "description": "Extracts the patient's age and recommended treatment or procedure from a medical transcription.",
            "parameters": {
                "type": "object",
                "properties": {
                    "age": {
                        "type": "integer",
                        "description": "The age of the patient in years.",
                    },
                    "recommended_treatment": {
                        "type": "string",
                        "description": "The recommended treatment or procedure for the patient.",
                    },
                },
                "required": ["age", "recommended_treatment"],
            },
        },
    }
]

# List to store extracted data
structured_data = []

# Iterate through each row of the DataFrame
for _, row in df.iterrows():
    transcription = row["transcription"]
    medical_specialty = row["medical_specialty"]

    # Message for extracting age and recommended treatment
    messages_extract_data = [
        {"role": "system", "content": "You are a highly skilled medical assistant tasked with extracting key information from patient transcripts."},
        {"role": "user", "content": f"Extract the patient's age and the recommended treatment or procedure from the following medical transcription: {transcription}"}
    ]

    # Call the API with function calling for data extraction
    response_extract = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages_extract_data,
        tools=extract_data_tool,
        tool_choice={"type": "function", "function": {"name": "extract_patient_info"}}
    )

    # Extract the arguments from the tool call; fall back to None if the
    # model returned no tool call or the arguments are not valid JSON
    age = None
    recommended_treatment = None
    tool_calls = response_extract.choices[0].message.tool_calls
    if tool_calls:
        try:
            function_args = json.loads(tool_calls[0].function.arguments)
        except json.JSONDecodeError:
            function_args = {}
        age = function_args.get("age")
        recommended_treatment = function_args.get("recommended_treatment")

    # Message for ICD-10 code matching
    if recommended_treatment:
        messages_icd_code = [
            {"role": "system", "content": "You are a medical coding assistant. Your task is to provide the most appropriate ICD-10 code for a given medical treatment or procedure. Provide only the ICD-10 code, nothing else."},
            {"role": "user", "content": f"What is the ICD-10 code for: {recommended_treatment}"}
        ]

        # Call the API without function calling for ICD-10 code
        response_icd = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages_icd_code,
            max_tokens=10,  # ICD-10 codes are short, so cap the reply length
        )
        icd_code = response_icd.choices[0].message.content.strip()
    else:
        icd_code = None

    # Append the extracted data to the list
    structured_data.append({
        "age": age,
        "medical_specialty": medical_specialty,
        "recommended_treatment": recommended_treatment,
        "icd_code": icd_code
    })

# Create the final DataFrame
df_structured = pd.DataFrame(structured_data)

# Display the head of the structured DataFrame
print(df_structured.head())

   age  ...  icd_code
0   23  ...    Z71.89
1   41  ...  S86.011A
2   30  ...   0DB74Z8
3   50  ...   0B120F4
4   66  ...    Z51.81

[5 rows x 4 columns]
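
The codes in the output above come in two shapes: dotted codes such as `Z71.89`, which look like ICD-10-CM diagnosis codes, and seven-character codes such as `0DB74Z8`, which look like ICD-10-PCS procedure codes. Because the model returns free text, a shape check can flag replies that are not plausibly codes at all. The sketch below is an illustration, not a validator: `classify_icd10` is a hypothetical helper, it assumes ICD-10-CM codes longer than three characters are written with the dot, and it checks format only, not whether a code exists or fits the treatment.

```python
import re
from typing import Optional

# ICD-10-CM (diagnoses): letter, digit, alphanumeric, then optionally a dot
# followed by one to four alphanumerics. Assumption: the dot is present.
ICD10_CM = re.compile(r"[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?")

# ICD-10-PCS (procedures): exactly seven characters drawn from the digits
# and the letters except I and O.
ICD10_PCS = re.compile(r"[0-9A-HJ-NP-Z]{7}")


def classify_icd10(code: Optional[str]) -> Optional[str]:
    """Return 'ICD-10-CM', 'ICD-10-PCS', or None based on the code's shape."""
    if not code:
        return None
    code = code.strip().upper()
    if ICD10_CM.fullmatch(code):
        return "ICD-10-CM"
    if ICD10_PCS.fullmatch(code):
        return "ICD-10-PCS"
    return None


# The first four values appear in the output above; the last is a made-up
# example of a reply that is not a code
for code in ["Z71.89", "S86.011A", "0DB74Z8", "0B120F4", "I am not sure"]:
    print(code, "->", classify_icd10(code))
```

On the structured data, this could be applied with `df_structured["icd_code"].map(classify_icd10)`, and rows that map to `None` set aside for manual review.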
