# Generate clinical notes with AI using AWS HealthScribe

> *This notebook is compatible with SageMaker kernels `Data Science 3.0` or `conda_python3` on an `ml.t3.medium` instance*

## Introduction

This notebook shows how to use AWS HealthScribe Python APIs to invoke the service and how to integrate it with other AWS services.

## Setup

Update boto3 SDK to version **`1.33.0`** or higher. This is the minimum version that supports HealthScribe APIs.

In [1]:
!pip install botocore boto3 awscli --upgrade



Verify that the correct boto3 version is installed. Expected version is **`1.33.0`** or higher.

In [2]:
!python3 -c "import boto3; print(boto3.__version__)"

1.34.158
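
The same check can be done programmatically so that the notebook fails early on an outdated SDK. The following is a minimal sketch; the helper names `parse_version` and `meets_minimum` are hypothetical and not part of boto3.

```python
import re

MIN_BOTO3 = "1.33.0"  # minimum version stated in this notebook


def parse_version(version):
    """Turn a string such as '1.34.158' into a comparable tuple (1, 34, 158)."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(installed, minimum=MIN_BOTO3):
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("1.34.158"))  # True
print(meets_minimum("1.28.0"))    # False
```

In the notebook itself this could be used as `assert meets_minimum(boto3.__version__)` right after importing boto3.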


## 1. Batch Transcription Using Python SDK

#### 1.1. Starting an AWS HealthScribe job
Invoke the **`start_medical_scribe_job`** API to start a transcription job:

In [3]:
import os
import time
import boto3
import json

transcribe = boto3.client('transcribe', 'us-east-1') # HealthScribe APIs are exposed through the Amazon Transcribe client (us-east-1 region)

This variable defines the name of the transcription job that will be created in HealthScribe.

In [11]:
job_name = "LowerBackPain-v2"
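
Job names must be unique within an account, so re-running the notebook with the same name results in a conflict error. One way to avoid this is to append a timestamp to a base name. The sketch below assumes job names are limited to letters, digits, `.`, `_` and `-`; the helper `unique_job_name` is hypothetical.

```python
import re
from datetime import datetime, timezone


def unique_job_name(base, now=None):
    """Build a job name from a base string plus a UTC timestamp suffix."""
    now = now or datetime.now(timezone.utc)
    # Assumption: only letters, digits, '.', '_' and '-' are allowed in job names.
    safe_base = re.sub(r"[^0-9a-zA-Z._-]", "-", base)
    return f"{safe_base}-{now.strftime('%Y%m%d-%H%M%S')}"


# Fixed timestamp used here only to make the example reproducible.
example_time = datetime(2024, 8, 10, 15, 24, 44, tzinfo=timezone.utc)
print(unique_job_name("Lower Back Pain", example_time))  # Lower-Back-Pain-20240810-152444
```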

The cell below defines where the input audio is located and which IAM role the job uses. Replace the following values with the appropriate details for your environment:
- **`s3_bucket`**: name of the S3 bucket that holds the input audio (the same bucket is used for the output)
- **`object_name`**: file name including the extension (e.g. knee-consultation.m4a)
- **`iam_role`**: ARN of the IAM role that will be used by HealthScribe.

In [5]:
s3_bucket = 'sample-data-024848443355-22hcl401'
object_name = 'lower-back-consultation.m4a'
iam_role = 'arn:aws:iam::024848443355:role/generate-clinical-notes-SageMakerNotebookRole-JGe7ip5jK3Gl'

More settings can be defined for `start_medical_scribe_job`, such as `VocabularyName`, the name of a custom vocabulary to include in the transcription job, and `VocabularyFilterName`, the name of a custom vocabulary filter containing the words to be filtered out. `VocabularyFilterName` is used together with `VocabularyFilterMethod`, which controls how the filtered words are handled (`remove`, `mask` or `tag`).
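
As an illustration of these optional settings, the sketch below assembles a `Settings` dictionary that could be passed to `start_medical_scribe_job`. The helper `build_scribe_settings` and the vocabulary names are hypothetical; the vocabulary and the vocabulary filter would need to exist in your account before the job is started.

```python
VALID_FILTER_METHODS = {"remove", "mask", "tag"}


def build_scribe_settings(max_speakers=2, vocabulary_name=None,
                          vocabulary_filter_name=None, vocabulary_filter_method=None):
    """Assemble the Settings dictionary for a HealthScribe job."""
    settings = {
        "ShowSpeakerLabels": True,
        "MaxSpeakerLabels": max_speakers,
        "ChannelIdentification": False,
    }
    if vocabulary_name:
        settings["VocabularyName"] = vocabulary_name
    if vocabulary_filter_name:
        # A filter name only makes sense together with a filter method.
        if vocabulary_filter_method not in VALID_FILTER_METHODS:
            raise ValueError("vocabulary_filter_method must be remove, mask or tag")
        settings["VocabularyFilterName"] = vocabulary_filter_name
        settings["VocabularyFilterMethod"] = vocabulary_filter_method
    return settings


# Hypothetical vocabulary and filter names, for illustration only.
print(build_scribe_settings(
    vocabulary_name="my-medical-terms",
    vocabulary_filter_name="my-filtered-words",
    vocabulary_filter_method="mask",
))
```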

The following job creates two files in the output S3 bucket: `transcript.json`, which contains the full transcript, and `summary.json`, which contains the summarized clinical documentation. Both files are created inside a folder named after the `job_name` variable defined above.

In [12]:
s3_input_uri = f"s3://{s3_bucket}/{object_name}"

output_bucket_name = s3_bucket

response = transcribe.start_medical_scribe_job(
    MedicalScribeJobName = job_name,
    Media = {
      'MediaFileUri': s3_input_uri
    },
    OutputBucketName = output_bucket_name,
    DataAccessRoleArn = iam_role,
    Settings = {
      'ShowSpeakerLabels': True, # show speaker labels
      'MaxSpeakerLabels': 2, # maximum number of speakers in the audio (mandatory when ShowSpeakerLabels is set)
      'ChannelIdentification': False # since ShowSpeakerLabels is set to True, ChannelIdentification should be set to False according to the documentation
    }
)
print(response)

{'MedicalScribeJob': {'MedicalScribeJobName': 'LowerBackPain-v2', 'MedicalScribeJobStatus': 'IN_PROGRESS', 'Media': {'MediaFileUri': 's3://sample-data-024848443355-22hcl401/lower-back-consultation.m4a'}, 'StartTime': datetime.datetime(2024, 8, 10, 15, 24, 44, 64000, tzinfo=tzlocal()), 'CreationTime': datetime.datetime(2024, 8, 10, 15, 24, 44, 47000, tzinfo=tzlocal()), 'Settings': {'ShowSpeakerLabels': True, 'MaxSpeakerLabels': 2, 'ChannelIdentification': False}, 'DataAccessRoleArn': 'arn:aws:iam::024848443355:role/generate-clinical-notes-SageMakerNotebookRole-JGe7ip5jK3Gl'}, 'ResponseMetadata': {'RequestId': '99a09de5-2c0e-443a-8c93-1b3ce5ae4bb1', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '99a09de5-2c0e-443a-8c93-1b3ce5ae4bb1', 'content-type': 'application/x-amz-json-1.1', 'content-length': '459', 'date': 'Sat, 10 Aug 2024 15:24:43 GMT'}, 'RetryAttempts': 0}}


#### 1.2. Checking job status

The code below invokes HealthScribe's **`get_medical_scribe_job`** API to retrieve the status of the job started in the previous step. If the status is neither `COMPLETED` nor `FAILED`, the code waits 5 seconds and retries until the job reaches a final state.

In [13]:
while True:
    status = transcribe.get_medical_scribe_job(MedicalScribeJobName = job_name) # get the status of the previous medical scribe job
    if status['MedicalScribeJob']['MedicalScribeJobStatus'] in ['COMPLETED', 'FAILED']: # check if completed or failed
        break
    print("Not ready yet...")
    time.sleep(5)
   
job = status['MedicalScribeJob']
print("Job status: " + job['MedicalScribeJobStatus'])

if job['MedicalScribeJobStatus'] == 'COMPLETED':
    # Get time metadata for the medical scribe job
    diff = job['CompletionTime'] - job['StartTime']
    print("Job duration: " + str(diff))
    print("Transcription file: " + job['MedicalScribeOutput']['TranscriptFileUri'])
    print("Summary file: " + job['MedicalScribeOutput']['ClinicalDocumentUri'])
else:
    # A failed job has no output files; print the reason reported by the service instead
    print("Failure reason: " + job.get('FailureReason', 'unknown'))

Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Not ready yet...
Job status: COMPLETED
Job duration: 0:02:33.186000
Transcription file: https://s3.us-east-1.amazonaws.com/sample-data-024848443355-22hcl401/LowerBackPain-v2/transcript.json
Summary file: https://s3.us-east-1.amazonaws.com/sample-data-024848443355-22hcl401/LowerBackPain-v2/summary.json
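
The polling loop above runs until the job reaches a final state, with no upper bound on the waiting time. A minimal sketch of a bounded variant is shown below. The helper `wait_for_job` is hypothetical; it takes a callable that returns the current status so that it can be tried without calling AWS, and the 900-second default timeout is an arbitrary assumption.

```python
import time


def wait_for_job(fetch_status, poll_seconds=5, timeout_seconds=900, sleep=time.sleep):
    """Poll fetch_status() until it returns a final state or the timeout elapses."""
    waited = 0
    while True:
        status = fetch_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still {status} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Simulated status sequence so the example runs without an AWS account.
fake_statuses = iter(["QUEUED", "IN_PROGRESS", "COMPLETED"])
print(wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None))  # COMPLETED
```

In the notebook, `fetch_status` could be `lambda: transcribe.get_medical_scribe_job(MedicalScribeJobName=job_name)['MedicalScribeJob']['MedicalScribeJobStatus']`.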


#### 1.3. Analysing the scribe results
The code below downloads the **`summary.json`** file generated by HealthScribe, parses it and extracts the treatment plan.

In [14]:
s3 = boto3.client('s3', 'us-east-1') # connect to s3 service in us-east-1 region

bucket = output_bucket_name
transcription_file = job_name + "/transcript.json" # S3 key of the transcript file
summary_file = job_name + "/summary.json" # S3 key of the summary file

obj = s3.get_object(Bucket=bucket, Key=summary_file) # download the summary object from S3
summary_json = json.loads(obj['Body'].read())
plan_list = summary_json.get("ClinicalDocumentation").get("Sections")[5].get("Summary") # in this output, the section at index 5 holds the plan

print("Plan:")
plan = ""
for item in plan_list:
    plan = plan + "\n" + item.get("SummarizedSegment")
print(plan)

Plan:

Lower back pain
  - Start physical therapy for a minimum of 4 weeks, ideally 6 weeks, to help relieve the lower back pain without waiting for the X-ray results. 

  - Take an X-ray of the lower back to check for any underlying issues causing the pain.

  - Continue taking over-the-counter pain medications as needed in the meantime.

  - Use a chair with lumbar support or position a pillow for lumbar support while sitting for long periods.

Sleep
  - Improve sleep habits by separating from the toddler at night to allow uninterrupted sleep and a more comfortable sleeping position.

Exercise
  - Start light exercises like walking or yoga to improve posture and handle the pain.

Follow up
  - Reassess the situation after physical therapy - if pain does not improve, order a CT scan of the lower back to further investigate the likely cause of sciatica (pinched nerve in the lower back radiating pain to legs).
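
Selecting a section by its position depends on the order in which sections appear in the file. A more defensive option is to look the section up by name. The sketch below assumes that each entry in `Sections` carries a `SectionName` key and that the plan section is named `PLAN`; the helper `section_text` and the sample data are hypothetical and only mimic the structure used in this notebook.

```python
def section_text(summary, section_name):
    """Return the summarized segments of one section joined by newlines."""
    sections = summary["ClinicalDocumentation"]["Sections"]
    for section in sections:
        # Assumption: each section is identified by a "SectionName" key.
        if section.get("SectionName") == section_name:
            return "\n".join(item["SummarizedSegment"] for item in section.get("Summary", []))
    raise KeyError(f"Section not found: {section_name}")


# Small hand-written sample that mimics the summary structure.
sample_summary = {
    "ClinicalDocumentation": {
        "Sections": [
            {"SectionName": "ASSESSMENT", "Summary": [{"SummarizedSegment": "Lower back pain"}]},
            {"SectionName": "PLAN", "Summary": [
                {"SummarizedSegment": "Start physical therapy"},
                {"SummarizedSegment": "Take an X-ray of the lower back"},
            ]},
        ]
    }
}

print(section_text(sample_summary, "PLAN"))
```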



Persist the plan with the IPython `%store` magic so that it can be used later:

In [15]:
# save plan to be used later with Bedrock in a different notebook
%store plan

Stored 'plan' (str)


There are several sections within the summary, such as `PAST_MEDICAL_HISTORY`, `ASSESSMENT`, `PAST_FAMILY_HISTORY`, `DIAGNOSTIC_TESTING` and more. These can be especially useful when you create a patient file and want to keep track of this information.

In [17]:
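history_list = summary_json.get("ClinicalDocumentation").get("Sections")[3].get("Summary") # in this output, the section at index 3 holds the past medical history

print("History:")
history = "" # use a separate variable so the plan extracted earlier is not overwritten
for item in history_list:
    history = history + "\n" + item.get("SummarizedSegment")
print(history)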

History:

- Lower back pain for the past six weeks, radiating down the legs, making it hard to stand or sleep
- No other significant past medical history reported
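
To keep track of every section at once, for example when building a patient file, the whole summary can be flattened into a dictionary keyed by section name. This is a minimal sketch under the assumption that each section exposes a `SectionName` key; the helper `sections_to_record` is hypothetical and the sample data is hand-written from the output above.

```python
def sections_to_record(summary):
    """Map each section name to the list of its summarized segments."""
    record = {}
    for section in summary.get("ClinicalDocumentation", {}).get("Sections", []):
        # Assumption: "SectionName" identifies the section; fall back to a placeholder.
        name = section.get("SectionName", "UNKNOWN")
        record[name] = [item["SummarizedSegment"] for item in section.get("Summary", [])]
    return record


# Hand-written sample that mimics the summary structure.
sample_summary = {
    "ClinicalDocumentation": {
        "Sections": [
            {"SectionName": "PAST_MEDICAL_HISTORY", "Summary": [
                {"SummarizedSegment": "- Lower back pain for the past six weeks"},
                {"SummarizedSegment": "- No other significant past medical history reported"},
            ]},
            {"SectionName": "DIAGNOSTIC_TESTING", "Summary": []},
        ]
    }
}

record = sections_to_record(sample_summary)
for name, segments in record.items():
    print(name, len(segments))
```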
