# Cancer Patient  
- Multiple office visit notes are read
- Compiled in a string (no particular chronological order)
- ChatGPT parses the notes and lists the visits chronologically, with:
    - Clinical Findings
    - Medications
    - Lab Values
    
    
#### Two Years After Coding Changes Sought to Decrease Documentation, Notes Remain ‘Bloated’

<br>
<div style="text-align: center;"><IMG SRC="Pics/Note_Bloat_Epic.png"  width="600" height="600" ALT="image">  
<em>1.7 Billion Notes</em></div>
<br>


    
**Key Findings:**
- The average length of clinical notes has continued to increase over the last three years despite coding changes enacted in 2021 meant to reduce documentation burden.
- However, *time spent writing notes by providers has slightly decreased over the same time*.
- While overall average note length increased, *around 40% of providers reduced their average note length*.
- <span style="color:red">**1.7 billion**</span> clinical notes written by <span style="color:red">**166,318**</span> outpatient providers in the U.S. from May 2020 to April 2023 
- Average note length across all clinical notes has increased 8.1%
    - 4,628 characters in May 2020
    - 5,002 characters in April 2023
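
The 8.1% figure follows directly from the two character counts above. A minimal check (the variable names are illustrative only):

```python
# Average note length in characters, taken from the key findings above
chars_may_2020 = 4628
chars_april_2023 = 5002

# Relative increase, expressed as a percentage
pct_increase = (chars_april_2023 - chars_may_2020) / chars_may_2020 * 100
print(f"{pct_increase:.1f}%")
```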

In [1]:
import os
import openai
import webbrowser
import re
from datetime import datetime
import subprocess


from chatGPT_utilities import setup_OpenAI_API
from chatGPT_utilities import json_to_dict
from chatGPT_utilities import get_completion
from chatGPT_utilities import get_completion_from_messages_FULL
from chatGPT_utilities import read_file
from chatGPT_utilities import token_usage
from chatGPT_utilities import print_token_usage
from chatGPT_utilities import write_to_file

openai.api_key = setup_OpenAI_API()
print(f'Current Directory:\n{os.getcwd()}')
starting_directory = os.getcwd()


starting_directory /Users/michael/Coding
Directory:/Users/michael/Coding
Current Directory:
/Users/michael/Coding


In [2]:
# url_chatGPT_usage = 'https://platform.openai.com/account/usage'
# webbrowser.open_new(url_chatGPT_usage)


In [3]:
def read_patient_notes():
    """Read every visit note in Cancer_Patient_Notes and return them as one string."""
    notes_directory = 'Cancer_Patient_Notes'
    os.chdir(notes_directory)
    try:
        # Keep only the visit notes, e.g. 'Nov_20.txt'
        list_of_notes = [i for i in os.listdir() if i.endswith('.txt') and '_20' in i]

        compilation_of_notes = ''
        print('Processing')
        for num, name in enumerate(list_of_notes, start=1):
            print("{0:3}: {1}".format(num, name))
        print('Note : The files are not necessarily in chronological order\n\n\n')

        for file_name in list_of_notes:
            print(f'\033[91m                                     File Name: \033[1m{file_name}\033[0m')
            interim_note = read_file(file_name)
            print(interim_note)
            compilation_of_notes = compilation_of_notes + '  \n\n' + interim_note
    finally:
        # Always return to the starting directory, even if reading a note fails
        os.chdir(starting_directory)

    print('\n\n\n\n\n\n\n\033[91m'+ '\033[1m\t\t\t--------------------------COMPILATION of NOTES--------------------------\033[0m')
    print(compilation_of_notes)
    return compilation_of_notes

# Read clinical visit notes  

- Notes will not necessarily be retrieved in chronological order

![alt text](patient_notes_1.png "Here is Title")
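
As an alternative to changing the working directory, the same compilation step can be sketched with `pathlib`. This is only a sketch under assumptions: `compile_notes` is a hypothetical helper, not part of `chatGPT_utilities`, and the note text in the demo is placeholder content. Sorting by file name makes the order reproducible, but it is alphabetical, not chronological.

```python
from pathlib import Path
import tempfile


def compile_notes(notes_dir):
    """Join every '*_20*.txt' note in notes_dir without calling os.chdir."""
    notes_dir = Path(notes_dir)
    # Same filter as the notebook: text files whose name contains '_20'
    note_paths = sorted(p for p in notes_dir.glob("*.txt") if "_20" in p.name)
    parts = [p.read_text(encoding="utf-8") for p in note_paths]
    return "\n\n".join(parts), [p.name for p in note_paths]


# Demo with a throwaway directory and placeholder note text
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "Nov_20.txt").write_text("Date: November 20\n(placeholder)", encoding="utf-8")
    Path(tmp, "Aug_20.txt").write_text("Date : August 20\n(placeholder)", encoding="utf-8")
    Path(tmp, "README.md").write_text("not a note", encoding="utf-8")
    compiled, names = compile_notes(tmp)

print(names)
```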


In [4]:
compilation_of_notes = read_patient_notes()

Processing
  1: Nov_20.txt
  2: Oct_20.txt
  3: Sept_20.txt
  4: Aug_20.txt
Note : The files are not necessarily in chronological order



File Name: Nov_20.txt
Date: November 20


Following the completion of induction chemotherapy, he is experiencing a mix of physical and emotional symptoms. Today his ANC 900, HCT 30

Physically, he is still recovering from the side effects of chemotherapy. Reports fatigue, weakness, and a general sense of low energy. Nausea and vomiting gradually subsiding. Has reduced appetite, and has lost 2 kg since initial diagnosis. Emotionally, the patient feeling some uncertainty as awaiting the results of the treatment response evaluation and bone marrow assessment. Will continue to provide supportive care, address any ongoing symptoms, and offer emotional support prn.


File Name: Oct_20.txt
Date: Oct 20
This 65-year-old white male was 
diagnosed with acute my

# Extract Diagnosis and Synopsis of Visits   

From the compilation of notes below, extract the information relevant to patient's condition. 

Patient's Diagnosis:  
Date of Diagnosis:


List the dates of visits in chronological order and include:  
- Clinical findings
- Lab values
- Medications


For each visit:
- Date # 1 : 
    - Clinical Findings
    - Medications: 
    - HCT :
    - ANC : 


- Date # 2 : 
    - Clinical Findings
    - Medications: 
    - HCT :
    - ANC : 


- Date # 3 : 
    - Clinical Findings
    - Medications: 
    - HCT :
    - ANC : 
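
The per-visit template above can also be generated from a list of field names, so the same layout is reused across prompts. This is a sketch only: `build_extraction_prompt` is a hypothetical helper and its wording is a shortened stand-in for the notebook's prompts.

```python
# Three backticks, built programmatically so the delimiter is easy to change
DELIMITER = "`" * 3


def build_extraction_prompt(notes, fields):
    """Return a prompt asking for the given fields per visit, notes delimited."""
    field_lines = "\n".join(f"    {name}:" for name in fields)
    return (
        "Your task is to extract relevant information from clinical notes.\n\n"
        "List the dates of visits in chronological order, "
        "using this format for every visit:\n\n"
        f"Date # N -\n{field_lines}\n\n"
        f"Review: {DELIMITER}{notes}{DELIMITER}\n"
    )


example_prompt = build_extraction_prompt(
    "Date: November 20 ...", ["Clinical Findings", "Medications", "HCT", "ANC"]
)
print(example_prompt)
```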


In [5]:
prompt_1 = f"""
Your task is to extract relevant information from 
clinical note. 

From the compilation of notes below, delimited by triple quotes \
extract the information relevant to patient's condition. 

Patient's Diagnosis:
Date of Diagnosis:


List the dates of visits in chronological order and 
include clinical findings, lab values and any medications (including any chemotherapy medications) mentioned during that visit
 in the following format:

Date # 1 - 
    Clinical Findings    
    Medications:
    Lab Values:
    
Date # 2 - 
    Clinical Findings   
    Medications:…
Date # N - 
    Clinical Findings
    Medications:


Review: ```{compilation_of_notes}```
"""

print('\n\n\n\n\033[91m'+ '\033[1mSTART of QUERY\033[0m') 
start_time = datetime.now() 
print('start_time:',  start_time.strftime("%H:%M:%S"))
messages = [{"role": "user", "content": prompt_1}]
response_details = get_completion_from_messages_FULL(messages)
response = response_details.choices[0].message["content"]
    # print(response)

# response = get_completion(prompt)


end_time = datetime.now() 
print(response, '\n'*2)
print('end_time  :', end_time.strftime("%H:%M:%S"))
elapsed_time = end_time - start_time
# total_seconds() keeps leading zeros in the fraction (e.g. 6.005, not 6.5)
print(f'Elapsed_time: {elapsed_time.total_seconds():.3f} seconds')
cost = print_token_usage(response_details)
print(f'Cost to run query: ${cost}')






[91m[1mSTART of QUERY[0m
start_time: 15:46:59
Patient's Diagnosis: Acute myeloid leukemia (AML)
Date of Diagnosis: August 2020

List of visits in chronological order:

Date: August 20
    Clinical Findings: Fatigue, lethargy
    Medications: Statin, finasteride
    Lab Values: ANC 1500, HCT 27

Date: September 20
    Clinical Findings: Fatigue, lethargy
    Medications: 7+3 regimen (cytarabine, daunorubicin)
    Lab Values: Not mentioned

Date: October 20
    Clinical Findings: Fatigue, lethargy, decreased energy levels, mild nausea, occasional vomiting, decreased appetite, weight loss
    Medications: Arabinosylcytosine, midostaurin
    Lab Values: ANC 1250, HCT 29

Date: November 20
    Clinical Findings: Fatigue, weakness, low energy, nausea, vomiting subsiding, reduced appetite, weight loss
    Medications: Not mentioned
    Lab Values: ANC 900, HCT 30 


end_time  : 15:47:05
Elapsed_time: 6.443 seconds
[91m[1m

Token Usage:[0m
Cost to run query is based on token usage, w

# Extract only the ANC & HCT


Patient's Diagnosis:  
Date of Diagnosis:


For each visit:
- Date # 1 : 
    - HCT :
    - ANC : 


- Date # 2 : 
    - HCT :
    - ANC : 


- Date # 3 : 
    - HCT :
    - ANC : 
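
Once the model answers in this Date / HCT / ANC layout, the values can be pulled into Python structures for charting or trending. The sketch below assumes the response follows the layout exactly. `parse_lab_values` and `to_number` are hypothetical helpers, and the sample string reuses two visits from this notebook's recorded output.

```python
import re

# One visit: a 'Date # n:' line followed by an HCT line and an ANC line
VISIT_PATTERN = re.compile(
    r"Date[ \t]*#[ \t]*\d+[ \t]*:[ \t]*(?P<date>[^\n]*)\n"
    r"[ \t]*HCT[ \t]*:[ \t]*(?P<hct>[^\n]*)\n"
    r"[ \t]*ANC[ \t]*:[ \t]*(?P<anc>[^\n]*)"
)


def to_number(raw):
    """Convert a lab value to float; return None for 'N/A' or other text."""
    try:
        return float(raw.strip())
    except ValueError:
        return None


def parse_lab_values(response_text):
    """Return a list of {'date', 'HCT', 'ANC'} dicts, one per visit found."""
    return [
        {
            "date": m.group("date").strip(),
            "HCT": to_number(m.group("hct")),
            "ANC": to_number(m.group("anc")),
        }
        for m in VISIT_PATTERN.finditer(response_text)
    ]


sample = (
    "Date # 1: August 20\n   HCT: 27\n   ANC: 1500\n\n"
    "Date # 2: September 20\n   HCT: N/A\n   ANC: N/A\n"
)
visits = parse_lab_values(sample)
print(visits)
```

Because language models do not always follow a requested format, any visit the pattern fails to match is silently skipped here; a production version would need to detect that.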


In [6]:
def get_completion_from_messages_FULL_and_costs(prompt, ShowTimes = True):
    """
    Send a single-prompt query, then print the response, timing, and cost.

    : Input   : prompt, which is a string
    : Output  : response text (token usage and cost are printed, not returned)
    """
    start_time = datetime.now()
    messages = [{"role": "user", "content": prompt}]
    response_details = get_completion_from_messages_FULL(messages)
    response = response_details.choices[0].message["content"]
    end_time = datetime.now()
    elapsed_time = end_time - start_time
    print(response, '\n'*2)
    if ShowTimes:
        print('\033[91m'+ '\033[1m' + 'Time to Execute Task'+ '\033[0m')
        print('Start  :', start_time.strftime("%H:%M:%S"))
        print('End    :', end_time.strftime("%H:%M:%S"))
        # total_seconds() keeps leading zeros in the fraction (e.g. 3.005, not 3.5)
        print(f'Elapsed: {elapsed_time.total_seconds():.3f} seconds')
    cost = print_token_usage(response_details)
    print(f'Cost to run query: ${cost}')
    return response

In [7]:
prompt_2 = f"""
Your task is to extract relevant information from clinical note. 

From the compilation of notes below, delimited by triple quotes \
extract the information relevant to patient's condition. 

Patient's Diagnosis:
Date of Diagnosis:


List the dates of visits in chronological order and include the lab values in the following format
Date # 1 : 
   HCT :
   ANC : 
    
Date # 2 : 
   HCT :
   ANC : 

Date # N :
   HCT :
   ANC : 

Do not list any additional information in the synopsis other than lab values.
If no lab values are listed for a given date, indicate 'N/A'

Review: ```{compilation_of_notes}```
"""

In [8]:
response = get_completion_from_messages_FULL_and_costs(prompt_2)

Patient's Diagnosis: Acute myeloid leukemia (AML)
Date of Diagnosis: August 2020

List of visits and lab values:

Date # 1: August 20
   HCT: 27
   ANC: 1500

Date # 2: September 20
   HCT: N/A
   ANC: N/A

Date # 3: October 20
   HCT: 29
   ANC: 1250

Date # 4: November 20
   HCT: 30
   ANC: 900 


[91m[1mTime to Execute Task[0m
Start  : 15:47:05
End    : 15:47:08
Elapsed: 3.145 seconds
[91m[1m

Token Usage:[0m
Cost to run query is based on token usage, which is $0.002 / 1K tokens
Completion tokens:        118
Prompt tokens    :        736
Total tokens     :        854
Cost to run query: $0.002
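
The reported cost is consistent with the quoted rate of $0.002 per 1K tokens applied to the 854 total tokens. How `print_token_usage` computes it internally is not shown in this notebook, so the function below is a hypothetical stand-in for the arithmetic only:

```python
def estimate_cost(total_tokens, usd_per_1k_tokens=0.002):
    """Estimate query cost in USD from a total token count and a flat rate."""
    return total_tokens / 1000 * usd_per_1k_tokens


# Token count taken from the output above
cost = estimate_cost(854)
print(f"Unrounded: ${cost:.6f}")
print(f"Rounded  : ${round(cost, 3)}")
```

The unrounded value is slightly under a fifth of a cent, which rounds to the $0.002 shown.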


# Summary Paragraph

In [9]:
prompt_3 = f"""
As a medical chart abstractor, 
your task is to summarize the compilation of clinic notes.


List the dates of visits in chronological order and include the lab values in the following format
Date # 1 : 
summary paragraph
ANC:    
HCT:

Date # 2 : 
summary paragraph
ANC:    
HCT:


Be sure to include the dates of each clinic visit
For each clinic visit, list the Summary first and the ANC HCT below that if it is mentioned
Be sure to list each visit in chronological order


Review: ```{compilation_of_notes}```

"""

response = get_completion_from_messages_FULL_and_costs(prompt_3)

Dates of visits in chronological order:

1. August 20
2. September 20
3. October 20
4. November 20

Date 1: August 20
Summary: Juan Sanchez, a 65-year-old Hispanic male, presented with fatigue and lethargy. He is currently taking statin and finasteride. ANC: 1500, HCT: 27.

Date 2: September 20
Summary: After undergoing induction chemotherapy with the "7+3" regimen, Juan Sanchez achieved complete remission. He is scheduled for consolidation therapy.

Date 3: October 20
Summary: Juan Sanchez, currently undergoing consolidation therapy, is experiencing common side effects of chemotherapy. These include fatigue, lethargy, decreased energy levels, mild nausea, occasional vomiting, decreased appetite, and weight loss. ANC: 1250, HCT: 29. He is on Arabinosylcytosine and midostaurin.

Date 4: November 20
Summary: Following the completion of induction chemotherapy, Juan Sanchez is experiencing physical and emotional symptoms. Physically, he is still recovering from chemotherapy side effects, i

In [10]:
file_name = 'ANC_and_HCT'
write_to_file(file_name,compilation_of_notes,response)

_ = subprocess.call(['open',   file_name + '.txt'])
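
The `open` command used above is specific to macOS. A hedged sketch of choosing an equivalent command per platform follows; `opener_command` is a hypothetical helper, and it only builds the argument list so that it can be inspected before being passed to `subprocess.call`.

```python
import sys


def opener_command(path, platform=sys.platform):
    """Return the argv list that opens path in the default application."""
    if platform == "darwin":
        return ["open", path]                    # macOS
    if platform.startswith("win"):
        return ["cmd", "/c", "start", "", path]  # Windows shell built-in
    return ["xdg-open", path]                    # most Linux desktops


print(opener_command("ANC_and_HCT.txt"))
```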

# Assemble Prompts in a List of Prompts

In [11]:
prompt_list = [prompt_1,prompt_2,prompt_3]

for prompt in prompt_list:
    relevant_details = prompt.split("\n\nReview:")[0]
    print(f'\033[91m\033[1m {relevant_details}\033[0m') 
    response = get_completion_from_messages_FULL_and_costs(prompt)

[91m[1m 
Your task is to extract relevant information from 
clinical note. 

From the compilation of notes below, delimited by triple quotes extract the information relevant to patient's condition. 

Patient's Diagnosis:
Date of Diagnosis:


List the dates of visits in chronological order and 
include clinical findings, lab values and any medications (including any chemotherapy medications) mentioned during that visit
 in the following format:

Date # 1 - 
    Clinical Findings    
    Medications:
    Lab Values:
    
Date # 2 - 
    Clinical Findings   
    Medications:…
Date # N - 
    Clinical Findings
    Medications:
Patient's Diagnosis: Acute myeloid leukemia (AML)
Date of Diagnosis: August 2020

Dates of visits in chronological order with clinical findings, medications, and lab values:

Date # 1 - August 20
    Clinical Findings: Fatigue, lethargy
    Medications: Statin, finasteride
    Lab Values: ANC 1500, HCT 27

Date # 2 - September 20
    Clinical Findings: Fatigue

In [13]:
def find_substrings(text):
    pattern = r'Date[\s\S]*?\n+'
    matches = re.findall(pattern, text)
    return matches


substrings = find_substrings(compilation_of_notes)
if substrings:
    for substring in substrings:
        print(substring.strip())
else:
    print("Substrings not found.")

Date: November 20
Date: Oct 20
Date : Sept 20
Date : August 20
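
The extracted headers can also be ordered locally, without a model call, to cross-check the chronology ChatGPT reports. This sketch assumes all visits fall within one calendar year and that each header names a month by at least its first three letters; `month_index` is a hypothetical helper.

```python
import re

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def month_index(date_header):
    """Map a header such as 'Date : Sept 20' to a zero-based month number."""
    match = re.search(r"Date\s*:\s*([A-Za-z]+)", date_header)
    if match is None:
        raise ValueError(f"No month found in {date_header!r}")
    # 'Sept', 'September' and 'Sep' all reduce to 'sep'
    return MONTHS.index(match.group(1)[:3].lower())


# Headers exactly as printed by the cell above
headers = ["Date: November 20", "Date: Oct 20", "Date : Sept 20", "Date : August 20"]
sorted_headers = sorted(headers, key=month_index)
print(sorted_headers)
```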


In [15]:
# for substr in substrings:
#     compilation_of_notes = compilation_of_notes.replace(substr, '\033[1m' + substr + '\033[0m')

    
# print(compilation_of_notes)