# Notebook for extracting whether features are present in chatbot transcripts


### How it works: 

This takes: 
- a specification, 'assessment_template.xlsx', with a column listing each piece of information you'd like to assess for in the transcript. This file must be in ASSESSMENT_DIR, which is also where the output is written. Each sheet in the template should hold the features pertinent to a single diagnosis (e.g. cardiac chest pain, esophageal dysphagia, etc.)
- a folder that contains all the transcripts (TRANSCRIPT_DIR)

The script then extracts the text from each transcript and feeds it to an LLM, asking it to evaluate whether each piece of information was assessed by the doctor and, if so, whether the patient responded that it was present or not.

A second LLM call then parses the response text into structured (information, answer) pairs, which are written to a new Excel workbook for each transcript, with one sheet per diagnosis.

In [53]:
import pandas as pd
import numpy as np
from markitdown import MarkItDown
import llm
from openai import OpenAI
from pydantic import BaseModel
import os
from typing import List, Optional
from tabulate import tabulate
from IPython.display import display
from dotenv import load_dotenv

load_dotenv()  # looks for a .env file in the current dir by default
# Confirm the key is available without printing the secret itself.
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to a .env file in the working folder.")


Setup

In [54]:
# This script will read in all transcripts in this directory to be analyzed. 
TRANSCRIPT_DIR = r'/Users/reblocke/Research/dx_chat_entropy/Chatbot Transcripts/'

# The script will pull in the features from this template (which should be in the assessment dir), 
# and output the resulting assessments here. 
ASSESSMENT_DIR = r'/Users/reblocke/Research/dx_chat_entropy/Assessments/'
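# NOTE: the file name below is spelled with three s's ("asssessment"); it must match the file on disk exactly.
ASSESSMENT_TEMPLATE = os.path.join(ASSESSMENT_DIR, r'asssessment_template.xlsx')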

# Note: you also need an OpenAI API key, which this notebook reads from the OPENAI_API_KEY environment variable.
# Create a file ".env" in the working folder that contains OPENAI_API_KEY=your-secret-key
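
To confirm that the key was loaded without echoing it into the notebook output, one option is to print only a masked form. The following is a minimal sketch; `mask_secret` is a hypothetical helper introduced here for illustration and is not part of `python-dotenv` or the `llm` package.

```python
import os

def mask_secret(value, visible=4):
    """Return a masked form of a secret that is safe to display."""
    if not value:
        return "<not set>"
    if len(value) <= visible:
        # Too short to show any part safely: mask everything.
        return "*" * len(value)
    # Show only the last few characters behind a fixed-width mask.
    return "*" * 8 + value[-visible:]

# Hypothetical usage: report whether the key is available without revealing it.
print("OPENAI_API_KEY:", mask_secret(os.getenv("OPENAI_API_KEY")))
```

Showing only the tail of the key is usually enough to tell which key is in use while keeping the notebook safe to share.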

In [55]:
# Make array of all transcript file names to be ingested
pdf_filepaths = [
    os.path.join(TRANSCRIPT_DIR, f)
    for f in os.listdir(TRANSCRIPT_DIR)
    if f.endswith(".pdf")
]
print("PDF Filepaths:")
for filepath in pdf_filepaths:
    print(filepath)

PDF Filepaths:
/Users/reblocke/Research/dx_chat_entropy/Chatbot Transcripts/Intermtn MS4 2 Transcript.pdf
/Users/reblocke/Research/dx_chat_entropy/Chatbot Transcripts/transcript FM PGY1.pdf
/Users/reblocke/Research/dx_chat_entropy/Chatbot Transcripts/Intermtn MS4 1 Transcript.pdf
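
The listing above comes back in no particular order, because `os.listdir` makes no ordering guarantee, and `f.endswith(".pdf")` skips files named `.PDF`. If a stable order matters for comparing runs, a sketch along the following lines may help. `find_pdfs` is a hypothetical helper, and the file names in the demonstration are made up.

```python
from pathlib import Path
import tempfile

def find_pdfs(directory):
    """Return PDF paths in `directory`, sorted by name, matching .pdf in any letter case."""
    return sorted(
        (p for p in Path(directory).iterdir()
         if p.is_file() and p.suffix.lower() == ".pdf"),
        key=lambda p: p.name,
    )

# Demonstration on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["b.PDF", "a.pdf", "notes.txt"]:
        (Path(tmp) / name).write_text("placeholder")
    demo_names = [p.name for p in find_pdfs(tmp)]
    print(demo_names)
```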


In [56]:
# Pipeline for ingesting transcript PDFs
md = MarkItDown()
transcripts = []

for filepath in pdf_filepaths:
    # Convert the PDF to text using MarkItDown
    result = md.convert(filepath)
    extracted_text = result.text_content
    transcripts.append({
        'filename': os.path.basename(filepath),
        'text_content': extracted_text
    })
transcripts_df = pd.DataFrame(transcripts)

display(transcripts_df)
#print(tabulate(transcripts_df, headers = 'keys', tablefmt = 'fancy_grid'))
# If desired, you can save it to a CSV
# transcripts_df.to_csv(os.path.join(TRANSCRIPT_DIR, "transcripts.csv"), index=False)

Unnamed: 0,filename,text_content
0,Intermtn MS4 2 Transcript.pdf,PATIENT DOOR CHART and Learner Instructions\n\...
1,transcript FM PGY1.pdf,Patient Case\n\nPATIENT DOOR CHART and Learner...
2,Intermtn MS4 1 Transcript.pdf,PATIENT DOOR CHART and Learner Instructions\n\...


In [57]:
# Create instruction part of the prompt
instruction_prompt = """You are a research assistant who is meticulously reviewing interview transcripts for a research project.

Task: You will be given a list of pieces of information that a doctor might ask a patient about. 
Your goal is to read the transcript and score it by whether the doctor collected each piece of information. 
To do this, you must read the transcript carefully, understanding what each question and response meant. 
If a question is asked obliquely - but a reasonable person would understand what was intended - this should count.

Return format: for each piece of information, you should answer in the following way - 

a. If the doctor asked about the piece of information, and the patient responded that the feature is present, answer "<Information>, YES". 
For example, if the information is: Chest Pain? and the doctor asked "Do you have chest pain?" and the patient answered "I do", you should respond 'Chest Pain?, YES'
If the information is: Nausea? and the doctor asked "Did you have nausea?" and the patient answers "I did", you should respond 'Nausea?, YES'.

b. If the doctor asked about the piece of information, and the patient responded that the feature was not present, answer '<Information>, NO' 
For example, if the information is: Shortness of Breath? and the doctor asked "Are you dyspneic?" and the patient answers they are not, you should respond 'Shortness of Breath?, NO'

c. If the doctor did not ask about the piece of information, you should return '<Information>, MISSING'

Warning: There are ONLY three ways you should ever answer for each piece of information: '<Information>, YES', '<Information>, NO', and '<Information>, MISSING'. 
Never answer in other ways. 

Here are some examples:
1.
Information: 'Pain not worse with exertion (requires they clarify exercise 1hr after meal)'
Doctor at some point asks: Does the pain worsen after a meal? 
Patient: yes, it's worse
Response: "'Pain not worse with exertion (requires they clarify exercise 1hr after meal)', YES", because this is close enough for a reasonable person. 

2. 
Information: 'no prior CAD'
Doctor at some point asks: Have you ever had a heart attack? 
Patient: never
Response: "'no prior CAD', NO", because CAD stands for coronary artery disease and a heart attack is the most common manifestation.

3. 
Information: 'no diaphoresis'
Doctor never asks anything that clarifies if the patient was sweaty and never assessed it on examination
Response: "'no diaphoresis', MISSING", because it was neither asked about nor discussed as an exam finding (exam findings that are discussed should also count).

Putting it all together, the response should follow this format: 
1. Pain not worse with exertion (requires they clarify exercise 1hr after meal), YES
2. "Do you have any PMHx?" (counts as 2 independent minor features), MISSING
3. no tobacco, NO
4. no associated shortness of breath, YES
5. no radiation to the neck, arm, or jaw?, MISSING
... and so on, through the entire list.

Remember, NOTHING ELSE should be in the final output. Just the information, and YES/NO/MISSING 

Here is the list of pieces of information I would like you to look for: """
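
Because the prompt fixes the return format to one `<Information>, LABEL` line per item, a response can also be checked mechanically before any further processing. The sketch below is a regex-based cross-check, not the parsing approach this notebook uses (which relies on a second LLM call). `parse_response`, `LABELS`, and `LINE_RE` are hypothetical names, and the sample response is made up, including one line with a label outside the three allowed values.

```python
import re
from typing import List, Optional, Tuple

# Maps the three allowed labels to a tri-state value (None = not asked).
LABELS = {"YES": True, "NO": False, "MISSING": None}

# Optional "N." prefix, then the information text, a comma, and the label at line end.
LINE_RE = re.compile(r"^\s*(?:\d+\.\s*)?(?P<info>.+?)\s*,\s*(?P<label>YES|NO|MISSING)\s*$")

def parse_response(text: str) -> Tuple[List[Tuple[str, Optional[bool]]], List[str]]:
    """Split a response into (info, answer) pairs and a list of lines that did not match."""
    pairs, rejected = [], []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        m = LINE_RE.match(line)
        if m:
            pairs.append((m.group("info"), LABELS[m.group("label")]))
        else:
            rejected.append(line)
    return pairs, rejected

sample = """1. no tobacco, NO
2. no radiation to the neck, arm, or jaw?, MISSING
3. no associated shortness of breath, YES
4. no diaphoresis, NOT ASKED"""
pairs, rejected = parse_response(sample)
print(pairs)
print(rejected)
```

Items that themselves contain commas are handled because the label is anchored to the end of the line. Any line that lands in `rejected` signals that the model strayed from the requested format.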

In [58]:
# Processing of the specification for what assessments we want the LLM to look for

def process_sheet(sheet_data):
    """
    Processes a sheet to extract non-empty information as a list.
    """
    # TODO: this might be made more robust to mis-specification
    # Drop the Y/N column if it exists and then combine all other columns
    relevant_columns = sheet_data.drop(columns=["Y/N"], errors="ignore")
    # Flatten all non-empty values into a single list
    info_list = relevant_columns.stack().dropna().tolist()
    return info_list

diagnosis_info = {}
with pd.ExcelFile(ASSESSMENT_TEMPLATE) as spreadsheet_data:
    for sheet_name in spreadsheet_data.sheet_names:
        try:
            # Reuse the already-open workbook instead of re-reading the file from disk
            sheet_data = spreadsheet_data.parse(sheet_name)

            if sheet_data.empty:
                print(f"Skipping empty sheet: {sheet_name}")
                continue

            # Process the sheet to extract information
            diagnosis_info[sheet_name] = process_sheet(sheet_data)
        except Exception as e:
            print(f"Error processing sheet '{sheet_name}': {e}")

for key, value in diagnosis_info.items():
    print(f"Diagnosis: {key}, Information List: {value}")

Diagnosis: Cardiac, Information List: ['Pain not worse with exertion (requires they clarify exercise 1hr after meal)', '"Do you have any PMHx?" (counts as 2 independent minor features)', 'no tobacco', 'no associated shortness of breath', 'no radiation to the neck, arm, or jaw? ', 'positional chest pain (worse when laying down)', 'What were you doing when the chest pain started? (eating)', 'Alternative cause of esoph dysphagia becomes obvious(food gets stuck or relieved by regurgitation of food)', 'no prior CAD', 'no PAD', 'no HLD', 'no prior MI', 'no DM2', 'no obesity', 'no history of stroke', 'no diaphoresis', 'Pain worse with exertion (without clarifying that it only occurs soley within an hour of eating)', 'Decreased exercise x 3 months without clarifying post-prandial food fear', 'How would you describe the pain? (tightness)', 'Pain location behind the sternum', 'FHx of heart disease (father)', 'HTN']
Diagnosis: GERD, Information List: ['Heartburn (Postprandial burning or pain)', '
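
To make the flattening step in `process_sheet` concrete, here is a pure-Python sketch of what `stack().dropna().tolist()` is understood to do: walk the sheet row by row, skip empty cells, and collect the rest into one list. It is an illustration only, not a replacement for the pandas code. `flatten_features` is a hypothetical helper, and the column names and rows are made up.

```python
import math

def flatten_features(rows, skip_columns=("Y/N",)):
    """Collect non-empty cell values row by row, skipping the listed columns."""
    features = []
    for row in rows:                        # one dict per spreadsheet row
        for column, value in row.items():   # columns in sheet order
            if column in skip_columns:
                continue
            # Treat None and NaN as empty cells.
            if value is None or (isinstance(value, float) and math.isnan(value)):
                continue
            features.append(value)
    return features

# Made-up sheet with two feature columns and a Y/N column to ignore.
rows = [
    {"Major": "no prior CAD", "Y/N": None, "Minor": "no tobacco"},
    {"Major": "no diaphoresis", "Y/N": None, "Minor": float("nan")},
]
flat = flatten_features(rows)
print(flat)
```

One consequence of row-by-row ordering is that features from different columns end up interleaved in the prompt, so the order the LLM sees may differ from reading the template column by column.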

In [59]:
# Iterate through all the pieces of info to make prompts for each disease (LLM called separately)
info_prompts = {}
for diagnosis, info_list in diagnosis_info.items():
    # Create the long string with the specified format
    info_prompt = "\n".join([f"Information: {info}" for info in info_list])
    info_prompts[diagnosis] = info_prompt

for key, prompt in info_prompts.items():
    print(f"Diagnosis: {key}. Prompt:\n{prompt}")

Diagnosis: Cardiac. Prompt:
Information: Pain not worse with exertion (requires they clarify exercise 1hr after meal)
Information: "Do you have any PMHx?" (counts as 2 independent minor features)
Information: no tobacco
Information: no associated shortness of breath
Information: no radiation to the neck, arm, or jaw? 
Information: positional chest pain (worse when laying down)
Information: What were you doing when the chest pain started? (eating)
Information: Alternative cause of esoph dysphagia becomes obvious(food gets stuck or relieved by regurgitation of food)
Information: no prior CAD
Information: no PAD
Information: no HLD
Information: no prior MI
Information: no DM2
Information: no obesity
Information: no history of stroke
Information: no diaphoresis
Information: Pain worse with exertion (without clarifying that it only occurs soley within an hour of eating)
Information: Decreased exercise x 3 months without clarifying post-prandial food fear
Information: How would you describe 

In [60]:
# Create the full prompts for each disease for each transcript

# Initialize a new column in transcripts_df to store the prompts
transcripts_df["full_prompts"] = None

# Iterate through each transcript in transcripts_df
for i, transcript in enumerate(transcripts_df['text_content']):
    # Create a dictionary for the current transcript's prompts
    transcript_prompts = {}
    
    # Iterate through each diagnosis and its information prompt (built in the previous cell)
    for diagnosis, info_prompt in info_prompts.items():
        
        # Prefix the instruction_prompt to the disease-specific prompt, and add the transcript to the end.
        full_prompt = (
            f"{instruction_prompt}\n{info_prompt}\n\n and here is the transcript to assess:\n{transcript}"
        )
        
        # Store the prompt for the current diagnosis
        transcript_prompts[diagnosis] = full_prompt
    
    # Assign the dictionary of prompts to the new column for this transcript
    transcripts_df.at[i, "full_prompts"] = transcript_prompts

display(transcripts_df)
#print(tabulate(transcripts_df, headers = 'keys', tablefmt = 'fancy_grid'))

Unnamed: 0,filename,text_content,full_prompts
0,Intermtn MS4 2 Transcript.pdf,PATIENT DOOR CHART and Learner Instructions\n\...,{'Cardiac': 'You are a research assistant who ...
1,transcript FM PGY1.pdf,Patient Case\n\nPATIENT DOOR CHART and Learner...,{'Cardiac': 'You are a research assistant who ...
2,Intermtn MS4 1 Transcript.pdf,PATIENT DOOR CHART and Learner Instructions\n\...,{'Cardiac': 'You are a research assistant who ...
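
The prompt assembly above can also be expressed as a small function, which makes it easier to inspect a single prompt before sending anything to a model. This is a minimal sketch: `build_full_prompt` is a hypothetical helper, and the instruction text and transcript below are short stand-ins rather than the real ones.

```python
def build_full_prompt(instructions, info_list, transcript):
    """Join the instructions, one 'Information:' line per feature, and the transcript."""
    info_block = "\n".join(f"Information: {info}" for info in info_list)
    # Same separator text as the notebook cell above.
    return f"{instructions}\n{info_block}\n\n and here is the transcript to assess:\n{transcript}"

demo_prompt = build_full_prompt(
    "Score each item as YES, NO, or MISSING.",   # stand-in for instruction_prompt
    ["no tobacco", "no prior CAD"],
    "Doctor: Do you smoke?\nPatient: No.",        # stand-in transcript
)
print(demo_prompt)
```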


Available models to use

In [61]:
for model in llm.get_models():
    print(model.model_id)

gpt-4o
gpt-4o-mini
gpt-4o-audio-preview
gpt-3.5-turbo
gpt-3.5-turbo-16k
gpt-4
gpt-4-32k
gpt-4-1106-preview
gpt-4-0125-preview
gpt-4-turbo-2024-04-09
gpt-4-turbo
o1-preview
o1-mini
gpt-3.5-turbo-instruct
hf.co/unsloth/DeepSeek-R1-Distill-Qwen-14B-GGUF:Q8_0
hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0


In [62]:
%%time
# Ollama version - requires Ollama to be installed (tested on macOS only), roughly 16 GB or more of RAM, and about 8 GB of disk space.
# Local models were too verbose and did not follow the instructions reliably; the OpenAI models handled this prompt better.
"""
model = llm.get_model("hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0") 

# Create a new column to store the responses
transcripts_df["responses"] = None

# Iterate through each transcript in the DataFrame
for i, row in transcripts_df.iterrows():
    full_prompts = row["full_prompts"]  # Get the full_prompts dictionary for this transcript
    transcript_responses = {}  # Dictionary to store responses for this transcript
    
    # Iterate through each diagnosis and its associated prompt
    for diagnosis, prompt in full_prompts.items():
        response = model.prompt(prompt)  # Get the model's response
        transcript_responses[diagnosis] = response.text()  # Store the response text
    
    # Save the responses back into the DataFrame
    transcripts_df.at[i, "responses"] = transcript_responses

display(transcripts_df)
#print(tabulate(transcripts_df, headers = 'keys', tablefmt = 'fancy_grid'))
"""

CPU times: user 1e+03 ns, sys: 1 μs, total: 2 μs
Wall time: 3.1 μs



In [63]:
%%time
# Initialize the model
#model = llm.get_model("gpt-4o")  # alternative: costs a bit more
model = llm.get_model("gpt-4o-mini")
model.key = os.environ["OPENAI_API_KEY"]

# Create a new column to store the responses
transcripts_df["responses"] = None

# Iterate through each transcript in the DataFrame
for i, row in transcripts_df.iterrows():
    full_prompts = row["full_prompts"]  # Get the full_prompts dictionary for this transcript
    transcript_responses = {}  # Dictionary to store responses for this transcript
    
    # Iterate through each diagnosis and its associated prompt
    for diagnosis, prompt in full_prompts.items():
        response = model.prompt(prompt)  # Get the model's response
        transcript_responses[diagnosis] = response.text()  # Store the response text
    
    # Save the responses back into the DataFrame
    transcripts_df.at[i, "responses"] = transcript_responses

display(transcripts_df)
#print(tabulate(transcripts_df, headers = 'keys', tablefmt = 'fancy_grid'))


Unnamed: 0,filename,text_content,full_prompts,responses
0,Intermtn MS4 2 Transcript.pdf,PATIENT DOOR CHART and Learner Instructions\n\...,{'Cardiac': 'You are a research assistant who ...,{'Cardiac': '1. Pain not worse with exertion (...
1,transcript FM PGY1.pdf,Patient Case\n\nPATIENT DOOR CHART and Learner...,{'Cardiac': 'You are a research assistant who ...,{'Cardiac': '1. Pain not worse with exertion (...
2,Intermtn MS4 1 Transcript.pdf,PATIENT DOOR CHART and Learner Instructions\n\...,{'Cardiac': 'You are a research assistant who ...,{'Cardiac': '1. Pain not worse with exertion (...


CPU times: user 1.64 s, sys: 151 ms, total: 1.79 s
Wall time: 45 s


In [64]:
%%time
# Separate Call to OpenAI with structured outputs to parse into JSON format

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Define a class for a single question-answer pair
class InfoAnswer(BaseModel):
    info: str
    answer: Optional[bool]

# Define a container class that holds a list of question-answer pairs
class InfoAnswerList(BaseModel):
    pairs: List[InfoAnswer]

# Add a new column for storing info_answers
transcripts_df["info_answers"] = None

# Iterate through each transcript's responses
for i, row in transcripts_df.iterrows():
    # Get the responses for this transcript
    transcript_responses = row["responses"]  # Assumes "responses" is already populated as a dictionary
    
    # Create a dictionary to hold the parsed info answers for all diagnoses
    transcript_info_answers = {}
    
    # Iterate through each diagnosis and its response
    for diagnosis, response_text in transcript_responses.items():
        # Generate messages for each diagnosis
        completion = client.beta.chat.completions.parse(
            model="gpt-4o-mini",
            messages=[
                {
                    "role": "system",
                    "content": "Extract a list of pieces of information (info) and whether or not that piece of information was present (answer) as a boolean (YES = True, NO = False, MISSING = None).",
                },
                {"role": "user", "content": response_text},
            ],
            response_format=InfoAnswerList,
        )
        
        # Retrieve the already-parsed InfoAnswerList from the response
        info_answer_list = completion.choices[0].message.parsed
        
        # Store the parsed result for this diagnosis
        transcript_info_answers[diagnosis] = info_answer_list
    
    # Store the parsed info answers for this transcript in the DataFrame
    transcripts_df.at[i, "info_answers"] = transcript_info_answers

display(transcripts_df)
#print(tabulate(transcripts_df, headers = 'keys', tablefmt = 'fancy_grid'))

Unnamed: 0,filename,text_content,full_prompts,responses,info_answers
0,Intermtn MS4 2 Transcript.pdf,PATIENT DOOR CHART and Learner Instructions\n\...,{'Cardiac': 'You are a research assistant who ...,{'Cardiac': '1. Pain not worse with exertion (...,{'Cardiac': pairs=[InfoAnswer(info='Pain not w...
1,transcript FM PGY1.pdf,Patient Case\n\nPATIENT DOOR CHART and Learner...,{'Cardiac': 'You are a research assistant who ...,{'Cardiac': '1. Pain not worse with exertion (...,{'Cardiac': pairs=[InfoAnswer(info='Pain not w...
2,Intermtn MS4 1 Transcript.pdf,PATIENT DOOR CHART and Learner Instructions\n\...,{'Cardiac': 'You are a research assistant who ...,{'Cardiac': '1. Pain not worse with exertion (...,{'Cardiac': pairs=[InfoAnswer(info='Pain not w...


CPU times: user 167 ms, sys: 10.1 ms, total: 177 ms
Wall time: 31.3 s
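
Since both stages pass the feature text through an LLM, items can come back reworded, dropped, or duplicated. In the output tables, for example, quote marks and trailing spaces no longer match the template. A light coverage check against the requested list can flag this. The sketch below is one possible approach under those assumptions; `normalize` and `coverage_report` are hypothetical helpers, and the `returned` list is made up to show a mismatch.

```python
def normalize(item):
    """Remove quote characters, lowercase, and collapse whitespace for comparison."""
    return " ".join(item.replace('"', "").replace("'", "").lower().split())

def coverage_report(requested, returned):
    """Return (dropped, unexpected): requested items not returned, and returned items not requested."""
    requested_norm = {normalize(r): r for r in requested}
    returned_norm = {normalize(r): r for r in returned}
    dropped = [orig for key, orig in requested_norm.items() if key not in returned_norm]
    unexpected = [orig for key, orig in returned_norm.items() if key not in requested_norm]
    return dropped, unexpected

requested = ['"Do you have any PMHx?" (counts as 2 independent minor features)',
             "no radiation to the neck, arm, or jaw? ",
             "no prior CAD"]
returned = ["Do you have any PMHx? (counts as 2 independent minor features)",
            "no radiation to the neck, arm, or jaw?",
            "no prior MI"]
dropped, unexpected = coverage_report(requested, returned)
print("Dropped:", dropped)
print("Unexpected:", unexpected)
```

In the notebook, `requested` would correspond to `diagnosis_info[diagnosis]` and `returned` to `[pair.info for pair in info_answer_list.pairs]`.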


In [65]:
# Iterate through each row in transcripts_df
for i, row in transcripts_df.iterrows():
    # Get the info_answers dictionary for this transcript
    info_answers = row["info_answers"]
    transcript_filename = row["filename"]  # Get the original filename
    
    # Construct the output file name
    output_file = os.path.join(
        ASSESSMENT_DIR,
        f"answers_{os.path.splitext(transcript_filename)[0]}.xlsx"
    )
    
    # Create a writer object to handle multiple sheets
    with pd.ExcelWriter(output_file, engine="openpyxl") as writer:
        # Iterate through each diagnosis in info_answers
        for diagnosis, info_answer_list in info_answers.items():
            # Create a DataFrame for the current diagnosis
            data = [{"answer": pair.answer, "info": pair.info} for pair in info_answer_list.pairs]
            df = pd.DataFrame(data)
            display(df)
            # Write the DataFrame to a sheet named after the diagnosis
            df.to_excel(writer, sheet_name=diagnosis[:31], index=False)  # Sheet name max length is 31 characters
    
    print(f"Info-Answer pairs have been written to {output_file}")

Unnamed: 0,answer,info
0,False,Pain not worse with exertion (requires they cl...
1,,Do you have any PMHx? (counts as 2 independent...
2,False,no tobacco
3,True,no associated shortness of breath
4,,"no radiation to the neck, arm, or jaw?"
5,True,positional chest pain (worse when laying down)
6,,What were you doing when the chest pain starte...
7,True,Alternative cause of esoph dysphagia becomes o...
8,,no prior CAD
9,,no PAD


Unnamed: 0,answer,info
0,,Heartburn (Postprandial burning or pain)
1,True,Reflux / regurgitation
2,,Pain location behind sternum
3,True,Positional (worse when laying down)
4,True,Alternative cause becomes obvious: esoph dysph...
5,,How would you describe the pain? (tightness… n...
6,,Do antacids help with your chest pain?
7,,No hoarse voice
8,,No dry cough
9,,No globus


Unnamed: 0,answer,info
0,,Food gets stuck
1,True,Regurgitation provides relief
2,,Pain location behind sternum
3,True,Positional chest pain (worse when laying down)
4,,How would you describe the pain? (tightness… n...
5,,Difficulty swallowing liquids
6,,Weight loss
7,,No FHx of cancer
8,True,Does not use alcohol


Unnamed: 0,answer,info
0,,Pattern of hand pain: multiple symmetric joint...
1,,Hand predominance disproportionate to other jo...
2,True,FHx of RA
3,,No morning stiffness
4,,Lack of joint swelling
5,,"No enlargement of knuckles, finger deformities..."
6,,No rheumatoid nodules


Unnamed: 0,answer,info
0,True,Alternative cause becomes obvious: esoph dysph...
1,,Raynauds phenomenon
2,,Rash (telangiectasias)
3,True,Hand pain out of proportion to other joints (m...
4,,Current heartburn or reflux
5,,Long-standing heartburn and reflux (duration o...
6,True,Difficulty swallowing liquids
7,,Weight loss
8,True,FHx of RA
9,True,no associated shortness of breath


Info-Answer pairs have been written to /Users/reblocke/Research/dx_chat_entropy/Assessments/answers_Intermtn MS4 2 Transcript.xlsx


Unnamed: 0,answer,info
0,False,Pain not worse with exertion (requires they cl...
1,True,Do you have any PMHx? (counts as 2 independent...
2,False,no tobacco
3,,no associated shortness of breath
4,True,"no radiation to the neck, arm, or jaw?"
5,,positional chest pain (worse when laying down)
6,True,What were you doing when the chest pain starte...
7,True,Alternative cause of esoph dysphagia becomes o...
8,,no prior CAD
9,,no PAD


Unnamed: 0,answer,info
0,,Heartburn (Postprandial burning or pain)
1,True,Reflux / regurgitation
2,True,Pain location behind sternum
3,,Positional (worse when laying down)
4,True,Alternative cause becomes obvious: esoph dysph...
5,True,How would you describe the pain? (tightness… n...
6,,Do antacids help with your chest pain?
7,,No hoarse voice
8,True,No dry cough
9,,No globus


Unnamed: 0,answer,info
0,True,Food gets stuck
1,True,Regurgitation provides relief
2,True,Pain location behind sternum
3,,Positional chest pain (worse when laying down)
4,True,How would you describe the pain? (tightness… n...
5,True,Difficulty swallowing liquids
6,,Weight loss
7,,No FHx of cancer
8,True,Does not use alcohol


Unnamed: 0,answer,info
0,,Pattern of hand pain: multiple symmetric joint...
1,,Hand predominance disproportionate to other jo...
2,,FHx of RA
3,,No morning stiffness
4,,Lack of joint swelling
5,,"No enlargement of knuckles, finger deformities..."
6,,No rheumatoid nodules


Unnamed: 0,answer,info
0,True,Alternative cause becomes obvious: esoph dysph...
1,,Raynauds phenomenon
2,,Rash (telangiectasias)
3,True,Hand pain out of proportion to other joints (m...
4,,Current heartburn or reflux
5,,Long-standing heartburn and reflux (duration o...
6,True,Difficulty swallowing liquids
7,,Weight loss
8,,FHx of RA
9,,no associated shortness of breath


Info-Answer pairs have been written to /Users/reblocke/Research/dx_chat_entropy/Assessments/answers_transcript FM PGY1.xlsx


Unnamed: 0,answer,info
0,False,Pain not worse with exertion (requires they cl...
1,,Do you have any PMHx? (counts as 2 independent...
2,,no tobacco
3,,no associated shortness of breath
4,False,"no radiation to the neck, arm, or jaw?"
5,True,positional chest pain (worse when laying down)
6,,What were you doing when the chest pain starte...
7,True,Alternative cause of esoph dysphagia becomes o...
8,,no prior CAD
9,,no PAD


Unnamed: 0,answer,info
0,True,Heartburn (Postprandial burning or pain)
1,True,Reflux / regurgitation
2,True,Pain location behind sternum
3,True,Positional (worse when laying down)
4,True,Alternative cause becomes obvious: esoph dysph...
5,True,How would you describe the pain? (tightness… n...
6,,Do antacids help with your chest pain?
7,,No hoarse voice
8,True,No dry cough
9,,No globus


Unnamed: 0,answer,info
0,,Food gets stuck
1,True,Regurgitation provides relief
2,True,Pain location behind sternum
3,True,Positional chest pain (worse when laying down)
4,True,How would you describe the pain? (tightness… n...
5,,Difficulty swallowing liquids
6,,Weight loss
7,,No FHx of cancer
8,,Does not use alcohol


Unnamed: 0,answer,info
0,,Pattern of hand pain: multiple symmetric joint...
1,,Hand predominance disproportionate to other jo...
2,,FHx of RA
3,,No morning stiffness
4,,Lack of joint swelling
5,,"No enlargement of knuckles, finger deformities..."
6,,No rheumatoid nodules


Unnamed: 0,answer,info
0,True,Alternative cause becomes obvious: esoph dysph...
1,,Raynauds phenomenon
2,,Rash (telangiectasias)
3,,Hand pain out of proportion to other joints (m...
4,True,Current heartburn or reflux
5,True,Long-standing heartburn and reflux (duration o...
6,,Difficulty swallowing liquids
7,,Weight loss
8,,FHx of RA
9,,no associated shortness of breath


Info-Answer pairs have been written to /Users/reblocke/Research/dx_chat_entropy/Assessments/answers_Intermtn MS4 1 Transcript.xlsx
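
The export cell truncates sheet names to 31 characters, but Excel also rejects names containing any of `[ ] : * ? / \`. If a diagnosis sheet in the template were ever named with one of those characters, a small sanitizer could be applied before writing. This is a sketch only; `safe_sheet_name` is a hypothetical helper and the example names are made up.

```python
import re

# Characters Excel does not allow in worksheet names.
_INVALID_SHEET_CHARS = re.compile(r"[\[\]:*?/\\]")

def safe_sheet_name(name, max_len=31):
    """Replace disallowed characters with '_' and truncate to Excel's 31-character limit."""
    cleaned = _INVALID_SHEET_CHARS.sub("_", name).strip()
    # Fall back to a generic name if nothing usable remains.
    return cleaned[:max_len] or "Sheet"

print(safe_sheet_name("GERD / reflux"))
print(safe_sheet_name("Esophageal dysphagia (mechanical obstruction)"))
```

Truncation can make two long names collide, so it may be worth checking that the sanitized names are still unique before writing the workbook.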


In [None]:
# End