In [1]:
# Import libraries
import os
import re
import random
import pickle
import subprocess
import numpy as np
import pandas as pd
import datetime as dt

from tqdm import tqdm
from datetime import datetime
from collections import Counter

# 1. Setup concept extractors

Some options were [MetaMap](https://metamap.nlm.nih.gov/) and [spaCy](https://spacy.io/). 

[MetaMap](https://metamap.nlm.nih.gov/) specifically recognizes UMLS concepts. There is a [Python wrapper](https://github.com/AnthonyMRios/pymetamap), but it is known to be slow and of poor quality.

[spaCy](https://spacy.io/) is a popular NLP Python package with an extensive library for named entity recognition. It has a wide variety of [extensions](https://spacy.io/universe) and models to choose from. We're going with the following.

* [scispaCy](https://spacy.io/universe/project/scispacy) contains spaCy models for processing biomedical, scientific or clinical text. It seems easy to use and has a wide variety of concepts it can recognize, including UMLS, RxNorm, etc.

* [negspaCy](https://spacy.io/universe/project/negspacy) identifies negations using an extension of regex matching. Probably useful for distinguishing statements like "this pt is diabetic" v. "this pt is not diabetic." [todo: negation identification of medspacy might be better, https://github.com/medspacy/medspacy]

* [Med7](https://github.com/kormilitzin/med7) is a model trained for recognizing entities in prescription text, e.g. identifies drug name, dosage, duration, etc., which could be useful stuff to check for conflicts. 

We're going with spaCy for this, and will need a coherent way to integrate the entities picked up by these three extensions/models.
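To make the negation problem concrete, here is a toy sketch of trigger-word negation detection. It is not negspaCy's implementation (negspaCy builds on the NegEx algorithm); the trigger list, the window size, and the function name `is_negated` are assumptions made for illustration only.

```python
import re

# Hypothetical, deliberately tiny trigger list; real NegEx lists are much longer.
NEGATION_TRIGGERS = {"no", "not", "denies", "without"}


def is_negated(sentence, entity, window=3):
    """Return True if a trigger word occurs within `window` tokens before `entity`."""
    tokens = re.findall(r"\w+", sentence.lower())
    entity_tokens = re.findall(r"\w+", entity.lower())
    if not entity_tokens:
        return False
    n = len(entity_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == entity_tokens:
            # Look only at the few tokens immediately before the entity mention.
            preceding = tokens[max(0, start - window):start]
            if NEGATION_TRIGGERS.intersection(preceding):
                return True
    return False


print(is_negated("this pt is diabetic", "diabetic"))      # False
print(is_negated("this pt is not diabetic", "diabetic"))  # True
```

A fixed window like this misses long-range negation and scope terminators ("but", "however"), which is part of why a dedicated component is preferable.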

## i) Installations

In [2]:
import sys; sys.executable

'/opt/conda/envs/opennotes/bin/python'

In [3]:
import spacy
import scispacy

from pprint import pprint
from collections import OrderedDict

from spacy import displacy
# from scispacy.abbreviation import AbbreviationDetector # UMLS already contains abbrev. detect
from scispacy.umls_linking import UmlsEntityLinker

# should be 2.3.5 and >=0.3.0
spacy.__version__, scispacy.__version__

('2.3.5', '0.3.0')

## ii) Setting up the model

The model is used to form word/sentence embeddings for the NER task. Thus, it's important to choose a model that has been tuned for our specific use case (e.g. clinical text, prescription information) so the embeddings are useful for naming the entity.

[Note to self:] One idea to look into if time remains: using a custom model in the spaCy pipeline. Could we do something with the Romanov models, since they've been trained specifically for conflict detection? See https://spacy.io/usage/v3

### a) scispaCy

For scispaCy, we set up one of their models that has been trained on biomedical data. Other models can be found [here](https://allenai.github.io/scispacy/). 

We load two copies of the model because we will later attach a different entity linker (a component that links text to named entities in a knowledge base) to each one.

In [4]:
## uncomment to install model if not already installed
# !/opt/conda/envs/opennotes/bin/python -m pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz

In [5]:
# for umls (general biomedical concepts)
umls_nlp   = spacy.load("en_core_sci_sm")

# for rxnorm (prescriptions)
rxnorm_nlp = spacy.load("en_core_sci_sm")

### b) Med7

For Med7, we set up their model that has been trained specifically for NER of medication-related concepts: dosage, drug names, duration, form, frequency, route of administration, and strength. The model is trained on MIMIC-III, so it should work well for us.

In [6]:
# # installs Med7 model
# !pip install https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1

In [7]:
med7_nlp = spacy.load("en_core_med7_lg")

## iii) Adding an entity linker

The EntityLinker is a spaCy pipeline component that links recognized entities to a knowledge base. The linker compares each mention with the concepts in the specified knowledge base. For example, scispaCy's UMLS linker does a nearest-neighbor search based on character overlap and has an option to resolve abbreviations first.
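The idea of a character-overlap nearest-neighbor search can be sketched in a few lines. This is a simplification, not scispaCy's code: as far as I understand, scispaCy scores candidates with TF-IDF vectors over character 3-grams and an approximate nearest-neighbor index, whereas the sketch below swaps in plain Jaccard overlap and a brute-force scan. The alias list and function names are hypothetical.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of `text`, padded with spaces at both ends."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def nearest_aliases(mention, aliases, k=2):
    """Return the k (score, alias) pairs most similar to `mention`, best first."""
    scored = sorted(((ngram_similarity(mention, alias), alias) for alias in aliases),
                    reverse=True)
    return scored[:k]


# Hypothetical stand-in for knowledge-base aliases.
TOY_ALIASES = ["pneumothorax", "pneumonia", "potassium", "ventilator"]
print(nearest_aliases("pneumonias", TOY_ALIASES))
```

In this analogy, the `k` and `threshold` arguments passed to the linkers below correspond to how many neighbors are retrieved and how similar a candidate must be to be kept.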

[Note: A mention generally gets resolved to a list of candidate entities. This [blog post](http://sujitpal.blogspot.com/2020/08/disambiguating-scispacy-umls-entities.html) describes one potential way to disambiguate them by figuring out the "most likely" set of entities. We'll start off by resolving to just the first candidate; hopefully that's sufficient.]
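Resolving to the first candidate amounts to keeping the highest-scoring entry above the threshold. A minimal sketch, assuming candidates arrive as a list of `(concept_id, score)` tuples; the concept IDs below are placeholders, not real CUIs, and `top_candidate` is a hypothetical helper.

```python
def top_candidate(candidates, threshold=0.7):
    """Return the best (concept_id, score) pair at or above `threshold`, else None."""
    eligible = [c for c in candidates if c[1] >= threshold]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c[1])


# Placeholder IDs for illustration only.
toy_candidates = [("CUI_B", 0.81), ("CUI_A", 0.92), ("CUI_C", 0.40)]
print(top_candidate(toy_candidates))           # ('CUI_A', 0.92)
print(top_candidate([("CUI_C", 0.40)]))        # None
```

Setting `max_entities_per_mention=1` on the linker has a similar effect without extra code; the helper is only useful if more candidates are kept for later disambiguation.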

### a) scispaCy

#### UMLS Linker

The UMLS linker maps entities to UMLS concepts. The parts we're mainly interested in are the semantic type and the concept (mainly the common name; the CUI might become important later).

* _Semantic type_ is the broader category that the entity falls under, e.g. disease, pharmacologic substance, etc. See [this](https://metamap.nlm.nih.gov/Docs/SemanticTypes_2018AB.txt) for a full list.

* _Concepts_ refer to the more fundamental entity itself, e.g. pneumothorax, ventilator, etc. Many concepts can fall under a semantic type.

More info on `UmlsEntityLinker` ([source code](https://github.com/allenai/scispacy/blob/4ade4ec897fa48c2ecf3187caa08a949920d126d/scispacy/linking.py#L9))

See source code for `.jsonl` file with the knowledge base.

In [8]:
from scispacy.umls_linking import UmlsEntityLinker

# abbreviation_pipe = AbbreviationDetector(nlp) # automatically included with UMLS linker
# nlp.add_pipe(abbreviation_pipe)
umls_linker = UmlsEntityLinker(k=10,                          # number of nearest neighbors to look up from
                               threshold=0.7,                 # confidence threshold to be added as candidate
                               max_entities_per_mention=1,    # number of entities returned per concept (todo: tune)
                               filter_for_definitions=False,  # no definition is OK
                               resolve_abbreviations=True)    # resolve abbreviations before linking
umls_nlp.add_pipe(umls_linker)



#### RxNorm Linker

The RxNorm linker maps entities to RxNorm, an ontology of normalized clinical drug names with about 100k concepts. RxNorm incorporates several other drug vocabularies commonly used in pharmacy management and drug interaction software, including First Databank, Micromedex, and the Gold Standard Drug Database.

More info on `RxNorm` ([NIH page](https://www.nlm.nih.gov/research/umls/rxnorm/index.html), [source code](https://github.com/allenai/scispacy/blob/2290a80cfe0948e48d8ecfbd60064019d57a6874/scispacy/linking_utils.py#L120))

See source code for `.jsonl` file with the knowledge base.

In [9]:
from scispacy.linking import EntityLinker

# rxnorm_linker = EntityLinker(resolve_abbreviations=True, name="rxnorm")
rxnorm_linker = EntityLinker(k=10,                          # number of nearest neighbors to look up from
                             threshold=0.7,                 # confidence threshold to be added as candidate
                             max_entities_per_mention=1,    # number of entities returned per concept (todo: tune)
                             filter_for_definitions=False,  # no definition is OK
                             resolve_abbreviations=True,    # resolve abbreviations before linking
                             name="rxnorm")                 # RxNorm ontology

rxnorm_nlp.add_pipe(rxnorm_linker)



### b) Med7 

No need for entity linker

### c) Negspacy [TODO]

# 2. Setup data structures

## Categorizing type of conflict

The first larger task is to categorize pairs by the type of conflict to check for, since our method will likely differ by type (at least for the rule-based approach). We wrote up a short list [here](https://docs.google.com/document/d/1fEBk0JHeyQWshYWW5w_VTkaYyRfm9MBxJ9DAGoVa8Yw/edit?usp=sharing).

To do this, we're using the semantic type that is identified by the UMLS linker. Here's a table of the semantic types we're filtering for, and which conflict they'll be used for.

Here's a [full list](https://metamap.nlm.nih.gov/Docs/SemanticTypes_2018AB.txt) of semantic types. You can look up definitions of semantic types [here](http://linkedlifedata.com/resource/umls-semnetwork/T033).

| Conflict | Semantic Type |
| --- | ----------- |
| Diagnoses-related errors | Disease or Syndrome (T047), Diagnostic Procedure (T060) |
| Inaccurate description of medical history (symptoms) | Sign or Symptom (T184) |
| Inaccurate description of medical history (operations) | Therapeutic or Preventive Procedure (T061) |
| Inaccurate description of medical history (other) | [all of the above and below] |
| Medication or allergies | Clinical Drug (T200), Pharmacologic Substance (T121) |
| Test procedures or results | Laboratory Procedure (T059), Laboratory or Test Result (T034) | 


For clarity, the concepts we'll keep from the UMLS linker are anything falling into these semantic types (which we will then categorize by type of conflict using the table above):

* T047 - Disease or Syndrome
* T121 - Pharmacologic Substance
* T023 - Body Part, Organ, or Organ Component
* T061 - Therapeutic or Preventive Procedure 
* T060 - Diagnostic Procedure
* T059 - Laboratory Procedure
* T034 - Laboratory or Test Result 
* T184 - Sign or Symptom 
* T200 - Clinical Drug

We'll store this info into a dictionary now.

<!-- Some useful def's 
Finding - 
That which is discovered by direct observation or measurement of an organism attribute or condition, including the clinical history of the patient. The history of the presence of a disease is a 'Finding' and is distinguished from the disease itself.  -->

In [10]:
SEMANTIC_TYPES = ['T047', 'T121', 'T023', 'T061', 'T060', 'T059', 'T034', 'T184', 'T200']
SEMANTIC_NAMES = ['Disease or Syndrome', 'Pharmacologic Substance', 'Body Part, Organ, or Organ Component', \
                  'Therapeutic or Preventive Procedure', 'Diagnostic Procedure', 'Laboratory Procedure', \
                  'Laboratory or Test Result', 'Sign or Symptom', 'Clinical Drug']
SEMANTIC_TYPE_TO_NAME = dict(zip(SEMANTIC_TYPES, SEMANTIC_NAMES))

SEMANTIC_TYPE_TO_NAME

{'T047': 'Disease or Syndrome',
 'T121': 'Pharmacologic Substance',
 'T023': 'Body Part, Organ, or Organ Component',
 'T061': 'Therapeutic or Preventive Procedure',
 'T060': 'Diagnostic Procedure',
 'T059': 'Laboratory Procedure',
 'T034': 'Laboratory or Test Result',
 'T184': 'Sign or Symptom',
 'T200': 'Clinical Drug'}

In [11]:
CONFLICT_TO_SEMANTIC_TYPE = {
    "diagnosis": {'T047', 'T060'},
    "med_history_symptom": {'T184'},
    "med_history_operation": {'T061'},
    "med_history_other": set(SEMANTIC_TYPES),
    "med_allergy": {'T200', 'T121'},
    "test_results": {'T059', 'T034'}
}

CONFLICT_TO_SEMANTIC_TYPE

{'diagnosis': {'T047', 'T060'},
 'med_history_symptom': {'T184'},
 'med_history_operation': {'T061'},
 'med_history_other': {'T023',
  'T034',
  'T047',
  'T059',
  'T060',
  'T061',
  'T121',
  'T184',
  'T200'},
 'med_allergy': {'T121', 'T200'},
 'test_results': {'T034', 'T059'}}
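Since pairs are later labelled with a single semantic type, the reverse lookup (semantic type to conflict categories) is often what is needed. A small sketch using the same mapping; `conflicts_for` is a hypothetical helper name, and the dictionary is repeated here only so the block runs on its own.

```python
ALL_TYPES = {"T047", "T121", "T023", "T061", "T060", "T059", "T034", "T184", "T200"}

CONFLICT_TO_SEMANTIC_TYPE = {
    "diagnosis": {"T047", "T060"},
    "med_history_symptom": {"T184"},
    "med_history_operation": {"T061"},
    "med_history_other": set(ALL_TYPES),
    "med_allergy": {"T200", "T121"},
    "test_results": {"T059", "T034"},
}


def conflicts_for(semantic_type):
    """Return the sorted conflict categories that include `semantic_type`."""
    return sorted(name for name, types in CONFLICT_TO_SEMANTIC_TYPE.items()
                  if semantic_type in types)


print(conflicts_for("T184"))  # ['med_history_other', 'med_history_symptom']
print(conflicts_for("T023"))  # ['med_history_other']
```

Note that T023 (Body Part, Organ, or Organ Component) only ever maps to the catch-all category, because it does not appear in the table above.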

In [12]:
from data_structures import Patient,\
                            Note, PrescriptionOrders, LabResults,\
                            Sentence, Prescription, Lab

In [13]:
# from importlib import reload # python 2.7 does not require this
# import data_structures
# reload(data_structures)
# from data_structures import Patient,\
#                             Note, PrescriptionOrders, LabResults,\
#                             Sentence, Prescription, Lab

# 3. Load and process data

In [14]:
# Load MIMIC tables
notes_df  = pd.read_csv('NOTEEVENTS.csv.gz',    compression='gzip', error_bad_lines=False)
drug_df   = pd.read_csv('PRESCRIPTIONS.csv.gz', compression='gzip', error_bad_lines=False)
lab_df    = pd.read_csv('LABEVENTS.csv.gz',     compression='gzip', error_bad_lines=False)
d_lab_df  = pd.read_csv('D_LABITEMS.csv.gz',    compression='gzip', error_bad_lines=False)



#### Updated script for processing HADM ID's with consecutive physician notes (does not count the autosaves)

In [15]:
# Load HADM ID's with consecutive physician notes
if os.path.exists("hadm_ids.pkl"):
    with open("hadm_ids.pkl", "rb") as f:
        hadm_ids = pickle.load(f)
else:
    hadm_ids = []
    for hadm_id in tqdm(notes_df.HADM_ID.unique()):
        hadm_data = notes_df.loc[notes_df.HADM_ID == hadm_id]
        hadm_phys_notes = hadm_data.loc[hadm_data.CATEGORY == "Physician "]

        if len(hadm_phys_notes.CHARTTIME.unique()) > 1: # ensure > 1 unique notes (not counting autosave)
            hadm_ids.append(hadm_id)

    with open("hadm_ids.pkl", "wb") as f:
        pickle.dump(hadm_ids, f)
        
print(f"There are {len(hadm_ids)} patients with consecutive physician notes.")

There are 8158 patients with consecutive physician notes.
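The loop above filters the full notes table once per admission, which is slow on a table the size of NOTEEVENTS. A vectorized sketch of the same selection with `groupby` follows; the function name is hypothetical, and unlike the loop it returns IDs in sorted order rather than order of first appearance.

```python
import pandas as pd


def admissions_with_multiple_physician_notes(notes_df):
    """HADM_IDs that have more than one distinct CHARTTIME among physician notes."""
    # MIMIC-III stores this category with a trailing space, as in the loop above.
    phys = notes_df.loc[notes_df["CATEGORY"] == "Physician "]
    # dropna=False mirrors len(Series.unique()), which counts NaN as a value.
    n_times = phys.groupby("HADM_ID")["CHARTTIME"].nunique(dropna=False)
    return n_times.index[n_times > 1].tolist()


# Toy table for illustration only (not MIMIC data).
toy_notes = pd.DataFrame({
    "HADM_ID":   [1, 1, 2, 2, 3, 3],
    "CATEGORY":  ["Physician ", "Physician ", "Physician ", "Physician ",
                  "Nursing", "Nursing"],
    "CHARTTIME": ["t1", "t2", "t3", "t3", "t4", "t5"],
})
print(admissions_with_multiple_physician_notes(toy_notes))  # [1]
```

Rows with a missing `HADM_ID` are dropped by `groupby`, which matches the loop: comparing against NaN never selects any rows.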


# 4. Generating Contradictions

Generate 25-50 examples each of contradictions (positive) and non-contradictions (negative).

For lab values: 

* Find 50-100 total data pairs (about 2-4 per patient) and either insert a contradiction or label the pair as not a contradiction

In [16]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [17]:
pd.set_option("display.max_colwidth", None) # prints full text (-1 is deprecated)


In [18]:
from importlib import reload # python 2.7 does not require this
import data_structures
reload(data_structures)
from data_structures import Patient,\
                            Note, PrescriptionOrders, LabResults,\
                            Sentence, Prescription, Lab

In [19]:
def is_comparable_type(data_i, data_j):
    """ We only want to compare note-to-note OR note-to-structured data. 
    
    Comparable types:
    - sentence v. sentence
    - sentence v. prescription
    - sentence v. lab
    
    Uncomparable types:
    - lab v. lab 
    - lab v. prescription
    - prescription v. prescription
    """
    return (data_i.type == "sentence"     and data_j.type == "sentence") or \
           (data_i.type == "sentence"     and data_j.type == "prescription") or \
           (data_i.type == "prescription" and data_j.type == "sentence") or \
           (data_i.type == "sentence"     and data_j.type == "lab") or \
           (data_i.type == "lab"          and data_j.type == "sentence")
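The five cases above boil down to "at least one side is a note sentence, and both sides are known types". A compact, equivalent sketch follows; `SimpleNamespace` objects stand in for the real `Data` instances, of which only the `.type` attribute is assumed.

```python
from itertools import product
from types import SimpleNamespace

KNOWN_TYPES = {"sentence", "prescription", "lab"}


def is_comparable_type_compact(data_i, data_j):
    """True for sentence v. sentence, sentence v. prescription, sentence v. lab."""
    types = {data_i.type, data_j.type}
    return "sentence" in types and types <= KNOWN_TYPES


# Print the full truth table over the three known types.
for type_i, type_j in product(sorted(KNOWN_TYPES), repeat=2):
    pair = (SimpleNamespace(type=type_i), SimpleNamespace(type=type_j))
    print(f"{type_i:>12} v. {type_j:<12} -> {is_comparable_type_compact(*pair)}")
```

The subset check keeps the behavior identical to the explicit version if an unexpected type ever shows up: such pairs are rejected.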

In [20]:
def generate_data_pairs(pat):
    processed_pairs = []  # for dataframe + csv
    data_inst_pairs = []  # for pipeline, list of tuples: ((Data 1, Data 2), label)
    pair_idx = 0

    # Iterate over all of the patient's DailyData instances (e.g. note, prescription order, lab results for same day)
    ## pat.dailydata = {[date]: [DailyData instance from that date], ...}
    for day, pat_dailydatas in pat.dailydata.items(): # pat_dailydatas is list of all DailyData instances for `day`
        print(f"********** Processing data for {day} **********")
        # Collect all the daily datas (note, prescription orders, lab results) for current day
        current_dds = []
        current_dds_features = []
        current_dds_txts = []
        current_dds_sem_types = []
        current_dds_sem_names = []
        for dd in pat_dailydatas: # iterating over DailyData instances, e.g. dd=physician note taken on `day`
            current_dds.extend(dd.datas)
            current_dds_features.extend(dd.datas_features)
            current_dds_txts.extend(dd.datas_txts)
            current_dds_sem_types.extend(dd.datas_semantic_types)
            current_dds_sem_names.extend(dd.datas_semantic_names)

        current_dds           = np.array(current_dds)
        current_dds_features  = np.array(current_dds_features)
        current_dds_txts      = np.array(current_dds_txts)
        current_dds_sem_types = np.array(current_dds_sem_types)
        current_dds_sem_names = np.array(current_dds_sem_names)

        # extract similar sentences for each semantic type
        for sem_type in SEMANTIC_TYPES:
            # data for this semantic type
            sem_type_bools   = [sem_type in x for x in current_dds_sem_types]
            sem_type_indices = np.where(sem_type_bools)[0]
            indices_map = dict(
                            zip(range(len(sem_type_indices)), 
                                sem_type_indices)
                          )  # maps regular indices in sem_type_current_dds_* lists to indices in current_dds_* lists

            sem_type_current_dds           = current_dds[sem_type_indices]
            sem_type_current_dds_features  = current_dds_features[sem_type_indices]
            sem_type_current_dds_txts      = current_dds_txts[sem_type_indices]
            sem_type_current_dds_sem_types = current_dds_sem_types[sem_type_indices]
            sem_type_current_dds_sem_names = current_dds_sem_names[sem_type_indices]

            # bag-of-words counts over the features (umls + rxnorm concepts)
            vectorizer = CountVectorizer()
            corpus = list(map(lambda x: ' '.join(x), sem_type_current_dds_features))
            if len(corpus) == 0: # skip rest if no candidate sentences exist
                continue
            X = vectorizer.fit_transform(corpus)
            X = X.toarray()

            # get cosine similarity using umls + rxnorm concepts
            similarity = cosine_similarity(X)     # larger=more similar
            sim_is, sim_js = np.where(similarity>0.5) # all pairs with at least 0.5 similarity

            for i, j in zip(sim_is, sim_js):
                data_i = sem_type_current_dds[i]
                data_j = sem_type_current_dds[j]
                # i > j drops self-pairs and mirrored duplicates; also require comparable types
                if i>j and is_comparable_type(data_i, data_j):
                    print(f"***** PAIR INDEX {pair_idx} *****")
                    print(f"Cosine similarity: {similarity[i, j]}")
                    print(f"----- Data i -----")
                    print(f">> Time: {data_i.time}\n" +\
                          f">> Type: {data_i.type}\n" +\
                          f">> Concepts: {data_i.features}\n" +\
                          f">> {data_i.txt}")
                    print(f"----- Data j -----")
                    print(f">> Time: {data_j.time}\n" +\
                          f">> Type: {data_j.type}\n" +\
                          f">> Concepts: {data_j.features}\n" +\
                          f">> {data_j.txt}")
                    print("**********************************")

                    # save
                    processed_pairs.append([data_i.txt,      data_j.txt, \
                                            data_i.time,     data_j.time, \
                                            data_i.type,     data_j.type, \
                                            data_i.features, data_j.features, \
                                            similarity[i, j], SEMANTIC_TYPE_TO_NAME[sem_type]])
            #                                 SEMANTIC_TYPE_TO_NAME[semantic_type]])

                    data_inst_pairs.append(((data_i, data_j), None))
                    pair_idx += 1

    ###############
    #### Final ####
    ###############        
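    # Pass the list of rows directly so pandas infers per-column dtypes
    # (an empty list also yields an empty frame with the right columns).
    df = pd.DataFrame(processed_pairs,
                      columns=["sentence 1", "sentence 2",
                               "time 1", "time 2",
                               "type 1", "type 2",
                               "concepts 1", "concepts 2",
                               "cosine similarity", "semantic type"])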
    
    return df, data_inst_pairs
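To see what the similarity score inside `generate_data_pairs` measures, the sketch below recomputes it by hand for one pair of concept sets, using only the standard library. It approximates `CountVectorizer`'s default tokenization (lowercase, tokens of two or more word characters) with a regex; the helper names are hypothetical. The concept sets are copied from the Calcium lab pair that appears in the Patient 1 output later in this notebook.

```python
import math
import re
from collections import Counter


def concept_counts(concepts):
    """Token counts for a set of concept names, joined with spaces."""
    text = " ".join(concepts).lower()
    return Counter(re.findall(r"\b\w\w+\b", text))


def cosine(counts_a, counts_b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(counts_a[token] * counts_b[token] for token in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


lab_concepts = {"CALCIUM SUPPLEMENTS", "Calcium"}
note_concepts = {"kalium", "Nephrocaps", "CALCIUM SUPPLEMENTS", "Anemia", "Kidney",
                 "Huntington Disease", "1,25-dihydroxycholecalciferol", "calcitriol",
                 "Potassium supplement", "Calcium"}

score = cosine(concept_counts(lab_concepts), concept_counts(note_concepts))
print(round(score, 6))  # 0.559017
```

The shared tokens are `calcium` (count 2 on both sides) and `supplements`, so the dot product is 5, and the norms are sqrt(5) and 4. Every extra concept in the longer sentence inflates its norm and lowers the score, which is why long multi-topic sentences sit close to the 0.5 cutoff.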

## README: Store generated data here

In [205]:
generated_data_dict = {}

## Patient 1

In [None]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[0] # Note: `hadm_ids` is a list of all HADM id's with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df, \
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker, \
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

In [207]:
#### Inserting contradictions to Sentence instances
# IMPORTANT: We should only insert contradictions if it is a sentence from a note ("type" should be sentence, not lab or prescription)! 

# Step 3: Get all the pairs about lab values
semantic_type_ids   = CONFLICT_TO_SEMANTIC_TYPE['test_results']
semantic_type_names = [SEMANTIC_TYPE_TO_NAME[st_id] for st_id in semantic_type_ids]

is_lab = df['semantic type'].apply(lambda x: x in semantic_type_names)
lab_pairs_df = df.loc[(df['type 1'] == "lab") | (df['type 2'] == "lab") | is_lab]

lab_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
42,"Patient's Calcium, Total lab came back 7.8 mg/dL , which is abnormal.","- Renal aware - HD MWF - Monitor potassium; no kayexalate for K < 6.0 per renal - Continue epogen, nephrocaps, calcitriol, calcium # Anemia: Baseline mid-20s; at baseline.'",2131-12-23,2131-12-23 22:56:00,lab,sentence,"{CALCIUM SUPPLEMENTS, Calcium}","{kalium, Nephrocaps, CALCIUM SUPPLEMENTS, Anemia, Kidney, Huntington Disease, 1,25-dihydroxycholecalciferol, calcitriol, Potassium supplement, Calcium}",0.559017,Pharmacologic Substance
43,"Patient's Potassium lab came back 5.6 mEq/L , which is abnormal.",Potassium up slightly.',2131-12-23,2131-12-23 23:51:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance


In [208]:
lab_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
42,"Patient's Calcium, Total lab came back 7.8 mg/dL , which is abnormal.","- Renal aware - HD MWF - Monitor potassium; no kayexalate for K < 6.0 per renal - Continue epogen, nephrocaps, calcitriol, calcium # Anemia: Baseline mid-20s; at baseline.'",2131-12-23 00:00:00,2131-12-23 22:56:00,lab,sentence,"{CALCIUM SUPPLEMENTS, Calcium}","{kalium, Nephrocaps, CALCIUM SUPPLEMENTS, Anemia, Kidney, Huntington Disease, 1,25-dihydroxycholecalciferol, calcitriol, Potassium supplement, Calcium}",0.559017,Pharmacologic Substance
43,"Patient's Potassium lab came back 5.6 mEq/L , which is abnormal.",Potassium up slightly.',2131-12-23 00:00:00,2131-12-23 23:51:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance
44,"Patient's Red Blood Cells lab came back 3.22 m/uL , which is abnormal.","Height: 65 Inch Total In: 1,219 mL PO: TF: IVF: 19 mL Blood products: Total out: 0 mL 25 mL Urine: 25 mL NG: Stool: Drains: Balance: 0 mL 1,194 mL Respiratory Ventilator mode: CMV/ASSIST/AutoFlow Vt (Set): 400 (400 - 400) mL RR (Set): 18 RR (Spontaneous): 0'",2131-12-23 00:00:00,2131-12-23 23:51:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
45,"Patient's Red Blood Cells lab came back 3.22 m/uL , which is abnormal.","Height: 65 Inch Total In: 1,216 mL PO: TF: IVF: 16 mL Blood products: Total out: 0 mL 25 mL Urine: 25 mL NG: Stool: Drains: Balance: 0 mL 1,191 mL Respiratory Ventilator mode: CMV/ASSIST/AutoFlow Vt (Set): 400 (400 - 400) mL RR (Set): 18 RR (Spontaneous): 0'",2131-12-23 00:00:00,2131-12-23 22:56:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
46,Patient's Urine Appearance lab came back Hazy nan.,"Patient took trazodone two days prior to admission, which has caused confusion in the past - appears to be hepatically cleared to active metabolite that is excreted in urine; patient anuric.'",2131-12-23 00:00:00,2131-12-23 22:56:00,lab,sentence,{Uridine},"{Active Q, Uridine}",0.707107,Pharmacologic Substance
47,"Patient's Calcium, Total lab came back 7.5 mg/dL , which is abnormal.","- Renal aware - HD MWF - Monitor potassium; no kayexalate for K < 6.0 per renal - Continue epogen, nephrocaps, calcitriol, calcium # Anemia: Baseline mid-20s; at baseline.'",2131-12-23 00:00:00,2131-12-23 22:56:00,lab,sentence,"{CALCIUM SUPPLEMENTS, Calcium}","{kalium, Nephrocaps, CALCIUM SUPPLEMENTS, Anemia, Kidney, Huntington Disease, 1,25-dihydroxycholecalciferol, calcitriol, Potassium supplement, Calcium}",0.559017,Pharmacologic Substance
48,Patient's Potassium lab came back 5.0 mEq/L.,Potassium up slightly.',2131-12-23 00:00:00,2131-12-23 23:51:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance
74,ABG: 7.21/66/92.',"Will wean levophed; given fluids if needed.', ""Patient's last ABG showed acute respiratory acidosis.""",2131-12-23 23:51:00,2131-12-23 23:51:00,sentence,sentence,{Analysis of arterial blood gases and pH},"{Acute respiratory acidosis, Levophed, Analysis of arterial blood gases and pH}",0.797724,Laboratory Procedure
75,Repeat ABG pending.',"Will wean levophed; given fluids if needed.', ""Patient's last ABG showed acute respiratory acidosis.""",2131-12-23 23:51:00,2131-12-23 23:51:00,sentence,sentence,{Analysis of arterial blood gases and pH},"{Acute respiratory acidosis, Levophed, Analysis of arterial blood gases and pH}",0.797724,Laboratory Procedure
76,Repeat ABG pending.',ABG: 7.21/66/92.',2131-12-23 23:51:00,2131-12-23 23:51:00,sentence,sentence,{Analysis of arterial blood gases and pH},{Analysis of arterial blood gases and pH},1.0,Laboratory Procedure


In [209]:
# Step 4: Insert contradictions

# We should probably aim for 1-2 contradictions per patient. 
# So basically, copy/paste code for Steps 1-4 for each patient, and push to Github.
# Small heads up -- for a given patient, try not to insert contradictions 
# into two sentences that look really really similar. 
# There's a chance this might refer to the same underlying Sentence instance, 
# which could overwrite a contradiction you previously inserted. 

# Look through the sentence pairs by going through `lab_pairs_df`.
# If you find a good one you want to insert a contradiction for, 
# make note of the row index (i.e. the number at the left), 
# and set this to `pair_idx` below. 
# Also make note of which sentence (i.e. sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag appropriately.
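# The overwrite hazard mentioned above, as a self-contained sketch. `ToySentence`
# is a hypothetical stand-in for the real Sentence class; only `.txt` and
# `update_text` are assumed.

```python
class ToySentence:
    """Minimal stand-in with the same update_text interface."""

    def __init__(self, txt):
        self.txt = txt

    def update_text(self, new_txt):
        self.txt = new_txt


def shares_instance(pairs, idx_a, idx_b):
    """True if the two pairs reference at least one identical object."""
    return any(a is b for a in pairs[idx_a][0] for b in pairs[idx_b][0])


shared = ToySentence("Potassium up slightly.")
toy_pairs = [
    ((ToySentence("lab A"), shared), None),
    ((ToySentence("lab B"), shared), None),
]

toy_pairs[0][0][1].update_text("first contradiction")
toy_pairs[1][0][1].update_text("second contradiction")

print(toy_pairs[0][0][1].txt)              # second contradiction
print(shares_instance(toy_pairs, 0, 1))    # True
```

Running a check like `shares_instance` on the two candidate rows before editing is a cheap way to avoid silently overwriting an earlier contradiction.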

In [210]:
pair_idx = 43
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][is_sentence2]

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "Potassium stable and normal range."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

lab 1:	Patient's Potassium lab came back 5.6 mEq/L , which is abnormal.
sentence 2:	Potassium up slightly.'

New contradicting sentence: Potassium stable and normal range.


In [211]:
no_contradiction_pair_idx = [115, 172]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
sentence 1:	   Ve: 8.4 L/min    PaO2 / FiO2: 202    Physical Examination'
sentence 2:	   Ve: 9.4 L/min    PaO2 / FiO2: 202    Physical Examination'
*****************************
sentence 1:	- INR supratherapeutic, but will need heparin gtt if drifts <2    # Hypothyroidism: No acute issues.'
sentence 2:	- Hold metoprolol for now    - Hold heparin gtt given INR supratherapeutic    - continue to follow coags    # History of transient ischemic attack: Notes state the patient had a    TIA which occurred when coumadin held during a previous admission.'
*****************************
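For later use, `generated_data_dict` can be flattened into one record per labelled pair. A minimal sketch, assuming the structure built above (`"contradiction"` maps a pair index to `(is_sentence2, contradicting_txt)`, `"none"` is a list of pair indices); the function name, the output field names, and the admission ID `111` are hypothetical.

```python
def to_label_records(generated):
    """Flatten {hadm_id: {"contradiction": {...}, "none": [...]}} into a list of dicts."""
    records = []
    for hadm_id, entry in generated.items():
        for pair_idx, (is_sentence2, new_text) in entry["contradiction"].items():
            records.append({"hadm_id": hadm_id, "pair_idx": pair_idx, "label": 1,
                            "modified_sentence": 2 if is_sentence2 else 1,
                            "new_text": new_text})
        for pair_idx in entry["none"]:
            records.append({"hadm_id": hadm_id, "pair_idx": pair_idx, "label": 0,
                            "modified_sentence": None, "new_text": None})
    return records


# Placeholder admission ID; the pair indices mirror the cells above.
toy_generated = {
    111: {"contradiction": {43: (True, "Potassium stable and normal range.")},
          "none": [115, 172]},
}
for record in to_label_records(toy_generated):
    print(record)
```

Storing the replacement text alongside the pair index means the contradictions can be re-applied after reprocessing a patient, as long as pair indices stay stable between runs.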


## Patient 2

In [212]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[1] # Note: `hadm_ids` is a list of all HADM id's with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df, \
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker, \
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 129414



********** Processing data for 2174-02-12 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.9086882225022428
----- Data i -----
>> Time: 2174-02-12 19:14:00
>> Type: sentence
>> Concepts: {'Structure of middle lobe of right lung', 'Pneumonia'}
>> He had a CXR which was    inconclusive, possible RML pneumonia.'
----- Data j -----
>> Time: 2174-02-12 19:14:00
>> Type: sentence
>> Concepts: {'Structure of right lower lobe of lung', 'Pneumonia', 'Structure of right upper lobe of lung'}
>> Got a CTA for concern of PE-but no PE, but some    increased interstitial markings in RUL and RLL, more likely chronic    process vs. pneumonia.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.5000000000000001
----- Data i -----
>> Time: 2174-02-12 19:14:00
>> Type: sentence
>> Concepts: {'Positive pressure therapy', 'Sleep Apnea, Obstructive'}
>> Obstructive sleep apnea:  Declined CPAP therapy in the past but    will likely benefit from positive pressure as he is 

In [213]:
#### Inserting contradictions to Sentence instances
# IMPORTANT: We should only insert contradictions if it is a sentence from a note ("type" should be sentence, not lab or prescription)! 

# Step 3: Get all the pairs about lab values
semantic_type_ids   = CONFLICT_TO_SEMANTIC_TYPE['test_results']
semantic_type_names = [SEMANTIC_TYPE_TO_NAME[st_id] for st_id in semantic_type_ids]

is_lab = df['semantic type'].apply(lambda x: x in semantic_type_names)
lab_pairs_df = df.loc[(df['type 1'] == "lab") | (df['type 2'] == "lab") | is_lab]

lab_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
12,Patient's Potassium lab came back 4.9 mEq/L.,Hyperkalemia: intermittent elevations of his potassium.',2174-02-12,2174-02-12 19:14:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance
13,"Patient's Red Blood Cells lab came back 3.81 m/uL , which is abnormal.",Height: 64 Inch Total In: 16 mL PO: TF: IVF: 16 mL Blood products: Total out: 0 mL 0 mL Urine: NG: Stool: Drains: Balance: 0 mL 16 mL Respiratory O2 Delivery Device: Venti mask SpO2: 97%',2174-02-12,2174-02-12 19:14:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance


In [214]:
lab_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
12,Patient's Potassium lab came back 4.9 mEq/L.,Hyperkalemia: intermittent elevations of his potassium.',2174-02-12 00:00:00,2174-02-12 19:14:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance
13,"Patient's Red Blood Cells lab came back 3.81 m/uL , which is abnormal.",Height: 64 Inch Total In: 16 mL PO: TF: IVF: 16 mL Blood products: Total out: 0 mL 0 mL Urine: NG: Stool: Drains: Balance: 0 mL 16 mL Respiratory O2 Delivery Device: Venti mask SpO2: 97%',2174-02-12 00:00:00,2174-02-12 19:14:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
14,Patient's Folate lab came back 10.2 ng/mL.,-recheck HCT -send iron studies and b12 and folate .',2174-02-12 00:00:00,2174-02-12 19:14:00,lab,sentence,{folate},"{Hematocrit procedure, folate, Folate, Scientific Study}",0.707107,Pharmacologic Substance
15,"Patient's Potassium lab came back 5.4 mEq/L , which is abnormal.",Hyperkalemia: intermittent elevations of his potassium.',2174-02-12 00:00:00,2174-02-12 19:14:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance
16,"Patient's Red Blood Cells lab came back 3.96 m/uL , which is abnormal.",Height: 64 Inch Total In: 16 mL PO: TF: IVF: 16 mL Blood products: Total out: 0 mL 0 mL Urine: NG: Stool: Drains: Balance: 0 mL 16 mL Respiratory O2 Delivery Device: Venti mask SpO2: 97%',2174-02-12 00:00:00,2174-02-12 19:14:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
20,Patient's White Blood Cells lab came back 5.5 K/uL.,"Labs notable for WBC of 5, HCT 34.5, sodium of 131 and creatinine of 1.0.'",2174-02-12 00:00:00,2174-02-12 19:14:00,lab,sentence,{White Blood Cell Count procedure},"{Hematocrit procedure, White Blood Cell Count procedure}",0.894427,Laboratory Procedure
21,Patient's White Blood Cells lab came back 6.0 K/uL.,"Labs notable for WBC of 5, HCT 34.5, sodium of 131 and creatinine of 1.0.'",2174-02-12 00:00:00,2174-02-12 19:14:00,lab,sentence,{White Blood Cell Count procedure},"{Hematocrit procedure, White Blood Cell Count procedure}",0.894427,Laboratory Procedure
29,"Patient's Red Blood Cells lab came back 3.83 m/uL , which is abnormal.","Height: 64 Inch Total In: 1,068 mL 255 mL PO: 100 mL TF: IVF: 1,068 mL 155 mL Blood products: Total out: 600 mL 625 mL Urine: 600 mL 625 mL NG: Stool: Drains: Balance: 468 mL -370 mL Respiratory support O2 Delivery Device: None SpO2: 95%'",2174-02-13 00:00:00,2174-02-13 10:57:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
30,"Patient's Red Blood Cells lab came back 3.83 m/uL , which is abnormal.","Height: 64 Inch Total In: 1,068 mL 75 mL PO: TF: IVF: 1,068 mL 75 mL Blood products: Total out: 600 mL 625 mL Urine: 600 mL 625 mL NG: Stool: Drains: Balance: 468 mL -550 mL Respiratory support O2 Delivery Device: Bipap mask SpO2: 99%'",2174-02-13 00:00:00,2174-02-13 07:34:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
35,ABG: ///22/ Physical Examination',ABG: ///22/ Physical Examination',2174-02-13 07:34:00,2174-02-13 10:57:00,sentence,sentence,{Analysis of arterial blood gases and pH},{Analysis of arterial blood gases and pH},1.0,Laboratory Procedure
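
The Step 3 filter keeps a row when either side of the pair is a lab record, or when the pair's semantic type belongs to the test-results group. Below is a minimal sketch on a toy frame, assuming the same column names as the table above. The toy rows and the contents of `toy_semantic_type_names` are illustrative and are not read from `CONFLICT_TO_SEMANTIC_TYPE`. It uses `Series.isin` in place of `apply` with a lambda, which should select the same rows for a string-valued column.

```python
import pandas as pd

# Toy stand-in for the pairs DataFrame (illustrative rows only).
toy_df = pd.DataFrame({
    "type 1": ["lab", "sentence", "sentence", "prescription"],
    "type 2": ["sentence", "sentence", "sentence", "sentence"],
    "semantic type": [
        "Pharmacologic Substance",
        "Laboratory Procedure",
        "Disease or Syndrome",
        "Pharmacologic Substance",
    ],
})

# Assumed semantic type names for the test-results group.
toy_semantic_type_names = ["Laboratory Procedure"]

# Row-wise boolean masks, combined with | exactly as in Step 3.
is_lab_type = toy_df["semantic type"].isin(toy_semantic_type_names)
involves_lab = (toy_df["type 1"] == "lab") | (toy_df["type 2"] == "lab")

toy_lab_pairs = toy_df.loc[involves_lab | is_lab_type]
print(toy_lab_pairs.index.tolist())
```

Row 0 is kept because one side is a lab record, and row 1 is kept because of its semantic type even though both sides are note sentences. That matches the sentence-sentence rows at the bottom of the table above.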


In [215]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# Copy/paste the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look very similar.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the contradiction inserted first.

# Look through the sentence pairs in `lab_pairs_df`.
# When you find a pair you want to insert a contradiction for,
# note its row index (the number in the leftmost column)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.
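
The overwrite risk described in the Step 4 notes can be caught mechanically. Below is a minimal sketch, assuming two pairs may hold a reference to the same object. `ToySentence`, `toy_pairs`, and `safe_update` are hypothetical stand-ins, and the pair layout is simplified compared with `data_inst_pairs`.

```python
class ToySentence:
    """Hypothetical stand-in for the notebook's Sentence class."""

    def __init__(self, txt):
        self.txt = txt

    def update_text(self, new_txt):
        self.txt = new_txt


# One note sentence that is paired with two different lab records.
shared = ToySentence("Hyperkalemia: intermittent elevations of his potassium.")
toy_pairs = {
    12: (ToySentence("Potassium lab came back 4.9 mEq/L."), shared),
    15: (ToySentence("Potassium lab came back 5.4 mEq/L."), shared),
}

# Maps id(object) to the pair index that already modified that object.
modified_by = {}


def safe_update(pair_idx, is_sentence2, new_txt):
    """Update one side of a pair unless that object was already modified."""
    target = toy_pairs[pair_idx][int(is_sentence2)]
    if id(target) in modified_by:
        raise ValueError(f"already modified via pair {modified_by[id(target)]}")
    target.update_text(new_txt)
    modified_by[id(target)] = pair_idx


safe_update(12, True, "Consistently low potassium levels.")

try:
    safe_update(15, True, "Potassium has been normal throughout.")
except ValueError as err:
    second_attempt_error = str(err)
    print(second_attempt_error)
```

Tracking by `id()` only works while the objects stay alive in `toy_pairs`, which holds here because the dictionary keeps a reference to each of them.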

In [216]:
pair_idx = 12
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

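# `is_sentence2` doubles as the index: False -> sentence 1 (index 0), True -> sentence 2 (index 1).
sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]

# Set `contradicting_txt` to the new contradicting sentence.
# For now, this only updates the text.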

contradicting_txt = "Consistently low potassium levels."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

lab 1:	Patient's Potassium lab came back 4.9 mEq/L.
sentence 2:	   Hyperkalemia:  intermittent elevations of his potassium.'

New contradicting sentence: Consistently low potassium levels.
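
The rule that only note sentences may be modified is currently enforced by hand. Below is a minimal sketch of a guard, assuming each record exposes the `type` and `txt` attributes printed above. `ToyData` and `insert_contradiction` are hypothetical names and are not part of the notebook's classes.

```python
class ToyData:
    """Hypothetical stand-in exposing the `type` and `txt` attributes used above."""

    def __init__(self, type_, txt):
        self.type = type_
        self.txt = txt

    def update_text(self, new_txt):
        self.txt = new_txt


def insert_contradiction(pair, is_sentence2, new_txt):
    """Update one side of `pair`, refusing to touch lab or prescription records."""
    target = pair[int(is_sentence2)]
    if target.type != "sentence":
        raise ValueError(f"cannot modify a {target.type!r} record")
    target.update_text(new_txt)
    return target


toy_pair = (
    ToyData("lab", "Patient's Potassium lab came back 4.9 mEq/L."),
    ToyData("sentence", "Hyperkalemia: intermittent elevations of his potassium."),
)

# Allowed: the second element is a note sentence.
insert_contradiction(toy_pair, True, "Consistently low potassium levels.")

# Rejected: the first element is a lab record.
try:
    insert_contradiction(toy_pair, False, "Potassium lab came back 2.0 mEq/L.")
    lab_was_modified = True
except ValueError:
    lab_was_modified = False

print(toy_pair[1].txt, lab_was_modified)
```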


In [217]:
no_contradiction_pair_idx = [15, 36]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")

# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
lab 1:	Patient's Potassium lab came back 5.4 mEq/L , which is abnormal.
sentence 2:	   Hyperkalemia:  intermittent elevations of his potassium.'
*****************************
sentence 1:	   Neurologic: Responds to: Not assessed, Movement: Not assessed, Tone:    Not assessed    Labs / Radiology    393 K/uL'
sentence 2:	   Neurologic: No(t) Attentive, No(t) Follows simple commands, Responds    to: Verbal stimuli, Movement: Not assessed, Tone: Not assessed    Labs / Radiology'
*****************************
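
`generated_data_dict` only lives in memory, so it is worth writing to disk between sessions. Below is a minimal sketch of a pickle round trip, assuming the layout used above (`"contradiction"` maps a pair index to `(is_sentence2, text)` and `"none"` is a list of pair indices). The admission ID `100001` and the file name are made up for illustration.

```python
import os
import pickle
import tempfile

# Toy dictionary with the same shape as `generated_data_dict` (made-up admission ID).
toy_generated = {
    100001: {
        "contradiction": {12: (True, "Consistently low potassium levels.")},
        "none": [15, 36],
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "generated_data.pkl")  # hypothetical file name

    # Write the dictionary in binary mode.
    with open(out_path, "wb") as f:
        pickle.dump(toy_generated, f)

    # Read it back to confirm nothing was lost.
    with open(out_path, "rb") as f:
        reloaded = pickle.load(f)

print(reloaded == toy_generated)
```

Because only the pair index, the flag, and the replacement text are stored, the pickle stays independent of the `Patient` and `Sentence` classes and can be reloaded without them.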


In [218]:
# TODO: ask Dr. Saenz
potential_contradiction_pair_indices = [21]

print("Potential examples of contradictions")
print("*****************************")
for pair_idx in potential_contradiction_pair_indices:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")

Potential examples of contradictions
*****************************
lab 1:	Patient's White Blood Cells lab came back 6.0 K/uL.
sentence 2:	Labs notable for WBC of 5, HCT 34.5,    sodium of 131 and creatinine of 1.0.'
*****************************
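
For borderline pairs like this one, it can help to put the two numbers side by side before asking for a clinical opinion. Below is a minimal sketch, assuming lab sentences follow the "Patient's X lab came back V U" template seen above and that the note writes the value as "WBC of 5". `parse_lab_sentence` and `find_note_value` are hypothetical helpers. Whether a given difference counts as a contradiction remains a clinical judgment.

```python
import re


def parse_lab_sentence(txt):
    """Return (lab name, value, unit) from a templated lab sentence, or None."""
    match = re.search(r"Patient's (.+?) lab came back (-?\d+(?:\.\d+)?) (\S+)", txt)
    if match is None:
        return None
    name, value, unit = match.groups()
    # The unit may carry the sentence's final period, so strip it.
    return name, float(value), unit.rstrip(".")


def find_note_value(txt, abbreviation):
    """Return the first number written as '<abbreviation> of <number>', or None."""
    match = re.search(rf"\b{re.escape(abbreviation)} of (\d+(?:\.\d+)?)", txt)
    return float(match.group(1)) if match else None


lab = parse_lab_sentence("Patient's White Blood Cells lab came back 6.0 K/uL.")
note_value = find_note_value(
    "Labs notable for WBC of 5, HCT 34.5, sodium of 131 and creatinine of 1.0.",
    "WBC",
)
difference = abs(lab[1] - note_value)
print(lab, note_value, difference)
```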


## Patient 3

In [219]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[2] # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 133623


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(ilocs[0], value, pi)
A value is trying to be set on a copy of a slice from a

********** Processing data for 2145-11-30 **********
********** Processing data for 2145-12-01 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.6030226891555273
----- Data i -----
>> Time: 2145-12-01 03:47:00
>> Type: sentence
>> Concepts: {'Ting AF', 'Atrial Fibrillation'}
>> Upon assessment the pt reportedly became    combative with a HR revealing AF with RVR with rates in 160-170s.'
----- Data j -----
>> Time: 2145-12-01 03:47:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy', 'Chest Pain', 'Ting AF', 'ethanol', 'Hepatitis C', 'Atrial Fibrillation'}
>>    ASSESSMENT AND PLAN: 54M with hx of ETOH abuse, HCV, presenting with AF    with RVR, Chest Pain in the setting of ETOH intoxication.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.7559289460184544
----- Data i -----
>> Time: 2145-12-01 03:47:00
>> Type: sentence
>> Concepts: {'Ting AF', 'Atrial Fibrillation'}
>> Upon assessment the pt reportedly became    combative with 
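
The "A value is trying to be set on a copy of a slice" message in the output above is pandas' `SettingWithCopyWarning`, and the quoted line `self.prescription_df[['START_DT']] = start_dt` suggests a column is being assigned on a filtered slice. Below is a minimal sketch of one common way to avoid it, assuming the prescription frame is built by filtering a larger frame. The column names `HADM_ID` and `STARTDATE` and the toy values are assumptions, and this is not the code inside `Patient`.

```python
import pandas as pd

# Toy stand-in for the full drug table (illustrative values only).
toy_drug_df = pd.DataFrame({
    "HADM_ID": [1, 1, 2],
    "STARTDATE": [
        "2145-11-30 00:00:00",
        "2145-12-01 00:00:00",
        "2157-02-01 00:00:00",
    ],
})

# Take an explicit copy of the filtered rows, so that later writes
# clearly target a new frame rather than a slice of the parent.
toy_prescription_df = toy_drug_df.loc[toy_drug_df["HADM_ID"] == 1].copy()

# Adding a column to the copy no longer triggers the warning.
toy_prescription_df["START_DT"] = pd.to_datetime(toy_prescription_df["STARTDATE"])

print(toy_prescription_df["START_DT"].tolist())
print("START_DT" in toy_drug_df.columns)
```

The parent frame is left untouched, which is usually the intent when a per-patient subset is derived from a shared table.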

In [220]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert a contradiction into a sentence from a note ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Get all the pairs about lab values
semantic_type_ids   = CONFLICT_TO_SEMANTIC_TYPE['test_results']
semantic_type_names = [SEMANTIC_TYPE_TO_NAME[st_id] for st_id in semantic_type_ids]

is_lab = df['semantic type'].apply(lambda x: x in semantic_type_names)
lab_pairs_df = df.loc[(df['type 1'] == "lab") | (df['type 2'] == "lab") | is_lab]

lab_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
46,Patient's Lactate lab came back 1.8 mmol/L.,Lactate elevated.',2145-12-01,2145-12-01 03:47:00,lab,sentence,{Lactate},{Lactate},1.0,Pharmacologic Substance
47,"Patient's Red Blood Cells lab came back 3.66 m/uL , which is abnormal.","RR: 24 (13 - 29) insp/min SpO2: 100% Heart rhythm: SR (Sinus Rhythm) Wgt (current): 117.4 kg (admission): 117.4 kg Total In: 2,303 mL PO: 1,200 mL TF: IVF: 1,103 mL Blood products: Total out: 0 mL 800 mL Urine: 650 mL NG: Stool: 150 mL Drains: Balance: 0 mL 1,503 mL Respiratory O2 Delivery Device: None SpO2: 100%'",2145-12-01,2145-12-01 10:17:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance


In [221]:
lab_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
46,Patient's Lactate lab came back 1.8 mmol/L.,Lactate elevated.',2145-12-01 00:00:00,2145-12-01 03:47:00,lab,sentence,{Lactate},{Lactate},1.0,Pharmacologic Substance
47,"Patient's Red Blood Cells lab came back 3.66 m/uL , which is abnormal.","RR: 24 (13 - 29) insp/min SpO2: 100% Heart rhythm: SR (Sinus Rhythm) Wgt (current): 117.4 kg (admission): 117.4 kg Total In: 2,303 mL PO: 1,200 mL TF: IVF: 1,103 mL Blood products: Total out: 0 mL 800 mL Urine: 650 mL NG: Stool: 150 mL Drains: Balance: 0 mL 1,503 mL Respiratory O2 Delivery Device: None SpO2: 100%'",2145-12-01 00:00:00,2145-12-01 10:17:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
48,"Patient's Ethanol lab came back 308 mg/dL , which is abnormal.",ETOH heavy at times although denies daily use.',2145-12-01 00:00:00,2145-12-01 03:47:00,lab,sentence,"{ethanol, Ethanol}",{ethanol},1.0,Pharmacologic Substance
49,"Patient's Ethanol lab came back 308 mg/dL , which is abnormal.",HPI: Patient with excessive EtOH ingestion which he states is an event [**2-3**] times per month.',2145-12-01 00:00:00,2145-12-01 10:17:00,lab,sentence,"{ethanol, Ethanol}",{ethanol},1.0,Pharmacologic Substance
50,"Patient's Red Blood Cells lab came back 3.81 m/uL , which is abnormal.","RR: 24 (13 - 29) insp/min SpO2: 100% Heart rhythm: SR (Sinus Rhythm) Wgt (current): 117.4 kg (admission): 117.4 kg Total In: 2,303 mL PO: 1,200 mL TF: IVF: 1,103 mL Blood products: Total out: 0 mL 800 mL Urine: 650 mL NG: Stool: 150 mL Drains: Balance: 0 mL 1,503 mL Respiratory O2 Delivery Device: None SpO2: 100%'",2145-12-01 00:00:00,2145-12-01 10:17:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
63,# Chest Pain: Pt reported left sided chest pain while in the ED.',Pt evaluated by Cards while in ED.',2145-12-01 03:47:00,2145-12-01 03:47:00,sentence,sentence,"{Prothrombin time assay, Chest Pain}","{Prothrombin time assay, Respiratory Distress Syndrome, Adult}",0.507093,Laboratory Procedure
64,Pt currently in AF with rates in 120s.',# Chest Pain: Pt reported left sided chest pain while in the ED.',2145-12-01 03:47:00,2145-12-01 03:47:00,sentence,sentence,"{Prothrombin time assay, Atrial Fibrillation, Ting AF}","{Prothrombin time assay, Chest Pain}",0.507093,Laboratory Procedure
65,Upon arrival the pt was noted to have slurred speech and decreased responsiveness.',Pt evaluated by Cards while in ED.',2145-12-01 03:47:00,2145-12-01 03:47:00,sentence,sentence,{Prothrombin time assay},"{Prothrombin time assay, Respiratory Distress Syndrome, Adult}",0.654654,Laboratory Procedure
66,Upon arrival the pt was noted to have slurred speech and decreased responsiveness.',# Chest Pain: Pt reported left sided chest pain while in the ED.',2145-12-01 03:47:00,2145-12-01 03:47:00,sentence,sentence,{Prothrombin time assay},"{Prothrombin time assay, Chest Pain}",0.774597,Laboratory Procedure
67,Upon arrival the pt was noted to have slurred speech and decreased responsiveness.',Pt currently in AF with rates in 120s.',2145-12-01 03:47:00,2145-12-01 03:47:00,sentence,sentence,{Prothrombin time assay},"{Prothrombin time assay, Atrial Fibrillation, Ting AF}",0.654654,Laboratory Procedure
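
Several of the sentence-sentence rows above share only the concept "Prothrombin time assay", which appears to come from the abbreviation "Pt" (patient) rather than from an actual lab mention. Below is a minimal sketch of a filter that drops pairs whose shared concepts are all on a stoplist. `SUSPECT_CONCEPTS` is an assumed stoplist chosen by inspecting this output, and the toy rows copy the concept sets of rows 46, 65, and 64.

```python
# Assumed stoplist, chosen by inspection of the table above.
SUSPECT_CONCEPTS = {"Prothrombin time assay"}

toy_rows = [
    {"idx": 46, "concepts 1": {"Lactate"}, "concepts 2": {"Lactate"}},
    {
        "idx": 65,
        "concepts 1": {"Prothrombin time assay"},
        "concepts 2": {"Prothrombin time assay", "Respiratory Distress Syndrome, Adult"},
    },
    {
        "idx": 64,
        "concepts 1": {"Prothrombin time assay", "Atrial Fibrillation", "Ting AF"},
        "concepts 2": {"Prothrombin time assay", "Chest Pain"},
    },
]


def has_informative_overlap(row, stoplist=SUSPECT_CONCEPTS):
    """True when the two concept sets share at least one concept outside the stoplist."""
    shared = row["concepts 1"] & row["concepts 2"]
    return bool(shared - stoplist)


kept = [row["idx"] for row in toy_rows if has_informative_overlap(row)]
print(kept)
```

A stoplist is a blunt tool: it would also hide pairs that really are about prothrombin time, so any dropped rows are worth a quick manual look.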


In [222]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# Copy/paste the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look very similar.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the contradiction inserted first.

# Look through the sentence pairs in `lab_pairs_df`.
# When you find a pair you want to insert a contradiction for,
# note its row index (the number in the leftmost column)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.

In [223]:
pair_idx = 46
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

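# `is_sentence2` doubles as the index: False -> sentence 1 (index 0), True -> sentence 2 (index 1).
sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]

# Set `contradicting_txt` to the new contradicting sentence.
# For now, this only updates the text.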

contradicting_txt = "Pt lactate stable and normal range."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

lab 1:	Patient's Lactate lab came back 1.8 mmol/L.
sentence 2:	Lactate    elevated.'

New contradicting sentence: Pt lactate stable and normal range.


In [224]:
no_contradiction_pair_idx = [48, 76]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
lab 1:	Patient's Ethanol lab came back 308 mg/dL , which is abnormal.
sentence 2:	ETOH heavy at times    although denies daily use.'
*****************************
sentence 1:	   ABG: ///23/    Physical Examination    Head, Ears, Nose, Throat: Normocephalic'
sentence 2:	   ABG: ///23/    Physical Examination'
*****************************


## Patient 4

In [225]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[3] # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 197325


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

********** Processing data for 2157-02-01 **********
********** Processing data for 2157-02-02 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.5477225575051663
----- Data i -----
>> Time: 2157-02-02 01:37:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy'}
>> Remainder of plan as outlined    above.'
----- Data j -----
>> Time: 2157-02-02 01:37:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy', 'carbidopa', 'Flagyl', 'BCX 34', 'Acute cholangitis', 'Mapap'}
>> Agree with plan to manage acute cholangitis with obstructing CBD stone    with broad abx coverage with vanco / zosyn / flagyl while awaiting BCx    and continuing hydration based on MAP / UOP.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.7745966692414834
----- Data i -----
>> Time: 2157-02-02 01:37:00
>> Type: sentence
>> Concepts: {'Ulcerative Colitis', 'Urinalysis'}
>> Getting UA and UC given urinary frequency.'
----- Data j -----
>> Time: 2157-02-

In [226]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert a contradiction into a sentence from a note ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Get all the pairs about lab values
semantic_type_ids   = CONFLICT_TO_SEMANTIC_TYPE['test_results']
semantic_type_names = [SEMANTIC_TYPE_TO_NAME[st_id] for st_id in semantic_type_ids]

is_lab = df['semantic type'].apply(lambda x: x in semantic_type_names)
lab_pairs_df = df.loc[(df['type 1'] == "lab") | (df['type 2'] == "lab") | is_lab]

lab_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
39,Patient's Lipase lab came back 23 IU/L.,IVF bolus at 500cc this morning -Follow LFT s at this time and follow Lipase in the setting of modest epigastric discomfort ICU Care',2157-02-02,2157-02-02 10:02:00,lab,sentence,"{Lipase, lipase}","{Lipase, Assisted Reproductive Technologies, Liver Function Tests, lipase}",0.632456,Pharmacologic Substance
40,"Patient's Red Blood Cells lab came back 3.15 m/uL , which is abnormal.","RR: 15 (12 - 24) insp/min SpO2: 98% Heart rhythm: SB (Sinus Bradycardia) Total In: 564 mL 1,915 mL PO: TF: IVF: 564 mL 1,915 mL Blood products: Total out: 160 mL 545 mL Urine: 160 mL 545 mL NG: Stool: Drains: Balance: 404 mL 1,370 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2157-02-02,2157-02-02 10:02:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance


In [227]:
lab_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
39,Patient's Lipase lab came back 23 IU/L.,IVF bolus at 500cc this morning -Follow LFT s at this time and follow Lipase in the setting of modest epigastric discomfort ICU Care',2157-02-02 00:00:00,2157-02-02 10:02:00,lab,sentence,"{Lipase, lipase}","{Lipase, Assisted Reproductive Technologies, Liver Function Tests, lipase}",0.632456,Pharmacologic Substance
40,"Patient's Red Blood Cells lab came back 3.15 m/uL , which is abnormal.","RR: 15 (12 - 24) insp/min SpO2: 98% Heart rhythm: SB (Sinus Bradycardia) Total In: 564 mL 1,915 mL PO: TF: IVF: 564 mL 1,915 mL Blood products: Total out: 160 mL 545 mL Urine: 160 mL 545 mL NG: Stool: Drains: Balance: 404 mL 1,370 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2157-02-02 00:00:00,2157-02-02 10:02:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
41,"Patient's Red Blood Cells lab came back 3.15 m/uL , which is abnormal.","RR: 24 (12 - 24) insp/min SpO2: 98% Heart rhythm: SB (Sinus Bradycardia) Total In: 563 mL 1,051 mL PO: TF: IVF: 563 mL 1,051 mL Blood products: Total out: 160 mL 475 mL Urine: 160 mL 475 mL NG: Stool: Drains: Balance: 403 mL 576 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2157-02-02 00:00:00,2157-02-02 07:51:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
42,Patient's Lactate lab came back 1.6 mmol/L.,- lactate in AM .',2157-02-02 00:00:00,2157-02-02 01:37:00,lab,sentence,{Lactate},{Lactate},1.0,Pharmacologic Substance
43,"Patient's Red Blood Cells lab came back 3.26 m/uL , which is abnormal.","RR: 15 (12 - 24) insp/min SpO2: 98% Heart rhythm: SB (Sinus Bradycardia) Total In: 564 mL 1,915 mL PO: TF: IVF: 564 mL 1,915 mL Blood products: Total out: 160 mL 545 mL Urine: 160 mL 545 mL NG: Stool: Drains: Balance: 404 mL 1,370 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2157-02-02 00:00:00,2157-02-02 10:02:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
44,"Patient's Red Blood Cells lab came back 3.26 m/uL , which is abnormal.","RR: 24 (12 - 24) insp/min SpO2: 98% Heart rhythm: SB (Sinus Bradycardia) Total In: 563 mL 1,051 mL PO: TF: IVF: 563 mL 1,051 mL Blood products: Total out: 160 mL 475 mL Urine: 160 mL 475 mL NG: Stool: Drains: Balance: 403 mL 576 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2157-02-02 00:00:00,2157-02-02 07:51:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
67,Right: Carotid 2+ Femoral 2+ Popliteal 2+ DP 2+ PT 2+',Hypertension - Pt with normal renal function coming with cholangitis.',2157-02-02 01:37:00,2157-02-02 01:37:00,sentence,sentence,"{carotid, Femur, Prothrombin time assay}","{Prothrombin time assay, Hypertet}",0.67082,Laboratory Procedure
68,"Left: Carotid 2+ Femoral 2+ Popliteal 2+ DP 2+ PT 2+ Labs / Radiology [image002.jpg] Other labs: Lactic Acid:4.3 mmol/L Assessment and Plan Mrs. [**Known firstname 12011**] [**Known lastname 12012**] is a very nice 85 year-old woman with significant past medical history of hypertension, cholecystectomy and ampullar stenosis wh ocomes with cholangitis now s/p stent removal.'",Right: Carotid 2+ Femoral 2+ Popliteal 2+ DP 2+ PT 2+',2157-02-02 01:37:00,2157-02-02 01:37:00,sentence,sentence,"{Femur, Prothrombin time assay, Infantile Neuroaxonal Dystrophy, Cholangitis, Excision, Laboratory test finding, Cholecystectomy procedure, Hypertet, carotid}","{carotid, Femur, Prothrombin time assay}",0.559017,Laboratory Procedure
69,"Elevated lactate - Pt with stable VS, cholangitis as above.'",Hypertension - Pt with normal renal function coming with cholangitis.',2157-02-02 01:37:00,2157-02-02 01:37:00,sentence,sentence,"{Prothrombin time assay, Cholangitis}","{Prothrombin time assay, Hypertet}",0.75,Laboratory Procedure
70,"Elevated lactate - Pt with stable VS, cholangitis as above.'",Right: Carotid 2+ Femoral 2+ Popliteal 2+ DP 2+ PT 2+',2157-02-02 01:37:00,2157-02-02 01:37:00,sentence,sentence,"{Prothrombin time assay, Cholangitis}","{carotid, Femur, Prothrombin time assay}",0.67082,Laboratory Procedure


In [228]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# Copy/paste the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look very similar.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the contradiction inserted first.

# Look through the sentence pairs in `lab_pairs_df`.
# When you find a pair you want to insert a contradiction for,
# note its row index (the number in the leftmost column)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.

In [229]:
no_contradiction_pair_idx = [39, 42, 72]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
lab 1:	Patient's Lipase lab came back 23 IU/L.
sentence 2:	IVF bolus    at 500cc this morning    -Follow LFT s at this time and follow Lipase in the setting of modest    epigastric discomfort    ICU Care'
*****************************
lab 1:	Patient's Lactate lab came back 1.6 mmol/L.
sentence 2:	- lactate in AM    .'
*****************************
sentence 1:	Diverticulosis: Stable HCT, no signs of bleeding or inflammation.'
sentence 2:	Diverticulosis - Stable HCT, no signs of bleeding or inflammation.'
*****************************
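
Once a few patients have been annotated, the nested `generated_data_dict` is easier to inspect as one flat list of labeled pairs. Below is a minimal sketch, assuming the dictionary layout used in the cells above. `flatten_labels` is a hypothetical helper and the admission IDs `111` and `222` are made up.

```python
# Toy dictionary with the same shape as `generated_data_dict` (made-up admission IDs).
toy_generated = {
    111: {
        "contradiction": {12: (True, "Consistently low potassium levels.")},
        "none": [15, 36],
    },
    222: {"contradiction": {}, "none": [39, 42, 72]},
}


def flatten_labels(generated):
    """Turn the nested per-patient dictionary into one row per labeled pair."""
    rows = []
    for hadm_id, entry in generated.items():
        # Positive examples: pairs where one sentence was rewritten.
        for pair_idx, (is_sentence2, txt) in entry["contradiction"].items():
            rows.append({
                "hadm_id": hadm_id,
                "pair_idx": pair_idx,
                "label": "contradiction",
                "modified_sentence": 2 if is_sentence2 else 1,
                "text": txt,
            })
        # Negative examples: pairs left untouched.
        for pair_idx in entry["none"]:
            rows.append({
                "hadm_id": hadm_id,
                "pair_idx": pair_idx,
                "label": "none",
                "modified_sentence": None,
                "text": None,
            })
    return rows


label_rows = flatten_labels(toy_generated)
print(len(label_rows))
```

The resulting list of dictionaries can be passed straight to `pd.DataFrame(label_rows)` if a tabular view is preferred.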


## Patient 5

In [230]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[4] # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 186291


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt

********** Processing data for 2151-09-21 **********
********** Processing data for 2151-09-22 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.6324555320336759
----- Data i -----
>> Time: 2151-09-22 05:00:00
>> Type: sentence
>> Concepts: {'Chronic Obstructive Airway Disease'}
>> COPD -- no evidence for significant COPD exacerbation.'
----- Data j -----
>> Time: 2151-09-22 05:00:00
>> Type: sentence
>> Concepts: {'Chronic Obstructive Airway Disease', 'Microbicides', 'MICROCEPHALY, EPILEPSY, AND DIABETES SYNDROME'}
>> Continue usual    meds, and low threshold for antimicrobials if evidence for COPD    exacerbation.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.533001790889026
----- Data i -----
>> Time: 2151-09-22 05:00:00
>> Type: sentence
>> Concepts: {'Pharmaceutical Preparations', 'Excision', 'Hypertensive disease', 'Papain', 'Resectisol', 'ileum', 'Chronic Obstructive Airway Disease', 'Antiretroviral Therapy, Highly Active', 'HIV Vaccine
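
The "Cosine similarity" values printed above come from a function defined outside this section, so its exact inputs are not visible here. As a rough sketch only, assuming each Data instance is reduced to a bag of concept counts, a generic cosine similarity could look like the following. The `cosine_similarity` helper and the toy counters are hypothetical and are not expected to reproduce the numbers above.

```python
import math
from collections import Counter


def cosine_similarity(counts_a: Counter, counts_b: Counter) -> float:
    """Cosine similarity between two sparse concept-count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[c] * counts_b[c] for c in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty concept set is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical toy inputs (not taken from the patient data)
a = Counter({"Oxygen": 1})
b = Counter({"Oxygen": 1, "Fatigue": 1})

print(cosine_similarity(a, a))
print(cosine_similarity(a, b))
```

Pairs with identical concept bags score 1, which is why many exact-duplicate sentence pairs in the tables below show a similarity of 1.0.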

In [231]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert contradictions into sentences from a note
# ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Get all the pairs about lab values
semantic_type_ids   = CONFLICT_TO_SEMANTIC_TYPE['test_results']
semantic_type_names = [SEMANTIC_TYPE_TO_NAME[st_id] for st_id in semantic_type_ids]

is_lab = df['semantic type'].isin(semantic_type_names)
lab_pairs_df = df.loc[(df['type 1'] == "lab") | (df['type 2'] == "lab") | is_lab]

lab_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
89,"Patient's Glucose lab came back 113 mg/dL , which is abnormal.","Neurologic: Attentive, Follows simple commands, Responds to: Verbal stimuli, Oriented (to): person, place, Movement: Purposeful, No(t) Sedated, No(t) Paralyzed, Tone: Normal Labs / Radiology 230 K/uL 30.9 % 10.4 g/dL 113 mg/dL 0.9 mg/dL 13 mg/dL 31 mEq/L 103 mEq/L 4.0 mEq/L 141 mEq/L 7.3 K/uL [image002.jpg] [**2151-9-22**] 03:06 AM WBC 7.3 Hct 30.9 Plt 230 Cr 0.9 Glucose 113 Other labs: ALT / AST:16/26, Alk Phos / T Bili:71/0.4, Amylase / Lipase:/30, Ca++:7.9 mg/dL, Mg++:2.0 mg/dL, PO4:2.7 mg/dL'",2151-09-22,2151-09-22 05:00:00,lab,sentence,"{glucose, Glucose}","{Lipase, Glucose, White Blood Cell Count procedure, Amylases, glucose, Laboratory test finding, amylase}",0.516398,Pharmacologic Substance
90,"Patient's Red Blood Cells lab came back 3.55 m/uL , which is abnormal.",RR: 20 (16 - 20) insp/min SpO2: 98% Heart rhythm: SR (Sinus Rhythm) Total In: 243 mL PO: 100 mL TF: IVF: 143 mL Blood products: Total out: 0 mL 500 mL Urine: 100 mL NG: Stool: Drains: Balance: 0 mL -257 mL Respiratory O2 Delivery Device: Nasal cannula SpO2: 98%',2151-09-22,2151-09-22 05:00:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
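
The IMPORTANT note in the cell above says that only note sentences should be rewritten. A small guard along these lines could make that rule explicit before choosing `is_sentence2`. This is a sketch: `editable_sides` is a hypothetical helper, and it assumes the `type 1` / `type 2` values are the strings shown in the table ("lab", "sentence", "prescription").

```python
def editable_sides(type_1: str, type_2: str) -> list:
    """Return the sides of a pair that may be rewritten.

    0 stands for sentence 1 and 1 for sentence 2; only sides whose
    type is "sentence" (i.e. text from a note) are returned.
    """
    return [i for i, t in enumerate((type_1, type_2)) if t == "sentence"]


print(editable_sides("lab", "sentence"))       # only sentence 2 is editable
print(editable_sides("sentence", "sentence"))  # either side is editable
```

For a lab/sentence pair such as row 89, this leaves only side 1, which corresponds to `is_sentence2 = True`.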


In [232]:
lab_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
89,"Patient's Glucose lab came back 113 mg/dL , which is abnormal.","Neurologic: Attentive, Follows simple commands, Responds to: Verbal stimuli, Oriented (to): person, place, Movement: Purposeful, No(t) Sedated, No(t) Paralyzed, Tone: Normal Labs / Radiology 230 K/uL 30.9 % 10.4 g/dL 113 mg/dL 0.9 mg/dL 13 mg/dL 31 mEq/L 103 mEq/L 4.0 mEq/L 141 mEq/L 7.3 K/uL [image002.jpg] [**2151-9-22**] 03:06 AM WBC 7.3 Hct 30.9 Plt 230 Cr 0.9 Glucose 113 Other labs: ALT / AST:16/26, Alk Phos / T Bili:71/0.4, Amylase / Lipase:/30, Ca++:7.9 mg/dL, Mg++:2.0 mg/dL, PO4:2.7 mg/dL'",2151-09-22 00:00:00,2151-09-22 05:00:00,lab,sentence,"{glucose, Glucose}","{Lipase, Glucose, White Blood Cell Count procedure, Amylases, glucose, Laboratory test finding, amylase}",0.516398,Pharmacologic Substance
90,"Patient's Red Blood Cells lab came back 3.55 m/uL , which is abnormal.",RR: 20 (16 - 20) insp/min SpO2: 98% Heart rhythm: SR (Sinus Rhythm) Total In: 243 mL PO: 100 mL TF: IVF: 143 mL Blood products: Total out: 0 mL 500 mL Urine: 100 mL NG: Stool: Drains: Balance: 0 mL -257 mL Respiratory O2 Delivery Device: Nasal cannula SpO2: 98%',2151-09-22 00:00:00,2151-09-22 05:00:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
91,"Patient's Red Blood Cells lab came back 3.55 m/uL , which is abnormal.",RR: 28 (13 - 28) insp/min SpO2: 100% Heart rhythm: ST (Sinus Tachycardia) Total In: 578 mL PO: 100 mL TF: IVF: 478 mL Blood products: Total out: 0 mL 750 mL Urine: 350 mL NG: Stool: Drains: Balance: 0 mL -172 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 100%',2151-09-22 00:00:00,2151-09-22 08:03:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Sinus Tachycardia, Blood product, Drainage procedure}",0.507093,Pharmacologic Substance
106,"- LFTs, lipase normal - monitor abdominal exam - f/u final CT read - zofran for nausea - continue ranitidine, cannot have PPI due to ART # Fever: without a clear source in HIV+ pt.'","- LFTs, lipase normal - monitor abdominal exam - f/u final CT read - zofran for nausea - continue ranitidine, cannot have PPI due to ART # Fever: without a clear source in HIV+ pt.'",2151-09-22 08:03:00,2151-09-22 03:58:00,sentence,sentence,"{Nausea, HIV Seropositivity, Fever, Lipase, Feverall, Nucleotide Sequence Read, hiVETIC, Zofran, Ranitidine, ranitidine}","{Nausea, HIV Seropositivity, Fever, Lipase, Feverall, Nucleotide Sequence Read, hiVETIC, Zofran, Ranitidine, ranitidine}",1.0,Laboratory Procedure
107,"0.9 mg/dL 31 mEq/L 4.0 mEq/L 13 mg/dL 103 mEq/L 141 mEq/L 30.9 % 7.3 K/uL [image002.jpg] [**2151-9-22**] 03:06 AM WBC 7.3 Hct 30.9 Plt 230 Cr 0.9 Glucose 113 Other labs: Ca++:7.9 mg/dL, Mg++:2.0 mg/dL, PO4:2.7 mg/dL Assessment and Plan NAUSEA / VOMITING .H/O HYPERTENSION, BENIGN CHRONIC OBSTRUCTIVE PULMONARY DISEASE (COPD, BRONCHITIS, EMPHYSEMA) WITH ACUTE EXACERBATION home O2 requirement HIV (HUMAN IMMUNODEFICIENCY VIRUS, ACQUIRED IMMUNODEFICIENCY SYNDROME, AIDS)'","Neurologic: Attentive, Follows simple commands, Responds to: Verbal stimuli, Oriented (to): person, place, Movement: Purposeful, No(t) Sedated, No(t) Paralyzed, Tone: Normal Labs / Radiology 230 K/uL 30.9 % 10.4 g/dL 113 mg/dL 0.9 mg/dL 13 mg/dL 31 mEq/L 103 mEq/L 4.0 mEq/L 141 mEq/L 7.3 K/uL [image002.jpg] [**2151-9-22**] 03:06 AM WBC 7.3 Hct 30.9 Plt 230 Cr 0.9 Glucose 113 Other labs: ALT / AST:16/26, Alk Phos / T Bili:71/0.4, Amylase / Lipase:/30, Ca++:7.9 mg/dL, Mg++:2.0 mg/dL, PO4:2.7 mg/dL'",2151-09-22 08:03:00,2151-09-22 05:00:00,sentence,sentence,"{Bronchitis, Glucose, Infantile Neuroaxonal Dystrophy, Nausea, White Blood Cell Count procedure, brain-derived neurotrophic factor, human, Primed lymphocyte test, Laboratory test finding, Pulmonary Emphysema, Chronic Obstructive Airway Disease, Hypertet, HIV Vaccine, Acquired Immunodeficiency Syndrome, glucose}","{Lipase, Glucose, White Blood Cell Count procedure, Amylases, glucose, Laboratory test finding, amylase}",0.537484,Laboratory Procedure
108,"- u/a negative - blood cx pending - CXR clear and no sx of PNA/bronchitis - f/u cx and final CT read - monitor for now, hold on Abx # HTN: hypertensive urgency in the ED; currently well controlled w/dose of labetalol he received in the ED.'","- u/a negative - blood cx pending - CXR clear and no sx of PNA/bronchitis - f/u cx and final CT read - monitor for now, hold on Abx # HTN: hypertensive urgency in the ED; currently well controlled w/dose of labetalol he received in the ED.'",2151-09-22 08:03:00,2151-09-22 03:58:00,sentence,sentence,"{Blood Stop, All Clear, Nucleotide Sequence Read, labetalol, Hypertensive disease, Labetalol}","{Blood Stop, All Clear, Nucleotide Sequence Read, labetalol, Hypertensive disease, Labetalol}",1.0,Laboratory Procedure
109,ABG: ///31/ Physical Examination',ABG: ///31/ Physical Examination',2151-09-22 08:03:00,2151-09-22 05:00:00,sentence,sentence,{Analysis of arterial blood gases and pH},{Analysis of arterial blood gases and pH},1.0,Laboratory Procedure
110,Patient's White Blood Cells lab came back 7.3 K/uL.,"Neurologic: Attentive, Follows simple commands, Responds to: Verbal stimuli, Oriented (to): person, place, Movement: Purposeful, No(t) Sedated, No(t) Paralyzed, Tone: Normal Labs / Radiology 230 K/uL 30.9 % 10.4 g/dL 113 mg/dL 0.9 mg/dL 13 mg/dL 31 mEq/L 103 mEq/L 4.0 mEq/L 141 mEq/L 7.3 K/uL [image002.jpg] [**2151-9-22**] 03:06 AM WBC 7.3 Hct 30.9 Plt 230 Cr 0.9 Glucose 113 Other labs: ALT / AST:16/26, Alk Phos / T Bili:71/0.4, Amylase / Lipase:/30, Ca++:7.9 mg/dL, Mg++:2.0 mg/dL, PO4:2.7 mg/dL'",2151-09-22 00:00:00,2151-09-22 05:00:00,lab,sentence,{White Blood Cell Count procedure},"{Lipase, Glucose, White Blood Cell Count procedure, Amylases, glucose, Laboratory test finding, amylase}",0.57735,Laboratory Procedure
111,"- LFTs, lipase normal - monitor abdominal exam - f/u final CT read - zofran for nausea - continue ranitidine, cannot have PPI due to ART # Fever: without a clear source in HIV+ pt.'","- LFTs, lipase normal - monitor abdominal exam - f/u final CT read - zofran for nausea - continue ranitidine, cannot have PPI due to ART # Fever: without a clear source in HIV+ pt.'",2151-09-22 08:03:00,2151-09-22 03:58:00,sentence,sentence,"{Nausea, HIV Seropositivity, Fever, Lipase, Feverall, Nucleotide Sequence Read, hiVETIC, Zofran, Ranitidine, ranitidine}","{Nausea, HIV Seropositivity, Fever, Lipase, Feverall, Nucleotide Sequence Read, hiVETIC, Zofran, Ranitidine, ranitidine}",1.0,Laboratory or Test Result
112,"0.9 mg/dL 31 mEq/L 4.0 mEq/L 13 mg/dL 103 mEq/L 141 mEq/L 30.9 % 7.3 K/uL [image002.jpg] [**2151-9-22**] 03:06 AM WBC 7.3 Hct 30.9 Plt 230 Cr 0.9 Glucose 113 Other labs: Ca++:7.9 mg/dL, Mg++:2.0 mg/dL, PO4:2.7 mg/dL Assessment and Plan NAUSEA / VOMITING .H/O HYPERTENSION, BENIGN CHRONIC OBSTRUCTIVE PULMONARY DISEASE (COPD, BRONCHITIS, EMPHYSEMA) WITH ACUTE EXACERBATION home O2 requirement HIV (HUMAN IMMUNODEFICIENCY VIRUS, ACQUIRED IMMUNODEFICIENCY SYNDROME, AIDS)'","Neurologic: Attentive, Follows simple commands, Responds to: Verbal stimuli, Oriented (to): person, place, Movement: Purposeful, No(t) Sedated, No(t) Paralyzed, Tone: Normal Labs / Radiology 230 K/uL 30.9 % 10.4 g/dL 113 mg/dL 0.9 mg/dL 13 mg/dL 31 mEq/L 103 mEq/L 4.0 mEq/L 141 mEq/L 7.3 K/uL [image002.jpg] [**2151-9-22**] 03:06 AM WBC 7.3 Hct 30.9 Plt 230 Cr 0.9 Glucose 113 Other labs: ALT / AST:16/26, Alk Phos / T Bili:71/0.4, Amylase / Lipase:/30, Ca++:7.9 mg/dL, Mg++:2.0 mg/dL, PO4:2.7 mg/dL'",2151-09-22 08:03:00,2151-09-22 05:00:00,sentence,sentence,"{Bronchitis, Glucose, Infantile Neuroaxonal Dystrophy, Nausea, White Blood Cell Count procedure, brain-derived neurotrophic factor, human, Primed lymphocyte test, Laboratory test finding, Pulmonary Emphysema, Chronic Obstructive Airway Disease, Hypertet, HIV Vaccine, Acquired Immunodeficiency Syndrome, glucose}","{Lipase, Glucose, White Blood Cell Count procedure, Amylases, glucose, Laboratory test finding, amylase}",0.537484,Laboratory or Test Result


In [233]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# To do so, copy/paste the code for Steps 1-4 for each patient and push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They might refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the contradiction inserted first.

# Look through the sentence pairs in `lab_pairs_df`.
# If you find a good candidate for a contradiction,
# note its row index (i.e. the number at the left)
# and set `pair_idx` to it below.
# Also note which sentence (i.e. sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.
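
The warning above about two pairs pointing at the same underlying Sentence instance can be checked with an identity test before editing. The sketch below uses a stand-in class, since the real `Sentence` class is defined elsewhere in the notebook; `FakeSentence` and `shares_instance` are hypothetical names.

```python
class FakeSentence:
    """Stand-in for the notebook's Sentence class (for illustration only)."""

    def __init__(self, txt):
        self.txt = txt


def shares_instance(candidate, already_modified):
    """True if `candidate` is the very same object as one already edited."""
    return any(candidate is other for other in already_modified)


s1 = FakeSentence("WBC 7.3")
s2 = FakeSentence("WBC 7.3")  # same text, but a different object
modified = [s1]

print(shares_instance(s1, modified))
print(shares_instance(s2, modified))
```

Comparing with `is` rather than `==` matters here: two sentences with equal text are safe to edit separately as long as they are distinct objects.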

In [234]:
pair_idx = 110
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

# Replace the first two occurrences of the WBC value "7.3", one at a time.
# The dot is escaped so that it matches a literal "." rather than any character.
contradicting_txt = data_2.txt
start, end = re.search(r"7\.3", contradicting_txt).span()
contradicting_txt = contradicting_txt[:start] + "1.2" + contradicting_txt[end:]

start, end = re.search(r"7\.3", contradicting_txt).span()
contradicting_txt = contradicting_txt[:start] + "2.4" + contradicting_txt[end:]
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

lab 1:	Patient's White Blood Cells lab came back 7.3 K/uL.
sentence 2:	   Neurologic: Attentive, Follows simple commands, Responds to: Verbal    stimuli, Oriented (to): person, place, Movement: Purposeful, No(t)    Sedated, No(t) Paralyzed, Tone: Normal    Labs / Radiology    230 K/uL    30.9 %    10.4 g/dL    113 mg/dL    0.9 mg/dL    13 mg/dL    31 mEq/L    103 mEq/L    4.0 mEq/L    141 mEq/L    7.3 K/uL         [image002.jpg]                               [**2151-9-22**]   03:06 AM    WBC                                      7.3    Hct                                     30.9    Plt                                      230    Cr                                      0.9    Glucose                                      113    Other labs: ALT / AST:16/26, Alk Phos / T Bili:71/0.4, Amylase /    Lipase:/30, Ca++:7.9 mg/dL, Mg++:2.0 mg/dL, PO4:2.7 mg/dL'

New contradicting sentence:    Neurologic: Attentive, Follows simple commands, Responds to: Verbal    stimuli, Oriented (to): person, pl
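
The value swap in the cell above replaces one occurrence of a number at a time. A reusable version might look like the sketch below, which uses `re.escape` so that the "." in a value like "7.3" is matched literally. `replace_nth` and the toy `note` string are hypothetical, not part of the notebook.

```python
import re


def replace_nth(text: str, literal: str, replacement: str, n: int = 0) -> str:
    """Replace the n-th (0-based) literal occurrence of `literal` in `text`."""
    matches = list(re.finditer(re.escape(literal), text))
    if n >= len(matches):
        raise ValueError(f"{literal!r} occurs only {len(matches)} time(s)")
    start, end = matches[n].span()
    return text[:start] + replacement + text[end:]


# Toy string: an unescaped pattern "7.3" would match "713" first.
note = "713 mg/dL 7.3 K/uL WBC 7.3"

print(re.search("7.3", note).group())        # unescaped dot matches "713"
print(replace_nth(note, "7.3", "1.2"))       # first literal "7.3"
print(replace_nth(note, "7.3", "2.4", n=1))  # second literal "7.3"
```

Raising when the value is missing is a deliberate choice in this sketch: it surfaces a wrong `pair_idx` instead of silently leaving the sentence unchanged.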

In [235]:
no_contradiction_pair_idx = [89]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
lab 1:	Patient's Glucose lab came back 113 mg/dL , which is abnormal.
sentence 2:	   Neurologic: Attentive, Follows simple commands, Responds to: Verbal    stimuli, Oriented (to): person, place, Movement: Purposeful, No(t)    Sedated, No(t) Paralyzed, Tone: Normal    Labs / Radiology    230 K/uL    30.9 %    10.4 g/dL    113 mg/dL    0.9 mg/dL    13 mg/dL    31 mEq/L    103 mEq/L    4.0 mEq/L    141 mEq/L    7.3 K/uL         [image002.jpg]                               [**2151-9-22**]   03:06 AM    WBC                                      7.3    Hct                                     30.9    Plt                                      230    Cr                                      0.9    Glucose                                      113    Other labs: ALT / AST:16/26, Alk Phos / T Bili:71/0.4, Amylase /    Lipase:/30, Ca++:7.9 mg/dL, Mg++:2.0 mg/dL, PO4:2.7 mg/dL'
*****************************
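
Since the annotations accumulate in `generated_data_dict` across patients, it may be worth saving them between sessions. The sketch below round-trips a dictionary of the same shape through `pickle`; the file name, the temporary directory, and the shortened example text are assumptions made for illustration.

```python
import os
import pickle
import tempfile

# Same shape as `generated_data_dict`: one entry per HADM ID, holding
# {pair_idx: (is_sentence2, contradicting_txt)} and a list of negative pair indices.
generated = {186291: {"contradiction": {110: (True, "WBC 1.2")}, "none": [89]}}

# Hypothetical output location; the notebook itself does not define one.
out_path = os.path.join(tempfile.mkdtemp(), "generated_data.pkl")

with open(out_path, "wb") as fh:
    pickle.dump(generated, fh)

with open(out_path, "rb") as fh:
    restored = pickle.load(fh)

print(restored == generated)
```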


## Patient 6

In [236]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient and process all of their data
hadm_id = hadm_ids[5]  # Note: `hadm_ids` lists all HADM IDs with consecutive physician notes

# Initialize storage for this patient's annotations
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 180836


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of

********** Processing data for 2152-02-15 **********
********** Processing data for 2152-02-16 **********
********** Processing data for 2152-02-17 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.6804138174397719
----- Data i -----
>> Time: 2152-02-17 10:25:00
>> Type: sentence
>> Concepts: {'Pulmonary Function Test/Forced Expiratory Volume 1', 'Chronic Obstructive Airway Disease', 'Terminal illness'}
>>    COPD:  As above, end stage disease with FEV1 of 0.5 which is 20%    predicted.'
----- Data j -----
>> Time: 2152-02-17 10:25:00
>> Type: sentence
>> Concepts: {'Chronic Obstructive Airway Disease', 'Ventral Lateral Thalamic Nucleus', 'HIV Vaccine', 'Pulmonary Function Test/Forced Expiratory Volume 1', 'Respiratory distress'}
>> Respiratory Distress    HPI:    Mr.  [**Known lastname **]  is a 67M with HIV (Cd4 183, VL 96 copies/mL) and end stage    COPD on 3-4L home O2 with a FEV1 of 0.5 who presented to the emergency    room on  [**2152-2-15**]  with increased shortness of 

In [237]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert contradictions into sentences from a note
# ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Get all the pairs about lab values
semantic_type_ids   = CONFLICT_TO_SEMANTIC_TYPE['test_results']
semantic_type_names = [SEMANTIC_TYPE_TO_NAME[st_id] for st_id in semantic_type_ids]

is_lab = df['semantic type'].isin(semantic_type_names)
lab_pairs_df = df.loc[(df['type 1'] == "lab") | (df['type 2'] == "lab") | is_lab]

lab_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
9,Patient's Oxygen lab came back 100 %.,He is on [**12-21**] liters home oxygen.',2152-02-17,2152-02-17 10:25:00,lab,sentence,{Oxygen},{Oxygen},1.0,Pharmacologic Substance
10,Patient's Oxygen lab came back 100 %.,He was feeling more short of breath despite increasing oxygen use.',2152-02-17,2152-02-17 10:25:00,lab,sentence,{Oxygen},{Oxygen},1.0,Pharmacologic Substance


In [238]:
lab_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
9,Patient's Oxygen lab came back 100 %.,He is on [**12-21**] liters home oxygen.',2152-02-17 00:00:00,2152-02-17 10:25:00,lab,sentence,{Oxygen},{Oxygen},1.0,Pharmacologic Substance
10,Patient's Oxygen lab came back 100 %.,He was feeling more short of breath despite increasing oxygen use.',2152-02-17 00:00:00,2152-02-17 10:25:00,lab,sentence,{Oxygen},{Oxygen},1.0,Pharmacologic Substance
21,"ABG shows chronic respiratory acidosis which appears compensated but on exam he has increased work of breathing, accessory muscle use and distress.'","Attending Evaluated pt who was in severe resp distress with grunting, accessory muscle use, with ABG evidence of fatigue.'",2152-02-17 10:25:00,2152-02-17 21:38:00,sentence,sentence,"{Analysis of arterial blood gases and pH, Chronic respiratory acidosis}","{Fatigue, Analysis of arterial blood gases and pH}",0.782624,Laboratory Procedure
22,He had an ABG on a non-rebreather which was 7.37/57/207/34.',"Attending Evaluated pt who was in severe resp distress with grunting, accessory muscle use, with ABG evidence of fatigue.'",2152-02-17 10:25:00,2152-02-17 21:38:00,sentence,sentence,{Analysis of arterial blood gases and pH},"{Fatigue, Analysis of arterial blood gases and pH}",0.935414,Laboratory Procedure
23,He had an ABG on a non-rebreather which was 7.37/57/207/34.',"ABG shows chronic respiratory acidosis which appears compensated but on exam he has increased work of breathing, accessory muscle use and distress.'",2152-02-17 10:25:00,2152-02-17 10:25:00,sentence,sentence,{Analysis of arterial blood gases and pH},"{Analysis of arterial blood gases and pH, Chronic respiratory acidosis}",0.83666,Laboratory Procedure
24,ABG: 7.46/43/383//6',"Attending Evaluated pt who was in severe resp distress with grunting, accessory muscle use, with ABG evidence of fatigue.'",2152-02-17 10:25:00,2152-02-17 21:38:00,sentence,sentence,{Analysis of arterial blood gases and pH},"{Fatigue, Analysis of arterial blood gases and pH}",0.935414,Laboratory Procedure
25,ABG: 7.46/43/383//6',"ABG shows chronic respiratory acidosis which appears compensated but on exam he has increased work of breathing, accessory muscle use and distress.'",2152-02-17 10:25:00,2152-02-17 10:25:00,sentence,sentence,{Analysis of arterial blood gases and pH},"{Analysis of arterial blood gases and pH, Chronic respiratory acidosis}",0.83666,Laboratory Procedure
26,ABG: 7.46/43/383//6',He had an ABG on a non-rebreather which was 7.37/57/207/34.',2152-02-17 10:25:00,2152-02-17 10:25:00,sentence,sentence,{Analysis of arterial blood gases and pH},{Analysis of arterial blood gases and pH},1.0,Laboratory Procedure
31,"Patient's Red Blood Cells lab came back 3.58 m/uL , which is abnormal.",RR: 15 (11 - 33) insp/min SpO2: 94% Heart rhythm: SR (Sinus Rhythm) Total In: 190 mL 584 mL PO: TF: IVF: 40 mL 584 mL Blood products: Total out: 800 mL 230 mL Urine: 800 mL 230 mL NG: Stool: Drains: Balance: -610 mL 354 mL Respiratory support O2 Delivery Device: Endotracheal tube Ventilator mode: CMV/ASSIST/AutoFlow Vt (Set): 400 (400 - 400) mL Vt (Spontaneous): 653 (653 - 653) mL',2152-02-18 00:00:00,2152-02-18 07:24:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Tachycardia, Ventricular, Drainage procedure}",0.507093,Pharmacologic Substance
39,ABG: 7.33/64/95.',"ABG shows chronic respiratory acidosis which appears compensated but on exam he has increased work of breathing, accessory muscle use and distress.'",2152-02-18 07:24:00,2152-02-18 07:24:00,sentence,sentence,{Analysis of arterial blood gases and pH},"{Analysis of arterial blood gases and pH, Chronic respiratory acidosis}",0.83666,Laboratory Procedure


In [239]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# To do so, copy/paste the code for Steps 1-4 for each patient and push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They might refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the contradiction inserted first.

# Look through the sentence pairs in `lab_pairs_df`.
# If you find a good candidate for a contradiction,
# note its row index (i.e. the number at the left)
# and set `pair_idx` to it below.
# Also note which sentence (i.e. sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.

In [240]:
pair_idx = 21
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "Attending    Evaluated pt who had normal resp " +\
                    "with ABG no evidence of fatigue."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

sentence 1:	ABG shows    chronic respiratory acidosis which appears compensated but on exam he    has increased work of breathing, accessory muscle use and distress.'
sentence 2:	Attending    Evaluated pt who was in severe resp distress with grunting, accessory    muscle use, with ABG evidence of fatigue.'

New contradicting sentence: Attending    Evaluated pt who had normal resp with ABG no evidence of fatigue.
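
Because Steps 1-4 are copied for every patient, the bookkeeping line that stores a contradiction could be wrapped in a small helper. The sketch below is one possible shape, not the notebook's code: `record_contradiction` and its `overwrite` flag are hypothetical, and the store mirrors the layout of `generated_data_dict`.

```python
def record_contradiction(store, hadm_id, pair_idx, is_sentence2, new_txt,
                         overwrite=False):
    """Store (is_sentence2, new_txt) under store[hadm_id]['contradiction'][pair_idx]."""
    entry = store.setdefault(int(hadm_id), {"contradiction": {}, "none": []})
    if pair_idx in entry["contradiction"] and not overwrite:
        raise ValueError(
            f"pair {pair_idx} already has a contradiction for patient {int(hadm_id)}"
        )
    entry["contradiction"][pair_idx] = (bool(is_sentence2), new_txt)
    return entry


store = {}
record_contradiction(
    store, 180836, 21, True,
    "Attending    Evaluated pt who had normal resp with ABG no evidence of fatigue.",
)
print(store[180836]["contradiction"][21][0])
```

Refusing to overwrite by default guards against editing the same pair twice by accident; when re-running a cell on purpose, pass `overwrite=True`.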


## Patient 7

In [241]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient and process all of their data
hadm_id = hadm_ids[3]  # Note: `hadm_ids` lists all HADM IDs with consecutive physician notes

# Initialize storage for this patient's annotations
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 197325


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/

********** Processing data for 2157-02-01 **********
********** Processing data for 2157-02-02 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.5477225575051663
----- Data i -----
>> Time: 2157-02-02 01:37:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy'}
>> Remainder of plan as outlined    above.'
----- Data j -----
>> Time: 2157-02-02 01:37:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy', 'carbidopa', 'Flagyl', 'BCX 34', 'Acute cholangitis', 'Mapap'}
>> Agree with plan to manage acute cholangitis with obstructing CBD stone    with broad abx coverage with vanco / zosyn / flagyl while awaiting BCx    and continuing hydration based on MAP / UOP.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.7745966692414834
----- Data i -----
>> Time: 2157-02-02 01:37:00
>> Type: sentence
>> Concepts: {'Ulcerative Colitis', 'Urinalysis'}
>> Getting UA and UC given urinary frequency.'
----- Data j -----
>> Time: 2157-02-

In [242]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert contradictions into sentences from a note
# ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Get all the pairs about lab values
semantic_type_ids   = CONFLICT_TO_SEMANTIC_TYPE['test_results']
semantic_type_names = [SEMANTIC_TYPE_TO_NAME[st_id] for st_id in semantic_type_ids]

is_lab = df['semantic type'].isin(semantic_type_names)
lab_pairs_df = df.loc[(df['type 1'] == "lab") | (df['type 2'] == "lab") | is_lab]

lab_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
39,Patient's Lipase lab came back 23 IU/L.,IVF bolus at 500cc this morning -Follow LFT s at this time and follow Lipase in the setting of modest epigastric discomfort ICU Care',2157-02-02,2157-02-02 10:02:00,lab,sentence,"{Lipase, lipase}","{Lipase, Assisted Reproductive Technologies, Liver Function Tests, lipase}",0.632456,Pharmacologic Substance
40,"Patient's Red Blood Cells lab came back 3.15 m/uL , which is abnormal.","RR: 15 (12 - 24) insp/min SpO2: 98% Heart rhythm: SB (Sinus Bradycardia) Total In: 564 mL 1,915 mL PO: TF: IVF: 564 mL 1,915 mL Blood products: Total out: 160 mL 545 mL Urine: 160 mL 545 mL NG: Stool: Drains: Balance: 404 mL 1,370 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2157-02-02,2157-02-02 10:02:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance


In [243]:
lab_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
39,Patient's Lipase lab came back 23 IU/L.,IVF bolus at 500cc this morning -Follow LFT s at this time and follow Lipase in the setting of modest epigastric discomfort ICU Care',2157-02-02 00:00:00,2157-02-02 10:02:00,lab,sentence,"{Lipase, lipase}","{Lipase, Assisted Reproductive Technologies, Liver Function Tests, lipase}",0.632456,Pharmacologic Substance
40,"Patient's Red Blood Cells lab came back 3.15 m/uL , which is abnormal.","RR: 15 (12 - 24) insp/min SpO2: 98% Heart rhythm: SB (Sinus Bradycardia) Total In: 564 mL 1,915 mL PO: TF: IVF: 564 mL 1,915 mL Blood products: Total out: 160 mL 545 mL Urine: 160 mL 545 mL NG: Stool: Drains: Balance: 404 mL 1,370 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2157-02-02 00:00:00,2157-02-02 10:02:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
41,"Patient's Red Blood Cells lab came back 3.15 m/uL , which is abnormal.","RR: 24 (12 - 24) insp/min SpO2: 98% Heart rhythm: SB (Sinus Bradycardia) Total In: 563 mL 1,051 mL PO: TF: IVF: 563 mL 1,051 mL Blood products: Total out: 160 mL 475 mL Urine: 160 mL 475 mL NG: Stool: Drains: Balance: 403 mL 576 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2157-02-02 00:00:00,2157-02-02 07:51:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
42,Patient's Lactate lab came back 1.6 mmol/L.,- lactate in AM .',2157-02-02 00:00:00,2157-02-02 01:37:00,lab,sentence,{Lactate},{Lactate},1.0,Pharmacologic Substance
43,"Patient's Red Blood Cells lab came back 3.26 m/uL , which is abnormal.","RR: 15 (12 - 24) insp/min SpO2: 98% Heart rhythm: SB (Sinus Bradycardia) Total In: 564 mL 1,915 mL PO: TF: IVF: 564 mL 1,915 mL Blood products: Total out: 160 mL 545 mL Urine: 160 mL 545 mL NG: Stool: Drains: Balance: 404 mL 1,370 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2157-02-02 00:00:00,2157-02-02 10:02:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
44,"Patient's Red Blood Cells lab came back 3.26 m/uL , which is abnormal.","RR: 24 (12 - 24) insp/min SpO2: 98% Heart rhythm: SB (Sinus Bradycardia) Total In: 563 mL 1,051 mL PO: TF: IVF: 563 mL 1,051 mL Blood products: Total out: 160 mL 475 mL Urine: 160 mL 475 mL NG: Stool: Drains: Balance: 403 mL 576 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2157-02-02 00:00:00,2157-02-02 07:51:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
67,Right: Carotid 2+ Femoral 2+ Popliteal 2+ DP 2+ PT 2+',Hypertension - Pt with normal renal function coming with cholangitis.',2157-02-02 01:37:00,2157-02-02 01:37:00,sentence,sentence,"{carotid, Femur, Prothrombin time assay}","{Prothrombin time assay, Hypertet}",0.67082,Laboratory Procedure
68,"Left: Carotid 2+ Femoral 2+ Popliteal 2+ DP 2+ PT 2+ Labs / Radiology [image002.jpg] Other labs: Lactic Acid:4.3 mmol/L Assessment and Plan Mrs. [**Known firstname 12011**] [**Known lastname 12012**] is a very nice 85 year-old woman with significant past medical history of hypertension, cholecystectomy and ampullar stenosis wh ocomes with cholangitis now s/p stent removal.'",Right: Carotid 2+ Femoral 2+ Popliteal 2+ DP 2+ PT 2+',2157-02-02 01:37:00,2157-02-02 01:37:00,sentence,sentence,"{Femur, Prothrombin time assay, Infantile Neuroaxonal Dystrophy, Cholangitis, Excision, Laboratory test finding, Cholecystectomy procedure, Hypertet, carotid}","{carotid, Femur, Prothrombin time assay}",0.559017,Laboratory Procedure
69,"Elevated lactate - Pt with stable VS, cholangitis as above.'",Hypertension - Pt with normal renal function coming with cholangitis.',2157-02-02 01:37:00,2157-02-02 01:37:00,sentence,sentence,"{Prothrombin time assay, Cholangitis}","{Prothrombin time assay, Hypertet}",0.75,Laboratory Procedure
70,"Elevated lactate - Pt with stable VS, cholangitis as above.'",Right: Carotid 2+ Femoral 2+ Popliteal 2+ DP 2+ PT 2+',2157-02-02 01:37:00,2157-02-02 01:37:00,sentence,sentence,"{Prothrombin time assay, Cholangitis}","{carotid, Femur, Prothrombin time assay}",0.67082,Laboratory Procedure


In [244]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# In practice: copy/paste the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the first.

# Look through the sentence pairs in `lab_pairs_df`.
# When you find a pair worth contradicting,
# note its row index (the number on the left)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.
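
The warning about shared `Sentence` instances can be checked before editing. The sketch below is a minimal illustration, not the notebook's own code: `find_shared_instances` and `FakeSentence` are hypothetical names, and the only thing assumed about `data_inst_pairs` is that `data_inst_pairs[idx][0]` holds the two data instances of a pair, as in the cells that follow. The second tuple element in the toy data (a float) is a placeholder.

```python
from collections import defaultdict


def find_shared_instances(data_inst_pairs, pair_indices):
    """Group candidate pair indices by the identity of the objects they contain."""
    owners = defaultdict(set)
    for idx in pair_indices:
        inst_1, inst_2 = data_inst_pairs[idx][0]
        owners[id(inst_1)].add(idx)
        owners[id(inst_2)].add(idx)
    # Keep only objects that appear in more than one candidate pair.
    return [sorted(idxs) for idxs in owners.values() if len(idxs) > 1]


class FakeSentence:
    """Toy stand-in for a data instance (hypothetical)."""
    def __init__(self, txt):
        self.txt = txt


shared = FakeSentence("WBC persistently elevated.")
toy_pairs = [
    ((FakeSentence("lab A"), shared), 0.9),
    ((FakeSentence("lab B"), shared), 0.8),
    ((FakeSentence("lab C"), FakeSentence("other")), 0.7),
]
print(find_shared_instances(toy_pairs, [0, 1, 2]))  # [[0, 1]]
```

Each group it returns lists pair indices that touch the same object, so only one index from each group should receive a contradiction.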

In [245]:
pair_idx = 79
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]  # 0 -> sentence 1, 1 -> sentence 2

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "Labs normal range    for WBC as above, HCT 40, K+ 3.2, Cr 1.1, lactate 4.3."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

lab 1:	Patient's White Blood Cells lab came back 12.0 K/uL , which is abnormal.
sentence 2:	Labs notable    for WBC as above, HCT 40, K+ 3.2, Cr 1.1, lactate 4.3.'

New contradicting sentence: Labs normal range    for WBC as above, HCT 40, K+ 3.2, Cr 1.1, lactate 4.3.
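
The cell above relies on checking by eye that the chosen side of the pair is a note sentence. A minimal sketch of a guard is shown below, assuming only what that cell shows: each data instance exposes `.type`, `.txt`, and `update_text()`. `insert_contradiction` and `FakeData` are hypothetical stand-ins, not part of the project code.

```python
class FakeData:
    """Toy stand-in for the notebook's data instances (hypothetical)."""
    def __init__(self, type_, txt):
        self.type = type_
        self.txt = txt

    def update_text(self, new_txt):
        self.txt = new_txt


def insert_contradiction(pair, is_sentence2, new_txt, store, pair_idx):
    """Overwrite one side of `pair`, but only if it is a note sentence."""
    target = pair[int(is_sentence2)]  # False -> sentence 1, True -> sentence 2
    if target.type != "sentence":
        raise ValueError(f"refusing to modify a '{target.type}' instance")
    target.update_text(new_txt)
    store[pair_idx] = (is_sentence2, new_txt)
    return target


toy_pair = (
    FakeData("lab", "WBC came back 12.0 K/uL, which is abnormal."),
    FakeData("sentence", "Labs notable for WBC as above."),
)
contradictions = {}
insert_contradiction(toy_pair, True, "Labs normal range for WBC as above.",
                     contradictions, pair_idx=79)
print(contradictions)
```

Calling the helper with `is_sentence2=False` on this toy pair raises `ValueError`, because sentence 1 is a lab record.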


In [246]:
no_contradiction_pair_idx = [39, 42, 72, 78, 80]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
lab 1:	Patient's Lipase lab came back 23 IU/L.
sentence 2:	IVF bolus    at 500cc this morning    -Follow LFT s at this time and follow Lipase in the setting of modest    epigastric discomfort    ICU Care'
*****************************
lab 1:	Patient's Lactate lab came back 1.6 mmol/L.
sentence 2:	- lactate in AM    .'
*****************************
sentence 1:	Diverticulosis: Stable HCT, no signs of bleeding or inflammation.'
sentence 2:	Diverticulosis - Stable HCT, no signs of bleeding or inflammation.'
*****************************
lab 1:	Patient's PTT lab came back 35.5 sec , which is abnormal.
sentence 2:	3.2 mEq/L    17 mg/dL    110 mEq/L    143 mEq/L    30.3 %    12.0 K/uL         [image002.jpg]                               [**2157-2-2**]   04:53 AM    WBC    12.0    Hct    30.3    Plt    100    Cr    0.9    Glucose    109    Other labs: PT / PTT / INR:15.7/35.5/1.4, ALT / AST:128/268, Alk Phos /    T Bili:76/3.4, Lacti
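
For a quick review of what has been recorded so far, the nested `generated_data_dict` can be flattened into one row per labelled pair. This is a sketch under the assumption that every entry has the shape used in this notebook, `{"contradiction": {pair_idx: (is_sentence2, text)}, "none": [pair_idx, ...]}`; `to_label_rows` and the admission ID `0` in the toy data are hypothetical.

```python
def to_label_rows(generated):
    """Flatten {hadm_id: {"contradiction": {...}, "none": [...]}} into a list of dicts."""
    rows = []
    for hadm_id, record in generated.items():
        for pair_idx, (is_sentence2, txt) in record["contradiction"].items():
            rows.append({
                "hadm_id": hadm_id,
                "pair_idx": pair_idx,
                "label": "contradiction",
                "modified": "sentence 2" if is_sentence2 else "sentence 1",
                "text": txt,
            })
        for pair_idx in record["none"]:
            rows.append({
                "hadm_id": hadm_id,
                "pair_idx": pair_idx,
                "label": "none",
                "modified": None,
                "text": None,
            })
    return rows


# Toy data only; the admission ID 0 is a placeholder.
toy_generated = {
    0: {"contradiction": {79: (True, "Labs normal range for WBC as above.")},
        "none": [39, 42]},
}
rows = to_label_rows(toy_generated)
for row in rows:
    print(row)
```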

## Patient 8

In [247]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[7]  # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 133857


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]

********** Processing data for 2175-03-12 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.6123724356957945
----- Data i -----
>> Time: 2175-03-12 22:31:00
>> Type: sentence
>> Concepts: {'Dilantin', 'nimodipine', 'Infantile Neuroaxonal Dystrophy', 'Nimodipine'}
>>    Neurologic: Goal SBP < 140, Plan for angio tommorow, Nimodipine,    Dilantin, Hob >30 degrees.'
----- Data j -----
>> Time: 2175-03-12 22:31:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy'}
>>    Assessment And Plan: 69 year old male admitted with SAH'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 1.0
----- Data i -----
>> Time: 2175-03-12 00:00:00
>> Type: prescription
>> Concepts: {'nicardipine', 'niCARdipine'}
>> Patient was prescribed NiCARdipine IV 2.5mg/mL;10mL Amp IV DRIP of total 125 mg
----- Data j -----
>> Time: 2175-03-12 22:31:00
>> Type: sentence
>> Concepts: {'nicardipine', 'niCARdipine'}
>> Nicardipine gtt'
**********************************
****

In [248]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert contradictions into sentences from a note ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Get all the pairs about lab values
semantic_type_ids   = CONFLICT_TO_SEMANTIC_TYPE['test_results']
semantic_type_names = [SEMANTIC_TYPE_TO_NAME[st_id] for st_id in semantic_type_ids]

is_lab = df['semantic type'].apply(lambda x: x in semantic_type_names)
lab_pairs_df = df.loc[(df['type 1'] == "lab") | (df['type 2'] == "lab") | is_lab]

lab_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
3,Patient's Red Blood Cells lab came back 4.63 m/uL.,ICP: 7 (7 - 16) mmHg Total In: 952 mL PO: TF: IVF: 252 mL Blood products: Total out: 0 mL 640 mL Urine: 620 mL NG: Stool: Drains: 20 mL Balance: 0 mL 312 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 99%',2175-03-12,2175-03-12 22:31:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
4,Patient's Red Blood Cells lab came back 4.87 m/uL.,ICP: 7 (7 - 16) mmHg Total In: 952 mL PO: TF: IVF: 252 mL Blood products: Total out: 0 mL 640 mL Urine: 620 mL NG: Stool: Drains: 20 mL Balance: 0 mL 312 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 99%',2175-03-12,2175-03-12 22:31:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
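
Because the Step 3 filter ORs three conditions together, a pair is kept if either side is a lab record or if its semantic type is lab-related, which is why sentence/sentence rows can also show up in `lab_pairs_df`. The toy example below illustrates this on made-up rows. The contents of `semantic_type_names` here are an assumption for illustration, not the values produced by `CONFLICT_TO_SEMANTIC_TYPE`, and `isin` is used as a vectorised equivalent of the `apply`/`lambda` membership test.

```python
import pandas as pd

# Assumed for illustration only.
semantic_type_names = ["Laboratory Procedure", "Laboratory or Test Result"]

toy_df = pd.DataFrame({
    "type 1": ["lab", "sentence", "sentence", "prescription"],
    "type 2": ["sentence", "sentence", "sentence", "sentence"],
    "semantic type": ["Pharmacologic Substance", "Laboratory Procedure",
                      "Disease or Syndrome", "Pharmacologic Substance"],
})

# True where the pair's semantic type is one of the lab-related names.
is_lab = toy_df["semantic type"].isin(semantic_type_names)

# Keep a row if either side is a lab record OR the semantic type is lab-related.
toy_lab_pairs = toy_df.loc[
    (toy_df["type 1"] == "lab") | (toy_df["type 2"] == "lab") | is_lab
]
print(toy_lab_pairs.index.tolist())  # [0, 1]
```

Row 0 is kept because one side is a lab record, row 1 because of its semantic type, and rows 2 and 3 satisfy none of the conditions.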


In [249]:
lab_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
3,Patient's Red Blood Cells lab came back 4.63 m/uL.,ICP: 7 (7 - 16) mmHg Total In: 952 mL PO: TF: IVF: 252 mL Blood products: Total out: 0 mL 640 mL Urine: 620 mL NG: Stool: Drains: 20 mL Balance: 0 mL 312 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 99%',2175-03-12,2175-03-12 22:31:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
4,Patient's Red Blood Cells lab came back 4.87 m/uL.,ICP: 7 (7 - 16) mmHg Total In: 952 mL PO: TF: IVF: 252 mL Blood products: Total out: 0 mL 640 mL Urine: 620 mL NG: Stool: Drains: 20 mL Balance: 0 mL 312 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 99%',2175-03-12,2175-03-12 22:31:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
5,"Patient's White Blood Cells lab came back 31.3 K/uL , which is abnormal.",45.5 % 37.1 K/uL [image002.jpg] [**2175-3-12**] 09:32 PM WBC 37.1 Hct 45.5 Plt 109 Assessment and Plan',2175-03-12,2175-03-12 22:31:00,lab,sentence,{White Blood Cell Count procedure},"{Primed lymphocyte test, White Blood Cell Count procedure}",0.790569,Laboratory Procedure
6,Patient's WBC lab came back 0-2 #/hpf.,45.5 % 37.1 K/uL [image002.jpg] [**2175-3-12**] 09:32 PM WBC 37.1 Hct 45.5 Plt 109 Assessment and Plan',2175-03-12,2175-03-12 22:31:00,lab,sentence,{White Blood Cell Count procedure},"{Primed lymphocyte test, White Blood Cell Count procedure}",0.790569,Laboratory Procedure
7,"Patient's White Blood Cells lab came back 37.1 K/uL , which is abnormal.",45.5 % 37.1 K/uL [image002.jpg] [**2175-3-12**] 09:32 PM WBC 37.1 Hct 45.5 Plt 109 Assessment and Plan',2175-03-12,2175-03-12 22:31:00,lab,sentence,{White Blood Cell Count procedure},"{Primed lymphocyte test, White Blood Cell Count procedure}",0.790569,Laboratory Procedure
39,"Patient's Calcium, Total lab came back 8.7 mg/dL.",Calcium Gluconate 6.',2175-03-13,2175-03-13 05:13:00,lab,sentence,"{CALCIUM SUPPLEMENTS, Calcium}","{Calcium Gluconate, calcium gluconate}",0.632456,Pharmacologic Substance
40,"Patient's Phenytoin lab came back 9.9 ug/mL , which is abnormal.",Phenytoin 16.',2175-03-13,2175-03-13 05:13:00,lab,sentence,"{phenytoin, Phenytoin}","{phenytoin, Phenytoin}",1.0,Pharmacologic Substance
41,"Patient's Phenytoin lab came back 9.9 ug/mL , which is abnormal.",Phenytoin',2175-03-13,2175-03-13 05:13:00,lab,sentence,"{phenytoin, Phenytoin}","{phenytoin, Phenytoin}",1.0,Pharmacologic Substance
42,"Patient's Red Blood Cells lab came back 4.42 m/uL , which is abnormal.","ICP: 5 (5 - 16) mmHg Total In: 1,105 mL 499 mL PO: 100 mL Tube feeding: IV Fluid: 405 mL 399 mL Blood products: Total out: 943 mL 1,241 mL Urine: 910 mL 1,170 mL NG: Stool: Drains: 33 mL 71 mL Balance: 162 mL -742 mL Respiratory support O2 Delivery Device: Nasal cannula SPO2: 100%'",2175-03-13,2175-03-13 05:13:00,lab,sentence,"{Red blood cells, blood product}","{Tube feeding of patient, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.520266,Pharmacologic Substance
43,"Patient's Calcium, Total lab came back 8.6 mg/dL.",Calcium Gluconate 6.',2175-03-13,2175-03-13 05:13:00,lab,sentence,"{CALCIUM SUPPLEMENTS, Calcium}","{Calcium Gluconate, calcium gluconate}",0.632456,Pharmacologic Substance


In [250]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# In practice: copy/paste the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the first.

# Look through the sentence pairs in `lab_pairs_df`.
# When you find a pair worth contradicting,
# note its row index (the number on the left)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.
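
One way to spot candidate sentences that look very similar before editing them is a plain string-similarity pass. The sketch below uses `difflib.SequenceMatcher` from the standard library. The helper names, the `0.8` threshold, and the mapping from pair index to sentence text are assumptions chosen for illustration; this is not how the notebook itself pairs sentences.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(txt):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(txt.lower().split())


def near_duplicates(candidates, threshold=0.8):
    """Return index pairs whose sentence texts are at least `threshold` similar."""
    flagged = []
    for (idx_a, txt_a), (idx_b, txt_b) in combinations(sorted(candidates.items()), 2):
        ratio = SequenceMatcher(None, normalise(txt_a), normalise(txt_b)).ratio()
        if ratio >= threshold:
            flagged.append((idx_a, idx_b))
    return flagged


# Pair index -> text of the sentence being considered for modification.
candidates = {
    117: "Persistently elevated WBC     [**2-6**]  CLL.",
    212: "WBC persistently    elevated  [**2-6**]  CLL.",
    1: "Nicardipine gtt",
}
print(near_duplicates(candidates))
```

A flagged pair is not necessarily the same instance, but it is a reasonable prompt to double-check before modifying both.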

In [251]:
pair_idx = 117
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]  # 0 -> sentence 1, 1 -> sentence 2

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "Standard WBC levels         no sign of CLL."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

lab 1:	Patient's White Blood Cells lab came back 31.0 K/uL , which is abnormal.
sentence 2:	Persistently elevated WBC     [**2-6**]  CLL.'

New contradicting sentence: Standard WBC levels         no sign of CLL.


In [252]:
pair_idx = 212
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]  # 0 -> sentence 1, 1 -> sentence 2

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "WBC persistently    low sign of infection."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

lab 1:	Patient's White Blood Cells lab came back 53.9 K/uL , which is abnormal.
sentence 2:	WBC persistently    elevated  [**2-6**]  CLL.'

New contradicting sentence: WBC persistently    low sign of infection.


## Patient 9

In [253]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[8]  # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 166389


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt

********** Processing data for 2196-10-13 **********
********** Processing data for 2196-10-14 **********
***** PAIR INDEX 0 *****
Cosine similarity: 1.0
----- Data i -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Hyponatremia'}
>> Given hyponatremia, this is concerning for    new mental status changes, however, attention intact.'
----- Data j -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Hyponatremia'}
>> # Hyponatremia: No clear baseline.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.5773502691896258
----- Data i -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Pharmaceutical Preparations', 'Hyponatremia'}
>> Hyponatremia    HPI:    89 yo M admitted to medicine for hyponatremia of 114.'
----- Data j -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Hyponatremia'}
>> # Hyponatremia: No clear baseline.'
**********************************
***** PAIR INDEX 2 *****
Cosine si

In [254]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert contradictions into sentences from a note ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Get all the pairs about lab values
semantic_type_ids   = CONFLICT_TO_SEMANTIC_TYPE['test_results']
semantic_type_names = [SEMANTIC_TYPE_TO_NAME[st_id] for st_id in semantic_type_ids]

is_lab = df['semantic type'].apply(lambda x: x in semantic_type_names)
lab_pairs_df = df.loc[(df['type 1'] == "lab") | (df['type 2'] == "lab") | is_lab]

lab_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
9,"Patient's Potassium lab came back 2.9 mEq/L , which is abnormal.",TTKG - (39 x 234) / (2.9 x 346) is 9 - suggesting significant renal loss of potassium.',2196-10-14,2196-10-14 22:36:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance
10,"Patient's Potassium lab came back 3.1 mEq/L , which is abnormal.",TTKG - (39 x 234) / (2.9 x 346) is 9 - suggesting significant renal loss of potassium.',2196-10-14,2196-10-14 22:36:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance


In [255]:
lab_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
9,"Patient's Potassium lab came back 2.9 mEq/L , which is abnormal.",TTKG - (39 x 234) / (2.9 x 346) is 9 - suggesting significant renal loss of potassium.',2196-10-14 00:00:00,2196-10-14 22:36:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance
10,"Patient's Potassium lab came back 3.1 mEq/L , which is abnormal.",TTKG - (39 x 234) / (2.9 x 346) is 9 - suggesting significant renal loss of potassium.',2196-10-14 00:00:00,2196-10-14 22:36:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance
11,"Patient's Red Blood Cells lab came back 4.14 m/uL , which is abnormal.",Denies any blood in his stool.',2196-10-14 00:00:00,2196-10-14 22:36:00,lab,sentence,"{Red blood cells, blood product}",{Blood Stop},0.534522,Pharmacologic Substance
12,"Patient's Red Blood Cells lab came back 4.14 m/uL , which is abnormal.","RR: 22 (19 - 22) insp/min SpO2: 99% Heart rhythm: SR (Sinus Rhythm) Wgt (current): 73 kg (admission): 73 kg Total In: PO: TF: IVF: Blood products: Total out: 0 mL 1,080 mL Urine: 80 mL NG: Stool: Drains: Balance: 0 mL -1,080 mL Respiratory O2 Delivery Device: None SpO2: 99% Physical Examination'",2196-10-14 00:00:00,2196-10-14 22:36:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
13,Patient's Folate lab came back 11.2 ng/mL.,"- Guaiac all stools - Iron, B12, folate, TSh studies - Continue to follow .'",2196-10-14 00:00:00,2196-10-14 22:36:00,lab,sentence,{folate},"{Iron, iron, Folate, folate, Scientific Study}",0.632456,Pharmacologic Substance
14,Patient's Potassium lab came back 4.0 mEq/L.,TTKG - (39 x 234) / (2.9 x 346) is 9 - suggesting significant renal loss of potassium.',2196-10-14 00:00:00,2196-10-14 22:36:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance
15,Patient's Potassium lab came back 3.8 mEq/L.,TTKG - (39 x 234) / (2.9 x 346) is 9 - suggesting significant renal loss of potassium.',2196-10-14 00:00:00,2196-10-14 22:36:00,lab,sentence,"{kalium, Potassium supplement}","{kalium, Potassium supplement}",1.0,Pharmacologic Substance
16,"Patient's Red Blood Cells lab came back 4.23 m/uL , which is abnormal.",Denies any blood in his stool.',2196-10-14 00:00:00,2196-10-14 22:36:00,lab,sentence,"{Red blood cells, blood product}",{Blood Stop},0.534522,Pharmacologic Substance
17,"Patient's Red Blood Cells lab came back 4.23 m/uL , which is abnormal.","RR: 22 (19 - 22) insp/min SpO2: 99% Heart rhythm: SR (Sinus Rhythm) Wgt (current): 73 kg (admission): 73 kg Total In: PO: TF: IVF: Blood products: Total out: 0 mL 1,080 mL Urine: 80 mL NG: Stool: Drains: Balance: 0 mL -1,080 mL Respiratory O2 Delivery Device: None SpO2: 99% Physical Examination'",2196-10-14 00:00:00,2196-10-14 22:36:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
19,"Pt evaluated by Renal, recs for 3% saline.'",# Hematuria: Pt was pulling on his foley catheter earlier.',2196-10-14 22:36:00,2196-10-14 22:36:00,sentence,sentence,"{Prothrombin time assay, Kidney}","{Prothrombin time assay, Hematuria}",0.75,Laboratory Procedure


In [256]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# In practice: copy/paste the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the first.

# Look through the sentence pairs in `lab_pairs_df`.
# When you find a pair worth contradicting,
# note its row index (the number on the left)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.
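
Since the per-patient setup is copied from cell to cell, the bookkeeping part of Step 1 can be wrapped in a small helper. This is only a sketch: `start_patient_record` is a hypothetical name, it does not create the output directory, and unlike the cells in this notebook it uses `setdefault`, so re-running it does not wipe labels already stored for that admission. The float ID in the usage lines is toy input.

```python
import os


def start_patient_record(hadm_id, generated, processed_dir="processed"):
    """Ensure `generated` has an entry for this admission and build its CSV path."""
    key = int(hadm_id)
    # setdefault keeps any labels already stored for this admission.
    generated.setdefault(key, {"contradiction": {}, "none": []})
    csv_path = os.path.join(processed_dir, f"{key}.csv")
    return key, csv_path


generated = {}
key, csv_path = start_patient_record(166389.0, generated)
generated[key]["none"].append(16)
start_patient_record(166389.0, generated)  # second call leaves the stored entry intact
print(key, csv_path, generated)
```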

In [257]:
no_contradiction_pair_idx = [16]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
lab 1:	Patient's Red Blood Cells lab came back 4.23 m/uL , which is abnormal.
sentence 2:	Denies any blood    in his stool.'
*****************************
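
Once a few patients have been annotated, `generated_data_dict` can be persisted so the labels survive a kernel restart. The sketch below round-trips a toy dictionary through `pickle` in a temporary directory; the file name `generated_data.pkl` is a placeholder, not a path used by the project.

```python
import os
import pickle
import tempfile

# Toy dictionary with the same shape as `generated_data_dict`.
toy_generated = {166389: {"contradiction": {}, "none": [16]}}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "generated_data.pkl")  # placeholder name
    with open(out_path, "wb") as f:
        pickle.dump(toy_generated, f)
    with open(out_path, "rb") as f:
        restored = pickle.load(f)

print(restored == toy_generated)  # True
```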


## Patient 10

In [258]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[9]  # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 196357


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt

********** Processing data for 2143-04-20 **********
********** Processing data for 2143-04-21 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.6546536707079772
----- Data i -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Heart failure', 'Hyponatremia'}
>>    Hyponatremia:  Likely due to heart failure.'
----- Data j -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Heart failure', 'Kidney Failure, Acute'}
>>    Acute Renal Failure:   [**Month (only) 60**]  be due to decompensated heart failure.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.8728715609439696
----- Data i -----
>> Time: 2143-04-21 01:24:00
>> Type: sentence
>> Concepts: {'Kidney Failure, Acute'}
>>    ARF: Picture consistent with pre-renal picture.'
----- Data j -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Heart failure', 'Kidney Failure, Acute'}
>>    Acute Renal Failure:   [**Month (only) 60**]  be due to decompensat

In [259]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert contradictions into sentences from a note ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Get all the pairs about lab values
semantic_type_ids   = CONFLICT_TO_SEMANTIC_TYPE['test_results']
semantic_type_names = [SEMANTIC_TYPE_TO_NAME[st_id] for st_id in semantic_type_ids]

is_lab = df['semantic type'].apply(lambda x: x in semantic_type_names)
lab_pairs_df = df.loc[(df['type 1'] == "lab") | (df['type 2'] == "lab") | is_lab]

lab_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
21,Patient's Red Blood Cells lab came back 4.34 m/uL.,"Height: 65 Inch Total In: 1,467 mL PO: TF: IVF: 1,467 mL Blood products: Total out: 1,200 mL 200 mL Urine: 200 mL NG: Stool: Drains: Balance: -1,200 mL 1,267 mL Respiratory O2 Delivery Device: Nasal cannula SpO2: 99%'",2143-04-21,2143-04-21 09:19:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
22,Patient's Red Blood Cells lab came back 4.34 m/uL.,Neurologic: No(t) Seizure Flowsheet Data as of [**2143-4-21**] 01:04 AM Vital Signs Hemodynamic monitoring Fluid Balance 24 hours Since 12 AM Total In: PO: TF: IVF: Blood products: Total out: 0 mL 0 mL Urine: NG: Stool: Drains: Balance: 0 mL 0 mL Respiratory O2 Delivery Device: Nasal cannula Physical Examination',2143-04-21,2143-04-21 01:24:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance


In [260]:
lab_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
21,Patient's Red Blood Cells lab came back 4.34 m/uL.,"Height: 65 Inch Total In: 1,467 mL PO: TF: IVF: 1,467 mL Blood products: Total out: 1,200 mL 200 mL Urine: 200 mL NG: Stool: Drains: Balance: -1,200 mL 1,267 mL Respiratory O2 Delivery Device: Nasal cannula SpO2: 99%'",2143-04-21 00:00:00,2143-04-21 09:19:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
22,Patient's Red Blood Cells lab came back 4.34 m/uL.,Neurologic: No(t) Seizure Flowsheet Data as of [**2143-4-21**] 01:04 AM Vital Signs Hemodynamic monitoring Fluid Balance 24 hours Since 12 AM Total In: PO: TF: IVF: Blood products: Total out: 0 mL 0 mL Urine: NG: Stool: Drains: Balance: 0 mL 0 mL Respiratory O2 Delivery Device: Nasal cannula Physical Examination',2143-04-21 00:00:00,2143-04-21 01:24:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
47,"Patient's Asparate Aminotransferase (AST) lab came back 55 IU/L , which is abnormal.","She denies any weight gain, however, reports several pound unintentional weight loss over the ast 3 weeks in the setting anorexia.'",2143-04-21 00:00:00,2143-04-21 01:24:00,lab,sentence,{Aspartate aminotransferase measurement},"{Anorexia, Aspartate aminotransferase measurement}",0.866025,Laboratory Procedure
48,Patient's PTT lab came back 29.5 sec.,"36.5 % 12.2 g/dL 122 mg/dL 1.5 mg/dL 44 mg/dL 29 mEq/L 84 mEq/L 4.3 mEq/L 120 mEq/L 12.5 K/uL [image002.jpg] [**2143-4-21**] 12:58 AM [**2143-4-21**] 04:50 AM WBC 12.5 Hct 36.5 Plt 163 Cr 1.5 1.5 TropT 0.03 Glucose 130 122 Other labs: PT / PTT / INR:14.7/29.5/1.3, CK / CKMB / Troponin-T:30/2/0.03, ALT / AST:28/55, Alk Phos / T Bili:161/0.4, Amylase / Lipase:27/13, Albumin:2.9 g/dL, LDH:347 IU/L, Ca++:8.2 mg/dL, Mg++:3.1 mg/dL, PO4:5.1 mg/dL'",2143-04-21 00:00:00,2143-04-21 09:19:00,lab,sentence,{Activated Partial Thromboplastin Time measurement},"{Lipase, Prothrombin time assay, White Blood Cell Count procedure, Amylases, Activated Partial Thromboplastin Time measurement, Primed lymphocyte test, Laboratory test finding, amylase}",0.526235,Laboratory Procedure
49,"Patient's White Blood Cells lab came back 12.5 K/uL , which is abnormal.","WBC 13, Cr 1.6 (baseline 1.0).'",2143-04-21 00:00:00,2143-04-21 01:24:00,lab,sentence,{White Blood Cell Count procedure},{White Blood Cell Count procedure},1.0,Laboratory Procedure
50,"Neurologic: Attentive, Responds to: Verbal stimuli, Oriented (to): person and place, Movement: Not assessed, Tone: Not assessed Labs / Radiology 155 141'","Neurologic: Responds to: Not assessed, Movement: Not assessed, Tone: Not assessed, Lethargic Labs / Radiology 163 K/uL'",2143-04-21 01:24:00,2143-04-21 09:19:00,sentence,sentence,{Laboratory test finding},"{Lethargy, Laboratory test finding}",0.866025,Laboratory or Test Result
78,"Patient's Red Blood Cells lab came back 4.14 m/uL , which is abnormal.","Height: 65 Inch Total In: 2,313 mL 589 mL PO: 60 mL 60 mL TF: IVF: 2,013 mL 529 mL Blood products: Total out: 1,180 mL 940 mL Urine: 1,180 mL 940 mL NG: Stool: Drains: Balance: 1,133 mL -351 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 99%'",2143-04-22 00:00:00,2143-04-22 06:59:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
79,"Patient's Red Blood Cells lab came back 4.14 m/uL , which is abnormal.","Height: 65 Inch Total In: 2,313 mL 617 mL PO: 60 mL 60 mL TF: IVF: 2,013 mL 557 mL Blood products: Total out: 1,180 mL 1,115 mL Urine: 1,180 mL 1,115 mL NG: Stool: Drains: Balance: 1,133 mL -498 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2143-04-22 00:00:00,2143-04-22 09:04:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
80,"Patient's Red Blood Cells lab came back 3.53 m/uL , which is abnormal.","Height: 65 Inch Total In: 2,313 mL 589 mL PO: 60 mL 60 mL TF: IVF: 2,013 mL 529 mL Blood products: Total out: 1,180 mL 940 mL Urine: 1,180 mL 940 mL NG: Stool: Drains: Balance: 1,133 mL -351 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 99%'",2143-04-22 00:00:00,2143-04-22 06:59:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
81,"Patient's Red Blood Cells lab came back 3.53 m/uL , which is abnormal.","Height: 65 Inch Total In: 2,313 mL 617 mL PO: 60 mL 60 mL TF: IVF: 2,013 mL 557 mL Blood products: Total out: 1,180 mL 1,115 mL Urine: 1,180 mL 1,115 mL NG: Stool: Drains: Balance: 1,133 mL -498 mL Respiratory support O2 Delivery Device: Nasal cannula SpO2: 98%'",2143-04-22 00:00:00,2143-04-22 09:04:00,lab,sentence,"{Red blood cells, blood product}","{Assisted Reproductive Technologies, Blood Stop Topical Product, Systane Balance, Uridine, Blood product, Drainage procedure}",0.534522,Pharmacologic Substance
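Scanning the table by eye works for one patient, but a small filter can shortlist candidate pairs first. The sketch below is illustrative only: `toy_pairs_df` is a hand-built stand-in with three rows abbreviated from the table above, and `candidate_pairs` is a hypothetical helper that is not part of the notebook's pipeline. It assumes the column names shown in the output (`type 1`, `type 2`, `semantic type`, `cosine similarity`).

```python
import pandas as pd

# Toy stand-in for `lab_pairs_df`; rows abbreviated from the table above.
toy_pairs_df = pd.DataFrame(
    {
        "sentence 1": [
            "Patient's Red Blood Cells lab came back 4.34 m/uL.",
            "Patient's PTT lab came back 29.5 sec.",
            "Patient's White Blood Cells lab came back 12.5 K/uL , which is abnormal.",
        ],
        "type 1": ["lab", "lab", "lab"],
        "type 2": ["sentence", "sentence", "sentence"],
        "cosine similarity": [0.534522, 0.526235, 1.0],
        "semantic type": [
            "Pharmacologic Substance",
            "Laboratory Procedure",
            "Laboratory Procedure",
        ],
    },
    index=[21, 48, 49],  # pair indices, i.e. the number at the left of each row
)


def candidate_pairs(pairs_df, semantic_type="Laboratory Procedure"):
    """Hypothetical helper: lab-vs-sentence rows of one semantic type,
    sorted so the most similar pairs come first."""
    mask = (
        (pairs_df["type 1"] == "lab")
        & (pairs_df["type 2"] == "sentence")
        & (pairs_df["semantic type"] == semantic_type)
    )
    return pairs_df[mask].sort_values("cosine similarity", ascending=False)


candidates = candidate_pairs(toy_pairs_df)
print(candidates.index.tolist())
```

The surviving index values are the numbers to try as `pair_idx`; the final choice still needs a human read of both sentences.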


In [261]:
# Step 4: Insert contradictions

# We should probably aim for 1-2 contradictions per patient.
# So basically, copy/paste the code for Steps 1-4 for each patient, and push to GitHub.
# Small heads-up: for a given patient, try not to insert contradictions
# into two sentences that look nearly identical.
# They might refer to the same underlying Sentence instance,
# in which case a second edit would overwrite the contradiction you inserted first.

# Look through the sentence pairs in `lab_pairs_df`.
# If you find a good one you want to insert a contradiction for,
# make note of the row index (i.e. the number at the left)
# and set `pair_idx` to it below.
# Also make note of which sentence (i.e. sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.
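
The overwrite risk mentioned in the comments can be reproduced with a toy example. This is a minimal sketch under stated assumptions: `ToySentence` is a hypothetical stand-in for the notebook's Sentence class, the toy texts are made up, and each element of `toy_pairs` mimics the `((data 1, data 2), ...)` layout implied by the `data_inst_pairs[pair_idx][0][...]` indexing, with a placeholder score as the second element. `shares_instance` is also hypothetical.

```python
class ToySentence:
    """Hypothetical stand-in for the notebook's Sentence class."""

    def __init__(self, txt):
        self.txt = txt

    def update_text(self, new_txt):
        self.txt = new_txt


# One sentence object that ends up in two different pairs (toy texts).
shared = ToySentence("WBC 13, Cr 1.6 (baseline 1.0).")
lab_a = ToySentence("toy lab statement A")
lab_b = ToySentence("toy lab statement B")

# Assumed layout: ((data 1, data 2), placeholder score)
toy_pairs = [((lab_a, shared), 1.0), ((lab_b, shared), 0.7)]

# Insert a "contradiction" into pair 0, then another into pair 1.
toy_pairs[0][0][1].update_text("WBC 3, Cr 1.6 (baseline 1.0).")
toy_pairs[1][0][1].update_text("WBC 13, Cr 0.4 (baseline 1.0).")

# Both pairs point at the same object, so the first edit has been lost.
print(toy_pairs[0][0][1].txt)


def shares_instance(pairs, idx_a, idx_b):
    """Hypothetical check: True if the two pairs reuse any object."""
    return any(a is b for a in pairs[idx_a][0] for b in pairs[idx_b][0])


print(shares_instance(toy_pairs, 0, 1))
```

Running a check like `shares_instance` before editing a second pair for the same patient is one way to avoid silently losing an earlier contradiction.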

In [262]:
pair_idx = 48
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

# False -> index 0 (sentence 1), True -> index 1 (sentence 2)
sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "36.5 %    12.2 g/dL    122 mg/dL    1.5 mg/dL    44 mg/dL    29 mEq/L    " +\
                    "84 mEq/L    4.3 mEq/L    120 mEq/L    12.5 K/uL         [image002.jpg]                               " +\
                    "[**2143-4-21**]   12:58 AM                               [**2143-4-21**]   04:50 AM    " +\
                    "WBC                                     12.5    Hct                                     36.5    " +\
                    "Plt                                      163    Cr                                      " +\
                    "1.5                                      1.5    TropT                                     0.03    " +\
                    "Glucose                                      130                                      122    " +\
                    "Other labs: PT / PTT / INR:14.7/290.0/1.3, CK / CKMB /    Troponin-T:30/2/0.03, ALT / AST:28/55, Alk Phos " +\
                    "/ T Bili:161/0.4,    Amylase / Lipase:27/13, Albumin:2.9 g/dL, LDH:347 IU/L, Ca++:8.2 mg/dL,    " +\
                    "Mg++:3.1 mg/dL, PO4:5.1 mg/dL'"
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

lab 1:	Patient's PTT lab came back 29.5 sec.
sentence 2:	36.5 %    12.2 g/dL    122 mg/dL    1.5 mg/dL    44 mg/dL    29 mEq/L    84 mEq/L    4.3 mEq/L    120 mEq/L    12.5 K/uL         [image002.jpg]                               [**2143-4-21**]   12:58 AM                               [**2143-4-21**]   04:50 AM    WBC                                     12.5    Hct                                     36.5    Plt                                      163    Cr                                      1.5                                      1.5    TropT                                     0.03    Glucose                                      130                                      122    Other labs: PT / PTT / INR:14.7/29.5/1.3, CK / CKMB /    Troponin-T:30/2/0.03, ALT / AST:28/55, Alk Phos / T Bili:161/0.4,    Amylase / Lipase:27/13, Albumin:2.9 g/dL, LDH:347 IU/L, Ca++:8.2 mg/dL,    Mg++:3.1 mg/dL, PO4:5.1 mg/dL'

New contradicting sentence: 36.5 %    12.2 g/dL    122 mg/dL    1.5 mg/dL 

In [263]:
no_contradiction_pair_idx = [48, 49, 104]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
lab 1:	Patient's PTT lab came back 29.5 sec.
sentence 2:	36.5 %    12.2 g/dL    122 mg/dL    1.5 mg/dL    44 mg/dL    29 mEq/L    84 mEq/L    4.3 mEq/L    120 mEq/L    12.5 K/uL         [image002.jpg]                               [**2143-4-21**]   12:58 AM                               [**2143-4-21**]   04:50 AM    WBC                                     12.5    Hct                                     36.5    Plt                                      163    Cr                                      1.5                                      1.5    TropT                                     0.03    Glucose                                      130                                      122    Other labs: PT / PTT / INR:14.7/29.5/1.3, CK / CKMB /    Troponin-T:30/2/0.03, ALT / AST:28/55, Alk Phos / T Bili:161/0.4,    Amylase / Lipase:27/13, Albumin:2.9 g/dL, LDH:347 IU/L, Ca++:8.2 mg/dL,    Mg++:3.1 mg/dL, PO4:5.1 mg/dL'
**************

In [267]:
import pickle
data_dict_file = "generated_data_dict_lab.pkl"
with open(data_dict_file, "wb") as f:
    pickle.dump(generated_data_dict, f)
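
For reference, the pickled dictionary has the nested layout written by the cells above: `{hadm_id: {'contradiction': {pair_idx: (is_sentence2, text)}, 'none': [pair_idx, ...]}}`. The sketch below round-trips a toy dictionary with that layout through a temporary file; the HADM_ID `100001`, the shortened text, and the file name are made up for illustration.

```python
import os
import pickle
import tempfile

# Toy dictionary in the same layout as `generated_data_dict`.
toy_generated = {
    100001: {  # hypothetical HADM_ID
        # pair_idx -> (is_sentence2, contradicting text)
        "contradiction": {48: (True, "PT / PTT / INR:14.7/290.0/1.3")},
        # pair indices kept as negative (non-contradiction) examples
        "none": [49, 104],
    }
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_generated_data_dict.pkl")
    with open(path, "wb") as f:
        pickle.dump(toy_generated, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# Walk the restored structure the same way the pipeline loop does.
for hadm_id, entry in restored.items():
    for pair_idx, (is_sentence2, txt) in entry["contradiction"].items():
        print(hadm_id, pair_idx, is_sentence2, txt)
    print(hadm_id, "negatives:", entry["none"])
```

Only plain Python types (ints, bools, strings, lists, tuples) are stored, so the file stays loadable even if the `Patient` or Sentence classes change later.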

# 5. Loading contradictions data for pipeline [skip 4 if pickle file already created]

If `generated_data_dict_lab.pkl` has already been created, skip part 4. You should still run the initial cells in that section (those above "README"), though.

Runtime is about 2 min per HADM_ID, roughly 20 min in total.

In [21]:
# 9 - positive examples
# 16 - negative examples

In [25]:
import pickle
data_dict_file = "generated_data_dict_lab.pkl"
with open(data_dict_file, "rb") as f:
    generated_data_dict = pickle.load(f)

In [40]:
generated_dataset = [] # list of tuples, ((data 1, data 2), label)

for hadm_id in hadm_ids[:10]:
    print("***********************************")
    print(f"Patient {int(hadm_id)}")
    try:
        hadm_generated_dict = generated_data_dict[int(hadm_id)]
    except KeyError:
        print("This patient does not exist in contradiction set.")
        continue
        
    # Step 1: Select a patient -- process all data
    pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df, \
                  med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker, \
                  physician_only=True)

    # Step 2: Generate pairs for this patient
    df, data_inst_pairs = generate_data_pairs(pat)

    # Step 3A: Insert contradictions 
    print("+++++ Inserting contradictions +++++")
    for pair_idx, (is_sentence2, contradicting_txt) in hadm_generated_dict['contradiction'].items():
        data_1 = data_inst_pairs[pair_idx][0][0]
        data_2 = data_inst_pairs[pair_idx][0][1]

        print(f"{data_1.type} 1:\t{data_1.txt}")
        print(f"{data_2.type} 2:\t{data_2.txt}")

        # False -> index 0 (sentence 1), True -> index 1 (sentence 2)
        sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]

        # Set `contradicting_txt` to the new contradicting sentence.
        # Update text and reprocess features.
        sentence_to_modify.update_text(contradicting_txt, True)

        print(f"\nNew contradicting sentence: {contradicting_txt}")
        print("+++++++++++++++++++++++++++++++++++")
        
        # Add example to dataset
        if is_sentence2:
            sentences = (data_1, sentence_to_modify)
        else:
            sentences = (sentence_to_modify, data_2)
        generated_dataset.append((sentences, 1)) # these are all contradictions
    
    # Step 3B: Insert negative examples (not contradictions)
    print("+++++ Inserting negative examples +++++")
    for pair_idx in hadm_generated_dict['none']:
        data_1 = data_inst_pairs[pair_idx][0][0]
        data_2 = data_inst_pairs[pair_idx][0][1]

        print(f"{data_1.type} 1:\t{data_1.txt}")
        print(f"{data_2.type} 2:\t{data_2.txt}")
        
        generated_dataset.append(((data_1, data_2), 0))
        print("+++++++++++++++++++++++++++++++++++")

***********************************
Patient 154802
This patient does not exist in contradiction set.
***********************************
Patient 133857


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]

********** Processing data for 2175-03-12 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.6123724356957945
----- Data i -----
>> Time: 2175-03-12 22:31:00
>> Type: sentence
>> Concepts: {'Nimodipine', 'nimodipine', 'Infantile Neuroaxonal Dystrophy', 'Dilantin'}
>>    Neurologic: Goal SBP < 140, Plan for angio tommorow, Nimodipine,    Dilantin, Hob >30 degrees.'
----- Data j -----
>> Time: 2175-03-12 22:31:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy'}
>>    Assessment And Plan: 69 year old male admitted with SAH'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 1.0
----- Data i -----
>> Time: 2175-03-12 00:00:00
>> Type: prescription
>> Concepts: {'nicardipine', 'niCARdipine'}
>> Patient was prescribed NiCARdipine IV 2.5mg/mL;10mL Amp IV DRIP of total 125 mg
----- Data j -----
>> Time: 2175-03-12 22:31:00
>> Type: sentence
>> Concepts: {'nicardipine', 'niCARdipine'}
>> Nicardipine gtt'
**********************************
****

  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]



New contradicting sentence: Standard WBC levels         no sign of CLL.
+++++++++++++++++++++++++++++++++++
lab 1:	Patient's Phenytoin lab came back 7.0 ug/mL , which is abnormal.
sentence 2:	Phenytoin 100    mg PO TID  Order date:  [**3-15**]  @ 0908'

New contradicting sentence: WBC persistently    low sign of infection.
+++++++++++++++++++++++++++++++++++
+++++ Inserting negative examples +++++
***********************************
Patient 166389


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt

********** Processing data for 2196-10-13 **********
********** Processing data for 2196-10-14 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.5773502691896258
----- Data i -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Pharmaceutical Preparations', 'Hyponatremia'}
>> Hyponatremia    HPI:    89 yo M admitted to medicine for hyponatremia of 114.'
----- Data j -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Hyponatremia'}
>> Given hyponatremia, this is concerning for    new mental status changes, however, attention intact.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 1.0
----- Data i -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Hyponatremia'}
>> # Hyponatremia: No clear baseline.'
----- Data j -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Hyponatremia'}
>> Given hyponatremia, this is concerning for    new mental status changes, however, attention intact.'
****

  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt
A value is trying to be set on a copy of a slice 

********** Processing data for 2143-04-20 **********
********** Processing data for 2143-04-21 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.6172133998483676
----- Data i -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Kidney Failure, Acute', 'Heart failure'}
>>    Acute Renal Failure:   [**Month (only) 60**]  be due to decompensated heart failure.'
----- Data j -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Chronic diastolic heart failure', 'Tomorrow', 'Kidney'}
>> Renal US to r/o obstruction    Acute on chronic diastolic heart failure:  Check echo tomorrow.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.6546536707079772
----- Data i -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Heart failure', 'Hyponatremia'}
>>    Hyponatremia:  Likely due to heart failure.'
----- Data j -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Kidney Failure, Acute', 'Heart failure

  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]



New contradicting sentence: 36.5 %    12.2 g/dL    122 mg/dL    1.5 mg/dL    44 mg/dL    29 mEq/L    84 mEq/L    4.3 mEq/L    120 mEq/L    12.5 K/uL         [image002.jpg]                               [**2143-4-21**]   12:58 AM                               [**2143-4-21**]   04:50 AM    WBC                                     12.5    Hct                                     36.5    Plt                                      163    Cr                                      1.5                                      1.5    TropT                                     0.03    Glucose                                      130                                      122    Other labs: PT / PTT / INR:14.7/290.0/1.3, CK / CKMB /    Troponin-T:30/2/0.03, ALT / AST:28/55, Alk Phos / T Bili:161/0.4,    Amylase / Lipase:27/13, Albumin:2.9 g/dL, LDH:347 IU/L, Ca++:8.2 mg/dL,    Mg++:3.1 mg/dL, PO4:5.1 mg/dL'
+++++++++++++++++++++++++++++++++++
+++++ Inserting negative examples +++++
sentence 1:	confusion, let

In [45]:
n = len(generated_dataset)
n_negatives = len(list(filter(lambda x: x[1]==0, generated_dataset)))
n_positives = len(list(filter(lambda x: x[1]==1, generated_dataset)))

print(f"We have {n} total examples\n\t- {n_negatives} negative examples\n\t- {n_positives} positive examples")

We have 25 total examples
	- 16 negative examples
	- 9 positive examples
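
With the counts printed above (9 positives out of 25), positives make up 36% of the labeled set. The same tally can be done in one pass with `collections.Counter`. This is a small sketch on a toy list in the `((data 1, data 2), label)` layout; plain strings stand in for the data instances.

```python
from collections import Counter

# Toy dataset: ((data 1, data 2), label), with strings as placeholders.
toy_dataset = [
    (("lab A", "sentence A"), 1),
    (("lab B", "sentence B"), 0),
    (("lab C", "sentence C"), 0),
]

# Count each label in a single pass over the dataset.
label_counts = Counter(label for _, label in toy_dataset)
n_total = sum(label_counts.values())
positive_rate = label_counts[1] / n_total

print(f"{n_total} examples, {label_counts[0]} negative, {label_counts[1]} positive")
print(f"positive rate: {positive_rate:.2f}")
```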


# 6. Generating evaluation data (unlabeled) from MIMIC

We skip the first 10 patients, since they were used to generate the labeled contradiction set.

In [46]:
per_pat_dataset_dict = {} # maps HADMID to patient's dataset in the form [((data 1, data 2), label), ...]
df_list = []
for hadm_id in hadm_ids[10:30]:
    print("***********************************")
    print(f"Patient {int(hadm_id)}")
        
    # Step 1: Select a patient -- process all data
    pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df, \
                  med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker, \
                  physician_only=True)

    # Step 2: Generate pairs for this patient
    df, data_inst_pairs = generate_data_pairs(pat)
    df['HADM_ID'] = hadm_id
    per_pat_dataset_dict[hadm_id] = data_inst_pairs
    df_list.append(df)

***********************************
Patient 162197


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt

********** Processing data for 2189-09-07 **********
********** Processing data for 2189-09-08 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.9999999999999998
----- Data i -----
>> Time: 2189-09-08 00:49:00
>> Type: sentence
>> Concepts: {'Communicable Diseases'}
>> Most likely with ascending GU infection.'
----- Data j -----
>> Time: 2189-09-08 00:49:00
>> Type: sentence
>> Concepts: {'Communicable Diseases'}
>> Also sepsis criteria given identification of GU infection.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 1.0
----- Data i -----
>> Time: 2189-09-08 00:49:00
>> Type: sentence
>> Concepts: {'Pyelonephritis'}
>> Findings c/w pyelonephritis with    ureteritis bilaterally.'
----- Data j -----
>> Time: 2189-09-08 00:49:00
>> Type: sentence
>> Concepts: {'Pyelonephritis'}
>> She had a grossly positive U/A, and CT Abd/Pelvis    showed evidence of bilateral pyelonephritis.'
**********************************
***** PAIR INDEX 2 *****
Cosine s

In [51]:
df

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type,HADM_ID
0,Most likely with ascending GU infection.',Also sepsis criteria given identification of GU infection.',2189-09-08 00:49:00,2189-09-08 00:49:00,sentence,sentence,{Communicable Diseases},{Communicable Diseases},1.0,Disease or Syndrome,162197.0
1,Findings c/w pyelonephritis with ureteritis bilaterally.',"She had a grossly positive U/A, and CT Abd/Pelvis showed evidence of bilateral pyelonephritis.'",2189-09-08 00:49:00,2189-09-08 00:49:00,sentence,sentence,{Pyelonephritis},{Pyelonephritis},1.0,Disease or Syndrome,162197.0
2,No recent UTIs or hx of resistent organisms.',She denies ever having a urinary tract infection.',2189-09-08 00:49:00,2189-09-08 00:49:00,sentence,sentence,{Urinary tract infection},{Urinary tract infection},1.0,Disease or Syndrome,162197.0
3,# Pyelonephritis/Sepsis: Pt w/ signs and sx of UTI for several days.',She denies ever having a urinary tract infection.',2189-09-08 00:49:00,2189-09-08 00:49:00,sentence,sentence,"{Pyelonephritis, Urinary tract infection, Physical therapy}",{Urinary tract infection},0.707107,Disease or Syndrome,162197.0
4,# Pyelonephritis/Sepsis: Pt w/ signs and sx of UTI for several days.',No recent UTIs or hx of resistent organisms.',2189-09-08 00:49:00,2189-09-08 00:49:00,sentence,sentence,"{Pyelonephritis, Urinary tract infection, Physical therapy}",{Urinary tract infection},0.707107,Disease or Syndrome,162197.0
...,...,...,...,...,...,...,...,...,...,...,...
111,Pt currently on RA and satting well.',"Pt [**Name (NI) 2919**] indicates hyopdensity in L upper pole that could be developing into an abscess, but no dicrete abscess seen.'",2189-09-09 07:23:00,2189-09-09 07:23:00,sentence,sentence,"{Physical therapy, Refractory anemias}","{Abscess, Physical therapy}",0.57735,Therapeutic or Preventive Procedure,162197.0
112,Pt currently on RA and satting well.',"Pt has minimal bibasilar crackles on exam, possible mild pulmonary edema.'",2189-09-09 07:23:00,2189-09-09 07:23:00,sentence,sentence,"{Physical therapy, Refractory anemias}",{Physical therapy},0.707107,Therapeutic or Preventive Procedure,162197.0
113,Pt currently on RA and satting well.',# Dispo: Pt can either be called out to the floor or d/c directly from ICU if no O2 requirement and hemodynamically stable.',2189-09-09 07:23:00,2189-09-09 07:23:00,sentence,sentence,"{Physical therapy, Refractory anemias}",{Physical therapy},0.707107,Therapeutic or Preventive Procedure,162197.0
114,Pt currently on RA and satting well.',Pt will need close follow-up if symptoms persist.',2189-09-09 07:23:00,2189-09-09 07:23:00,sentence,sentence,"{Physical therapy, Refractory anemias}",{Physical therapy},0.707107,Therapeutic or Preventive Procedure,162197.0
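
Note that `df` holds only the pairs for the last patient processed by the loop; the per-patient frames are collected in `df_list`. One way to get a single frame across patients is `pd.concat`. The sketch below uses toy frames with made-up HADM_IDs and sentences; the `pair_idx` column is an assumption added here so the per-patient row index survives the concatenation.

```python
import pandas as pd

# Build toy per-patient frames the same way the loop fills `df_list`.
toy_df_list = []
for hadm_id, sentences in [(1.0, ["a", "b"]), (2.0, ["c"])]:
    toy_df = pd.DataFrame({"sentence 1": sentences})
    toy_df["HADM_ID"] = hadm_id
    # Keep the per-patient pair index, since concat will renumber the rows.
    toy_df["pair_idx"] = toy_df.index
    toy_df_list.append(toy_df)

# Stack all patients into one frame with a fresh 0..n-1 index.
all_pairs_df = pd.concat(toy_df_list, ignore_index=True)

# Number of candidate pairs per patient.
pairs_per_patient = all_pairs_df.groupby("HADM_ID").size()
print(all_pairs_df)
print(pairs_per_patient)
```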


### Getting History + Allergy Information - @Sharon, you can ignore everything below

In [116]:
# todo: 
# - DONE function to re-process all data from Patient instance -- pat.process_notes(); pat.process_by_date()
# - function to update Note -- should update dataframe of patient directly
#   - can go back to dataframe, but can't map tokenized sentence to original note in df -- todo
#   - function to update tokenized sentence
# - later: function to update original dataframe from patient dataframe

import re

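def get_section(regex_dict, txt):
    """ Given a dictionary of start and end regexes for a
        particular section, finds where the section starts and
        ends in the text and returns the two indices.
        Returns (None, None) if either marker is missing.
    """
    start_match = re.search(regex_dict["start"], txt)
    if start_match is None:
        return None, None

    # Look for the end marker only after the start marker, so an earlier
    # occurrence of the end pattern cannot give an empty or reversed slice.
    end_match = re.compile(regex_dict["end"]).search(txt, start_match.end())
    if end_match is None:
        return None, None

    return start_match.start(), end_match.start()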

note = pat.notes[4]

# Sections to store
# note: most of these sections have already been removed;
#       if any remain, we might have to remove them and
#       then reprocess everything
allergy_regex = {"start": "Allergies:",
                 "end":   "Last dose of Antibiotics:"}
history_regex = {"start": "Past medical history:",
                 "end":   "Other:"}

allergy_start, allergy_end = get_section(allergy_regex, note.txt)
history_start, history_end = get_section(history_regex, note.txt)

pt_allergies = "" if allergy_start is None else note.txt[allergy_start:allergy_end]
pt_histories = "" if history_start is None else note.txt[history_start:history_end]

print("******** Allergies ********")
print(pt_allergies[:100])
print("******** Histories ********")
print(pt_histories[:100])

******** Allergies ********
Allergies:
   Bactrim (Oral) (Sulfamethoxazole/Trimethoprim)
   Nausea/Vomiting
   Amiodarone
   Ras
******** Histories ********
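
The section lookup can be tried out on a toy note without loading any patient data. This is a minimal sketch, not the notebook's own function: `extract_section` is a hypothetical variant that returns the section text directly, searches for the end marker only after the start marker, and falls back to the end of the note when the end marker is missing. The note text is made up.

```python
import re


def extract_section(txt, start_pattern, end_pattern):
    """Hypothetical helper: return the text from the start marker up to
    (not including) the next end marker, or "" if the start is missing."""
    start_match = re.search(start_pattern, txt)
    if start_match is None:
        return ""
    # Only accept an end marker that appears after the start marker.
    end_match = re.compile(end_pattern).search(txt, start_match.end())
    end = end_match.start() if end_match else len(txt)
    return txt[start_match.start():end]


# Made-up note text using the same section headers as the regexes above.
toy_note = (
    "Chief Complaint: fever\n"
    "Allergies:\n   Penicillins\n   Rash\n"
    "Last dose of Antibiotics:\n   none\n"
)

allergies = extract_section(toy_note, r"Allergies:", r"Last dose of Antibiotics:")
histories = extract_section(toy_note, r"Past medical history:", r"Other:")

print(repr(allergies))
print(repr(histories))
```

An empty string for a section, as in the Histories output above, means the section could not be located in that note with the given patterns.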

