In [1]:
# Import libraries
import os
import re
import random
import pickle
import subprocess
import numpy as np
import pandas as pd
import datetime as dt

from tqdm import tqdm
from datetime import datetime
from collections import Counter

# 1. Setup concept extractors

The options we considered were [MetaMap](https://metamap.nlm.nih.gov/) and [spaCy](https://spacy.io/).

[MetaMap](https://metamap.nlm.nih.gov/) is specific to recognizing UMLS concepts. There is a [Python wrapper](https://github.com/AnthonyMRios/pymetamap), but it is known to be slow and generally poor.

[spaCy](https://spacy.io/) is a popular NLP Python package with an extensive library for named entity recognition. It has a wide variety of [extensions](https://spacy.io/universe) and models to choose from. We use the following:

* [scispaCy](https://spacy.io/universe/project/scispacy) contains spaCy models for processing biomedical, scientific, or clinical text. It seems easy to use and can recognize a wide variety of concepts, including UMLS, RxNorm, etc.

* [negspaCy](https://spacy.io/universe/project/negspacy) identifies negated entities using an implementation of the NegEx algorithm. This should be useful for distinguishing cases like "this pt is diabetic" v. "this pt is not diabetic." [todo: negation identification of medspacy might be better, https://github.com/medspacy/medspacy]

* [Med7](https://github.com/kormilitzin/med7) is a model trained to recognize entities in prescription text, e.g. drug name, dosage, duration, etc., which could be useful when checking for conflicts.

We're going with spaCy. The remaining work is to come up with a coherent way to integrate the entities picked up by these three extensions/models.
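One possible way to integrate the three extractors is to group their annotations by character span. The sketch below is only an illustration of that idea: `merge_entities`, the tuple layout, and the hand-written annotations are assumptions, not output from the actual models.

```python
from collections import defaultdict


def merge_entities(*extractor_outputs):
    """Group entity annotations from several extractors by character span.

    Each extractor output is a list of (start, end, source, label) tuples.
    Returns {(start, end): {source: label}} so that findings for the same
    span sit side by side.
    """
    merged = defaultdict(dict)
    for entities in extractor_outputs:
        for start, end, source, label in entities:
            merged[(start, end)][source] = label
    return dict(merged)


text = "Levothyroxine 125 mcg PO DAILY"

# Hypothetical annotations, written by hand for illustration only.
umls_ents = [(0, 13, "umls", "Pharmacologic Substance")]
rxnorm_ents = [(0, 13, "rxnorm", "levothyroxine")]
med7_ents = [(0, 13, "med7", "DRUG"), (14, 21, "med7", "STRENGTH")]

merged = merge_entities(umls_ents, rxnorm_ents, med7_ents)
for (start, end), labels in sorted(merged.items()):
    print(text[start:end], labels)
```

Keying on the exact span is the simplest choice. Extractors that disagree on span boundaries would need an overlap rule instead.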

## i) Installations

In [2]:
import sys; sys.executable

'/opt/conda/envs/opennotes/bin/python'

In [3]:
import spacy
import scispacy

from pprint import pprint
from collections import OrderedDict

from spacy import displacy
# from scispacy.abbreviation import AbbreviationDetector # only needed if abbreviations should be resolved before linking
from scispacy.umls_linking import UmlsEntityLinker

# should be 2.3.5 and >=0.3.0
spacy.__version__, scispacy.__version__

('2.3.5', '0.3.0')

## ii) Setting up the model

The model is used to form word/sentence embeddings for the NER task, so it's important to choose a model that has been tuned for our specific use case (e.g. clinical text, prescription information). Otherwise the embeddings will be less useful for recognizing entities.

[Note to self:] One idea to look into if we have time remaining: using a custom model in the spaCy pipeline. Could we do something with the Romanov models, since they've been trained specifically for conflict detection? See https://spacy.io/usage/v3

### a) scispaCy

For scispaCy, we set up one of their models that has been trained on biomedical data. Other models can be found [here](https://allenai.github.io/scispacy/).

We load the model twice because we will attach a different entity linker (a component that links text to named entities in a knowledge base) to each copy later.

In [4]:
## uncomment to install model if not already installed
# !/opt/conda/envs/opennotes/bin/python -m pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz

In [5]:
# for umls (general biomedical concepts)
umls_nlp   = spacy.load("en_core_sci_sm")

# for rxnorm (prescriptions)
rxnorm_nlp = spacy.load("en_core_sci_sm")

### b) Med7

For Med7, we set up their model that has been trained specifically for NER of medication-related concepts: dosage, drug names, duration, form, frequency, route of administration, and strength. The model is trained on MIMIC-III, so it should work well for us.

In [6]:
# Install the Med7 model
!/opt/conda/envs/opennotes/bin/python3.7 -m pip install https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1

Collecting https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1
  Downloading https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1 (892.8 MB)


In [7]:
sys.executable

'/opt/conda/envs/opennotes/bin/python'

In [8]:
med7_nlp = spacy.load("en_core_med7_lg")

## iii) Adding an entity linker

The EntityLinker is a spaCy pipeline component that links entity mentions to a knowledge base. The linker compares each mention with the concepts in the specified knowledge base (e.g. scispaCy's UMLS linker runs a nearest-neighbor search based on character overlap, with an option to resolve abbreviations first).

[Note: Each mention generally gets resolved to a list of candidate entities. This [blog post](http://sujitpal.blogspot.com/2020/08/disambiguating-scispacy-umls-entities.html) describes one potential way to disambiguate by figuring out the "most likely" set of entities. We start off by resolving to the first candidate only, which is hopefully sufficient.]
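To make "nearest-neighbor search on character overlap, then take the first candidate" concrete, here is a toy sketch. It is not scispaCy's implementation: it swaps in plain Jaccard overlap of character trigrams over a tiny hypothetical knowledge base, and the IDs `CUI_A` and `CUI_B` are placeholders rather than real CUIs.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard overlap between two sets."""
    return len(a & b) / len(a | b)


def rank_candidates(mention, knowledge_base, threshold=0.7):
    """Score each concept by its best-matching alias; keep those above threshold.

    `knowledge_base` maps a concept ID to a non-empty list of aliases.
    Returns (concept_id, score) pairs sorted best first.
    """
    mention_grams = char_ngrams(mention)
    scored = []
    for concept_id, aliases in knowledge_base.items():
        best = max(jaccard(mention_grams, char_ngrams(alias)) for alias in aliases)
        if best >= threshold:
            scored.append((concept_id, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical two-concept knowledge base with placeholder IDs.
toy_kb = {"CUI_A": ["pneumonia"], "CUI_B": ["pneumothorax"]}

candidates = rank_candidates("Pneumonia", toy_kb)
top_candidate = candidates[0] if candidates else None  # "resolve to the 1st entity"
print(top_candidate)
```

Taking `candidates[0]` mirrors the plan of resolving to the first entity only. A later disambiguation step could instead rerank the whole list.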

### a) scispaCy

#### UMLS Linker

The UMLS linker maps entity mentions to UMLS concepts. The parts we're mainly interested in are the semantic type and the concept (mainly the common name; the CUI may become important later).

* _Semantic type_ is the broader category that the entity falls under, e.g. disease, pharmacologic substance, etc. See [this](https://metamap.nlm.nih.gov/Docs/SemanticTypes_2018AB.txt) for a full list.

* _Concepts_ refer to the more fundamental entity itself, e.g. pneumothorax, ventilator, etc. Many concepts can fall under one semantic type.

More info on `UmlsEntityLinker` ([source code](https://github.com/allenai/scispacy/blob/4ade4ec897fa48c2ecf3187caa08a949920d126d/scispacy/linking.py#L9))

See source code for `.jsonl` file with the knowledge base.

In [9]:
from scispacy.umls_linking import UmlsEntityLinker

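# Note: `resolve_abbreviations=True` only has an effect if an AbbreviationDetector is in the pipeline.
# To enable it, import it (`from scispacy.abbreviation import AbbreviationDetector`) and uncomment:
# abbreviation_pipe = AbbreviationDetector(umls_nlp)
# umls_nlp.add_pipe(abbreviation_pipe)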
umls_linker = UmlsEntityLinker(k=10,                          # number of nearest neighbors to look up from
                               threshold=0.7,                 # confidence threshold to be added as candidate
                               max_entities_per_mention=1,    # number of entities returned per concept (todo: tune)
                               filter_for_definitions=False,  # no definition is OK
                               resolve_abbreviations=True)    # resolve abbreviations before linking
umls_nlp.add_pipe(umls_linker)

https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/linkers/2020-10-09/umls/tfidf_vectors_sparse.npz not found in cache, downloading to /tmp/tmpceyc9jdd
Finished download, copying /tmp/tmpceyc9jdd to cache at /home/dianaflores71798/.scispacy/datasets/e9f7327283e43f0482f7c0c71b71dec278a58ccb3ffdd03c2c2350159e7ef146.f2a350ad19015b2591545f7feeed6a6d6d2fffcd635d868a5d7fc0dfc3cadfd8.tfidf_vectors_sparse.npz
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/linkers/2020-10-09/umls/nmslib_index.bin not found in cache, downloading to /tmp/tmp8bfisdlm
Finished download, copying /tmp/tmp8bfisdlm to cache at /home/dianaflores71798/.scispacy/datasets/f48455d6c79262057cce66b4619123c2b558b21092d42fac97f47bb99a5b8f9f.dd70d3dffe7d90d7ac8914460e16a48375dab32485fb6313a34e6fbcaf53218b.nmslib_index.bin
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/linkers/2020-10-09/umls/tfidf_vectorizer.joblib not found in cache, downloading to /tmp/tmp1mce318j
Finished download, copying /tmp/tmp1mce3



https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/linkers/2020-10-09/umls/concept_aliases.json not found in cache, downloading to /tmp/tmp57gxwrb0
Finished download, copying /tmp/tmp57gxwrb0 to cache at /home/dianaflores71798/.scispacy/datasets/1428ec15d3b1061731ea273c03699130b3d6b90948993e74bda66af605ff8e2a.aeb7a686c654df6bccb6c2c23d3eda3eb381daaefda4592b58158d0bee53b352.concept_aliases.json
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/kbs/2020-10-09/umls_2020_aa_cat0129.jsonl not found in cache, downloading to /tmp/tmpzwfe6t6n
Finished download, copying /tmp/tmpzwfe6t6n to cache at /home/dianaflores71798/.scispacy/datasets/4d7fb8fcae1035d1e0a47d9072b43d5a628057d35497fbfb2499b4b7b2dd4dd7.05ec7eef12f336d4666da85b7fa69b9401883a7dd4244473f7b88b413ccbba03.umls_2020_aa_cat0129.jsonl
https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/data/umls_semantic_type_tree.tsv not found in cache, downloading to /tmp/tmpnve9ywt_
Finished download, copying /tmp/tmpnve9ywt_ to cache at /

#### RxNorm Linker

The RxNorm linker maps entities to RxNorm, an ontology of normalized names for clinical drugs with about 100k concepts. It draws on several other drug vocabularies commonly used in pharmacy management and drug interaction, including First Databank, Micromedex, and the Gold Standard Drug Database.

More info on `RxNorm` ([NIH page](https://www.nlm.nih.gov/research/umls/rxnorm/index.html), [source code](https://github.com/allenai/scispacy/blob/2290a80cfe0948e48d8ecfbd60064019d57a6874/scispacy/linking_utils.py#L120))

See source code for `.jsonl` file with the knowledge base.

In [10]:
from scispacy.linking import EntityLinker

# rxnorm_linker = EntityLinker(resolve_abbreviations=True, name="rxnorm")
rxnorm_linker = EntityLinker(k=10,                          # number of nearest neighbors to look up from
                             threshold=0.7,                 # confidence threshold to be added as candidate
                             max_entities_per_mention=1,    # number of entities returned per concept (todo: tune)
                             filter_for_definitions=False,  # no definition is OK
                             resolve_abbreviations=True,    # resolve abbreviations before linking
                             name="rxnorm")                 # RxNorm ontology

rxnorm_nlp.add_pipe(rxnorm_linker)



### b) Med7 

No entity linker is needed for Med7.

### c) negspaCy [TODO]
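Until negspaCy is wired in, the idea behind NegEx-style negation detection can be sketched in plain Python. This is a toy under stated assumptions, not negspaCy itself: the trigger list is a small hand-picked sample, and `is_negated` and its `window` parameter are hypothetical names.

```python
import re

# Small hand-picked trigger list for illustration; real NegEx lists are much longer.
NEGATION_TRIGGERS = ["no", "not", "denies", "without", "negative for"]


def is_negated(sentence, concept, window=5):
    """Return True if a trigger phrase occurs within `window` tokens before `concept`."""
    lowered = sentence.lower()
    position = lowered.find(concept.lower())
    if position == -1:
        return False  # concept not mentioned at all
    preceding_tokens = re.findall(r"\w+", lowered[:position])[-window:]
    preceding_text = " ".join(preceding_tokens)
    return any(
        re.search(rf"\b{re.escape(trigger)}\b", preceding_text)
        for trigger in NEGATION_TRIGGERS
    )


print(is_negated("this pt is diabetic", "diabetic"))      # False
print(is_negated("this pt is not diabetic", "diabetic"))  # True
```

The sketch only looks at triggers before the concept. Triggers that follow it (e.g. "diabetes was ruled out") would need a second pass.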

# 2. Setup data structures

## Categorizing type of conflict

The first larger task is to categorize each pair by the type of conflict to check for, since our method will likely differ by type (at least for the rule-based approach). We wrote up a short list [here](https://docs.google.com/document/d/1fEBk0JHeyQWshYWW5w_VTkaYyRfm9MBxJ9DAGoVa8Yw/edit?usp=sharing).

To do this, we use the semantic type identified by the UMLS linker. The table below lists the semantic types we filter for and the conflict each one is used for.

Here's a [full list](https://metamap.nlm.nih.gov/Docs/SemanticTypes_2018AB.txt) of semantic types. You can look up definitions of semantic types [here](http://linkedlifedata.com/resource/umls-semnetwork/T033).

| Conflict | Semantic Type |
| --- | ----------- |
| Diagnoses-related errors | Disease or Syndrome (T047), Diagnostic Procedure (T060) |
| Inaccurate description of medical history (symptoms) | Sign or Symptom (T184) |
| Inaccurate description of medical history (operations) | Therapeutic or Preventive Procedure (T061) |
| Inaccurate description of medical history (other) | [all of the above and below] |
| Medication or allergies | Clinical Drug (T200), Pharmacologic Substance (T121) |
| Test procedures or results | Laboratory Procedure (T059), Laboratory or Test Result (T034) | 


For clarity, the concepts we'll keep from the UMLS linker are anything falling into these semantic types (which we will then categorize by type of conflict using the table above):

* T047 - Disease or Syndrome
* T121 - Pharmacologic Substance
* T023 - Body Part, Organ, or Organ Component
* T061 - Therapeutic or Preventive Procedure 
* T060 - Diagnostic Procedure
* T059 - Laboratory Procedure
* T034 - Laboratory or Test Result 
* T184 - Sign or Symptom 
* T200 - Clinical Drug

We store this mapping in a dictionary.

<!-- Some useful def's 
Finding - 
That which is discovered by direct observation or measurement of an organism attribute or condition, including the clinical history of the patient. The history of the presence of a disease is a 'Finding' and is distinguished from the disease itself.  -->

In [11]:
SEMANTIC_TYPES = ['T047', 'T121', 'T023', 'T061', 'T060', 'T059', 'T034', 'T184', 'T200']
SEMANTIC_NAMES = ['Disease or Syndrome', 'Pharmacologic Substance', 'Body Part, Organ, or Organ Component',
                  'Therapeutic or Preventive Procedure', 'Diagnostic Procedure', 'Laboratory Procedure',
                  'Laboratory or Test Result', 'Sign or Symptom', 'Clinical Drug']
SEMANTIC_TYPE_TO_NAME = dict(zip(SEMANTIC_TYPES, SEMANTIC_NAMES))

SEMANTIC_TYPE_TO_NAME

{'T047': 'Disease or Syndrome',
 'T121': 'Pharmacologic Substance',
 'T023': 'Body Part, Organ, or Organ Component',
 'T061': 'Therapeutic or Preventive Procedure',
 'T060': 'Diagnostic Procedure',
 'T059': 'Laboratory Procedure',
 'T034': 'Laboratory or Test Result',
 'T184': 'Sign or Symptom',
 'T200': 'Clinical Drug'}

In [13]:
CONFLICT_TO_SEMANTIC_TYPE = {
    "diagnosis": {'T047', 'T060'},
    "med_history_symptom": {'T184'},
    "med_history_operation": {'T061'},
    "med_history_other": set(SEMANTIC_TYPES),
    "med_allergy": {'T200', 'T121'},
    "test_results": {'T059', 'T034'}
}

CONFLICT_TO_SEMANTIC_TYPE

{'diagnosis': {'T047', 'T060'},
 'med_history_symptom': {'T184'},
 'med_history_operation': {'T061'},
 'med_history_other': {'T023',
  'T034',
  'T047',
  'T059',
  'T060',
  'T061',
  'T121',
  'T184',
  'T200'},
 'med_allergy': {'T121', 'T200'},
 'test_results': {'T034', 'T059'}}
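As a usage sketch, the mapping can be inverted at lookup time to find which conflict categories a concept belongs to, given its semantic types. The helper name `conflicts_for` is an assumption. The dictionaries are repeated here only so the snippet runs on its own.

```python
SEMANTIC_TYPES = ['T047', 'T121', 'T023', 'T061', 'T060', 'T059', 'T034', 'T184', 'T200']

CONFLICT_TO_SEMANTIC_TYPE = {
    "diagnosis": {'T047', 'T060'},
    "med_history_symptom": {'T184'},
    "med_history_operation": {'T061'},
    "med_history_other": set(SEMANTIC_TYPES),
    "med_allergy": {'T200', 'T121'},
    "test_results": {'T059', 'T034'},
}


def conflicts_for(semantic_types):
    """Return the sorted conflict categories whose type sets overlap `semantic_types`."""
    found = set(semantic_types)
    return sorted(
        name for name, types in CONFLICT_TO_SEMANTIC_TYPE.items() if types & found
    )


print(conflicts_for(['T121']))  # a Pharmacologic Substance
print(conflicts_for(['T023']))  # a Body Part, only covered by "other"
print(conflicts_for(['T033']))  # a type we do not filter for
```

Because `med_history_other` covers every kept type, any kept concept lands in at least that category.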

In [14]:
from data_structures import Patient,\
                            Note, PrescriptionOrders, LabResults,\
                            Sentence, Prescription, Lab

In [15]:
# from importlib import reload # python 2.7 does not require this
# import data_structures
# reload(data_structures)
# from data_structures import Patient,\
#                             Note, PrescriptionOrders, LabResults,\
#                             Sentence, Prescription, Lab

# 3. Load and process data

In [16]:
# Load MIMIC tables
notes_df  = pd.read_csv('NOTEEVENTS.csv.gz',    compression='gzip', error_bad_lines=False)
drug_df   = pd.read_csv('PRESCRIPTIONS.csv.gz', compression='gzip', error_bad_lines=False)
lab_df    = pd.read_csv('LABEVENTS.csv.gz',     compression='gzip', error_bad_lines=False)
d_lab_df  = pd.read_csv('D_LABITEMS.csv.gz',    compression='gzip', error_bad_lines=False)



#### Updated script for processing HADM ID's with consecutive physician notes (does not count the autosaves)

In [17]:
# Load HADM ID's with consecutive physician notes
if os.path.exists("hadm_ids.pkl"):
    with open("hadm_ids.pkl", "rb") as f:
        hadm_ids = pickle.load(f)
else:
    hadm_ids = []
    for hadm_id in tqdm(notes_df.HADM_ID.unique()):
        hadm_data = notes_df.loc[notes_df.HADM_ID == hadm_id]
        hadm_phys_notes = hadm_data.loc[hadm_data.CATEGORY == "Physician "]

        if len(hadm_phys_notes.CHARTTIME.unique()) > 1: # ensure > 1 unique notes (not counting autosave)
            hadm_ids.append(hadm_id)

    with open("hadm_ids.pkl", "wb") as f:
        pickle.dump(hadm_ids, f)
        
print(f"There are {len(hadm_ids)} patients with consecutive physician notes.")

There are 8158 patients with consecutive physician notes.
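The selection rule above (more than one distinct `CHARTTIME` among an admission's physician notes) can also be written as a single pass over the rows, which avoids re-filtering the whole table once per admission. The sketch below uses plain dictionaries as stand-in rows. The function name and the toy data are assumptions.

```python
from collections import defaultdict


def admissions_with_consecutive_notes(rows, category="Physician "):
    """Return HADM_IDs that have more than one distinct CHARTTIME in `category`.

    `rows` is an iterable of dicts with HADM_ID, CATEGORY and CHARTTIME keys.
    Autosaved copies share a CHARTTIME, so they collapse inside the set.
    """
    chart_times = defaultdict(set)
    for row in rows:
        if row["CATEGORY"] == category:
            chart_times[row["HADM_ID"]].add(row["CHARTTIME"])
    return [hadm_id for hadm_id, times in chart_times.items() if len(times) > 1]


# Toy rows for illustration only (not MIMIC data).
toy_rows = [
    {"HADM_ID": 1, "CATEGORY": "Physician ", "CHARTTIME": "t1"},
    {"HADM_ID": 1, "CATEGORY": "Physician ", "CHARTTIME": "t2"},
    {"HADM_ID": 2, "CATEGORY": "Physician ", "CHARTTIME": "t1"},
    {"HADM_ID": 2, "CATEGORY": "Physician ", "CHARTTIME": "t1"},  # autosave duplicate
    {"HADM_ID": 3, "CATEGORY": "Nursing", "CHARTTIME": "t1"},
    {"HADM_ID": 3, "CATEGORY": "Nursing", "CHARTTIME": "t2"},
]

print(admissions_with_consecutive_notes(toy_rows))
```

In pandas, the same idea would probably be a `groupby("HADM_ID")` on the physician rows followed by `CHARTTIME.nunique()`. That variant is untested against this dataset.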


# 4. Generating Contradictions

Generate 25-50 examples of positive and negative contradictions, each.

For lab values: 

* Find 50-100 total data pairs (about 2-4 per patient) and insert contradiction, or label as not a contradiction

In [18]:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [19]:
pd.set_option("display.max_colwidth", None) # prints full text (None replaces the deprecated -1)


In [20]:
from importlib import reload # needed in Python 3; `reload` is a builtin in Python 2
import data_structures
reload(data_structures)
from data_structures import Patient,\
                            Note, PrescriptionOrders, LabResults,\
                            Sentence, Prescription, Lab

In [21]:
def is_comparable_type(data_i, data_j):
    """ We only want to compare note-to-note OR note-to-structured data. 
    
    Comparable types:
    - sentence v. sentence
    - sentence v. prescription
    - sentence v. lab
    
    Uncomparable types:
    - lab v. lab 
    - lab v. prescription
    - prescription v. prescription
    """
    return (data_i.type == "sentence"     and data_j.type == "sentence") or \
           (data_i.type == "sentence"     and data_j.type == "prescription") or \
           (data_i.type == "prescription" and data_j.type == "sentence") or \
           (data_i.type == "sentence"     and data_j.type == "lab") or \
           (data_i.type == "lab"          and data_j.type == "sentence")
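The same rule can be expressed as a lookup over unordered type pairs, which may be easier to extend if new data types are added. This is an alternative sketch, not a replacement for the function above. `COMPARABLE_TYPE_PAIRS` and `is_comparable_type_lookup` are hypothetical names, and `SimpleNamespace` stands in for the real data classes.

```python
from types import SimpleNamespace

# Unordered pairs of types that may be compared; a frozenset ignores argument order.
COMPARABLE_TYPE_PAIRS = {
    frozenset({"sentence"}),                  # sentence v. sentence
    frozenset({"sentence", "prescription"}),  # sentence v. prescription
    frozenset({"sentence", "lab"}),           # sentence v. lab
}


def is_comparable_type_lookup(data_i, data_j):
    """Return True if the two data items have a comparable pair of types."""
    return frozenset({data_i.type, data_j.type}) in COMPARABLE_TYPE_PAIRS


sentence = SimpleNamespace(type="sentence")
prescription = SimpleNamespace(type="prescription")
lab = SimpleNamespace(type="lab")

print(is_comparable_type_lookup(sentence, prescription))  # True
print(is_comparable_type_lookup(lab, prescription))       # False
```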

In [22]:
def generate_data_pairs(pat):
    processed_pairs = []  # for dataframe + csv
    data_inst_pairs = []  # for pipeline, list of tuples: ((Data 1, Data 2), label)
    pair_idx = 0

    # Iterate over all of the patient's DailyData instances (e.g. note, prescription order, lab results for same day)
    ## pat.dailydata = {[date]: [DailyData instance from that date], ...}
    for day, pat_dailydatas in pat.dailydata.items(): # pat_dailydatas is list of all DailyData instances for `day`
        print(f"********** Processing data for {day} **********")
        # Collect all the daily datas (note, prescription orders, lab results) for current day
        current_dds = []
        current_dds_features = []
        current_dds_txts = []
        current_dds_sem_types = []
        current_dds_sem_names = []
        for dd in pat_dailydatas: # iterating over DailyData instances, e.g. dd=physician note taken on `day`
            current_dds.extend(dd.datas)
            current_dds_features.extend(dd.datas_features)
            current_dds_txts.extend(dd.datas_txts)
            current_dds_sem_types.extend(dd.datas_semantic_types)
            current_dds_sem_names.extend(dd.datas_semantic_names)

        current_dds           = np.array(current_dds)
        current_dds_features  = np.array(current_dds_features)
        current_dds_txts      = np.array(current_dds_txts)
        current_dds_sem_types = np.array(current_dds_sem_types)
        current_dds_sem_names = np.array(current_dds_sem_names)

        # extract similar sentences for each semantic type
        for sem_type in SEMANTIC_TYPES:
            # data for this semantic type
            sem_type_bools   = [sem_type in x for x in current_dds_sem_types]
            sem_type_indices = np.where(sem_type_bools)[0]
            indices_map = dict(
                            zip(range(len(sem_type_indices)), 
                                sem_type_indices)
                          )  # maps regular indices in sem_type_current_dds_* lists to indices in current_dds_* lists

            sem_type_current_dds           = current_dds[sem_type_indices]
            sem_type_current_dds_features  = current_dds_features[sem_type_indices]
            sem_type_current_dds_txts      = current_dds_txts[sem_type_indices]
            sem_type_current_dds_sem_types = current_dds_sem_types[sem_type_indices]
            sem_type_current_dds_sem_names = current_dds_sem_names[sem_type_indices]

            # bag-of-words vectors over the features (umls + rxnorm concepts)
            vectorizer = CountVectorizer()
            corpus = list(map(lambda x: ' '.join(x), sem_type_current_dds_features))
            if len(corpus) == 0: # skip rest if no candidate sentences exist
                continue
            X = vectorizer.fit_transform(corpus)
            X = X.toarray()

            # get cosine similarity using umls + rxnorm concepts
            similarity = cosine_similarity(X)     # larger=more similar
            sim_is, sim_js = np.where(similarity>0.5) # all pairs with at least 0.5 similarity

            for i, j in zip(sim_is, sim_js):
                data_i = sem_type_current_dds[i]
                data_j = sem_type_current_dds[j]
                # i > j drops self-pairs and mirrored duplicates; also require comparable types
                if i>j and is_comparable_type(data_i, data_j):
                    print(f"***** PAIR INDEX {pair_idx} *****")
                    print(f"Cosine similarity: {similarity[i, j]}")
                    print(f"----- Data i -----")
                    print(f">> Time: {data_i.time}\n" +\
                          f">> Type: {data_i.type}\n" +\
                          f">> Concepts: {data_i.features}\n" +\
                          f">> {data_i.txt}")
                    print(f"----- Data j -----")
                    print(f">> Time: {data_j.time}\n" +\
                          f">> Type: {data_j.type}\n" +\
                          f">> Concepts: {data_j.features}\n" +\
                          f">> {data_j.txt}")
                    print("**********************************")

                    # save
                    processed_pairs.append([data_i.txt,      data_j.txt, \
                                            data_i.time,     data_j.time, \
                                            data_i.type,     data_j.type, \
                                            data_i.features, data_j.features, \
                                            similarity[i, j], SEMANTIC_TYPE_TO_NAME[sem_type]])
            #                                 SEMANTIC_TYPE_TO_NAME[semantic_type]])

                    data_inst_pairs.append(((data_i, data_j), None))
                    pair_idx += 1

    ###############
    #### Final ####
    ###############        
    # Build from the list directly so an empty result still gets the expected columns
    df = pd.DataFrame(processed_pairs,
                      columns=["sentence 1", "sentence 2",
                               "time 1", "time 2",
                               "type 1", "type 2",
                               "concepts 1", "concepts 2",
                               "cosine similarity", "semantic type"])
    
    return df, data_inst_pairs
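The similarity step inside `generate_data_pairs` can be followed by hand with a small pure-Python sketch. It assumes the default `CountVectorizer` tokenization (lowercased words of two or more characters) and reimplements the cosine with `Counter`. The helper names are hypothetical, and the concept sets are copied from the Patient 1 output.

```python
import math
import re
from collections import Counter

# Default CountVectorizer token rule: lowercase, words of 2+ characters.
TOKEN_PATTERN = re.compile(r"\b\w\w+\b")


def bag_of_words(concepts):
    """Join concept names into one string and count its tokens."""
    text = " ".join(concepts).lower()
    return Counter(TOKEN_PATTERN.findall(text))


def cosine(counts_a, counts_b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


concepts_i = {"Structure of left lower lobe of lung", "Pneumonia"}
concepts_j = {"Chronic obstructive pulmonary disease of horses",
              "Structure of left lower lobe of lung", "Pneumonia"}

similarity = cosine(bag_of_words(concepts_i), bag_of_words(concepts_j))
print(round(similarity, 6))  # 0.848528, the value reported for pair 0
print(similarity > 0.5)      # True, so the pair passes the threshold
```

Note that the score is driven by shared words, not shared concepts: the repeated "of" contributes 6 of the 12 units in the dot product here.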

## README: Store generated data here

In [23]:
generated_data_dict = {}

## Patient 1

In [25]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[0] # Note: `hadm_ids` is a list of all HADM id's with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df, \
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker, \
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 155131


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]

********** Processing data for 2131-12-23 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.848528137423857
----- Data i -----
>> Time: 2131-12-23 23:51:00
>> Type: sentence
>> Concepts: {'Structure of left lower lobe of lung', 'Pneumonia'}
>> CXR with LLL infiltrate, which may be    persitent radiograph manifestation of her previous pneumonia (may take    6-8 weeks to resolve).'
----- Data j -----
>> Time: 2131-12-23 23:51:00
>> Type: sentence
>> Concepts: {'Chronic obstructive pulmonary disease of horses', 'Structure of left lower lobe of lung', 'Pneumonia'}
>> HPI:     [**Age over 90 382**]  year old woman hx of COPD, recent admit in early  [**Month (only) 102**]  with LLL    pneumonia.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.7442084075352509
----- Data i -----
>> Time: 2131-12-23 23:51:00
>> Type: sentence
>> Concepts: {'Lung hyperinflation', 'Structure of left lower lobe of lung'}
>> Compared to previous CXR, LLL infiltrate not    s

In [38]:
#### Inserting contradictions to Sentence instances
# IMPORTANT: We should only insert contradictions if it is a sentence from a note ("type" should be sentence, not lab or prescription)! 

# Step 3: Get all the pairs about prescriptions values
prescription_pairs_df = df.loc[(df['type 1'] == "prescription") | (df['type 2'] == "prescription")]

prescription_pairs_df.head(2)

| Unnamed: 0 | sentence 1 | sentence 2 | time 1 | time 2 | type 1 | type 2 | concepts 1 | concepts 2 | cosine similarity | semantic type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 31 | Patient was prescribed Ipratropium Bromide MDI 12.9g Inhaler IH of total 2-8 PUFF | Ipratropium Bromide Inhalation Q6H' | 2131-12-23 | 2131-12-23 22:56:00 | prescription | sentence | {Ipratropium Bromide, ipratropium bromide} | {Ipratropium Bromide, ipratropium bromide} | 1.0 | Pharmacologic Substance |
| 32 | Patient was prescribed Levothyroxine Sodium 25mcg Tablet PO of total 25 mcg | Levothyroxine 125 mcg PO DAILY' | 2131-12-23 | 2131-12-23 22:56:00 | prescription | sentence | {Levothyroxine Sodium, Levothyroxine Sodium 0.025 MG Oral Capsule} | {levothyroxine} | 0.57735 | Pharmacologic Substance |


In [41]:
prescription_pairs_df

| Unnamed: 0 | sentence 1 | sentence 2 | time 1 | time 2 | type 1 | type 2 | concepts 1 | concepts 2 | cosine similarity | semantic type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 31 | Patient was prescribed Ipratropium Bromide MDI 12.9g Inhaler IH of total 2-8 PUFF | Ipratropium Bromide Inhalation Q6H' | 2131-12-23 | 2131-12-23 22:56:00 | prescription | sentence | {Ipratropium Bromide, ipratropium bromide} | {Ipratropium Bromide, ipratropium bromide} | 1.0 | Pharmacologic Substance |
| 32 | Patient was prescribed Levothyroxine Sodium 25mcg Tablet PO of total 25 mcg | Levothyroxine 125 mcg PO DAILY' | 2131-12-23 | 2131-12-23 22:56:00 | prescription | sentence | {Levothyroxine Sodium, Levothyroxine Sodium 0.025 MG Oral Capsule} | {levothyroxine} | 0.57735 | Pharmacologic Substance |
| 33 | Patient was prescribed Nephrocaps 1 Capsule PO of total 1 CAP | Nephrocaps 1 mg PO DAILY' | 2131-12-23 | 2131-12-23 22:56:00 | prescription | sentence | {Nephrocaps} | {Nephrocaps} | 1.0 | Pharmacologic Substance |
| 140 | Patient was prescribed Heparin Flush (10 units/ml) 10 Units/mL - 5 mL Syringe IV of total 2 mL | DC heparin today.' | 2131-12-25 | 2131-12-25 09:37:00 | prescription | sentence | {heparin} | {heparin} | 1.0 | Pharmacologic Substance |
| 141 | Patient was prescribed Levothyroxine Sodium 25mcg Tablet PO of total 25 mcg | - Continue levothyroxine ICU Care' | 2131-12-25 | 2131-12-25 07:56:00 | prescription | sentence | {Levothyroxine Sodium, Levothyroxine Sodium 0.025 MG Oral Capsule} | {levothyroxine} | 0.57735 | Pharmacologic Substance |
| 142 | Patient was prescribed Heparin Sodium 25,000 unit Premix Bag IV of total 25,000 UNIT | DC heparin today.' | 2131-12-25 | 2131-12-25 09:37:00 | prescription | sentence | {heparin sodium} | {heparin} | 0.707107 | Pharmacologic Substance |
| 143 | Patient was prescribed Heparin Sodium 25,000 unit Premix Bag IV of total 25,000 UNIT | DC heparin today.' | 2131-12-25 | 2131-12-25 09:37:00 | prescription | sentence | {heparin sodium} | {heparin} | 0.707107 | Pharmacologic Substance |
| 183 | Patient was prescribed Levothyroxine Sodium 25mcg Tablet PO of total 25 mcg | - Continue levothyroxine ICU Care' | 2131-12-26 | 2131-12-26 07:42:00 | prescription | sentence | {Levothyroxine Sodium, Levothyroxine Sodium 0.025 MG Oral Capsule} | {levothyroxine} | 0.57735 | Pharmacologic Substance |


In [209]:
# Step 4: Insert contradictions

# We should probably aim for 1-2 contradictions per patient. 
# So basically, copy/paste code for Steps 1-4 for each patient, and push to Github.
# Small heads up -- for a given patient, try not to insert contradictions 
# into two sentences that look really really similar. 
# There's a chance this might refer to the same underlying Sentence instance, 
# which could overwrite a contradiction you previously inserted. 

# Look through the sentence pairs by going through `prescription_pairs_df`.
# If you find a good one you want to insert a contradiction for, 
# make note of the row index (i.e. the number at the left), 
# and set this to `pair_idx` below. 
# Also make note of which sentence (i.e. sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag appropriately.
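The overwrite risk mentioned above comes from object aliasing: if two pairs hold a reference to the same `Sentence` instance, editing it through one pair changes it for the other. The sketch below demonstrates this with a hypothetical stand-in class. The real `Sentence` in `data_structures` may differ, and only `update_text` and `txt` are taken from the notebook.

```python
class Sentence:
    """Hypothetical stand-in for the notebook's Sentence class."""

    def __init__(self, txt):
        self.txt = txt

    def update_text(self, new_txt):
        self.txt = new_txt


# One note sentence that is paired with two different prescriptions.
shared_sentence = Sentence("DC heparin today.")
pairs = [
    ((Sentence("Patient was prescribed Heparin Flush"), shared_sentence), None),
    ((Sentence("Patient was prescribed Heparin Sodium"), shared_sentence), None),
]

# Insert a contradiction through the first pair only.
pairs[0][0][1].update_text("DC heparin stopped.")

# The second pair points at the same object, so it sees the edit too.
print(pairs[1][0][1].txt)
print(pairs[0][0][1] is pairs[1][0][1])
```

Checking `is` between the two candidate sentences before editing is a cheap way to detect this case.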

In [40]:
pair_idx = 140
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]  # index 0 = data 1, index 1 = data 2

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "DC heparin stopped."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

prescription 1:	Patient was prescribed Heparin Flush (10 units/ml) 10 Units/mL - 5 mL Syringe IV of total 2 mL
sentence 2:	DC heparin today.'

New contradicting sentence: DC heparin stopped.


In [42]:
no_contradiction_pair_idx = [32, 33]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
prescription 1:	Patient was prescribed Levothyroxine Sodium 25mcg Tablet PO of total 25 mcg
sentence 2:	Levothyroxine 125 mcg PO DAILY'
*****************************
prescription 1:	Patient was prescribed Nephrocaps 1 Capsule PO of total 1 CAP
sentence 2:	Nephrocaps 1 mg PO DAILY'
*****************************
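Once several patients have been processed, `generated_data_dict` will probably need to be flattened into labeled rows for training or evaluation. The sketch below shows one possible layout. The function name, the `label` encoding (1 = contradiction, 0 = none), and the output keys are assumptions. The example entry repeats the values stored for Patient 1.

```python
def to_labeled_rows(generated):
    """Flatten {hadm_id: {"contradiction": {...}, "none": [...]}} into a list of dicts."""
    rows = []
    for hadm_id, entry in generated.items():
        # Positive examples: pairs where one sentence was rewritten.
        for pair_idx, (is_sentence2, new_txt) in entry["contradiction"].items():
            rows.append({
                "hadm_id": hadm_id,
                "pair_idx": pair_idx,
                "label": 1,
                "modified_side": 2 if is_sentence2 else 1,
                "new_text": new_txt,
            })
        # Negative examples: pairs marked as not contradicting.
        for pair_idx in entry["none"]:
            rows.append({
                "hadm_id": hadm_id,
                "pair_idx": pair_idx,
                "label": 0,
                "modified_side": None,
                "new_text": None,
            })
    return rows


example = {
    155131: {
        "contradiction": {140: (True, "DC heparin stopped.")},
        "none": [32, 33],
    }
}

rows = to_labeled_rows(example)
for row in rows:
    print(row)
```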


## Patient 2

In [43]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[1] # Note: `hadm_ids` is a list of all HADM id's with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df, \
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker, \
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 129414


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/us

********** Processing data for 2174-02-12 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.7252406676228422
----- Data i -----
>> Time: 2174-02-12 19:14:00
>> Type: sentence
>> Concepts: {'Sleep Apnea, Obstructive', 'Obesity', 'Glucose Intolerance (disease)', 'Continuous Positive Airway Pressure'}
>> Obesity:    Glucose intolerance:    Obstructive sleep apnea: declined CPAP therapy.'
----- Data j -----
>> Time: 2174-02-12 19:14:00
>> Type: sentence
>> Concepts: {'Sleep Apnea, Obstructive', 'Positive pressure therapy', 'Continuous Positive Airway Pressure'}
>> Obstructive sleep apnea:  Declined CPAP therapy in the past but    will likely benefit from positive pressure as he is desatting when he    falls asleep.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.75
----- Data i -----
>> Time: 2174-02-12 19:14:00
>> Type: sentence
>> Concepts: {'Vitamin B 12 Deficiency', 'Asthma'}
>> Vitamin B12 deficiency:  not on b12 currently    Twin sister with a

In [44]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert a contradiction into a sentence from a note ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Keep only the pairs in which at least one side is a prescription
prescription_pairs_df = df.loc[(df['type 1'] == "prescription") | (df['type 2'] == "prescription")]

prescription_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
9,Patient was prescribed Albuterol 0.083% Neb Soln 0.083%;3mL Vial IH of total 1 NEB,He tried increasing his albuterol use but this did not help so he came to the emergency room.',2174-02-12,2174-02-12 19:14:00,prescription,sentence,"{Albuterol, albuterol}","{Albuterol, albuterol}",1.0,Pharmacologic Substance
10,Patient was prescribed Albuterol 0.083% Neb Soln 0.083%;3mL Vial IH of total 1 NEB,He tried increasing his albuterol use but this did not help so he came to the emergency room.',2174-02-12,2174-02-12 19:14:00,prescription,sentence,"{Albuterol, albuterol}","{Albuterol, albuterol}",1.0,Pharmacologic Substance


In [45]:
prescription_pairs_df

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
9,Patient was prescribed Albuterol 0.083% Neb Soln 0.083%;3mL Vial IH of total 1 NEB,He tried increasing his albuterol use but this did not help so he came to the emergency room.',2174-02-12,2174-02-12 19:14:00,prescription,sentence,"{Albuterol, albuterol}","{Albuterol, albuterol}",1.0,Pharmacologic Substance
10,Patient was prescribed Albuterol 0.083% Neb Soln 0.083%;3mL Vial IH of total 1 NEB,He tried increasing his albuterol use but this did not help so he came to the emergency room.',2174-02-12,2174-02-12 19:14:00,prescription,sentence,"{Albuterol, albuterol}","{Albuterol, albuterol}",1.0,Pharmacologic Substance
25,Patient was prescribed Hydrochlorothiazide 25mg Tablet PO/NG of total 25 mg,#Hyponatremia: [**Month (only) 51**] be [**1-18**] intrapulmonary process (SIADH) or to HCTZ (salt wasting with diuretics).',2174-02-12,2174-02-12 19:14:00,prescription,sentence,"{Hydrochlorothiazide 25 MG, Hydrochlorothiazide 25 MG Oral Tablet}","{Hydrochlorothiazide 50 MG Oral Tablet, Hyponatremia, Diuretics, Inappropriate ADH Syndrome, hydrochlorothiazide}",0.592999,Clinical Drug
26,Patient was prescribed Hydrochlorothiazide 25mg Tablet PO/NG of total 25 mg,"-urine lytes and osms, serum osms -recheck sodium in the AM -continue HCTZ for now .'",2174-02-12,2174-02-12 19:14:00,prescription,sentence,"{Hydrochlorothiazide 25 MG, Hydrochlorothiazide 25 MG Oral Tablet}","{Hydrochlorothiazide 50 MG Oral Tablet, Americium, hydrochlorothiazide}",0.712697,Clinical Drug
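
Rows 9 and 10 above show the same prescription text, note sentence, and timestamps. When scanning for candidates, it can help to collapse such repeats first. A minimal sketch, assuming the column names shown in the table; the toy frame is hand-built and the texts are shortened, so treat it as an illustration rather than the notebook's data. Rows with identical text may still wrap distinct `Data` instances, so this is only a reading aid.

```python
import pandas as pd

# Hand-built stand-in for `prescription_pairs_df`; the index plays the role of the pair index
toy_pairs_df = pd.DataFrame(
    {
        "sentence 1": [
            "Patient was prescribed Albuterol",
            "Patient was prescribed Albuterol",
            "Patient was prescribed Hydrochlorothiazide",
        ],
        "sentence 2": [
            "He tried increasing his albuterol use",
            "He tried increasing his albuterol use",
            "continue HCTZ for now",
        ],
        "time 1": ["2174-02-12"] * 3,
        "time 2": ["2174-02-12 19:14:00"] * 3,
    },
    index=[9, 10, 25],
)

# Keep the first row of each group of rows that agree on text and timestamps
unique_pairs_df = toy_pairs_df.drop_duplicates(
    subset=["sentence 1", "sentence 2", "time 1", "time 2"], keep="first"
)
print(unique_pairs_df.index.tolist())
```

`drop_duplicates` preserves the original index labels, so the surviving row numbers can still be used as `pair_idx`.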


In [215]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# Repeat Steps 1-4 for each patient, then push to GitHub.
# Heads-up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the first.

# Look through the sentence pairs in `prescription_pairs_df`.
# When you find a pair worth contradicting,
# note its row index (the number in the leftmost column)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.
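
The overwrite risk described in the comments can be checked before editing. The sketch below groups planned edits by the identity of the object they would modify and reports any object targeted more than once. It assumes only the indexing pattern used in this notebook, `data_inst_pairs[pair_idx][0][k]`; the `Sentence` stand-in, the second tuple element, and `find_shared_targets` are hypothetical.

```python
from collections import defaultdict


class Sentence:
    """Stand-in for the notebook's Sentence class (assumed interface)."""

    def __init__(self, txt):
        self.type = "sentence"
        self.txt = txt

    def update_text(self, txt):
        self.txt = txt


def find_shared_targets(data_inst_pairs, planned_edits):
    """Return lists of pair indices whose planned edit hits the same object."""
    by_object = defaultdict(list)
    for pair_idx, is_sentence2 in planned_edits:
        target = data_inst_pairs[pair_idx][0][int(is_sentence2)]
        by_object[id(target)].append(pair_idx)  # identity, not text equality
    return [idxs for idxs in by_object.values() if len(idxs) > 1]


# Toy pairs: 9 and 10 share the same sentence object, 25 does not
rx = object()  # placeholder for a prescription instance
shared = Sentence("He tried increasing his albuterol use.")
other = Sentence("Continue HCTZ for now.")
toy_pairs = {
    9: ((rx, shared), None),   # second element is a placeholder
    10: ((rx, shared), None),
    25: ((rx, other), None),
}

collisions = find_shared_targets(toy_pairs, [(9, True), (10, True), (25, True)])
print(collisions)
```

Any non-empty result means two planned contradictions would land on one instance, so only one of those pair indices should be used.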

In [46]:
pair_idx = 10
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

# False selects sentence 1 (index 0), True selects sentence 2 (index 1)
sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "He stopped his albuterol use but this did not help so he came to the emergency room."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

prescription 1:	Patient was prescribed Albuterol 0.083% Neb Soln 0.083%;3mL Vial IH of total 1 NEB
sentence 2:	He tried increasing his albuterol use but this    did not help so he came to the emergency room.'

New contradicting sentence: He stopped his albuterol use but this did not help so he came to the emergency room.
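
Because the same cell is copied for every patient, the rule that only note sentences may be modified is easy to break by setting `is_sentence2` incorrectly. A minimal sketch of a helper that enforces the rule and records the edit in one place. `insert_contradiction` and the `Data` stand-in are hypothetical; only `.type`, `.txt`, `update_text`, and the `generated_data_dict` layout are taken from the cells above.

```python
class Data:
    """Stand-in for the notebook's Data instances (assumed interface)."""

    def __init__(self, type, txt):
        self.type = type
        self.txt = txt

    def update_text(self, txt):
        self.txt = txt


def insert_contradiction(data_inst_pairs, store, hadm_id, pair_idx, is_sentence2, new_txt):
    """Replace the chosen sentence's text and record the edit in `store`."""
    target = data_inst_pairs[pair_idx][0][int(is_sentence2)]
    if target.type != "sentence":
        # Labs and prescriptions are treated as ground truth and left untouched
        raise ValueError(f"pair {pair_idx}: refusing to modify a {target.type}")
    target.update_text(new_txt)
    entry = store.setdefault(int(hadm_id), {"contradiction": {}, "none": []})
    entry["contradiction"][pair_idx] = (is_sentence2, new_txt)
    return target


# Toy pair mirroring pair 10 above; the second tuple element is a placeholder
toy_pairs = {
    10: (
        (
            Data("prescription", "Patient was prescribed Albuterol"),
            Data("sentence", "He tried increasing his albuterol use."),
        ),
        None,
    )
}
toy_store = {}
modified = insert_contradiction(
    toy_pairs, toy_store, 129414, 10, True, "He stopped his albuterol use."
)
print(modified.txt)
print(toy_store)
```

Calling the helper with `is_sentence2=False` on this pair raises `ValueError`, because that side is the prescription.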


In [47]:
no_contradiction_pair_idx = [25, 26]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")

# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
prescription 1:	Patient was prescribed Hydrochlorothiazide 25mg Tablet PO/NG of total 25 mg
sentence 2:	#Hyponatremia:  [**Month (only) 51**]  be  [**1-18**]  intrapulmonary process (SIADH) or to HCTZ    (salt wasting with diuretics).'
*****************************
prescription 1:	Patient was prescribed Hydrochlorothiazide 25mg Tablet PO/NG of total 25 mg
sentence 2:	-urine lytes and osms, serum osms    -recheck sodium in the AM    -continue HCTZ for now    .'
*****************************


In [218]:
# TODO: ask Dr. Saenz
potential_contradiction_pair_indices = [21]

print("Potential examples of contradictions")
print("*****************************")
for pair_idx in potential_contradiction_pair_indices:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")

Potential examples of contradictions
*****************************
lab 1:	Patient's White Blood Cells lab came back 6.0 K/uL.
sentence 2:	Labs notable for WBC of 5, HCT 34.5,    sodium of 131 and creatinine of 1.0.'
*****************************
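
For lab pairs like the one above, the numeric gap between the structured value and the value quoted in the note can be surfaced automatically before asking a clinician. A rough sketch using a regular expression; `first_number_after` and the relative-gap measure are hypothetical choices, and whether a given gap counts as a contradiction remains a clinical judgement.

```python
import re


def first_number_after(label, text):
    """Return the first number that follows `label` in `text`, or None."""
    pattern = rf"\b{re.escape(label)}\b\D*?(\d+(?:\.\d+)?)"
    match = re.search(pattern, text, flags=re.IGNORECASE)
    return float(match.group(1)) if match else None


lab_txt = "Patient's White Blood Cells lab came back 6.0 K/uL."
note_txt = "Labs notable for WBC of 5, HCT 34.5, sodium of 131 and creatinine of 1.0."

lab_wbc = first_number_after("came back", lab_txt)
note_wbc = first_number_after("WBC", note_txt)

# Relative gap between the two readings, scaled by the larger value
relative_gap = abs(lab_wbc - note_wbc) / max(abs(lab_wbc), abs(note_wbc))
print(lab_wbc, note_wbc, round(relative_gap, 3))
```

A gap like this may simply reflect rounding in the note or a different draw time, which is why such pairs are flagged as potential rather than confirmed contradictions.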


## Patient 3

In [48]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[2]  # Note: `hadm_ids` lists every HADM ID that has consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 133623


********** Processing data for 2145-11-30 **********
********** Processing data for 2145-12-01 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.6030226891555273
----- Data i -----
>> Time: 2145-12-01 03:47:00
>> Type: sentence
>> Concepts: {'Atrial Fibrillation', 'ethanol', 'Chest Pain', 'Hepatitis C', 'Infantile Neuroaxonal Dystrophy', 'Ting AF'}
>>    ASSESSMENT AND PLAN: 54M with hx of ETOH abuse, HCV, presenting with AF    with RVR, Chest Pain in the setting of ETOH intoxication.'
----- Data j -----
>> Time: 2145-12-01 03:47:00
>> Type: sentence
>> Concepts: {'ethanol', 'Chest Pain', 'Hepatitis C'}
>> Chest Pain, ETOH Intoxication    HPI:    54M with hx of ETOH abuse, HCV, presented to the ED this evening    intoxicated.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.6761234037828131
----- Data i -----
>> Time: 2145-12-01 03:47:00
>> Type: sentence
>> Concepts: {'Ting AF', 'Atrial Fibrillation', 'Dilt'}
>> #: AF with RVR: HR at 110s upon a

In [49]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert a contradiction into a sentence from a note ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Keep only the pairs in which at least one side is a prescription
prescription_pairs_df = df.loc[(df['type 1'] == "prescription") | (df['type 2'] == "prescription")]

prescription_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
37,Patient was prescribed Metoprolol Tartrate 25mg Tablet PO/NG of total 25 mg,Recommended Metoprolol.',2145-12-01,2145-12-01 03:47:00,prescription,sentence,"{metoprolol, Metoprolol}","{metoprolol, Metoprolol}",1.0,Pharmacologic Substance


In [50]:
prescription_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
37,Patient was prescribed Metoprolol Tartrate 25mg Tablet PO/NG of total 25 mg,Recommended Metoprolol.',2145-12-01,2145-12-01 03:47:00,prescription,sentence,"{metoprolol, Metoprolol}","{metoprolol, Metoprolol}",1.0,Pharmacologic Substance


In [222]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# Repeat Steps 1-4 for each patient, then push to GitHub.
# Heads-up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the first.

# Look through the sentence pairs in `prescription_pairs_df`.
# When you find a pair worth contradicting,
# note its row index (the number in the leftmost column)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.

In [51]:
pair_idx = 37
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

# False selects sentence 1 (index 0), True selects sentence 2 (index 1)
sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "Metoprolol not recommended."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

prescription 1:	Patient was prescribed Metoprolol Tartrate 25mg Tablet PO/NG of total 25 mg
sentence 2:	Recommended    Metoprolol.'

New contradicting sentence: Metoprolol not recommended.


In [52]:
# no_contradiction_pair_idx = [48, 76]

# print("Examples of non-contradictions")
# print("*****************************")
# for pair_idx in no_contradiction_pair_idx:
#     data_1 = data_inst_pairs[pair_idx][0][0]
#     data_2 = data_inst_pairs[pair_idx][0][1]
    
#     print(f"{data_1.type} 1:\t{data_1.txt}")
#     print(f"{data_2.type} 2:\t{data_2.txt}")
#     print("*****************************")
    
# # Store negative examples
# generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

## Patient 4

In [53]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[3]  # Note: `hadm_ids` lists every HADM ID that has consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 197325


********** Processing data for 2157-02-01 **********
********** Processing data for 2157-02-02 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.5477225575051662
----- Data i -----
>> Time: 2157-02-02 01:37:00
>> Type: sentence
>> Concepts: {'Acute cholangitis', 'Mapap', 'Deuteranomaly', 'Flagyl', 'Infantile Neuroaxonal Dystrophy', 'BCX 34'}
>> Agree with plan to manage acute cholangitis with obstructing CBD stone    with broad abx coverage with vanco / zosyn / flagyl while awaiting BCx    and continuing hydration based on MAP / UOP.'
----- Data j -----
>> Time: 2157-02-02 01:37:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy'}
>> Remainder of plan as outlined    above.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.9999999999999998
----- Data i -----
>> Time: 2157-02-02 01:37:00
>> Type: sentence
>> Concepts: {'Refractory anemias'}
>> In the ER her VS were T 101.5, HR 68, BP 152/76, RR 16, SpO2 95% RA.'
----- Data j -----
>

In [54]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert a contradiction into a sentence from a note ("type" must be "sentence", not "lab" or "prescription")!

# Step 3: Keep only the pairs in which at least one side is a prescription
prescription_pairs_df = df.loc[(df['type 1'] == "prescription") | (df['type 2'] == "prescription")]

prescription_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
48,Patient was prescribed Piperacillin-Tazobactam 4.5 g Frozen Bag IV of total 4.5 g,"['Chief Complaint:', ""24 Hour Events: - ERCP performed - Blood cultures returned GNR's Allergies: No Known Drug Allergies Last dose of Antibiotics: Vancomycin - [**2157-2-2**] 12:50 AM Metronidazole - [**2157-2-2**] 02:00 AM Piperacillin/Tazobactam (Zosyn) - [**2157-2-2**] 03:38 AM Infusions: Other ICU medications: Other medications: Changes to medical and family history: Review of systems is unchanged from admission except as noted below Review of systems: Flowsheet Data as of [**2157-2-2**] 07:50 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**58**] AM""",2157-02-02,2157-02-02 07:51:00,prescription,sentence,"{piperacillin-tazobactam combination, Piperacillin / tazobactam}","{metronidazole, Metronidazole, Blood culture, piperacillin-tazobactam combination, Americium, Pharmaceutical Preparations, Piperacillin / tazobactam, Infusion procedures}",0.67082,Pharmacologic Substance
49,Patient was prescribed Acetaminophen 325mg Tablet PO/NG of total 325-650 mg,Acetaminophen level was negative.',2157-02-02,2157-02-02 01:37:00,prescription,sentence,"{acetaminophen, Acetaminophen}","{acetaminophen, Acetaminophen}",1.0,Pharmacologic Substance


In [55]:
prescription_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
48,Patient was prescribed Piperacillin-Tazobactam 4.5 g Frozen Bag IV of total 4.5 g,"['Chief Complaint:', ""24 Hour Events: - ERCP performed - Blood cultures returned GNR's Allergies: No Known Drug Allergies Last dose of Antibiotics: Vancomycin - [**2157-2-2**] 12:50 AM Metronidazole - [**2157-2-2**] 02:00 AM Piperacillin/Tazobactam (Zosyn) - [**2157-2-2**] 03:38 AM Infusions: Other ICU medications: Other medications: Changes to medical and family history: Review of systems is unchanged from admission except as noted below Review of systems: Flowsheet Data as of [**2157-2-2**] 07:50 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**58**] AM""",2157-02-02,2157-02-02 07:51:00,prescription,sentence,"{piperacillin-tazobactam combination, Piperacillin / tazobactam}","{metronidazole, Metronidazole, Blood culture, piperacillin-tazobactam combination, Americium, Pharmaceutical Preparations, Piperacillin / tazobactam, Infusion procedures}",0.67082,Pharmacologic Substance
49,Patient was prescribed Acetaminophen 325mg Tablet PO/NG of total 325-650 mg,Acetaminophen level was negative.',2157-02-02,2157-02-02 01:37:00,prescription,sentence,"{acetaminophen, Acetaminophen}","{acetaminophen, Acetaminophen}",1.0,Pharmacologic Substance
50,Patient was prescribed Atenolol 50 mg Tab PO/NG of total 75 mg,"- Hold Atenolol, Lisinopril and nifedipine until able to assess BP trend - [**Month (only) 8**] restart in AM .'",2157-02-02,2157-02-02 01:37:00,prescription,sentence,"{atenolol, Atenolol}","{NIFEdipine, nifedipine, Lisinopril, Americium, Atenolol, atenolol, lisinopril}",0.5547,Pharmacologic Substance
51,Patient was prescribed Atenolol 50 mg Tab PO/NG of total 75 mg,"- Holding Atenolol, Lisinopril and Nifedipine for now given tenuous UOP - Will restart when UOP improves #.'",2157-02-02,2157-02-02 07:51:00,prescription,sentence,"{atenolol, Atenolol}","{NIFEdipine, nifedipine, Holding patient, Lisinopril, Atenolol, atenolol, lisinopril}",0.534522,Pharmacologic Substance
52,Patient was prescribed Piperacillin-Tazobactam 4.5 g Vial IV of total 4.5 gm,"['Chief Complaint:', ""24 Hour Events: - ERCP performed - Blood cultures returned GNR's Allergies: No Known Drug Allergies Last dose of Antibiotics: Vancomycin - [**2157-2-2**] 12:50 AM Metronidazole - [**2157-2-2**] 02:00 AM Piperacillin/Tazobactam (Zosyn) - [**2157-2-2**] 03:38 AM Infusions: Other ICU medications: Other medications: Changes to medical and family history: Review of systems is unchanged from admission except as noted below Review of systems: Flowsheet Data as of [**2157-2-2**] 07:50 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**58**] AM""",2157-02-02,2157-02-02 07:51:00,prescription,sentence,"{Intravenous infusion procedures, piperacillin-tazobactam combination, Piperacillin / tazobactam}","{metronidazole, Metronidazole, Blood culture, piperacillin-tazobactam combination, Americium, Pharmaceutical Preparations, Piperacillin / tazobactam, Infusion procedures}",0.710047,Pharmacologic Substance
53,Patient was prescribed Piperacillin-Tazobactam 4.5 g Vial IV of total 4.5 gm,"['Chief Complaint:', ""24 Hour Events: - ERCP performed - Blood cultures returned GNR's Allergies: No Known Drug Allergies Last dose of Antibiotics: Vancomycin - [**2157-2-2**] 12:50 AM Metronidazole - [**2157-2-2**] 02:00 AM Piperacillin/Tazobactam (Zosyn) - [**2157-2-2**] 03:38 AM Infusions: Other ICU medications: Other medications: Changes to medical and family history: Review of systems is unchanged from admission except as noted below Review of systems: Flowsheet Data as of [**2157-2-2**] 07:50 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**58**] AM""",2157-02-02,2157-02-02 07:51:00,prescription,sentence,"{Intravenous infusion procedures, piperacillin-tazobactam combination, Piperacillin / tazobactam}","{metronidazole, Metronidazole, Blood culture, piperacillin-tazobactam combination, Americium, Pharmaceutical Preparations, Piperacillin / tazobactam, Infusion procedures}",0.710047,Pharmacologic Substance
75,Patient was prescribed Piperacillin-Tazobactam 4.5 g Vial IV of total 4.5 gm,"['Chief Complaint:', ""24 Hour Events: - ERCP performed - Blood cultures returned GNR's Allergies: No Known Drug Allergies Last dose of Antibiotics: Vancomycin - [**2157-2-2**] 12:50 AM Metronidazole - [**2157-2-2**] 02:00 AM Piperacillin/Tazobactam (Zosyn) - [**2157-2-2**] 03:38 AM Infusions: Other ICU medications: Other medications: Changes to medical and family history: Review of systems is unchanged from admission except as noted below Review of systems: Flowsheet Data as of [**2157-2-2**] 07:50 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**58**] AM""",2157-02-02,2157-02-02 07:51:00,prescription,sentence,"{Intravenous infusion procedures, piperacillin-tazobactam combination, Piperacillin / tazobactam}","{metronidazole, Metronidazole, Blood culture, piperacillin-tazobactam combination, Americium, Pharmaceutical Preparations, Piperacillin / tazobactam, Infusion procedures}",0.710047,Therapeutic or Preventive Procedure
76,Patient was prescribed Piperacillin-Tazobactam 4.5 g Vial IV of total 4.5 gm,"['Chief Complaint:', ""24 Hour Events: - ERCP performed - Blood cultures returned GNR's Allergies: No Known Drug Allergies Last dose of Antibiotics: Vancomycin - [**2157-2-2**] 12:50 AM Metronidazole - [**2157-2-2**] 02:00 AM Piperacillin/Tazobactam (Zosyn) - [**2157-2-2**] 03:38 AM Infusions: Other ICU medications: Other medications: Changes to medical and family history: Review of systems is unchanged from admission except as noted below Review of systems: Flowsheet Data as of [**2157-2-2**] 07:50 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**58**] AM""",2157-02-02,2157-02-02 07:51:00,prescription,sentence,"{Intravenous infusion procedures, piperacillin-tazobactam combination, Piperacillin / tazobactam}","{metronidazole, Metronidazole, Blood culture, piperacillin-tazobactam combination, Americium, Pharmaceutical Preparations, Piperacillin / tazobactam, Infusion procedures}",0.710047,Therapeutic or Preventive Procedure
106,Patient was prescribed Piperacillin-Tazobactam 4.5 g Frozen Bag IV of total 4.5 g,Allergies: No Known Drug Allergies Last dose of Antibiotics: Metronidazole - [**2157-2-2**] 02:00 AM Vancomycin - [**2157-2-2**] 09:00 AM Piperacillin/Tazobactam (Zosyn) - [**2157-2-3**] 03:45 AM Infusions: Other ICU medications: Furosemide (Lasix) - [**2157-2-2**] 07:24 PM Other medications: Changes to medical and family history: Review of systems is unchanged from admission except as noted below Review of systems: Flowsheet Data as of [**2157-2-3**] 07:45 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**58**] AM',2157-02-03,2157-02-03 07:46:00,prescription,sentence,"{piperacillin-tazobactam combination, Piperacillin / tazobactam}","{Americium, Piperacillin / tazobactam, metronidazole, Metronidazole, Pharmaceutical Preparations, piperacillin-tazobactam combination, Lasix, Infusion procedures}",0.688247,Pharmacologic Substance
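
The `cosine similarity` column comes from `generate_data_pairs`, whose implementation is not shown in this section. As a rough illustration only, the sketch below treats each concept set as a bag of concept names and computes the standard cosine between the two count vectors. `concept_cosine` is a hypothetical helper, not the notebook's function. On the displayed names it gives about 0.5345 for row 51, which agrees with the table, but it does not reproduce every row (row 50 shows 0.5547 for sets of the same sizes and overlap), so the real pipeline presumably uses information beyond the displayed names.

```python
import math
from collections import Counter


def concept_cosine(concepts_a, concepts_b):
    """Cosine similarity between two bags of concept names."""
    a, b = Counter(concepts_a), Counter(concepts_b)
    dot = sum(a[name] * b[name] for name in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty concept set shares nothing with anything
    return dot / (norm_a * norm_b)


# Concept names copied from row 51 of the table above
row_51 = concept_cosine(
    {"atenolol", "Atenolol"},
    {"NIFEdipine", "nifedipine", "Holding patient", "Lisinopril",
     "Atenolol", "atenolol", "lisinopril"},
)
print(round(row_51, 6))
```

Two of the seven note concepts match the two prescription concepts, so the score is 2 / sqrt(2 * 7).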


In [228]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# Repeat Steps 1-4 for each patient, then push to GitHub.
# Heads-up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the first.

# Look through the sentence pairs in `prescription_pairs_df`.
# When you find a pair worth contradicting,
# note its row index (the number in the leftmost column)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.

In [56]:
pair_idx = 49
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

# False selects sentence 1 (index 0), True selects sentence 2 (index 1)
sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "Acetaminophen not given."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

prescription 1:	Patient was prescribed Acetaminophen 325mg Tablet PO/NG of total 325-650 mg
sentence 2:	Acetaminophen level was negative.'

New contradicting sentence: Acetaminophen not given.


In [57]:
no_contradiction_pair_idx = [76]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
prescription 1:	Patient was prescribed Piperacillin-Tazobactam 4.5 g Vial IV of total 4.5 gm
sentence 2:	['Chief Complaint:', "24 Hour Events:    - ERCP performed    - Blood cultures returned GNR's    Allergies:    No Known Drug Allergies    Last dose of Antibiotics:    Vancomycin -  [**2157-2-2**]  12:50 AM    Metronidazole -  [**2157-2-2**]  02:00 AM    Piperacillin/Tazobactam (Zosyn) -  [**2157-2-2**]  03:38 AM    Infusions:    Other ICU medications:    Other medications:    Changes to medical and family history:    Review of systems is unchanged from admission except as noted below    Review of systems:    Flowsheet Data as of   [**2157-2-2**]  07:50 AM    Vital signs    Hemodynamic monitoring    Fluid balance                                                                   24 hours                                                                Since  [**58**]  AM"
*****************************
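
The labels collected so far live only in memory, so a kernel restart would lose them. This section does not show how `generated_data_dict` is persisted; the sketch below is one simple option using `pickle`, with hypothetical helper names and a temporary file path. The toy dictionary copies the values stored for this patient above.

```python
import os
import pickle
import tempfile


def save_labels(generated, path):
    """Write the nested label dictionary to `path`."""
    with open(path, "wb") as f:
        pickle.dump(generated, f)


def load_labels(path):
    """Read a label dictionary previously written by `save_labels`."""
    with open(path, "rb") as f:
        return pickle.load(f)


toy_labels = {
    197325: {
        "contradiction": {49: (True, "Acetaminophen not given.")},
        "none": [76],
    }
}

# A temporary directory keeps the example from touching real project files
with tempfile.TemporaryDirectory() as tmp_dir:
    labels_path = os.path.join(tmp_dir, "generated_data_dict.pkl")
    save_labels(toy_labels, labels_path)
    restored = load_labels(labels_path)

print(restored == toy_labels)
```

Pickle keeps the integer keys and the `(is_sentence2, contradicting_txt)` tuples intact, which a JSON round trip would not do without extra conversion.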


## Patient 5

In [58]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[4]  # Note: `hadm_ids` lists every HADM ID that has consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 186291


********** Processing data for 2151-09-21 **********
********** Processing data for 2151-09-22 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.7385489458759966
----- Data i -----
>> Time: 2151-09-22 05:00:00
>> Type: sentence
>> Concepts: {'Chronic obstructive pulmonary disease of horses'}
>> COPD -- no evidence for significant COPD exacerbation.'
----- Data j -----
>> Time: 2151-09-22 05:00:00
>> Type: sentence
>> Concepts: {'Chronic obstructive pulmonary disease of horses', 'HIV Seropositivity', 'Nausea', 'hiVETIC', 'Vomiting'}
>> HPI:    66 yom HIV+ COPD (severe) in USOH until day of addmission when    developed nausea and emesis (non-bloody).'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.5222329678670936
----- Data i -----
>> Time: 2151-09-22 05:00:00
>> Type: sentence
>> Concepts: {'Chronic obstructive pulmonary disease of horses', 'MICROCEPHALY, EPILEPSY, AND DIABETES SYNDROME', 'Microbicides'}
>> Continue usual    meds, and low thresh
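
The pandas warning in the output above quotes the line `self.prescription_df[['START_DT']] = start_dt`, which suggests the column is being assigned on a frame that was filtered out of a larger table. A minimal sketch of the usual remedy follows: take an explicit `.copy()` of the filtered frame before adding the column. The toy `drug_df` and its column names are assumptions for illustration; the real `Patient` class is not shown here and may be structured differently.

```python
import pandas as pd

# Toy stand-in for the prescriptions table; column names are assumptions.
drug_df = pd.DataFrame(
    {
        "HADM_ID": [1, 1, 2],
        "DRUG": ["drug A", "drug B", "drug C"],
        "STARTDATE": ["2151-09-22", "2151-09-22", "2152-02-17"],
    }
)

# .copy() makes the filtered frame independent of `drug_df`, so the
# assignment below cannot be mistaken for a write into a slice.
prescription_df = drug_df.loc[drug_df["HADM_ID"] == 1].copy()
prescription_df["START_DT"] = pd.to_datetime(prescription_df["STARTDATE"])

print(prescription_df)
```

The original table is left untouched, and the new column lives only on the per-patient copy.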

In [59]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert a contradiction into a sentence from a note
# (its "type" must be "sentence", not "lab" or "prescription").

# Step 3: Get all pairs in which either item is a prescription
prescription_pairs_df = df.loc[(df['type 1'] == "prescription") | (df['type 2'] == "prescription")]

prescription_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
101,Patient was prescribed Doxazosin 1mg Tablet PO of total 2 mg,"- restart doxazosin - hydralazine as needed for diastolic >100 - continue to monitor # HIV: last CD4 307, VL 187; per ID fellow overnight recommendations, continue pt on abacavir/lamivudine/atazanavir.'",2151-09-22,2151-09-22 08:03:00,prescription,sentence,"{Doxazosin 1 MG Oral Tablet, Doxazosin}","{doxazosin, Doxazosin, hydralazine, hydrALAZINE}",0.534522,Pharmacologic Substance
102,Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 2-4 mg,Received morphine and dilaudid for pain (back and abd).',2151-09-22,2151-09-22 05:00:00,prescription,sentence,"{Morphine Sulfate, morphine sulfate}","{morphine, Dilaudid, Morphine, Papain}",0.57735,Pharmacologic Substance
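
Since only note sentences should be edited, it can help to narrow the candidates to pairs where one side is a prescription and the other side is a sentence, and to derive the `is_sentence2` flag from the types. The sketch below does this on a toy frame; the rows are made up, and only the `type 1` / `type 2` column names are taken from the output above.

```python
import pandas as pd

# Toy pairs table; only the column names mirror the notebook's DataFrame.
pairs = pd.DataFrame(
    {
        "sentence 1": ["Patient was prescribed drug A", "lab value X", "note sentence P"],
        "sentence 2": ["note sentence about drug A", "Patient was prescribed drug B", "note sentence Q"],
        "type 1": ["prescription", "lab", "sentence"],
        "type 2": ["sentence", "prescription", "sentence"],
    },
    index=[101, 102, 103],
)

is_rx_1 = pairs["type 1"] == "prescription"
is_rx_2 = pairs["type 2"] == "prescription"
is_sent_1 = pairs["type 1"] == "sentence"
is_sent_2 = pairs["type 2"] == "sentence"

# Keep prescription/sentence pairs only; lab/prescription pairs are dropped.
editable = pairs.loc[(is_rx_1 & is_sent_2) | (is_rx_2 & is_sent_1)].copy()

# Exactly one side is a sentence, so this flag says which side may be edited.
editable["is_sentence2"] = editable["type 2"] == "sentence"

print(editable[["type 1", "type 2", "is_sentence2"]])
```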


In [60]:
prescription_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
101,Patient was prescribed Doxazosin 1mg Tablet PO of total 2 mg,"- restart doxazosin - hydralazine as needed for diastolic >100 - continue to monitor # HIV: last CD4 307, VL 187; per ID fellow overnight recommendations, continue pt on abacavir/lamivudine/atazanavir.'",2151-09-22,2151-09-22 08:03:00,prescription,sentence,"{Doxazosin 1 MG Oral Tablet, Doxazosin}","{doxazosin, Doxazosin, hydralazine, hydrALAZINE}",0.534522,Pharmacologic Substance
102,Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 2-4 mg,Received morphine and dilaudid for pain (back and abd).',2151-09-22,2151-09-22 05:00:00,prescription,sentence,"{Morphine Sulfate, morphine sulfate}","{morphine, Dilaudid, Morphine, Papain}",0.57735,Pharmacologic Substance
103,Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 2-4 mg,"A right femoral line was placed and he received Morphine Sulfate 4mg IV x 1, dilaudid 1mg IV x 3, tylenol, and zofran for nausea.'",2151-09-22,2151-09-22 03:58:00,prescription,sentence,"{Morphine Sulfate, morphine sulfate}","{Dilaudid, morphine sulfate, Tylenol, Nausea, Morphine Sulfate, Zofran}",0.816497,Pharmacologic Substance
104,Patient was prescribed Doxazosin 1mg Tablet PO of total 2 mg,"- restart doxazosin - hydralazine as needed for diastolic >100 - continue to monitor # HIV: last CD4 307, VL 187; per ID fellow overnight recommendations, continue pt on abacavir/lamivudine/atazanavir.'",2151-09-22,2151-09-22 08:03:00,prescription,sentence,"{Doxazosin 1 MG Oral Tablet, Doxazosin}","{doxazosin, Doxazosin, hydralazine, hydrALAZINE}",0.534522,Pharmacologic Substance
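
Rows 101 and 104 above carry the same prescription text and the same note sentence, so editing both would touch the same sentence twice. A small sketch for spotting such repeats before choosing a `pair_idx`, using a toy frame with placeholder text (the grouping key, the two sentence columns, is an assumption about what counts as "the same pair"):

```python
import pandas as pd

# Toy pairs table; indices mimic the row indices shown in the output above.
pairs = pd.DataFrame(
    {
        "sentence 1": ["rx A", "rx B", "rx B", "rx A"],
        "sentence 2": ["note about A", "note about B", "another note about B", "note about A"],
    },
    index=[101, 102, 103, 104],
)

# Group row indices by their (sentence 1, sentence 2) text.
groups = {}
for idx, row in pairs.iterrows():
    groups.setdefault((row["sentence 1"], row["sentence 2"]), []).append(idx)

# Index lists with more than one entry are textual duplicates of each other.
repeated = [idxs for idxs in groups.values() if len(idxs) > 1]

# One representative row per distinct pair, for browsing candidates.
unique_pairs = pairs.drop_duplicates(subset=["sentence 1", "sentence 2"])

print(repeated)
print(list(unique_pairs.index))
```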


In [233]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# In practice: copy/paste the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the contradiction you inserted first.

# Look through the sentence pairs in `prescription_pairs_df`.
# When you find a pair you want to insert a contradiction for,
# note its row index (the number in the leftmost column)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.
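
The overwrite risk described above can be checked by object identity: if two pairs hold the very same object, updating it through one pair changes what the other pair sees. The sketch below uses a toy `Data` class and assumes each entry of `data_inst_pairs` is a tuple whose first element is the `(data_1, data_2)` pair, which matches how the cells below index it; the second tuple element and the helper name `shares_instance` are hypothetical.

```python
class Data:
    """Toy stand-in for the notebook's Data/Sentence objects (assumption)."""

    def __init__(self, type_, txt):
        self.type = type_
        self.txt = txt

    def update_text(self, new_txt):
        self.txt = new_txt


def shares_instance(data_inst_pairs, idx_a, idx_b):
    """Return True if any object appears in both pairs (identity, not equality)."""
    ids_a = {id(obj) for obj in data_inst_pairs[idx_a][0]}
    return any(id(obj) in ids_a for obj in data_inst_pairs[idx_b][0])


note_a = Data("sentence", "restart drug A")
note_b = Data("sentence", "received drug B")
rx_a1 = Data("prescription", "Patient was prescribed drug A")
rx_a2 = Data("prescription", "Patient was prescribed drug A")
rx_b = Data("prescription", "Patient was prescribed drug B")

# Pairs 101 and 104 point at the same note sentence object.
data_inst_pairs = {
    101: ((rx_a1, note_a), 0.53),
    102: ((rx_b, note_b), 0.58),
    104: ((rx_a2, note_a), 0.53),
}

print(shares_instance(data_inst_pairs, 101, 104))
print(shares_instance(data_inst_pairs, 101, 102))

# Editing through pair 101 is visible through pair 104 as well.
data_inst_pairs[101][0][1].update_text("stop drug A")
print(data_inst_pairs[104][0][1].txt)
```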

In [61]:
pair_idx = 101
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]  # index 1 if modifying sentence 2, else 0

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "- stop doxazosin - hydralazine as needed for diastolic >100 - continue to monitor # HIV: last CD4 307, VL 187; per ID fellow overnight recommendations, continue pt on abacavir/lamivudine/atazanavir."

sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

prescription 1:	Patient was prescribed Doxazosin 1mg Tablet PO of total 2 mg
sentence 2:	- restart doxazosin    - hydralazine as needed for diastolic >100    - continue to monitor    # HIV: last CD4 307, VL 187; per ID fellow overnight recommendations,    continue pt on abacavir/lamivudine/atazanavir.'

New contradicting sentence: - stop doxazosin - hydralazine as needed for diastolic >100 - continue to monitor # HIV: last CD4 307, VL 187; per ID fellow overnight recommendations, continue pt on abacavir/lamivudine/atazanavir.
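
The insertion cell above relies on the reader to pick a side whose type is `sentence`. A hedged sketch of a small guard that enforces this before calling `update_text` follows. `ToyData` and `insert_contradiction` are hypothetical names, and the real `update_text` may do more than replace the text.

```python
class ToyData:
    """Stand-in for the notebook's Data/Sentence objects (assumption)."""

    def __init__(self, type_, txt):
        self.type = type_
        self.txt = txt

    def update_text(self, new_txt):
        self.txt = new_txt


def insert_contradiction(store, pair, pair_idx, is_sentence2, contradicting_txt):
    """Update the chosen side of `pair` only if it is a note sentence."""
    target = pair[int(is_sentence2)]
    if target.type != "sentence":
        raise ValueError(f"pair {pair_idx}: cannot modify a {target.type!r} entry")
    target.update_text(contradicting_txt)
    # Same record layout as the notebook: pair_idx -> (is_sentence2, new text).
    store["contradiction"][pair_idx] = (is_sentence2, contradicting_txt)


store = {"contradiction": {}, "none": []}
pair = (
    ToyData("prescription", "Patient was prescribed drug A"),
    ToyData("sentence", "restart drug A"),
)

insert_contradiction(store, pair, 101, True, "stop drug A")

# Pointing the flag at the prescription side is rejected.
try:
    insert_contradiction(store, pair, 101, False, "Patient was not prescribed drug A")
    rejected = False
except ValueError:
    rejected = True

print(pair[1].txt, rejected)
```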


In [62]:
no_contradiction_pair_idx = [102, 103]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
prescription 1:	Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 2-4 mg
sentence 2:	Received morphine and    dilaudid for pain (back and abd).'
*****************************
prescription 1:	Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 2-4 mg
sentence 2:	A right femoral line was placed and he received    Morphine Sulfate 4mg IV x 1, dilaudid 1mg IV x 3, tylenol, and zofran    for nausea.'
*****************************
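
At this point `generated_data_dict` holds, per admission, a `contradiction` mapping of `pair_idx -> (is_sentence2, contradicting_txt)` and a `none` list of pair indices. A minimal sketch of flattening that structure into labeled rows and round-tripping it through `pickle` follows. The row layout and the helper name `to_labeled_rows` are assumptions, and the contradiction text is abbreviated.

```python
import pickle

# Same nesting as the notebook's dictionary; the text is abbreviated here.
generated_data_dict = {
    186291: {
        "contradiction": {101: (True, "- stop doxazosin (abbreviated)")},
        "none": [102, 103],
    },
}


def to_labeled_rows(generated):
    """Flatten into one dict per pair: label 1 = contradiction, 0 = none."""
    rows = []
    for hadm_id, entry in generated.items():
        for pair_idx, (is_sentence2, txt) in entry["contradiction"].items():
            rows.append({"hadm_id": hadm_id, "pair_idx": pair_idx, "label": 1,
                         "is_sentence2": is_sentence2, "new_text": txt})
        for pair_idx in entry["none"]:
            rows.append({"hadm_id": hadm_id, "pair_idx": pair_idx, "label": 0,
                         "is_sentence2": None, "new_text": None})
    return rows


rows = to_labeled_rows(generated_data_dict)

# In-memory round trip; writing to a file would use pickle.dump instead.
restored = pickle.loads(pickle.dumps(generated_data_dict))

for row in rows:
    print(row)
```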


## Patient 6

In [63]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[5]  # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# For storing generated data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 180836


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of

********** Processing data for 2152-02-15 **********
********** Processing data for 2152-02-16 **********
********** Processing data for 2152-02-17 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.5669467095138409
----- Data i -----
>> Time: 2152-02-17 10:25:00
>> Type: sentence
>> Concepts: {'Chronic obstructive pulmonary disease of horses', 'Respiratory distress', 'Blood culture', 'Rhinorrhea', 'Infantile Neuroaxonal Dystrophy'}
>>    Microbiology: Blood cultures x 2 from  [**2152-2-15**]  - no growth to date    Assessment and Plan    Assessment/Plan:  Mr.  [**Known lastname **]  is a 67 year old male with HIV and end    stage COPD who presented  [**2152-2-15**]  with worsening shortness of breath in    the setting of rhinorrhea and productive cough now transferred to the    MICU for worsening respiratory distress.'
----- Data j -----
>> Time: 2152-02-17 10:25:00
>> Type: sentence
>> Concepts: {'Chronic obstructive pulmonary disease of horses', 'Virus Diseases'}
>> The    mos

In [64]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert a contradiction into a sentence from a note
# (its "type" must be "sentence", not "lab" or "prescription").

# Step 3: Get all pairs in which either item is a prescription
prescription_pairs_df = df.loc[(df['type 1'] == "prescription") | (df['type 2'] == "prescription")]

prescription_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
9,Patient was prescribed Albuterol 0.083% Neb Soln 0.083%;3mL Vial IH of total 1 NEB,"While on the floor he was started on azithromycin, solumedrol 125 mg IV TID, albuterol and ipratropium nebulizers.'",2152-02-17,2152-02-17 10:25:00,prescription,sentence,"{Albuterol, albuterol}","{Albuterol, Intravenous infusion procedures, albuterol, Solu-Medrol}",0.666667,Pharmacologic Substance
10,Patient was prescribed Aspirin 81mg Tab PO of total 81 mg,"He received levofloxacin 750 mg IV x 1, duonebs, solumedrol 125 mg IV x 1 and aspirin 81 mg.'",2152-02-17,2152-02-17 10:25:00,prescription,sentence,"{Aspirin, aspirin}","{Solu-Medrol, DuoNeb, Aspirin, aspirin}",0.755929,Pharmacologic Substance


In [65]:
prescription_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
9,Patient was prescribed Albuterol 0.083% Neb Soln 0.083%;3mL Vial IH of total 1 NEB,"While on the floor he was started on azithromycin, solumedrol 125 mg IV TID, albuterol and ipratropium nebulizers.'",2152-02-17,2152-02-17 10:25:00,prescription,sentence,"{Albuterol, albuterol}","{Albuterol, Intravenous infusion procedures, albuterol, Solu-Medrol}",0.666667,Pharmacologic Substance
10,Patient was prescribed Aspirin 81mg Tab PO of total 81 mg,"He received levofloxacin 750 mg IV x 1, duonebs, solumedrol 125 mg IV x 1 and aspirin 81 mg.'",2152-02-17,2152-02-17 10:25:00,prescription,sentence,"{Aspirin, aspirin}","{Solu-Medrol, DuoNeb, Aspirin, aspirin}",0.755929,Pharmacologic Substance
11,Patient was prescribed Heparin 5000 Units / mL- 1mL Vial SC of total 5000 UNIT,Prophylaxis: heparin SQ',2152-02-17,2152-02-17 10:25:00,prescription,sentence,{heparin},{heparin},1.0,Pharmacologic Substance
12,Patient was prescribed Albuterol 0.083% Neb Soln 0.083%;3mL Vial IH of total 1 NEB,"While on the floor he was started on azithromycin, solumedrol 125 mg IV TID, albuterol and ipratropium nebulizers.'",2152-02-17,2152-02-17 10:25:00,prescription,sentence,"{Albuterol, albuterol}","{Albuterol, Intravenous infusion procedures, albuterol, Solu-Medrol}",0.666667,Pharmacologic Substance
31,Patient was prescribed Heparin 5000 Units / mL- 1mL Vial SC of total 5000 UNIT,Prophylaxis: heparin SQ',2152-02-18,2152-02-18 07:24:00,prescription,sentence,{heparin},{heparin},1.0,Pharmacologic Substance
49,Patient was prescribed Heparin 5000 Units / mL- 1mL Vial SC of total 5000 UNIT,Prophylaxis: heparin SQ',2152-02-19,2152-02-19 07:02:00,prescription,sentence,{heparin},{heparin},1.0,Pharmacologic Substance
71,Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 2 mg,Allergies: No Known Drug Allergies Last dose of Antibiotics: Azithromycin - [**2152-2-17**] 11:30 PM Infusions: Other ICU medications: Heparin Sodium (Prophylaxis) - [**2152-2-19**] 10:22 PM Ranitidine (Prophylaxis) - [**2152-2-19**] 10:23 PM Morphine Sulfate - [**2152-2-20**] 06:00 AM Other medications: Changes to medical and family history: Review of systems is unchanged from admission except as noted below Review of systems: Flowsheet Data as of [**2152-2-20**] 07:22 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since 12 AM',2152-02-20,2152-02-20 07:22:00,prescription,sentence,"{Morphine Sulfate, morphine sulfate}","{ranitidine, Americium, morphine sulfate, Pharmaceutical Preparations, heparin sodium, Morphine Sulfate, Ranitidine, Infusion procedures}",0.648886,Pharmacologic Substance
72,Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 2 mg,Initially patient appeared in stress but later was doing well after small dose IV morphine.',2152-02-20,2152-02-20 07:22:00,prescription,sentence,"{Morphine Sulfate, morphine sulfate}","{Morphine, morphine}",0.707107,Pharmacologic Substance
73,Patient was prescribed MethylPREDNISolone Sodium Succ 40mg Vial IV of total 60 mg,24 Hour Events: INVASIVE VENTILATION - STOP [**2152-2-19**] 05:00 PM ARTERIAL LINE - STOP [**2152-2-20**] 06:18 AM - Decreased methylprednisolone to 60mg IV BID - Extubated.',2152-02-20,2152-02-20 07:22:00,prescription,sentence,"{Methylprednisolone, Intravenous infusion procedures, methylprednisolone}","{Arteries, Methylprednisolone, Intravenous infusion procedures, methylprednisolone, Tracheal Extubation}",0.83666,Pharmacologic Substance
74,Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 2-4 mg,Allergies: No Known Drug Allergies Last dose of Antibiotics: Azithromycin - [**2152-2-17**] 11:30 PM Infusions: Other ICU medications: Heparin Sodium (Prophylaxis) - [**2152-2-19**] 10:22 PM Ranitidine (Prophylaxis) - [**2152-2-19**] 10:23 PM Morphine Sulfate - [**2152-2-20**] 06:00 AM Other medications: Changes to medical and family history: Review of systems is unchanged from admission except as noted below Review of systems: Flowsheet Data as of [**2152-2-20**] 07:22 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since 12 AM',2152-02-20,2152-02-20 07:22:00,prescription,sentence,"{Morphine Sulfate, morphine sulfate}","{ranitidine, Americium, morphine sulfate, Pharmaceutical Preparations, heparin sodium, Morphine Sulfate, Ranitidine, Infusion procedures}",0.648886,Pharmacologic Substance


In [239]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# In practice: copy/paste the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the contradiction you inserted first.

# Look through the sentence pairs in `prescription_pairs_df`.
# When you find a pair you want to insert a contradiction for,
# note its row index (the number in the leftmost column)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.

In [66]:
pair_idx = 72
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]  # index 1 if modifying sentence 2, else 0

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "Initially patient appeared in stress but later was doing well and did not require Morphine."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

prescription 1:	Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 2 mg
sentence 2:	Initially patient appeared in stress but later was doing    well after small dose IV morphine.'

New contradicting sentence: Initially patient appeared in stress but later was doing well and did not require Morphine.


In [67]:
no_contradiction_pair_idx = [31, 88]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
prescription 1:	Patient was prescribed Heparin 5000 Units / mL- 1mL Vial SC of total 5000 UNIT
sentence 2:	   Prophylaxis: heparin SQ'
*****************************
prescription 1:	Patient was prescribed MethylPREDNISolone Sodium Succ 40mg Vial IV of total 60 mg
sentence 2:	24 Hour Events:  INVASIVE VENTILATION - STOP  [**2152-2-19**]  05:00 PM  ARTERIAL LINE - STOP  [**2152-2-20**]  06:18 AM    - Decreased methylprednisolone to 60mg IV BID    - Extubated.'
*****************************


## Patient 7

In [68]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[6]  # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# For storing generated data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 154802


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_single_column(ilocs[0], value, pi)
A value is trying to be set on a copy of a slice from a

********** Processing data for 2121-03-31 **********
********** Processing data for 2121-04-01 **********
***** PAIR INDEX 0 *****
Cosine similarity: 1.0
----- Data i -----
>> Time: 2121-04-01 09:29:00
>> Type: sentence
>> Concepts: {'Bone structure of tibia', 'Limb structure', 'Posterior pituitary disease'}
>>    Right Extremities: (Edema: Trace), (Temperature: Warm), (Pulse -    Dorsalis pedis: Present), (Pulse - Posterior tibial: Present)'
----- Data j -----
>> Time: 2121-04-01 09:29:00
>> Type: sentence
>> Concepts: {'Bone structure of tibia', 'Limb structure', 'Posterior pituitary disease'}
>>    Left Extremities: (Edema: Trace), (Temperature: Warm), (Pulse -    Dorsalis pedis: Present), (Pulse - Posterior tibial: Present)'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.6030226891555273
----- Data i -----
>> Time: 2121-04-01 09:29:00
>> Type: sentence
>> Concepts: {'Hydroxymethylglutaryl-CoA Reductase Inhibitors', 'Aspirin', 'Adrenergic beta-Antag

In [69]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert a contradiction into a sentence from a note
# (its "type" must be "sentence", not "lab" or "prescription").

# Step 3: Get all pairs in which either item is a prescription
prescription_pairs_df = df.loc[(df['type 1'] == "prescription") | (df['type 2'] == "prescription")]

prescription_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
2,Patient was prescribed 5% Dextrose 250mL Bag IV DRIP of total 250 mL,Dextrose 50% .',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{Glucose 50 MG/ML Oral Solution, glucose}","{Glucose 50 MG/ML Oral Solution, glucose}",1.0,Pharmacologic Substance
3,Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 0.5-5.0 mg,Ranitidine 24 Hour Events: INTUBATION - At [**2121-3-31**] 01:45 PM received intubated from or OR RECEIVED - At [**2121-3-31**] 01:45 PM ARTERIAL LINE - START [**2121-3-31**] 01:45 PM PA CATHETER - START [**2121-3-31**] 02:00 PM EKG - At [**2121-3-31**] 02:00 PM FEVER - 101.3 C - [**2121-4-1**] 12:00 AM Post operative day: POD#1 - Cabg Allergies: No Known Drug Allergies Last dose of Antibiotics: Infusions: Other ICU medications: Morphine Sulfate - [**2121-4-1**] 03:16 AM Other medications: Flowsheet Data as of [**2121-4-1**] 09:27 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**24**] a.m.',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{Morphine Sulfate, morphine sulfate}","{ranitidine, Americium, Feverall, morphine sulfate, Pharmaceutical Preparations, Arteries, Electrocardiography, Morphine Sulfate, Ranitidine, Intubation, Infusion procedures}",0.617213,Pharmacologic Substance


In [70]:
prescription_pairs_df

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
2,Patient was prescribed 5% Dextrose 250mL Bag IV DRIP of total 250 mL,Dextrose 50% .',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{Glucose 50 MG/ML Oral Solution, glucose}","{Glucose 50 MG/ML Oral Solution, glucose}",1.0,Pharmacologic Substance
3,Patient was prescribed Morphine Sulfate 2mg Syringe IV of total 0.5-5.0 mg,Ranitidine 24 Hour Events: INTUBATION - At [**2121-3-31**] 01:45 PM received intubated from or OR RECEIVED - At [**2121-3-31**] 01:45 PM ARTERIAL LINE - START [**2121-3-31**] 01:45 PM PA CATHETER - START [**2121-3-31**] 02:00 PM EKG - At [**2121-3-31**] 02:00 PM FEVER - 101.3 C - [**2121-4-1**] 12:00 AM Post operative day: POD#1 - Cabg Allergies: No Known Drug Allergies Last dose of Antibiotics: Infusions: Other ICU medications: Morphine Sulfate - [**2121-4-1**] 03:16 AM Other medications: Flowsheet Data as of [**2121-4-1**] 09:27 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**24**] a.m.',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{Morphine Sulfate, morphine sulfate}","{ranitidine, Americium, Feverall, morphine sulfate, Pharmaceutical Preparations, Arteries, Electrocardiography, Morphine Sulfate, Ranitidine, Intubation, Infusion procedures}",0.617213,Pharmacologic Substance
4,Patient was prescribed Morphine Sulfate 10mg Syringe IV of total 0.5-5.0 mg,Ranitidine 24 Hour Events: INTUBATION - At [**2121-3-31**] 01:45 PM received intubated from or OR RECEIVED - At [**2121-3-31**] 01:45 PM ARTERIAL LINE - START [**2121-3-31**] 01:45 PM PA CATHETER - START [**2121-3-31**] 02:00 PM EKG - At [**2121-3-31**] 02:00 PM FEVER - 101.3 C - [**2121-4-1**] 12:00 AM Post operative day: POD#1 - Cabg Allergies: No Known Drug Allergies Last dose of Antibiotics: Infusions: Other ICU medications: Morphine Sulfate - [**2121-4-1**] 03:16 AM Other medications: Flowsheet Data as of [**2121-4-1**] 09:27 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**24**] a.m.',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{Morphine Sulfate, morphine sulfate}","{ranitidine, Americium, Feverall, morphine sulfate, Pharmaceutical Preparations, Arteries, Electrocardiography, Morphine Sulfate, Ranitidine, Intubation, Infusion procedures}",0.617213,Pharmacologic Substance
5,Patient was prescribed Morphine Sulfate 4mg Syringe IV of total 0.5-5.0 mg,Ranitidine 24 Hour Events: INTUBATION - At [**2121-3-31**] 01:45 PM received intubated from or OR RECEIVED - At [**2121-3-31**] 01:45 PM ARTERIAL LINE - START [**2121-3-31**] 01:45 PM PA CATHETER - START [**2121-3-31**] 02:00 PM EKG - At [**2121-3-31**] 02:00 PM FEVER - 101.3 C - [**2121-4-1**] 12:00 AM Post operative day: POD#1 - Cabg Allergies: No Known Drug Allergies Last dose of Antibiotics: Infusions: Other ICU medications: Morphine Sulfate - [**2121-4-1**] 03:16 AM Other medications: Flowsheet Data as of [**2121-4-1**] 09:27 AM Vital signs Hemodynamic monitoring Fluid balance 24 hours Since [**24**] a.m.',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{Morphine Sulfate 4 MG/ML, morphine sulfate}","{ranitidine, Americium, Feverall, morphine sulfate, Pharmaceutical Preparations, Arteries, Electrocardiography, Morphine Sulfate, Ranitidine, Intubation, Infusion procedures}",0.552052,Pharmacologic Substance
6,Patient was prescribed Magnesium Sulfate 2 g / 50 mL Premix Bag IV of total 2 gm,Influenza Virus Vaccine Ketorolac Magnesium Sulfate .',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{Magnesium Sulfate, magnesium sulfate}","{ketorolac, Magnesium Sulfate, Vaccines, Ketorolac, magnesium sulfate}",0.784465,Pharmacologic Substance
7,Patient was prescribed Insulin Human Regular 100Units/mL; 10mL Vial IV DRIP of total 100 UNIT,Insulin .',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{brain-derived neurotrophic factor, human, insulin glulisine}",{insulin glulisine},0.534522,Pharmacologic Substance
8,Patient was prescribed 5% Dextrose 250mL Bag IV of total 250 mL,Dextrose 50% .',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{Glucose 50 MG/ML Oral Solution, glucose}","{Glucose 50 MG/ML Oral Solution, glucose}",1.0,Pharmacologic Substance
9,Patient was prescribed Calcium Gluconate 1g/10mL Vial IV of total 2 gm,Calcium Gluconate6.',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{Intravenous infusion procedures, Calcium Gluconate, calcium gluconate}","{Calcium Gluconate, calcium gluconate}",0.852803,Pharmacologic Substance
10,Patient was prescribed Metoclopramide 5mg/mL-2mL Vial IV of total 10 mg,Metoclopramide .',2121-04-01,2121-04-01 09:29:00,prescription,sentence,"{Intravenous infusion procedures, Metoclopramide, metoclopramide}","{Metoclopramide, metoclopramide}",0.755929,Pharmacologic Substance
11,Patient was prescribed Oxycodone-Acetaminophen 5mg/325mg Tablet PO of total 1-2 TAB,Oxycodone-Acetaminophen .',2121-04-01,2121-04-01 09:29:00,prescription,sentence,{Acetaminophen / Oxycodone},{Acetaminophen / Oxycodone},1.0,Pharmacologic Substance


In [244]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient.
# In practice: copy/paste the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the contradiction you inserted first.

# Look through the sentence pairs in `prescription_pairs_df`.
# When you find a pair you want to insert a contradiction for,
# note its row index (the number in the leftmost column)
# and set `pair_idx` to it below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.

In [71]:
pair_idx = 10
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][int(is_sentence2)]  # index 1 if modifying sentence 2, else 0

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "Metoclopramide held."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

prescription 1:	Patient was prescribed Metoclopramide 5mg/mL-2mL Vial IV of total 10 mg
sentence 2:	Metoclopramide .'

New contradicting sentence: Metoclopramide held.


In [72]:
no_contradiction_pair_idx = [13, 14, 39, 44]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
prescription 1:	Patient was prescribed Acetaminophen 650mg Supp PR of total 650 mg
sentence 2:	Oxycodone-Acetaminophen .'
*****************************
prescription 1:	Patient was prescribed Dextrose 50% 50mL Syringe IV of total 12.5 gm
sentence 2:	Dextrose    50% .'
*****************************
prescription 1:	Patient was prescribed Aspirin EC 81mg  EC Tab PO of total 81 mg
sentence 2:	   Cardiovascular: Aspirin, Beta-blocker, Statins, start betablockers,    monitor rhythm'
*****************************
prescription 1:	Patient was prescribed Metoprolol Tartrate 5mg/5mL Vial IV of total 1 mg
sentence 2:	uneventful or out on    neo/prop    Allergies:    No Known Drug Allergies    Last dose of Antibiotics:    Cefazolin -  [**2121-4-1**]  10:32 PM    Infusions:    Other ICU medications:    Metoprolol -  [**2121-4-2**]  05:09 AM    Furosemide (Lasix) -  [**2121-4-2**]  11:30 AM    Other medications:    Flowsheet Data as of   [**

## Patient 8

In [73]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[7]  # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# For storing generated data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 133857


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
  extended_neighbors[empty_vectors_boole

********** Processing data for 2175-03-12 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.6123724356957946
----- Data i -----
>> Time: 2175-03-12 22:31:00
>> Type: sentence
>> Concepts: {'Nimodipine', 'nimodipine', 'Infantile Neuroaxonal Dystrophy', 'Dilantin'}
>>    Neurologic: Goal SBP < 140, Plan for angio tommorow, Nimodipine,    Dilantin, Hob >30 degrees.'
----- Data j -----
>> Time: 2175-03-12 22:31:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy'}
>>    Assessment And Plan: 69 year old male admitted with SAH'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 1.0
----- Data i -----
>> Time: 2175-03-12 00:00:00
>> Type: prescription
>> Concepts: {'niCARdipine', 'nicardipine'}
>> Patient was prescribed NiCARdipine IV 2.5mg/mL;10mL Amp IV DRIP of total 125 mg
----- Data j -----
>> Time: 2175-03-12 22:31:00
>> Type: sentence
>> Concepts: {'niCARdipine', 'nicardipine'}
>> Nicardipine gtt'
**********************************
****

In [74]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert contradictions into sentences that come from a note ("type" must be sentence, not lab or prescription)!

# Step 3: Get all pairs that involve a prescription
prescription_pairs_df = df.loc[(df['type 1'] == "prescription") | (df['type 2'] == "prescription")]

prescription_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
1,Patient was prescribed NiCARdipine IV 2.5mg/mL;10mL Amp IV DRIP of total 125 mg,Nicardipine gtt',2175-03-12,2175-03-12 22:31:00,prescription,sentence,"{niCARdipine, nicardipine}","{niCARdipine, nicardipine}",1.0,Pharmacologic Substance
2,Patient was prescribed Nimodipine 30mg Capsule PO of total 60 mg,"Neurologic: Goal SBP < 140, Plan for angio tommorow, Nimodipine, Dilantin, Hob >30 degrees.'",2175-03-12,2175-03-12 22:31:00,prescription,sentence,"{Nimodipine, nimodipine}","{Nimodipine, nimodipine, Infantile Neuroaxonal Dystrophy, Dilantin}",0.707107,Pharmacologic Substance


In [75]:
prescription_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
1,Patient was prescribed NiCARdipine IV 2.5mg/mL;10mL Amp IV DRIP of total 125 mg,Nicardipine gtt',2175-03-12,2175-03-12 22:31:00,prescription,sentence,"{niCARdipine, nicardipine}","{niCARdipine, nicardipine}",1.0,Pharmacologic Substance
2,Patient was prescribed Nimodipine 30mg Capsule PO of total 60 mg,"Neurologic: Goal SBP < 140, Plan for angio tommorow, Nimodipine, Dilantin, Hob >30 degrees.'",2175-03-12,2175-03-12 22:31:00,prescription,sentence,"{Nimodipine, nimodipine}","{Nimodipine, nimodipine, Infantile Neuroaxonal Dystrophy, Dilantin}",0.707107,Pharmacologic Substance
10,Patient was prescribed Phenytoin Sodium 250mg/5mL Vial IV of total 1000 mg,Phenytoin',2175-03-13,2175-03-13 05:13:00,prescription,sentence,{Phenytoin sodium},"{Phenytoin, phenytoin}",0.707107,Pharmacologic Substance
11,Patient was prescribed Phenytoin Sodium 250mg/5mL Vial IV of total 1000 mg,Phenytoin 16.',2175-03-13,2175-03-13 05:13:00,prescription,sentence,{Phenytoin sodium},"{Phenytoin, phenytoin}",0.707107,Pharmacologic Substance
12,Patient was prescribed Potassium Chloride 20mEq Packet PO of total 40 mEq,Potassium Chloride 20.',2175-03-13,2175-03-13 05:13:00,prescription,sentence,"{potassium chloride, Potassium Chloride}","{potassium chloride, Potassium Chloride}",1.0,Pharmacologic Substance
...,...,...,...,...,...,...,...,...,...,...
308,Patient was prescribed Potassium Chloride 2mEq/mL-20mL IV of total 40 mEq,"?Cerebral Salt wasting, NaCl started, sodium mildly improved today.'",2175-03-20,2175-03-20 06:58:00,prescription,sentence,"{Potassium Chloride 2 MEQ/ML includes Injectable Solution and Injection, Potassium Chloride 2 MEQ/ML}",{sodium chloride 0.124 MEQ/ML Oral Solution},0.57735,Clinical Drug
309,Patient was prescribed Potassium Chloride 2mEq/mL-20mL IV of total 60 mEq,"?Cerebral Salt wasting, NaCl started, sodium mildly improved today.'",2175-03-20,2175-03-20 06:58:00,prescription,sentence,"{Potassium Chloride 2 MEQ/ML includes Injectable Solution and Injection, Potassium Chloride 2 MEQ/ML}",{sodium chloride 0.124 MEQ/ML Oral Solution},0.57735,Clinical Drug
310,Patient was prescribed Potassium Chloride 10mEq/100mL Premix IV of total 20 mEq,"?Cerebral Salt wasting, NaCl started, sodium mildly improved today.'",2175-03-20,2175-03-20 06:58:00,prescription,sentence,"{Calcium Chloride 0.23 MEQ/ML / Magnesium Chloride 0.53 MEQ/ML / Potassium Chloride 0.1 MEQ/ML / Sodium Acetate 1.48 MEQ/ML / Sodium Chloride 0.27 MEQ/ML Injectable Solution, 100 ML Potassium Chloride 0.1 MEQ/ML Injection, Potassium Chloride 2 MEQ/ML}",{sodium chloride 0.124 MEQ/ML Oral Solution},0.689667,Clinical Drug
311,Patient was prescribed Potassium Chloride 2mEq/mL-20mL IV of total 40 mEq,"?Cerebral Salt wasting, NaCl started, sodium mildly improved today.'",2175-03-20,2175-03-20 06:58:00,prescription,sentence,"{Potassium Chloride 2 MEQ/ML includes Injectable Solution and Injection, Potassium Chloride 2 MEQ/ML}",{sodium chloride 0.124 MEQ/ML Oral Solution},0.57735,Clinical Drug


In [250]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient:
# copy the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the first.

# Browse the sentence pairs in `prescription_pairs_df`.
# When you find a pair worth contradicting,
# note its row index (the number in the leftmost column)
# and assign it to `pair_idx` below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.
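
The overwrite risk described in the Step 4 notes can be checked before editing a sentence. This is a minimal sketch, not part of the original pipeline. It assumes that `data_inst_pairs` is a list, that `data_inst_pairs[idx][0]` is a `(data_1, data_2)` tuple as in the cells of this notebook, and that a shared Sentence is literally the same Python object in both pairs. `pairs_sharing_instance` and `FakeData` are hypothetical names used only for illustration.

```python
def pairs_sharing_instance(data_inst_pairs, pair_idx, is_sentence2):
    """Return indices of other pairs that hold the same object (by identity)
    as the sentence we intend to modify."""
    target = data_inst_pairs[pair_idx][0][int(is_sentence2)]
    shared = []
    for idx, entry in enumerate(data_inst_pairs):
        if idx == pair_idx:
            continue
        data_1, data_2 = entry[0]
        if data_1 is target or data_2 is target:
            shared.append(idx)
    return shared


class FakeData:
    """Hypothetical stand-in for a Data/Sentence instance."""
    def __init__(self, txt):
        self.txt = txt


rx_a = FakeData("Patient was prescribed drug A")
rx_b = FakeData("Patient was prescribed drug B")
note = FakeData("Drug A given.")
other = FakeData("Drug B given.")

# Each entry mirrors the ((data_1, data_2), extra) layout assumed above.
toy_pairs = [((rx_a, note), None), ((rx_b, note), None), ((rx_b, other), None)]

# Pair 0 and pair 1 share the `note` object, so editing it affects both.
shared = pairs_sharing_instance(toy_pairs, 0, True)
print(f"Pairs sharing the sentence to modify: {shared}")
```

If the returned list is non-empty, updating the text for `pair_idx` would also change the listed pairs, so it is probably safer to pick a different pair or to record all affected indices.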

In [76]:
pair_idx = 10
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][is_sentence2]

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "Phenytoin not given."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

prescription 1:	Patient was prescribed Phenytoin Sodium 250mg/5mL Vial IV of total 1000 mg
sentence 2:	Phenytoin'

New contradicting sentence: Phenytoin not given.


In [77]:
no_contradiction_pair_idx = [1, 309]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
prescription 1:	Patient was prescribed NiCARdipine IV 2.5mg/mL;10mL Amp IV DRIP of total 125 mg
sentence 2:	Nicardipine gtt'
*****************************
prescription 1:	Patient was prescribed Potassium Chloride 2mEq/mL-20mL IV of total 60 mEq
sentence 2:	?Cerebral Salt wasting, NaCl started,    sodium mildly improved today.'
*****************************


## Patient 9

In [94]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[8] # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df, \
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker, \
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 166389


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt

********** Processing data for 2196-10-13 **********
********** Processing data for 2196-10-14 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.5773502691896258
----- Data i -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Hyponatremia'}
>> # Hyponatremia: No clear baseline.'
----- Data j -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Pharmaceutical Preparations', 'Hyponatremia'}
>> Hyponatremia    HPI:    89 yo M admitted to medicine for hyponatremia of 114.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.5773502691896258
----- Data i -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Hyponatremia'}
>> Given hyponatremia, this is concerning for    new mental status changes, however, attention intact.'
----- Data j -----
>> Time: 2196-10-14 22:36:00
>> Type: sentence
>> Concepts: {'Pharmaceutical Preparations', 'Hyponatremia'}
>> Hyponatremia    HPI:    89 yo M admitted to medicine for hy

In [95]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert contradictions into sentences that come from a note ("type" must be sentence, not lab or prescription)!

# Step 3: Get all pairs that involve a prescription
prescription_pairs_df = df.loc[(df['type 1'] == "prescription") | (df['type 2'] == "prescription")]

prescription_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
7,Patient was prescribed Heparin Flush (10 units/ml) 10 Units/mL - 5 mL Syringe IV of total 2 mL,# Prophylaxis: Subutaneous heparin .',2196-10-14,2196-10-14 22:36:00,prescription,sentence,{heparin},{subcutaneous heparin},0.707107,Pharmacologic Substance
8,Patient was prescribed Heparin 5000 Units / mL- 1mL Vial SC of total 5000 UNIT,# Prophylaxis: Subutaneous heparin .',2196-10-14,2196-10-14 22:36:00,prescription,sentence,{heparin},{subcutaneous heparin},0.707107,Pharmacologic Substance


In [96]:
prescription_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type
7,Patient was prescribed Heparin Flush (10 units/ml) 10 Units/mL - 5 mL Syringe IV of total 2 mL,# Prophylaxis: Subutaneous heparin .',2196-10-14,2196-10-14 22:36:00,prescription,sentence,{heparin},{subcutaneous heparin},0.707107,Pharmacologic Substance
8,Patient was prescribed Heparin 5000 Units / mL- 1mL Vial SC of total 5000 UNIT,# Prophylaxis: Subutaneous heparin .',2196-10-14,2196-10-14 22:36:00,prescription,sentence,{heparin},{subcutaneous heparin},0.707107,Pharmacologic Substance
9,Patient was prescribed Potassium Chloride 20mEq Packet PO of total 60 mEq,"He received IVF, 60 mEq of KCL, and was free water restricted.'",2196-10-14,2196-10-14 22:36:00,prescription,sentence,"{potassium chloride, Potassium Chloride}","{potassium chloride, Calcium Chloride 0.23 MEQ/ML / Magnesium Chloride 0.53 MEQ/ML / Potassium Chloride 0.1 MEQ/ML / Sodium Acetate 1.48 MEQ/ML / Sodium Chloride 0.27 MEQ/ML Injectable Solution, Fertilization in Vitro}",0.507833,Pharmacologic Substance
10,Patient was prescribed Heparin 5000 Units / mL- 1mL Vial SC of total 5000 UNIT,# Prophylaxis: Subutaneous heparin .',2196-10-14,2196-10-14 22:36:00,prescription,sentence,{heparin},{subcutaneous heparin},0.707107,Pharmacologic Substance
28,Patient was prescribed Potassium Chloride 2mEq/mL-20mL IV of total 40 mEq,"He received IVF, 60 mEq of KCL, and was free water restricted.'",2196-10-14,2196-10-14 22:36:00,prescription,sentence,"{Potassium Chloride 2 MEQ/ML includes Injectable Solution and Injection, Potassium Chloride 2 MEQ/ML}","{potassium chloride, Calcium Chloride 0.23 MEQ/ML / Magnesium Chloride 0.53 MEQ/ML / Potassium Chloride 0.1 MEQ/ML / Sodium Acetate 1.48 MEQ/ML / Sodium Chloride 0.27 MEQ/ML Injectable Solution, Fertilization in Vitro}",0.805993,Clinical Drug
29,Patient was prescribed Sodium Chloride 3% (Hypertonic) 500 mL Bag IV of total 500 mL,"He received IVF, 60 mEq of KCL, and was free water restricted.'",2196-10-14,2196-10-14 22:36:00,prescription,sentence,{sodium chloride 3%},"{potassium chloride, Calcium Chloride 0.23 MEQ/ML / Magnesium Chloride 0.53 MEQ/ML / Potassium Chloride 0.1 MEQ/ML / Sodium Acetate 1.48 MEQ/ML / Sodium Chloride 0.27 MEQ/ML Injectable Solution, Fertilization in Vitro}",0.507833,Clinical Drug
41,Patient was prescribed Heparin 5000 Units / mL- 1mL Vial SC of total 5000 UNIT,# Prophylaxis: Subcutaneous heparin .',2196-10-15,2196-10-15 06:46:00,prescription,sentence,{heparin},{subcutaneous heparin},0.707107,Pharmacologic Substance


In [97]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient:
# copy the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the first.

# Browse the sentence pairs in `prescription_pairs_df`.
# When you find a pair worth contradicting,
# note its row index (the number in the leftmost column)
# and assign it to `pair_idx` below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.

In [98]:
pair_idx = 28
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][is_sentence2]

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "He received IVFand was free water restricted, KCL stopped."
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

prescription 1:	Patient was prescribed Potassium Chloride 2mEq/mL-20mL IV of total 40 mEq
sentence 2:	He received IVF, 60    mEq of KCL, and was free water restricted.'

New contradicting sentence: He received IVFand was free water restricted, KCL stopped.


In [99]:
no_contradiction_pair_idx = [41]

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
prescription 1:	Patient was prescribed Heparin 5000 Units / mL- 1mL Vial SC of total 5000 UNIT
sentence 2:	# Prophylaxis: Subcutaneous heparin    .'
*****************************


## Patient 10

In [100]:
#### Process patient data and iterate over pairs of Data instances to get pairs
# Step 1: Select a patient -- processes all the data
hadm_id = hadm_ids[9] # Note: `hadm_ids` is a list of all HADM IDs with consecutive physician notes

# for storing data
generated_data_dict[int(hadm_id)] = {"contradiction": {}, "none": []}

print(f"Patient {int(hadm_id)}")

pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df, \
              med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker, \
              physician_only=True)

# Making data directory
processed_dir = 'processed'
os.makedirs(processed_dir, exist_ok=True)

pt_csv = os.path.join(processed_dir, f"{int(hadm_id)}.csv")

# Step 2: Generate pairs for this patient
df, data_inst_pairs = generate_data_pairs(pat)

# df.to_csv(pt_csv)
# print("Data has been saved!")

Patient 196357


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt

********** Processing data for 2143-04-20 **********
********** Processing data for 2143-04-21 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.5714285714285713
----- Data i -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Heart failure', 'Kidney Failure, Acute'}
>>    Acute Renal Failure:   [**Month (only) 60**]  be due to decompensated heart failure.'
----- Data j -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Kidney', 'Tomorrow', 'Chronic diastolic heart failure', 'Ultrasonography'}
>> Renal US to r/o obstruction    Acute on chronic diastolic heart failure:  Check echo tomorrow.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.6546536707079772
----- Data i -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Heart failure', 'Hyponatremia'}
>>    Hyponatremia:  Likely due to heart failure.'
----- Data j -----
>> Time: 2143-04-21 09:19:00
>> Type: sentence
>> Concepts: {'Heart failure', 'Ki

In [101]:
#### Inserting contradictions into Sentence instances
# IMPORTANT: Only insert contradictions into sentences that come from a note ("type" must be sentence, not lab or prescription)!

# Step 3: Get all pairs that involve a prescription
prescription_pairs_df = df.loc[(df['type 1'] == "prescription") | (df['type 2'] == "prescription")]

prescription_pairs_df.head(2)

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type


In [102]:
prescription_pairs_df 

Unnamed: 0,sentence 1,sentence 2,time 1,time 2,type 1,type 2,concepts 1,concepts 2,cosine similarity,semantic type


In [261]:
# Step 4: Insert contradictions

# Aim for 1-2 contradictions per patient:
# copy the code for Steps 1-4 for each patient, then push to GitHub.
# Heads up: for a given patient, avoid inserting contradictions
# into two sentences that look nearly identical.
# They may refer to the same underlying Sentence instance,
# in which case the second edit would overwrite the first.

# Browse the sentence pairs in `prescription_pairs_df`.
# When you find a pair worth contradicting,
# note its row index (the number in the leftmost column)
# and assign it to `pair_idx` below.
# Also note which sentence (sentence 1 or sentence 2)
# you want to modify, and set the `is_sentence2` flag accordingly.

In [262]:
pair_idx = 48
is_sentence2 = True

data_1 = data_inst_pairs[pair_idx][0][0]
data_2 = data_inst_pairs[pair_idx][0][1]

print(f"{data_1.type} 1:\t{data_1.txt}")
print(f"{data_2.type} 2:\t{data_2.txt}")

sentence_to_modify = data_inst_pairs[pair_idx][0][is_sentence2]

# Set `contradicting_txt` to the new contradicting sentence.
# This will just update the text for now.

contradicting_txt = "36.5 %    12.2 g/dL    122 mg/dL    1.5 mg/dL    44 mg/dL    29 mEq/L    " +\
                    "84 mEq/L    4.3 mEq/L    120 mEq/L    12.5 K/uL         [image002.jpg]                               " +\
                    "[**2143-4-21**]   12:58 AM                               [**2143-4-21**]   04:50 AM    " +\
                    "WBC                                     12.5    Hct                                     36.5    " +\
                    "Plt                                      163    Cr                                      " +\
                    "1.5                                      1.5    TropT                                     0.03    " +\
                    "Glucose                                      130                                      122    " +\
                    "Other labs: PT / PTT / INR:14.7/290.0/1.3, CK / CKMB /    Troponin-T:30/2/0.03, ALT / AST:28/55, Alk Phos " +\
                    "/ T Bili:161/0.4,    Amylase / Lipase:27/13, Albumin:2.9 g/dL, LDH:347 IU/L, Ca++:8.2 mg/dL,    " +\
                    "Mg++:3.1 mg/dL, PO4:5.1 mg/dL'"
sentence_to_modify.update_text(contradicting_txt)

print(f"\nNew contradicting sentence: {contradicting_txt}")

# Store conflict
generated_data_dict[int(hadm_id)]['contradiction'][pair_idx] = (is_sentence2, contradicting_txt)

lab 1:	Patient's PTT lab came back 29.5 sec.
sentence 2:	36.5 %    12.2 g/dL    122 mg/dL    1.5 mg/dL    44 mg/dL    29 mEq/L    84 mEq/L    4.3 mEq/L    120 mEq/L    12.5 K/uL         [image002.jpg]                               [**2143-4-21**]   12:58 AM                               [**2143-4-21**]   04:50 AM    WBC                                     12.5    Hct                                     36.5    Plt                                      163    Cr                                      1.5                                      1.5    TropT                                     0.03    Glucose                                      130                                      122    Other labs: PT / PTT / INR:14.7/29.5/1.3, CK / CKMB /    Troponin-T:30/2/0.03, ALT / AST:28/55, Alk Phos / T Bili:161/0.4,    Amylase / Lipase:27/13, Albumin:2.9 g/dL, LDH:347 IU/L, Ca++:8.2 mg/dL,    Mg++:3.1 mg/dL, PO4:5.1 mg/dL'

New contradicting sentence: 36.5 %    12.2 g/dL    122 mg/dL    1.5 mg/dL 

In [263]:
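# NOTE: pair 48 is also the pair used for the inserted contradiction above,
# so it ends up stored under both 'contradiction' and 'none' for this patient.
no_contradiction_pair_idx = [48, 49, 104]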

print("Examples of non-contradictions")
print("*****************************")
for pair_idx in no_contradiction_pair_idx:
    data_1 = data_inst_pairs[pair_idx][0][0]
    data_2 = data_inst_pairs[pair_idx][0][1]
    
    print(f"{data_1.type} 1:\t{data_1.txt}")
    print(f"{data_2.type} 2:\t{data_2.txt}")
    print("*****************************")
    
# Store negative examples
generated_data_dict[int(hadm_id)]['none'] = no_contradiction_pair_idx

Examples of non-contradictions
*****************************
lab 1:	Patient's PTT lab came back 29.5 sec.
sentence 2:	36.5 %    12.2 g/dL    122 mg/dL    1.5 mg/dL    44 mg/dL    29 mEq/L    84 mEq/L    4.3 mEq/L    120 mEq/L    12.5 K/uL         [image002.jpg]                               [**2143-4-21**]   12:58 AM                               [**2143-4-21**]   04:50 AM    WBC                                     12.5    Hct                                     36.5    Plt                                      163    Cr                                      1.5                                      1.5    TropT                                     0.03    Glucose                                      130                                      122    Other labs: PT / PTT / INR:14.7/29.5/1.3, CK / CKMB /    Troponin-T:30/2/0.03, ALT / AST:28/55, Alk Phos / T Bili:161/0.4,    Amylase / Lipase:27/13, Albumin:2.9 g/dL, LDH:347 IU/L, Ca++:8.2 mg/dL,    Mg++:3.1 mg/dL, PO4:5.1 mg/dL'
**************

In [103]:
generated_data_dict

{155131: {'contradiction': {140: (True, 'DC heparin stopped.')},
  'none': [32, 33]},
 129414: {'contradiction': {10: (True,
    'He stopped his albuterol use but this did not help so he came to the emergency room.')},
  'none': [25, 26]},
 133623: {'contradiction': {37: (True, 'Metoprolol not recommended.')},
  'none': []},
 197325: {'contradiction': {49: (True, 'Acetaminophen not given.')},
  'none': [76]},
 186291: {'contradiction': {101: (True,
    '- stop doxazosin - hydralazine as needed for diastolic >100 - continue to monitor # HIV: last CD4 307, VL 187; per ID fellow overnight recommendations, continue pt on abacavir/lamivudine/atazanavir.')},
  'none': [102, 103]},
 180836: {'contradiction': {72: (True,
    'Initially patient appeared in stress but later was doing well and did not require Morphine.')},
  'none': [31, 88]},
 154802: {'contradiction': {10: (True, 'Metoclopramide held.')},
  'none': [13, 14, 39, 44]},
 133857: {'contradiction': {10: (True, 'Phenytoin not given.'

In [104]:
import pickle
data_dict_file = "generated_data_dict_prescription.pkl"
with open(data_dict_file, "wb") as f:
    pickle.dump(generated_data_dict, f)

# 5. Loading contradictions data for pipeline [skip 4 if pickle file already created]

If `generated_data_dict_lab.pkl` has already been created, skip part 4. You should still run the initial cells above "README" in that section, though.

About 2 min per HADM_ID, 20 min total
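
Before spending ~20 minutes reprocessing patients, it may be worth checking that the pickle has the expected layout. The sketch below is an assumption-laden illustration, not the notebook's own code: `load_generated_dict` is a hypothetical helper, and the expected structure (`{hadm_id: {"contradiction": {pair_idx: (is_sentence2, text)}, "none": [pair_idx, ...]}}`) is inferred from the `generated_data_dict` output printed earlier. The toy entry reuses one patient from that output and is written to a temporary file so the example runs on its own.

```python
import os
import pickle
import tempfile


def load_generated_dict(path):
    """Load the pickled dictionary and validate its (assumed) layout."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    for hadm_id, entry in data.items():
        if set(entry) != {"contradiction", "none"}:
            raise ValueError(f"Unexpected keys for HADM_ID {hadm_id}: {sorted(entry)}")
        for pair_idx, (is_sentence2, txt) in entry["contradiction"].items():
            if not isinstance(is_sentence2, bool) or not isinstance(txt, str):
                raise ValueError(f"Malformed contradiction {pair_idx} for HADM_ID {hadm_id}")
    return data


# Toy dictionary with one entry copied from the printed `generated_data_dict`.
toy = {154802: {"contradiction": {10: (True, "Metoclopramide held.")},
                "none": [13, 14, 39, 44]}}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_generated_data_dict.pkl")
    with open(toy_path, "wb") as f:
        pickle.dump(toy, f)
    loaded = load_generated_dict(toy_path)

# Expected number of positive / negative examples implied by the dictionary.
n_pos = sum(len(entry["contradiction"]) for entry in loaded.values())
n_neg = sum(len(entry["none"]) for entry in loaded.values())
print(f"{n_pos} positive examples, {n_neg} negative examples")
```

The two counts give a quick sanity check against the totals noted in the next cell.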

In [105]:
# 9 - positive examples
# 16 - negative examples

In [106]:
import pickle
data_dict_file = "generated_data_dict_lab.pkl"
with open(data_dict_file, "rb") as f:
    generated_data_dict = pickle.load(f)

In [None]:
generated_dataset = [] # list of tuples, ((data 1, data 2), label)

for hadm_id in hadm_ids[:10]:
    print("***********************************")
    print(f"Patient {int(hadm_id)}")
    try:
        hadm_generated_dict = generated_data_dict[int(hadm_id)]
    except KeyError:
        print("This patient does not exist in contradiction set.")
        continue
        
    # Step 1: Select a patient -- process all data
    pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df, \
                  med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker, \
                  physician_only=True)

    # Step 2: Generate pairs for this patient
    df, data_inst_pairs = generate_data_pairs(pat)

    # Step 3A: Insert contradictions 
    print("+++++ Inserting contradictions +++++")
    for pair_idx, (is_sentence2, contradicting_txt) in hadm_generated_dict['contradiction'].items():
        data_1 = data_inst_pairs[pair_idx][0][0]
        data_2 = data_inst_pairs[pair_idx][0][1]

        print(f"{data_1.type} 1:\t{data_1.txt}")
        print(f"{data_2.type} 2:\t{data_2.txt}")

        sentence_to_modify = data_inst_pairs[pair_idx][0][is_sentence2]

        # Set `contradicting_txt` to the new contradicting sentence.
        # Update text and reprocess features.
        sentence_to_modify.update_text(contradicting_txt, True)

        print(f"\nNew contradicting sentence: {contradicting_txt}")
        print("+++++++++++++++++++++++++++++++++++")
        
        # Add example to dataset
        if is_sentence2:
            sentences = (data_1, sentence_to_modify)
        else:
            sentences = (sentence_to_modify, data_2)
        generated_dataset.append((sentences, 1)) # these are all contradictions
    
    # Step 3B: Insert negative examples (not contradictions)
    print("+++++ Inserting negative examples +++++")
    for pair_idx in hadm_generated_dict['none']:
        data_1 = data_inst_pairs[pair_idx][0][0]
        data_2 = data_inst_pairs[pair_idx][0][1]

        print(f"{data_1.type} 1:\t{data_1.txt}")
        print(f"{data_2.type} 2:\t{data_2.txt}")
        
        generated_dataset.append(((data_1, data_2), 0))
        print("+++++++++++++++++++++++++++++++++++")

***********************************
Patient 155131


  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]

In [None]:
n = len(generated_dataset)
n_negatives = len(list(filter(lambda x: x[1]==0, generated_dataset)))
n_positives = len(list(filter(lambda x: x[1]==1, generated_dataset)))

print(f"We have {n} total examples\n\t- {n_negatives} negative examples\n\t- {n_positives} positive examples")
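
The same tally can be done in a single pass with `collections.Counter`. This is only a sketch of an alternative; `toy_dataset` is a hypothetical stand-in that mirrors the `((data 1, data 2), label)` layout of `generated_dataset`, with strings in place of Data instances.

```python
from collections import Counter

# Hypothetical stand-in for `generated_dataset`: ((data_1, data_2), label)
toy_dataset = [
    (("prescription text", "note sentence"), 1),
    (("prescription text", "note sentence"), 0),
    (("lab text", "note sentence"), 0),
]

# Count labels in one pass; missing labels default to 0.
label_counts = Counter(label for _, label in toy_dataset)
n = len(toy_dataset)
n_positives = label_counts[1]
n_negatives = label_counts[0]

print(f"We have {n} total examples\n\t- {n_negatives} negative examples\n\t- {n_positives} positive examples")
```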

# 6. Generating evaluation data (unlabeled) from MIMIC

We'll skip the first 10 patients, since they were used for the generated contradictions.
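
To make sure no patient leaks from the generated set into the evaluation set, the two slices can be checked for overlap. A minimal sketch under stated assumptions: `split_hadm_ids` is a hypothetical helper, the slice sizes (first 10 for generation, next 10 for evaluation) follow the cells in this notebook, and `toy_ids` is made-up data standing in for `hadm_ids`.

```python
def split_hadm_ids(hadm_ids, n_generated=10, n_eval=10):
    """Split IDs into the generated-contradiction set and the evaluation set,
    raising if any ID lands in both."""
    generated = list(hadm_ids[:n_generated])
    evaluation = list(hadm_ids[n_generated:n_generated + n_eval])
    overlap = set(generated) & set(evaluation)
    if overlap:
        raise ValueError(f"HADM IDs appear in both splits: {sorted(overlap)}")
    return generated, evaluation


# Made-up IDs for illustration only.
toy_ids = list(range(100000, 100025))
generated_ids, eval_ids = split_hadm_ids(toy_ids)
print(f"{len(generated_ids)} generated, {len(eval_ids)} evaluation")
```

The check only matters if `hadm_ids` can contain duplicates; with unique IDs the two slices are disjoint by construction.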

In [55]:
processed_dir = "processed"
os.makedirs(processed_dir, exist_ok=True)

In [56]:
per_pat_dataset_dict = {}  # maps HADM_ID to that patient's dataset, in the form [((data 1, data 2), label), ...]
df_list = []  # one DataFrame of pairs per patient
for hadm_id in hadm_ids[10:20]:  # skip the first 10 patients (see above)
    print("***********************************")
    print(f"Patient {int(hadm_id)}")

    # Step 1: Select a patient and process all of their data
    pat = Patient(hadm_id, notes_df, drug_df, lab_df, d_lab_df,
                  med7_nlp, umls_nlp, rxnorm_nlp, umls_linker, rxnorm_linker,
                  physician_only=True)

    # Step 2: Generate pairs for this patient
    df, data_inst_pairs = generate_data_pairs(pat)
    df['HADM_ID'] = hadm_id
    per_pat_dataset_dict[hadm_id] = data_inst_pairs
    df_list.append(df)

    # Save this patient's pairs to <processed_dir>/<HADM_ID>.csv
    df.to_csv(f"{processed_dir}/{int(hadm_id)}.csv")

***********************************
Patient 162197


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.prescription_df[['START_DT']] = start_dt

********** Processing data for 2189-09-07 **********
********** Processing data for 2189-09-08 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.9999999999999998
----- Data i -----
>> Time: 2189-09-08 00:49:00
>> Type: sentence
>> Concepts: {'Communicable Diseases'}
>> Most likely with ascending GU infection.'
----- Data j -----
>> Time: 2189-09-08 00:49:00
>> Type: sentence
>> Concepts: {'Communicable Diseases'}
>> Also sepsis criteria given identification of GU infection.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 1.0
----- Data i -----
>> Time: 2189-09-08 00:49:00
>> Type: sentence
>> Concepts: {'Pyelonephritis'}
>> Findings c/w pyelonephritis with    ureteritis bilaterally.'
----- Data j -----
>> Time: 2189-09-08 00:49:00
>> Type: sentence
>> Concepts: {'Pyelonephritis'}
>> She had a grossly positive U/A, and CT Abd/Pelvis    showed evidence of bilateral pyelonephritis.'
**********************************
***** PAIR INDEX 2 *****
Cosine s

********** Processing data for 2119-04-06 **********
********** Processing data for 2119-04-07 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.6324555320336758
----- Data i -----
>> Time: 2119-04-07 02:21:00
>> Type: sentence
>> Concepts: {'X-Ray Computed Tomography', 'Communicable Diseases'}
>> Infection cannot be ruled out based on CT scan of    the neck.'
----- Data j -----
>> Time: 2119-04-07 02:21:00
>> Type: sentence
>> Concepts: {'Communicable Diseases'}
>> In    the ED she was given Azithromycin, Vancomycin, and Ceftriaxone for    presumed infection.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.9999999999999998
----- Data i -----
>> Time: 2119-04-07 02:21:00
>> Type: sentence
>> Concepts: {'Communicable Diseases'}
>> No infection of orbit or sella found on prelim read.'
----- Data j -----
>> Time: 2119-04-07 02:21:00
>> Type: sentence
>> Concepts: {'Communicable Diseases'}
>> In    the ED she was given Azithromycin, Vancomycin, and 

********** Processing data for 2132-04-10 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.9999999999999998
----- Data i -----
>> Time: 2132-04-10 19:21:00
>> Type: sentence
>> Concepts: {'Refractory anemias'}
>>    Vitals: T 98.7, P 91, BP 155/88, RR 12, O2 sat 98 RA'
----- Data j -----
>> Time: 2132-04-10 19:21:00
>> Type: sentence
>> Concepts: {'Refractory anemias'}
>> In the ED, initial VS were: T 97.1, P 84, BP 126/90, RR 18, O2sat 95    RA.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.8451542547285165
----- Data i -----
>> Time: 2132-04-10 19:21:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy', 'Gingi Med'}
>> Two    18 gauge PIVs placed for access, and patient typed & crossed for 2    units; given 2.5 L NS  GI evaluated him with a plan to scope him in ICU    while intubated.'
----- Data j -----
>> Time: 2132-04-10 19:21:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy', 'Gingi Med', 'Duodenal Ulc

********** Processing data for 2173-06-30 **********
********** Processing data for 2173-07-01 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.816496580927726
----- Data i -----
>> Time: 2173-07-01 05:14:00
>> Type: sentence
>> Concepts: {'Kidney Failure, Chronic'}
>>    Fluids: ESRD.'
----- Data j -----
>> Time: 2173-07-01 05:14:00
>> Type: sentence
>> Concepts: {'Kidney Failure, Chronic', 'Kidney', 'Huntington Disease'}
>>    Renal: Foley, ESRD on HD (MWF).'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 1.0
----- Data i -----
>> Time: 2173-07-01 05:14:00
>> Type: sentence
>> Concepts: {'Americium'}
>> Re-attempt access in AM.'
----- Data j -----
>> Time: 2173-07-01 05:14:00
>> Type: sentence
>> Concepts: {'Americium'}
>> OR in Am for second look'
**********************************
***** PAIR INDEX 2 *****
Cosine similarity: 0.5773502691896258
----- Data i -----
>> Time: 2173-07-01 05:14:00
>> Type: sentence
>> Concepts: {'Dialysis procedure',

********** Processing data for 2173-07-24 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.7647058823529411
----- Data i -----
>> Time: 2173-07-24 13:17:00
>> Type: sentence
>> Concepts: {'Atrial Fibrillation', 'Hematochezia', 'Hypertensive disease', 'Huntington Disease', 'International Normalized Ratio', 'Hematocrit procedure', 'Atrial Fibrillation Sotalol Hydrochloride 80 MG Oral Tablet [Betapace]', 'Kidney Failure, Chronic', 'Coumadin', 'Small bowel obstruction'}
>> Chief Complaint:  BRBPR    HPI:    78 year old male with a past medical history significant for DM, HTN,    atrial fibrillation on coumadin, hx tachy-brady s/p pacemaker, ESRD on    HD s/p recent ex-lap *2 for small bowel obstruction night prior to    admission BRBPR with INR 2.7, HCT 28 at rehab.'
----- Data j -----
>> Time: 2173-07-24 13:17:00
>> Type: sentence
>> Concepts: {'Atrial Fibrillation', 'Infantile Neuroaxonal Dystrophy', 'Hypertensive disease', 'Huntington Disease', 'Atrial Fibrillation Sotalol Hydro

********** Processing data for 2173-08-27 **********
********** Processing data for 2173-08-28 **********
********** Processing data for 2173-08-29 **********
********** Processing data for 2173-08-30 **********
********** Processing data for 2173-08-31 **********
********** Processing data for 2173-09-01 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.8660254037844387
----- Data i -----
>> Time: 2173-09-01 22:13:00
>> Type: sentence
>> Concepts: {'Morphine Sulfate 10 MG Oral Tablet'}
>> Adequate MS  [**First Name (Titles) **]   [**Last Name (Titles) 10259**] way protection.'
----- Data j -----
>> Time: 2173-09-01 22:13:00
>> Type: sentence
>> Concepts: {'Morphine Sulfate 10 MG Oral Tablet', 'Acute Cholecystitis'}
>>    HPI: 78 male admitted to the west 1 service for acute cholecystitis now    s.p  perchole  [**8-30**]   presents with  evolving L posterior temp stroke,    worsening mental status, a flutter/ fib transferred to the unit due to    poor MS and concern for airway p

********** Processing data for 2199-03-18 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.7071067811865475
----- Data i -----
>> Time: 2199-03-18 23:57:00
>> Type: sentence
>> Concepts: {'Hypertensive disease', 'Antihypertensive Agents'}
>> She does have a history of HTN and is compliant with her    antihypertensives.'
----- Data j -----
>> Time: 2199-03-18 23:57:00
>> Type: sentence
>> Concepts: {'Hypertensive disease'}
>> Father  [**Name (NI) 6008**]  HTN.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.9999999999999999
----- Data i -----
>> Time: 2199-03-18 23:57:00
>> Type: sentence
>> Concepts: {'Diabetes Mellitus, Non-Insulin-Dependent'}
>> NIDDM.'
----- Data j -----
>> Time: 2199-03-18 23:57:00
>> Type: sentence
>> Concepts: {'Diabetes Mellitus, Non-Insulin-Dependent'}
>> # Type 2 Diabetes: Hold po medications for now.'
**********************************
***** PAIR INDEX 2 *****
Cosine similarity: 0.6708203932499369
----- Data i -----
>

********** Processing data for 2164-11-23 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.5345224838248487
----- Data i -----
>> Time: 2164-11-23 11:19:00
>> Type: sentence
>> Concepts: {'Seizures', 'Lennox-Gastaut syndrome', 'Feverall', 'Morning After'}
>> Chief Complaint:  Fever, seizure    HPI:    49 year-old man with a history of presumed  [**Location (un) 6993**]  Gastaut Syndrome and    with a recent complicated medical history presents this morning for a    seizure in the setting of fever.'
----- Data j -----
>> Time: 2164-11-23 11:19:00
>> Type: sentence
>> Concepts: {'Lennox-Gastaut syndrome', 'Infantile Neuroaxonal Dystrophy', 'Seizures', 'Epilepsy'}
>> Assessment and Plan    49 year-old man with a history of presumed  [**Location (un) 6993**]  Gastaut Syndrome and    epilepsy presents with a seizure episode for greater than 10 minutes.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 1.0000000000000002
----- Data i -----
>> Time: 2164-

********** Processing data for 2102-09-27 **********
***** PAIR INDEX 0 *****
Cosine similarity: 1.0000000000000002
----- Data i -----
>> Time: 2102-09-27 16:25:00
>> Type: sentence
>> Concepts: {'Chronic multifocal osteomyelitis'}
>> CMO but currently full code'
----- Data j -----
>> Time: 2102-09-27 16:25:00
>> Type: sentence
>> Concepts: {'Chronic multifocal osteomyelitis'}
>>    General: It has been reported to the nursing staff that the patient is    to be made CMO.'
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 1.0000000000000002
----- Data i -----
>> Time: 2102-09-27 16:25:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy'}
>> Given this situation the above plan is certainly    subject to change.'
----- Data j -----
>> Time: 2102-09-27 16:25:00
>> Type: sentence
>> Concepts: {'Infantile Neuroaxonal Dystrophy'}
>>    Assessment And Plan: 76yo female, retired nun presents from  [**Hospital 5417**]     Hospital s/p fall down down  

********** Processing data for 2195-11-23 **********
***** PAIR INDEX 0 *****
Cosine similarity: 0.7453559924999298
----- Data i -----
>> Time: 2195-11-23 17:43:00
>> Type: sentence
>> Concepts: {'MICROCEPHALY, EPILEPSY, AND DIABETES SYNDROME'}
>> Continue home    meds.'
----- Data j -----
>> Time: 2195-11-23 17:43:00
>> Type: sentence
>> Concepts: {'Sugars', 'Hypoglycemia', 'hyperglycemic agent', 'MICROCEPHALY, EPILEPSY, AND DIABETES SYNDROME'}
>>  HYPOGLYCEMIA:  Unclear etiology, taking po meds, poor po intake (down    from baseline) albeit family notes he was taking juices with low    sugars.', "Was taking his oral hyperglycemic agents, doesn't recall    taking extra."
**********************************
***** PAIR INDEX 1 *****
Cosine similarity: 0.6324555320336758
----- Data i -----
>> Time: 2195-11-23 17:43:00
>> Type: sentence
>> Concepts: {'Chronic anemia', 'Anemia'}
>>  ANEMIA:  Chronic anemia, hct stable.'
----- Data j -----
>> Time: 2195-11-23 17:43:00
>> Type: sentence
>> Co


KeyboardInterrupt: 
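
The `SettingWithCopyWarning` in the output above is raised by pandas when a column is assigned on a DataFrame that was itself sliced from another DataFrame. The `Patient` class is not shown in this section, so the following is only a sketch of the usual remedy: take an explicit `.copy()` of the slice before assigning to it. The toy frame and every column name other than `START_DT` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical prescriptions table; only START_DT mirrors the warning above
drug_df = pd.DataFrame({
    "HADM_ID": [1, 1, 2],
    "STARTDATE": ["2189-09-07", "2189-09-08", "2119-04-06"],
})

# An explicit copy makes the slice an independent DataFrame, so later
# column assignments no longer trigger SettingWithCopyWarning
prescription_df = drug_df[drug_df["HADM_ID"] == 1].copy()

# Assigning a new column on the copy leaves the original frame untouched
prescription_df["START_DT"] = pd.to_datetime(prescription_df["STARTDATE"])

print(prescription_df)
```

Whether the copy belongs at the point where the per-patient slice is taken or elsewhere depends on how `Patient` builds `prescription_df`, which would need to be checked in the class itself.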

### Getting History + Allergy Information - @Sharon, you can ignore everything below

In [116]:
# todo: 
# - DONE function to re-process all data from Patient instance -- pat.process_notes(); pat.process_by_date()
# - function to update Note -- should update dataframe of patient directly
#   - can go back to dataframe, but can't map tokenized sentence to original note in df -- todo
#   - function to update tokenized sentence
# - later: function to update original dataframe from patient dataframe

import re

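def get_section(regex_dict, txt):
    """ Given a dictionary of start and end regexes for a
        particular section, finds where the section starts and
        ends in the text and returns the two indices.
        Returns (None, None) if the section does not exist.
    """
    start_match = re.search(regex_dict["start"], txt)
    if start_match is None:
        return None, None
    start = start_match.start()

    # Search for the end marker only after the start marker, so an
    # earlier occurrence of the end pattern cannot yield end < start
    end_match = re.search(regex_dict["end"], txt[start:])
    if end_match is None:
        return None, None

    return start, start + end_match.start()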

note = pat.notes[4]

# Sections to store 
# note: most of these sections have already been removed,
#       but if they haven't, we might have to remove them
#       and then reprocess everything
allergy_regex = {"start": "Allergies:",
                 "end":   "Last dose of Antibiotics:"}
history_regex = {"start": "Past medical history:",
                 "end":   "Other:"}

allergy_start, allergy_end = get_section(allergy_regex, note.txt)
history_start, history_end = get_section(history_regex, note.txt)

pt_allergies = "" if allergy_start is None else note.txt[allergy_start:allergy_end]
pt_histories = "" if history_start is None else note.txt[history_start:history_end]

print("******** Allergies ********")
print(pt_allergies[:100])
print("******** Histories ********")
print(pt_histories[:100])

******** Allergies ********
Allergies:
   Bactrim (Oral) (Sulfamethoxazole/Trimethoprim)
   Nausea/Vomiting
   Amiodarone
   Ras
******** Histories ********
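
As a self-contained illustration of the start/end-marker approach, the sketch below runs a similar extraction on a made-up note. `extract_section` is a hypothetical helper, not part of the notebook's code, and it makes two design choices that are assumptions: matching is case-insensitive, and a missing end marker means the section runs to the end of the note.

```python
import re

def extract_section(txt, start_pattern, end_pattern):
    """Return the text from start_pattern up to end_pattern, or "" if the
    start marker is absent. Hypothetical helper for illustration only."""
    start_match = re.search(start_pattern, txt, flags=re.IGNORECASE)
    if start_match is None:
        return ""
    # Look for the end marker only in the text after the start marker
    rest = txt[start_match.end():]
    end_match = re.search(end_pattern, rest, flags=re.IGNORECASE)
    if end_match is None:
        return txt[start_match.start():]
    return txt[start_match.start():start_match.end() + end_match.start()]

# Made-up note text, not taken from MIMIC
toy_note = (
    "Allergies:\n   Penicillins\n   Rash\n"
    "Last dose of Antibiotics:\n   none\n"
    "Past Medical History:\n   HTN, DM\n"
    "Other:\n   n/a\n"
)

allergies = extract_section(toy_note, r"Allergies:", r"Last dose of Antibiotics:")
history = extract_section(toy_note, r"Past medical history:", r"Other:")

print("******** Allergies ********")
print(allergies)
print("******** Histories ********")
print(history)
```

Returning the substring directly, rather than a pair of indices, avoids having to handle `None` at every call site; keeping the indices is preferable if the section later needs to be cut out of the original note.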

