In [1]:
# Import libraries
import os
import re
import random
import pickle
import subprocess
import numpy as np
import pandas as pd

from tqdm import tqdm
from datetime import datetime
from collections import Counter

# 1. Setup concept extractors

Some options were [MetaMap](https://metamap.nlm.nih.gov/) and [spaCy](https://spacy.io/). 

[MetaMap](https://metamap.nlm.nih.gov/) is specific to recognizing UMLS concepts. There is a [Python wrapper](https://github.com/AnthonyMRios/pymetamap), but known to be slow and bad.

[spaCy](https://spacy.io/) is a popular NLP Python package with an extensive library for named entity recognition. It has a wide variety of [extensions](https://spacy.io/universe) and models to choose from. We're going with the following.

* [scispaCy](https://spacy.io/universe/project/scispacy) contains spaCy models for processing biomedical, scientific or clinical text. It seems easy to use and has a wide variety of concepts it can recognize, including UMLS, RxNorm, etc.

* [negspaCy](https://spacy.io/universe/project/negspacy) identifies negations using some extension of regEx. Probably useful for things like, "this pt is diabetic" v. "this pt is not diabetic." [todo: negation identification of medspacy might be better, https://github.com/medspacy/medspacy]

* [Med7](https://github.com/kormilitzin/med7) is a model trained for recognizing entities in prescription text, e.g. identifies drug name, dosage, duration, etc., which could be useful stuff to check for conflicts. 

We're going with spaCy for this.. and coming up with a coherent way to integrate entities picked up by these three extensions/models.

## i) Installations

In [2]:
import sys; sys.executable

'/opt/conda/envs/opennotes/bin/python'

In [3]:
import spacy
import scispacy

from pprint import pprint
from collections import OrderedDict

from spacy import displacy
# from scispacy.abbreviation import AbbreviationDetector # UMLS already contains abbrev. detect
from scispacy.umls_linking import UmlsEntityLinker

spacy.__version__

'2.3.5'

## ii) Setting up the model

The model is used to form word/sentence embeddings for the NER task. Thus, it's important to choose model that has been tuned for our specific use case (e.g. clinical text, prescription information) so the embeddings are useful for naming the entity.

[Note to self:] one potential idea to look into if we have time remaining, something about using custom model for spacy pipeline (could we do smth with the romanov models since they've been trained specifically for conflict detection?) -- https://spacy.io/usage/v3

### a) scispaCy

For scispaCy, we set up one of their models that has been trained on biomedical data. Other models can be found [here](https://allenai.github.io/scispacy/). 

In [4]:
!/opt/conda/envs/opennotes/bin/python -m pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz
!/opt/conda/envs/opennotes/bin/python -my pip uninstall negspacy
!/opt/conda/envs/opennotes/bin/python -m pip install spacy==2.3.5

Collecting https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz
  Using cached https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz (33.1 MB)
/opt/conda/envs/opennotes/bin/python: No module named y


In [5]:
sci_nlp = spacy.load("en_core_sci_sm")

### b) Med7

For Med7, we set up their model that has been trained specifically for NER of medication-related concepts: dosage, drug names, duration, form, frequency, route of administration, and strength. The model is trained on MIMIC-III, so it should work well for us.

In [6]:
# installs Med7 model
!pip install https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1

Collecting https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1
  Downloading https://www.dropbox.com/s/xbgsy6tyctvrqz3/en_core_med7_lg.tar.gz?dl=1 (892.8 MB)
[K     |█████████████████████████▍      | 709.2 MB 103.3 MB/s eta 0:00:02  |█                               | 27.1 MB 2.0 MB/s eta 0:07:06

IOPub data rate exceeded.
The Jupyter server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--ServerApp.iopub_data_rate_limit`.

Current values:
ServerApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
ServerApp.rate_limit_window=3.0 (secs)



[K     |████████████████████████████████| 892.8 MB 6.3 kB/s  eta 0:00:01


In [7]:
med7_nlp = spacy.load("en_core_med7_lg")

## iii) Adding an entity linker

The EntityLinker is a spaCy component that links to a knowledge base. The linker compares words with the concepts in the specified knowledge base (e.g. scispaCy's UMLS does some form of character overlap-based nearest neighbor search, has option to resolve abbreviations first).

[Note: Entities generally get resolved to a list of different entities. This [blog post](http://sujitpal.blogspot.com/2020/08/disambiguating-scispacy-umls-entities.html) describes one potential way to disambiguate this by figuring out "most likely" set of entities. Gonna start off with just resolving to the 1st entity tho... hopefully that's sufficient.]

### a) scispaCy

#### UMLS Linker

UMLS linker maps entities to the UMLS concept. Main parts we'll be interested in are: semantic type and concept (mainly the common name, maybe the CUI might become important later).

* _Semantic type_ is the broader category that the entity falls under, e.g. disease, pharmacologic substance, etc. See [this](https://metamap.nlm.nih.gov/Docs/SemanticTypes_2018AB.txt) for a full list.

* _Concepts_ refer to the more fundamental entity itself, e.g. pneumothorax, ventillator, etc. Many concepts can fall under a semantic type.

More info on `UmlsEntityLinker` ([source code](https://github.com/allenai/scispacy/blob/4ade4ec897fa48c2ecf3187caa08a949920d126d/scispacy/linking.py#L9))

In [8]:
from scispacy.umls_linking import UmlsEntityLinker
# import scispacy.rxnorm_linking

# abbreviation_pipe = AbbreviationDetector(nlp) # automatically included with UMLS linker
# nlp.add_pipe(abbreviation_pipe)
linker = UmlsEntityLinker(k=10,                          # number of nearest neighbors to look up from
                          threshold=0.7,                 # confidence threshold to be added as candidate
                          max_entities_per_mention=1,    # number of entities returned per concept (todo: tune)
                          filter_for_definitions=False,  # no definition is OK
                          resolve_abbreviations=True)    # resolve abbreviations before linking
sci_nlp.add_pipe(linker)



### b) Med7 

No need for entity linker

### c) Negspacy [TODO]

# 2. Setup data structures

## Categorizing type of conflict

The first larger task is to categorize by the type of conflict to check for since our method will likely be different (at least for the rule based). We wrote up a short list [here](https://docs.google.com/document/d/1fEBk0JHeyQWshYWW5w_VTkaYyRfm9MBxJ9DAGoVa8Yw/edit?usp=sharing). 

To do this, we're using the semantic type that is identified by the UMLS linker. Here's a table of the semantic types we're filtering for, and which conflict they'll be used for.

Here's a [full list](https://metamap.nlm.nih.gov/Docs/SemanticTypes_2018AB.txt) of semantic types. You can look up definitions of semantic types [here](http://linkedlifedata.com/resource/umls-semnetwork/T033).

| Conflict | Semantic Type |
| --- | ----------- |
| Diagnoses-related errors | Disease or Syndrome (T047), Diagnostic Procedure(T060) |
| Inaccurate description of medical history (symptoms) | Sign or Symptom (T184) |
| Inaccurate description of medical history (operations) | Therapeutic or Preventive Procedure (T061) |
| Inaccurate description of medical history (other) | [all of the above and below] |
| Medication or allergies | Clinical Drug (T200), Pharmacologic Substance (T121) |
| Test procedures or results | Laboratory Procedure (T059), Laboratory or Test Result (T034) | 


For clarity, the concepts we'll keep from the UMLS linker are anything falling into these semantic types (which we will then categorize by type of conflict using the table above):

* T047 - Disease or Syndrome
* T121 - Pharmacologic Substance
* T023 - Body Part, Organ, or Organ Component
* T061 - Therapeutic or Preventive Procedure 
* T060 - Diagnostic Procedure
* T059 - Laboratory Procedure
* T034 - Laboratory or Test Result 
* T184 - Sign or Symptom 
* T200 - Clinical Drug

We'll store this info into a dictionary now.

<!-- Some useful def's 
Finding - 
That which is discovered by direct observation or measurement of an organism attribute or condition, including the clinical history of the patient. The history of the presence of a disease is a 'Finding' and is distinguished from the disease itself.  -->

In [25]:
SEMANTIC_TYPES = ['T047', 'T121', 'T023', 'T061', 'T060', 'T059', 'T034', 'T184', 'T200']
SEMANTIC_NAMES = ['Disease or Syndrome', 'Pharmacologic Substance', 'Body Part, Organ, or Organ Component', \
                  'Therapeutic or Preventive Procedure', 'Diagnostic Procedure', 'Laboratory Procedure', \
                  'Laboratory or Test Result', 'Sign or Symptom', 'Clinical Drug']
SEMANTIC_TYPE_TO_NAME = dict(zip(SEMANTIC_TYPES, SEMANTIC_NAMES))

SEMANTIC_TYPE_TO_NAME

{'T047': 'Disease or Syndrome',
 'T121': 'Pharmacologic Substance',
 'T023': 'Body Part, Organ, or Organ Component',
 'T061': 'Therapeutic or Preventive Procedure',
 'T060': 'Diagnostic Procedure',
 'T059': 'Laboratory Procedure',
 'T034': 'Laboratory or Test Result',
 'T184': 'Sign or Symptom',
 'T200': 'Clinical Drug'}

In [26]:
CONFLICT_TO_SEMANTIC_TYPE = {
    "diagnosis": {'T047', 'T060'},
    "med_history_symptom": {'T184'},
    "med_history_operation": {'T061'},
    "med_history_other": set(SEMANTIC_TYPES),
    "med_allergy": {'T200', 'T121'},
    "test_results": {'T059', 'T034'}
}

CONFLICT_TO_SEMANTIC_TYPE

{'diagnosis': {'T047', 'T060'},
 'med_history_symptom': {'T184'},
 'med_history_operation': {'T061'},
 'med_history_other': {'T023',
  'T034',
  'T047',
  'T059',
  'T060',
  'T061',
  'T121',
  'T184',
  'T200'},
 'med_allergy': {'T121', 'T200'},
 'test_results': {'T034', 'T059'}}

In [100]:
class Patient(object):
    def __init__(self, hadm_id, notes_df, drug_df, lab_df, d_lab_df, \
                 med7_nlp, sci_nlp, umls_linker, \
                 physician_only=True):
        """ Patient representation
        
        med7_nlp:    spacy model from Med7
        sci_nlp:     spacy model from scispaCy
        umls_linker: entity linker for UMLS, should already be added to sci_nlp
        """
        self.hadm_id = hadm_id
        self.physician_only = physician_only
        
        # this patient's data
        self.notes_df = self.filter_notes(notes_df.loc[notes_df['HADM_ID'] == hadm_id])
        self.drug_df  = drug_df.loc[drug_df['HADM_ID'] == hadm_id]
        self.lab_df   = lab_df.loc[lab_df['HADM_ID'] == hadm_id]
        
        self.d_lab_df = d_lab_df # lab ditems df
        
        # spaCy models & entity linkers
        self.med7 = med7_nlp
        self.sci  = sci_nlp
        self.umls = umls_linker
        
        # Process notes
        notes = []
        for row_id in pat.notes_df.ROW_ID:
            note = Note(self, row_id)
            notes.append(note)
        self.notes = notes
        
        # todo: process labs and drugs
        
    def filter_notes(self, pat_notes_df):
        if self.physician_only: pat_notes_df = self._filter_physician(pat_notes_df)
        pat_notes_df = self._filter_duplicates(pat_notes_df)
        
        return pat_notes_df
    
    def _filter_physician(self, pat_notes_df):
        # Filter for only physician notes
        return pat_notes_df.loc[pat_notes_df.CATEGORY == "Physician "]
        
    def _filter_duplicates(self, pat_notes_df):
        # Filtering out duplicate / autosave's -- only take the longest
        for cat in pat_notes_df.CATEGORY.unique(): 
            cat_notes_df = pat_notes_df.loc[pat_notes_df.CATEGORY == cat]
            for time in cat_notes_df.CHARTTIME.unique():
                time_notes_df = cat_notes_df.loc[cat_notes_df.CHARTTIME == time]
                if len(time_notes_df) > 1:
                    # get indices of first N-1 shortest rows
                    idx_to_drop = time_notes_df.TEXT.apply(lambda x: len(x)).sort_index().index[:-1]
                    pat_notes_df = pat_notes_df.drop(idx_to_drop) # drop by row index
                    
        return pat_notes_df

class Row(object):
    def __init__(self, patient):
        pass
    
    @property
    def hadm_id(self):
        return self.patient.hadm_id
    
    @property
    def med7(self):
        return self.patient.med7
    
    @property
    def sci(self):
        return self.patient.sci
    
    @property
    def umls(self):
        return self.patient.umls    

class Note(Row):
    def __init__(self, patient, row_id):
        self.patient  = patient                                                   # patient this note is for
        self.note_row = patient.notes_df.loc[patient.notes_df.ROW_ID == row_id]   # df row for this note
        self.txt      = self.note_row.TEXT.item()                                       # note in string format
        self.cat      = self.note_row.CATEGORY.item()                                   # note category
        
        # Get datetime
        if type(self.note_row.CHARTTIME.item()) == str:
            self.time = datetime.strptime(self.note_row.CHARTTIME.item(), "%Y-%m-%d %H:%M:%S")
        elif type(self.note_row.CHARTDATE.item()) == str:
            self.time = datetime.strptime(self.note_row.CHARTDATE.item(), "%Y-%m-%d")
        else:
            self.time = None
            
        # Tokenize note
        sents = !python mimic-tokenize/heuristic-tokenize.py "{self.txt}"
        sentences = sents[0].split(", \'")
#         # For python script: runs command and returns stdout as bytes, convert to utf-8, list of sentences
#         sents = subprocess.check_output(f"python mimic-tokenize/heuristic-tokenize.py {self.txt}".split(" "))
#         sents = sents.decode("utf-8")
#         sentences = sents.split(", \'")

        # Remove lab tables, remove titles
        sentences = self._delete_copied_lab_tables(sentences)
        sentences = self._remove_titles(sentences)
        
        self.sentences = sentences # todo: process each sentence
        
        # Process each sentence
        sentence_reps = []
        for idx, sent in enumerate(sentences):
            sent_rep = Sentence(self, idx,
                                filter_map=SEMANTIC_TYPE_TO_NAME,
                                conflict_map=CONFLICT_TO_SEMANTIC_TYPE)
            sentence_reps.append(sent_rep)
            
        self.sentence_reps = sentence_reps
    
    def __getitem__(self, idx):
        return self.sentence_reps[idx]
    
    def _diff_list(self, li1, li2):
        return list(set(li1) - set(li2)) + list(set(li2) - set(li1))

    def _delete_copied_lab_tables(self, ind_sentences):
        # [**yyyy-mm-dd**], 02:10
#         rgx_list = ["[\*\*\d{4}\-\d{1,2}\-\d{1,2}\*\*]", "\d{1,2}\-\d{1,2}"]
#         rgx_list = ["[\*\*[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}\*\*] *[0-9]{1,2}-[0-9]{1,2}"]
#         rgx_list = ["[\*\*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\*\*]   [0-9][0-9]-[0-9][0-9]"]
        rgx_list = ["[\*\*[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\*\*]"]
#         rgx_list = ["[\d{4}\-\d{1,2}\-\d{1,2}][^\S]+\d{1,2}\-\d{1,2}"]
        
        delete_list = []
        # ind_sentences is list of strings
        for sentence in ind_sentences:
            for rgx_match in rgx_list:
                match = re.search(rgx_match, sentence)
                if match and sentence not in delete_list:
                    delete_list.append(sentence)
        return self._diff_list(ind_sentences, delete_list)
    
    def _remove_titles(self, sentences):
        """ Omits anything that has ':' in last two entries of the string. 
        e.g. "...Results:"
        """
        return list(filter(lambda x: ':' not in x[-2:], sentences))
        
class Drugs(Row):
    def __init__(self, patient):
        pass

class Labs(Row):
    def __init__(self, patient):
        pass

class Sentence(object):
    def __init__(self, note, sentence_idx, filter_map=None, conflict_map=None):
        """
        Extracts important information and stores them as attributes. 
        """
        self.sentence_idx = sentence_idx
        self.txt          = note.sentences[sentence_idx]

        self.umls_cui_map = note.umls.umls.cui_to_entity # maps CUI to entity information
        self.filter_map   = filter_map
        self.conflict_map = conflict_map
        self.is_filter    = (filter_map is not None)
        self.is_conflict  = (conflict_map is not None)
        
        self.sci_doc  = note.sci(self.txt)
        self.med7_doc = note.med7(self.txt)
        
        self.semantic_types = []
        self.semantic_names = []  # names of categories of entities
        self.canonical_names = [] # names of types of entities
        self.get_umls_info()
        
        self.med7_entities = []   # list of tuples with (entity word, entity label), e.g. (aspirin, drug)
        self.get_med7_info()
        
    def get_med7_info(self):
        # list of tuples with (entity word, entity label), e.g. (aspirin, drug)
        self.med7_entities = [(ent.text, ent.label_) for ent in self.med7_doc.ents]
        
    @property
    def features(self):
        """ Returns canonical names of extracted concepts, semantic type names + ID """
        return self.canonical_names, self.semantic_names, self.semantic_types

    def get_umls_info(self):
        for ent in self.sci_doc.ents: # extract info (umls) for each entity
            # todo: look into this bug, ent._.umls_ents sometimes empty list
            try:
                cui, _ = ent._.umls_ents[0] # assuming `max_entites_per_mention=1` for now
            except IndexError:
                continue
            cui_info = self.umls_cui_map[cui]
                        
            ent_valid_type_list = [t in self.filter_map for t in cui_info.types]
            ent_valid_type = any(ent_valid_type_list) # checks if entity is a valid type
            
            if not self.is_filter or ent_valid_type: # only add to list if we're not filtering of it's valid
                self.canonical_names.append(cui_info.canonical_name)
                for (stype, keep) in zip(cui_info.types, ent_valid_type_list):
                    if keep:
                        self.semantic_types.append(stype)
                        self.semantic_names.append(self.filter_map[stype])
            
        self.semantic_types = set(self.semantic_types)
        self.semantic_names = set(self.semantic_names)
        self.canonical_names = set(self.canonical_names)
        
    def similarity(self, srep):
        """ Given another SentenceRep instance, compares similarity.
        
        e.g.
        srep.similarity(srep)   # measuring similarity with itself
        >> 1.0                  # 1.0 is maximum score
        """
        return self.sci_doc.similarity(srep.doc)
    
    def is_ctype(self, ctype):
        """ Given a conflict type (e.g. "diagnosis"),
            returns True if this sentence falls into that category, False otherwise.
            Returns None if conflict_map is undefined.
        """
        if self.is_conflict: 
            ctype_stypes = self.conflict_map[ctype] # get list of semantic types for this conflict
            return any([stype in ctype_stypes for stype in self.semantic_types])
        return None

# 3. Load and process data

In [55]:
# Load MIMIC tables
notes_df  = pd.read_csv('NOTEEVENTS.csv.gz',    compression='gzip', error_bad_lines=False)
drug_df   = pd.read_csv('PRESCRIPTIONS.csv.gz', compression='gzip', error_bad_lines=False)
lab_df    = pd.read_csv('LABEVENTS.csv.gz',     compression='gzip', error_bad_lines=False)
d_lab_df  = pd.read_csv('D_LABITEMS.csv.gz',    compression='gzip', error_bad_lines=False)

  interactivity=interactivity, compiler=compiler, result=result)
  interactivity=interactivity, compiler=compiler, result=result)


In [101]:
# Load HADM ID's with consecutive physician notes
if os.path.exists("hadm_ids.pkl"):
    with open("hadm_ids.pkl", "rb") as f:
        hadm_ids = pickle.load(f)
else:
    hadm_ids = []
    for hadm_id in tqdm(data.HADM_ID.unique()):
        hadm_data = data.loc[data.HADM_ID == hadm_id]
        hadm_phys_notes = hadm_data.loc[hadm_data.CATEGORY == "Physician "]

        if len(hadm_phys_notes) > 1:
            hadm_ids.append(hadm_id)

    with open("hadm_ids.pkl", "wb") as f:
        pickle.dump(hadm_ids, f)
        
print(f"There are {len(hadm_ids)} patients with consecutive physician notes.")

There are 8733 patients with consecutive physician notes.


In [102]:
# test an example

# Create patient instance -- processes all the data
pat = Patient(hadm_ids[1], notes_df, drug_df, lab_df, d_lab_df, \
              med7_nlp, sci_nlp, linker, \
              physician_only=True)

# note.hadm_id
# note.txt
# note.cat
# note.time

  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
  extended_neighbors[empty_vectors_boolean_flags] = numpy.array(neighbors)[:-1]
  extended_distances[empty_vectors_boolean_flags] = numpy.array(distances)[:-1]
  extended_neighbors[empty_vectors_boole

In [109]:
note = pat.notes[0]

for sent_rep in note.sentence_reps:
    sent_rep.canonical_names

In [107]:
# for each note (same times, comparable)
    # 

<__main__.Note at 0x7f0b258f3ed0>

In [None]:
# get cosine sim
vectorizer = CountVectorizer()
corpus = list(map(lambda x: ' '.join(x), semantic_sreps_canon_names))
X = vectorizer.fit_transform(corpus)
X = X.toarray()


In [96]:
for note in pat.notes:
    print(note.time)

2131-12-23 23:51:00
2131-12-23 22:56:00
2131-12-24 11:44:00
2131-12-24 07:33:00
2131-12-25 09:37:00
2131-12-25 07:56:00
2131-12-26 07:42:00
2131-12-26 10:04:00


In [62]:
pat.notes

[<__main__.Note at 0x7f0b1a142e50>,
 <__main__.Note at 0x7f0b258f1d10>,
 <__main__.Note at 0x7f0b08c12c10>,
 <__main__.Note at 0x7f0b07e5e350>,
 <__main__.Note at 0x7f0b07c36290>,
 <__main__.Note at 0x7f0b04897ed0>,
 <__main__.Note at 0x7f0b066ec190>,
 <__main__.Note at 0x7f0b041956d0>]

In [66]:
pat.notes[0].sentence_reps

[<__main__.Sentence at 0x7f0b258f1410>,
 <__main__.Sentence at 0x7f0b258f1310>,
 <__main__.Sentence at 0x7f0b229830d0>,
 <__main__.Sentence at 0x7f0b2597d190>,
 <__main__.Sentence at 0x7f0b258ed050>,
 <__main__.Sentence at 0x7f0b0b7c47d0>,
 <__main__.Sentence at 0x7f0b2593d5d0>,
 <__main__.Sentence at 0x7f0b258da390>,
 <__main__.Sentence at 0x7f0b0a13ecd0>,
 <__main__.Sentence at 0x7f0b09b14450>,
 <__main__.Sentence at 0x7f0b1b805490>,
 <__main__.Sentence at 0x7f0b0b902a50>,
 <__main__.Sentence at 0x7f0b09a38810>,
 <__main__.Sentence at 0x7f0b0a13ee10>,
 <__main__.Sentence at 0x7f0b0ac2c7d0>,
 <__main__.Sentence at 0x7f0b0abbebd0>,
 <__main__.Sentence at 0x7f0b0a13efd0>,
 <__main__.Sentence at 0x7f0b1b20ca50>,
 <__main__.Sentence at 0x7f0b0aad3390>,
 <__main__.Sentence at 0x7f0b0aad3210>,
 <__main__.Sentence at 0x7f0b0aad33d0>,
 <__main__.Sentence at 0x7f0b09718c10>,
 <__main__.Sentence at 0x7f0b25917090>,
 <__main__.Sentence at 0x7f0b0abbea90>,
 <__main__.Sentence at 0x7f0b09718f10>,


In [81]:
sent_rep = pat.notes[0].sentence_reps[4]
sent_rep.txt

"Potassium up slightly.'"