<a href="https://colab.research.google.com/github/mnaylor5/quantifying-explainability/blob/master/notebooks/ecco_bigbird.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformer Classifier Explainability with Ecco
This notebook uses the `ecco` library to run non-negative matrix factorization (NMF) on neuron activations inside a transformer model. The model is a BigBird classifier fine-tuned for mortality prediction on simulated admission notes derived from MIMIC discharge summaries.

Links:
* [Ecco](https://github.com/jalammar/ecco) library, along with [Alammar's blog post](https://jalammar.github.io/explaining-transformers/) on this topic. The post has relevant sections on neuron activations and factor analysis.
* [Fine-tuned Big Bird](https://hf.co/mnaylor/bigbird-base-mimic-mortality) on the 🤗 model hub
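
As a rough illustration of the factorization mentioned above, here is a minimal pure-Python sketch of NMF using multiplicative updates on a toy "activation" matrix (rows are neurons, columns are token positions). This is not `ecco`'s implementation; the helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`) and the toy numbers are made up for illustration only.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction residual V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5

def nmf(v, n_components, n_iter=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (neurons x positions) into W @ H."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_components)] for _ in range(n_rows)]
    h = [[rng.random() + eps for _ in range(n_cols)] for _ in range(n_components)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_cols)]
             for k in range(n_components)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_components)]
             for i in range(n_rows)]
    return w, h

# Toy data: neurons 0-1 fire on the first two tokens, neurons 2-3 on the last three.
activations = [
    [1.0, 0.9, 0.0, 0.0, 0.1],
    [0.8, 1.0, 0.1, 0.0, 0.0],
    [0.0, 0.1, 1.0, 0.9, 0.8],
    [0.1, 0.0, 0.9, 1.0, 1.0],
]
w, h = nmf(activations, n_components=2)
print(frobenius_error(activations, w, h))
```

Each row of `h` gives one factor's strength at every token position, which is comparable in spirit to the per-token factor activations explored later in this notebook.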

In [None]:
%%capture
# sentencepiece library required for bigbird tokenizer
!pip install ecco sentencepiece

In [None]:
from transformers import AutoModelForSequenceClassification, AutoTokenizer 
import ecco 
import pandas as pd

Load the tokenizer and the fine-tuned model.

In [None]:
MODEL_NAME = "mnaylor/bigbird-base-mimic-mortality"

In [None]:
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

In [None]:
shorter_example = '''
CHIEF COMPLAINT: s/p fall

PRESENT ILLNESS: 77 yo male on Coumadin for DVT who fell with LOC and unresponsive. Initially appeared with GCS of 5 and intubated. Was given Vit K 10mg, hypertonic saline , Dilantin 1gm, Propofol @ 10mcg. He was transferred to ER where Neurosurgery was consulted.
'''


BigBird isn't officially supported by the `ecco` library, so we need to create the `LM` object manually. Because the overall paradigm is conceptually similar to BERT, I pass `bert-base-uncased` as the model name so that ecco's internal check against its config file succeeds.

In [None]:
lm_kwargs = {
    'model_name': 'bert-base-uncased',  # stand-in name so ecco's config check passes
    'collect_activations_flag': True,
    'collect_activations_layer_nums': None,  # `None` returns activations for all layers
    'verbose': False,
    'gpu': None,
}

In [None]:
ecco_lm = ecco.LM(model.bert, tokenizer, **lm_kwargs)

In [None]:
ecco_out = ecco_lm(tokenizer(shorter_example, return_tensors='pt'))

Attention type 'block_sparse' is not possible if sequence_length: 90 <= num global tokens: 2 * config.block_size + min. num sliding tokens: 3 * config.block_size + config.num_random_blocks * config.block_size + additional buffer: config.num_random_blocks * config.block_size = 704 with config.block_size = 64, config.num_random_blocks = 3.Changing attention type to 'original_full'...
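
The threshold in the warning above can be reproduced with a little arithmetic. The sketch below simply re-adds the terms listed in the message; the function name `min_block_sparse_length` is hypothetical and is not part of `transformers`.

```python
def min_block_sparse_length(block_size: int, num_random_blocks: int) -> int:
    """Sum the terms named in the BigBird warning message."""
    global_tokens = 2 * block_size
    sliding_tokens = 3 * block_size
    random_tokens = num_random_blocks * block_size
    buffer_tokens = num_random_blocks * block_size
    return global_tokens + sliding_tokens + random_tokens + buffer_tokens

# Values reported in the warning: block_size = 64, num_random_blocks = 3
threshold = min_block_sparse_length(block_size=64, num_random_blocks=3)
print(threshold)        # 704
print(90 <= threshold)  # True
```

Because the short example tokenizes to only 90 tokens, the model falls back to `original_full` attention for this input.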


In [None]:
nmf1 = ecco_out.run_nmf(n_components=4)

In [None]:
nmf1.explore()

Next, we use longer examples that include all of the sections in the admission notes.

In [None]:
example_texts = [
    '''
    CHIEF COMPLAINT: s/p fall

    PRESENT ILLNESS: 77 yo male on Coumadin for DVT who fell at CVS with LOC and unresponsive on [**11-15**]. Initially brought to [**Hospital1 777**] with GCS of 5 and intubated. Was given Vit K 10mg, hypertonic saline , Dilantin 1gm, Propofol @ 10mcg. He was transferred to [**Hospital1 22**] ER where Neurosurgery was consulted. Upon arrival INR 3.0. FFP started, and emergent bolt placed.

    MEDICAL HISTORY: DVT

    MEDICATION ON ADMISSION: Coumadin

    ALLERGIES: No Drug Allergy Information on File

    PHYSICAL EXAM: On Admission: O: T: 97  BP: 128/68     HR: 59   R 16      O2Sats 98 / intubated Gen: Intubated and sedated; Right facial and periorbital ecchymosis./ c-collar in place/ Neuro: Upon arrival, pt was intubated and sedated on Propofol. No eye opening to sternal rub. No movement of extremities to noxious stimulation. Pupils were 2 to 1.5 bilaterally. Trace corneal reflex with trace pupillary response. GCS=3T on sedation.  Neuro exam repeated after pt returned from CT scan.  Note: while in scanner, pt had some spontaneous upper ext. movement. On exam: Pt without eye opening to voice or noxious stim. ? follow commands ->moved feet x 2 to request of wiggle toes. Pt localizes briskly with RUE, Extensor postures with LUE, w/d's b/l LE's  to noxious, no clonus. Pt lightly chewing on ETT. GCS= E=1, M=5, V=1T = 7T HEENT: + Raccoon's sign, Neg battles sign, negative hemotympanum, negative CSF rhinorrhea/otorrhea. Neck: In collar. Lungs: clear / ? R lower lobe rhonchi Cardiac: s1 s2 RRR Abd: Soft Extrem: Warm and well-perfused / no edema.

    FAMILY HISTORY: Non-contributory

    SOCIAL HISTORY: Married, resides at home with wife.
    ''',
    
    # break
    
    '''
    CHIEF COMPLAINT: hypoxia

    PRESENT ILLNESS: 85 yo Man w/ HTN, sick sinus s/p PPM, multiple GIB, Afib, prostate ca, MGUS, 2+ AI, CRF, high ammonia (w/o liver disease), who was discharged from hospital with VRE bacteremia and fungemia (likely due to infected midline) on linezolid and fluconazole for 14 day course, who presents obtunded from his NH with hypoxia to 60%. He was placed on NRB, given lasix 60mg iv and 1 inch of nitropaste with resultant O2 sat 70-80% on transfer by EMS. Per EMS to ER resident, his NH reported him full code. On arrival to the ER he was making respiratory effort but had very minimal air movement. He was intubated on AC 500*20, peep 5, fio2 1.0, L EJ was placed for access and the patient has a PICC in place. He was given levofloxacin and aspirin.  He was placed on a bair hugger for hypothermia. His vitals were otherwise unremarkable. KUB showed a large amount of bowel gas. CXR was fairly unremarkable. He was on a minimal amount of versed for sedation and was minimally responsive. He was guaiac positive. . The patient's family was informed of his transfer and came to the ER to find him intubated, stating that he was DNR/DNI. After discussion with the resident they decided to keep him intubated for the night and to revisit this in the MA, but to maintain his DNR status.  He was subsequently found to be hypotensive with SBPs in the 70s. Per telephone discussion between the ER resident and the patient's daughter, the family declines central line and declines pressors. He was admitted to the ICU for furhter care and management. . Note that on prior admission the pt also had ARF with urinary retention of 400cc, which resolved with placement of foley catheter. Baseline MS is to be sleepy most of day and respond to questions appropriately, moments of clarity where recognizes family. . ROS: unable to perform given intubated/sedated

    MEDICAL HISTORY: 1. Prostate cancer dx - maintained on lupron (no surgery/xrt). 2. Hypertension 3. Aortic insufficiency (2+). 3. Paroxysmal atrial fibrillation (not on anticoagulation due to many GIBs) 4. Sick sinus syndrome s/p PPM for symptomatic bradycardia 5. Iron deficiency anemia/ anemia of chronic disease 6. Chronic Renal Failure 7. Pulmonary Hypertension (TTE PASP 38mmhg) 8. Secondary hyperparathyroidism (low 25-hydroxyvitamin D, s/p tx) 9. MGUS, IgG monoclonal gammopathy 10. s/p GSW with retained pleural fragment 11. s/p pacemaker placement. 12. Severe bilateral DJD of the knees 13. Gout 14. Refractory UGIB from jejunal AVMs and  duodenal ulcers 15. Encephalopathy and hyperammonemia without evidence of hepatic dysfunction.

    MEDICATION ON ADMISSION: 1. Lactulose 30 mL PO Q 6 hours 2. Calcitriol 0.25 mcg PO QD 3. Atorvastatin 10 mg PO QD 4. Pantoprazole 40 mg PO QD 5. Donepezil 5 mg PO QHS 6. Fluticasone (intranasal) 7. Metoprolol Tartrate 50 mg PO TID 8. Amlodipine 5 mg PO QD 9. Tylenol PRN 10. Ipratropium Bromide Q6 PRN 11. Fluconazole 200 mg PO Q24H 12. Albuterol Q6 PRN 13. Linezolid 600 mg PO Q12H 14. nephrocaps 1 po qday

    ALLERGIES: Percocet / Simvastatin

    PHYSICAL EXAM: Vitals:  cannot read temp,  60,  96/44, 100% on AC 500*20, peep 5, fio2 1.0. General:  appears uncomfortable, opens eyes to stimulation HEENT:  pupils sluggish but reactive Neck:  R EJ in place Chest/CV:  RRR, s1s2, decreased heart sounds Lungs:  CTAB Abd:  soft, nt, nd, +bs Rectal:  per ER guaiac positive Ext:  2+ pitting edema BLE

    FAMILY HISTORY: noncontributory

    SOCIAL HISTORY: living at rehab
    ''',
    
    # break
    
    '''
    CHIEF COMPLAINT: altered mental status, fevers

    PRESENT ILLNESS: 89 year old F nursing home resident, s/p hospitalization for influenza a, has been on O2 since with baseline RA sats 88% and 92% on 2l NC.  She presented to the hospital with altered mental status and fevers.  She is very demented at baseline but is verbal.  On the morning of admission, she had a small amount of her usual breakfast but was very lethargic and according to her nurses, may have aspirated some of her meal.  VS there were T 103.8. BP 141/94 HR 99 O2 Sat 87% on 2LNC.  When EMS arrived, her VS were 92/68 HR 95 (irregular) RR 42 Sat 94% on BVM 100%. . In the ED, she was noted to also be febrile so code sepsis was initiated.  Her code status was confirmed to be full code.  A left subclavian line was placed, and IV normal saline were administered.  Intravenous ceftriaxone, vancomycin, levaquin and clindamycin were administered.  Levophed was also started. Prior to transfer to the floor her SVO2 was 71%. . In the MICU, the levophed was weaned.  The patient was maintained on vanco and zosyn.  CXR showed a flourishing RUL PNA.  Her mental status improved on time of transfer.

    MEDICAL HISTORY: Alzheimer's Depression Hypernatremia Paroxymal Afib h/o Urinary tract infections Cholelithiasis h/o Influenza A/b

    MEDICATION ON ADMISSION: Aricept 10mg daily Vitamin E 80u qPM Zyprexa 5mg qPM Namenda 5mg qPM Tylenol 650 supp q6:prn

    ALLERGIES: Patient recorded as having No Known Allergies to Drugs

    PHYSICAL EXAM: VS: Tm 103.9 Tc BP 117/49 (88-121/34-49) HR 84 RR 30 Sat 100% NC GEN: Elderly asian woman in bed sedate and difficult to arouse, breathing comfortably.  Daughters at bedside. HEENT: Dry MM, eyes closed, no scleral icterus. NECK: Supple, no masses CV: Irregular, normal s1/s2 PUL: Coarse upper airway sounds ABD: Diffuse ttp, +BS, no rebound or guarding. EXT: No edema NEURO: Sedated, arousable, but non-verbal with eyes closed.

    FAMILY HISTORY: N/A

    SOCIAL HISTORY: Permanent resident of nursing home. Son and daughter active in her life and visit daily
    ''',
    
    # break
    
    '''
    CHIEF COMPLAINT: Chest discomfort with exertion referred for cardiac catheterization.

    PRESENT ILLNESS: 65 y.o male with chest discomfort on exertion, was referred for cardiac catheterization. Catheterization report showed an EF of 60%, no MR LAD:40-50%, LCX: 90%, RCA 100%. He was then referred to for cardiac surgery.

    MEDICAL HISTORY: PMH: DM type 2, AF, DJD, anxiety, arthritis. PSH:L TKR, Cervical fusion C2-4, Tonsillectomy, Rt hernia repairx2, Lipoma removal rt chest

    MEDICATION ON ADMISSION: Zetia 10mg daily Celexa 20mg daily Digoxin 0.25mg daily Metformin 500mg Avodart 0.5mg daily Glyburide 2.5mg Verapamil 120mg QPM Lipitor 40mg bedtime Doxazosin 4mg daily Coumadin 10mg 3X/wk Coumadin 15mg 4X/wk ASA 81mg daily SL Nitro PRN

    ALLERGIES: Simvastatin

    PHYSICAL EXAM: Admission Physical Exam Pulse:64, Resp: 18, BP R: 143/77 L:132/91 Height: 5'8", Wgt: 235lbs General: NAD Skin: Unremarkable well healed scar R chest HEENT: Unremarkable, glasses Chest: Lungs CTA bilat Heart: RRR Abdomen: Obese, benign Extremities: Well-perfused, no edema Varicosities: None Neuro: None focal Pulses: Femoral, BP, PT Radial equal bilaterally +2 Carotid bruit: none bilaterally

    FAMILY HISTORY: Brother died from MI at age 68 Father had an MI in his early 60s Sister had a stroke at age 58.

    SOCIAL HISTORY:  Tobacco: Quit ppdX33yrs ETOH: 5 drinks/week
    '''
]

# 1 = patient died during the hospital stay, 0 = patient survived
example_labels = [1, 1, 0, 0]

sample = pd.DataFrame(zip(example_texts, example_labels), columns=['text', 'hospital_expire_flag'])

In [None]:
ecco_out = ecco_lm(tokenizer(sample['text'].iloc[0], return_tensors='pt'))
nmf2 = ecco_out.run_nmf(n_components=12)
nmf2.explore()

We can also restrict the factorization to the first layer only. According to Alammar, the first layer of a Transformer model tends to do much of the early filtering and prioritization of tokens to be used in later representations.

In [None]:
nmf3 = ecco_out.run_nmf(n_components=12, from_layer=0, to_layer=1)
nmf3.explore()

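
To make the layer restriction concrete, here is a small pure-Python sketch of what selecting a layer range before factorization might look like. It assumes that `from_layer`/`to_layer` behave like a half-open Python slice (as the `from_layer=0, to_layer=1` call for "first layer only" suggests) and that negative activations are clipped to zero so the matrix is valid input for NMF. The `select_layers` helper and the toy values are hypothetical and are not `ecco`'s actual code.

```python
def select_layers(activations, from_layer=None, to_layer=None):
    """Stack neuron rows from layers in [from_layer, to_layer), clipping negatives.

    `activations` is indexed as activations[layer][neuron][position].
    """
    start = 0 if from_layer is None else from_layer
    stop = len(activations) if to_layer is None else to_layer
    if not 0 <= start < stop <= len(activations):
        raise ValueError("expected 0 <= from_layer < to_layer <= number of layers")
    rows = []
    for layer in activations[start:stop]:
        for neuron in layer:
            # NMF needs non-negative input, so clip negative activations to zero
            rows.append([max(value, 0.0) for value in neuron])
    return rows

# Toy activations: 3 layers, 2 neurons per layer, 3 token positions
toy = [
    [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]],  # layer 0
    [[1.0, 0.2, 0.0], [-0.1, 0.6, 0.7]],   # layer 1
    [[0.4, 0.4, 0.4], [0.9, -0.9, 0.0]],   # layer 2
]

first_only = select_layers(toy, from_layer=0, to_layer=1)
all_layers = select_layers(toy)
print(len(first_only), len(all_layers))  # 2 6
```

Under these assumptions, factoring only the first layer works on far fewer neuron rows than factoring every layer, so the resulting factors reflect early-layer behaviour alone.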