# Characterization of Stigmatizing Language in Medical Records

This notebook serves as a quickstart guide for interacting with our stigmatizing language characterization toolkit. As you move through the notebook, you will learn not only *how* to characterize stigmatizing language in medical records, but also *why* we should care about stigmatizing language in medical records.

## Motivation

Widespread disparities in clinical outcomes exist between different demographic groups in the United States. A new line of work in [medical sociology](https://pubmed.ncbi.nlm.nih.gov/34130567/) has demonstrated that physicians more often use stigmatizing language in the electronic medical records of patients from certain groups, such as Black patients, which may exacerbate these disparities. This documentation practice may not only frame patients negatively to future providers, and thus influence their quality of care, but also discourage patients from seeking treatment [altogether](https://rdcu.be/dfbrc).

A major challenge with stigmatizing language in medical records is that it is frequently invoked unconsciously, reflecting an internalized bias by the healthcare provider. The first step in confronting this bias is highlighting it for the provider.

## Prior Work

The idea of using natural language processing to characterize stigmatizing language in medical records is relatively new. Most prior work has relied on simple [word counts](https://rdcu.be/dfbvx). The main shortcoming of such analyses is that they cannot distinguish genuine from innocuous uses of stigmatizing keywords (e.g., "aggressive behavior" vs. "aggressive treatment regimen").
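The toy illustration below makes the shortcoming concrete. The two sentences are hypothetical, built around the example phrases above rather than drawn from any dataset, and `keyword_count` is an illustrative helper, not part of the `stigma` toolkit. A whole-word count returns the same value for both sentences, even though only the first describes the patient.

```python
import re

# Two hypothetical sentences sharing the keyword "aggressive" with different meanings.
sentences = [
    "Patient displayed aggressive behavior toward staff.",
    "Oncology recommends an aggressive treatment regimen.",
]


def keyword_count(text: str, keyword: str) -> int:
    """Count whole-word, case-insensitive occurrences of `keyword` in `text`."""
    return len(re.findall(rf"\b{re.escape(keyword)}\b", text, flags=re.IGNORECASE))


counts = [keyword_count(s, "aggressive") for s in sentences]
print(counts)  # [1, 1]: identical counts, although only the first use is stigmatizing
```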

[Sun et al.](https://doi.org/10.1377/hlthaff.2021.01423) were the first to use machine learning to better characterize instances of stigmatizing language in medical records. Although their work was a step forward, their system was limited to predicting three sentiment-like classes (i.e., positive, negative, out-of-context) and relied on non-neural, non-contextual models.

## Prerequisites

To ensure a smooth experience, we must first make sure you have access to the necessary resources and computation environment.

### Compute Environment

Our toolkit was developed and tested using Python 3.10. We cannot guarantee that other versions of Python will support the entirety of our codebase. That said, we expect the majority of functionality to be preserved as long as you are using Python >= 3.7.
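If you want your own scripts to report which situation you are in, a minimal guard along these lines could sit at the top of them. The helper name `check_python_version` and its status strings are hypothetical; the version thresholds simply restate the guidance above.

```python
import sys

# Version the toolkit was developed and tested with, and the rough minimum
# below which we do not expect most functionality to work.
TESTED_VERSION = (3, 10)
MINIMUM_VERSION = (3, 7)


def check_python_version(version=sys.version_info[:2]):
    """Return "tested", "untested", or "unsupported" for a (major, minor) tuple."""
    if version < MINIMUM_VERSION:
        return "unsupported"
    if version != TESTED_VERSION:
        return "untested"
    return "tested"


print(check_python_version())
```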

We **strongly** recommend using a virtual environment manager (e.g., `conda`) when working with this codebase. This will help limit unintended consequences arising from, e.g., dependency upgrades. The `conda` [documentation](https://conda.io/projects/conda/en/latest/user-guide/getting-started.html) provides all the information you need to set up your first environment.

Once your environment has been created and activated, you can install the `stigma` toolkit with a single command (executed from the root of the repository):

```bash
pip install -e .
```

This command will install all external dependencies, as well as the `stigma` package itself. It is *extremely important* to keep the `-e` (editable install) flag, as it ensures that default data and model paths are preserved.

### Data

To complete this tutorial, no external data (e.g., MIMIC) is required. However, if you are interested in training your own models, you will need to download the [MIMIC-IV (v2.2)](https://physionet.org/content/mimiciv/2.2/) and [MIMIC-IV-Notes (v2.2)](https://physionet.org/content/mimic-iv-note/2.2/) datasets. Access to both of these resources requires IRB training, authenticated credentials, and the signing of a data usage agreement. If all of these criteria are satisfied, you can download the minimally necessary files to train new models or reproduce our past experiments using `bash scripts/acquire/get_mimic.sh`.

### Models

Although you *do not* need access to MIMIC to complete this tutorial, you *do* need access to our pretrained stigmatizing language classifiers. These can be acquired from PhysioNet after completing the same requirements necessary to access MIMIC data. If you already have access to MIMIC, downloading our models should only require that you sign our [data usage agreement](TBD).

We have opted to keep our models behind a gate for a few reasons. First, although we do not expect our training procedure to encode sensitive information from the MIMIC dataset, the risk is nonzero and worth respecting. Second, if we release models in the future that do allow end-users to extract sensitive information, existing end-users will be able to acquire them seamlessly. Finally, by requiring end-users to complete IRB training prior to accessing our models, we can limit the risk of malicious use.

Our models can be acquired in one of two ways. The first (recommended) method requires running a single command, `bash scripts/acquire/get_models.sh`, and entering your PhysioNet credentials when prompted. This script will download the most up-to-date models and place them in the appropriate directories. If you encounter issues downloading our models with the `bash` script, you can instead download them through your [web browser](TBD). After downloading the zip file hosted on PhysioNet, unzip its contents and place them within the `data/resources/` directory. If all goes as expected, you should now have a `data/resources/models/` folder.
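Before moving on, a quick sanity check can confirm that the models landed where the toolkit expects them. This is a minimal sketch assuming you run it from the repository root; `models_installed` is a hypothetical helper, not part of the `stigma` package, and it only checks that `data/resources/models/` exists and is non-empty.

```python
from pathlib import Path


def models_installed(repo_root: str = ".") -> bool:
    """Return True if <repo_root>/data/resources/models/ exists and is non-empty."""
    models_dir = Path(repo_root) / "data" / "resources" / "models"
    return models_dir.is_dir() and any(models_dir.iterdir())


if not models_installed():
    print("Models not found: run `bash scripts/acquire/get_models.sh` from the repository root.")
```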

## Imports

We begin by importing the two primary modules for interfacing easily with our stigmatizing language models.

The `StigmaSearch` module uses regular expressions to identify anchors (i.e., keywords) which are commonly used in a stigmatizing manner. The module can also be used to prepare data for input into downstream models.
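To build intuition for regex-based anchor matching, here is a simplified sketch. It is not the `StigmaSearch` implementation: the keyword list is a small subset of the "adamant" category, and `find_anchors` is a hypothetical helper. The `\b` anchors restrict matches to whole words, so "adamant" does not fire inside "adamantly", and the returned offsets use an exclusive end.

```python
import re

# A small subset of keywords from the "adamant" category.
KEYWORDS = ["adamant", "adamantly", "insist", "insisted", "insists"]

# Case-insensitive alternation wrapped in word boundaries.
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b", flags=re.IGNORECASE)


def find_anchors(document: str):
    """Return (start, end, keyword) tuples for each whole-word keyword match; `end` is exclusive."""
    return [(m.start(), m.end(), m.group(0).lower()) for m in PATTERN.finditer(document)]


anchors = find_anchors("The patient insists she is fine and remains adamant about discharge.")
print(anchors)  # [(12, 19, 'insists'), (44, 51, 'adamant')]
```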

The `StigmaBaselineModel` and `StigmaBertModel` modules act as wrappers around our classifiers, each trained to characterize a different type of stigmatizing language. The `StigmaBertModel` module supports both CPU and GPU inference. Neural models leverage the `torch` and `transformers` libraries, while non-neural models leverage `scikit-learn` and `numpy`.

```python
## Notebook Imports
from IPython.display import display

## Environment
import os
# Disable Hugging Face tokenizers parallelism (avoids fork-related warnings)
os.environ["TOKENIZERS_PARALLELISM"] = "false"

## API Imports
from stigma import settings
from stigma import StigmaSearch
from stigma import StigmaBaselineModel, StigmaBertModel
```

## Examples of Stigmatizing Language

Below, we have created some realistic examples of stigmatizing language usage in an electronic medical record. Although the examples below are short, we note that our toolkit is designed to be applied to full clinical notes without substantial preprocessing. Our models operate on normalized versions of text and do not require any additional metadata regarding the note. There are, however, some important caveats.
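The search output later in this notebook shows context text that is lowercased and no longer carries the original indentation or line breaks. The function below is a rough approximation of that kind of normalization, offered only for intuition; `normalize` is a hypothetical helper, and the toolkit's actual normalization may differ.

```python
import re


def normalize(text: str) -> str:
    """Collapse every run of whitespace (including newlines) to one space, trim, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


print(normalize("\n    Miss Doe is a charming,\n    73 year old"))  # miss doe is a charming, 73 year old
```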

#### Limitation 1: Reliance on Predefined Keywords

Our models are trained to characterize stigmatizing language within statements containing one of our predefined stigmatizing keywords. The full list of keywords is included in `data/resources/keywords/keywords.json`. Experiments suggest that our current models *do not* generalize well to statements containing keywords not seen at training time, or those not containing a stigmatizing keyword altogether. If you are interested in making inferences for statements lacking a keyword or containing a new keyword, we highly recommend you annotate additional data and retrain the existing classifiers with the augmented dataset.
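One practical consequence is that you may want to flag statements the models were never trained to handle before running inference. The sketch below assumes `keywords.json` maps each category to a list of keywords, mirroring the output of `StigmaSearch.show_default_keyword_categories()`; that file layout is an assumption, and `load_keywords` and `covered_categories` are hypothetical helpers. The demo uses a small inline subset so it runs without the repository.

```python
import json
import re


def load_keywords(path: str) -> dict:
    """Load a {category: [keyword, ...]} mapping (assumed file layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def covered_categories(statement: str, keywords: dict) -> list:
    """Return the categories whose keywords appear as whole words in `statement`."""
    lowered = statement.lower()
    return [
        category
        for category, terms in keywords.items()
        if any(re.search(rf"\b{re.escape(t)}\b", lowered) for t in terms)
    ]


# Inline subset of the keyword lists; use load_keywords(...) for the full file.
keywords = {"adamant": ["insist", "insists"], "compliance": ["refused", "noncompliant"]}
for s in ["Patient refused the medication.", "Patient was tearful during the visit."]:
    cats = covered_categories(s, keywords)
    print(s, "->", cats or "no known keyword: outside the models' scope")
```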

#### Limitation 2: Robustness to Domain Shift

In our ACL paper, ["Characterization of Stigmatizing Language in Medical Records"](resources/ACL2023.pdf), we conduct a suite of domain transfer experiments. We evaluate transfer between the public MIMIC-IV dataset and a private dataset consisting of records from our research institution. We observe statistically significant drops in performance when transferring between the datasets. A qualitative error analysis suggests that these drops arise from differences in the joint keyword-label distributions of each dataset, as well as specialty-specific nuances in the usage of language.

```python
examples = [
    """
    Despite my best advice, the patient remains adamant about leaving the hospital today. 
    Social services is aware of the situation.
    """,

    """
    Patient Doe remains lethargic and slow-moving. They insist that they have adhered to a 
    'drug-free lifestyle', though blood tests suggest otherwise.
    """,
    
    """
    Miss Doe is a charming, 73 year old women who visits us today with a chief complaint 
    of heart pain. Unfortunately, not a good historian.
    """
]
```

## Identifying Stigmatizing Language

The first step to characterizing stigmatizing language in the EHR is finding it. As mentioned above, we use a set of predefined keywords, curated by domain experts, to identify candidate instances of stigmatizing language.

In the cell below, we initialize our search module and apply it to the examples enumerated above. 

You have the option of changing the `context_size` parameter in `StigmaSearch`. This value dictates the number of words kept as context on each side (left and right) of the stigmatizing keyword. Our models are trained using a context size of 10, a value chosen based on guidance from our clinical collaborators. Future work may examine the optimality of this parameter choice.
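A symmetric word window can be sketched in a few lines. This is a simplified illustration assuming whitespace tokenization on already-normalized text; `context_window` is a hypothetical helper, and `StigmaSearch` may handle details such as line breaks differently. On the first line of the third example, it reproduces the "charming" context window printed later in this notebook.

```python
def context_window(tokens, index, context_size=10):
    """Return tokens[index] plus up to `context_size` tokens on each side."""
    start = max(0, index - context_size)
    end = min(len(tokens), index + context_size + 1)
    return tokens[start:end]


tokens = "miss doe is a charming, 73 year old women who visits us today with a chief complaint".split()
window = " ".join(context_window(tokens, tokens.index("charming,"), context_size=10))
print(window)  # miss doe is a charming, 73 year old women who visits us today with a
```

The left side is truncated at the start of the text, which is why fewer than 10 words precede "charming,".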

```python
## Initialize Search Wrapper
search_tool = StigmaSearch(context_size=10)

## Run Search
search_results = search_tool.search(examples)

## Show Results
_ = display(search_results)
```

| | document_id | keyword_category | start | end | keyword | text |
|---|---|---|---|---|---|---|
| 0 | 0 | adamant | 49 | 56 | adamant | despite my best advice, the patient remains ad... |
| 1 | 1 | adamant | 57 | 63 | insist | patient doe remains lethargic and slow-moving.... |
| 2 | 1 | compliance | 79 | 86 | adhered | doe remains lethargic and slow-moving. they in... |
| 3 | 2 | other | 19 | 27 | charming | miss doe is a charming, 73 year old women who ... |
| 4 | 2 | other | 136 | 145 | historian | of heart pain. unfortunately, not a good histo... |


## Getting Ready to Make Inferences

If all ran as expected, you should see a pandas DataFrame above containing candidate instances of stigmatizing language drawn from the previously defined examples. The DataFrame contains 6 columns:

* `document_id`: The integer index of the example passed to the search tool.
* `keyword_category`: The potential class of stigmatizing language. Our current models operationalize these classes as separate modeling tasks. We include a taxonomy of the tasks below.
* `start`: The character offset at which the matched keyword begins within the original document.
* `end`: The character offset immediately after the matched keyword within the original document (i.e., `end` is exclusive).
* `keyword`: The matched keyword.
* `text`: A snippet of text which makes up the "context" for the keyword. The length of this text will depend on the `context_size` parameter initialized within the `StigmaSearch` class.
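As a quick check of how `start` and `end` behave, the snippet below slices the first example with the offsets reported in row 0 of the search results. The document string is written out explicitly so that its leading newline and indentation match the triple-quoted string in the examples cell; the offsets index the original, un-normalized text, and the end offset is exclusive.

```python
# First example document, spelled out to match the triple-quoted string exactly.
document = (
    "\n"
    "    Despite my best advice, the patient remains adamant about leaving the hospital today. \n"
    "    Social services is aware of the situation.\n"
    "    "
)

# Offsets reported for this document in row 0 of the search results.
start, end = 49, 56
print(repr(document[start:end]))  # 'adamant'
```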

We can re-use the `StigmaSearch` module to extract the inputs which will be passed to our machine learning classifiers. The `format_for_model()` method ingests the results DataFrame, along with a desired keyword category, and outputs document IDs, keywords, and text windows. Each of these items is formatted as a list.
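Conceptually, this step filters the results to one category and splits the relevant columns into parallel lists. The sketch below is a hypothetical stand-in (`format_for_model_sketch` is not part of the toolkit, and the real method may do additional formatting) applied to toy rows mirroring part of the search output, with abbreviated text. Its IDs and keywords match those printed by `format_for_model()` later in this notebook.

```python
import pandas as pd

# Toy rows mirroring part of the search output (text abbreviated).
search_results = pd.DataFrame({
    "document_id": [0, 2, 2],
    "keyword_category": ["adamant", "other", "other"],
    "keyword": ["adamant", "charming", "historian"],
    "text": [
        "despite my best advice, the patient remains adamant ...",
        "miss doe is a charming, 73 year old women who visits us today with a",
        "of heart pain. unfortunately, not a good historian.",
    ],
})


def format_for_model_sketch(results: pd.DataFrame, keyword_category: str):
    """Hypothetical stand-in: keep one category and return (ids, keywords, texts) as lists."""
    subset = results.loc[results["keyword_category"] == keyword_category]
    return (
        subset["document_id"].tolist(),
        subset["keyword"].tolist(),
        subset["text"].tolist(),
    )


ids, kws, texts = format_for_model_sketch(search_results, "other")
print(ids, kws)  # [2, 2] ['charming', 'historian']
```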

## Types of Stigmatizing Language (Task Taxonomy)

We formulate three independent classification tasks that discriminate between instances of bias based on their impact. Note that we use the "keyword category" field as shorthand to reference each modeling task. Expanded descriptions of the tasks and their respective classes can be found in Table 4 of [our paper](resources/ACL2023.pdf).

| Task | Keyword Category | Classes | Description |
|------|---------------------------|---------|-------------|
| Credibility & Obstinacy | "adamant" | Disbelief, Difficult, Exclude | Insinuates doubt regarding a patient's testimony or describes the patient as obstinate.|
| Compliance | "compliance" | Negative, Neutral, Positive | Patient does not appear to follow medical advice.|
| Descriptors | "other" | Negative, Neutral, Positive, Exclude | Evaluates descriptions of patient behavior and demeanor.|
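If you process several tasks, it can help to keep the taxonomy in code and validate prediction columns against it. The dictionary below is transcribed from the table above; the lowercase class names are an assumption based on the "other" task's prediction columns shown later in this notebook, so check them against your model's actual output. `TASK_CLASSES` and `check_prediction_columns` are hypothetical names.

```python
# Class labels per keyword category, transcribed from the task taxonomy table.
# Lowercase names are an assumption; verify against your model's output columns.
TASK_CLASSES = {
    "adamant": {"disbelief", "difficult", "exclude"},
    "compliance": {"negative", "neutral", "positive"},
    "other": {"negative", "neutral", "positive", "exclude"},
}


def check_prediction_columns(columns, keyword_category):
    """Raise ValueError if the class columns differ from the expected labels; else return True."""
    expected = TASK_CLASSES[keyword_category]
    found = set(columns)
    if found != expected:
        raise ValueError(f"{keyword_category}: expected {sorted(expected)}, got {sorted(found)}")
    return True
```

Call it on the raw output of `predict()`, before extra columns such as `argmax` or `context` are added.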

## Model Setup

Once we have decided which type of stigmatizing language we are interested in characterizing, we can initialize the `StigmaBaselineModel` or `StigmaBertModel` class with one of our pretrained models. There are a few important arguments to be aware of:

* `model`: A string identifier indicating which model to load. You can see the list of default models using `StigmaBaselineModel.show_default_models()` or `StigmaBertModel.show_default_models()`. Alternatively, you can pass a full file-path to a pretrained model.
* `tokenizer` (`StigmaBertModel` only): If you are passing a non-default model, you should include a path to the tokenizer associated with the model. Under the hood, this loads a Hugging Face tokenizer.
* `preprocessing_params` (`StigmaBaselineModel` only): Path to the preprocessing parameters `.joblib` file (named `preprocessing.params.joblib` for the default models), which is generated at the root of the output model directory during baseline model training. Only necessary if you are not using one of the default models.
* `keyword_category`: A string indicating which of the tasks from above you are interested in.
* `batch_size`: An integer indicating how many instances (context windows) will be processed in each batch. The appropriate value depends on your available memory (RAM, or GPU VRAM).
* `device` (`StigmaBertModel` only): Either "cpu" or "cuda" depending on whether you want to use a GPU to accelerate inference.
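Rather than hard-coding `device`, you could pick it at runtime. A minimal sketch, assuming PyTorch may or may not be installed: `pick_device` is a hypothetical helper that relies only on `torch.cuda.is_available()`, and the batch sizes are illustrative rather than tuned recommendations.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch sees a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:  # torch not installed: fall back to CPU
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
# Illustrative batch sizes only; tune them to your RAM or GPU VRAM.
batch_size = 32 if device == "cuda" else 8
print(device, batch_size)
```

The resulting values can then be passed as the `device` and `batch_size` arguments of `StigmaBertModel`.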


```python
## See Keywords and Their Categories
_ = StigmaSearch.show_default_keyword_categories()
```

```text
{
 "adamant": [
  "adamant",
  "adamantly",
  "adament",
  "adamently",
  "claim",
  "claimed",
  "claiming",
  "claims",
  "insist",
  "insisted",
  "insistence",
  "insisting",
  "insists"
 ],
 "compliance": [
  "adherance",
  "adhere",
  "adhered",
  "adherence",
  "adherent",
  "adheres",
  "adhering",
  "compliance",
  "compliant",
  "complied",
  "complies",
  "comply",
  "complying",
  "declined",
  "declines",
  "declining",
  "nonadherance",
  "nonadherence",
  "nonadherent",
  "noncompliance",
  "noncompliant",
  "refusal",
  "refuse",
  "refused",
  "refuses",
  "refusing"
 ],
 "other": [
  "aggression",
  "aggressive",
  "aggressively",
  "agitated",
  "agitation",
  "anger",
  "angered",
  "angers",
  "angrier",
  "angrily",
  "angry",
  "argumentative",
  "argumentatively",
  "belligerence",
  "belligerent",
  "belligerently",
  "charming",
  "combative",
  "combatively",
  "confrontational",
  "cooperative",
  "defensive",
  "delightful",
  "disheveled",
  "drug seeking"
```

```python
## See Available Models
print(">> Baseline (Non-BERT) Models")
_ = StigmaBaselineModel.show_default_models()
print(">> BERT Models")
_ = StigmaBertModel.show_default_models()
```

```text
>> Baseline (Non-BERT) Models
{
 "mimic-iv-discharge_majority_overall": {
  "model_type": "baseline",
  "preprocessing": "/Users/kharrigian/Dev/research/johns-hopkins/ehr-stigma/data/resources/models/mimic-iv-discharge_baseline-majority/preprocessing.params.joblib",
  "tasks": {
   "adamant": "/Users/kharrigian/Dev/research/johns-hopkins/ehr-stigma/data/resources/models/mimic-iv-discharge_baseline-majority/keyword/majority/adamant_fold-0",
   "compliance": "/Users/kharrigian/Dev/research/johns-hopkins/ehr-stigma/data/resources/models/mimic-iv-discharge_baseline-majority/keyword/majority/compliance_fold-0",
   "other": "/Users/kharrigian/Dev/research/johns-hopkins/ehr-stigma/data/resources/models/mimic-iv-discharge_baseline-majority/keyword/majority/other_fold-0"
  }
 },
 "mimic-iv-discharge_majority_keyword": {
  "model_type": "baseline",
  "preprocessing": "/Users/kharrigian/Dev/research/johns-hopkins/ehr-stigma/data/resources/models/mimic-iv-discharge_baseline-statistical/preprocessi
```

```python
## Choose Keyword Category
keyword_category = "other"

## Prepare Inputs for the Model
example_ids, example_keywords, example_text = search_tool.format_for_model(search_results=search_results,
                                                                           keyword_category=keyword_category)

print(f"Example IDs: {example_ids}")
print(f"Example Keywords: {example_keywords}")
print(f"Example Text: {example_text}")

## Initialize Baseline Models
majority_model = StigmaBaselineModel(model="mimic-iv-discharge_majority_overall",
                                     keyword_category=keyword_category,
                                     batch_size=32)
keyword_context_model = StigmaBaselineModel(model="mimic-iv-discharge_logistic-regression_keyword-context",
                                            keyword_category=keyword_category,
                                            batch_size=32)

## Initialize BERT Model (Note the alternative option for specifying model parameters)
bert_model = StigmaBertModel(model=settings.MODELS["mimic-iv-discharge_clinical-bert"]["tasks"][keyword_category],
                             tokenizer="emilyalsentzer/Bio_ClinicalBERT",
                             keyword_category=keyword_category,
                             batch_size=8,
                             device="cpu")
```

```text
Example IDs: [2, 2]
Example Keywords: ['charming', 'historian']
Example Text: ['miss doe is a charming, 73 year old women who visits us today with a', 'of heart pain. unfortunately, not a good historian.']
[Loading Model Parameters]
[Initializing Model Architecture]

Some weights of the model checkpoint at emilyalsentzer/Bio_ClinicalBERT were not used when initializing BertModel: ['cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

[Initializing Model Weights]
```


## Making Inferences

Once the model has been initialized and your data has been prepared, you are ready to characterize stigmatizing language!

Our API makes it as simple as calling a `predict()` method on the lists of text and keywords extracted above. The output is a DataFrame with one row per input instance, where each column holds the predicted probability of one class of the selected task. To make a single class prediction, you can use the pandas `idxmax` method (with `axis=1`) to extract the label with the highest probability. Note that the column order may differ between models, so refer to classes by name rather than by position.
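The snippet below shows the argmax step on its own, using the class probabilities reported for the Keyword + Context classifier in this notebook. Restricting `idxmax` and `max` to an explicit list of class columns keeps them well-defined even after non-numeric columns such as `context` have been added; the `confidence` column is an illustrative addition, not something `predict()` returns.

```python
import pandas as pd

# Class probabilities from the Keyword + Context classifier output in this notebook.
predictions = pd.DataFrame(
    {
        "neutral": [0.018344, 0.237639],
        "positive": [0.885651, 0.07132],
        "exclude": [0.037252, 0.063075],
        "negative": [0.058752, 0.627966],
    }
)

class_columns = list(predictions.columns)  # capture before adding non-numeric columns
predictions["argmax"] = predictions[class_columns].idxmax(axis=1)
predictions["confidence"] = predictions[class_columns].max(axis=1)
print(predictions[["argmax", "confidence"]])
```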

```python
## Iterate Through Models
for model_name, model in zip(["Majority Overall","Keyword + Context","Clinical BERT"],
                             [majority_model, keyword_context_model, bert_model]):

    print(f">> Making Predictions Using {model_name} Classifier")

    ## Run Prediction Procedure
    predictions = model.predict(text=example_text,
                                keywords=example_keywords)

    ## Augment Predictions with Inputs
    predictions["argmax"] = predictions.idxmax(axis=1)
    predictions["context"] = example_text
    predictions["keyword"] = example_keywords

    ## Show Predictions
    _ = display(predictions)
```

```text
>> Making Predictions Using Majority Overall Classifier
[Extracting Negated Keywords]
[Tokenizing Text Data]
[Learning N-Grams within Vocabulary (Tokens w/o Keyword)]
[Applying Phrase Transformation 1/2
[Rephrasing]: 100%|██████████| 2/2 [00:00<00:00, 10094.59it/s]
[Applying Phrase Transformation 2/2
[Rephrasing]: 100%|██████████| 2/2 [00:00<00:00, 5829.47it/s]
[Learning N-Grams within Vocabulary (Tokens w/ Keyword)]
[Applying Phrase Transformation 1/2
[Rephrasing]: 100%|██████████| 2/2 [00:00<00:00, 9137.92it/s]
[Applying Phrase Transformation 2/2
[Rephrasing]: 100%|██████████| 2/2 [00:00<00:00, 11305.40it/s]

[Making Predictions]: 100%|██████████| 1/1 [00:00<00:00, 125.36it/s]
```

| | neutral | positive | exclude | negative | argmax | context | keyword |
|---|---|---|---|---|---|---|---|
| 0 | 0.042567 | 0.174198 | 0.222004 | 0.561231 | negative | miss doe is a charming, 73 year old women who ... | charming |
| 1 | 0.042567 | 0.174198 | 0.222004 | 0.561231 | negative | of heart pain. unfortunately, not a good histo... | historian |

```text
>> Making Predictions Using Keyword + Context Classifier
[Extracting Negated Keywords]
[Tokenizing Text Data]
[Learning N-Grams within Vocabulary (Tokens w/o Keyword)]
[Applying Phrase Transformation 1/2
[Rephrasing]: 100%|██████████| 2/2 [00:00<00:00, 9300.01it/s]
[Applying Phrase Transformation 2/2
[Rephrasing]: 100%|██████████| 2/2 [00:00<00:00, 9709.04it/s]
[Learning N-Grams within Vocabulary (Tokens w/ Keyword)]
[Applying Phrase Transformation 1/2
[Rephrasing]: 100%|██████████| 2/2 [00:00<00:00, 10472.67it/s]
[Applying Phrase Transformation 2/2
[Rephrasing]: 100%|██████████| 2/2 [00:00<00:00, 9565.12it/s]

[Making Predictions]: 100%|██████████| 1/1 [00:00<00:00, 89.94it/s]
```

| | neutral | positive | exclude | negative | argmax | context | keyword |
|---|---|---|---|---|---|---|---|
| 0 | 0.018344 | 0.885651 | 0.037252 | 0.058752 | positive | miss doe is a charming, 73 year old women who ... | charming |
| 1 | 0.237639 | 0.07132 | 0.063075 | 0.627966 | negative | of heart pain. unfortunately, not a good histo... | historian |

```text
>> Making Predictions Using Clinical BERT Classifier
[Running Evaluation]: 100%|██████████| 1/1 [00:00<00:00,  4.68it/s]
```

| | negative | exclude | neutral | positive | argmax | context | keyword |
|---|---|---|---|---|---|---|---|
| 0 | 0.001035 | 0.000714 | 0.000712 | 0.997539 | positive | miss doe is a charming, 73 year old women who ... | charming |
| 1 | 0.988215 | 0.005566 | 0.005839 | 0.000379 | negative | of heart pain. unfortunately, not a good histo... | historian |


## Concluding Thoughts

If you have reached this point, you should now be ready to start using our models! Congratulations!

This notebook and our high-level API are intended to provide an easy entry point to characterizing stigmatizing language in medical records. If you are interested in developing custom solutions, we encourage you to read through and try out some of the code in the `scripts/` and `stigma/` directories. The API you interacted with today supports only a subset of the functionality available throughout the broader package.

If you have ideas for improving the toolkit or encounter any issues with our code, please feel free to submit an issue on our GitHub page or contact [Keith Harrigian](mailto:kharrigian@jhu.edu).