# Chapter 12

# 12.3.4. Handling abbreviations and acronyms

Using abbreviation dictionaries

In [1]:
import re
import string
# Define an abbreviation dictionary
abbreviation_dict = {
    "BP": "blood pressure",
    "HR": "heart rate",
    "ECG": "electrocardiogram",
    "AML": "acute myeloid leukemia"
}

# Expand abbreviations by whole-token dictionary lookup
def expand_abbreviation(text, abbr_dict):
    # Tokenize: runs of word characters, or single punctuation characters
    tokens = re.findall(r'\b\w+\b|[^\s\w]', text)
    # Replace each token found in the dictionary; keep all others unchanged
    expanded_tokens = [abbr_dict.get(token, token) for token in tokens]
    # Rejoin: words get a leading space, punctuation attaches to the preceding
    # token (so original spacing around characters like "(" or "/" is not preserved)
    return ''.join(token if token in string.punctuation else ' ' + token
                   for token in expanded_tokens).strip()

# Example biomedical text
text = "Patient has elevated BP and HR; ECG is abnormal."

# Expand abbreviations
expanded_text = expand_abbreviation(text, abbreviation_dict)
print(expanded_text)

Patient has elevated blood pressure and heart rate; electrocardiogram is abnormal.
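Because `expand_abbreviation` re-joins tokens with single spaces, it can change the text around characters such as `/` or `(`: for example, `120/80` comes back as `120/ 80`. A minimal alternative sketch, assuming the same dictionary format, substitutes whole-word matches in place with one compiled regular expression, so everything other than the abbreviations is left untouched. The function name `expand_with_regex` is hypothetical.

```python
import re

def expand_with_regex(text, abbr_dict):
    """Replace whole-word abbreviations in place, leaving all other text unchanged."""
    if not abbr_dict:
        return text
    # Longest keys first so that overlapping alternatives prefer the longer match
    keys = sorted(abbr_dict, key=len, reverse=True)
    # \b on both sides: "BP" matches, but the "BP" inside "SBP" does not
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, keys)) + r")\b")
    # Look up each matched abbreviation in the dictionary
    return pattern.sub(lambda m: abbr_dict[m.group(0)], text)

abbreviation_dict = {
    "BP": "blood pressure",
    "HR": "heart rate",
    "ECG": "electrocardiogram",
    "AML": "acute myeloid leukemia",
}

print(expand_with_regex("Patient has elevated BP and HR; ECG is abnormal.", abbreviation_dict))
print(expand_with_regex("BP 120/80 mmHg (HR 72); SBP not recorded.", abbreviation_dict))
```

Matching is case-sensitive, so lowercase words that happen to spell an abbreviation are not expanded; whether that is desirable depends on the corpus.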


Contextual disambiguation

In [2]:
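import torch
from transformers import pipeline

# Pick the best available device: Apple Silicon (MPS), CUDA GPU, or CPU
if torch.backends.mps.is_available():
    device = "mps"
elif torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

# Load a pre-trained BERT model for masked language modeling
fill_mask = pipeline("fill-mask", model="bert-base-uncased", device=device)
mask = fill_mask.tokenizer.mask_token  # "[MASK]" for BERT

# Example sentences
sentences = [
    "AML is a type of cancer affecting the blood and bone marrow.",
    "He is researching advanced machine learning (AML) techniques."
]

# Mask "AML" and let the model propose tokens that fit the context.
# Note: fill-mask predicts a single token for the masked position.
for sentence in sentences:
    print(f"Original: {sentence}")
    masked_sentence = sentence.replace("AML", mask)
    predictions = fill_mask(masked_sentence, top_k=3)  # top 3 predictions
    print(f"Predictions for '{mask}':")
    for pred in predictions:
        print(f"  {pred['sequence']} ({pred['score']:.4f})")
    print()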

Original: AML is a type of cancer affecting the blood and bone marrow.
Predictions for '[MASK]':
  it is a type of cancer affecting the blood and bone marrow. (0.8779)
  this is a type of cancer affecting the blood and bone marrow. (0.0432)
  cancer is a type of cancer affecting the blood and bone marrow. (0.0287)

Original: He is researching advanced machine learning (AML) techniques.
Predictions for '[MASK]':
  he is researching advanced machine learning ( cad ) techniques. (0.1885)
  he is researching advanced machine learning ( ai ) techniques. (0.1468)
  he is researching advanced machine learning ( ada ) techniques. (0.0763)
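The predictions above are single word-piece tokens (`it`, `this`, `cad`, `ai`), so the fill-mask model indicates what kind of word fits the slot but cannot by itself produce a multi-word expansion such as "acute myeloid leukemia". One simple way to choose between known expansions is to score each candidate by how many of its cue words appear in the sentence, in the spirit of a simplified Lesk-style overlap. This is a swapped-in technique, not the approach used above; the sense inventory, its cue words, and the function name `disambiguate` are hypothetical and hand-picked for the two example sentences.

```python
import re

# Hypothetical sense inventory: each expansion is paired with cue words
# that tend to co-occur with it (hand-picked for illustration only)
SENSE_INVENTORY = {
    "AML": {
        "acute myeloid leukemia": {"cancer", "blood", "bone", "marrow", "leukemia", "patient"},
        "advanced machine learning": {"machine", "learning", "techniques", "model", "data", "researching"},
    },
}

def disambiguate(abbr, sentence, inventory=SENSE_INVENTORY):
    """Return the expansion whose cue words overlap most with the sentence, or None."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    best, best_score = None, 0
    for expansion, cues in inventory.get(abbr, {}).items():
        score = len(context & cues)  # number of cue words present in the sentence
        if score > best_score:
            best, best_score = expansion, score
    return best  # None when no cue word is present or the abbreviation is unknown

sentences = [
    "AML is a type of cancer affecting the blood and bone marrow.",
    "He is researching advanced machine learning (AML) techniques.",
]
for sentence in sentences:
    print(f"{sentence} -> {disambiguate('AML', sentence)}")
```

Overlap counting is transparent but only as good as the cue lists; contextual embeddings from a model such as the one above could replace the hand-written cues in a more robust variant.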



Rule-based contextual expansion

In [3]:
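import re

# Rule-based, context-aware abbreviation expansion:
# expand an abbreviation only when a cue word appears in the same text
def rule_based_expansion(text):
    # A text that mentions mmHg: "BP" means blood pressure
    if "mmHg" in text:
        text = re.sub(r"\bBP\b", "blood pressure", text)  # whole word only, so "SBP" is untouched
    # A text that mentions leukemia: "AML" means acute myeloid leukemia
    if "leukemia" in text:
        text = re.sub(r"\bAML\b", "acute myeloid leukemia", text)
    return text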

# Example text
clinical_text = "BP was recorded as 120/80 mmHg. The leukemia diagnosis was confirmed as AML type."

# Apply rule-based expansion
expanded_text = rule_based_expansion(clinical_text)
print(expanded_text)

blood pressure was recorded as 120/80 mmHg. The leukemia diagnosis was confirmed as acute myeloid leukemia type.
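Hard-coded `if` statements grow quickly as abbreviations are added. A minimal sketch of the same idea, assuming each rule can be expressed as data (an abbreviation, its expansion, and a regular expression for the context cue that licenses it), keeps the rules in a list and applies only those whose cue matches the original text. The `ExpansionRule` type, the rule list (including the `HR`/`bpm` rule), and `apply_rules` are hypothetical.

```python
import re
from typing import NamedTuple

class ExpansionRule(NamedTuple):
    abbr: str       # abbreviation as written in the text
    expansion: str  # full form to substitute
    cue: str        # regex that must match somewhere in the original text

# Hypothetical rule set; extend or replace with domain-specific rules
RULES = [
    ExpansionRule("BP", "blood pressure", r"\bmmHg\b"),
    ExpansionRule("HR", "heart rate", r"\bbpm\b"),
    ExpansionRule("AML", "acute myeloid leukemia", r"\bleuka?emia\b"),
]

def apply_rules(text, rules=RULES):
    """Expand each abbreviation whose context cue is found in the original text."""
    original = text  # check cues against the input, not against earlier expansions
    for rule in rules:
        if re.search(rule.cue, original, flags=re.IGNORECASE):
            # Whole-word replacement so that, e.g., "SBP" is left alone
            text = re.sub(rf"\b{re.escape(rule.abbr)}\b", rule.expansion, text)
    return text

clinical_text = ("BP was recorded as 120/80 mmHg and HR as 72 bpm. "
                 "The leukemia diagnosis was confirmed as AML type.")
print(apply_rules(clinical_text))
```

Checking cues against the original text keeps the result independent of rule order, since one rule's expansion (e.g. "acute myeloid leukemia") cannot trigger another rule's cue.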
