# Overview
One of the most powerful features of spaCy is the ability to add [custom attributes and extensions](https://spacy.io/usage/processing-pipelines#custom-components-attributes) to the `Doc`, `Span`, and `Token` classes. These extensions are stored in the underscore attribute (i.e., `token._`). This allows us to store custom data and implement custom methods that are useful to medspaCy while still using the spaCy API.

MedspaCy adds a number of methods to the underscore attribute for each class. This notebook will walk through what these extensions are and how they can be used.

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import sys

In [3]:
sys.path.insert(0, "..")

## Set up example data
First, we'll load a pipeline and set up a simple example of text to process with some entities.

In [4]:
import medspacy
from medspacy.target_matcher import TargetRule

In [5]:
nlp = medspacy.load(enable=["pyrush", "target_matcher", "context", "sectionizer"])

In [6]:
target_rules = [
    TargetRule("afib", "CONDITION"),
    TargetRule("pneumonia", "CONDITION", pattern=r"community[- ]acquired pneumonia"),
    TargetRule("acute stroke", "CONDITION")
]

In [7]:
nlp.get_pipe("medspacy_target_matcher").add(target_rules)



In [8]:
text = """Past Medical History: Afib and community-acquired pneumonia.
Assessment/Plan: Acute stroke
"""

In [9]:
doc = nlp(text)


In [10]:
doc.ents

(Afib, community-acquired pneumonia, Acute stroke)

# All extensions
You can get a dict containing the extension names and default values or getter/setters for each by the top-level `get_extensions` method:

In [11]:
medspacy.get_extensions()

{'Token': {'window': {'method': <function medspacy._extensions.get_window_token(token, n=1, left=True, right=True)>},
  'section': {'default': None},
  'section_span': {'getter': <function medspacy._extensions.get_section_span_token(token)>},
  'section_category': {'getter': <function medspacy._extensions.get_section_category_token(token)>},
  'section_title': {'getter': <function medspacy._extensions.get_section_title_span_token(token)>},
  'section_body': {'getter': <function medspacy._extensions.get_section_body_span_token(token)>},
  'section_parent': {'getter': <function medspacy._extensions.get_section_parent_token(token)>},
  'section_rule': {'getter': <function medspacy._extensions.get_section_rule_token(token)>}},
 'Span': {'window': {'method': <function medspacy._extensions.get_window_span(span, n=1, left=True, right=True)>},
  'context_attributes': {'getter': <function medspacy._extensions.get_context_attributes(span)>},
  'any_context_attributes': {'getter': <function medsp

In the rest of the notebook, we'll walk through the 3 classes and show each of the extensions.
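
As the output above suggests, each extension is described by a small dict keyed by its kind: `default`, `getter`, or `method`. The following is a minimal sketch of how such a dict could be summarized. It uses a hand-written stand-in for a slice of the real output, so the placeholder lambdas and the helper name `group_by_kind` are assumptions for illustration, not part of medspaCy.

```python
# Hand-written stand-in for a small slice of medspacy.get_extensions().
# The lambdas are placeholders, not medspaCy's real functions.
extensions = {
    "Token": {
        "window": {"method": lambda token, n=1, left=True, right=True: None},
        "section": {"default": None},
        "section_category": {"getter": lambda token: None},
    },
}


def group_by_kind(specs):
    """Map each kind ('default', 'getter', 'method') to a sorted list of names."""
    grouped = {}
    for name, spec in specs.items():
        for kind in spec:
            grouped.setdefault(kind, []).append(name)
    return {kind: sorted(names) for kind, names in grouped.items()}


token_kinds = group_by_kind(extensions["Token"])
print(token_kinds)
```

With a real pipeline, the same helper could presumably be applied to `medspacy.get_extensions()["Token"]`.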

# I. Doc

In [12]:
medspacy.get_doc_extensions().keys()

dict_keys(['sections', 'section_titles', 'section_categories', 'section_spans', 'section_parents', 'section_bodies', 'get_data', 'data', 'ent_data', 'section_data', 'doc_data', 'context_data', 'to_dataframe'])

## Sections
The only default `Doc` extensions relate to the sections of the doc which are identified by the `Sectionizer` class.

#### `doc._.sections`
A list of named tuples representing the different sections in a doc. Each tuple contains:
- `category`: A string giving the normalized name of the section
- `title_span`: The Span of the Doc containing the section header
- `body_span`: The Span of the Doc that begins after the section header
- `section_span`: The entire Span of the section (title + body)
- `section_parent`: The parent section of this section, if any


In [13]:
for section in doc._.sections:
    print(section)
    print()

Section(category=past_medical_history, title=Past Medical History:, body=Afib and community-acquired pneumonia.
, parent=None, rule=SectionRule(literal="PAST MEDICAL HISTORY:", category="past_medical_history", pattern=None, on_match=None, parents=[], parent_required=False))

Section(category=observation_and_plan, title=Assessment/Plan:, body=Acute stroke
, parent=None, rule=SectionRule(literal="Assessment/Plan:", category="observation_and_plan", pattern=None, on_match=None, parents=[], parent_required=False))



In [14]:
section.body_span

Acute stroke

Each of the section attributes can be accessed as a list individually:

In [15]:
doc._.section_categories

['past_medical_history', 'observation_and_plan']

In [16]:
doc._.section_titles

[Past Medical History:, Assessment/Plan:]

In [17]:
doc._.section_bodies

[Afib and community-acquired pneumonia., Acute stroke]

In [18]:
doc._.section_spans

[Past Medical History: Afib and community-acquired pneumonia.,
 Assessment/Plan: Acute stroke]

In [19]:
doc._.section_parents

[None, None]

# II. Span
The `Span` class contains extensions related to the TargetRule used to extract an entity, ConText assertion attributes, and section attributes.

In [20]:
medspacy.get_span_extensions().keys()

dict_keys(['window', 'context_attributes', 'any_context_attributes', 'section', 'section_span', 'section_category', 'section_title', 'section_body', 'section_parent', 'section_rule', 'contains', 'target_rule', 'literal', 'is_negated', 'is_historical', 'is_hypothetical', 'is_family', 'is_uncertain'])

We'll use this ent as an example:

In [21]:
span = doc.ents[1]
span

community-acquired pneumonia

## `span._.target_rule`
The `TargetMatcher` class uses instances of `TargetRule` to define entities to extract from the doc. When an entity is created, the rule which matched the Span is referenced in `span._.target_rule`. This allows you to see which rule generated an entity and to access the metadata from the original rule.

In [22]:
span._.target_rule

TargetRule(literal="pneumonia", category="CONDITION", pattern=community[- ]acquired pneumonia, attributes=None, on_match=None)

In [23]:
span._.target_rule.literal

'pneumonia'

## ConText Attributes
An important part of clinical NLP is identifying whether concepts were actually experienced by the patient or whether they were negated, historical, hypothetical, experienced by someone else, or uncertain. These attributes are asserted by the `ConTextComponent` in medspaCy and added as attributes to each entity, but they can also be set manually or by the `Sectionizer`.

#### `span._.context_attributes`
Get a dict mapping each ConText assertion attribute to its value (default is always `False`).

In [24]:
span._.context_attributes

{'is_negated': False,
 'is_historical': True,
 'is_hypothetical': False,
 'is_family': False,
 'is_uncertain': False}

#### `span._.any_context_attributes`
Often, you want to know if any of these values are True, as this is an indicator to exclude or ignore an entity. `any_context_attributes` is `True` if any of these values have been asserted to be True.

In [25]:
span._.any_context_attributes

True
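
A common use of this flag is to keep only entities with no modifying context. The sketch below illustrates the idea on plain dicts shaped like the `context_attributes` output above. The helper name `is_affirmed` and the attribute values for "Acute stroke" are assumptions made for illustration, and the code is not medspaCy's implementation.

```python
def is_affirmed(context_attributes):
    """Return True when no ConText attribute is set, i.e. the inverse of
    what any_context_attributes reports."""
    return not any(context_attributes.values())


# Stand-ins for (ent.text, ent._.context_attributes) pairs.
entities = [
    (
        "community-acquired pneumonia",
        {
            "is_negated": False,
            "is_historical": True,
            "is_hypothetical": False,
            "is_family": False,
            "is_uncertain": False,
        },
    ),
    (
        "Acute stroke",
        # Hypothetical values: this entity's attributes are not shown in the notebook.
        {
            "is_negated": False,
            "is_historical": False,
            "is_hypothetical": False,
            "is_family": False,
            "is_uncertain": False,
        },
    ),
]

affirmed = [text for text, attrs in entities if is_affirmed(attrs)]
print(affirmed)
```

On a real doc, the equivalent filter would likely be `[ent for ent in doc.ents if not ent._.any_context_attributes]`.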

You can also access each of these attributes individually:

In [26]:
span._.is_negated

False

In [27]:
span._.is_historical

True

In [28]:
span._.is_hypothetical

False

In [29]:
span._.is_family

False

In [30]:
span._.is_uncertain

False

## Sections
Similar to the section attributes in `Doc`, `Span` includes attributes indicating which section of a note it occurs in.

In [31]:
span._.section

Section(category=past_medical_history, title=Past Medical History:, body=Afib and community-acquired pneumonia.
, parent=None, rule=SectionRule(literal="PAST MEDICAL HISTORY:", category="past_medical_history", pattern=None, on_match=None, parents=[], parent_required=False))

In [32]:
span._.section_category

'past_medical_history'

In [33]:
span._.section_title

Past Medical History:

In [34]:
span._.section_body

Afib and community-acquired pneumonia.

In [35]:
span._.section_span

Past Medical History: Afib and community-acquired pneumonia.

In [36]:
span._.section_parent

In [37]:
span._.section_rule

SectionRule(literal="PAST MEDICAL HISTORY:", category="past_medical_history", pattern=None, on_match=None, parents=[], parent_required=False)
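
One way to use these attributes is to group entities by the section they occur in. The sketch below works on hand-written `(text, section_category)` pairs that stand in for `(ent.text, ent._.section_category)`. The pairing of each entity with its section is read off the section bodies shown earlier, and the variable names are illustrative assumptions.

```python
from collections import defaultdict

# Stand-ins for (ent.text, ent._.section_category) for each ent in doc.ents.
entity_sections = [
    ("Afib", "past_medical_history"),
    ("community-acquired pneumonia", "past_medical_history"),
    ("Acute stroke", "observation_and_plan"),
]

# Collect entity texts under the category of the section containing them.
by_section = defaultdict(list)
for text, category in entity_sections:
    by_section[category].append(text)

for category, texts in by_section.items():
    print(category, texts)
```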

## Window
#### `span._.window(n=1, left=True, right=True)`
You often want to look at the context surrounding a concept. One way to do this is looking at the sentence (`span.sent`), but sentence splitting is expensive. An alternative is looking at a fixed window surrounding a concept. You can do this using the custom method `span._.window()`, which returns the superspan surrounding a given span.

By default this method will return a window of one token on each side of the span, but this can be modified to be larger and to exclude either the left or right side.

In [38]:
span

community-acquired pneumonia

In [39]:
span._.window()

and community-acquired pneumonia.

In [40]:
span._.window(2)

Afib and community-acquired pneumonia.

In [41]:
span._.window(2, left=False)

community-acquired pneumonia.

In [42]:
span._.window(2, right=False)

Afib and community-acquired pneumonia
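
The windowing behavior shown above amounts to widening a span's token indices and clamping them to the document boundaries. The following is a minimal pure-Python sketch of that idea over a list of token strings. The function `window` and its `start`/`end` arguments are assumptions for illustration and do not reproduce medspaCy's actual code.

```python
def window(tokens, start, end, n=1, left=True, right=True):
    """Return tokens[start:end] widened by n tokens on the requested sides,
    clamped so the slice never runs past either end of the list."""
    lo = max(start - n, 0) if left else start
    hi = min(end + n, len(tokens)) if right else end
    return tokens[lo:hi]


# Token strings for the first sentence of the example text.
tokens = ["Past", "Medical", "History", ":", "Afib", "and",
          "community", "-", "acquired", "pneumonia", "."]

# "community-acquired pneumonia" covers token indices 6 up to (not including) 10.
print(window(tokens, 6, 10))               # one token on each side
print(window(tokens, 6, 10, 2, left=False))  # right side only; clamped at the end
print(window(tokens, 8, 9))                # a single token ("acquired") works the same way
```

Treating a single token as a span of length one is why the same logic can serve both `span._.window()` and `token._.window()`.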

## Contains
#### `span._.contains(target, regex=True, case_insensitive=True)`
Returns `True` if a target phrase is contained in the text underlying a span (i.e., `span.text`). `target` can be either a string or a list of strings. `regex` and `case_insensitive` define whether to search using regular expressions and whether to ignore case.

In [43]:
span

community-acquired pneumonia

In [44]:
span._.contains(r"community[- ]acquired")

True

In [45]:
span._.contains("community acquired", regex=False)

False

In [46]:
span._.contains(["pneumonia", "pna"])

True
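
The three calls above can be mimicked with Python's `re` module. This is a rough sketch of the described behavior applied to a plain string, assuming that a list target matches when any of its items matches. The standalone function `contains` is written for illustration and is not medspaCy's implementation.

```python
import re


def contains(text, target, regex=True, case_insensitive=True):
    """Return True if any target phrase is found in text."""
    targets = [target] if isinstance(target, str) else list(target)
    flags = re.IGNORECASE if case_insensitive else 0
    for phrase in targets:
        # With regex=False, escape the phrase so it is matched literally.
        pattern = phrase if regex else re.escape(phrase)
        if re.search(pattern, text, flags):
            return True
    return False


span_text = "community-acquired pneumonia"
print(contains(span_text, r"community[- ]acquired"))
print(contains(span_text, "community acquired", regex=False))
print(contains(span_text, ["pneumonia", "pna"]))
```

The second call returns `False` because the literal phrase has a space where the span text has a hyphen, which mirrors the notebook output above.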

# III. Token
Token extensions correspond to section and window attributes of `Span`.

In [47]:
medspacy.get_token_extensions().keys()

dict_keys(['window', 'section', 'section_span', 'section_category', 'section_title', 'section_body', 'section_parent', 'section_rule'])

In [48]:
token = doc[8]
token

acquired

## Section

In [49]:
token._.section

Section(category=past_medical_history, title=Past Medical History:, body=Afib and community-acquired pneumonia.
, parent=None, rule=SectionRule(literal="PAST MEDICAL HISTORY:", category="past_medical_history", pattern=None, on_match=None, parents=[], parent_required=False))

In [50]:
token._.section_category

'past_medical_history'

In [51]:
token._.section_title

Past Medical History:

In [52]:
token._.section_body

Afib and community-acquired pneumonia.

In [53]:
token._.section_span

Past Medical History: Afib and community-acquired pneumonia.

In [54]:
token._.section_parent

In [55]:
token._.section_rule

SectionRule(literal="PAST MEDICAL HISTORY:", category="past_medical_history", pattern=None, on_match=None, parents=[], parent_required=False)

## Window

In [56]:
token

acquired

In [57]:
token._.window()

-acquired pneumonia

In [58]:
token._.window(2)

community-acquired pneumonia.

In [59]:
token._.window(2, left=False)

acquired pneumonia.

In [60]:
token._.window(2, right=False)

community-acquired