In [2]:
import sys
sys.path.insert(0, "..")

# This notebook provides a brief introduction to using QuickUMLS in medspacy, as well as details on how it can be used in any spacy pipeline
### Data: A full QuickUMLS resource of the UMLS is not provided here, as this would violate license agreements.  However, below are some pointers on how to generate such resources.  This medspacy repo includes a small sample of UMLS (RRF files) containing fewer than 100 concepts, which can be found here:
https://www.nlm.nih.gov/research/umls/new_users/online_learning/Meta_006.html
### Usage: The cells below show how to use the QuickUMLS component on its own or in combination with other medspacy components out of the box, such as `medspacy.context` for detecting semantic modifiers and attributes of entities, including negation, uncertainty and others.  Section detection is also demonstrated, so that section information can be used together with entity extraction.
### Generating QuickUMLS resources: Given RRF UMLS files, you can generate your own QuickUMLS resources with parameters such as language, character case and more.  To see more, consult the documentation here from the original QuickUMLS repo:
https://github.com/Georgetown-IR-Lab/QuickUMLS
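
As a rough sketch of what generating a resource might look like, the snippet below only builds (and prints) an install command; it does not run it. The module name `quickumls.install` and the `--lowercase` and `--language` flags are written from memory of the QuickUMLS README and should be checked against that documentation. The helper `build_install_command` and both paths are hypothetical placeholders.

```python
import sys


def build_install_command(umls_dir, dest_dir, lowercase=True, language="ENG"):
    """Build the argument list for a QuickUMLS install run (hypothetical helper)."""
    # Assumption: QuickUMLS exposes its installer as `python -m quickumls.install`
    cmd = [sys.executable, "-m", "quickumls.install", umls_dir, dest_dir]
    if lowercase:
        # Assumption: this flag lowercases all terms before indexing
        cmd.append("--lowercase")
    # Assumption: this flag selects the UMLS language code to index
    cmd += ["--language", language]
    return cmd


# Placeholder paths; replace with the folder holding the RRF files
# and the folder where the QuickUMLS resource should be written.
cmd = build_install_command("path/to/umls_rrf", "path/to/quickumls_out")
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run` once the flags have been confirmed for the installed QuickUMLS version.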

## As of now, QuickUMLS and its dependencies are only set up by default in medspacy for Linux and macOS.  Hopefully a later release will address Windows, but for now, you can follow the steps here: [windows_and_quickumls.md](../windows_and_quickumls.md)

In [3]:
import sys

import spacy
import medspacy
import nltk

from medspacy.util import DEFAULT_PIPENAMES
from medspacy.visualization import visualize_ent
from medspacy.section_detection import Sectionizer

In [4]:
print('Running on platform: {}'.format(sys.platform))

Running on platform: darwin


# Enable the QuickUMLS component by name since it is not enabled by default

In [5]:
medspacy_pipes = DEFAULT_PIPENAMES.copy()

if 'medspacy_quickumls' not in medspacy_pipes: 
    medspacy_pipes.add('medspacy_quickumls')
    
print(medspacy_pipes)
    
nlp = medspacy.load(enable = medspacy_pipes)

{'medspacy_target_matcher', 'medspacy_context', 'medspacy_quickumls', 'medspacy_tokenizer', 'medspacy_pyrush'}




Loading QuickUMLS resources from a default SAMPLE of UMLS data from here: /Users/alecchapman/Code/medspacy/medspacy/resources/quickumls/QuickUMLS_SAMPLE_lowercase_POSIX_unqlite


# Check which pipe components have been enabled.  This way we ensure that the QuickUMLS matcher is in the list

In [6]:
nlp.pipe_names

['medspacy_pyrush',
 'medspacy_target_matcher',
 'medspacy_quickumls',
 'medspacy_context']

# First, let's see a visualization of one of the concepts in the small sample of UMLS provided.  This concept is "dipalmitoyllecithin", which is Concept Unique Identifier (CUI) C0000039 in UMLS.

In [7]:
concept_text = 'Decreased dipalmitoyllecithin content found in lung specimens'

In [8]:
doc = nlp(concept_text)

In [9]:
visualize_ent(doc)

## However, there is additional metadata about any concept extracted by QuickUMLS.  For example, a CUI can belong to multiple Semantic Types, and this concept belongs to more than one.  Additionally, since QuickUMLS performs approximate matching, each entity records the similarity between the matched text and the corresponding term in the UMLS resource.  The score can be at most 1.0; in this case the reported similarity is about 0.89.

In [10]:
for ent in doc.ents:
    print('Entity text : {}'.format(ent.text))
    print('Label (UMLS CUI) : {}'.format(ent.label_))
    print('Similarity : {}'.format(ent._.similarity))
    print('Semtypes : {}'.format(ent._.semtypes))

Entity text : dipalmitoyllecithin
Label (UMLS CUI) : C0000039
Similarity : 0.8888888888888888
Semtypes : {'T119', 'T121'}
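
To give some intuition for a similarity score like the one above, here is a minimal sketch of one common approximate-matching measure: Jaccard similarity over character trigrams. This is an illustration under stated assumptions, not QuickUMLS's actual implementation; its feature extraction and configured measure may differ, so the values from this sketch are not expected to reproduce the scores printed in this notebook. The helper names `char_ngrams` and `jaccard_similarity` are hypothetical.

```python
def char_ngrams(text, n=3):
    """Return the set of lowercase character n-grams in `text`."""
    text = text.lower()
    if len(text) < n:
        # Short strings are treated as a single feature
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def jaccard_similarity(a, b, n=3):
    """Jaccard similarity: shared n-grams divided by all distinct n-grams."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Identical strings share every trigram
same = jaccard_similarity("dipalmitoyllecithin", "dipalmitoyllecithin")
# A spelling variant shares only some trigrams
variant = jaccard_similarity("dipalmitoyllecithin", "dipalmitoyl lecithin")
print(same, variant)
```

Under this measure a lexical variant still scores well above zero, which is what allows near matches to be linked to a concept when the score clears a chosen threshold.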


# So this is an example of how to use QuickUMLS on its own.  What if we want to see negation as well?  Remember that medspacy enables the `medspacy.context` component by default, so it is already in this list and we did not need to enable it explicitly.  Further, notice that the QuickUMLS component is ordered just before context.  This ensures that the entities are added to the spacy `Doc` before `context` runs.

In [11]:
nlp.pipe_names

['medspacy_pyrush',
 'medspacy_target_matcher',
 'medspacy_quickumls',
 'medspacy_context']
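
The ordering requirement can also be checked programmatically rather than by eye. The sketch below uses a hypothetical helper, `runs_before`, on a plain list copied from the output above; in the notebook one would pass `nlp.pipe_names` instead.

```python
def runs_before(pipe_names, first, second):
    """Return True if component `first` appears earlier than `second`."""
    names = list(pipe_names)
    return names.index(first) < names.index(second)


# Copied from the `nlp.pipe_names` output shown above
pipe_names = [
    "medspacy_pyrush",
    "medspacy_target_matcher",
    "medspacy_quickumls",
    "medspacy_context",
]

# Entities must exist before context can attach modifiers to them
print(runs_before(pipe_names, "medspacy_quickumls", "medspacy_context"))
```

`list.index` raises a `ValueError` if either component is missing, which doubles as a check that both are actually in the pipeline.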

# Now let's come up with a different example with negation on a different lexical variant for the same UMLS concept in the relatively small "sample" resource.

In [12]:
negation_text = 'No findings of Dipalmitoyl Phosphatidylcholine in pulmonary specimen.'

In [13]:
negation_doc = nlp(negation_text)

In [14]:
for ent in negation_doc.ents:
    print('Entity text : {}'.format(ent.text))
    print('Label (UMLS CUI) : {}'.format(ent.label_))
    print('Similarity : {}'.format(ent._.similarity))
    print('Semtypes : {}'.format(ent._.semtypes))

Entity text : Dipalmitoyl Phosphatidylcholine
Label (UMLS CUI) : C0000039
Similarity : 0.78125
Semtypes : {'T119', 'T121'}
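
The Semantic Type identifiers (TUIs) above are easier to read with their names attached. The sketch below uses a small hand-written lookup table; the names are given from memory of the UMLS Semantic Network and should be verified against the UMLS release in use. `SEMTYPE_NAMES` and `describe_semtypes` are hypothetical names, not part of medspacy or QuickUMLS.

```python
# Hypothetical lookup table; names are an assumption recalled from the
# UMLS Semantic Network and should be verified against your UMLS release.
SEMTYPE_NAMES = {
    "T119": "Lipid",
    "T121": "Pharmacologic Substance",
}


def describe_semtypes(semtypes):
    """Return sorted 'TUI (name)' strings for a collection of TUIs."""
    return sorted(
        "{} ({})".format(tui, SEMTYPE_NAMES.get(tui, "unknown"))
        for tui in semtypes
    )


# Same set as printed for the entity above
print(describe_semtypes({"T119", "T121"}))
```

In the notebook, the argument would be `ent._.semtypes` for each entity in `negation_doc.ents`.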


In [15]:
visualize_ent(negation_doc)

## Note that the `context` component stores attributes about each entity, such as negation, as custom extension attributes under the "underscore" (`_`), which can be examined like this:

In [16]:
for ent in negation_doc.ents:
    if any((ent._.is_negated, ent._.is_uncertain, ent._.is_historical, ent._.is_family, ent._.is_hypothetical)):
        print("'{0}' modified by {1} in: '{2}'".format(ent, ent._.modifiers, ent.sent))
        print()

'Dipalmitoyl Phosphatidylcholine' modified by (<ConTextModifier> [No findings of, NEGATED_EXISTENCE],) in: 'No findings of Dipalmitoyl Phosphatidylcholine in pulmonary specimen.'
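
When only the names of the active attributes are needed, a small helper can collect them. This is a sketch: `active_context_flags` is a hypothetical function, and the stand-in entity built with `SimpleNamespace` merely mimics the `ent._.<flag>` attributes used in the cell above so that the example runs without a loaded pipeline.

```python
from types import SimpleNamespace

# The same boolean attributes checked in the cell above
CONTEXT_FLAGS = (
    "is_negated",
    "is_uncertain",
    "is_historical",
    "is_family",
    "is_hypothetical",
)


def active_context_flags(ent):
    """Return the names of the context flags that are set on an entity."""
    return [flag for flag in CONTEXT_FLAGS if getattr(ent._, flag, False)]


# Stand-in for a spacy entity; real code would iterate over negation_doc.ents
fake_ent = SimpleNamespace(
    text="Dipalmitoyl Phosphatidylcholine",
    _=SimpleNamespace(
        is_negated=True,
        is_uncertain=False,
        is_historical=False,
        is_family=False,
        is_hypothetical=False,
    ),
)

print(fake_ent.text, active_context_flags(fake_ent))
```

With a real pipeline, calling `active_context_flags(ent)` for each `ent` in `negation_doc.ents` would list which attributes `context` set on that entity.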



## Finally, the location where an entity is found can also be meaningful.  For example, knowing whether a condition comes from a previous visit or the present visit can be important for treatment, billing, etc.  Here is a short example using the same sample UMLS resource.  Note that the sectionizer is not enabled by default, so let's first add it to the pipeline, which already includes QuickUMLS.

In [19]:
# The component is registered under the name 'medspacy_sectionizer'
if 'medspacy_sectionizer' not in nlp.pipe_names:
    print('Creating and adding sectionizer to pipeline...')
    # Add the sectionizer as the final step of the pipeline
    nlp.add_pipe("medspacy_sectionizer")

Creating and adding sectionizer to pipeline...


In [20]:
nlp.pipe_names

['medspacy_pyrush',
 'medspacy_target_matcher',
 'medspacy_quickumls',
 'medspacy_context',
 'medspacy_sectionizer']

# Now we process a short text containing two sections

In [21]:
section_text = """
Family History:
Dipalmitoyl Phosphatidylcholine found in father's pulmonary specimen.

History of Present Illness:
No evidence of Dipalmitoyl Phosphatidylcholine in patient's pulmonary specimen.
"""

In [22]:
section_doc = nlp(section_text)

## First, look at the section titles detected here: 

In [23]:
# Section titles as they appear in the text
print(section_doc._.section_titles)

[, Family History:, History of Present Illness:]


In [24]:
visualize_ent(section_doc)

## As with `context`, we can inspect the section detection information on a per-entity level:

In [25]:
for ent in section_doc.ents:
    print('Entity text : {}'.format(ent.text))
    print('Label (UMLS CUI) : {}'.format(ent.label_))
    print('Similarity : {}'.format(ent._.similarity))
    print('Semtypes : {}'.format(ent._.semtypes))
    print('Section Category : {}'.format(ent._.section_category))
    print('Section Title : {}'.format(ent._.section_title))

Entity text : Dipalmitoyl Phosphatidylcholine
Label (UMLS CUI) : C0000039
Similarity : 0.78125
Semtypes : {'T119', 'T121'}
Section Category : family_history
Section Title : Family History:
Entity text : Dipalmitoyl Phosphatidylcholine
Label (UMLS CUI) : C0000039
Similarity : 0.78125
Semtypes : {'T119', 'T121'}
Section Category : history_of_present_illness
Section Title : History of Present Illness:
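
Once each entity carries a section category, a natural next step is to group entities by section. The sketch below assumes only the attributes shown above (`ent.text`, `ent.label_`, `ent._.section_category`); `group_by_section` is a hypothetical helper, and the stand-in entities mirror the printed output so the example runs without a loaded pipeline.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_section(ents):
    """Map each section category to the (text, CUI) pairs found in it."""
    grouped = defaultdict(list)
    for ent in ents:
        grouped[ent._.section_category].append((ent.text, ent.label_))
    return dict(grouped)


def make_ent(text, cui, category):
    """Build a stand-in for a spacy entity (for illustration only)."""
    return SimpleNamespace(
        text=text, label_=cui, _=SimpleNamespace(section_category=category)
    )


# Values copied from the output above
ents = [
    make_ent("Dipalmitoyl Phosphatidylcholine", "C0000039", "family_history"),
    make_ent("Dipalmitoyl Phosphatidylcholine", "C0000039", "history_of_present_illness"),
]

for category, found in group_by_section(ents).items():
    print(category, found)
```

In the notebook, `group_by_section(section_doc.ents)` would make it straightforward to, for example, set aside concepts mentioned only under family history.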
