# Using MetaMap with Python to Access the UMLS Metathesaurus: A Quick Start Guide
Last updated on Feb 25, 2022
The National Library of Medicine (NLM) created a tremendous resource called the Unified Medical Language System (UMLS). The UMLS is a suite of data and software resources for processing, representing, and analyzing biomedical text. One exciting use case that we'll explore here is using the UMLS to identify signs and symptoms documented in the text of clinical encounter notes.

MetaMap is a software tool that identifies concepts from the UMLS Metathesaurus in the text of clinical notes. MetaMap runs as a standalone command-line program (it is largely written in Prolog), and there is a convenient Python wrapper for it: PyMetaMap, written by Anthony Rios. Other Python wrappers around MetaMap exist, but PyMetaMap was the most straightforward to get running on my machine, so that's what we'll use here.

Original source: https://gweissman.github.io/post/using-metamap-with-python-to-access-the-umls-metathesaurus-a-quick-start-guide/

## Step 1: Installation

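PyMetaMap only wraps MetaMap, so you also need a local MetaMap installation; MetaMap is downloaded separately from NLM and requires a UMLS license. Install the wrapper directly from GitHub (the leading `!` runs the line as a shell command inside a Jupyter notebook):

```python
!pip install git+https://github.com/AnthonyMRios/pymetamap.git
```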

```
Successfully installed pymetamap-0.2
```


## Step 2: Get some text data
Because working with real clinical notes is often challenging due to de-identification requirements, here are a few sample notes to work with. They are completely made up and not based on any real people.

```python
note1 = "Mrs. Jones came in today complaining of a lot of chest pain. She denies shortness of breath and fevers."

note2 = "Mr. Smith has been having pain in his knee for many weeks. It hurts when he walks. There is no swelling, erythema, or micromotion tenderness."

note3 = "Sandy Lemon has been having headaches for two months that are associated "

# Put them all together in a list
note_list = [note1, note2, note3]
```

Now load the necessary modules and set up the relevant servers in the background:

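```python
# Load MetaMap
from pymetamap import MetaMap

# Import os to make system calls
import os

# For pausing
from time import sleep

# Paths to the local MetaMap installation (adjust these to match your machine)
metamap_base_dir = '/gwshare/umls_2021/metamap/public_mm/'
metamap_bin_dir = 'bin/metamap20'             # MetaMap launcher script
metamap_pos_server_dir = 'bin/skrmedpostctl'  # part-of-speech tagger server control script
metamap_wsd_server_dir = 'bin/wsdserverctl'   # word sense disambiguation server control script
```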


```python
# Start the helper servers that MetaMap relies on
os.system(metamap_base_dir + metamap_pos_server_dir + ' start')  # Part-of-speech tagger
os.system(metamap_base_dir + metamap_wsd_server_dir + ' start')  # Word sense disambiguation

# Sleep a bit to give these servers time to start up
sleep(60)
```
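A fixed `sleep(60)` either wastes time or turns out too short on a slow machine. As an alternative, here is a minimal sketch that polls each server's TCP port until it accepts connections. `wait_for_port` is a hypothetical helper (not part of PyMetaMap), and the port numbers below (1795 for the part-of-speech tagger, 5554 for the WSD server) are assumptions based on MetaMap's usual defaults; check your installation's configuration.

```python
import socket
import time

# ASSUMED default ports for MetaMap's helper servers; verify against your install.
TAGGER_PORT = 1795  # SKR/MedPost part-of-speech tagger server
WSD_PORT = 5554     # word sense disambiguation server


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Hypothetical helper: poll until a TCP server accepts connections.

    Returns True as soon as a connection succeeds, or False once `timeout`
    seconds have passed without a successful connection.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the server is listening; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the server is not ready yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

With these defined, the `sleep(60)` line could become `wait_for_port('localhost', TAGGER_PORT)` followed by `wait_for_port('localhost', WSD_PORT)`, raising an error if either call returns `False` instead of carrying on with a server that never came up.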

Set up a MetaMap object to do the work. If you are running into memory issues with this approach, see some tips here: https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/Docs/OutOfMemory.pdf

```python
metam = MetaMap.get_instance(metamap_base_dir + metamap_bin_dir)
```

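Before creating the MetaMap object, it can help to confirm that the configured path really points at an executable file, so a wrong `metamap_base_dir` produces a clear error message. The sketch below uses a hypothetical helper, `check_metamap_binary`, built only on the standard library; it is not part of PyMetaMap.

```python
import os


def check_metamap_binary(base_dir: str, rel_path: str) -> str:
    """Hypothetical helper: return the full MetaMap launcher path or raise a clear error.

    `base_dir` and `rel_path` play the roles of metamap_base_dir and metamap_bin_dir.
    """
    # os.path.join works whether or not base_dir ends with a slash.
    path = os.path.join(base_dir, rel_path)
    if not os.path.isfile(path):
        raise FileNotFoundError(f"MetaMap binary not found at {path!r}")
    if not os.access(path, os.X_OK):
        raise PermissionError(f"MetaMap binary at {path!r} is not executable")
    return path
```

For example: `metam = MetaMap.get_instance(check_metamap_binary(metamap_base_dir, metamap_bin_dir))`.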

In [None]:
cons, errs = metam.extract_concepts(note_list,
                                word_sense_disambiguation = True,
                                restrict_to_sts = ['sosy'], # signs and symptoms
                                composite_phrase = 1, # for memory issues
                                prune = 30)
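PyMetaMap drives MetaMap as a subprocess, translating the keyword arguments of `extract_concepts` into command-line flags. The sketch below approximates that translation for the options used above, based on my reading of PyMetaMap's subprocess backend: `-N` for fielded MMI output, `-Q` for composite phrases, `-y` for word sense disambiguation, `-J` to restrict semantic types, `--prune`, and `--sldi` for one-input-per-line files. `build_metamap_command` is hypothetical and the exact flag mapping is an assumption; check the MetaMap documentation for your version before relying on it.

```python
from typing import List, Sequence


def build_metamap_command(
    metamap_path: str,
    input_path: str,
    output_path: str,
    word_sense_disambiguation: bool = False,
    restrict_to_sts: Sequence[str] = (),
    composite_phrase: int = 4,
    prune: int = 0,
) -> List[str]:
    """Hypothetical sketch of the MetaMap command implied by extract_concepts' options."""
    cmd = [metamap_path, "-N"]  # ASSUMED: -N requests fielded MMI output
    if composite_phrase > 0:
        cmd += ["-Q", str(composite_phrase)]  # ASSUMED: max composite phrase length
    if word_sense_disambiguation:
        cmd.append("-y")  # ASSUMED: enable word sense disambiguation
    if restrict_to_sts:
        cmd += ["-J", ",".join(restrict_to_sts)]  # ASSUMED: restrict to semantic types
    if prune:
        cmd += ["--prune", str(prune)]  # ASSUMED: cap the candidate set size
    cmd.append("--sldi")  # ASSUMED: single-line-delimited input
    cmd += [input_path, output_path]
    return cmd
```

Printing the resulting list is a quick way to see what a given combination of options asks MetaMap to do.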
                                

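`extract_concepts` returns a list of concepts and an error value (named `cons` and `errs` above). A natural next step is to collect the concepts found in each note. The sketch below groups preferred names by the `index` field of each record, which holds the identifier MetaMap reports for each input line. To run without MetaMap it defines a stand-in `ConceptMMI` namedtuple whose field names are assumed to match PyMetaMap's; `concepts_by_note` is a hypothetical helper.

```python
from collections import defaultdict, namedtuple
from typing import Dict, Iterable, List

# Stand-in for PyMetaMap's ConceptMMI record so this sketch runs without MetaMap.
# ASSUMPTION: these field names mirror PyMetaMap's ConceptMMI namedtuple.
ConceptMMI = namedtuple(
    "ConceptMMI",
    ["index", "mm", "score", "preferred_name", "cui", "semtypes",
     "trigger", "location", "pos_info", "tree_codes"],
)


def concepts_by_note(concepts: Iterable) -> Dict[str, List[str]]:
    """Hypothetical helper: group preferred concept names by note index."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for concept in concepts:
        # Skip records that carry no CUI (e.g. acronym/abbreviation lines).
        if getattr(concept, "cui", None) is None:
            continue
        grouped[concept.index].append(concept.preferred_name)
    return dict(grouped)
```

With a real run you would call `concepts_by_note(cons)`; passing your own identifiers through the `ids` argument of `extract_concepts` makes it easier to map the groups back to the original notes.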