# Disorder Recognizer
This project aims to recognize disorders in biomedical text for cohort building and query-based case retrieval. The General Disorder Recognizer (GDR) detects general disorders in biomedical text and maps them to their Concept Unique Identifiers (CUIs) in the Unified Medical Language System (UMLS). The Specific Disorder Recognizer (SDR) detects specific disorders that map to a custom-made or automatically generated list of CUIs. The recognized disorder types are:
- Acquired abnormality
- Anatomical abnormality
- Cell or molecular dysfunction
- Congenital abnormality
- Disease or syndrome
- Experimental model of disease
- Finding
- Injury or poisoning
- Mental or behavioral dysfunction
- Neoplastic process
- Pathologic function
- Sign or symptom

The functions can be easily tweaked to recognize the following semantic groups: 
- Activities & behaviors
- Anatomy
- Chemicals & drugs
- Concepts & ideas
- Devices
- Disorders
- Genes & molecular sequences
- Geographic areas
- Living beings
- Objects
- Occupations
- Organizations
- Phenomena
- Physiology
- Procedures

## Setup

In [1]:
import re
import tls # tls is short for tools.
            # tls.py contains functions for text cleaning.
import csv
import numpy
import pandas
pandas.set_option('display.max_colwidth', 0)

In [2]:
from py4j.java_gateway import JavaGateway # Metamap is a Java program.
gateway = JavaGateway() # Py4j provides a gateway to Metamap.
j = gateway.entry_point # j is short for Java.

## Clean Data

In [3]:
org = pandas.read_csv('org.csv', encoding='latin-1') # org is short for original dataset.
org[:1]

Unnamed: 0,ID,FULLTEXT,TOTAL,EPIH,SUBDH,IPH,CONT,SAH,IVH,SG,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13
0,1.0,"Final Report\n\nCLINICAL INFORMATION: {age_number} year-old man trauma\n\nCOMPARISON: No previous.\n\nPreliminary report given by radiology resident on-call at 12:46 hours.\n\nFINDINGS: The patient is intubated and with an orogastric tube in place.\n\nAcute left convexity subdural hematoma extending from the anterior midline over the frontal\ntemporal and parietal convexities. Maximum thickness of space collection is 0.9 cm.\n\nBifrontal multiple hemorrhagic contusions involving the inferior surface of the right\nfrontal lobe extending into both supraorbital gyri.\n\nLeft superior frontal acute subarachnoid hemorrhage.\n\nLeft inferior parietal small subarachnoid/intraparenchymal bleed.\n\nComplex fracture of the right mastoid bone involving the middle ear. For better assessment,\nrefer to CT facial bones. Adjacent pneumocephalus adjacent to right transverse sinus.\n\nPresence of tiny air pockets adjacent to the right orbital roof. Suspicious for nondisplaced\nfracture along the orbital roof.\n\nIMPRESSION: Multi compartmental acute posttraumatic intracranial bleeds. Right temporal\nbone complex fractures.\n\n{e-signature}",4.0,0.0,1.0,1.0,1.0,1.0,0.0,0.0,,,,


### Cleaner

In [4]:
def cln(_txt): # cln is short for clean.
    """
    Args:
        _txt: text to process

    Returns:
        Cleaned text
    """
    # Blank lines become sentence breaks; single newlines become spaces.
    _txt = str(_txt).replace('\n\n', '.').replace('\n', ' ')
    _txt = re.sub(r'\.+', ".", _txt) # Collapse runs of periods.
    _txt = re.sub(r'\,+', ".", _txt) # Commas also become sentence breaks.
    return tls.nt_cln(_txt) # nt_cln is short for neat clean.
                            # nt_cln makes text neater,
                            # preserving both semantics and syntax.

#### Example

In [5]:
smp = org.iloc[0]['FULLTEXT'] # smp is short for sample.
print(smp)

Final Report

CLINICAL INFORMATION: {age_number} year-old man trauma

COMPARISON: No previous.

Preliminary report given by radiology resident on-call at 12:46 hours.

FINDINGS: The patient is intubated and with an orogastric tube in place.

Acute left convexity subdural hematoma extending from the anterior midline over the frontal
temporal and parietal convexities. Maximum thickness of space collection is 0.9 cm.

Bifrontal multiple hemorrhagic contusions involving the inferior surface of the right
frontal lobe extending into both supraorbital gyri.

Left superior frontal acute subarachnoid hemorrhage.

Left inferior parietal small subarachnoid/intraparenchymal bleed.

Complex fracture of the right mastoid bone involving the middle ear. For better assessment,
refer to CT facial bones. Adjacent pneumocephalus adjacent to right transverse sinus.

Presence of tiny air pockets adjacent to the right orbital roof. Suspicious for nondisplaced
fracture along the orbital roof.

IMPRESSION: Mul

In [6]:
cln(smp)

'final report . clinical information : { age _ number } year - old man trauma . comparison : no previous . preliminary report given by radiology resident on - call at 12 : 46 hours . findings : the patient is intubated and with an orogastric tube in place . acute left convexity subdural hematoma extending from the anterior midline over the frontal temporal and parietal convexities . maximum thickness of space collection is 0.9 cm . bifrontal multiple hemorrhagic contusions involving the inferior surface of the right frontal lobe extending into both supraorbital gyri . left superior frontal acute subarachnoid hemorrhage . left inferior parietal small subarachnoid / intraparenchymal bleed . complex fracture of the right mastoid bone involving the middle ear . for better assessment . refer to ct facial bones . adjacent pneumocephalus adjacent to right transverse sinus . presence of tiny air pockets adjacent to the right orbital roof . suspicious for nondisplaced fracture along the orbital
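
The punctuation handling in `cln` can be checked in isolation. The sketch below repeats its three substitutions on a short string and leaves out `tls.nt_cln`, whose implementation is not shown here. `normalize_punct` is a hypothetical name used only for illustration.

```python
import re

def normalize_punct(text):
    """Repeat the substitutions from cln, without the tls.nt_cln step."""
    # Blank lines become sentence breaks; single newlines become spaces.
    text = str(text).replace('\n\n', '.').replace('\n', ' ')
    # Collapse runs of periods into one period.
    text = re.sub(r'\.+', '.', text)
    # Turn commas (and runs of commas) into periods.
    text = re.sub(r',+', '.', text)
    return text

sample = "COMPARISON: No previous.\n\nFor better assessment,\nrefer to CT facial bones."
print(normalize_punct(sample))
```

Because commas become periods, a clause such as "for better assessment, refer to ct facial bones" ends up as two sentence fragments, which matches the cleaned output above.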

In [7]:
org['i'] = org['ID']
org['rpr'] = org['FULLTEXT'].apply(cln) # rpr is short for report.
org['ttl'] = org['TOTAL']
org['eph'] = org['EPIH']
org['sdh'] = org['SUBDH']
org['iph'] = org['IPH']
org['cnt'] = org['CONT']
org['sah'] = org['SAH']
org['ivh'] = org['IVH']
org['sgh'] = org['SG']
dtt = org[['rpr', 'ttl', 'eph', 'sdh', 'iph', 'cnt', 'sah', 'ivh', 'sgh']][:99].copy() # dtt is short for data.
                                                                                       # copy() avoids writing to a slice of org.
for clm in ['ttl', 'eph', 'sdh', 'iph', 'cnt', 'sah', 'ivh', 'sgh']:
    dtt[clm] = dtt[clm].astype(int)

In [8]:
dtt.to_csv('dtt.csv', encoding='latin-1', index=False, quoting=csv.QUOTE_NONNUMERIC)
dtt = pandas.read_csv('dtt.csv', encoding='latin-1') # dtt is short for data.
dtt[:1]

Unnamed: 0,rpr,ttl,eph,sdh,iph,cnt,sah,ivh,sgh
0,final report . clinical information : { age _ number } year - old man trauma . comparison : no previous . preliminary report given by radiology resident on - call at 12 : 46 hours . findings : the patient is intubated and with an orogastric tube in place . acute left convexity subdural hematoma extending from the anterior midline over the frontal temporal and parietal convexities . maximum thickness of space collection is 0.9 cm . bifrontal multiple hemorrhagic contusions involving the inferior surface of the right frontal lobe extending into both supraorbital gyri . left superior frontal acute subarachnoid hemorrhage . left inferior parietal small subarachnoid / intraparenchymal bleed . complex fracture of the right mastoid bone involving the middle ear . for better assessment . refer to ct facial bones . adjacent pneumocephalus adjacent to right transverse sinus . presence of tiny air pockets adjacent to the right orbital roof . suspicious for nondisplaced fracture along the orbital roof . impression : multi compartmental acute posttraumatic intracranial bleeds . right temporal bone complex fractures . { e - signature },4,0,1,1,1,1,0,0


### Abbreviation Expander

In [9]:
abbs = list(dtt.columns[2:]) # abbs is short for abbreviations.
abbs

['eph', 'sdh', 'iph', 'cnt', 'sah', 'ivh', 'sgh']

In [10]:
dsrs = ['epidural hematoma', # dsrs is short for disorders.
 'subdural hematoma',
 'intraparenchymal hemorrhage',
 'contusion',
 'subarachnoid hemorrhage',
 'intraventricular hemorrhage',
 'subgaleal hematoma']
dsrs

['epidural hematoma',
 'subdural hematoma',
 'intraparenchymal hemorrhage',
 'contusion',
 'subarachnoid hemorrhage',
 'intraventricular hemorrhage',
 'subgaleal hematoma']

In [11]:
def abb_exp(_txt, abbs=None, exps=None): # abb_exp is short for abbreviation expansion.
    if (abbs is None) or (exps is None) or (len(abbs)!=len(exps)):
        return _txt
    else:
        _rtr = _txt
        for i in range(len(abbs)):
            _rtr = _rtr.replace(abbs[i], exps[i])
        return _rtr

#### Example

In [12]:
tmp = dtt.loc[64]['rpr'].split('.')[1].strip()
tmp

'clinical information : { age _ number } year - old woman fu sah / sdh please do ct head at 10 : 00 am'

In [13]:
abb_exp(tmp, abbs, dsrs)

'clinical information : { age _ number } year - old woman fu subarachnoid hemorrhage / subdural hematoma please do ct head at 10 : 00 am'
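
`abb_exp` uses plain substring replacement, so an abbreviation is also rewritten when it occurs inside a longer word: `eph` inside `hydrocephalus` becomes `hydrocepidural hematomaalus`, as seen in the intraventricular hemorrhage error table. A minimal sketch of a whole-word variant follows, assuming abbreviations should only match as standalone tokens. `abb_exp_wb` is a hypothetical name, not part of the original code.

```python
import re

def abb_exp_wb(text, abbs=None, exps=None):
    """Expand abbreviations only where they appear as whole words."""
    if abbs is None or exps is None or len(abbs) != len(exps):
        return text
    for abb, exp in zip(abbs, exps):
        # \b anchors the match at word boundaries; re.escape guards
        # against abbreviations containing regex metacharacters.
        pattern = r'\b' + re.escape(abb) + r'\b'
        # A function replacement inserts exp literally.
        text = re.sub(pattern, lambda _m, exp=exp: exp, text)
    return text

abbs = ['eph', 'sdh', 'sah']
exps = ['epidural hematoma', 'subdural hematoma', 'subarachnoid hemorrhage']
print(abb_exp_wb('fu sah / sdh with no hydrocephalus', abbs, exps))
```

With this variant, `hydrocephalus` is left untouched while the standalone `sah` and `sdh` tokens are still expanded.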

## Disorder Recognizers

### General Disorder Recognizer (GDR)

In [14]:
def gdr(_txt, abbs=None, exps=None): # gdr is short for general disorder recognizer.
    """
    Args:
        _txt: text to process
        abbs: list of abbreviations to expand (optional)
        exps: list of expansions, parallel to abbs (optional)

    Returns:
        Prints each sentence fragment, followed by one tab-separated line per disorder
        that Metamap recognizes in it: a leading field, the semantic type, the CUI,
        and the preferred name|matched words.
        Returns the recognized CUIs as a space-separated string
    """
    if _txt == '':
        return ''
    _txt = abb_exp(_txt, abbs, exps)
    snts = re.split(r'[,.]', _txt) # snts is short for sentences.
    cuis = []
    for _snt in snts:
        if len(_snt)>2:
            print(_snt.strip())
            _dsr = j.dsr(_snt) # j.dsr is a function in hui.java that configures Metamap to 
                                # generate all derivational variants of a phrase for UMLS concept mapping.
                                # This may produce overmatches, but can be easily tweaked.
            if _dsr != '':
                print(_dsr)
            # Keep the CUI (third field) of every line whose first field is '1'.
            cuis += [_ln.split('\t')[2] for _ln in _dsr.split('\n') if _ln.split('\t')[0]=='1']
    return ' '.join(list(set(cuis)))

#### Example

In [16]:
gdr(dtt.iloc[0]['rpr'], abbs, dsrs)

final report
clinical information : { age _ number } year - old man trauma
1	inpo	C0332666	old injury|old, trauma
comparison : no previous
preliminary report given by radiology resident on - call at 12 : 46 hours
1	fndg	C3835651	resident - answer to question|resident
1	neop	C1292769	precursor b-cell lymphoblastic leukemia|c, all
findings : the patient is intubated and with an orogastric tube in place
acute left convexity subdural hematoma extending from the anterior midline over the frontal temporal and parietal convexities
1	patf	C0749098	hematoma, subdural, acute|acute, subdural, hematoma
maximum thickness of space collection is 0
1	fndg	C3842591	0%|0
9 cm
bifrontal multiple hemorrhagic contusions involving the inferior surface of the right frontal lobe extending into both supraorbital gyri
1	inpo	C0559567	multiple bruising|multiple, bruising
left superior frontal acute subarachnoid hemorrhage
1	dsyn	C0038525	subarachnoid hemorrhage|subarachnoid, hemorrhage
1	patf	C0333276	acute hemo

'C4049863 C0016658 C3842591 C1292769 C0333276 C0585058 C0749098 C4050405 C0151699 C0332666 C3835651 C0559567 C0038525'
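
The list comprehension in `gdr` keeps the third tab-separated field of every output line whose first field is `1`. The sketch below applies the same parsing to lines copied from the output above, so it runs without a Metamap gateway. `parse_cuis` is a hypothetical helper name.

```python
def parse_cuis(raw):
    """Collect the CUI field from tab-separated lines whose first field is '1'."""
    cuis = []
    for line in raw.split('\n'):
        fields = line.split('\t')
        # Skip blank or malformed lines that lack a CUI column.
        if len(fields) >= 3 and fields[0] == '1':
            cuis.append(fields[2])
    return cuis

raw = ("1\tdsyn\tC0038525\tsubarachnoid hemorrhage|subarachnoid, hemorrhage\n"
       "1\tpatf\tC0749098\thematoma, subdural, acute|acute, subdural, hematoma")
print(parse_cuis(raw))
```

Unlike this helper, `gdr` deduplicates with `set`, so the order of CUIs in its returned string is not fixed from run to run.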

### Specific Disorder Recognizer (SDR)

### CUI Recommender

In [17]:
def rcm(_dsr): # rcm is short for recommender. _dsr is short for disorder.
    """
    Args:
        _dsr: disorder name

    Returns:
        Prints the UMLS CUI, map score (925 or higher), preferred name, and matched words
        Returns the list of recommended CUIs
    """
    _rcm = j.rcm(_dsr)
    print(_rcm)
    # Skip blank lines so that split()[0] never indexes an empty list.
    return [_ln.split()[0] for _ln in _rcm.split('\n') if _ln.strip()] # _ln is short for line.

#### Example

In [18]:
rcm('subarachnoid hemorrhage')

C0038525	1000	subarachnoid hemorrhage|subarachnoid, hemorrhage
C4039573	944	nontraumatic subarachnoid intracranial hemorrhage|nontraumatic, subarachnoid, intracranial, hemorrhage
C0270192	925	perinatal subarachnoid hemorrhage|perinatal, subarachnoid, haemorrhage
C0472383	925	subarachnoid hemorrhage, spontaneous|spontaneous, subarachnoid, hemorrhage
C0475073	925	subarachnoid hemorrhage, traumatic|traumatic, subarachnoid, haemorrhage
C0751530	925	subarachnoid hemorrhage, aneurysmal|aneurysmal, subarachnoid, hemorrhage
C0795688	925	subarachnoid hemorrhage, intracranial|intracranial, subarachnoid, hemorrhage
C1410400	925	nontraumatic subarachnoid hemorrhage, unspecified|nontraumatic, subarachnoid, hemorrhage
C3838874	925	perimesencephalic subarachnoid hemorrhage|perimesencephalic, subarachnoid, hemorrhage
C3839590	925	convexal subarachnoid hemorrhage|convexal, subarachnoid, hemorrhage


['C0038525',
 'C4039573',
 'C0270192',
 'C0472383',
 'C0475073',
 'C0751530',
 'C0795688',
 'C1410400',
 'C3838874',
 'C3839590']
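
Each recommendation line holds a CUI, a map score, and the preferred name with its matched words. The sketch below parses such lines and keeps only candidates above a chosen score, assuming the fields are tab-separated as they appear in the printed output and that a stricter cutoff is one way to reduce overmatches. `parse_rcm` and its `min_score` parameter are hypothetical.

```python
def parse_rcm(raw, min_score=925):
    """Return (cui, score) pairs whose score is at least min_score."""
    pairs = []
    for line in raw.split('\n'):
        fields = line.split('\t')
        if len(fields) < 2:
            continue  # skip blank lines
        cui, score = fields[0], int(fields[1])
        if score >= min_score:
            pairs.append((cui, score))
    return pairs

raw = ("C0038525\t1000\tsubarachnoid hemorrhage|subarachnoid, hemorrhage\n"
       "C4039573\t944\tnontraumatic subarachnoid intracranial hemorrhage|"
       "nontraumatic, subarachnoid, intracranial, hemorrhage\n"
       "C0270192\t925\tperinatal subarachnoid hemorrhage|perinatal, subarachnoid, haemorrhage\n")
print(parse_rcm(raw, min_score=940))
```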

### Keyword Filter

In [19]:
def kyw_flt(_txt, _dsr): # kyw_flt is short for keyword filter.
    """
    Args:
        _txt: text to process
        _dsr: disorder name

    Returns:
        Returns filtered text where each sentence contains the first word of the disorder
    """
    snts = _txt.split('.') # snts is short for sentences.
    wrds = _dsr.split(' ') # wrds is short for words.
    return '.'.join([_snt for _snt in snts if wrds[0] in _snt]).strip()

#### Example

In [20]:
tmp = dtt.iloc[0]['rpr'] # tmp is short for temporary.
tmp

'final report . clinical information : { age _ number } year - old man trauma . comparison : no previous . preliminary report given by radiology resident on - call at 12 : 46 hours . findings : the patient is intubated and with an orogastric tube in place . acute left convexity subdural hematoma extending from the anterior midline over the frontal temporal and parietal convexities . maximum thickness of space collection is 0.9 cm . bifrontal multiple hemorrhagic contusions involving the inferior surface of the right frontal lobe extending into both supraorbital gyri . left superior frontal acute subarachnoid hemorrhage . left inferior parietal small subarachnoid / intraparenchymal bleed . complex fracture of the right mastoid bone involving the middle ear . for better assessment . refer to ct facial bones . adjacent pneumocephalus adjacent to right transverse sinus . presence of tiny air pockets adjacent to the right orbital roof . suspicious for nondisplaced fracture along the orbital

In [21]:
kyw_flt(tmp, dsrs[4])

'left superior frontal acute subarachnoid hemorrhage . left inferior parietal small subarachnoid / intraparenchymal bleed'

### SDR Creator

In [22]:
def sdr_crt(_dsr, abbs, exps, cuis=None): # sdr_crt is short for specific disorder recognizer creator.
    """
    Args:
        _dsr: disorder name
        abbs: list of abbreviations to expand
        exps: list of expansions, parallel to abbs
        cuis: list of CUIs to match (optional; defaults to the CUIs recommended by rcm)

    Returns:
        Returns a function that maps text to 1 if _dsr is detected in it, and to 0 otherwise
    """
    if cuis is None:
        cuis = rcm(_dsr)
    def sdr(_txt): # sdr is short for specific disorder recognizer.
        _txt = abb_exp(_txt, abbs, exps)
        snts = re.split(r'[,.]', _txt)
        mtm_cuis = [] # mtm_cuis is short for Metamap CUIs.
        for _snt in snts:
            mtm_cuis += gdr(kyw_flt(_snt, _dsr)).split(' ')
        mtm_cuis = list(set(mtm_cuis))
        return int(len([cui for cui in mtm_cuis if cui in cuis])>0)
    return sdr
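
`sdr_crt` is a closure factory: the returned `sdr` function remembers the disorder name and CUI list it was created with. The sketch below shows the same pattern with a stub in place of Metamap, so it runs offline. `fake_gdr`, `make_recognizer`, and the lookup table are hypothetical stand-ins, not the project's API; the two CUIs in the table are taken from the outputs in this notebook.

```python
def fake_gdr(sentence):
    """Stub for gdr: map known phrases to CUIs instead of calling Metamap."""
    lookup = {'subdural hematoma': 'C0018946',
              'subarachnoid hemorrhage': 'C0038525'}
    return ' '.join(cui for phrase, cui in lookup.items() if phrase in sentence)

def make_recognizer(keyword, cuis):
    """Return a function that flags text mentioning any CUI in cuis."""
    def recognizer(text):
        found = set()
        for sentence in text.split('.'):
            if keyword in sentence:  # keyword filter, in the spirit of kyw_flt
                found.update(fake_gdr(sentence).split(' '))
        return int(any(cui in cuis for cui in found))
    return recognizer

sdh = make_recognizer('subdural', ['C0018946', 'C0749098'])
print(sdh('right sided subdural hematoma . no fracture'))
print(sdh('acute subarachnoid hemorrhage'))
```

Each call to the factory yields an independent recognizer, so one recognizer per disorder can be built from the same code.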

#### Example

In [23]:
sah_sdr = sdr_crt(dsrs[4], abbs, dsrs) # sah_sdr is short for subarachnoid hemorrhage (SAH) specific disorder recognizer.
                                            # dsrs[4] is 'subarachnoid hemorrhage'. With no cuis given,
                                            # sdr_crt calls rcm, which prints its recommendations.

C0038525	1000	subarachnoid hemorrhage|subarachnoid, hemorrhage
C4039573	944	nontraumatic subarachnoid intracranial hemorrhage|nontraumatic, subarachnoid, intracranial, hemorrhage
C0270192	925	perinatal subarachnoid hemorrhage|perinatal, subarachnoid, haemorrhage
C0472383	925	subarachnoid hemorrhage, spontaneous|spontaneous, subarachnoid, hemorrhage
C0475073	925	subarachnoid hemorrhage, traumatic|traumatic, subarachnoid, haemorrhage
C0751530	925	subarachnoid hemorrhage, aneurysmal|aneurysmal, subarachnoid, hemorrhage
C0795688	925	subarachnoid hemorrhage, intracranial|intracranial, subarachnoid, hemorrhage
C1410400	925	nontraumatic subarachnoid hemorrhage, unspecified|nontraumatic, subarachnoid, hemorrhage
C3838874	925	perimesencephalic subarachnoid hemorrhage|perimesencephalic, subarachnoid, hemorrhage
C3839590	925	convexal subarachnoid hemorrhage|convexal, subarachnoid, hemorrhage


#### Note
The list of CUIs recommended for a disorder can contain many overmatches, so for subdural hematoma (SDH) a custom-made list of CUIs is used instead.

In [24]:
cuis = ['C0018946',
 'C0270087',
 'C0393494',
 'C0393495',
 'C0238156',
 'C0265080',
 'C0749095',
 'C0749098',
 'C0854700',
 'C1367166']

In [25]:
sdh_sdr = sdr_crt(dsrs[1], abbs, dsrs, cuis) # This SDR only recognizes CUIs from the custom-made list.

In [26]:
dtt['mtm_sdh'] = dtt['rpr'].apply(sdh_sdr)

acute left convexity subdural hematoma extending from the anterior midline over the frontal temporal and parietal convexities
1	patf	C0749098	hematoma, subdural, acute|acute, subdural, hematoma
presence of a right sided subdural hematoma located along the right tentorial leaflet and extending into the right occipital and temporal lobes
1	patf	C0018946	hematoma, subdural|subdural, haematoma
extensive multi - compartmental intracranial hemorrhage including right - sided subdural hematoma
1	patf	C0151699	intracranial hemorrhages|intracranial, haemorrhage
1	patf	C0018946	hematoma, subdural|subdural, haematoma
left temporal acute subdural hematoma and subarachnoid hemorrhage with no significant variation in comparison to previous study
1	patf	C0749098	hematoma, subdural, acute|acute, subdural, hemorrhage
1	dsyn	C1399315	subarachnoid hematoma|subarachnoid, hematoma
1	dsyn	C0038525	subarachnoid hemorrhage|subarachnoid, hemorrhage
bilateral anterior and posterior parasagittal craniotomies with

In [10]:
mtm_abbs = ['mtm_'+abb for abb in abbs] # mtm_abbs holds the column names for Metamap's predictions, one per abbreviation.
mtm_abbs

['mtm_eph', 'mtm_sdh', 'mtm_iph', 'mtm_cnt', 'mtm_sah', 'mtm_ivh', 'mtm_sgh']

In [27]:
sdh_err = dtt[dtt[mtm_abbs[1]]!=dtt[abbs[1]]].copy() # sdh_err is short for subdural hematoma errors.
sdh_err['flt'] = sdh_err['rpr'].apply(lambda _x:kyw_flt(abb_exp(_x, abbs, dsrs), dsrs[1]))
sdh_err[['flt', abbs[1], mtm_abbs[1]]]

Unnamed: 0,flt,sdh,mtm_sdh
32,left temporal intraparenchymal contusion with subarachnoid and subdural extension and surrounding vasogenic edema causing mass effect and left hemispheric brain edema,1,0
41,no significant changes of the bilateral holohemispheric subdural hematomas with supratentorial extension,1,0
64,clinical information : { age _ number } year - old woman fu subarachnoid hemorrhage / subdural hematoma please do ct head at 10 : 00 am,0,1
71,the previously described small subdural collection over the right paracentral lobule has decreased in size . there is no significant widening of the subarachnoid or subdural space over the left cerebral convexity,1,0


#### Accuracy

In [28]:
numpy.round(1-len(sdh_err)/len(dtt), 2)

0.96
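
The accuracy above is one minus the share of reports whose Metamap label differs from the dataset label. The sketch below reproduces the figure from the four rows of the error table and also splits the errors into false negatives and false positives. It assumes `dtt` holds 99 reports, which is the case when the original dataset has at least 99 rows; the variable names are illustrative.

```python
# (label, prediction) pairs copied from the SDH error table above.
errors = [(1, 0), (1, 0), (0, 1), (1, 0)]
n_reports = 99  # assumed size of dtt, which keeps the first 99 rows

# A false negative is a labeled disorder that Metamap missed.
false_negatives = sum(1 for label, pred in errors if label == 1 and pred == 0)
# A false positive is a disorder reported by Metamap but absent from the label.
false_positives = sum(1 for label, pred in errors if label == 0 and pred == 1)
accuracy = round(1 - len(errors) / n_reports, 2)

print(false_negatives, false_positives, accuracy)
```

For SDH, three of the four errors are missed mentions and one is a spurious detection.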

## Application
Here, we apply GDR to the whole dataset and compare the results to the dataset labels.

In [29]:
dtt['dsrs'] = dtt['rpr'].apply(lambda _txt : gdr(_txt, abbs, dsrs))

final report
clinical information : { age _ number } year - old man trauma
1	inpo	C0332666	old injury|old, trauma
comparison : no previous
preliminary report given by radiology resident on - call at 12 : 46 hours
1	fndg	C3835651	resident - answer to question|resident
1	neop	C1292769	precursor b-cell lymphoblastic leukemia|c, all
findings : the patient is intubated and with an orogastric tube in place
acute left convexity subdural hematoma extending from the anterior midline over the frontal temporal and parietal convexities
1	patf	C0749098	hematoma, subdural, acute|acute, subdural, hematoma
maximum thickness of space collection is 0
1	fndg	C3842591	0%|0
9 cm
bifrontal multiple hemorrhagic contusions involving the inferior surface of the right frontal lobe extending into both supraorbital gyri
1	inpo	C0559567	multiple bruising|multiple, bruising
left superior frontal acute subarachnoid hemorrhage
1	dsyn	C0038525	subarachnoid hemorrhage|subarachnoid, hemorrhage
1	patf	C0333276	acute hemo

In [30]:
dtt[:1]

Unnamed: 0,rpr,ttl,eph,sdh,iph,cnt,sah,ivh,sgh,mtm_sdh,dsrs
0,final report . clinical information : { age _ number } year - old man trauma . comparison : no previous . preliminary report given by radiology resident on - call at 12 : 46 hours . findings : the patient is intubated and with an orogastric tube in place . acute left convexity subdural hematoma extending from the anterior midline over the frontal temporal and parietal convexities . maximum thickness of space collection is 0.9 cm . bifrontal multiple hemorrhagic contusions involving the inferior surface of the right frontal lobe extending into both supraorbital gyri . left superior frontal acute subarachnoid hemorrhage . left inferior parietal small subarachnoid / intraparenchymal bleed . complex fracture of the right mastoid bone involving the middle ear . for better assessment . refer to ct facial bones . adjacent pneumocephalus adjacent to right transverse sinus . presence of tiny air pockets adjacent to the right orbital roof . suspicious for nondisplaced fracture along the orbital roof . impression : multi compartmental acute posttraumatic intracranial bleeds . right temporal bone complex fractures . { e - signature },4,0,1,1,1,1,0,0,1,C0333276 C0332666 C3835651 C0559567 C0038525 C0585058 C0749098 C4050405 C0151699 C3842591 C1292769 C4049863 C0016658


In [31]:
tmp = dtt[['rpr', 'dsrs', 'ttl', 'eph', 'sdh', 'iph', 'cnt', 'sah', 'ivh', 'sgh']]
tmp.to_csv('mtm.csv', encoding='latin-1', index=False, quoting=csv.QUOTE_NONNUMERIC)
dtt = pandas.read_csv('mtm.csv', delimiter=',', encoding='latin-1')
dtt[:1]

Unnamed: 0,rpr,dsrs,ttl,eph,sdh,iph,cnt,sah,ivh,sgh
0,final report . clinical information : { age _ number } year - old man trauma . comparison : no previous . preliminary report given by radiology resident on - call at 12 : 46 hours . findings : the patient is intubated and with an orogastric tube in place . acute left convexity subdural hematoma extending from the anterior midline over the frontal temporal and parietal convexities . maximum thickness of space collection is 0.9 cm . bifrontal multiple hemorrhagic contusions involving the inferior surface of the right frontal lobe extending into both supraorbital gyri . left superior frontal acute subarachnoid hemorrhage . left inferior parietal small subarachnoid / intraparenchymal bleed . complex fracture of the right mastoid bone involving the middle ear . for better assessment . refer to ct facial bones . adjacent pneumocephalus adjacent to right transverse sinus . presence of tiny air pockets adjacent to the right orbital roof . suspicious for nondisplaced fracture along the orbital roof . impression : multi compartmental acute posttraumatic intracranial bleeds . right temporal bone complex fractures . { e - signature },C0333276 C0332666 C3835651 C0559567 C0038525 C0585058 C0749098 C4050405 C0151699 C3842591 C1292769 C4049863 C0016658,4,0,1,1,1,1,0,0


In [32]:
dsrs = ['epidural hematoma', 'subdural hematoma', 'intraparenchymal hemorrhage', 'contusion', 'subarachnoid hemorrhage', 'intraventricular hemorrhage', 'subgaleal hematoma']
dsrs

['epidural hematoma',
 'subdural hematoma',
 'intraparenchymal hemorrhage',
 'contusion',
 'subarachnoid hemorrhage',
 'intraventricular hemorrhage',
 'subgaleal hematoma']

In [33]:
cui_lsts = [[] for i in range(7)]

In [34]:
for i in range(len(dsrs)):
    print(dsrs[i])
    cui_lsts[i] = rcm(dsrs[i])
    print()

epidural hematoma
C0238154	1000	hematoma, epidural, cranial|epidural, haematoma
C0877172	1000	hematoma, epidural, spinal|epidural, haematoma
C2832464	960	epidural hemorrhage with loss of consciousness of unspecified duration|epidural, hemorrhage, with, loss, of, consciousness, of, unspecified, duration
C0018944	960	hematoma|hematoma
C1962958	960	hematoma adverse event|hematoma
C4021393	951	spinalarachnoid cyst|epidural, arachnoid, cysts, of, the, spinal, canal
C2832428	945	epidural hemorrhage without loss of consciousness|epidural, hemorrhage, without, loss, of, consciousness
C0016428	933	eruption cyst|eruption, hematoma
C0018946	933	hematoma, subdural|subdural, haematoma
C0021870	933	intracerebral hematoma|intracerebral, haematoma
C0021893	933	intraplacental hematoma|intraplacental, hematoma
C0155784	933	thrombosed external hemorrhoids|perianal, hematoma
C0156389	933	vaginal hematoma|vaginal, hematoma
C0156397	933	vulval hematoma|vulval, hematoma
C0240159	933	laryngeal hematoma|laryng

In [35]:
cui_lsts[0] = ['C0238154',
            'C0877172',
            'C2832464',
            'C2832428',
            'C0472385',
            'C0474997',
            'C2062695']

In [36]:
cui_lsts[1] = ['C0018946',
             'C0270087',
             'C0393494',
             'C0393495',
             'C0238156',
             'C0265080',
             'C0749095',
             'C0749098',
             'C0854700',
             'C1367166']

In [37]:
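cui_lsts[2] = ['C3552968', # Each CUI needs its own comma: adjacent string literals
            'C3698285',   # without commas are concatenated into a single string,
            'C0475010',   # which would never match a CUI.
            'C2316496']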
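
A CUI list needs a comma after every element. Python joins adjacent string literals, so a list written without commas holds one long string and no membership test against it can succeed. A minimal sketch of the pitfall, using two CUIs from the intraparenchymal hemorrhage list:

```python
# Without commas, adjacent string literals are joined into one string.
joined = ['C3552968'
          'C3698285']
# With commas, each CUI is its own list element.
separate = ['C3552968',
            'C3698285']

print(len(joined), joined)
print(len(separate), separate)
print('C3552968' in joined, 'C3552968' in separate)
```

A recognizer built on the joined list predicts 0 for every report, since no CUI returned by Metamap equals the concatenated string.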

In [38]:
errs = [None for i in range(7)]

In [39]:
def dsr_cui(_dsrs, cuis):
    return int(any([_cui in cuis for _cui in _dsrs.split(' ')]))

In [40]:
for i in range(7): # Build a DataFrame of the misclassified entries for each disorder.
    dtt[mtm_abbs[i]] = dtt['dsrs'].apply(lambda _dsrs : dsr_cui(_dsrs, cui_lsts[i]))
    tmp = dtt[dtt[mtm_abbs[i]]!=dtt[abbs[i]]].copy()
    tmp['flt'] = tmp['rpr'].apply(lambda _x:kyw_flt(abb_exp(_x, abbs, dsrs), dsrs[i]))
    errs[i] = tmp[['flt', abbs[i], mtm_abbs[i]]]

In [41]:
dtt.to_csv('mtm.csv', encoding='latin-1', index=False, quoting=csv.QUOTE_NONNUMERIC)

### Accuracies

In [42]:
for i in range(7):
    print(dsrs[i])
    print(numpy.round(1-len(errs[i])/len(dtt), 2))
    print()

epidural hematoma
1.0

subdural hematoma
0.96

intraparenchymal hemorrhage
0.71

contusion
0.91

subarachnoid hemorrhage
0.82

intraventricular hemorrhage
0.89

subgaleal hematoma
0.93



### Review of Errors

In [43]:
errs[0]

Unnamed: 0,flt,eph,mtm_eph


In [44]:
errs[1]

Unnamed: 0,flt,sdh,mtm_sdh
32,left temporal intraparenchymal contusion with subarachnoid and subdural extension and surrounding vasogenic edema causing mass effect and left hemispheric brain edema,1,0
41,no significant changes of the bilateral holohemispheric subdural hematomas with supratentorial extension,1,0
64,clinical information : { age _ number } year - old woman fu subarachnoid hemorrhage / subdural hematoma please do ct head at 10 : 00 am,0,1
71,the previously described small subdural collection over the right paracentral lobule has decreased in size . there is no significant widening of the subarachnoid or subdural space over the left cerebral convexity,1,0


In [45]:
errs[2]

Unnamed: 0,flt,iph,mtm_iph
0,left inferior parietal small subarachnoid / intraparenchymal bleed,1,0
4,presence of at least 5 foci of intraparenchymal hemorrhage surrounded by vasogenic edema with the largest measuring 1. bilateral intraparenchymal contusions,1,0
10,,1,0
15,,1,0
33,,1,0
36,,1,0
42,,1,0
43,the intraparenchymal hematomas compatible with areas of hemorrhagic contusion in the temporal lobes and in the frontal lobe demonstrate stable morphology in comparison with previous examination,1,0
44,intraparenchymal hematomas compatible with areas of hemorrhagic contusion in both temporal lobes,1,0
45,,1,0


In [46]:
errs[3]

Unnamed: 0,flt,cnt,mtm_cnt
0,bifrontal multiple hemorrhagic contusions involving the inferior surface of the right frontal lobe extending into both supraorbital gyri,1,0
29,more probably subarachnoid bleed but possibly cortical hemorrhagic contusion,1,0
40,suggestive of hemorrhagic contusion,0,1
56,left frontal supraorbital hemorrhagic contusions . multi compartmental traumatic bleed with subarachnoid hemorrhage and left supraorbital and right superior frontal hemorrhagic contusions,0,1
57,small focal areas of brain parenchymal hemorrhagic contusion involving the left inferior and medial temporal gyri and the left gyrus rectus,1,0
79,likely cortical contusion,1,0
84,,1,0
89,unchanged left frontal hemorrhagic parenchymal contusion,1,0
91,no interval significant change in the left frontal hemorrhagic parenchymal contusion,1,0


In [47]:
errs[4]

Unnamed: 0,flt,sah,mtm_sah
1,note : small posttraumatic subarachnoid bleed could be missed in the presence of recent intravenous contrast injection,0,1
7,acute confusion hx of subarachnoid hemorrhage,0,1
8,clinical information : { age _ number } years old r / o subarachnoid hemorrhage,0,1
14,focal intra - axial subarachnoid hyperdensity involving the left inferior frontal sulcus . focal extra - axial subarachnoid hyperdensity involving the left inferior frontal sulcus . however is not possible to completely rule out subarachnoid hemorrhage,1,0
21,interval resolution of the subarachnoid hemorrhage . follow - up study with interval resolution of the subarachnoid hemorrhage and expected evolution of the right temporoparietal subdural hematoma,0,1
23,no evidence of new subarachnoid hemorrhage,1,0
28,acute subarachnoid bleed in the in the middle left temporal gyrus area,1,0
32,left temporal intraparenchymal contusion with subarachnoid and subdural extension and surrounding vasogenic edema causing mass effect and left hemispheric brain edema,1,0
34,small right subarachnoid and interhemispheric subdural falcine hematoma,1,0
37,o subarachnoid bleed,1,0


In [48]:
errs[5]

Unnamed: 0,flt,ivh,mtm_ivh
11,subarachnoid hemorrhage with intraventricular extension,1,0
12,with mild intraventricular extension . traumatic subarachnoid hemorrhage with intraventricular extension,1,0
41,with mild increase in the intraventricular component of the subarachnoid hemorrhage . follow - up study with mild increase in the intraventricular component of the subarachnoid hemorrhage,1,0
42,traumatic subarachnoid hemorrhage with intraventricular extension and no hydrocepidural hematomaalus . traumatic subarachnoid hemorrhage with intraventricular extension,1,0
43,stable a small amount of intraventricular hemorrhage in the occipital horn of the right lateral ventricle,0,1
46,,1,0
60,large mostly infratentorial diffuse subarachnoid hemorrhage with intraventricular extension,1,0
69,small to moderate amount of intraventricular blood layering in the occipital horns and atria of the lateral ventricles . there is also intraventricular clot in the left lateral ventricle . stable amount of intraventricular blood,1,0
76,,1,0
77,,1,0


In [49]:
errs[6]

Unnamed: 0,flt,sgh,mtm_sgh
5,right parietal subgaleal hematoma and emphysema with no evidence of underlying fracture . right parietal subgaleal hematoma with no evidence of underlying fracture,0,1
31,right subgaleal occipital hematoma underlying the linear nondisplaced occipital fracture,0,1
46,,1,0
51,right parietal subgaleal hematoma,0,1
59,left temporal subgaleal hematoma,0,1
77,left parietal acute subgaleal hematoma with no underlying fracture,0,1
90,thin subgaleal hematoma overlying the left parieto - occipital fracture,0,1
