Detecting negation for one CUI but failing to detect negation for other CUIs #25

Open
kaushikacharya opened this issue Feb 23, 2019 · 2 comments

kaushikacharya commented Feb 23, 2019

Environment: Using MetaMap 2016v2
Sentence:

There is no spinal canal hematoma.

Among other CUIs, these are the ones I am focusing on:

<annotation id="2">
  <infon key="term">Hematoma</infon>
  <infon key="semtype">patf</infon>
  <infon key="CUI">C0018944</infon>
  <infon key="annotator">MetaMap</infon>
  <location length="8" offset="25"/>
  <text>hematoma</text>
</annotation>
<annotation id="3">
  <infon key="term">spinal hematoma</infon>
  <infon key="semtype">inpo</infon>
  <infon key="CUI">C0856150</infon>
  <infon key="annotator">MetaMap</infon>
  <location length="6" offset="12"/>
  <text>spinal</text>
</annotation>

NegBio negates the term "hematoma" but fails to negate "spinal hematoma".
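To make the mismatch concrete, here is a minimal sketch that slices the sentence using the `offset` and `length` values from the two `<location>` elements above. The dictionary and variable names are only illustrative and are not part of NegBio.

```python
sentence = "There is no spinal canal hematoma."

# (offset, length) copied from the <location> elements above; offsets are 0-based.
locations = {
    "C0018944 (Hematoma)": (25, 8),
    "C0856150 (spinal hematoma)": (12, 6),
}

# Slice the sentence to see which characters each annotation actually covers.
covered = {
    cui: sentence[offset:offset + length]
    for cui, (offset, length) in locations.items()
}

for cui, text in covered.items():
    print(f"{cui}: {text!r}")
```

The annotation for C0856150 covers only the word "spinal", even though the term is "spinal hematoma".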

Here's the parse tree:
<infon key="parse tree">(S1 (S (S (NP (EX There)) (VP (VBZ is) (NP (DT no) (JJ spinal) (JJ canal) (NN hematoma)))) (. .)))</infon>

There is an amod dependency edge between "spinal" and "hematoma":

<relation id="R2">
  <infon key="dependency">amod</infon>
  <node refid="T3" role="dependant"/>
  <node refid="T5" role="governor"/>
</relation>

where T3 represents the word "spinal" and T5 represents the word "hematoma".

How should we handle this issue?
"no spinal canal hematoma" is identified as a noun phrase which begins with "no".
Shouldn't both "hematoma" and "spinal hematoma" be marked as negated?

An XML dump of the collection just before executing negdetect.detect(document, neg_detector), i.e. after the parse tree and dependency tree have been formed, is shared here: http://collabedit.com/b2e33

yfpeng commented Mar 1, 2019

NegBio cannot handle this case right now: to be recognized as C0856150, the mention should be "spinal canal hematoma", not just "spinal". This is an error produced by MetaMap. An alternative is to create a dictionary that contains "spinal canal hematoma" and then use the CheXpert labeler to recognize it.

Please see https://negbio.readthedocs.io/en/latest/user_guide.html#named-entity-recognition

kaushikacharya commented Mar 3, 2019

Hi @yfpeng
I checked the output of MetaMap and found that the issue is in NegBio.
The MetaMap documentation describes four different forms of positional information.

https://github.com/ncbi-nlp/NegBio/blob/master/negbio/pipeline/dner_mm.py#L58

m = re.match(r'(\d+)/(\d+)', concept.pos_info)

This handles only the first type, i.e. the simplest form, where the concept's text is a contiguous block of characters.

Here's the output of pyMetaMap for the example case in this issue:

ConceptMMI(index='1', mm='MMI', score='16.15', preferred_name='Spinal Canal', cui='C0037922', semtypes='[bsoj]', trigger='["Spinal Canal"-tx-1-"spinal canal"-noun-0]', location='TX', pos_info='13/12', tree_codes='A02.835.232.834.803')

ConceptMMI(index='1', mm='MMI', score='16.09', preferred_name='Pulp Canals', cui='C0086881', semtypes='[bsoj]', trigger='["Canal"-tx-1-"canal"-noun-0]', location='TX', pos_info='20/5', tree_codes='A14.549.167.900.265')

ConceptMMI(index='1', mm='MMI', score='13.09', preferred_name='Hematoma', cui='C0018944', semtypes='[patf]', trigger='["HEMATOMA"-tx-1-"hematoma"-noun-1]', location='TX', pos_info='26/8', tree_codes='C23.550.414.838')

ConceptMMI(index='1', mm='MMI', score='3.78', preferred_name='spinal hematoma', cui='C0856150', semtypes='[inpo]', trigger='["spinal hematoma"-tx-1-"spinal hematoma"-noun-1]', location='TX', pos_info='13/6,26/8', tree_codes='')

ConceptMMI(index='1', mm='MMI', score='3.63', preferred_name='Hematoma Adverse Event', cui='C1962958', semtypes='[fndg]', trigger='["Hematoma"-tx-1-"hematoma"-noun-1]', location='TX', pos_info='26/8', tree_codes='')

ConceptMMI(index='1', mm='MMI', score='3.48', preferred_name='Body Parts - Canal', cui='C1550227', semtypes='[bpoc]', trigger='["Canal"-tx-1-"canal"-noun-0]', location='TX', pos_info='20/5', tree_codes='')

ConceptMMI(index='1', mm='MMI', score='3.48', preferred_name='Geographic canal', cui='C0442636', semtypes='[geoa]', trigger='["Canal"-tx-1-"canal"-noun-0]', location='TX', pos_info='20/5', tree_codes='')

The spinal hematoma concept (positional information: 13/6,26/8) is of type (b), i.e. disjoint text strings.
In the current NegBio code, re.match() matches only at the start of the string, so it captures only the first start/length pair (13/6, "spinal") and ignores the second (26/8, "hematoma").
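
A minimal sketch of one possible fix, not the actual NegBio code: collect every start/length pair with `re.findall()` instead of only the first. `parse_pos_info` is a hypothetical helper name. The sketch assumes, as the offsets in this issue suggest (13/6 vs. `offset="12"`), that MetaMap start positions are 1-based. It only targets the comma-separated disjoint form (b) and does not distinguish the other forms described in the MetaMap documentation.

```python
import re
from typing import List, Tuple

# Same start/length pattern as the line quoted from dner_mm.py.
PAIR = re.compile(r'(\d+)/(\d+)')


def parse_pos_info(pos_info: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: return a 0-based (offset, length) pair for
    every start/length pair found in a MetaMap pos_info string."""
    return [(int(start) - 1, int(length))
            for start, length in PAIR.findall(pos_info)]


sentence = "There is no spinal canal hematoma."
pos_info = '13/6,26/8'  # "spinal hematoma" (C0856150) from the output above

# Current behaviour: re.match() stops after the first pair.
first_only = re.match(r'(\d+)/(\d+)', pos_info).groups()

# Sketch: keep all pairs and look up the text each one covers.
spans = parse_pos_info(pos_info)
texts = [sentence[offset:offset + length] for offset, length in spans]

print(first_only)
print(spans)
print(texts)
```

With both spans available, the negation check could consider "hematoma" as well as "spinal" for this concept. How the spans should be represented in the annotation (one location or several) is left open here.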
