# Biomedical NLP

## Rule-based TNM Extraction

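This example shows a deliberately simplistic regular expression for matching TNM expressions. It has several problems: it accepts arbitrary digits and letter suffixes (so clinically meaningless values such as `T8N9M9` pass), it does not allow whitespace between the components, and `re.match` only anchors at the start of the string, so any trailing text is ignored.
A more realistic solution can be found here: https://github.com/hpi-dhc/onco-nlp/blob/master/onconlp/classification/rulebased_tnm.py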

In [None]:
import re

tnm_pattern = r"T\d+[a-zA-Z]*N\d+[a-zA-Z]*M\d+[a-zA-Z]*"

def check_valid(text):
    print("valid" if re.match(tnm_pattern, text) else "not valid")

In [None]:
check_valid('T1N0M1')

In [None]:
check_valid('T1aN2M3')

In [None]:
check_valid('T123')

In [None]:
check_valid('T8N9M9')

In [None]:
check_valid('T1')

In [None]:
check_valid('T1 N0 M1')
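
The problems above can be reduced with a stricter pattern. The following is a minimal sketch, not the implementation from the linked repository: the name `is_valid_tnm` and the value ranges (T0 to T4, Tis, TX; N0 to N3, NX; M0 to M1, MX; an optional one-letter prefix such as `p` or `c`) are simplifying assumptions and do not cover the full TNM grammar. It uses `re.fullmatch` so that the whole string has to match, and allows optional whitespace between the components.

```python
import re

# Assumption: a simplified subset of the TNM notation, for illustration only.
STRICT_TNM = re.compile(
    r"[cpyr]?"                       # optional prefix, e.g. p (pathological) or c (clinical)
    r"T(?:[0-4][a-d]?|is|X)\s*"      # T0-T4 with optional subcategory, Tis, or TX
    r"N(?:[0-3][a-c]?|X)\s*"         # N0-N3 with optional subcategory, or NX
    r"M(?:[01][a-c]?|X)"             # M0-M1 with optional subcategory, or MX
)

def is_valid_tnm(text):
    """Return True if the whole string matches the simplified TNM pattern."""
    return STRICT_TNM.fullmatch(text) is not None

for example in ["T1N0M1", "T1 N0 M1", "T8N9M9", "T1"]:
    print(example, "->", "valid" if is_valid_tnm(example) else "not valid")
```

To find TNM mentions inside longer text rather than validate a single string, the same compiled pattern could be used with `STRICT_TNM.finditer(text)`.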

## A more complex NLP Pipeline

Here, we are using the spaCy library with [scispaCy](https://allenai.github.io/scispacy/) models for domain-specific entity extraction. We also use scispaCy's entity linker to map entities to the MeSH vocabulary for normalization.

In [None]:
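# -y skips the interactive confirmation prompt, which cannot be answered from a notebook cell
!conda install -y -c conda-forge nmslib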

In [None]:
!pip install scispacy==0.5.1

In [None]:
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_sm-0.5.1.tar.gz

In [None]:
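import spacy
# Importing EntityLinker registers the "scispacy_linker" factory with spaCy
from scispacy.linking import EntityLinker

nlp = spacy.load('en_core_sci_sm')
# Link entities to MeSH and keep up to k=5 candidate concepts per mention
nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True, "linker_name": "mesh", "k": 5})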

In [None]:
text = "The patient underwent a CT scan in April. It did not reveal any abnormalities."

In [None]:
doc = nlp(text)

### Linguistic Analysis

Boundary detection / sentence splitting

In [None]:
for s in doc.sents:
    print(s)

In [None]:
sentence = list(doc.sents)[0]

Tokenization

In [None]:
for token in sentence:
    print(token)

Part-of-speech tagging

In [None]:
for token in sentence:
    print(token, token.pos_)

Noun chunking

In [None]:
for chunk in sentence.noun_chunks:
    print(chunk)

Dependency parsing

In [None]:
from spacy import displacy

In [None]:
displacy.render(sentence, style="dep", jupyter=True, options={'distance' : 100})

### Information Extraction

Entity extraction

In [None]:
for e in sentence.ents:
    print('Entity:', e)

Entity normalization / linking

In [None]:
from IPython.display import display_markdown

In [None]:
linker = nlp.get_pipe("scispacy_linker")

In [None]:
for e in sentence.ents:
    display_markdown(f'__Entity: {e}__', raw=True)
    # Each candidate is a (concept ID, similarity score) pair; the score is not a calibrated probability
    for entity_id, score in e._.kb_ents:
        mesh_term = linker.kb.cui_to_entity[entity_id]
        print('Score:', score)
        print(mesh_term)
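
To give an intuition for what the candidate list above contains, the following toy sketch ranks vocabulary entries by the similarity of their character 3-grams to a mention and returns the top `k`. It is not scispaCy's implementation: the vocabulary `TOY_KB`, its `TOY:` identifiers (invented placeholders, not MeSH IDs), and the functions `char_ngrams`, `cosine` and `top_k_candidates` are all hypothetical names introduced only for this illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical mini vocabulary; the IDs are invented placeholders, not MeSH identifiers.
TOY_KB = {
    "TOY:001": "computed tomography",
    "TOY:002": "CT scan",
    "TOY:003": "magnetic resonance imaging",
    "TOY:004": "patients",
}

def char_ngrams(text, n=3):
    """Count the character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0

def top_k_candidates(mention, kb=TOY_KB, k=2):
    """Return the k vocabulary entries most similar to the mention, best first."""
    query = char_ngrams(mention)
    scored = [(concept_id, cosine(query, char_ngrams(name))) for concept_id, name in kb.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

for concept_id, score in top_k_candidates("patient"):
    print(concept_id, TOY_KB[concept_id], round(score, 2))
```

As in the real linker output, each candidate comes with a similarity score, and choosing among the top `k` candidates is left to the caller.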

## Gene Named Entity Recognition

In [None]:
!pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_ner_bionlp13cg_md-0.5.1.tar.gz

In [None]:
text = """Dual MAPK pathway inhibition with BRAF and MEK inhibitors in BRAF(V600E)-mutant NSCLC 
might improve efficacy over BRAF inhibitor monotherapy based on observations in BRAF(V600)-mutant melanoma"""

Specialized model for biological entities

In [None]:
bionlp = spacy.load('en_ner_bionlp13cg_md')
biodoc = bionlp(text)

In [None]:
for e in biodoc.ents:
    print('Entity:', e, ', Label:', e.label_)

In [None]:
displacy.render(biodoc, style='ent', jupyter=True)
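
Since this section is about genes, it can be useful to keep only the entities that carry the gene label. The helper below is a small sketch: `mentions_with_label` is a hypothetical name, and it assumes the model's gene label is `GENE_OR_GENE_PRODUCT`. The example entities are hand-built stand-ins for spaCy spans, not actual model output.

```python
from types import SimpleNamespace

def mentions_with_label(ents, label="GENE_OR_GENE_PRODUCT"):
    """Return the unique surface forms of entities with the given label, in order of appearance."""
    seen = []
    for ent in ents:
        if ent.label_ == label and ent.text not in seen:
            seen.append(ent.text)
    return seen

# Hand-built stand-ins with the same .text / .label_ attributes as spaCy spans (illustration only)
example_ents = [
    SimpleNamespace(text="BRAF", label_="GENE_OR_GENE_PRODUCT"),
    SimpleNamespace(text="MEK", label_="GENE_OR_GENE_PRODUCT"),
    SimpleNamespace(text="melanoma", label_="CANCER"),
    SimpleNamespace(text="BRAF", label_="GENE_OR_GENE_PRODUCT"),
]
print(mentions_with_label(example_ents))
```

With the pipeline loaded above, the same helper could be called as `mentions_with_label(biodoc.ents)`.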