# Intro to clinical NLP

### What is clinical NLP and what can it be used for?

- Written or dictated clinical narratives (e.g. admission notes, treatment plans, patient summaries) contain a great deal of information about the patient
- Mine published research (information retrieval), e.g. mine ClinicalTrials.gov or PubMed abstracts
- PheWAS (phenome-wide association studies): phenotype identification
- Clinical trial enrollment - match patients to enrolling trials based on inclusion/exclusion criteria

### However, clinical narratives are challenging to analyze
- Heterogeneous (no “standard note”)
- Author- and domain-specific idiosyncrasies, acronyms, abbreviations
- Typing/spelling errors

### Use case. **Information extraction**
- Read medical charts/notes to determine whether the patient has fallen within the past month.
- Read medical charts to determine if the patient is a smoker.
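
As a minimal sketch of the smoker use case, the snippet below applies a few regular expressions to a free-text note. The pattern lists and the `smoking_status` function are hypothetical and far from exhaustive; they only illustrate the idea of rule-based extraction.

```python
import re

# Hypothetical patterns; real notes would need far broader coverage.
SMOKER_PATTERNS = [
    re.compile(r"\b(current|active|heavy)\s+smoker\b", re.IGNORECASE),
    re.compile(r"\bsmokes\s+\d+\s+(packs?|cigarettes?)\b", re.IGNORECASE),
]
NON_SMOKER_PATTERNS = [
    re.compile(r"\b(non-?smoker|never\s+smoked|denies\s+smoking)\b", re.IGNORECASE),
]

def smoking_status(note: str) -> str:
    """Return 'non-smoker', 'smoker', or 'unknown' for a free-text note."""
    # Checking the non-smoker patterns first is a design choice of this sketch.
    if any(p.search(note) for p in NON_SMOKER_PATTERNS):
        return "non-smoker"
    if any(p.search(note) for p in SMOKER_PATTERNS):
        return "smoker"
    return "unknown"

print(smoking_status("Patient is a current smoker."))
print(smoking_status("Pt denies smoking."))
```

A note that matches neither pattern list comes back as `unknown`, which is usually safer than guessing.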

### Use case. **Question answering**
- Has the patient fallen within the past month?
- What are the symptoms of Zika virus?
- How many years has the patient been a smoker?

### NLP is usually probabilistic, for example...
- A word, clause, or sentence often has multiple possible meanings
- Homonyms: "lies", "bank" (river bank or financial institution)
- “Fruit flies like a banana.” ("flies" can be read as a noun or as a verb)
- We build statistical models that assign probabilities to the multiple possible meanings
- Take the meaning with the highest probability
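
The idea of assigning probabilities to candidate meanings and taking the most likely one can be sketched with a tiny naive Bayes model for the word "bank". The training data in `SENSE_CONTEXTS`, the sense labels, and the function names are made up for illustration; a real model would be estimated from an annotated corpus.

```python
import math
from collections import Counter

# Toy, hand-made data (hypothetical): context words seen with each sense of "bank".
SENSE_CONTEXTS = {
    "river_bank": ["river", "water", "fishing", "shore", "river"],
    "financial_institution": ["money", "loan", "account", "deposit", "money"],
}

def sense_probabilities(context_words):
    """Naive Bayes with add-one smoothing and a uniform prior over senses."""
    vocab = {w for words in SENSE_CONTEXTS.values() for w in words}
    log_scores = {}
    for sense, words in SENSE_CONTEXTS.items():
        counts = Counter(words)
        total = len(words) + len(vocab)
        log_scores[sense] = sum(
            math.log((counts[w] + 1) / total) for w in context_words
        )
    # Normalize the scores into probabilities that sum to 1.
    norm = sum(math.exp(s) for s in log_scores.values())
    return {sense: math.exp(s) / norm for sense, s in log_scores.items()}

def most_likely_sense(context_words):
    """Pick the sense with the highest probability given the context words."""
    probs = sense_probabilities(context_words)
    return max(probs, key=probs.get)

print(most_likely_sense(["deposit", "money"]))
print(most_likely_sense(["river", "fishing"]))
```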

### Shallow vs. deep NLP
- Shallow tasks
  1. Partial string matching
  2. Pattern matching (regular expressions)
  3. N-grams (tokens, characters)
- Limitation of shallow methods: a bag-of-words (BOW) representation does not distinguish between "Dog bites man" and "Man bites dog"
- Deep (natural language understanding)
  1. Parsing. “Parsing is a small step towards finding the meaning of a linguistic utterance.” The original sentence has a structure that establishes the relationships between its entities and properties; this structure is captured in the sentence's parse tree.
  2. A natural language parser is a program that works out the grammatical structure of sentences: for instance, which groups of words go together (as "phrases") and which words are the subject or object of a verb.
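
The shallow techniques above, and the bag-of-words limitation, can be illustrated with a few lines of Python. The `tokenize` and `ngrams` helpers are illustrative names, and the whitespace tokenizer is deliberately naive.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

a = tokenize("Dog bites man")
b = tokenize("Man bites dog")

# Bag-of-words (unigram counts) ignores order, so the two sentences look identical.
print(Counter(a) == Counter(b))  # True

# Bigrams keep some local word order, so the two sentences differ.
print(Counter(ngrams(a, 2)) == Counter(ngrams(b, 2)))  # False
```

Word bigrams recover some local order, but they still say nothing about who is the subject and who is the object, which is what parsing addresses.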

### Named entity recognition
- Named entities are definite noun phrases that refer to specific types of individuals (organizations, persons, dates, medications, treatments, procedures, signs, symptoms)

- Task: map noun phrases to controlled vocabularies (e.g. UMLS, SNOMED-CT concepts)
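
A very rough sketch of the mapping task is a dictionary lookup from surface phrases to concept identifiers. The `VOCABULARY` table and its `CONCEPT:` IDs are placeholders invented for this example; they are not real UMLS or SNOMED CT codes, and real systems handle spelling variants, abbreviations, and ambiguity that this sketch ignores.

```python
import re

# Hypothetical mini-vocabulary. The IDs are placeholders, not real UMLS or SNOMED CT codes.
VOCABULARY = {
    "myocardial infarction": "CONCEPT:0001",
    "heart attack": "CONCEPT:0001",  # synonym maps to the same concept
    "aspirin": "CONCEPT:0002",
    "shortness of breath": "CONCEPT:0003",
}

def map_concepts(text):
    """Return (matched phrase, concept ID) pairs found in the text, in text order."""
    lowered = text.lower()
    hits = []
    for phrase, concept_id in VOCABULARY.items():
        # Word boundaries prevent matches inside longer words.
        for m in re.finditer(r"\b" + re.escape(phrase) + r"\b", lowered):
            hits.append((m.start(), phrase, concept_id))
    return [(phrase, cid) for _, phrase, cid in sorted(hits)]

print(map_concepts("Pt with heart attack in 2010, takes Aspirin daily."))
```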

### Concept graphs / semantic relatedness
**WordNet** is a semantically oriented dictionary of English, similar to a traditional thesaurus but with a richer structure.

This is useful for search. For example, a search for "virus" should also return documents that mention "Zika".
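
The search behaviour described above can be sketched with a small hand-built "is-a" graph. This is not WordNet itself: the `IS_A` table and the `descendants` and `search` functions are hypothetical stand-ins that only show how a concept hierarchy can expand a query.

```python
# Toy "is-a" graph (hypothetical, hand-built): child concept -> parent concept.
IS_A = {
    "zika": "flavivirus",
    "dengue": "flavivirus",
    "flavivirus": "virus",
    "influenza": "virus",
    "virus": "pathogen",
}

def descendants(concept):
    """Return the concept plus every concept below it in the is-a graph."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parent in IS_A.items():
            if parent in found and child not in found:
                found.add(child)
                changed = True
    return found

def search(query, documents):
    """Return documents mentioning the query term or any of its descendants."""
    terms = descendants(query.lower())
    return [doc for doc in documents if terms & set(doc.lower().split())]

docs = ["Zika outbreak reported", "Patient fell at home", "Seasonal influenza vaccine"]
print(search("virus", docs))
```

The query is expanded downward only, so a search for "zika" does not return every document about viruses in general.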

### Areas of research
- Negation detection
- Named entity recognition outside of noun phrases
  * “A small mass was found in the left hilum of the lung.”
- Pronoun resolution
- Parsing (i.e. diagramming the sentence)
- Recognizing textual entailment
    * Determine whether a given piece of text implies another text
    * Example:
        - Hypothesis: "Sandra Goudie was defeated by Max Purnell"
        - Retrieved Text: "Sandra Goudie was first elected to Parliament in the 2002 elections, narrowly winning the seat of Coromandel by defeating Labour candidate Max Purnell and pushing incumbent Green MP Jeanette Fitzsimons into third place."
        - Does the text provide enough evidence to support the hypothesis?
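
To make the first research area above concrete, here is a naive rule-based sketch of negation detection. The cue list, the `WINDOW` size, and the `is_negated` function are assumptions chosen for illustration; the fixed-window scope is a crude simplification and will misfire on many real sentences.

```python
import re

# Small, hypothetical list of negation cues; real systems use much larger lists.
NEGATION_CUES = ["no", "denies", "without", "negative for"]
WINDOW = 5  # assumed scope: a cue negates terms within the next 5 tokens

def is_negated(sentence, term):
    """Return True if `term` appears within WINDOW tokens after a negation cue."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    term_tokens = term.lower().split()
    n = len(term_tokens)
    for cue in NEGATION_CUES:
        cue_tokens = cue.split()
        c = len(cue_tokens)
        for i in range(len(tokens) - c + 1):
            if tokens[i:i + c] != cue_tokens:
                continue
            # Tokens immediately following the cue form its assumed scope.
            scope = tokens[i + c:i + c + WINDOW]
            for j in range(len(scope) - n + 1):
                if scope[j:j + n] == term_tokens:
                    return True
    return False

print(is_negated("Patient denies chest pain or fever.", "chest pain"))
print(is_negated("Patient reports chest pain.", "chest pain"))
```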

### Resources
- Computational linguistics at Yale
- Clinical corpora:
  * http://compbio.ucdenver.edu/ccp/corpora/obtaining.shtml
  * i2b2: https://www.i2b2.org/NLP/DataSets/Main.php
  * FDA adverse event reports: https://open.fda.gov/