# Abbreviation matching

* The [scispacy AbbreviationDetector](https://github.com/allenai/scispacy#abbreviationdetector) finds abbreviations that are defined in the text, e.g. ```StackOverflow (SO)```, links every occurrence to its long form, and records where each occurrence is located.

In [45]:
import re
import spacy
from scispacy.abbreviation import (
    AbbreviationDetector
)

## scispaCy language model

In [5]:
nlp = spacy.load("en_core_sci_lg")


## Add the abbreviation detector to the pipeline

In [None]:
nlp.add_pipe("abbreviation_detector")

## Detect the abbreviations in a sample text

In [58]:
text = """
Spinal and bulbar muscular atrophy (SBMA) is an inherited motor neuron disease caused by the expansion
of a polyglutamine tract within the androgen receptor (AR). SBMA can be caused by this easily.
"""
doc = nlp(' '.join(text.split()))

## Detected abbreviations

The ```start``` and ```end``` values are token indices ```(token.i)``` in the text, not character offsets; ```end``` is exclusive.

In [59]:
for abrv in doc._.abbreviations:
    print(f"{abrv.text:<15}: ({abrv.start}, {abrv.end}) {abrv._.long_form}")
    
abbreviations = {
    abrv.start: abrv._.long_form for abrv in doc._.abbreviations
}

SBMA           : (6, 7) Spinal and bulbar muscular atrophy
SBMA           : (30, 31) Spinal and bulbar muscular atrophy
AR             : (27, 28) androgen receptor
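
The dictionary above keys each long form by ```abrv.start``` only, which is enough here because every detected abbreviation is a single token. As a minimal sketch, assuming the spans are available as plain ```(start, end, long_form)``` tuples, every token index inside a span could be mapped instead. The helper name ```index_long_forms``` is hypothetical and not part of scispacy.

```python
def index_long_forms(spans):
    """Map every token index covered by a span to that span's long form.

    `spans` is an iterable of (start, end, long_form) tuples with `end`
    exclusive, mirroring spaCy's Span.start / Span.end.
    """
    index = {}
    for start, end, long_form in spans:
        for i in range(start, end):
            index[i] = long_form
    return index


# Values copied from the output above.
spans = [
    (6, 7, "Spinal and bulbar muscular atrophy"),
    (30, 31, "Spinal and bulbar muscular atrophy"),
    (27, 28, "androgen receptor"),
]
lookup = index_long_forms(spans)
print(sorted(lookup))  # [6, 27, 30]
```

With a ```doc``` at hand, the tuples could be built as ```[(a.start, a.end, str(a._.long_form)) for a in doc._.abbreviations]```.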


In [60]:
for token in doc:
    print(f"{token.i:<4} {token.text:20} : {abbreviations.get(token.i, '')}")

0    Spinal               : 
1    and                  : 
2    bulbar               : 
3    muscular             : 
4    atrophy              : 
5    (                    : 
6    SBMA                 : Spinal and bulbar muscular atrophy
7    )                    : 
8    is                   : 
9    an                   : 
10   inherited            : 
11   motor                : 
12   neuron               : 
13   disease              : 
14   caused               : 
15   by                   : 
16   the                  : 
17   expansion            : 
18   of                   : 
19   a                    : 
20   polyglutamine        : 
21   tract                : 
22   within               : 
23   the                  : 
24   androgen             : 
25   receptor             : 
26   (                    : 
27   AR                   : androgen receptor
28   )                    : 
29   .                    : 
30   SBMA                 : Spinal and bulbar muscular atrophy
31   can        

In [61]:
token = doc[0]
token.i

0

In [62]:
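def replace_abbreviation(text):
    """Return the text with each detected abbreviation replaced by its long form."""
    doc = nlp(text)
    tokens = [token.text for token in doc]
    for abrv in doc._.abbreviations:
        # Overwrite the token at the abbreviation's start index with the long form.
        tokens[abrv.start] = str(abrv._.long_form)

    return " ".join(tokens)

replace_abbreviation(text)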

'\n Spinal and bulbar muscular atrophy ( Spinal and bulbar muscular atrophy ) is an inherited motor neuron disease caused by the expansion \n of a polyglutamine tract within the androgen receptor ( androgen receptor ) . Spinal and bulbar muscular atrophy can be caused by this easily . \n'
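
The function ```replace_abbreviation``` overwrites only the token at ```abrv.start```, so an abbreviation that spans several tokens would be replaced only partially. Below is a minimal sketch of span-based replacement on plain Python lists, assuming non-overlapping spans given as ```(start, end, long_form)``` tuples. The helper name ```replace_spans``` and the toy token list are illustrative assumptions, not scispacy API. Spans are applied from right to left so that earlier indices stay valid.

```python
def replace_spans(tokens, spans):
    """Return a copy of `tokens` with each [start, end) span replaced by its long form."""
    result = list(tokens)
    # Work from the rightmost span so replacements do not shift pending indices.
    for start, end, long_form in sorted(spans, reverse=True):
        result[start:end] = [long_form]
    return result


# Toy input: token texts plus (start, end, long_form) spans, `end` exclusive.
tokens = ["the", "androgen", "receptor", "(", "AR", ")", ".", "AR", "binds"]
spans = [(4, 5, "androgen receptor"), (7, 8, "androgen receptor")]
print(" ".join(replace_spans(tokens, spans)))
```

Joining with single spaces still detaches punctuation, as in the output above. Concatenating spaCy's ```token.text_with_ws``` instead of ```token.text``` is one way to keep the original spacing.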