## Abbreviation Handler Demo 

The abbreviation classes replace abbreviations with their full expansions. DACKAR provides two internally developed classes:

- Abbreviation class: directly substitutes abbreviations with their full expansions. Users can provide their own abbreviation dictionary.
- AbbrExpander class: uses a more sophisticated method, spell checking combined with word similarity search, to identify abbreviations and substitute them with their full expansions.

### AbbrExpander class
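
To make the "spell checking with word similarity search" idea concrete, here is a minimal sketch that does not use DACKAR at all. It assumes the standard-library `difflib` as a stand-in for the similarity search, and the names `ABBR`, `VOCAB`, `expand_token`, and `expand` as well as the toy dictionaries are hypothetical. The actual AbbrExpander implementation may work differently.

```python
import difflib

# Hypothetical toy data; AbbrExpander itself loads its list from a file.
ABBR = {"calib": "calibration", "chan": "channel", "perf": "perform"}
# Stand-in for a spell-checking vocabulary of correctly spelled words.
VOCAB = {"of", "channel", "calibration", "perform"}


def expand_token(token, cutoff=0.7):
    """Expand one token: known word, exact abbreviation, then closest abbreviation."""
    if token in VOCAB:
        return token  # correctly spelled word, nothing to do
    if token in ABBR:
        return ABBR[token]  # exact abbreviation match
    # Similarity search: pick the most similar known abbreviation, if any.
    close = difflib.get_close_matches(token, list(ABBR), n=1, cutoff=cutoff)
    return ABBR[close[0]] if close else token


def expand(sentence):
    """Expand every whitespace-separated token of a lowercase sentence."""
    return " ".join(expand_token(tok) for tok in sentence.split())


print(expand("prfr chann calib of chan"))
# perform channel calibration of channel
```

Unlike a plain lookup, the similarity step can recover misspelled abbreviations such as `prfr` and `chann`, at the cost of occasional wrong guesses when the cutoff is too permissive.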

In [1]:
import pandas as pd
import os, sys
import time

# Add the DACKAR source directory (../src) to the import path
cwd = os.getcwd()
frameworkDir = os.path.abspath(os.path.join(cwd, os.pardir, 'src'))
sys.path.append(frameworkDir)

# Load AbbrExpander from DACKAR
from dackar.text_processing.AbbrExpander import AbbrExpander


In [2]:
# Example texts: `test` (heavily abbreviated) and `text` (mostly plain sentences)
test = """Perf ann sens calib of cyl.
          High conc of hydrogen obs.
          High conc of hydrogen obs every wk.
          Prfr chann calib of chan.
          esf pump room and fuel bldg test.
          cal press xmtr sit elev.
          perform thermography survey of pzr htr terminations.
          plant mods comp iso mode prep.
          drain & rmv pipe."""
test = test.lower()

text = """A leak was noticed from the pump.
            RCP pump 1A pressure gauge was found not operating.
            RCP pump 1A pressure gauge was found inoperative.
            RCP pump 1A pressure gauge was not functional.
            Rupture of pump bearings caused shaft degradation.
            Rupture of pump bearings caused shaft degradation and consequent flow reduction.
            Pump power supply has been found burnout.
            Pump test failed due to power supply failure.
            Pump inspection revealed excessive impeller degradation.
            Pump inspection revealed excessive impeller degradation likely due to cavitation.
            Oil puddle was found in proximity of RCP pump 1A.
            Anomalous vibrations were observed for RCP pump 1A.
            Several cracks on pump shaft were observed; they could have caused pump failure within few days.
"""
text = text.lower()


In [3]:
# Load the pre-generated abbreviation list
filename = os.path.join(os.getcwd(), os.pardir, 'data', 'abbreviations.xlsx')
abbrList = pd.read_excel(filename)
abbrList.head()

Unnamed: 0,Abbreviation,Full
0,&,and
1,ab,as built
2,abl,ablative
3,abol,abolition
4,abs,absolute


In [4]:
# Utilize AbbrExpander to replace abbreviations
AbbrExp = AbbrExpander(filename)
cleanedTest = AbbrExp.abbrProcess(test, splitToList='True')
print('Test:\n', cleanedTest)

cleanedText = AbbrExp.abbrProcess(text)
print('Text:\n', cleanedText)

Test:
 perform annual sensor calibration of cylinder. high concentration of hydrogen observe. high concentration of hydrogen observe every week. perform channel calibration of channel. esf pump room and fuel building test. calibration pressure transmitter sit elevation. perform thermography survey of pressurizer heater terminations. plant modifications composition iso mode preparation. drain and remove pipe. 
Text:
 1 leak was noticed from the pump.
 rcp pump 1a pressure gauge was found not operating.
 rcp pump 1a pressure gauge was found inoperative.
 rcp pump 1a pressure gauge was not functional.
 rupture of pump bearings caused shaft degradation.
 rupture of pump bearings caused shaft degradation and consequent flow reduction.
 pump power supply has been found burnout.
 pump test failed due to power supply failure.
 pump inspection revealed excessive impeller degradation.
 pump inspection revealed excessive impeller degradation likely due to cavitation.
 oil puddle was found in prox

### Abbreviation class 
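
The direct substitution performed by this class can be pictured as a whole-word dictionary lookup. The sketch below is only an illustration under that assumption: `substitute` is a hypothetical helper built on the standard `re` module, not the DACKAR implementation, and the small dictionary is taken from the user-provided example in this demo.

```python
import re


def substitute(text, abbr_dict):
    """Replace whole-word abbreviations via a plain dictionary lookup."""
    def repl(match):
        word = match.group(0)
        # Words missing from the dictionary are returned unchanged.
        return abbr_dict.get(word, word)
    return re.sub(r"\w+", repl, text)


abbr = {"perf": "perform", "ann": "annual", "sens": "sensor", "calib": "calibration"}
print(substitute("perf ann sens calib of cyl.", abbr))
# perform annual sensor calibration of cyl.
```

Because the lookup is exact, a misspelled token such as `prfr` is left as-is; only entries present in the dictionary are expanded.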

In [5]:
# Load Abbreviation from DACKAR
from dackar.text_processing.Abbreviation import Abbreviation

abbreviation = Abbreviation()
abbrDict = abbreviation.getAbbreviation()
print(abbrDict)

{'&': 'and', 'ab': 'as built', 'abl': 'ablative', 'abol': 'abolition', 'abs': 'absolute', 'absol': 'absolute', 'abst': 'abstract', 'abstr': 'abstract', 'accep': 'acceptance', 'accom': 'accomodation', 'accomm': 'accomodation', 'admin': 'administrative', 'adv': 'advanced', 'afl': 'above floor level\xa0', 'agl': 'above ground level', 'agst': 'against', 'ah': 'after hours', 'amer': 'american', 'anal': 'analysis', 'analyt': 'analytic', 'ann': 'annual', 'answ': 'answer', 'app': 'apperently', 'approx': 'approximate', 'appt': 'appointment', 'apr': 'april', 'aql': 'acceptable quality level', 'ar': 'as required', 'arch': 'architecture', 'arrgt': 'arrangement', 'artic': 'articulation', 'asap': 'as soon as possible', 'ass': 'assembly', 'assem': 'assembly', 'assy': 'assembly', 'attrib': 'attribute', 'aug': 'august', 'auto': 'automatic', 'aux': 'auxiliary', 'avg': 'average', 'batt': 'battery', 'bc': 'bolt circle', 'bef': 'before', 'betw': 'between', 'bhc': 'bolt hole circle', 'bldg': 'building', 'bl

In [6]:
# Expand the abbreviations in `test` using the built-in dictionary
cleanedTest = abbreviation.abbreviationSub(test)
print(cleanedTest)

perform annual sensor calibration of cylinder. high concentration of hydrogen observe. high concentration of hydrogen observe every work. prfr channel calibration of channel. esf pump room and fuel building test. calibration pressure transmitter sit elevation. perform thermography survey of pressurizer heater terminations. plant modifications composite iso mode prepare. drain & remove pipe. 
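
The two classes do not produce identical results on `test`. The sketch below simply compares, sentence by sentence, the AbbrExpander output from In [4] with the Abbreviation output from In [6]; both strings are copied from the outputs shown in this demo, and the helper name `sentences` is hypothetical.

```python
# Outputs copied from In [4] (AbbrExpander) and In [6] (Abbreviation).
expander_out = (
    "perform annual sensor calibration of cylinder. "
    "high concentration of hydrogen observe. "
    "high concentration of hydrogen observe every week. "
    "perform channel calibration of channel. "
    "esf pump room and fuel building test. "
    "calibration pressure transmitter sit elevation. "
    "perform thermography survey of pressurizer heater terminations. "
    "plant modifications composition iso mode preparation. "
    "drain and remove pipe. "
)
dictionary_out = (
    "perform annual sensor calibration of cylinder. "
    "high concentration of hydrogen observe. "
    "high concentration of hydrogen observe every work. "
    "prfr channel calibration of channel. "
    "esf pump room and fuel building test. "
    "calibration pressure transmitter sit elevation. "
    "perform thermography survey of pressurizer heater terminations. "
    "plant modifications composite iso mode prepare. "
    "drain & remove pipe. "
)


def sentences(text):
    """Split on periods and drop empty fragments."""
    return [s.strip() for s in text.split(".") if s.strip()]


# Keep only the sentence pairs where the two classes disagree.
diffs = [
    (a, b)
    for a, b in zip(sentences(expander_out), sentences(dictionary_out))
    if a != b
]
for a, b in diffs:
    print(f"AbbrExpander: {a}\nAbbreviation: {b}\n")
```

Four of the nine sentences differ: `wk` is expanded to "week" versus "work", the misspelled `prfr` is only corrected by AbbrExpander, `comp` and `prep` receive different expansions, and `&` is only replaced by AbbrExpander.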


In [7]:
# Use a user-provided abbreviation dictionary (reset=True discards the built-in entries)
abbrDict = {'perf':'perform', 'ann':'annual', 'sens':'sensor', 'calib':'calibration'}
abbreviation.updateAbbreviation(abbrDict, reset=True)
print(abbreviation.getAbbreviation())
cleanedTest = abbreviation.abbreviationSub(test)
print(cleanedTest)

{'perf': 'perform', 'ann': 'annual', 'sens': 'sensor', 'calib': 'calibration'}
perform annual sensor calibration of cyl. high conc of hydrogen obs. high conc of hydrogen obs every wk. prfr chann calibration of chan. esf pump room and fuel bldg test. cal press xmtr sit elev. perform thermography survey of pzr htr terminations. plant mods comp iso mode prep. drain & rmv pipe. 
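
The output above shows that `reset=True` leaves only the user-provided entries in the dictionary. The sketch below illustrates that replace-versus-merge distinction with plain Python dictionaries. `update_abbreviations` is a hypothetical helper, and the merge behavior shown for `reset=False` is an assumption about what a non-resetting update would do, not something confirmed by this demo.

```python
def update_abbreviations(current, new, reset=False):
    """Hypothetical helper: return the dictionary that results from an update."""
    if reset:
        # Replace: only the new entries survive (matches the output above).
        return dict(new)
    # Assumed merge: keep existing entries, new ones override on key clashes.
    return {**current, **new}


builtin = {"&": "and", "bldg": "building"}
user = {"perf": "perform"}

print(update_abbreviations(builtin, user, reset=True))
# {'perf': 'perform'}
print(update_abbreviations(builtin, user))
# {'&': 'and', 'bldg': 'building', 'perf': 'perform'}
```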
