The purpose of this script is to update the NASA ADS synonyms files with AGU index terms related to Heliophysics and Space Weather

Relevant resources:
- ADS synonyms files: 
    - simple (single term) synonyms: data/ads_simple_synonyms.txt
    - multi-term synonyms: data/ads_multi_synonyms.txt
- AGU Index Terms: data/agu-index-terms.xlsx
- Heliohpysics Acronyms (generated from 2013 Decadal Survey): data/solar_physics_acronyms.csv

Dependencies:
- [NLTK](https://www.nltk.org); and you must download the NLTK corpora

Oustanding items:
- We are not yet determining whether similar terms to the AGU index terms already exist in the ADS synonyms files. We need to explore that and if they do simply add the AGU index term as a synonym for that existing term
- relying on common language model to generate word variations (wordnet) and we would benefit greatly from a science-based language model -> NEED: to somehow efficiently remove superfluous/non-scientific terms from the words returned from the synonym_antonym_extractor function
- does ADS/Apache Solr handle antonyms?

In [1]:
import os, json
import numpy as np
import pandas as pd

import re

import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt

import nltk

In [2]:


def synonym_antonym_extractor(phrase):
    '''
    Obtained from: https://www.holisticseo.digital/python-seo/nltk/wordnet
    NOTE: returns null sets for commons stop words (e.g., 'the', 'and')
    example: synonym_antonym_extractor('x-ray')
    '''
    from nltk.corpus import wordnet
    synonyms = []
    antonyms = []

    for syn in wordnet.synsets(phrase):
        for l in syn.lemmas():
            synonyms.append(l.name())
            if l.antonyms():
                antonyms.append(l.antonyms()[0].name())

    return(set(synonyms),set(antonyms))
#     print(set(synonyms))
#     print(set(antonyms))


def get_all_variations(list_of_lists):
    '''
    get all unique combinations of N lists
    requires passing a list of lists: list_of_lists = [[list1],[list2],...]
    obtained from: https://www.geeksforgeeks.org/python-all-possible-permutations-of-n-lists/
    example: get_all_variations( [[1,2,3],[4,5,6],[7,8,9]] )
    '''
    import itertools
    # using itertools.product()  
    # to compute all possible permutations
    all_combinations = list(itertools.product(*list_of_lists))
    return all_combinations
    
    

### Read in AGU index terms

In [3]:
# Read in AGU Index Terms
pd_agu = pd.read_excel('data/agu-index-terms.xlsx')
pd_agu


for r in range(len(pd_agu)): 
    if ~( (pd_agu['Code'][r] >= 1900) & (pd_agu['Code'][r] < 1999) |
         (pd_agu['Code'][r] >= 2100) & (pd_agu['Code'][r] < 2199) |
         (pd_agu['Code'][r] >= 2400) & (pd_agu['Code'][r] < 2499) |
         (pd_agu['Code'][r] >= 2700) & (pd_agu['Code'][r] < 2799) |
         (pd_agu['Code'][r] >= 3200) & (pd_agu['Code'][r] < 3299) |
         (pd_agu['Code'][r] >= 4300) & (pd_agu['Code'][r] < 4399) |
         (pd_agu['Code'][r] >= 6900) & (pd_agu['Code'][r] < 6999) |
         (pd_agu['Code'][r] >= 7500) & (pd_agu['Code'][r] < 7599) |
         (pd_agu['Code'][r] >= 7800) & (pd_agu['Code'][r] < 7899) ):
#         print('Code = {} --> Term = {}'.format(pd_agu['Code'][r],pd_agu['Description'][r]))
        pd_agu = pd_agu.drop([r])

for r in pd_agu.index:
    if '(' in pd_agu['Description'][r]:
#         print('prior = {}'.format(pd_agu['Description'][r]))
        pd_agu['Description'][r] = pd_agu['Description'][r][0:pd_agu['Description'][r].find('(')-1]
#         print('post = {}'.format(pd_agu['Description'][r]))

# Transforming into a new dataframe that can be combined with other glossaries
pd_agu_terms = pd.DataFrame(columns=['term','definition'])
pd_agu_terms['term'] = pd_agu['Description']
pd_agu_terms

pd_agu_terms['term'] = pd_agu_terms['term'].str.lower()

pd_agu_terms['source'] = np.tile('agu',(len(pd_agu_terms),1))
pd_agu_terms


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  pd_agu['Description'][r] = pd_agu['Description'][r][0:pd_agu['Description'][r].find('(')-1]


Unnamed: 0,term,definition,source
432,informatics,,agu
433,community modeling frameworks,,agu
434,community standards,,agu
435,"computational models, algorithms",,agu
436,cyberinfrastructure,,agu
...,...,...,...
1154,transport processes,,agu
1155,turbulence,,agu
1156,wave/particle interactions,,agu
1157,wave/wave interactions,,agu


In [4]:
for r in pd_agu_terms['term'].values[0:10]:
    print(r)
    

informatics
community modeling frameworks
community standards
computational models, algorithms
cyberinfrastructure
data assimilation, integration and fusion
data management, preservation, rescue
data mining
data and information discovery
decision analysis


### Read in acronyms pulled from [2013 Solar and Space Physic Decadal Survey](https://nap.nationalacademies.org/read/13060/chapter/1)

In [5]:
file_acronyms = 'data/solar_physics_acronyms.csv'
pd_acronyms = pd.read_csv(file_acronyms)

pd_acronyms

Unnamed: 0,acronyms,terms
0,AAS,American Astronomical Society
1,ACE,Advanced Composition Explorer
2,ACR,anomalous cosmic ray
3,AFOSR,Air Force Office of Scientific Research
4,AFRL,Air Force Research Laboratory
...,...,...
295,WHI,Whole Heliosphere Interval
296,WLC,White Light Coronagraph
297,WPI,wave-particle interaction
298,WSA-Enlil,Wang-Sheely-Arge-Enlil


In [6]:
# Make some corrections to the acronyms
for val in range(len(pd_acronyms)):
    
    # Catch abbreviations with periods in them
    pd_acronyms['terms'][val] = pd_acronyms['terms'][val].replace('.','')

    # Catch abbreviations with commas in them
    pd_acronyms['terms'][val] = pd_acronyms['terms'][val].replace(',','')
    
    # Catch some specific fixes
    pd_acronyms['terms'][val] = pd_acronyms['terms'][val].replace('-',' ')
pd_acronyms

Unnamed: 0,acronyms,terms
0,AAS,American Astronomical Society
1,ACE,Advanced Composition Explorer
2,ACR,anomalous cosmic ray
3,AFOSR,Air Force Office of Scientific Research
4,AFRL,Air Force Research Laboratory
...,...,...
295,WHI,Whole Heliosphere Interval
296,WLC,White Light Coronagraph
297,WPI,wave particle interaction
298,WSA-Enlil,Wang Sheely Arge Enlil


### Read in ADS synonyms

In [7]:
simple_syns_file = '/Users/ryanmcgranaghan/Documents/Helio_ECIP/dev/Helio-KNOW/ADS_enrichment/data/ads_simple_synonyms.txt'
f = open(simple_syns_file,"r")
txt_data_simple_syns = f.read().split('\n')#.remove('')
f.close()

simple_syns_data = [x.split('=>') for x in txt_data_simple_syns]
pd_simple_syns = pd.DataFrame(simple_syns_data,columns=['words','ADS term'])
pd_simple_syns

multi_syns_file = '/Users/ryanmcgranaghan/Documents/Helio_ECIP/dev/Helio-KNOW/ADS_enrichment/data/ads_multi_synonyms.txt'
f = open(multi_syns_file,"r")
txt_data_multi_syns = f.read().split('\n')#.remove('')
f.close()

txt_data_multi_syns = txt_data_multi_syns[:-1]


In [109]:
# Extract acronyms from multi-term synonyms file for later determining if Heliophysics acronyms already exist
extracted_acronyms = []
for val in txt_data_multi_syns:
    
    if (not val):
        continue
    elif (val[0] == '#'):
        continue
    else:
        
        extracted_acronyms.append(val.split(',')[0])


    

### Method of Identifying Synonyms: OpenAlex

This uses the [OpenAlex](https://explore.openalex.org) resource to identify related concepts and then prunes those results. OpenAlex is an open and comprehensive catalog of scholarly papers, authors, institutions, and more. 


TODO: look through UAT synonyms and remove duplicate/fuse similar terms


In [8]:
# Code from Donny Winston

from urllib.parse import urlencode

import requests

OPENALEX_HEADERS = {"User-Agent": "mailto:ryan.mcgranaghan@gmail.com"}
OPENALEX_API_BASE_URL = "https://api.openalex.org"

def openalex_get(path, params=None, page=1):
    url = f"{OPENALEX_API_BASE_URL}{path}"
    if params:
        url += f"?{urlencode(params)}&page={page}&per-page=200"
    return requests.get(url, headers=OPENALEX_HEADERS)


from toolz import get_in

from toolz import keyfilter

def pick(whitelist, d):
    return keyfilter(lambda k: k in whitelist, d)

def count_and_results(rv):
    d = rv.json()
    count = get_in(['meta', 'count'], d)
    results = [pick(['id', 'display_name', 'description'], r) for r in d['results']]
    return count, results

In [63]:
# read in synonyms files
with open('data/ads_simple_synonyms.txt') as f:
    ADS_simple_syns = [line for line in f.readlines()]
    
with open('data/ads_multi_synonyms.txt') as f:
    ADS_multi_syns = [line for line in f.readlines()]
    


In [79]:
any('astronomical journal' in x for x in ADS_multi_syns)

True

In [17]:
rv = openalex_get("/concepts", {"search": 'informatics'})
d=rv.json()
for r in d['results']:
    print('{} - {} - {}'.format(r['id'], r['display_name'], r['description']))
    

https://openalex.org/C145642194 - Health informatics - discipline at the intersection of information science, computer science, and health care
https://openalex.org/C191630685 - Informatics - academic field
https://openalex.org/C73851307 - Translational research informatics - discipline of applied computer science research concerned with the application of informatics theory and methods to translational research
https://openalex.org/C158518442 - Engineering informatics - discipline combining information technology (IT) – or informatics – with engineering concepts
https://openalex.org/C141363852 - Health Administration Informatics - None
https://openalex.org/C106476913 - Public health informatics - None
https://openalex.org/C62085286 - Materials informatics - None
https://openalex.org/C167785021 - Health informatics tools - None
https://openalex.org/C27518888 - Business informatics - science that deals with development and application of information and communication systems in business

In [111]:

unaddressed_terms = []
count_unaddressed_terms = 0
list_syns_update = []

unreplaced_terms = []
count_unreplaced_terms = 0

for t in pd_agu_terms['term']:
    print('\n\nworking on term: {}'.format(t))
    
    # remove punctuation
    t_mod = t
    t_mod = t_mod.replace(',','')
    t_mod = t_mod.replace('.','')
    t_mod = t_mod.replace(':','')
    t_mod = t_mod.replace('/',' ')
    
    # First catch if this term already exists in the synonyms files and skip if it does
    if (any(t_mod in x for x in ADS_simple_syns)) | (any(t_mod in x for x in ADS_multi_syns)):
        print('...already exists in synonyms list')
        unreplaced_terms.append(t)
        count_unreplaced_terms += 1
        continue
    
    
    rv = openalex_get("/concepts", {"search": t_mod})
#     d=rv.json()
#     for r in d['results']:
#         print('{} - {} - {}'.format(r['id'], r['display_name'], r['description']))
    try:
        count, results = count_and_results(rv)
    except:
        print('problem with return for {}, continuing'.format(t_mod))
        

    if count:
#         print(count)
#         print(results)
        tmp = [r['display_name'] for r in results]
#         tmp.append(t)
        print(tmp)
#         list_syns_update.append('{} => {}'.format(', '.join(variations_synonyms),t))
        # Choosing the multi-term synonyms format for all entries
        list_syns_update.append('{}, {}'.format(', '.join(tmp),t))

    else:
        print('no matching concepts')
#         rv = openalex_get("/works", {"search": q})
#         count, results = count_and_results(rv)
#         print(count, 'matching works')
#         print(results)
        # append to a list of unaddressed terms
        unaddressed_terms.append(t)
        count_unaddressed_terms += 1
    print()
print('number of terms that do not exist in OpenAlex = {}'.format(count_unaddressed_terms))



working on term: informatics
...already exists in synonyms list


working on term: community modeling frameworks
no matching concepts



working on term: community standards
['Community standards']



working on term: computational models, algorithms
['PSQM']



working on term: cyberinfrastructure
['Cyberinfrastructure']



working on term: data assimilation, integration and fusion
no matching concepts



working on term: data management, preservation, rescue
no matching concepts



working on term: data mining
['Data mining', 'Lift (data mining)', 'Data stream mining', 'Educational data mining', 'Process mining', 'Biclustering', 'Semantic computing', 'Predictive analytics', 'Statistical arbitrage']



working on term: data and information discovery
no matching concepts



working on term: decision analysis
['Decision analysis', 'Multiple-criteria decision analysis', 'Risk assessment', 'Dominance-based rough set approach', 'ELECTRE', 'Risk stratification', 'Weighted sum model', 'Cos

['Ocean gyre', 'Classification of mental disorders', 'Psychiatric diagnosis', 'Magnetostatics', 'Z-pinch', 'Ampacity', 'Lightning arrester', 'State function', 'Prospective short circuit current']



working on term: electric fields
['Capacitor', 'Permittivity', 'Field-effect transistor', 'Polarizability', 'Excitation', 'Magnetic field', 'Electron mobility', 'Electroluminescence', 'Dielectrophoresis', 'Stark effect', 'Kerr effect', 'Linear polarization', 'Faraday cage', 'Dielectric function', 'Electric charge', 'Electric field gradient', 'Pyroelectricity', 'Electric potential', 'Dielectric permittivity', 'Drift velocity', 'Electric fish', 'Electrical mobility', 'Ion trap', 'Electroreception', 'Aharonov–Bohm effect', "Maxwell's equations", 'Pockels effect', 'Electro-optic effect', 'Crystal field theory', 'Screening effect', 'Pyroelectric crystal', 'Kerr nonlinearity', 'Gaussian surface', 'Townsend discharge', 'Electromagnet', 'Electron avalanche', 'Vacuum permittivity', 'Doubly fed elect

['Wave propagation', 'Ground wave propagation', 'Longitudinal wave', 'Shear waves', 'Helioseismology', 'Electromagnetic electron wave', 'Underwater acoustics', 'Longitudinal field', 'Transverse wave', 'Line-of-sight propagation', 'Leaky wave antenna', 'Transverse vibration', 'Airy wave theory', 'Skywave', 'Ionospheric propagation', 'Ionospheric reflection']



working on term: instruments and techniques
no matching concepts



working on term: natural hazards
['Social vulnerability', 'Protection forest']



working on term: atmospheric
...already exists in synonyms list


working on term: geological
...already exists in synonyms list


working on term: hydrological
...already exists in synonyms list


working on term: oceanic
...already exists in synonyms list


working on term: space weather
['Space weather', 'Geomagnetically induced current']



working on term: multihazards
no matching concepts



working on term: methods
['Decoding methods', 'Ab initio quantum chemistry methods', '

no matching concepts



working on term: disaster resilience
no matching concepts



working on term: disaster risk analysis and assessment
no matching concepts



working on term: disaster risk communication
no matching concepts



working on term: disaster management
no matching concepts



working on term: economic impacts of disasters
no matching concepts



working on term: remote sensing and disasters
problem with return for remote sensing and disasters, continuing
no matching concepts



working on term: disaster policy
['Disaster recovery', 'IT service continuity']



working on term: disaster mitigation
['Disaster mitigation']



['Red Color']



working on term: emergency management
['Emergency management']



working on term: preparedness and planning
no matching concepts



working on term: microzonation and macrozonation
no matching concepts



working on term: community management
['Community management', 'Community-based management', 'Participatory planning']



working 

no matching concepts



working on term: spacecraft sheaths, wakes, charging
no matching concepts



working on term: stochastic phenomena
no matching concepts



working on term: transport processes
['Endocytosis', 'Endocytic cycle', 'Bone resorption', 'Erosion', 'Swallowing', 'Exocytosis', 'Logging', 'Traction (geology)', 'Membrane technology', 'Beta oxidation', 'Ribosome biogenesis', 'Receptor-mediated endocytosis', 'Soil loss', 'Twin-arginine translocation pathway', 'Membrane filter', 'Water erosion', 'International shipping']



working on term: turbulence
...already exists in synonyms list


working on term: wave/particle interactions
no matching concepts



working on term: wave/wave interactions
['Underwater acoustics', 'Wind wave model']



working on term: instruments and techniques
no matching concepts

number of terms that do not exist in OpenAlex = 150


In [102]:
# for t2 in unaddressed_terms:
#     print('\n{}'.format(t2))
for t3 in unreplaced_terms:
    print('\n{}'.format(t3))


informatics

forecasting

interoperability

modeling

ontologies

standards

uncertainty

workflow

cosmic rays

discontinuities

interplanetary dust

interstellar gas

ionosphere

cusp

forecasting

magnetosheath

magnetotail

plasmasphere

radiation belts

substorms

prediction

atmospheric

geological

hydrological

oceanic

other

precursors

exposure

resilience

risk

vulnerability

miscellaneous

interferometry

radio astronomy

chromosphere

corona

flares

helioseismology

magnetic fields

photosphere

solar activity cycle

chaos

discontinuities

turbulence


#### Add acronyms to the bottom of the file that will be added to ADS synonyms file


In [112]:
list_syns_update.append('\n\n# acronyms added from Heliophysics 2013 Decadal survey in August 2022')

for l in range(len(pd_acronyms)):
    flag = 0
    # Check if the acronym already exists in ADS file
    for i,s in enumerate(extracted_acronyms):
        if pd_acronyms['acronyms'][l] == s:
            print('Helio acronym: {}; ADS acronym: {}'.format(pd_acronyms['acronyms'][l],extracted_acronyms[i]))
            flag = 1
            break
        
    if flag == 0:     
        txt_tmp = '{},{}'.format(pd_acronyms['acronyms'][l],pd_acronyms['terms'][l].lower())

        list_syns_update.append(txt_tmp)
        print('added...{}'.format(list_syns_update[-1]))

Helio acronym: AAS; ADS acronym: AAS
Helio acronym: ACE; ADS acronym: ACE
added...ACR,anomalous cosmic ray
added...AFOSR,air force office of scientific research
added...AFRL,air force research laboratory
added...AFWA,air force weather agency
added...AGS,atmospheric and geospace sciences division
Helio acronym: AGU; ADS acronym: AGU
added...AIA,atmospheric imaging assembly
Helio acronym: AIM; ADS acronym: AIM
added...AIMI,atmosphere ionosphere magnetosphere interactions
added...AIP,american institute of physics
added...AMISR,advanced modular incoherent scatter radar
added...AMPERE,active magnetosphere and planetary electrodynamics response experiment
added...AMPTE,active magnetospheric particle tracer explorers
added...AO,announcement of opportunity
added...APS,american physical society; active pixel sensor
added...AST,division of astronomical sciences
added...ATST,advanced technology solar telescope
added...AU,astronomical unit
Helio acronym: AURA; ADS acronym: AURA
Helio acronym: AXAF

added...MWO,mt wilson observatory
Helio acronym: NASA; ADS acronym: NASA
added...NCAR,national center for atmospheric research
added...NESDIS,national environmental satellite data and information service
added...NESSF,nasa’s earth and space science fellowship (program)
added...NGDC,national geophysical data center
Helio acronym: NIST; ADS acronym: NIST
added...NJIT,new jersey institute of technology
Helio acronym: NOAA; ADS acronym: NOAA
added...NRC,national research council
Helio acronym: NSF; ADS acronym: NSF
Helio acronym: NSO; ADS acronym: NSO
added...NST,new solar telescope
added...NSWP,national space weather program
Helio acronym: NuSTAR; ADS acronym: NuSTAR
added...NWM,neutral wind meter
added...NWS,national weather service
added...OCT,office of the chief technologist
added...OEDG,opportunities for enhancing diversity in the geosciences (program)
added...ORBITALS,outer radiation belt injection transport acceleration and loss satellite
added...OVSA,owens valley solar array
added.

In [113]:
with open(r'data/ads_OpenAlex_synonyms_updated.txt', 'w') as fp:
        
    for item1 in list_syns_update:
        # write each item on a new line
        print(item1,file=fp)   
        
    print('Done')
fp.close()

Done


### Method of Identifying Synonyms: Wordnet

In [None]:
# nltk.download()
nltk.download('wordnet')
nltk.download('omw-1.4')

#### Loop over AGU terms and add to synonyms files

This currently uses wordnet corpus from NLTK to identify synonyms of the AGU index term and composes the synonyms file update based on those synonyms

For the multi-term AGU index terms, we split apart each word, find the synonyms in wordnet, and put all permutations in the update to the ADS multi-term synonyms file

In [37]:
# phrase_to_search = pd_agu_terms['term'].iloc[1]
phrase_to_search = 'model'
print('trialing AGU index term = {}'.format(phrase_to_search))
print('output using wordnet ....\n\t')
synonym_antonym_extractor(phrase=phrase_to_search)

# for syn in wordnet.synsets("Spaces"):
#     print('{}: {}'.format(syn,syn.lemmas()))

trialing AGU index term = model
output using wordnet ....
	
{'mannequin', 'exemplar', 'manakin', 'pose', 'mould', 'role_model', 'theoretical_account', 'mold', 'good_example', 'posture', 'mock_up', 'modeling', 'pattern', 'fashion_model', 'framework', 'sit', 'mannikin', 'simulate', 'exemplary', 'poser', 'modelling', 'simulation', 'model', 'example', 'manikin'}
set()


In [184]:
list_simple_syns_update = []
list_multi_syns_update = []

# Add line of description for multi-term synonyms added (single-term text file has no such headers)
list_multi_syns_update.append('\n\n# multi-term items added from AGU Index Terms in July 2022')

#NOTE: we need to modify the AGU Index Term (removing things like commas) to match format of ADS synonyms file
#  Uncertain what effect this will have on the matching to AGU metadata

for t in pd_agu_terms['term']:
    print('\n\nworking on term {}'.format(t))
    
    if ' ' in t:
        print('\t multi-term')
        t_orig = t
        t_mod = t
        t_mod = t_mod.replace(',','')
        t_mod = t_mod.replace('.','')
        t_mod = t_mod.replace(':','')
        t_mod = t_mod.replace('/',' ')

        words = t_mod.split(' ')
        variations_synonyms = []
        variations_antonyms = []
        for w in words:
            variations_synonyms_tmp,variations_antonyms_tmp = synonym_antonym_extractor(w)
            # catch stop words which produce no variations and add the stop word back
            if not variations_synonyms_tmp:
                variations_synonyms.append([w])
            else:
                variations_synonyms.append(list(variations_synonyms_tmp))
            print(variations_synonyms)
            # not currently adding antonyms
        full_variations = get_all_variations(variations_synonyms)
#         full_variations
        multi_syns_update_loop = []
        for te in full_variations:
            multi_syns_update_loop.append(' '.join(list(te)))
            
        list_multi_syns_update.append('{}, {}\n'.format(', '.join(multi_syns_update_loop),t_mod))

    else:
        print('\t single term')
        variations_synonyms,variations_antonyms = synonym_antonym_extractor(t)
        variations_synonyms = [s.replace('_',' ') for s in variations_synonyms]
        if not variations_synonyms:
            continue
        else:
            list_simple_syns_update.append('{} => {}'.format(', '.join(variations_synonyms),t))

    
    



working on term informatics
	 single term


working on term community modeling frameworks
	 multi-term
[['residential_area', 'residential_district', 'community', 'community_of_interests', 'biotic_community']]
[['residential_area', 'residential_district', 'community', 'community_of_interests', 'biotic_community'], ['model', 'simulate', 'pose', 'mold', 'moulding', 'clay_sculpture', 'sit', 'molding', 'mock_up', 'pattern', 'modelling', 'mould', 'modeling', 'posture']]
[['residential_area', 'residential_district', 'community', 'community_of_interests', 'biotic_community'], ['model', 'simulate', 'pose', 'mold', 'moulding', 'clay_sculpture', 'sit', 'molding', 'mock_up', 'pattern', 'modelling', 'mould', 'modeling', 'posture'], ['model', 'fabric', 'theoretical_account', 'framework']]


working on term community standards
	 multi-term
[['residential_area', 'residential_district', 'community', 'community_of_interests', 'biotic_community']]
[['residential_area', 'residential_district', 'communit

In [185]:
list_multi_syns_update


['\n\n# multi-term items added from AGU Index Terms in July 2022',
 'residential_area model model, residential_area model fabric, residential_area model theoretical_account, residential_area model framework, residential_area simulate model, residential_area simulate fabric, residential_area simulate theoretical_account, residential_area simulate framework, residential_area pose model, residential_area pose fabric, residential_area pose theoretical_account, residential_area pose framework, residential_area mold model, residential_area mold fabric, residential_area mold theoretical_account, residential_area mold framework, residential_area moulding model, residential_area moulding fabric, residential_area moulding theoretical_account, residential_area moulding framework, residential_area clay_sculpture model, residential_area clay_sculpture fabric, residential_area clay_sculpture theoretical_account, residential_area clay_sculpture framework, residential_area sit model, residential_area 

#### Add acronyms to the bottom of the original multi-term synonyms file


In [94]:
txt_data_multi_syns.append('\n\n# acronyms added from Heliophysics 2013 Decadal survey in July 2022')

for l in range(len(pd_acronyms)):
    flag = 0
    # Check if the acronym already exists in ADS file
    for i,s in enumerate(extracted_acronyms):
        if pd_acronyms['acronyms'][l] == s:
            print('Helio acronym: {}; ADS acronym: {}'.format(pd_acronyms['acronyms'][l],extracted_acronyms[i]))
            flag = 1
            break
        
    if flag == 0:     
        txt_tmp = '{},{}'.format(pd_acronyms['acronyms'][l],pd_acronyms['terms'][l].lower())

        txt_data_multi_syns.append(txt_tmp)
        print('added...{}'.format(txt_data_multi_syns[-1]))
    


#TODO: 
#  edit by hand to exclude short acronyms
#  edit by hand to exclude acronyms which are too ambiguous (e.g. VO means "Virtual Observatory" in AST but "oxygen vacancy" in PHY)
#  disambiguate same acronyms
#  find and match similar acronyms

#DONE
#  find and match same acronyms




Helio acronym: AAS; ADS acronym: AAS
Helio acronym: ACE; ADS acronym: ACE
added...ACR,anomalous cosmic ray
added...AFOSR,air force office of scientific research
added...AFRL,air force research laboratory
added...AFWA,air force weather agency
added...AGS,atmospheric and geospace sciences division
Helio acronym: AGU; ADS acronym: AGU
added...AIA,atmospheric imaging assembly
Helio acronym: AIM; ADS acronym: AIM
added...AIMI,atmosphere ionosphere magnetosphere interactions
added...AIP,american institute of physics
added...AMISR,advanced modular incoherent scatter radar
added...AMPERE,active magnetosphere and planetary electrodynamics response experiment
added...AMPTE,active magnetospheric particle tracer explorers
added...AO,announcement of opportunity
added...APS,american physical society; active pixel sensor
added...AST,division of astronomical sciences
added...ATST,advanced technology solar telescope
added...AU,astronomical unit
Helio acronym: AURA; ADS acronym: AURA
Helio acronym: AXAF

added...PI,principal investigator
added...POES,polar operational environmental satellite
added...P-POD,poly picosatellite orbital deployer
added...PUI,pickup ion
added...PW,planetary waves
Helio acronym: QBO; ADS acronym: QBO
added...RE,earth radius
added...RS,sun radius
added...R&A,research and analysis
added...RAX,radio aurora explorer
added...RBSP,radiation belt storm probe; renamed van allen probes
added...REPT,relativistic electron proton telescope
added...REU,research experiences for undergraduates
added...RFI,request for information
added...RHESSI,ramaty high energy solar spectroscopic imager
added...RISR,resolute bay incoherent scatter radar
added...ROSES,research opportunities in space and earth sciences
added...RPA,ion retarding potential analyzer
added...SABER,sounding of the atmosphere using broadband emission radiometry
added...SALMON,stand alone missions of opportunities notice
added...SAMPEX,solar anomalous and magnetospheric particle explorer
added...SAPS,subauroral pol

#### Write updated single term synonyms to file
The ordering is: 
1. original data
2. agu terms

In [189]:
with open(r'data/ads_simple_synonyms_updated.txt', 'w') as fp:
    for item1 in txt_data_simple_syns:
        # write each item on a new line
        print(item1,file=fp)
        
    for item2 in list_simple_syns_update:
        # write each item on a new line
        print(item2,file=fp)   
        
    print('Done')
fp.close()

Done


#### Write updated multi-term synonyms to file
The ordering is: 
1. original data
2. acronyms
3. agu terms

In [186]:

with open(r'data/ads_multi_synonyms_updated.txt', 'w') as fp:
    for item1 in txt_data_multi_syns:
        # write each item on a new line
        print(item1,file=fp)
    
    for item2 in list_multi_syns_update:
        # write each item on a new line
        print(item2,file=fp)   
        

        
    print('Done')
fp.close()

Done
