# Convert Jama Glossary to LaTeX

1. In Jama, go to the glossary and choose *Export* $\to$ *Excel* to write out `CTA-Glossary.xls`

2. The ID column is not read correctly from the XLS format, so open the result in Excel or Numbers and export it in XLSX format (saved here as `CTA-Glossary-2.xlsx`)

3. Read it into a pandas DataFrame:

Note that the header is on row 3, counting from zero as `header=3` does, so we need to specify that (the rows before it are ignored).

In [1]:
import pandas as pd
import numpy as np
import re

In [2]:
glossary = pd.read_excel(
    "CTA-Glossary-2.xlsx",
    header=3,  # zero-based index of the header row
    sheet_name='Sheet1',
    usecols=[0, 1, 2, 3, 4, 5],
    converters={'ID': str},  # keep IDs as strings
)
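
The effect of `header` can be checked on a small in-memory table. The sketch below is an illustration only: it uses `pd.read_csv` on made-up text instead of `pd.read_excel` on the real export, and the preamble lines and sample rows are assumptions. Both functions count `header` from zero and discard the rows above it.

```python
import io

import pandas as pd

# Made-up stand-in for the export: three preamble lines, then the header row.
raw = (
    "Glossary export\n"
    "Project: CTA\n"
    "Exported by: someone\n"
    "Name,Description,ID\n"
    "CTAO,The observatory,CTA_-GLOS-206\n"
    "CTA North,Northern site,CTA_-GLOS-207\n"
)

# header=3 selects the fourth line (index 3) as the column names.
df = pd.read_csv(io.StringIO(raw), header=3, converters={"ID": str})
print(df.columns.tolist())
print(len(df))
```

The three preamble lines never show up in `df`; only the two data rows below the header remain.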

Extract the acronyms, where available, by searching for parentheses, and split each acronym from the expanded name (used later to label acronym-like glossary entries properly).

In [3]:
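def is_acronym(name):
    """Return the text inside parentheses in `name`, or None if there is none."""
    match = re.match(pattern=r'.*(\(.*\)).*', string=name)
    if not match:
        return None
    return match.group(1)[1:-1]  # strip the enclosing parentheses

def get_shortname(name):
    """Return `name` with any parenthesised part removed."""
    return re.sub(pattern=r'\(.*\)', repl='', string=name).strip()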
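
As a quick check of the parsing logic, the sketch below applies the same idea to two names from the glossary. `split_name` is a hypothetical helper, not part of the notebook; unlike the greedy patterns above, it only matches a single parenthesised group without nested parentheses.

```python
import re


def split_name(name):
    """Return (short_name, acronym); acronym is None without parentheses.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"\(([^()]*)\)", name)
    acronym = match.group(1) if match else None
    # Drop the parenthesised part together with the whitespace before it.
    short_name = re.sub(r"\s*\([^()]*\)", "", name).strip()
    return short_name, acronym


for sample in ["Science Data Management Centre (SDMC)", "CTA North"]:
    print(split_name(sample))
```

Names without parentheses come back unchanged with `None` as the acronym, which is the case the later code treats as a regular glossary entry.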

In [4]:
glossary = glossary.dropna(subset=['Description']) # get rid of undefined terms
glossary['Acronym'] = glossary['Name'].apply(is_acronym)
glossary['ShortName'] = glossary['Name'].apply(get_shortname)

In [5]:
glossary.head()

| | Modified Date | Last Activity Date | Name | Description | ID | Status | Acronym | ShortName |
|---|---|---|---|---|---|---|---|---|
| 1 | 16/05/2018 | 30/10/2018 | CTAO | The Cherenkov Telescope Array Observatory, an ... | CTA_-GLOS-206 | Stable | | CTAO |
| 2 | 16/05/2018 | 30/10/2018 | CTA North | CTA Observation site hosting an Array of Chere... | CTA_-GLOS-207 | Stable | | CTA North |
| 3 | 16/05/2018 | 30/10/2018 | CTA South | CTA Observation site hosting an Array of Chere... | CTA_-GLOS-208 | Stable | | CTA South |
| 4 | 19/10/2018 | 30/10/2018 | Headquarters | The primary centre for CTAO governance, admini... | CTA_-GLOS-209 | Stable | | Headquarters |
| 5 | 19/10/2018 | 30/10/2018 | Science Data Management Centre (SDMC) | The primary centre for the management of CTA d... | CTA_-GLOS-210 | Stable | SDMC | Science Data Management Centre |


Define format strings for glossary and acronym entries:

In [6]:
glossary_rec = """
\\newglossaryentry{{{name}}}{{
    name={{{name}}}, 
    description={{{description} (\\emph{{{ident}}})}}
}}
"""
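
The many braces come from `str.format`: `{{` and `}}` produce literal braces, so `{{{name}}}` renders as the value of `name` wrapped in one pair of braces. A minimal sketch with a shortened, made-up template:

```python
# Shortened template for illustration only (not the one used in the notebook).
template = "\\newglossaryentry{{{name}}}{{name={{{name}}}}}"

entry = template.format(name="CTAO")
print(entry)
```

This prints `\newglossaryentry{CTAO}{name={CTAO}}`. The backslash is doubled in the Python source so that a single backslash reaches the LaTeX file.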

In [7]:
# this one cross-links a glossary and an acronym list, but I couldn't get it to work properly
acronym_rec = """
\\newglossaryentry{{gl{label}}}{{
    name={{{abbrev}}},
    description={{ {description} (\\emph{{{ident}}})}}
}}

\\newglossaryentry{{{label}}}{{
    type=\\acronymtype, 
    name={{{abbrev}}}, 
    description={{{name}}}, 
    first={{{name} ({abbrev})\\glsadd{{gl{label}}}}}, 
    see=[Glossary:]{{gl{label}}}
}}

"""

In [8]:
# a simpler version that treats acronyms as glossary entries only
acronym_rec = """
\\newglossaryentry{{{label}}}{{
    name={{{abbrev}}}, 
    description={{{description}}}, 
    first={{{name} ({abbrev})}}, 
}}
"""

In [9]:
def convert_to_glossary(acro, name, description, ident):
    """
    Convert a row in the table to a glossary or acronym entry.
    """
    name = name.strip()
    description = description.strip()
    description = description.replace('_', r'\_')  # escape LaTeX special characters
    description = description.replace('%', r'\%')
    description = description.replace('\n', ' ')
    description = re.sub(r'[^\x00-\x7F]+', ' ', description)  # replace non-ASCII runs with a space

    ident = ident.strip()
    ident = ident.replace('CTA_', 'CTA')
    ident = ident.replace('_', r'\_')

    if acro is not None:  # acronym entry
        # note: the simplified acronym_rec has no {ident} field, so ident is ignored here
        return acronym_rec.format(
            label=acro, abbrev=acro, name=name, description=f"({name}) {description}", ident=ident
        )

    # otherwise a regular glossary entry
    return glossary_rec.format(name=name, description=description, ident=ident)
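
The function above escapes only `_` and `%`. If descriptions may also contain other LaTeX special characters such as `&`, `#`, or `$` (an assumption; they may never occur in this glossary), a single-pass escape along the following lines could be used instead. `escape_latex` and `LATEX_SPECIALS` are hypothetical names introduced here.

```python
import re

# Hypothetical mapping from LaTeX special characters to escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}

# One alternation over all special characters, so each character is
# replaced exactly once and inserted braces are not escaped again.
_SPECIALS_PATTERN = re.compile("|".join(re.escape(c) for c in LATEX_SPECIALS))


def escape_latex(text):
    """Escape LaTeX special characters in plain text (illustrative sketch)."""
    return _SPECIALS_PATTERN.sub(lambda m: LATEX_SPECIALS[m.group(0)], text)


print(escape_latex("50% of A_B & C"))
```

Doing the replacement in one pass avoids the ordering problem of chained `str.replace` calls, where escaping a backslash first would otherwise re-escape the braces it introduces.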
    

Loop through the rows and write each glossary entry to a LaTeX `.inc` file, which you can include with
```latex
\input cta-glossary-defs.inc
```
in the LaTeX file, and then reference the entries later:
```latex
This is an example of \glspl{Dark Pedestal} calculated in the \gls{OES}
```

In [10]:
with open("cta-glossary-defs.inc", 'w') as outfile:
    for acro, name, description, ident in zip(glossary.Acronym, glossary.ShortName, glossary.Description, glossary.ID):
        outfile.write(convert_to_glossary(acro, name, description, ident))


In [11]:
! tail -n 40 cta-glossary-defs.inc

\newglossaryentry{Availability}{
    name={Availability}, 
    description={The ability of an item or system to be in a state to perform a required function under given conditions over a given time interval assuming that the required external resources are provided. Generally, the Availability is defined by the formula A = (Uptime) / (Uptime + Downtime), where "Uptime" is the total time that the system is performing required functions and "Downtime" is the time where the system is not able to perform (can include the "time off" if corrective maintenance activities are deferred to be performed during daytime, or "MTTR" if corrective maintenance activities can be done during night in safe conditions, see ECA). (\emph{CTA-GLOS-312})}
}

\newglossaryentry{ACMT}{
    name={ACMT}, 
    description={(Active Corrective Maintenance Time) The direct time spent by maintenance personnel after the arrival at the location of a failure; to troubleshoot, isolate the fault, repair and complete a