## Appendix B. Python Files


Reusable functions are stored in two Python files, my_jcamp.py and common.py. my_jcamp.py is taken from https://pypi.python.org/pypi/jcamp; it reads a JCAMP file and stores all of its information in a dictionary. Refer to the link for more details. The functions in my_jcamp.py and common.py are described below.

***

**Table of contents**   
#### &nbsp;&nbsp; I. my_jcamp.py

#### &nbsp;&nbsp; II. common.py


***

### I. my_jcamp.py

The most frequently used functions in my_jcamp.py are described below.

JCAMP_reader(filename):
    
    '''
    Uses the jcamp_read() function to read a JDX-format file and return a dictionary containing the header
    info, together with 1D numpy vectors `x` for the abscissa information (e.g. wavelength or wavenumber)
    and `y` for the ordinate information (e.g. transmission).

    Parameters
    ----------
    filename : str
        The name of the JCAMP-DX file to read.

    Returns
    -------
    jcamp_dict : dict
        The dictionary containing the header and data vectors.
    '''
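As a rough illustration of what "header info stored in a dictionary" means, the sketch below collects the `##LABEL=value` records of a JCAMP-DX text into a dictionary. It is not the code in my_jcamp.py: the function name `parse_jcamp_header`, the lower-casing of keys, and the sample text are assumptions made for the example, and the data rows are deliberately ignored.

```python
def parse_jcamp_header(text):
    """Collect '##LABEL=value' records from JCAMP-DX text into a dict.

    Hypothetical helper for illustration only; numeric data rows are skipped.
    """
    header = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("##") or "=" not in line:
            continue  # data rows and blank lines are ignored in this sketch
        label, value = line[2:].split("=", 1)
        header[label.strip().lower()] = value.strip()
    return header


# Made-up sample text, not taken from a real spectrum file.
sample = """##TITLE=ETHANOL
##JCAMP-DX=4.24
##XUNITS=1/CM
##YUNITS=TRANSMITTANCE
##XYDATA=(X++(Y..Y))
450.0 0.91 0.92 0.93
##END=
"""

header = parse_jcamp_header(sample)
print(header["title"], header["xunits"], header["yunits"])
```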

JCAMP_calc_xsec(jcamp_dict, wavemin=None, wavemax=None, skip_nonquant=True, debug=False)

    '''
    Taking as input a JDX file, extract the spectrum information and transform the absorption spectrum
    from existing units to absorption cross-section.

    This function also corrects for unphysical data (such as negative transmittance values, or
    transmission above 1.0), and calculates absorbance if transmittance given. Instead of a return
    value, the function inserts the information into the input dictionary.

    Note that the conversion assumes that the measurements were collected for gas at a temperature of
    296K (23 degC).

    Parameters
    ----------
    jcamp_dict : dict
        A JCAMP spectrum dictionary.
    wavemin : float, optional
        The shortest wavelength in the spectrum to limit the calculation to.
    wavemax : float, optional
        The longest wavelength in the spectrum to limit the calculation to.
    skip_nonquant : bool, optional
        If True, return None if the spectrum is missing quantitative data. If False, try
        to fill in missing quantitative values with defaults.
    '''
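The correction of unphysical transmittance values and the conversion to absorbance described above can be sketched as follows. This is a minimal illustration, not the code in my_jcamp.py: the function name and the lower clipping bound `floor` are assumptions, and the further scaling to absorption cross-section is not shown.

```python
import math


def transmittance_to_absorbance(transmittance, floor=1e-7):
    """Clip transmittance into (0, 1], then apply A = -log10(T).

    `floor` is an assumed lower bound that keeps the logarithm finite for
    zero or negative readings; the real function may treat them differently.
    """
    absorbance = []
    for t in transmittance:
        t = min(max(t, floor), 1.0)  # clip values below floor or above 1.0
        absorbance.append(-math.log10(t) + 0.0)  # "+ 0.0" avoids printing -0.0
    return absorbance


values = transmittance_to_absorbance([0.1, 1.2, -0.05, 0.5])
print(values)
```

Here a transmittance of 0.1 maps to an absorbance of 1, and a reading above 1.0 is treated as fully transmitting (absorbance 0).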

is_float(s):

    '''
    Test if a string, or list of strings, contains a numeric value(s).

    Parameters
    ----------
    s : str, or list of str
        The string or list of strings to test.

    Returns
    -------
    is_float_bool : bool or list of bool
        A single boolean or list of boolean values indicating whether each input can be converted into a float.
    '''
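The behaviour described in this docstring can be approximated with a `try`/`except` around `float()`. The sketch below is an assumption about how such a test might look (the name `is_float_sketch` is hypothetical); the version in my_jcamp.py may validate its input differently.

```python
def is_float_sketch(s):
    """Return a bool for one string, or a list of bools for a list of strings."""

    def _one(text):
        try:
            float(text)
            return True
        except (TypeError, ValueError):
            return False

    if isinstance(s, (list, tuple)):
        return [_one(item) for item in s]
    return _one(s)


print(is_float_sketch("3.14"))                # True
print(is_float_sketch(["1e-3", "abc", ""]))   # [True, False, False]
```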


### II. common.py

All of the functions in common.py are described below.

create_periodic_table(file_name)

    ''' 
    Read a text file line-by-line and extract information for: atomic number, atomic symbol, and relative atomic mass 

    The author is aware of at least one python periodic elements package (e.g., 
    https://pypi.python.org/pypi/periodictable), but chose to implement a different approach.
    '''

extract_unique_elements(df)

    '''
    Given a pandas DataFrame containing a column of chemical formulas, return the DataFrame 
    with an additional column, df['Elements'], containing the unique elements in each formula.
    
    Args:
    df_Formula=a column of chemical formula in a pandas DataFrame
    
    Return:
    a list of unique elements
    '''
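One way to pull the distinct element symbols out of a formula string is a regular expression that matches an upper-case letter optionally followed by a lower-case letter. The sketch below shows only that step on plain strings; the helper name `unique_elements` is hypothetical and the approach is an assumption, not necessarily what common.py does.

```python
import re

# An element symbol: one upper-case letter, optionally one lower-case letter.
ELEMENT_PATTERN = re.compile(r"[A-Z][a-z]?")


def unique_elements(formula):
    """Return the distinct element symbols in a formula, in order of appearance."""
    seen = []
    for symbol in ELEMENT_PATTERN.findall(formula):
        if symbol not in seen:
            seen.append(symbol)
    return seen


print(unique_elements("C2H6O"))    # ['C', 'H', 'O']
print(unique_elements("CH3COOH"))  # ['C', 'H', 'O']
print(unique_elements("NaCl"))     # ['Na', 'Cl']
```

With pandas, a helper like this could be applied to the formula column with `apply` to build the `Elements` column.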

shorten_df_by_elements_list(df_el, elements_list, all_or_any)

    '''
    Reduce the number of entries in a DataFrame by keeping only those that contain the elements listed in elements_list.
    
    Args:
    df_el=the DataFrame containing the distinct elements in each entry
    elements_list=the list of elements to filter on
    all_or_any=specify whether each retained entry must contain all or any of the elements listed in elements_list
    
    Returns:
    df_el_filt=the filtered pandas DataFrame
    '''
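The difference between the "all" and "any" modes can be shown with set operations. The sketch below works on a plain list of dictionaries rather than a DataFrame, and it assumes that `all_or_any` takes the strings `"all"` or `"any"`; both choices are assumptions made to keep the example self-contained.

```python
def filter_by_elements(entries, elements_list, all_or_any="all"):
    """Keep entries whose 'Elements' contain all (or any) of elements_list."""
    if all_or_any not in ("all", "any"):
        raise ValueError("all_or_any must be 'all' or 'any'")
    wanted = set(elements_list)
    kept = []
    for entry in entries:
        present = set(entry["Elements"])
        if all_or_any == "all":
            match = wanted <= present        # every wanted element is present
        else:
            match = bool(wanted & present)   # at least one wanted element is present
        if match:
            kept.append(entry)
    return kept


# Made-up entries for illustration.
entries = [
    {"Formula": "C2H6O", "Elements": ["C", "H", "O"]},
    {"Formula": "CH4", "Elements": ["C", "H"]},
    {"Formula": "NaCl", "Elements": ["Na", "Cl"]},
]

print([e["Formula"] for e in filter_by_elements(entries, ["C", "O"], "all")])
print([e["Formula"] for e in filter_by_elements(entries, ["C", "O"], "any")])
```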

calc_molec_weight(nist_chem_list, periodic_table)

    '''
    Calculate molecular weight, Mw, given a chemical formula, df.Formula 
    #ref.: https://stackoverflow.com/questions/41818916/calculate-molecular-weight-based-on-chemical-formula-using-python

    Args:
    nist_chem_list=a pandas DataFrame from which Mw will be calculated.
    periodic_table=a periodic table dictionary from which the atomic weight of each element can be retrieved.

    Returns:
    The same DataFrame with an extra column, labelled 'Mw', added.
    '''
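For a single formula, the calculation reduces to summing atomic mass times count over each element token. The sketch below assumes a periodic table stored as a plain dictionary mapping symbol to relative atomic mass (the structure used in common.py may differ) and does not handle parentheses or hydrates; `molecular_weight` is a hypothetical name.

```python
import re

# Each token is an element symbol followed by an optional integer count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_weight(formula, atomic_mass):
    """Sum atomic masses for a simple formula such as 'C2H6O'."""
    total = 0.0
    for symbol, count in TOKEN.findall(formula):
        total += atomic_mass[symbol] * (int(count) if count else 1)
    return total


# A three-element table, enough for the examples below.
atomic_mass = {"H": 1.008, "C": 12.011, "O": 15.999}

print(round(molecular_weight("H2O", atomic_mass), 3))
print(round(molecular_weight("C2H6O", atomic_mass), 3))
```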
    
standardize_units(jcamp_dict)
        
    '''
    Given a jcamp dictionary, standardize the x and y units to 1/cm and absorbance, respectively, and return
    the modified dictionary.
    
    Args:
    jcamp_dict=a jcamp dictionary    
    
    Returns:
    jcamp_dict=a modified jcamp dictionary   
    '''
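The x-axis part of this standardization is a unit conversion: a wavelength in micrometers corresponds to a wavenumber of 10^4 divided by that wavelength in 1/cm, and a wavelength in nanometers to 10^7 divided by it. The sketch below illustrates only that step; the accepted unit strings and the name `to_wavenumber` are assumptions, and the y-axis conversion to absorbance is not repeated here.

```python
def to_wavenumber(x, xunits):
    """Convert x-values to wavenumber (1/cm) based on a unit label.

    The unit labels recognised here are assumptions for the example.
    """
    units = xunits.strip().lower()
    if units in ("1/cm", "cm-1", "cm^-1"):
        return list(x)                    # already in wavenumber
    if units in ("micrometers", "um", "microns"):
        return [1.0e4 / v for v in x]     # 1 cm = 1e4 micrometers
    if units in ("nanometers", "nm"):
        return [1.0e7 / v for v in x]     # 1 cm = 1e7 nanometers
    raise ValueError("unrecognised x unit: " + xunits)


print(to_wavenumber([2.5, 10.0], "MICROMETERS"))
```

Note that converting wavelength to wavenumber reverses the ordering of the axis, so an ascending wavelength grid becomes a descending wavenumber grid.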

'''
#baseline subtraction function
#ref.: https://raw.githubusercontent.com/zmzhang/airPLS/master/airPLS.py

#!/usr/bin/python

airPLS.py Copyright 2014 Renato Lombardo - renato.lombardo@unipa.it
Baseline correction using adaptive iteratively reweighted penalized least squares

This program is a translation in python of the R source code of airPLS version 2.0
by Yizeng Liang and Zhang Zhimin - https://code.google.com/p/airpls
Reference:
Z.-M. Zhang, S. Chen, and Y.-Z. Liang, Baseline correction using adaptive iteratively reweighted penalized least squares. Analyst 135 (5), 1138-1146 (2010).

Description from the original documentation:

Baseline drift always blurs or even swamps signals and deteriorates analytical results, particularly in multivariate analysis.  It is necessary to correct baseline drift to perform further data analysis. Simple or modified polynomial fitting has been found to be effective in some extent. However, this method requires user intervention and prone to variability especially in low signal-to-noise ratio environments. The proposed adaptive iteratively reweighted Penalized Least Squares (airPLS) algorithm doesn't require any user intervention and prior information, such as detected peaks. It iteratively changes weights of sum squares errors (SSE) between the fitted baseline and original signals, and the weights of SSE are obtained adaptively using between previously fitted baseline and original signals. This baseline estimator is general, fast and flexible in fitting baseline.


LICENCE
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>
'''


WhittakerSmooth(x,w,lambda_,differences=1)

    '''
    Penalized least squares algorithm for background fitting
    
    input
    x: input data (i.e. chromatogram or spectrum)
    w: binary masks (value of the mask is zero if a point belongs to peaks and one otherwise)
    lambda_: parameter that can be adjusted by user. The larger lambda is,  the smoother the resulting background
    differences: integer indicating the order of the difference of penalties
    
    output
    the fitted background vector
    '''

airPLS(x, lambda_=100, porder=1, itermax=15)

    '''
    Adaptive iteratively reweighted penalized least squares for baseline fitting
    
    input
    x: input data (i.e. chromatogram or spectrum)
    lambda_: parameter that can be adjusted by user. The larger lambda is,  the smoother the resulting background, z
    porder: integer indicating the order of the difference of penalties (passed to WhittakerSmooth as differences)
    itermax: maximum number of reweighting iterations
    
    output
    the fitted background vector
    '''
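The two functions above work together: a weighted penalized least squares fit, wrapped in a loop that gives zero weight to points lying above the current baseline and larger weights to points lying below it. The sketch below is a simplified pure-Python illustration under stated assumptions, not the airPLS.py code itself: it supports only first-order differences (the default `porder=1`), where the linear system is tridiagonal and can be solved with the Thomas algorithm, and the function names are hypothetical. The stopping threshold and weight update follow the scheme described in the airPLS reference as best understood here.

```python
import math


def whittaker_smooth_order1(x, w, lambda_):
    """Solve (W + lambda_ * D'D) z = W x for first-order differences D."""
    n = len(x)
    # Diagonal of D'D is [1, 2, ..., 2, 1]; both off-diagonals are -1.
    diag = [w[i] + lambda_ * (1.0 if i in (0, n - 1) else 2.0) for i in range(n)]
    off = -float(lambda_)
    rhs = [w[i] * x[i] for i in range(n)]
    # Thomas algorithm: forward elimination ...
    for i in range(1, n):
        m = off / diag[i - 1]
        diag[i] -= m * off
        rhs[i] -= m * rhs[i - 1]
    # ... then back substitution.
    z = [0.0] * n
    z[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        z[i] = (rhs[i] - off * z[i + 1]) / diag[i]
    return z


def airpls_sketch(x, lambda_=100, itermax=15):
    """Iteratively reweighted baseline fit (first-order penalty only)."""
    w = [1.0] * len(x)
    total = sum(abs(v) for v in x)
    z = list(x)
    for i in range(1, itermax + 1):
        z = whittaker_smooth_order1(x, w, lambda_)
        d = [xi - zi for xi, zi in zip(x, z)]
        neg = [v for v in d if v < 0]
        dssn = abs(sum(neg))  # total size of the residuals below the baseline
        if dssn <= 0.001 * total or i == itermax:
            break
        # Points above the baseline (likely peaks) get zero weight.
        w = [math.exp(i * abs(v) / dssn) if v < 0 else 0.0 for v in d]
        w[0] = math.exp(i * max(neg) / dssn)
        w[-1] = w[0]
    return z


# Synthetic signal: flat baseline of 1.0 plus one Gaussian peak at index 100.
signal = [1.0 + 10.0 * math.exp(-((i - 100) ** 2) / 18.0) for i in range(200)]
baseline = airpls_sketch(signal)
corrected = [s - b for s, b in zip(signal, baseline)]
```

Subtracting the fitted baseline from the signal, as in the last line, is the step a baseline-subtraction routine would perform on the y-values of a spectrum.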

baseline_subtract(data_dict)

    '''
    Given a jcamp dictionary, this function performs a baseline subtraction
    
    Args:
    data_dict=a dictionary that is generated using my_jcamp.py from a jcamp file.
    
    Return:
    data_dict=a dictionary of treated spectrum.    
    '''

treat_spectra(data_dict)

    '''
    Uniformize spectra units (x- and y-axes) to make them all comparable with one another.
    
    Args:
    data_dict=a dictionary that is generated using my_jcamp.py from a jcamp file.
    
    Return:
    data_dict=a dictionary of treated spectrum.
    '''

pick_peaks(compound_name, x, y)

    '''
    Identify peak maxima. This algorithm is a 1-D search; it does not account for the x-values.
    ref.: https://stackoverflow.com/questions/31016267/peak-detection-in-python-how-does-the-scipy-signal-find-peaks-cwt-function-work
        
    Args:
    compound_name=to be used as a title in the generated plot (string).
    x=x-values (list)
    y=y-values (list)
    
    Returns:
    Uses the peakutils package to pick peaks in 1-D and plots the x,y values along with the identified peak maxima.
    '''
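To illustrate what a 1-D search that ignores the x-values means, the sketch below finds local maxima using only the y-values and their indices, then looks up the matching x-values afterwards. It is a simple neighbour comparison swapped in for illustration, not the peakutils algorithm that `pick_peaks` uses; `local_maxima` and `min_height` are hypothetical names.

```python
def local_maxima(y, min_height=0.0):
    """Return indices where y rises from the left and does not rise to the right."""
    return [
        i
        for i in range(1, len(y) - 1)
        if y[i] > y[i - 1] and y[i] >= y[i + 1] and y[i] >= min_height
    ]


# Made-up data: x is only used after the search, to report peak positions.
x = [400, 500, 600, 700, 800, 900, 1000, 1100, 1200]
y = [0.0, 1.0, 3.0, 1.0, 0.0, 2.0, 5.0, 2.0, 0.0]

peak_indices = local_maxima(y)
peak_positions = [x[i] for i in peak_indices]
print(peak_indices, peak_positions)  # [2, 6] [600, 1000]
```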
