# Transmission FTIR Spectra

- This Jupyter notebook provides an example workflow for processing transmission FTIR spectra through PyIRoGlass. 

- The Jupyter notebook and data can be accessed here: https://github.com/SarahShi/PyIRoGlass/blob/main/docs/examples/transmission_ftir/. 

- You need to install the PyIRoGlass PyPI package on your machine once. If you have not done this, please uncomment the line (remove the # symbol) and run the cell below. 

In [None]:
#!pip install PyIRoGlass

# Load Python Packages and Data

## Load Python Packages

In [None]:
# Import packages

import os
import sys
import glob
import numpy as np
import pandas as pd

import PyIRoGlass as pig

from IPython.display import Image

import matplotlib
from matplotlib import pyplot as plt
from matplotlib import rc, cm

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

pig.__version__

## Set paths to data

In [None]:
# Change paths to direct to folder with transmission FTIR spectra 

TRANS_PATHS = 'SPECTRA/'
TRANS_FILES = sorted(glob.glob(TRANS_PATHS + "*"))
print(TRANS_PATHS)

CHEMTHICK_PATH = 'ChemThick.csv'
print(CHEMTHICK_PATH)

## Set desired output file directory name

In [None]:
# Change to be what you want the prefix of your output files to be. 
OUTPUT_PATH = 'RESULTS'
print(OUTPUT_PATH)

## Load transmission FTIR spectra

In [None]:
# Load the transmission FTIR spectra within the 1000-5500 cm^-1 wavenumber range

DFS_FILES, DFS_DICT = pig.Load_SampleCSV(TRANS_FILES, wn_high = 5500, wn_low = 1000)

Print the names of all files in the directory. 

In [None]:
DFS_FILES

Let's look at what a dictionary of transmission FTIR spectra looks like. Samples are identified by their file names, and the wavenumber and absorbance data are stored for each spectrum. 
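As a rough picture of that structure, here is a minimal sketch built with pandas alone. It assumes each spectrum is a two-column table of wavenumber and absorbance, trimmed to the `wn_low` to `wn_high` window and keyed by sample name. The column names, the helper `trim_spectrum`, and the sample name `SAMPLE_A` are hypothetical, and the actual layout returned by Load_SampleCSV may differ.

```python
import numpy as np
import pandas as pd

# Synthetic spectrum: wavenumber (cm^-1) and absorbance, as in a two-column CSV.
wavenumber = np.arange(400.0, 6001.0, 100.0)
absorbance = np.exp(-((wavenumber - 3550.0) / 200.0) ** 2)
raw = pd.DataFrame({"Wavenumber": wavenumber, "Absorbance": absorbance})


def trim_spectrum(df, wn_low=1000, wn_high=5500):
    """Keep only rows inside [wn_low, wn_high] and index them by wavenumber."""
    mask = (df["Wavenumber"] >= wn_low) & (df["Wavenumber"] <= wn_high)
    return df.loc[mask].set_index("Wavenumber").sort_index()


# Hypothetical sample name standing in for a file name without its extension.
spectra = {"SAMPLE_A": trim_spectrum(raw)}
print(spectra["SAMPLE_A"].head())
```

Each dictionary value can then be plotted or inspected on its own, for example with `spectra["SAMPLE_A"]["Absorbance"].max()`.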

In [None]:
DFS_DICT

## Load composition and thickness data

The file names of the spectra (the part before the .CSV extension) matter when we load melt compositions and thicknesses, because these unique identifiers are used to match each spectrum to its composition and thickness. Make sure that the ChemThick.CSV file uses the same sample names as the spectra you load. 

In [None]:
MICOMP, THICKNESS = pig.Load_ChemistryThickness(CHEMTHICK_PATH)

Display the dataframe of glass compositions

In [None]:
MICOMP

Display the dataframe of wafer thicknesses

In [None]:
THICKNESS

See that the sample names of the spectra in the dictionary and of the glass compositions and thicknesses in the dataframes all align. 
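This alignment can also be checked programmatically rather than by eye. The sketch below is one possible approach, assuming the composition and thickness dataframes are indexed by sample name; the helpers `sample_name` and `check_alignment` and all file names are hypothetical and are not part of PyIRoGlass.

```python
import os
import pandas as pd


def sample_name(path):
    """Return the file name without its directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]


def check_alignment(spectrum_paths, chemthick_index):
    """Return (spectra missing from the table, table rows missing a spectrum)."""
    spectra_names = {sample_name(p) for p in spectrum_paths}
    table_names = set(chemthick_index)
    return sorted(spectra_names - table_names), sorted(table_names - spectra_names)


# Hypothetical inputs; substitute TRANS_FILES and MICOMP.index in practice.
paths = ["SPECTRA/SAMPLE_A.CSV", "SPECTRA/SAMPLE_B.CSV"]
chemthick = pd.DataFrame({"Thickness": [40.0, 55.0]}, index=["SAMPLE_A", "SAMPLE_C"])

no_chem, no_spectrum = check_alignment(paths, chemthick.index)
print("Spectra without ChemThick rows:", no_chem)
print("ChemThick rows without spectra:", no_spectrum)
```

Two empty lists mean that every spectrum has a matching composition and thickness row, and vice versa.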

# We're ready to roll -- MCMC, here we come! 

We use the function Run_All_Spectra, which takes in two arguments: 

- Dictionary of spectra
- Desired output directory name, or `None` to prevent figure generation. 

Running this code will take a few minutes per spectrum, as it is fitting $\mathrm{10^6}$ baselines and peaks to your spectrum to sample uncertainty. If any samples fail, they will be returned in the list FAILURES. 

Save this file as a CSV, so you have this information. We will also use this dataframe to calculate concentration. 

In [None]:
DF_OUTPUT, FAILURES = pig.Run_All_Spectra(DFS_DICT, OUTPUT_PATH)
DF_OUTPUT.to_csv('DF.csv')

It took 3 minutes to process 3 spectra on my MacBook Pro 2.6 GHz 6-Core Intel Core i7. It takes about 7.5 minutes to process 3 spectra on Google Colab, which provides fewer CPUs. 

Run_All_Spectra returns a dataframe of outputs. Let's look at what's included. 

In [None]:
DF_OUTPUT

Given the size of this dataframe, it helps to list all of its columns. 

In [None]:
DF_OUTPUT.columns

All columns with the prefix of PH represent a peak height. All columns with the suffix of _M represent the mean value, and the suffix of _STD represents 1 $\sigma$. 

The column H2OT_3550_SAT? returns a - if the sample is not saturated, and a * if the sample is saturated. This is based on the maximum absorbance of the peak, and the warning of * indicates that we must consider the concentrations more. The following functions calculating concentration handle this and will suggest best values to use. 

The columns S2N_P5200 and S2N_P4500 represent the signal-to-noise ratios for the $\mathrm{H_2O_{m,5200}}$ and $\mathrm{OH^-_{4500}}$ peaks. If the values are greater than 4, indicating that the signal is meaningful, the ERR_5200 and ERR_4500 columns return a - value. If signal-to-noise is too low, the warning of * is returned. 

The columns after describe the fitting parameters for generating the baseline and the $\mathrm{H_2O_{m,1635}}$ peak, so you can generate the baseline yourself. 
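The naming conventions above make it easy to select columns and to reproduce the flags. The following sketch works on a synthetic stand-in for DF_OUTPUT: the peak-height column names and all values are made up, and `s2n_flag` is a hypothetical helper that only illustrates the threshold of 4 described above, not the library's internal implementation.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for DF_OUTPUT; the PH_* names and all values are made up.
df = pd.DataFrame(
    {
        "PH_3550_M": [1.20, 2.60],
        "PH_3550_STD": [0.02, 0.05],
        "S2N_P5200": [9.5, 2.1],
        "S2N_P4500": [6.3, 4.8],
    },
    index=["SAMPLE_A", "SAMPLE_B"],
)

# Mean peak heights carry the PH prefix and the _M suffix.
ph_means = [c for c in df.columns if c.startswith("PH") and c.endswith("_M")]


def s2n_flag(s2n, threshold=4.0):
    """Return '-' where signal-to-noise exceeds the threshold, else '*'."""
    return np.where(np.asarray(s2n) > threshold, "-", "*")


flags = pd.DataFrame(
    {"ERR_5200": s2n_flag(df["S2N_P5200"]), "ERR_4500": s2n_flag(df["S2N_P4500"])},
    index=df.index,
)
print(ph_means)
print(flags)
```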

# Outputs

Quite a few figures, log files, and npz files are generated by Run_All_Spectra, assuming you provide an export path and not just the value of `None`. Let's look at a few of them together. 

PyIRoGlass creates this figure to visualize how each peak within the 1000-5500 cm$^{-1}$ range is fit, with the peak heights shown. 

In [None]:
Image("https://github.com/sarahshi/PyIRoGlass/raw/main/docs/_static/AC4_OL49_021920_30x30_H2O_a.png")

We can visualize how well PyIRoGlass does in fitting this transmission FTIR spectrum, with the modelfit figure. This plots the fit from $\mathrm{MC^3}$ against the transmission FTIR spectrum, with the residual in fit. 

In [None]:
Image("https://github.com/sarahshi/PyIRoGlass/raw/main/docs/_static/AC4_OL49_021920_30x30_H2O_a_modelfit.png")

The histogram figure shows the distribution of posterior probability densities, with the mean value displayed in the navy dashed line. The shaded region represents the 68% confidence interval around the value. 

In [None]:
Image("https://github.com/sarahshi/PyIRoGlass/raw/main/docs/_static/AC4_OL49_021920_30x30_H2O_a_histogram.png")

The pairwise figure plots the posterior probability density distribution for the 16 fitting parameters of Equation 10, allowing for the visualization of covariance within the parameters. Accounting for covariance allows us to properly account for uncertainty. 

In [None]:
Image("https://github.com/sarahshi/PyIRoGlass/raw/main/docs/_static/AC4_OL49_021920_30x30_H2O_a_pairwise.png")

The trace figure shows how the parameters evolve through MCMC sampling. 

In [None]:
Image("https://github.com/sarahshi/PyIRoGlass/raw/main/docs/_static/AC4_OL49_021920_30x30_H2O_a_trace.png")

## LOG and NPZ

.log files record the performance of the MCMC algorithm through the samples, and the best parameters at each 10% increment. These are shown above. 

.npz files store all the best-parameters, sampled parameters, etc. in a ready-to-use NumPy format. 

We won't open these here, but these are quite useful to review! 
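For readers who want to inspect one later, the sketch below shows the general pattern with NumPy: list the arrays stored in the archive, pull out the sampled parameters, and summarize them with a mean and a central 68% interval like the one shaded in the histogram figure. The archive here is synthetic and the key name `posterior` is hypothetical; use `archive.files` to discover the names actually present in a PyIRoGlass output file.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)

# Write a synthetic archive; the key name "posterior" is hypothetical.
path = os.path.join(tempfile.mkdtemp(), "example.npz")
np.savez(path, posterior=rng.normal(loc=1.5, scale=0.1, size=(10000, 2)))

with np.load(path) as archive:
    print(archive.files)            # names of the arrays stored in the file
    samples = archive["posterior"]  # shape: (n_samples, n_parameters)

mean = samples.mean(axis=0)
std = samples.std(axis=0, ddof=1)
lower, upper = np.percentile(samples, [16, 84], axis=0)  # central 68% interval
print(mean, std, lower, upper)
```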

# Concentrations

We now want to convert all those peak heights (with uncertainties) to concentrations (with uncertainties), by applying the Beer-Lambert Law. We do so by using the Concentration_Output function, which takes in these parameters and samples over N samples for a secondary MCMC: 

- DF_OUTPUT: Output from Run_All_Spectra
- N: Number of samples for this MCMC
- THICKNESS: Wafer thickness loaded from ChemThick
- MICOMP: Glass composition loaded from ChemThick
- T_ROOM: Room temperature at time of FTIR analysis, given the sensitivity of density to T. 
- P_ROOM: Room pressure at time of FTIR analysis, given the sensitivity of density to P. 
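To make the role of these inputs concrete, here is a minimal sketch of the Beer-Lambert Law, $C = 100 \, M A / (\rho \, d \, \varepsilon)$, with uncertainty propagated by simple Monte Carlo resampling. It is an illustration under stated assumptions, not the internals of Concentration_Output: the function `beer_lambert_wtpct`, the normal error model, and every numerical value are hypothetical.

```python
import numpy as np


def beer_lambert_wtpct(molar_mass, absorbance, density, thickness_um, epsilon):
    """Concentration in wt.% from the Beer-Lambert Law.

    molar_mass in g/mol, density in g/L (numerically equal to kg/m^3),
    thickness in micrometers, epsilon in L/(mol cm).
    """
    thickness_cm = thickness_um * 1e-4
    return 100.0 * molar_mass * absorbance / (density * thickness_cm * epsilon)


rng = np.random.default_rng(42)
n = 100_000

# All values below are illustrative, not measured data.
absorbance = rng.normal(1.00, 0.02, n)  # peak height and 1 sigma
thickness = rng.normal(50.0, 3.0, n)    # wafer thickness, micrometers
epsilon = rng.normal(63.0, 4.0, n)      # molar absorptivity, L/(mol cm)
density = 2750.0                        # glass density, g/L

conc = beer_lambert_wtpct(18.015, absorbance, density, thickness, epsilon)
print(f"H2O = {conc.mean():.2f} +/- {conc.std(ddof=1):.2f} wt.%")
```

Sampling every uncertain input together carries the uncertainties in peak height, thickness, and molar absorptivity through to the concentration, which is the motivation for sampling over N draws.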


In [None]:
T_ROOM = 25 # C
P_ROOM = 1 # Bar

N = 500000 # MCMC samples
DENSITY_EPSILON, MEGA_SPREADSHEET = pig.Concentration_Output(DF_OUTPUT, N, THICKNESS, MICOMP, T_ROOM, P_ROOM)

DENSITY_EPSILON.to_csv('DensityEpsilon.csv')
MEGA_SPREADSHEET.to_csv('H2OCO2.csv')

We're all done now! Let's print your results. 

In [None]:
MEGA_SPREADSHEET

There are a few things to note. Each column with the suffix _MEAN represents the mean value, _BP represents the best-parameter from MCMC, and _STD represents the standard deviation. We recommend the use of the 'H2OT_MEAN', 'H2OT_STD', 'CO2_MEAN', and 'CO2_STD' columns. The columns with the suffix _S2N show the signal-to-noise ratio of the NIR peaks, and the columns with the prefix ERR_ just process this information, returning a '-' if the peaks are meaningful and a '*' if the signal is too low. 

Concentrations of $\mathrm{H_2O}$ depend on whether your sample is saturated or not. If your sample is unsaturated (marked by H2OT_3550_SAT == '-'), the column 'H2OT_MEAN'=='H2OT_3550_M'. If your sample is saturated (marked by H2OT_3550_SAT == '*'), the column of 'H2OT_MEAN'=='H2Om_1635_BP'+'OH_4500_M'. The $\mathrm{H_2O_{t, 3550}}$ peak cannot be used, given potential nonlinearity in the Beer-Lambert Law. See the discussion of this handling of speciation in the paper. 
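The selection rule described above can be written in a few lines. This sketch applies it to a synthetic table that reuses the column names from the text; the values and the output column `H2OT_CHOSEN` are made up for illustration, and the real MEGA_SPREADSHEET already contains the result in 'H2OT_MEAN'.

```python
import numpy as np
import pandas as pd

# Synthetic concentrations in wt.%; all values are made up for illustration.
df = pd.DataFrame(
    {
        "H2OT_3550_SAT": ["-", "*"],
        "H2OT_3550_M": [1.80, 4.90],
        "H2Om_1635_BP": [0.70, 3.10],
        "OH_4500_M": [1.05, 1.60],
    },
    index=["SAMPLE_A", "SAMPLE_B"],
)

saturated = df["H2OT_3550_SAT"] == "*"
summed_species = df["H2Om_1635_BP"] + df["OH_4500_M"]

# Saturated: molecular water plus hydroxyl; unsaturated: the total-water peak.
df["H2OT_CHOSEN"] = np.where(saturated, summed_species, df["H2OT_3550_M"])
print(df[["H2OT_3550_SAT", "H2OT_CHOSEN"]])
```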


Here is also all the other relevant information for calculating these concentrations. All the density and molar absorptivity information is stored in this dataframe. 

The column 'Density' contains the densities used for the final concentration. The values of 'Density' and 'Density_Sat' will differ if the sample is saturated, showing the difference in densities when using variable concentrations of $\mathrm{H_2O_m}$. 

'Tau' and 'Na/Na+Ca' calculate the compositional parameters required for determining molar absorptivity. All calculated molar absorptivities and their uncertainties (sigma_ prefix) from the inversion are provided in the dataframe. 


In [None]:
DENSITY_EPSILON