# Raman Spectroscopy Decomposition

## Introduction

Once the components in a mixture's Raman spectra have been identified and assigned, and pseudo-Voigt curve fitting has been completed, the next step is to compare the peak areas in the pure component calibration spectra (non-decomposing, non-reacting) with the areas of the same peaks in the experimental data. From this comparison one can:
1. determine whether decomposition is occurring, and, if it is,
2. calculate the amount of molar decomposition.

Both are done by comparing the peak area in the experimental mixture Raman spectrum with the corresponding peak area in the pure component calibration Raman spectrum.
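The comparison is made on peak areas, so it helps to see what that area is for a pseudo-Voigt peak. The minimal sketch below makes two assumptions. First, it uses the lmfit-style pseudo-Voigt definition: a Gaussian and a Lorentzian with the same FWHM of `2 * sigma`, mixed by `fraction`, each normalized so that `amplitude` is the total area. That ramandecompy's fits use exactly this form is not confirmed here. Second, the peak parameters are made-up illustrative values. The sketch integrates the profile numerically over a wide window and compares the result with `amplitude`.

```python
import numpy as np

def pseudo_voigt(x, amplitude, center, sigma, fraction):
    """lmfit-style pseudo-Voigt (assumed form): Gaussian and Lorentzian parts share
    FWHM = 2*sigma and are mixed by `fraction` (0 = pure Gaussian, 1 = pure Lorentzian)."""
    sigma_g = sigma / np.sqrt(2.0 * np.log(2.0))  # Gaussian sigma with the same FWHM
    gaussian = np.exp(-((x - center) ** 2) / (2.0 * sigma_g ** 2)) / (sigma_g * np.sqrt(2.0 * np.pi))
    lorentzian = (sigma / np.pi) / ((x - center) ** 2 + sigma ** 2)
    return amplitude * ((1.0 - fraction) * gaussian + fraction * lorentzian)

def trapezoid_area(x, y):
    """Area under y(x) by the trapezoid rule (written out to avoid numpy version differences)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

# Hypothetical peak parameters, for illustration only
amplitude, center, sigma, fraction = 1000.0, 1400.0, 20.0, 0.8

# Integrate over +/- 5000 cm^-1 around the center on a 0.05 cm^-1 grid
x = np.linspace(center - 5000.0, center + 5000.0, 200001)
area = trapezoid_area(x, pseudo_voigt(x, amplitude, center, sigma, fraction))
print(area, amplitude)
```

The Lorentzian tails decay slowly, so any finite window captures slightly less than `amplitude` (here a fraction of a percent less). A numerically integrated area and the fitted amplitude therefore need not be identical.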

## Pre-step (1/2): Import Modules

In [1]:
#initial imports
import os
import h5py
import pandas as pd
import matplotlib.pyplot as plt
import math
import numpy as np
from scipy import interpolate
import lineid_plot
from ramandecompy import spectrafit
from ramandecompy import peakidentify
from ramandecompy import dataprep
from ramandecompy import dataimport
from ramandecompy import datavis

## Pre-step (2/2): Import Calibration / Pure Component Raman Spectra Data Sets

In [2]:
dataprep.new_hdf5('calibration_data2')

dataprep.add_calibration('calibration_data2.hdf5',
                         '../ramandecompy/tests/test_files/Hydrogen_Baseline_Calibration.xlsx',
                         label='Hydrogen')
dataprep.add_calibration('calibration_data2.hdf5',
                         '../ramandecompy/tests/test_files/CarbonMonoxide_Baseline_Calibration.xlsx',
                         label='CarbonMonoxide')
dataprep.add_calibration('calibration_data2.hdf5',
                         '../ramandecompy/tests/test_files/CO2_100wt%.csv',
                         label='CO2')
dataprep.add_calibration('calibration_data2.hdf5',
                         '../ramandecompy/tests/test_files/water.xlsx',
                         label='H2O')
dataprep.add_calibration('calibration_data2.hdf5',
                         '../ramandecompy/tests/test_files/sapphire.xlsx',
                         label='sapphire')
dataprep.add_calibration('calibration_data2.hdf5',
                         '../ramandecompy/tests/test_files/FormicAcid_3_6percent.xlsx',
                         label='FormicAcid')


Data from ../ramandecompy/tests/test_files/Hydrogen_Baseline_Calibration.xlsx fit with compound pseudo-Voigt model. Results saved to calibration_data2.hdf5.
Data from ../ramandecompy/tests/test_files/CarbonMonoxide_Baseline_Calibration.xlsx fit with compound pseudo-Voigt model. Results saved to calibration_data2.hdf5.
Data from ../ramandecompy/tests/test_files/CO2_100wt%.csv fit with compound pseudo-Voigt model. Results saved to calibration_data2.hdf5.
Data from ../ramandecompy/tests/test_files/water.xlsx fit with compound pseudo-Voigt model. Results saved to calibration_data2.hdf5.
Data from ../ramandecompy/tests/test_files/sapphire.xlsx fit with compound pseudo-Voigt model. Results saved to calibration_data2.hdf5.
Data from ../ramandecompy/tests/test_files/FormicAcid_3_6percent.xlsx fit with compound pseudo-Voigt model. Results saved to calibration_data2.hdf5.


In [3]:
dataprep.view_hdf5('calibration_data2.hdf5')

**** calibration_data2.hdf5 ****
CO2
|    Peak_01
|    Peak_02
|    counts
|    residuals
|    wavenumber
CarbonMonoxide
|    Peak_01
|    counts
|    residuals
|    wavenumber
FormicAcid
|    Peak_01
|    Peak_02
|    Peak_03
|    Peak_04
|    Peak_05
|    Peak_06
|    counts
|    residuals
|    wavenumber
H2O
|    Peak_01
|    Peak_02
|    counts
|    residuals
|    wavenumber
Hydrogen
|    Peak_01
|    Peak_02
|    Peak_03
|    Peak_04
|    counts
|    residuals
|    wavenumber
sapphire
|    Peak_01
|    Peak_02
|    Peak_03
|    Peak_04
|    counts
|    residuals
|    wavenumber


## Step 1: Import Experimental Data Sets
The first step is to put the experimental data into an HDF5 file. This file is later used to identify peaks.

When a directory holds many data sets, it is more practical to loop over all files in the directory than to add them one by one. The loop is adapted from a Stack Overflow thread: `https://stackoverflow.com/questions/10377998/how-can-i-iterate-over-files-in-a-given-directory`

Note: the h5py documentation is a good general resource for working with HDF5 files: `http://docs.h5py.org/en/stable/`
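The file-selection part of that loop can be kept separate from the HDF5 writing, which makes it easy to check which files would be added before anything is written. The sketch below is a minimal illustration. `select_experiment_files` and `find_experiment_files` are hypothetical helpers, not part of ramandecompy. The `'FA_'` prefix and `'.csv'` suffix are taken from the loop in the next cell.

```python
import os

def select_experiment_files(filenames, prefix="FA_", suffix=".csv"):
    """Keep only names that start with `prefix` and end with `suffix`, in sorted order."""
    return sorted(name for name in filenames
                  if name.startswith(prefix) and name.endswith(suffix))

def find_experiment_files(directory, prefix="FA_", suffix=".csv"):
    """Full paths of matching files directly inside `directory` (not recursive)."""
    return [os.path.join(directory, name)
            for name in select_experiment_files(os.listdir(directory), prefix, suffix)]

# Possible use in this notebook (dataprep.add_experiment as in the next cell):
# for path in find_experiment_files('../ramandecompy/tests/test_files/'):
#     dataprep.add_experiment('dataprep_experimental.hdf5', path)
```

Sorting makes the insertion order deterministic, because `os.listdir` returns names in arbitrary order.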


In [4]:
# Run once: dataprep.new_hdf5 creates 'dataprep_experimental.hdf5' and gives an error if the file already exists,
# so keep this cell commented out after the file has been made.
#dataprep.new_hdf5('dataprep_experimental')
#directory = '../ramandecompy/tests/test_files/'  # directory holding the experimental data

#for filename in os.listdir(directory):
#    # only add formic acid experiment files saved as .csv
#    if filename.startswith('FA_') and filename.endswith('.csv'):
#        dataprep.add_experiment('dataprep_experimental.hdf5', os.path.join(directory, filename))

#dataprep.view_hdf5('dataprep_experimental.hdf5')  # check what was added

#FOR CALIBRATION DATA MASS ADD (not written yet)


In [5]:
#type(filename) #checking the type of the file name (making sure it is a string)

In [6]:
#dataprep.view_hdf5('dataimport_ML_df-Copy1.hdf5') #making sure the loop did its job and all data is correctly imported 
#comment out this to not see the long list

## Step 2: Define substance of interest
The second step is to determine whether the species of interest is present in the spectra. If it is, the next question is whether it has decomposed, meaning its peak area has decreased or changed relative to the calibration area.

For now, this relies on the user knowing the approximate location of the peak for the substance of interest.

The user supplies the center wavenumber of that peak. The code then searches the calibration data for a peak centered at that location (within a tolerance of +/- 10 cm^-1), takes the area of that curve, and stores it as a variable.
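A minimal sketch of that lookup follows. `find_peak_area` is a hypothetical helper, not a ramandecompy function. It assumes each peak record is laid out like the `Peak_NN` datasets printed below, with the center (cm^-1) at index 2 and the area at index 6, which is the index this notebook reads the calibration area from. When several peaks fall inside the window, the closest one is used.

```python
def find_peak_area(peak_records, target_center, tolerance=10.0):
    """Area of the peak whose center is closest to `target_center` among those within
    +/- `tolerance` cm^-1, or None if no peak is close enough.

    Assumed record layout: index 2 = peak center (cm^-1), index 6 = peak area.
    """
    CENTER, AREA = 2, 6
    in_window = [rec for rec in peak_records
                 if abs(rec[CENTER] - target_center) <= tolerance]
    if not in_window:
        return None
    closest = min(in_window, key=lambda rec: abs(rec[CENTER] - target_center))
    return closest[AREA]

# Formic acid Peak_01 record, as printed below
peak_01_record = (0.81690427, 19.21444133, 707.31, 12328.7434111,
                  38.42888266, 222.02789633, 12169.85712537)
print(find_peak_area([peak_01_record], 712))   # 712 cm^-1 is the literature value for this peak
print(find_peak_area([peak_01_record], 1400))  # None: no peak near 1400 in this one-record list
```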

In [27]:
#getting peak information for Formic Acid 

data1 = h5py.File('calibration_data2.hdf5', 'r')
# then specify the peak
peak_01 = list(data1['FormicAcid/Peak_01'])
peak_01s = data1['FormicAcid/Peak_01']
# list() copies the data into a plain Python list; without it, peak_01s stays an h5py Dataset object.
# Peak_01 holds a single record, so peak_01 is a one-element list and peak_01[0] holds the 7 fitted values.
print(peak_01)

print(type(peak_01s))

[(0.81690427, 19.21444133, 707.31, 12328.7434111, 38.42888266, 222.02789633, 12169.85712537)]
<class 'h5py._hl.dataset.Dataset'>


In [25]:
peak_01[0] # the single record in Peak_01: all 7 fitted values (index 6 is the area used later)

(0.81690427, 19.21444133, 707.31, 12328.7434111, 38.42888266, 222.02789633, 12169.85712537)
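The seven numbers in the record are not labeled here. One hypothesis is that they follow lmfit's `PseudoVoigtModel` parameters in the order (fraction, sigma, center, amplitude, fwhm, height), followed by the area that this notebook later reads from index 6. If so, the fwhm and height fields should be recomputable from the others. The sketch below checks this for Peak_01. For this record, both recomputed values agree with the stored ones, which supports but does not prove the assumed layout. `FIELDS` and the helper names are hypothetical.

```python
import math

# Assumed field order of a Peak_NN record (hypothesis, not taken from ramandecompy docs)
FIELDS = ("fraction", "sigma", "center", "amplitude", "fwhm", "height", "area")

def pseudo_voigt_fwhm(sigma):
    # lmfit-style pseudo-Voigt: both components share FWHM = 2 * sigma
    return 2.0 * sigma

def pseudo_voigt_height(fraction, sigma, amplitude):
    """Value of an lmfit-style pseudo-Voigt at its center."""
    sigma_g = sigma / math.sqrt(2.0 * math.log(2.0))  # Gaussian sigma giving the same FWHM
    gaussian_peak = (1.0 - fraction) * amplitude / (sigma_g * math.sqrt(2.0 * math.pi))
    lorentzian_peak = fraction * amplitude / (math.pi * sigma)
    return gaussian_peak + lorentzian_peak

# Peak_01 record printed above
record = dict(zip(FIELDS, (0.81690427, 19.21444133, 707.31, 12328.7434111,
                           38.42888266, 222.02789633, 12169.85712537)))

print(pseudo_voigt_fwhm(record["sigma"]), record["fwhm"])
print(pseudo_voigt_height(record["fraction"], record["sigma"], record["amplitude"]), record["height"])
```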

### Choosing which Formic Acid peak to use, and which peaks lie close to other observed peaks used in the analysis
For formic acid, significant Raman peaks have previously been reported at the following wavenumbers (cm^-1):
- 712
- 1219
- 1400
- 1714
- 2943

In [20]:
peak_01 = list(data1['FormicAcid/Peak_01'])
print(peak_01)

[(0.81690427, 19.21444133, 707.31, 12328.7434111, 38.42888266, 222.02789633, 12169.85712537)]


In [29]:
peak_02 = list(data1['FormicAcid/Peak_02'])
print(peak_02[0][2])  # center (cm^-1): index 2 of the single record

In [None]:
peak_03 = list(data1['FormicAcid/Peak_03'])
print(peak_03[0][2])  # center (cm^-1): index 2 of the single record

In [None]:
peak_04 = list(data1['FormicAcid/Peak_04'])
print(peak_04[0][2])  # center (cm^-1): index 2 of the single record

In [None]:
peak_05 = list(data1['FormicAcid/Peak_05'])
print(peak_05[0][2])  # center (cm^-1): index 2 of the single record

In [None]:
peak_06 = list(data1['FormicAcid/Peak_06'])
print(peak_06[0][2])  # center (cm^-1): index 2 of the single record

### All previously reported peaks are identified in the calibration file within +/- 5 cm^-1
The calibration fit also contains an additional peak near 1055 cm^-1 that is not in the literature list. One hypothesis is that this peak is hard to distinguish from other components with similar wavenumbers. Another is that its amplitude is smaller than that of the other formic acid peaks. Either could explain why it is not identified in the literature.
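The +/- 5 cm^-1 comparison above can also be checked programmatically. The sketch below pairs each literature wavenumber with the closest calibration peak center inside the tolerance, then lists any calibration peaks left unpaired. The calibration centers are copied from the "The peaks that we found for FormicAcid are:" printout later in this notebook. `match_peaks` is a hypothetical helper, not part of ramandecompy.

```python
def match_peaks(reference, observed, tolerance=5.0):
    """Pair each reference wavenumber with the closest observed center within
    +/- tolerance (None if there is none); also return observed centers never paired."""
    matches = {}
    for ref in reference:
        close = [obs for obs in observed if abs(obs - ref) <= tolerance]
        matches[ref] = min(close, key=lambda obs: abs(obs - ref)) if close else None
    paired = {m for m in matches.values() if m is not None}
    unpaired = [obs for obs in observed if obs not in paired]
    return matches, unpaired

literature = [712, 1219, 1400, 1714, 2943]                          # cm^-1, listed above
calibration_centers = [707.31, 1055.9, 1219.5, 1400.1, 1716.7, 2940.6]  # cm^-1, fitted

matches, extra = match_peaks(literature, calibration_centers)
print(matches)
print(extra)
```

With these values every literature peak finds a partner, and only the peak near 1055.9 cm^-1 is left over.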

For this example, the peak at 1400 cm^-1 (peak_04) will be used for the molar concentration calculations **because... NEED TO FILL IN**

In [30]:
peak_04 = list(data1['FormicAcid/Peak_04'])
FA_cal_area = peak_04[0][6]  # index 6 of the single record is the fitted peak area
print(FA_cal_area)

## Step 3: Define presence of substance in experimental data

Next, using the same center wavenumber, the code checks whether the peak is present in the experimental data. If it is, the code stores the area of that peak as a second variable.
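Conceptually, this step repeats the Step 2 lookup over every experimental spectrum. The sketch below is a minimal illustration under stated assumptions, not what `peakidentify.peak_assignment` does internally. It assumes each spectrum's fitted peaks are available as `(center, area)` pairs. `areas_by_spectrum` is a hypothetical helper, and the example peaks are invented for illustration, not measured values.

```python
def areas_by_spectrum(spectra, target_center, tolerance=10.0):
    """Return {spectrum key: area of the closest peak within +/- tolerance cm^-1 of
    target_center, or None if no fitted peak falls inside that window}.

    `spectra` maps a key such as '300C/25s' to a list of (center, area) pairs.
    """
    areas = {}
    for key, peaks in spectra.items():
        in_window = [(abs(center - target_center), area)
                     for center, area in peaks
                     if abs(center - target_center) <= tolerance]
        # min() on (distance, area) tuples picks the closest peak
        areas[key] = min(in_window)[1] if in_window else None
    return areas

# Hypothetical fitted peaks, for illustration only (not measured values)
spectra = {
    "300C/25s": [(1399.2, 900.0), (1640.8, 150.0)],
    "430C/8s": [(1641.1, 140.0), (2139.5, 60.0)],
}
print(areas_by_spectrum(spectra, 1400.1))
```

Returning `None` rather than `0.0` keeps "peak not found" distinguishable from "peak found with a small area".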


In [31]:
#keyfinder function: list every second-level key ('group/member') in an HDF5 file
def keyfinder(hdf5_filename):
    seconds = []
    with h5py.File(hdf5_filename, 'r') as hdf5:
        for layer_1 in hdf5.keys():
            # only top-level groups (e.g. 'FormicAcid' or '300C') contain second-level members
            if isinstance(hdf5[layer_1], h5py.Group):
                for layer_2 in hdf5[layer_1].keys():
                    # groups and datasets at the second level are both recorded
                    seconds.append('{}/{}'.format(layer_1, layer_2))
    return seconds


In [32]:
#define filenames
hdf5_calfilename = 'calibration_data2.hdf5'  # pure component calibration fits
hdf5_expfilename = 'dataimport_ML_df-Copy1.hdf5'  # experimental spectra

cal_key_list = keyfinder(hdf5_calfilename)
exp_key_list = keyfinder(hdf5_expfilename)

In [33]:
print(cal_key_list)

['CO2/Peak_01', 'CO2/Peak_02', 'CO2/counts', 'CO2/residuals', 'CO2/wavenumber', 'CarbonMonoxide/Peak_01', 'CarbonMonoxide/counts', 'CarbonMonoxide/residuals', 'CarbonMonoxide/wavenumber', 'FormicAcid/Peak_01', 'FormicAcid/Peak_02', 'FormicAcid/Peak_03', 'FormicAcid/Peak_04', 'FormicAcid/Peak_05', 'FormicAcid/Peak_06', 'FormicAcid/counts', 'FormicAcid/residuals', 'FormicAcid/wavenumber', 'H2O/Peak_01', 'H2O/Peak_02', 'H2O/counts', 'H2O/residuals', 'H2O/wavenumber', 'Hydrogen/Peak_01', 'Hydrogen/Peak_02', 'Hydrogen/Peak_03', 'Hydrogen/Peak_04', 'Hydrogen/counts', 'Hydrogen/residuals', 'Hydrogen/wavenumber', 'sapphire/Peak_01', 'sapphire/Peak_02', 'sapphire/Peak_03', 'sapphire/Peak_04', 'sapphire/counts', 'sapphire/residuals', 'sapphire/wavenumber']


In [34]:
print(exp_key_list)

['300C/25s', '300C/35s', '300C/45s', '300C/55s', '300C/65s', '320C/25s', '320C/30s', '320C/40s', '320C/50s', '320C/60s', '340C/20s', '340C/30s', '340C/40s', '340C/50s', '340C/60s', '360C/20s', '360C/30s', '360C/40s', '360C/50s', '360C/60s', '380C/15s', '380C/25s', '380C/35s', '380C/45s', '380C/55s', '390C/10s', '390C/15s', '390C/20s', '390C/25s', '390C/30s', '400C/10s', '400C/125s', '400C/15s', '400C/5s', '400C/75s', '410C/10s', '410C/125s', '410C/15s', '410C/5s', '410C/75s', '420C/10s', '420C/5s', '420C/625s', '420C/75s', '420C/875s', '430C/4s', '430C/5s', '430C/6s', '430C/7s', '430C/8s']
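The experimental keys encode temperature and time as text, so the listing above is in string order. For example, '400C/125s' comes before '400C/15s'. For plotting against time, a numeric order is usually wanted. The sketch below assumes every key has the form `<integer>C/<integer>s`, as all keys printed above do. It uses the numbers exactly as written and does not interpret them further. `parse_key` and `sort_keys` are hypothetical helpers.

```python
import re

# Assumed key format: '<temperature>C/<time>s', e.g. '300C/25s'
KEY_PATTERN = re.compile(r"^(\d+)C/(\d+)s$")

def parse_key(key):
    """Return (temperature, time) as integers, exactly as written in the key."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError("unexpected key format: {!r}".format(key))
    return int(match.group(1)), int(match.group(2))

def sort_keys(keys):
    """Order keys numerically by temperature, then by time."""
    return sorted(keys, key=parse_key)

print(sort_keys(['400C/10s', '400C/125s', '400C/15s', '400C/5s', '400C/75s']))
# ['400C/5s', '400C/10s', '400C/15s', '400C/75s', '400C/125s']
```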


In [38]:
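frames = []
for key in exp_key_list:
    # assign the peaks of one experimental spectrum against the calibration peaks;
    # 10 is the matching tolerance, assumed to be in cm^-1 as in Step 2
    df = peakidentify.peak_assignment(hdf5_expfilename, key, hdf5_calfilename, 10, plot=False)
    frames.append(df)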

The peaks that we found for CO2 are: 
1280.4
1385.3
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
The peaks that we found for CarbonMonoxide are: 
2139.9096496496495
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
The peaks that we found for FormicAcid are: 
707.31
1055.9
1219.5
1400.1
1716.7
2940.6
[0. 0. 0. 0. 0. 1. 0. 1. 1. 1. 0. 1. 0. 0. 1. 0.]
The peaks that we found for H2O are: 
1640.6
3194.4
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0. 1.]
The peaks that we found for Hydrogen are: 
355.6504104104104
587.3333133133133
816.0073473473473
1035.6547747747748
[0. 0. 0. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
The peaks that we found for sapphire are: 
378.71
418.14
575.97
751.21
[0. 1. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[['Unassigned'], ['sapphire'], ['sapphire'], ['Unassigned'], ['Hydrogen', 'sapphire'], ['FormicAcid'], ['sapphire'], ['FormicAcid'], ['FormicAcid'], ['FormicAcid'], ['H2O'], ['FormicAcid'], ['CarbonMonoxide'], ['Unassigned'], ['FormicAcid'], ['H2O']]
(0.

ValueError: could not assign tuple of length 9 to structure with 8 fields.

In [39]:
# join_axes was removed from pd.concat in pandas 1.0; the other arguments dropped here were defaults
result = pd.concat(frames, axis=0, join='outer', sort=True)

ValueError: No objects to concatenate

In [None]:
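# placeholder for the next step: loop over every experimental spectrum key (e.g. '300C/25s')
for key in exp_key_list:
    pass  # per-spectrum processing not written yet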

## Step 5: Calculate Molar Decomposition

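To quantify molar decomposition, the experimental peak area is divided by the calibration peak area. If peak area scales linearly with concentration, this ratio gives the amount of the substance remaining, relative to the calibration sample, at the given experimental temperature and residence time. The fraction decomposed is one minus this ratio.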
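A minimal sketch of this calculation follows. It assumes peak area is proportional to concentration and that calibration and experiment were acquired under comparable conditions. Spectra where the peak was not found (`None`) are left out rather than guessed. `decomposition_table` is a hypothetical helper. In this notebook the calibration area would be `FA_cal_area`. The numbers below are illustrative only.

```python
def decomposition_table(exp_areas, cal_area):
    """exp_areas: spectrum key -> experimental peak area (None if the peak was not found).
    Returns key -> (fraction_remaining, fraction_decomposed), both relative to the
    calibration sample; assumes area is proportional to concentration."""
    if cal_area <= 0:
        raise ValueError("calibration area must be positive")
    table = {}
    for key, area in exp_areas.items():
        if area is None:
            continue  # peak not detected: not included
        remaining = area / cal_area
        table[key] = (remaining, 1.0 - remaining)
    return table

# Illustrative values only (not measured data)
exp_areas = {"300C/25s": 800.0, "430C/8s": 250.0, "400C/5s": None}
print(decomposition_table(exp_areas, cal_area=1000.0))
```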

## Step 6: Plot Molar Decomposition

## Step 7: Compare Molar Decomposition with Reported Literature Values

## Conclusion