
Tech Reviews

Elizabeth Rasmussen edited this page Mar 31, 2020 · 13 revisions

This wiki collects the tech reviews performed over the course of this project, giving users insight into the reasoning behind the technology packages adopted in our code base.

Chemical Kinetics

The chempy package is useful for chemistry-related problems in Python, including kinetics and thermodynamics. It includes classes for representing substances, reactions, and systems of reactions, along with functions for balancing reactions, analyzing solutions to common chemical kinetic differential equations, and plotting interactive kinetic models derived from well-established physical chemistry formulas. The chempy documentation lists optional dependencies that are useful for solving initial value problems, ODEs, and systems of non-linear equations.
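As a small sketch of the balancing functionality, chempy's `balance_stoichiometry` helper (assuming the chempy package is installed) returns the stoichiometric coefficients for a reaction:

```python
from chempy import balance_stoichiometry

# Balance 2 H2 + O2 -> 2 H2O; the inputs are sets of formula strings,
# and the outputs map each species to its stoichiometric coefficient.
reactants, products = balance_stoichiometry({'H2', 'O2'}, {'H2O'})
```

Here `reactants` maps `'H2'` to 2 and `'O2'` to 1, and `products` maps `'H2O'` to 2.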

This package will be useful for the future integration of reactor kinetics with our Raman decomposition module.

We will apply an Arrhenius model fit to the decomposition rates extracted from Raman peak information; for more information, see the Substance Decomposition wiki page.
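The Arrhenius model mentioned above can be sketched in a few lines of Python; the parameter values below are placeholders, not fitted results:

```python
import math

R = 8.314  # gas constant, J/(mol K)

def arrhenius(A, Ea, T):
    """Arrhenius rate constant k = A * exp(-Ea / (R * T)).

    A  : pre-exponential factor (same units as k)
    Ea : activation energy in J/mol (in practice, fit from
         Raman-derived decomposition rates)
    T  : absolute temperature in K
    """
    return A * math.exp(-Ea / (R * T))

# Hypothetical parameters for a decomposing fluid:
k_650 = arrhenius(A=1.0e13, Ea=1.5e5, T=650.0)
k_700 = arrhenius(A=1.0e13, Ea=1.5e5, T=700.0)
# The rate constant increases with temperature, as expected
# for thermal decomposition.
```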

Dahlgren, B. (2018). ChemPy: A package useful for chemistry written in Python. Journal of Open Source Software, 3(24), 565. https://doi.org/10.21105/joss.00565

Data Management

The analysis of decomposing supercritical fluids via Raman spectroscopy requires comparing a large number of spectra collected across a range of process parameters. To minimize computation during analysis, it is convenient to fit the spectral data a single time and store the fit results in an external file. HDF5 (Hierarchical Data Format 5) allows convenient storage of these fit results, while also allowing easy access to them in the future without the need to refit the data. An HDF5 file contains groups and datasets that organize data in a recognizable way, similar to directories and files. In Python, the HDF5 file type is supported through the h5py module.

This module will be useful since it allows all the spectral data for a set of experiments to be stored in one convenient file. This way, the resource-intensive step of fitting the peaks in Raman spectra does not need to be repeated each time a single peak's decomposition trend is examined.
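A minimal sketch of this group/dataset layout with h5py; the file, group, and dataset names here are hypothetical examples, not our module's actual schema:

```python
import h5py
import numpy as np

# Hypothetical fit results: one row per peak, e.g. [center, amplitude].
fit_results = np.array([[1350.2, 12.5], [1580.7, 30.1]])

with h5py.File("fit_results.h5", "w") as f:
    grp = f.create_group("experiment_01")              # group ~ directory
    grp.create_dataset("peak_fits", data=fit_results)  # dataset ~ file
    grp.attrs["temperature_K"] = 450.0                 # metadata as attributes

# Later analyses reopen the file instead of refitting the spectra.
with h5py.File("fit_results.h5", "r") as f:
    recovered = f["experiment_01/peak_fits"][()]
```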

Supercomputer Capabilities

When writing code, our goal is not only readability and user friendliness, but also speed of computing. Our code does not currently support supercomputers, as they are not needed for the file sizes and computations within the scope of this quarter's work. We have looked into this option, and below are some on-campus resources (University of Washington - Seattle) for adapting the code base for supercomputer use.

UW Hyak Supercomputer

The information below is specific to the UW Hyak supercomputer.

Machine Learning

For machine learning, we reviewed the literature and scoped out the work described below.

Literature review

Below is a list (with links) of useful, recently published articles on machine learning and Raman spectra.

  1. Machine learning tools for mineral recognition and classification from Raman spectroscopy
  2. Deep learning-based component identification for the Raman spectra of mixtures
  3. Deep convolutional neural networks for Raman spectrum recognition: a unified solution
  4. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy
  5. A comparison of Raman and FT-IR spectroscopy for the prediction of meat spoilage
  6. Extracting Knowledge from Data through Catalysis Informatics

Scope of work

Goal: See how different common regression models perform at correctly identifying the presence of a single component in a mixture data set.

Tasks:

  1. Raw pure component spectra

  2. Simulated data set of spectra - 3 components max
     a. Training
     b. Testing
     Note: validation will use the full data set.

  3. Try classification approaches, applying the test and train data sets to see which is most useful

  • k-Nearest Neighbor (kNN) - a simple and well-known classification method. It determines the class of a new sample from the majority class of its nearest k samples. The metric used to evaluate the distance between samples is the Euclidean distance. Values of k ∈ [2, 5] were considered, and k = 3 was selected for our model.
  • Random Forest (RF) - an ensemble learning method based on decision trees. It has the advantages of simplicity, ease of implementation, low computational overhead, and strong performance on many tasks. We use the ‘RandomForestClassifier’ module of the sklearn package in this study. The number of trees was considered in the range [100, 500], and the number of features randomly sampled for each split was optimized in the range [50, 200]. Finally, 500 trees and 90 features were selected.
  • Logistic Regression (LR) - a classical classification method that can also be seen as a simple neural network without hidden layers. LR fits its parameters from the training set, mapping the target to [0, 1] and then discretizing it to achieve classification. We used the ‘LogisticRegression’ module of the sklearn package in this study. SGD was used to learn the model weights. The regularization penalty was ‘l1’, and the regularization parameter C was 1.00.
  4. For each classification approach, calculate the FP/FN/TP/TN
  5. Create a table and compare the differences between methods
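The FP/FN/TP/TN counts mentioned above can be computed in plain Python for a binary "component present?" classification; the label vectors below are illustrative placeholders:

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Hypothetical ground truth and classifier predictions:
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
# counts == {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```

Running this per classifier (kNN, RF, LR) gives the entries for the comparison table in the final step.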