
Tech Reviews

Elizabeth Rasmussen edited this page Mar 31, 2020 · 13 revisions

This wiki collects the tech reviews performed over the course of this project, giving users insight into the reasoning behind the technology packages adopted in our code base.

Chemical Kinetics

The chempy package is useful for chemistry-related problems in Python, including kinetics and thermodynamics. It includes classes for representing substances, reactions, and systems of reactions, along with functions for balancing reactions, analyzing solutions to common chemical kinetic differential equations, and plotting interactive kinetic models derived from well-established physical chemistry formulas. The chempy documentation lists optional dependencies that are useful for solving initial value problems, ODEs, and systems of non-linear equations.
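As a small sketch of the balancing functionality, chempy's `balance_stoichiometry` helper (assuming the chempy package is installed) returns the stoichiometric coefficients for a reaction:

```python
from chempy import balance_stoichiometry

# Balance 2 H2 + O2 -> 2 H2O; the inputs are sets of formula strings,
# and the outputs map each species to its stoichiometric coefficient.
reactants, products = balance_stoichiometry({'H2', 'O2'}, {'H2O'})
```

Here `reactants` maps `'H2'` to 2 and `'O2'` to 1, and `products` maps `'H2O'` to 2.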

This package will be useful for the future integration of reactor kinetics with our Raman decomposition module.

We will apply an Arrhenius model fit to the decomposition rates extracted from Raman peak information; for more information, see the Substance Decomposition wiki page.
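The Arrhenius model mentioned above can be sketched in a few lines of Python; the parameter values below are placeholders, not fitted results:

```python
import math

R = 8.314  # gas constant, J/(mol K)

def arrhenius(A, Ea, T):
    """Arrhenius rate constant k = A * exp(-Ea / (R * T)).

    A  : pre-exponential factor (same units as k)
    Ea : activation energy in J/mol (in practice, fit from
         Raman-derived decomposition rates)
    T  : absolute temperature in K
    """
    return A * math.exp(-Ea / (R * T))

# Hypothetical parameters for a decomposing fluid:
k_650 = arrhenius(A=1.0e13, Ea=1.5e5, T=650.0)
k_700 = arrhenius(A=1.0e13, Ea=1.5e5, T=700.0)
# The rate constant increases with temperature, as expected
# for thermal decomposition.
```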

Dahlgren, B. (2018). ChemPy: A package useful for chemistry written in Python. Journal of Open Source Software, 3(24), 565. https://doi.org/10.21105/joss.00565

Data Management

The analysis of decomposing supercritical fluids via Raman spectroscopy requires comparing a large number of spectra collected across a range of process parameters. To minimize computation during analysis, it is convenient to fit the spectral data a single time and store the fit results in an external file. HDF5 (Hierarchical Data Format 5) allows convenient storage of these fit results, while also allowing easy access to them in the future without the need to refit the data. An HDF5 file contains groups and datasets that organize data in a recognizable way, similar to directories and files. In Python, the HDF5 file type is supported through the h5py module.

This module will be useful since it allows all the spectral data for a set of experiments to be stored in one convenient file. This way, the resource-intensive step of fitting the peaks in Raman spectra does not need to be repeated each time a single peak's decomposition trend is examined.
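A minimal sketch of this group/dataset layout with h5py; the file, group, and dataset names here are hypothetical examples, not our module's actual schema:

```python
import h5py
import numpy as np

# Hypothetical fit results: one row per peak, e.g. [center, amplitude].
fit_results = np.array([[1350.2, 12.5], [1580.7, 30.1]])

with h5py.File("fit_results.h5", "w") as f:
    grp = f.create_group("experiment_01")              # group ~ directory
    grp.create_dataset("peak_fits", data=fit_results)  # dataset ~ file
    grp.attrs["temperature_K"] = 450.0                 # metadata as attributes

# Later analyses reopen the file instead of refitting the spectra.
with h5py.File("fit_results.h5", "r") as f:
    recovered = f["experiment_01/peak_fits"][()]
```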

Supercomputer Capabilities

When writing code, our goal is not only readability and user friendliness, but also speed of computing. Our code does not currently support supercomputers, as they are not needed for the file sizes and computations within the scope of this quarter's work. We have looked into this option, and below are some on-campus resources (University of Washington - Seattle) for adapting the code base for supercomputer use.

UW Hyak Supercomputer

The information below is specific to the UW Hyak supercomputer.

Machine Learning

For machine learning, we reviewed the literature and scoped out the work described below.

Literature review

Below is a list (with links) of useful, recently published articles on machine learning and Raman spectra.

  1. Machine learning tools for mineral recognition and classification from Raman spectroscopy
  2. Deep learning-based component identification for the Raman spectra of mixtures
  3. Deep convolutional neural networks for Raman spectrum recognition: a unified solution
  4. A study of machine learning regression methods for major elemental analysis of rocks using laser-induced breakdown spectroscopy
  5. A comparison of Raman and FT-IR spectroscopy for the prediction of meat spoilage
  6. Extracting Knowledge from Data through Catalysis Informatics

Scope of work

Goal: See how different common regression models perform at correctly identifying the presence of a single component in a mixture data set.

Tasks:

  1. Raw pure component spectra

  2. Simulated data set of spectra - 3 components max
     a. Training
     b. Testing
     Note: validation will use the full data set.

  3. Try classification approaches, applying the test and train data sets to see which is most useful

  • k-Nearest Neighbor (kNN) - a simple and well-known classification method. It determines the class of a new sample from the majority class of its nearest k samples. The metric used to evaluate the distance between samples is the Euclidean distance. Values of k ∈ [2, 5] were considered, and k = 3 was selected for our model.
  • Random Forest (RF) - an ensemble learning method based on decision trees. It has the advantages of simplicity, ease of implementation, low computational overhead, and strong performance on many tasks. We use the ‘RandomForestClassifier’ module of the sklearn package in this study. The number of trees was considered in the range [100, 500], and the number of features randomly sampled for each split was optimized in the range [50, 200]. Finally, 500 trees and 90 features were selected.
  • Logistic Regression (LR) - a classical classification method that can also be seen as a simple neural network without hidden layers. LR fits its parameters from the training set, mapping the target to [0, 1] and then discretizing it to achieve classification. We used the ‘LogisticRegression’ module of the sklearn package in this study. SGD was used to learn the model weights. The regularization penalty was ‘l1’, and the regularization parameter C was 1.00.
  4. For each classification approach, calculate the FP/FN/TP/TN
  5. Create a table and compare the differences between methods
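The FP/FN/TP/TN counts mentioned above can be computed in plain Python for a binary "component present?" classification; the label vectors below are illustrative placeholders:

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"TP": tp, "FP": fp, "TN": tn, "FN": fn}

# Hypothetical ground truth and classifier predictions:
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
# counts == {'TP': 2, 'FP': 1, 'TN': 2, 'FN': 1}
```

Running this per classifier (kNN, RF, LR) gives the entries for the comparison table in the final step.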