# trixs-dl-models
## Models
* [A deep-learning model](Train_Run_DL_Models.ipynb) based on the [random forest model](https://github.com/TRI-AMDD/trixs/blob/Torrisi_XANES_RF_2020/notebooks/Train_Run_Models.ipynb) from the published article:
   * [Random forest machine learning models for interpretable X-ray absorption near-edge structure spectrum-property relationships](https://www.nature.com/articles/s41524-020-00376-6)
* Current version: deep learning (DL) vs. random forest (RF) models on pointwise spectral data
  * NN model without regularization: [Train_Run_DL_Models.ipynb](Train_Run_DL_Models.ipynb)
  * NN model with regularization: [Train_Run_DL_Models_V2.ipynb](Train_Run_DL_Models_V2.ipynb)
  * CNN model:
    * trained on the original data: [Train_Run_DL_Models_CNN_originalData.ipynb](Train_Run_DL_Models_CNN_originalData.ipynb)
    * trained on augmented data (average pooling): [Train_Run_DL_Models_CNN_moreData.ipynb](Train_Run_DL_Models_CNN_moreData.ipynb)
* All scenarios:
  
| Status                   | Model      | Data      | Iteration/Epoch | Cross-Validation | Kernel Size | Feature Importance | Notebook | Performance Bader | Performance MD | Performance All |
| ------------------------ | -----------| --------- | -------------   | ---------------- | ----------- | --  | -------- | -------- | -------- | -------- |
| :heavy_check_mark: | Random Forest    | original  | 300             | 3                | -           | Yes | [done](Train_Run_DL_Models_CNN_originalData.ipynb)| ![img](figures_feffnorm/feff_cnn_originalData_20_bader_uniparity.svg) | ![img](figures_feffnorm/feff_cnn_originalData_20_md_uniparity.svg) | ![img](figures_feffnorm/feff_cnn_originalData_20_all_perf.svg) | 
| :heavy_check_mark: | Neural Networks  | original  | 300             | 3                | -           | Yes | [done](Train_Run_DL_Models.ipynb)| | | |
| :heavy_check_mark: | CNN              | original  | 300             | 3                | 5           | No | [done](Train_Run_DL_Models_CNN_originalData.ipynb)|  ![img](figures_feffnorm/feff_cnn_originalData_5_bader_uniparity_nn.svg) | ![img](figures_feffnorm/feff_cnn_originalData_5_md_uniparity_nn.svg) |  ![img](figures_feffnorm/feff_cnn_originalData_5_all_perf_nn.svg) |
| :heavy_check_mark: | CNN              | original  | 300             | 3                | 10          | No | [done](Train_Run_DL_Models_CNN_originalData_10.ipynb)| ![img](figures_feffnorm/feff_cnn_originalData_10_bader_uniparity_nn.svg) | ![img](figures_feffnorm/feff_cnn_originalData_10_md_uniparity_nn.svg) |  ![img](figures_feffnorm/feff_cnn_originalData_10_all_perf_nn.svg) | 
| :heavy_check_mark: | CNN              | original  | 300             | 3                | 20          | No | [done](Train_Run_DL_Models_CNN_originalData_20.ipynb)|  ![img](figures_feffnorm/feff_cnn_originalData_20_bader_uniparity_nn.svg) | ![img](figures_feffnorm/feff_cnn_originalData_20_md_uniparity_nn.svg) |  ![img](figures_feffnorm/feff_cnn_originalData_20_all_perf_nn.svg) |  
| :heavy_check_mark: | Random Forest    | augmented | 300             | 3                | -           | Yes | [done](Train_Run_DL_Models_CNN_moreData.ipynb)| ![img](figures_feffnorm/feff_cnn_moreData_20_bader_uniparity.svg) | ![img](figures_feffnorm/feff_cnn_moreData_20_md_uniparity.svg) |  ![img](figures_feffnorm/feff_cnn_moreData_20_all_perf.svg) |
| :heavy_check_mark: | Neural Networks  | augmented | 300             | 3                | -           | Yes | [done](Train_Run_DL_Models_moreData.ipynb)| | | |
| :heavy_check_mark: | CNN              | augmented | 300             | 3                | 5           | No | [done](Train_Run_DL_Models_CNN_moreData_5.ipynb)| ![img](figures_feffnorm/feff_cnn_moreData_5_bader_uniparity_nn.svg) | ![img](figures_feffnorm/feff_cnn_moreData_5_md_uniparity_nn.svg) | ![img](figures_feffnorm/feff_cnn_moreData_5_all_perf_nn.svg) | 
| :heavy_check_mark: | CNN              | augmented | 300             | 3                | 10          | No | [done](Train_Run_DL_Models_CNN_moreData_10.ipynb)| ![img](figures_feffnorm/feff_cnn_moreData_10_bader_uniparity_nn.svg) | ![img](figures_feffnorm/feff_cnn_moreData_10_md_uniparity_nn.svg) | ![img](figures_feffnorm/feff_cnn_moreData_10_all_perf_nn.svg) | 
| :heavy_check_mark: | CNN              | augmented | 300             | 3                | 20          | No | [done](Train_Run_DL_Models_CNN_moreData_20.ipynb)| ![img](figures_feffnorm/feff_cnn_moreData_20_bader_uniparity_nn.svg) | ![img](figures_feffnorm/feff_cnn_moreData_20_md_uniparity_nn.svg) | ![img](figures_feffnorm/feff_cnn_moreData_20_all_perf_nn.svg) | 
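
The CNN rows differ only in the convolution kernel size (5, 10, or 20 points), i.e. the width of the window each filter slides along the pointwise spectrum. The sketch below illustrates that idea in plain NumPy; it is not the notebooks' network, and the toy 100-point spectrum and the uniform filter weights are assumptions standing in for real spectra and learned weights. Without padding ("valid" mode), an n-point spectrum gives n - k + 1 outputs, so larger kernels see broader spectral features but produce shorter feature maps.

```python
import numpy as np

def conv1d_valid(spectrum: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide `kernel` along `spectrum` without padding (cross-correlation, as in CNN layers)."""
    k = len(kernel)
    n_out = len(spectrum) - k + 1
    return np.array([np.dot(spectrum[i:i + k], kernel) for i in range(n_out)])

# Hypothetical 100-point spectrum; real inputs come from spectral_data/.
energies = np.linspace(0.0, 1.0, 100)
spectrum = np.exp(-((energies - 0.5) ** 2) / 0.01)

for k in (5, 10, 20):
    # Uniform weights stand in for learned filter weights.
    kernel = np.full(k, 1.0 / k)
    out = conv1d_valid(spectrum, kernel)
    print(f"kernel size {k:2d}: {len(spectrum)} points -> {len(out)} features")
```

For this toy input the loop reports 96, 91, and 81 features for kernel sizes 5, 10, and 20.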


## Data
* Training data: [https://data.matr.io/4/](https://data.matr.io/4/)

```bash
wget https://s3.amazonaws.com/publications.matr.io/4/deployment/data/xanes_2019.zip

unzip xanes_2019.zip

git clone https://github.com/fengchenLBL/trixs-dl-models.git

cp -rf matrio_folder/spectral_data matrio_folder/model_data ./trixs-dl-models

cd trixs-dl-models
```
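
After the copy step, the repository root should contain the two folders named in the `cp` command. A small standard-library check such as the one below (a hypothetical helper, not part of the repository) can catch a missed or misplaced copy before a notebook fails on a missing file.

```python
from pathlib import Path

# Folder names taken from the cp command above.
REQUIRED_DIRS = ("spectral_data", "model_data")

def missing_data_dirs(repo_root: str = ".") -> list[str]:
    """Return the required data folders that are absent under repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED_DIRS if not (root / name).is_dir()]

if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing data folders:", ", ".join(missing))
    else:
        print("All data folders present.")
```

Run it from inside `trixs-dl-models` after the `cd`; an empty result means both folders are in place.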

## References
* [https://www.nature.com/articles/s41524-020-00376-6](https://www.nature.com/articles/s41524-020-00376-6)
* [https://github.com/TRI-AMDD/trixs/blob/Torrisi_XANES_RF_2020/notebooks/Train_Run_Models.ipynb](https://github.com/TRI-AMDD/trixs/blob/Torrisi_XANES_RF_2020/notebooks/Train_Run_Models.ipynb)
* [https://data.matr.io/4/](https://data.matr.io/4/)


```python
import pandas as pd
import ipywidgets as widgets
from ipywidgets import interactive
from IPython.display import SVG, display
```

## Results

```python
# Random forest with the original dataset
rf_df_originalData = pd.read_csv('figures_feffnorm/pointwise_table_feff_cnn_originalData_20.csv')
rf_df_originalData_b = 'figures_feffnorm/feff_cnn_originalData_20_bader_uniparity.svg'
rf_df_originalData_md = 'figures_feffnorm/feff_cnn_originalData_20_md_uniparity.svg'

# CNN with the original dataset, kernel size 20
cnn_df_originalData_20 = pd.read_csv('figures_feffnorm/pointwise_table_feff_cnn_originalData_20_nn.csv')
cnn_df_originalData_20_b = 'figures_feffnorm/feff_cnn_originalData_20_bader_uniparity_nn.svg'
cnn_df_originalData_20_md = 'figures_feffnorm/feff_cnn_originalData_20_md_uniparity_nn.svg'

# CNN with the original dataset, kernel size 10
cnn_df_originalData_10 = pd.read_csv('figures_feffnorm/pointwise_table_feff_cnn_originalData_10_nn.csv')
cnn_df_originalData_10_b = 'figures_feffnorm/feff_cnn_originalData_10_bader_uniparity_nn.svg'
cnn_df_originalData_10_md = 'figures_feffnorm/feff_cnn_originalData_10_md_uniparity_nn.svg'

# CNN with the original dataset, kernel size 5
cnn_df_originalData_5 = pd.read_csv('figures_feffnorm/pointwise_table_feff_cnn_originalData_5_nn.csv')
cnn_df_originalData_5_b = 'figures_feffnorm/feff_cnn_originalData_5_bader_uniparity_nn.svg'
cnn_df_originalData_5_md = 'figures_feffnorm/feff_cnn_originalData_5_md_uniparity_nn.svg'

# Random forest with the augmented dataset
rf_df_moreData = pd.read_csv('figures_feffnorm/pointwise_table_feff_cnn_moreData_20.csv')
rf_df_moreData_b = 'figures_feffnorm/feff_cnn_moreData_20_bader_uniparity.svg'
rf_df_moreData_md = 'figures_feffnorm/feff_cnn_moreData_20_md_uniparity.svg'

# CNN with the augmented dataset, kernel size 20
cnn_df_moreData_20 = pd.read_csv('figures_feffnorm/pointwise_table_feff_cnn_moreData_20_nn.csv')
cnn_df_moreData_20_b = 'figures_feffnorm/feff_cnn_moreData_20_bader_uniparity_nn.svg'
cnn_df_moreData_20_md = 'figures_feffnorm/feff_cnn_moreData_20_md_uniparity_nn.svg'

# CNN with the augmented dataset, kernel size 10
cnn_df_moreData_10 = pd.read_csv('figures_feffnorm/pointwise_table_feff_cnn_moreData_10_nn.csv')
cnn_df_moreData_10_b = 'figures_feffnorm/feff_cnn_moreData_10_bader_uniparity_nn.svg'
cnn_df_moreData_10_md = 'figures_feffnorm/feff_cnn_moreData_10_md_uniparity_nn.svg'

# CNN with the augmented dataset, kernel size 5
cnn_df_moreData_5 = pd.read_csv('figures_feffnorm/pointwise_table_feff_cnn_moreData_5_nn.csv')
cnn_df_moreData_5_b = 'figures_feffnorm/feff_cnn_moreData_5_bader_uniparity_nn.svg'
cnn_df_moreData_5_md = 'figures_feffnorm/feff_cnn_moreData_5_md_uniparity_nn.svg'
```
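
The file names loaded above follow a regular pattern: `pointwise_table_feff_cnn_<dataset>_<kernel>` for the CSV tables and `feff_cnn_<dataset>_<kernel>_{bader,md}_uniparity` for the SVG figures, with an `_nn` suffix on CNN outputs and the kernel-20 prefix reused for the random-forest files. The hypothetical helper `result_paths` below rebuilds those paths from that observed pattern so the eight near-identical assignments could be generated in a loop. It is a sketch inferred from the names above, not a convention documented by the notebooks.

```python
from pathlib import PurePosixPath

FIG_DIR = PurePosixPath("figures_feffnorm")

def result_paths(dataset: str, model: str, kernel: int = 20) -> dict[str, str]:
    """Build the table/figure paths for one scenario.

    dataset: "originalData" or "moreData"; model: "rf" or "cnn".
    Random-forest outputs reuse the kernel-20 file prefix without the "_nn" suffix.
    """
    if model == "rf":
        kernel, suffix = 20, ""
    elif model == "cnn":
        suffix = "_nn"
    else:
        raise ValueError(f"unknown model: {model!r}")
    stem = f"feff_cnn_{dataset}_{kernel}"
    return {
        "table": str(FIG_DIR / f"pointwise_table_{stem}{suffix}.csv"),
        "bader": str(FIG_DIR / f"{stem}_bader_uniparity{suffix}.svg"),
        "md": str(FIG_DIR / f"{stem}_md_uniparity{suffix}.svg"),
    }

# Every scenario in the results table, keyed like the dropdown labels in the next cell.
scenarios = {}
for dataset, label in (("originalData", "Original"), ("moreData", "Augmented")):
    scenarios[f"Random Forest: {label}"] = result_paths(dataset, "rf")
    for k in (5, 10, 20):
        scenarios[f"CNN: {label} & Kernel {k}"] = result_paths(dataset, "cnn", k)
```

Loading every table would then be `{name: pd.read_csv(p["table"]) for name, p in scenarios.items()}`.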

```python
# Dropdown lists to show the performance tables
dfs1 = {'Random Forest: Original': rf_df_originalData,
        'CNN: Original & Kernel 5': cnn_df_originalData_5,
        'CNN: Original & Kernel 10': cnn_df_originalData_10,
        'CNN: Original & Kernel 20': cnn_df_originalData_20
       }

dfs2 = {'Random Forest: Augmented': rf_df_moreData,
        'CNN: Augmented & Kernel 5': cnn_df_moreData_5,
        'CNN: Augmented & Kernel 10': cnn_df_moreData_10,
        'CNN: Augmented & Kernel 20': cnn_df_moreData_20
       }

# Option order for each dropdown: its own dataset first, then the other one
items1 = list(dfs1) + list(dfs2)
items2 = list(dfs2) + list(dfs1)

# Merge both dictionaries so either dropdown can look up any table
dfs1.update(dfs2)

def view1(table=''):
    # Fall back to the first option if nothing is selected
    if table == '':
        table = items1[0]
    display(dfs1[table])

def view2(table=''):
    if table == '':
        table = items2[0]
    display(dfs1[table])

w1 = widgets.Dropdown(options=items1)
w2 = widgets.Dropdown(options=items2)

# Parity-plot SVG paths for each model ("b" = Bader, "md" = MD)
fig_dfs = {'RF: Original b': rf_df_originalData_b,
           'RF: Original md': rf_df_originalData_md,
           'CNN: Original & Kernel 5 b': cnn_df_originalData_5_b,
           'CNN: Original & Kernel 5 md': cnn_df_originalData_5_md,
           'CNN: Original & Kernel 10 b': cnn_df_originalData_10_b,
           'CNN: Original & Kernel 10 md': cnn_df_originalData_10_md,
           'CNN: Original & Kernel 20 b': cnn_df_originalData_20_b,
           'CNN: Original & Kernel 20 md': cnn_df_originalData_20_md,
           'RF: Augmented b': rf_df_moreData_b,
           'RF: Augmented md': rf_df_moreData_md,
           'CNN: Augmented & Kernel 5 b': cnn_df_moreData_5_b,
           'CNN: Augmented & Kernel 5 md': cnn_df_moreData_5_md,
           'CNN: Augmented & Kernel 10 b': cnn_df_moreData_10_b,
           'CNN: Augmented & Kernel 10 md': cnn_df_moreData_10_md,
           'CNN: Augmented & Kernel 20 b': cnn_df_moreData_20_b,
           'CNN: Augmented & Kernel 20 md': cnn_df_moreData_20_md
          }
```
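
`fig_dfs` is built but not used in the cells shown here, and `SVG` is imported but never called. One way to put them to work (a sketch, not the notebook's code) is to pair each model's Bader and MD parity plots and render them with `IPython.display.SVG`, which accepts a file path through its `filename` argument. The helper names `pair_figures` and `show_parity` are hypothetical, and `pair_figures` assumes every key ends in ` b` or ` md` as in `fig_dfs`.

```python
def pair_figures(fig_paths: dict[str, str]) -> dict[str, dict[str, str]]:
    """Group '<model> b' / '<model> md' keys into {model: {'bader': path, 'md': path}}."""
    pairs: dict[str, dict[str, str]] = {}
    for key, path in fig_paths.items():
        # Split off the trailing " b" / " md" suffix from the model label
        model, _, kind = key.rpartition(" ")
        name = {"b": "bader", "md": "md"}[kind]
        pairs.setdefault(model, {})[name] = path
    return pairs

def show_parity(pairs: dict[str, dict[str, str]], model: str) -> None:
    """Render both parity plots for one model inside a notebook."""
    # Imported here so pair_figures also works outside IPython
    from IPython.display import SVG, display
    for kind in ("bader", "md"):
        display(SVG(filename=pairs[model][kind]))
```

In the notebook, `figure_pairs = pair_figures(fig_dfs)` followed by `interactive(lambda model: show_parity(figure_pairs, model), model=widgets.Dropdown(options=list(figure_pairs)))` would add a figure dropdown next to the table dropdowns.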

## Original Dataset

```python
interactive(view1, table=w1)
```
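
The dropdown shows one table at a time. To compare several scenarios in a single frame, the tables can be stacked with `pandas.concat`, using the dictionary keys as an extra index level. The sketch below assumes only that each entry is a `DataFrame`; the helper name `stack_tables` and the `metric` column are placeholders, since the layout of the `pointwise_table_*.csv` files is not described here.

```python
import pandas as pd

def stack_tables(tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Concatenate result tables, labelling each block of rows with its scenario name."""
    return pd.concat(tables, names=["scenario", "row"])

# Toy stand-ins for two result tables (column name is a placeholder).
demo = {
    "Random Forest: Original": pd.DataFrame({"metric": [0.9, 0.8]}),
    "CNN: Original & Kernel 5": pd.DataFrame({"metric": [0.85, 0.82]}),
}
combined = stack_tables(demo)
print(combined)
```

Because `dfs1` holds both datasets after the `update` call, `stack_tables(dfs1)` would combine all eight tables; columns missing from some tables are filled with `NaN`.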


## Augmented Dataset

```python
interactive(view2, table=w2)
```
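
The augmented rows come from the "average pooling" augmentation listed under Models, and the notebooks are the authority on how it is done. Purely as an illustration, the sketch below assumes one plausible reading: each spectrum is smoothed with a stride-1 sliding average (edge-padded to keep its length), and each smoothed copy is added as an extra sample with the original target. The function names and the window sizes `(3, 5)` are hypothetical.

```python
import numpy as np

def average_pool_1d(spectrum: np.ndarray, window: int) -> np.ndarray:
    """Stride-1 average pooling with edge padding, so the output keeps the input length."""
    left = (window - 1) // 2
    right = window - 1 - left
    padded = np.pad(spectrum, (left, right), mode="edge")
    return np.convolve(padded, np.full(window, 1.0 / window), mode="valid")

def augment_by_average_pooling(X: np.ndarray, y: np.ndarray, windows=(3, 5)):
    """Return the original samples plus one smoothed copy per window size.

    X has shape (n_samples, n_points); every copy keeps its original target from y.
    """
    X_parts = [X] + [np.apply_along_axis(average_pool_1d, 1, X, w) for w in windows]
    y_parts = [y] * (1 + len(windows))
    return np.vstack(X_parts), np.concatenate(y_parts)

# Hypothetical data: 4 spectra of 100 points with scalar targets.
rng = np.random.default_rng(0)
X = rng.random((4, 100))
y = rng.random(4)
X_aug, y_aug = augment_by_average_pooling(X, y)
print(X.shape, "->", X_aug.shape)
```

With two window sizes the sample count triples, so this toy run prints `(4, 100) -> (12, 100)`. The augmented samples should only be added to the training split, so that smoothed copies of a test spectrum never leak into training.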
