### GAE Dataset Generation


This notebook generates the dataset stored in the `raw` folder for GAE training. It runs FitSNAP to obtain the bispectrum feature matrix together with the force truths and predictions, then selects the per-atom features, labels, and predictions for the forces along the x-direction or along all three directions (x, y, and z). The iron structures saved in the `JSON` folder are from https://github.com/FitSNAP/FitSNAP/tree/master/examples/Fe_Linear_NPJ2021

#### Step 1: Run the FitSNAP Program

Run FitSNAP to train the SNAP potential on the Fe datasets.


In [1]:
# Run FitSNAP to train the SNAP potential on the Fe datasets
!mpirun --allow-run-as-root -np 2 python -m fitsnap3 Fe-example.in --overwrite


    ______ _  __  _____  _   __ ___     ____  
   / ____/(_)/ /_/ ___/ / | / //   |   / __ \ 
  / /_   / // __/\__ \ /  |/ // /| |  / /_/ /
 / __/  / // /_ ___/ // /|  // ___ | / ____/ 
/_/    /_/ \__//____//_/ |_//_/  |_|/_/      

-----23Sep22------
Reading input...
Hash: 1d4fc8427cf86fca4c211a9e119b48bb
Finished reading input
------------------
mpi4py version:  3.1.4
numpy version:  1.24.0
scipy version:  1.10.1
pandas version:  1.5.3
LAMMPS (28 Mar 2023 - Development)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
-----------
Total wall time: 0:00:00
Using LAMMPSSNAP as FitSNAP calculator
MD-bcc_Te300K : Detected  40  fitting on  40  testing on  0
EOS_100 : Detected  403  fitting on  403  testing on  0
BCC-HCP-transition : Detected  168  fitting on  168  testing on  0
liquid_mag_old : Detected  10  fitting on  10  testing on  0
point_def : Detected  8  fitting on  8  testing on  0
CCMC-bcc-hcp-elastic_Jun20

#### Step 2: Extract and save the per-atom features and forces along the x-direction

- Load the FitSNAP dataframe using the `DataframeTools` class and preprocess it by removing the 'Energy' rows.


In [None]:
import pandas as pd
import pickle
from dataframe_tools import DataframeTools

# Load the FitSNAP dataframe and drop the energy rows
dataframe_tool = DataframeTools("FitSNAP.df")
df = dataframe_tool.read_dataframe()
df = df[df['Row_Type'] != 'Energy']
df = df.reset_index(drop=True)

- Extract the per-atom information (features, preds, Groups etc.) for the forces along the x-direction.




In [None]:
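# With the energy rows removed, every third row (index % 3 == 0) is an x-component
df_fx = df[df.index % 3 == 0]
df_fx = df_fx.reset_index(drop=True)
# Drop the first column, then mark the prediction and label columns as x-components
df_fx = df_fx.iloc[:, 1:]
df_fx = df_fx.rename(columns={'preds': 'preds_fx', 'truths': 'truths_fx'})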
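
The row selection above relies on the force rows being stored in x, y, z order for each atom once the energy rows are gone. The following is a minimal sketch of that selection on a made-up table; the values and the `toy` names are hypothetical, and only the column names `Row_Type`, `truths`, and `preds` come from the cells above.

```python
import pandas as pd

# Toy stand-in for the FitSNAP dataframe: one energy row followed by
# three force rows (x, y, z) for each of two atoms. Values are invented.
toy = pd.DataFrame({
    "Row_Type": ["Energy"] + ["Force"] * 6,
    "truths": [-8.0, 0.1, 0.2, 0.3, -0.1, -0.2, -0.3],
    "preds": [-7.9, 0.11, 0.19, 0.31, -0.09, -0.21, -0.29],
})

# Drop the energy row and renumber so the first force row has index 0.
forces = toy[toy["Row_Type"] != "Energy"].reset_index(drop=True)

# Rows 0, 3, 6, ... are the x-components under the assumed ordering.
toy_fx = forces[forces.index % 3 == 0].reset_index(drop=True)
toy_fx = toy_fx.rename(columns={"preds": "preds_fx", "truths": "truths_fx"})

print(toy_fx)
```

If the energy rows were not removed first, or the index were not reset, the modulo test would pick up the wrong rows, which is why the preprocessing step comes before the selection.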


- Group the dataframe by 'Configs' and save the grouped dataframes as a pickle file in the <code>raw</code> folder.
<br><br>
<strong>Note:</strong> The dataset name is constructed using the format <code>'elem' + 'feature_type'</code>, where <code>'elem'</code> represents the element and <code>'feature_type'</code> represents the feature type specified in the user-defined configuration file (see <code>configs</code> folder).
<br><br>
For example:
<ul>
  <li><code>raw_Fe_snap_fx</code> indicates the dataset for the element iron (Fe) with SNAP features on forces in the x-direction.</li>
  <li><code>raw_Fe_snap_all</code> indicates the dataset for the element iron (Fe) with SNAP features on forces in all directions.</li>
</ul>


In [None]:
df_fx['Configs'] = df_fx['Groups'] +'_'+ df_fx['Configs']
grouped_dfs = dict(tuple(df_fx.groupby('Configs')))
with open('../datasets/raw/raw_Fe_snap_fx/Fe_dataset.pkl', 'wb') as file:
    pickle.dump(grouped_dfs, file)
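
To check what ends up in the pickle file, it can help to run the same group-and-save pattern on a small table and read it back. This is only a sketch: the table contents and the temporary file name are hypothetical, and it writes to a temporary directory rather than to the `raw` folder.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical per-atom table standing in for df_fx.
toy_fx = pd.DataFrame({
    "Groups": ["EOS_100", "EOS_100", "point_def"],
    "Configs": ["a.json", "a.json", "b.json"],
    "truths_fx": [0.1, -0.1, 0.05],
})

# Prefix each configuration name with its group, as in the cell above.
toy_fx["Configs"] = toy_fx["Groups"] + "_" + toy_fx["Configs"]

# One dataframe per configuration, keyed by the combined name.
grouped = {name: frame for name, frame in toy_fx.groupby("Configs")}

# Round-trip through a pickle file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_dataset.pkl")
    with open(path, "wb") as f:
        pickle.dump(grouped, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

for name, frame in loaded.items():
    print(name, len(frame))
```

The loaded object is a plain dictionary that maps each `Groups_Configs` key to the dataframe of atoms in that configuration.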

#### Step 3 (optional): Extract and save the per-atom features and forces along all three directions


- Load the FitSNAP dataframe using the <code>DataframeTools</code> class and preprocess it by removing the 'Energy' rows.

- Combine the x, y, and z components into one row per atom, then group by 'Configs' and save the dataset for forces along all directions.


In [1]:
!pwd

/usr/WS1/sun36/MEAGraph/run/applications/DataPruning/Fe


In [10]:
import pandas as pd
import pickle
from dataframe_tools import DataframeTools

# Load the FitSNAP dataframe and drop the energy rows
dataframe_tool = DataframeTools("FitSNAP.df")
df = dataframe_tool.read_dataframe()
df = df[df['Row_Type'] != 'Energy']
df = df.reset_index(drop=True)

# Split the remaining rows into x, y, and z components (every third row each)
df_fx = df[df.index % 3 == 0]
df_fx = df_fx.reset_index(drop=True)

df_fy = df[df.index % 3 == 1]
df_fy = df_fy.reset_index(drop=True)

df_fz = df[df.index % 3 == 2]
df_fz = df_fz.reset_index(drop=True)

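num_descriptors = 31
# Place the x, y, and z feature blocks side by side (column 0 is dropped, as in Step 2).
# The z block keeps every remaining column, including preds, truths, Groups, and Configs.
df_force = pd.concat(
    [df_fx.iloc[:, 1:num_descriptors], df_fy.iloc[:, 1:num_descriptors], df_fz.iloc[:, 1:]],
    axis=1,
)
df_force['preds_fx'] = df_fx['preds'].values
df_force['preds_fy'] = df_fy['preds'].values
df_force = df_force.rename(columns={'preds': 'preds_fz'})
df_force['truths_fx'] = df_fx['truths'].values
df_force['truths_fy'] = df_fy['truths'].values
df_force = df_force.rename(columns={'truths': 'truths_fz'})

# Build the group key from df_fz, whose rows line up one-to-one with df_force.
# (df has three rows per atom, so assigning from it would misalign the labels.)
df_force['Configs'] = df_fz['Groups'] + '_' + df_fz['Configs']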
grouped_dfs = dict(tuple(df_force.groupby('Configs')))

with open('../../../datasets/raw/raw_Fe_snap_all/Fe_dataset.pkl', 'wb') as file:
    pickle.dump(grouped_dfs, file)
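
Because the three component tables are combined by row position, it is worth confirming that row i of each table refers to the same atom. The following is a minimal sketch of such a check on invented data; the `component` helper and the toy values are hypothetical, and only the column names `Groups`, `Configs`, and `truths` are taken from the cells above.

```python
import pandas as pd

# Toy force rows: two atoms, three components each, in x, y, z order.
toy = pd.DataFrame({
    "Groups": ["EOS_100"] * 3 + ["point_def"] * 3,
    "Configs": ["a.json"] * 3 + ["b.json"] * 3,
    "truths": [0.1, 0.2, 0.3, -0.1, -0.2, -0.3],
})


def component(frame, k):
    """Return every third row starting at offset k (0 = x, 1 = y, 2 = z)."""
    return frame[frame.index % 3 == k].reset_index(drop=True)


fx, fy, fz = (component(toy, k) for k in range(3))

# The metadata columns must agree row by row across the three tables.
for col in ("Groups", "Configs"):
    assert fx[col].equals(fy[col]) and fx[col].equals(fz[col])

# One row per atom, with the group key built from a per-atom table.
per_atom = pd.DataFrame({
    "Configs": fx["Groups"] + "_" + fx["Configs"],
    "truths_fx": fx["truths"],
    "truths_fy": fy["truths"],
    "truths_fz": fz["truths"],
})

print(per_atom)
```

The key point is that any column added to the per-atom table should come from a table with one row per atom; a table with three rows per atom has a different index and would be aligned against the wrong rows.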