#### This notebook presents a tutorial on how to select configurations previously generated with classical molecular dynamics (CLMD) and use them as initial configurations for single-point ab initio calculations with CP2K.

Important files in this directory include:
- <font color='yellow'>"BASIS_MOLOPT", "dftd3.dat", "GTH_POTENTIALS", "HFX_BASIS"</font> --> Basis sets and pseudopotentials required by CP2K to describe interactions
- <font color='yellow'>"traj_0.inp"</font> --> Reference CP2K input file for single point calculations
- <font color='yellow'>"prep_spc_from_npz.py"</font> --> Script to generate the folders and files required to run the single point calculations with CP2K
- <font color='yellow'>"spc_results_to_npz.py"</font> --> Script to parse the output of the calculations and generate the SPC training dataset.
 


In [3]:
# imports 
import os 

## Define number of samples to be selected
num_samples = 500

In [6]:
# Define path to the Boltzmann Distribution training data set. 

Butane_BoltzDist_CLC_path = './../prep_train_CLC/CLC_dataset_preparation/BoltzmannDist_500Frames_CLC.npz'
name = 'But_BoltzDist_SPC'
# ## Run the prep_spc_from_npz.py script to generate folders containing the data required to run the SPCs with CP2K
# os.system(f'python prep_spc_from_npz.py --input {Butane_BoltzDist_CLC_path} --samples {int(num_samples)} --name {name}')

# Define path to Uniform Distribution training data set
Butane_UnivDist_CLC_path = './../prep_train_CLC/CLC_dataset_preparation/./UniformDist_500Frames_CLC.npz'
name = 'But_UnivDist_SPC'
# ## Run the prep_spc_from_npz.py script to generate folders containing the data required to run the SPCs with CP2K
# os.system(f'python prep_spc_from_npz.py --input {Butane_UnivDist_CLC_path} --samples {int(num_samples)} --name {name}')


# Define path to Half Left Uniform Distribution training data set
Butane_HL_UnivDist_CLC_path = './../prep_train_CLC/CLC_dataset_preparation/./HalfLeft_UniformDist_500Frames_CLC.npz'
name = 'But_HL_UnivDist_SPC'
# ## Run the prep_spc_from_npz.py script to generate folders containing the data required to run the SPCs with CP2K
# os.system(f'python prep_spc_from_npz.py --input {Butane_HL_UnivDist_CLC_path} --samples {int(num_samples)} --name {name}')


# Define path to Half Right Uniform Distribution training data set
Butane_HR_UnivDist_CLC_path = './../prep_train_CLC/CLC_dataset_preparation/./HalfRight_UniformDist_500Frames_CLC.npz'
name = 'But_HR_UnivDist_SPC'
# ## Run the prep_spc_from_npz.py script to generate folders containing the data required to run the SPCs with CP2K
# os.system(f'python prep_spc_from_npz.py --input {Butane_HR_UnivDist_CLC_path} --samples {int(num_samples)} --name {name}')


# Define path to Bias Left Distribution training data set
Butane_Left_BiasDist_CLC_path = './../prep_train_CLC/CLC_dataset_preparation/./BiasLeftDist_500Frames_CLC.npz'
name = 'But_Left_BiasDist_SPC'
# ## Run the prep_spc_from_npz.py script to generate folders containing the data required to run the SPCs with CP2K
# os.system(f'python prep_spc_from_npz.py --input {Butane_Left_BiasDist_CLC_path} --samples {int(num_samples)} --name {name}')

# Define path to Bias Right Distribution training data set
Butane_Right_BiasDist_CLC_path = './../prep_train_CLC/CLC_dataset_preparation/./BiasRightDist_500Frames_CLC.npz'
name = 'But_Right_BiasDist_SPC'
# ## Run the prep_spc_from_npz.py script to generate folders containing the data required to run the SPCs with CP2K
# os.system(f'python prep_spc_from_npz.py --input {Butane_Right_BiasDist_CLC_path} --samples {int(num_samples)} --name {name}')

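# Define path to Bias Left Half Distribution training data set
# NOTE: this file name is identical to the Bias Right Half path defined below; confirm it points to the Left Half data set.
Butane_LeftHalf_BiasDist_CLC_path = './../prep_train_CLC/CLC_dataset_preparation/./BiasRight_HalftDist_500Frames_CLC.npz'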
name = 'But_LeftHalf_BiasDist_SPC'
# ## Run the prep_spc_from_npz.py script to generate folders containing the data required to run the SPCs with CP2K
# os.system(f'python prep_spc_from_npz.py --input {Butane_LeftHalf_BiasDist_CLC_path} --samples {int(num_samples)} --name {name}')

# Define path to Bias Right Half Distribution training data set
Butane_RightHalf_BiasDist_CLC_path = './../prep_train_CLC/CLC_dataset_preparation/./BiasRight_HalftDist_500Frames_CLC.npz'
name = 'But_RightHalf_BiasDist_SPC'
# ## Run the prep_spc_from_npz.py script to generate folders containing the data required to run the SPCs with CP2K
# os.system(f'python prep_spc_from_npz.py --input {Butane_RightHalf_BiasDist_CLC_path} --samples {int(num_samples)} --name {name}')






Create folders for single point calculations. 


Reading Files...

The files in this npz file are:

['R', 'F', 'E', 'z', 'diheds']



Shape of arrays:

Coordinates:  (500, 14, 3)
Energies:  (500,)
Forces:  (500, 14, 3)
Types:  (14,)

Array of types:

[6 1 1 1 6 1 1 6 1 1 6 1 1 1]



Original array of types:

['C', 'H', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'H']



Modified array of types:

['C', 'H', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'C', 'H', 'H', 'H']


Creating input 1 of 500

Found file: But_BoltzDist_SPC_1.xyz
Creating new folder: /media/nando/New_Volume/Phase_2/NNFFs_V3/Deliverables/Butane/prep_train_SPC/But_BoltzDist_SPC_1
Moving file: /media/nando/New_Volume/Phase_2/NNFFs_V3/Deliverables/Butane/prep_train_SPC/But_BoltzDist_SPC_1.xyz to /media/nando/New_Volume/Phase_2/NNFFs_V3/Deliverables/Butane/prep_train_SPC/But_BoltzDist_SPC_1/But_BoltzDist_SPC_1.xyz
Files moved into separate folders.
Symbolic link created for 'BASIS_MOLOPT' in 'But_BoltzDist_SPC_1'.

0
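The log above suggests that, for each frame, the script writes an XYZ file, moves it into its own folder, and links the shared CP2K data files into that folder. A minimal sketch of that layout follows. It is not the code of `prep_spc_from_npz.py`: the helper name `prepare_frame_folder`, the random placeholder coordinates, the empty stand-in for `BASIS_MOLOPT`, and the temporary working directory are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np

# Atomic number -> element symbol (butane only needs H and C).
SYMBOLS = {1: "H", 6: "C"}


def prepare_frame_folder(workdir, name, index, coords, z, shared_files):
    """Hypothetical helper: create <workdir>/<name>_<index>/ with an XYZ file
    and symbolic links to the shared CP2K data files."""
    folder = os.path.join(workdir, f"{name}_{index}")
    os.makedirs(folder, exist_ok=True)

    # Standard XYZ layout: atom count, comment line, then one line per atom.
    xyz_path = os.path.join(folder, f"{name}_{index}.xyz")
    with open(xyz_path, "w") as fh:
        fh.write(f"{len(z)}\n")
        fh.write(f"{name} frame {index}\n")
        for number, (x, y, zc) in zip(z, coords):
            fh.write(f"{SYMBOLS[int(number)]} {x:.8f} {y:.8f} {zc:.8f}\n")

    # Link (rather than copy) the shared files into the folder.
    for src in shared_files:
        link = os.path.join(folder, os.path.basename(src))
        if not os.path.lexists(link):
            os.symlink(os.path.abspath(src), link)
    return folder


# Demo in a temporary directory with synthetic data shaped like the npz arrays.
workdir = tempfile.mkdtemp()
z = np.array([6, 1, 1, 1, 6, 1, 1, 6, 1, 1, 6, 1, 1, 1])
R = np.random.default_rng(0).normal(size=(2, 14, 3))  # placeholder coordinates

basis = os.path.join(workdir, "BASIS_MOLOPT")
open(basis, "w").close()  # empty stand-in for the real basis-set file

folders = [
    prepare_frame_folder(workdir, "But_BoltzDist_SPC", i + 1, R[i], z, [basis])
    for i in range(len(R))
]
print(folders)
```

Linking the basis-set and potential files keeps each calculation folder self-contained from CP2K's point of view without duplicating the files 500 times.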

After the input files have been created and the calculations have been run, we can use spc_results_to_npz.py to extract the potential energy and forces from the CP2K outputs and generate the npz training dataset.
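As a rough illustration of the extraction step, the sketch below pulls the total energy and the atomic forces out of a text excerpt laid out the way CP2K commonly prints them. This is an assumption about the output layout, not the parser used in `spc_results_to_npz.py`: the exact wording of these lines depends on the CP2K version and print settings, so check the patterns against your own `output_traj.out` files. The function name `parse_cp2k_output` and the sample text are hypothetical; the numbers in the sample are taken from the first frame of the log further down.

```python
import re

import numpy as np

# Illustrative excerpt (assumed layout, values from frame 1 of the log).
SAMPLE_OUTPUT = """
 ENERGY| Total FORCE_EVAL ( QS ) energy [a.u.]:              -28.641041831782570

 ATOMIC FORCES in [a.u.]

 # Atom   Kind   Element          X              Y              Z
      1      1      C           0.04804745     0.03036109    -0.03152608
      2      2      H          -0.01948292     0.01495049     0.00662638
"""

# Total energy line: capture the number that follows the final colon.
ENERGY_RE = re.compile(r"ENERGY\|.*?energy.*?:\s*(-?\d+\.\d+(?:[eE][-+]?\d+)?)")

# Force rows: atom index, kind index, element symbol, then three components.
FORCE_RE = re.compile(
    r"^[ \t]*\d+[ \t]+\d+[ \t]+[A-Za-z]+"
    r"[ \t]+(-?\d+\.\d+)[ \t]+(-?\d+\.\d+)[ \t]+(-?\d+\.\d+)[ \t]*$",
    re.MULTILINE,
)


def parse_cp2k_output(text):
    """Return (energy in Hartree, forces array in Hartree/Bohr) from output text."""
    energies = ENERGY_RE.findall(text)
    if not energies:
        raise ValueError("no 'ENERGY|' line found in output")
    energy = float(energies[-1])  # keep the last reported energy
    forces = np.array(FORCE_RE.findall(text), dtype=float)
    return energy, forces


energy, forces = parse_cp2k_output(SAMPLE_OUTPUT)
print(energy)
print(forces.shape)
```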



In [10]:
# Define name of the SPC npz data set for the Boltzmann Distribution 
BD_output_name = 'But_BoltzDist_2500_SPC'
name = 'But_BoltzDist_SPC'  # folder prefix used when the SPC inputs were prepared
os.system(f'python spc_results_to_npz.py --input {Butane_BoltzDist_CLC_path} --samples {int(num_samples)} --name {name} --output {BD_output_name}')

# Define name of the SPC npz data set for the Uniform Distribution 
UD_output_name = 'But_UnivDist_2500_SPC'
name = 'But_UnivDist_SPC'
os.system(f'python spc_results_to_npz.py --input {Butane_UnivDist_CLC_path} --samples {int(num_samples)} --name {name} --output {UD_output_name}')

# Define name of the SPC npz data set for the Half Left Uniform Distribution 
HLUD_output_name = 'But_HalfLeft_UnivDist_2500_SPC'
name = 'But_HL_UnivDist_SPC'
os.system(f'python spc_results_to_npz.py --input {Butane_HL_UnivDist_CLC_path} --samples {int(num_samples)} --name {name} --output {HLUD_output_name}')

# Define name of the SPC npz data set for the Half Right Uniform Distribution 
HRUD_output_name = 'But_HalfRight_UnivDist_2500_SPC'
name = 'But_HR_UnivDist_SPC'
os.system(f'python spc_results_to_npz.py --input {Butane_HR_UnivDist_CLC_path} --samples {int(num_samples)} --name {name} --output {HRUD_output_name}')

# Define name of the SPC npz data set for the Bias Left Distribution 
BiasLeftD_output_name = 'But_BiasLeftt_Dist_2500_SPC'
name = 'But_Left_BiasDist_SPC'
os.system(f'python spc_results_to_npz.py --input {Butane_Left_BiasDist_CLC_path} --samples {int(num_samples)} --name {name} --output {BiasLeftD_output_name}')

# Define name of the SPC npz data set for the Bias Right Distribution 
BiasRightD_output_name = 'But_BiasRight_Dist_2500_SPC'
name = 'But_Right_BiasDist_SPC'
os.system(f'python spc_results_to_npz.py --input {Butane_Right_BiasDist_CLC_path} --samples {int(num_samples)} --name {name} --output {BiasRightD_output_name}')

# Define name of the SPC npz data set for the Bias Half Left Distribution 
BiasHalfLeftD_output_name = 'But_BiasHalfLeftt_Dist_2500_SPC'
name = 'But_LeftHalf_BiasDist_SPC'
os.system(f'python spc_results_to_npz.py --input {Butane_LeftHalf_BiasDist_CLC_path} --samples {int(num_samples)} --name {name} --output {BiasHalfLeftD_output_name}')

# Define name of the SPC npz data set for the Bias Half Right Distribution 
BiasHalfRightD_output_name = 'But_BiasHalfRight_Dist_2500_SPC'
name = 'But_RightHalf_BiasDist_SPC'
os.system(f'python spc_results_to_npz.py --input {Butane_RightHalf_BiasDist_CLC_path} --samples {int(num_samples)} --name {name} --output {BiasHalfRightD_output_name}')

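# Define name of the SPC npz data set for the Case A (OnlyMin) Distribution 
# NOTE: this notebook defines no CLC input path or folder prefix (`name`) for Case A;
# the call below passes the output name as --input and reuses the last value of `name`.
CaseA_Omin_output_name = 'But_CaseA_Omin_Dist_2500_SPC'
os.system(f'python spc_results_to_npz.py --input {CaseA_Omin_output_name} --samples {int(num_samples)} --name {name} --output {CaseA_Omin_output_name}')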


Files in the npz file: 

['R', 'F', 'E', 'z', 'diheds']

Frame:  1 

Coordinates:
Shape of coordinates: (14, 3)


Extracting Energy from file: 
File path:  ./But_BoltzDist_SPC_1/output_traj.out
Energy (a.u):  -28.64104183178257
Energy (kcal/mol):  -17972.524943269855 


Forces (a.u): 
[[0.04804745, 0.03036109, -0.03152608], [-0.01948292, 0.01495049, 0.00662638], [-0.00363016, -0.02145592, 0.01373386], [-0.01042217, -0.01638502, -0.02482866], [0.01771742, 0.0327294, 0.02961067]]
Forces (kcal/mol/Angstrom): 
[[ 56.97567727  36.00281941 -37.38428907]
 [-23.10325651  17.72860564   7.85770084]
 [ -4.30472012 -25.44288143  16.28590019]
 [-12.3588285  -19.42970151 -29.4423475 ]
 [ 21.00968946  38.81121124  35.11295558]]

Frame:  2 

Coordinates:
Shape of coordinates: (14, 3)


Extracting Energy from file: 
File path:  ./But_BoltzDist_SPC_2/output_traj.out
Energy (a.u):  -28.631770997910518
Energy (kcal/mol):  -17966.707407232272 


Forces (a.u): 
[[0.02900522, -0.01615243, -0.02180115], [0.01

0

Training data sets have been created successfully.
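The log reports each energy in both atomic units and kcal/mol, and each force in both atomic units and kcal/mol/Angstrom. The sketch below reproduces that conversion for the first frame. The conversion factors are rounded textbook values chosen here as an assumption; the constants used inside `spc_results_to_npz.py` are not shown, so the last few digits can differ from the log. The function names are hypothetical.

```python
import numpy as np

# Approximate conversion factors (assumed, not taken from the script).
HARTREE_TO_KCAL_MOL = 627.5095  # 1 Hartree in kcal/mol
BOHR_TO_ANGSTROM = 0.529177     # 1 Bohr in Angstrom


def energy_au_to_kcal_mol(energy_au):
    """Convert an energy from Hartree to kcal/mol."""
    return energy_au * HARTREE_TO_KCAL_MOL


def forces_au_to_kcal_mol_angstrom(forces_au):
    """Convert forces from Hartree/Bohr to kcal/mol/Angstrom."""
    return np.asarray(forces_au, dtype=float) * HARTREE_TO_KCAL_MOL / BOHR_TO_ANGSTROM


# Values for frame 1, copied from the log above (first two atoms only).
energy_au = -28.64104183178257
forces_au = [
    [0.04804745, 0.03036109, -0.03152608],
    [-0.01948292, 0.01495049, 0.00662638],
]

e_kcal = energy_au_to_kcal_mol(energy_au)
f_kcal = forces_au_to_kcal_mol_angstrom(forces_au)
print(round(e_kcal, 2))
print(np.round(f_kcal, 2))
```

Comparing the printed values with the `Energy (kcal/mol)` and `Forces (kcal/mol/Angstrom)` entries for frame 1 is a quick sanity check that a parsed data set uses the units expected by the training code.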