This notebook contains the code and instructions used to select configurations generated from an ab initio molecular dynamics (AIMD) simulation and to run single-point classical MD (CLMD) calculations with those configurations. The AIMD simulation was run for 20 ps with a 0.5 fs timestep using CP2K.

In [1]:
# import modules 
import numpy as np 
import matplotlib.pyplot as plt
import lammps_logfile as lmplog
import os
import natsort
import glob
from lmp_spc_scripts import read_xyz_traj,xyz_cords_array,parse_lammpstrj


In [2]:
## Define paths to the xyz, force and colvar files. The potential energy is extracted from the force file
LOUD = True
coord_path = glob.glob('./AIMD_300K_20ps/*-pos-1.xyz')[0]
force_file = file_path = glob.glob('./AIMD_300K_20ps/*-frc-1.xyz')[0]
colvar_file = glob.glob('./AIMD_300K_20ps/dihedrals_aimd_300K_20ps.txt')[0]
output_name = './AIMD_300K_20ps/AIMD_Butane_Conv20ps_500Frames'

The file 'aimd_to_npz.py' contains a command-line interface (CLI) script that reads the trajectory and force files generated by the AIMD simulation in CP2K. It then generates an npz training dataset containing n samples with their respective potential energies, atomic forces and coordinates. This npz dataset can be used directly to train an Allegro model using the SPC data, or to generate single-point CLMD simulations for a dataset that goes from AIMD to CLMD.
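
The frame-selection step can be illustrated with a small sketch. This is not the code inside 'aimd_to_npz.py'; the function name `sample_frames`, the fixed seed and the choice of sampling without replacement are assumptions made for illustration, and the trajectory below is synthetic. Only the key names E, R and F follow the npz files used in this notebook.

```python
import numpy as np

def sample_frames(energies, coords, forces, n_samples, seed=0):
    """Pick n_samples distinct frames at random and return them as a dict."""
    n_frames = len(energies)
    if n_samples > n_frames:
        raise ValueError("n_samples cannot exceed the number of frames")
    rng = np.random.default_rng(seed)
    # Sample without replacement so no frame is selected twice (assumption)
    idx = rng.choice(n_frames, size=n_samples, replace=False)
    return {"idx": idx, "E": energies[idx], "R": coords[idx], "F": forces[idx]}

# Toy trajectory: 100 frames of a 14-atom molecule (synthetic numbers)
n_frames, n_atoms = 100, 14
rng = np.random.default_rng(1)
energies = rng.normal(size=n_frames)
coords = rng.normal(size=(n_frames, n_atoms, 3))
forces = rng.normal(size=(n_frames, n_atoms, 3))

subset = sample_frames(energies, coords, forces, n_samples=10)
print(subset["R"].shape)  # (10, 14, 3)
```

Keeping the selected indices next to the arrays makes it possible to trace every sample back to its AIMD frame later on.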

In [3]:
## Run the python script 
os.system(f'python ./AIMD_300K_20ps/aimd_to_npz.py --input {coord_path} --force {force_file} --colvar {colvar_file} --output {output_name} --sample {500} --plot') 



Convert AIMD of CP2K trajectory to .npz file


Reading Files...

Coordinate file read succesfully: ./AIMD_300K_20ps/Butane_aimd_test-pos-1.xyz
Force file read succesfully: ./AIMD_300K_20ps/Butane_aimd_test-frc-1.xyz
Colvar file read succesfully: ./AIMD_300K_20ps/dihedrals_aimd_300K_20ps.txt


Trajectory Details:

Number of frames: 40001
Number of atoms: 14


Creating dictionary of all coordinates and forces...


Possible values per Keys: dict_keys(['time', 'E', 'coordinates', 'forces', 'colvar'])
Creating Array of 500 samples
Total number of frames: 40001

[    0     1     2 ... 39998 39999 40000]

Randomly selected 500 frames from 40001 total frames

[ 4891 19705  4123 12206  3003  3030 28855   435  5890 22975  7061 16089
 31821 37332 16355 17951 31999 22553  2238 26455  8512 32009 37219 17755
 22940 37191 12858 18226 17790 30350 27166 31463 24191 23824 25124 19502
 30959  5832 38278  5189 36006 33999 14748 18988  5052 34059 39310 36663
 30128 15744  4601 31551  5295 28666 37909 200

0

The goal now is to have a function that reads the coordinates in the reference Butane data file and replaces them with the coordinates in the npz file. Additionally, the code creates a new folder for each frame, since each frame represents an individual simulation. The code also changes the name of the data file in the read_data line of the LAMMPS input file.

In [4]:
# Define function that will read the atomic coordinates from the reference data file and will assign the coordinates from the npz dataset 
def update_lmp_data_file(file_path, atomic_coordinates, new_file_path):
    """
    Update the atomic coordinates in a LAMMPS data file with new coordinates.
    
    Parameters
    ----------
    file_path : str
        Path to the LAMMPS data file.
    atomic_coordinates : np.ndarray
        Array of new atomic coordinates.
    new_file_path : str
        Path to the new LAMMPS data file.
    """
    with open(file_path, 'r') as file:
        lines = file.readlines()

    # Find the starting and ending index of the 'Atoms' section
    start_index = lines.index(' Atoms\n') + 2
    end_index = start_index + len(atomic_coordinates)

    # Replace the coordinates in the file with those from the array
    for i, coords in enumerate(atomic_coordinates):
        line_parts = lines[start_index + i].split()[:4]  # Retain first four columns
        new_line = ' '.join(line_parts + [f"{coord:.6f}" for coord in coords]) + '\n'
        lines[start_index + i] = new_line

    # Write the updated content to the new file
    with open(new_file_path, 'w') as file:
        file.writelines(lines)

    print("Coordinates updated successfully!")


## Define function to change the data file named in the read_data command of the LAMMPS input file
def change_read_data(filename, new_data_file):
    try:
        with open(filename, 'r') as file:
            lines = file.readlines()

        found = False
        for i, line in enumerate(lines):
            if 'read_data' in line:
                found = True
                lines[i] = f"read_data       {new_data_file}\n"
                break

        if found:
            with open(filename, 'w') as file:
                file.writelines(lines)
            print(f"File '{filename}' updated. 'read_data' line changed to '{new_data_file}'")
        else:
            print("Error: 'read_data' line not found in the file.")

    except FileNotFoundError:
        print("Error: File not found.")
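
The helper above locates the Atoms section with an exact match on ' Atoms\n'. LAMMPS data files often write the header with an atom-style comment (for example 'Atoms # full'), in which case the exact match fails. The sketch below shows one tolerant way to find the section. The names `find_atoms_section` and `replace_coordinates` are hypothetical, and the toy file assumes the 'full' atom style (id, molecule, type, charge, x, y, z), which matches the four leading columns retained above.

```python
import numpy as np

def find_atoms_section(lines):
    """Return the index of the first atom line in a LAMMPS data file."""
    for i, line in enumerate(lines):
        # Accept 'Atoms', ' Atoms', 'Atoms # full', ...
        if line.split("#")[0].strip() == "Atoms":
            return i + 2  # skip the header and the blank line after it
    raise ValueError("No 'Atoms' section found")

def replace_coordinates(lines, new_coords, n_leading=4):
    """Return a copy of lines with x, y, z replaced, keeping the first n_leading columns."""
    out = list(lines)
    start = find_atoms_section(out)
    for i, xyz in enumerate(new_coords):
        head = out[start + i].split()[:n_leading]
        out[start + i] = " ".join(head + [f"{c:.6f}" for c in xyz]) + "\n"
    return out

# Toy data file with two atoms (assumed 'full' style: id mol type charge x y z)
toy = [
    "LAMMPS data file (toy example)\n",
    "\n",
    "2 atoms\n",
    "\n",
    "Atoms # full\n",
    "\n",
    "1 1 1 -0.18 0.0 0.0 0.0\n",
    "2 1 2 0.06 1.0 0.0 0.0\n",
    "\n",
    "Bonds\n",
]
new_xyz = np.array([[0.1, 0.2, 0.3], [1.1, 1.2, 1.3]])
updated = replace_coordinates(toy, new_xyz)
print("".join(updated))
```

Working on a list of lines rather than on a file keeps the sketch easy to test; reading and writing the file can stay as in the function above.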



In [5]:
# Once the AIMD npz dataset has been generated, we can proceed to generate required files for LAMMPS energy and force calculations using the coordinates obtained from AIMD

#Reload the generated npz file
aimd_conv20_npz = np.load(output_name+'.npz')

# Define name for folders where simulations will be run
umbs_dir = 'restraint_'

## Define name of folder where lammps calculations will be conducted
lmp_spc_folder = 'lmp_spc_data'
# Create folder to store the new data files. Check if it exists; if not, create it
if not os.path.exists(lmp_spc_folder):
    os.makedirs(lmp_spc_folder)


In [6]:
## Loop over every frame in the coordinates array
for frame in range(len(aimd_conv20_npz['R'][:])):
    
    # Create folder to store the new data files and run the single point CLMD simulations
    lmp_spc_folder = 'lmp_spc_data'
    lmp_spc_folder = f'{lmp_spc_folder}/{umbs_dir}{frame}'
    print(f'\nFolder created for frame {frame}')
    
    ## Check if the lmp_spc_folder exists; if not, create it
    if not os.path.exists(lmp_spc_folder):
        os.makedirs(lmp_spc_folder)

    # Define inputs for the function update_lmp_data_file
    file_path = './Butane.data'  # Replace with your file path
    atomic_coordinates = aimd_conv20_npz['R'][frame]  # Replace with your new coordinates
    new_file_path = f'{lmp_spc_folder}/butane_{frame}.data'

    # Run the function to update the data file
    update_lmp_data_file(file_path, atomic_coordinates, new_file_path)
    print(f'New file created for frame {frame}')

    # Define inputs for the function change_read_data
    file_path = './Butane.in'  # Replace with your file path
    new_data_filename = f'butane_{frame}.data'

    # Run the function to update the input file
    change_read_data(file_path, new_data_filename)

    # Use os to copy the input file to the new folder
    os.system(f'cp {file_path} {lmp_spc_folder}')

    # Use os to copy plumed file to the new folder
    os.system(f'cp ./plumed_but.dat {lmp_spc_folder}')



Folder created for frame 0
Coordinates updated successfully!
New file created for frame 0
File './Butane.in' updated. 'read_data' line changed to 'butane_0.data'

Folder created for frame 1
Coordinates updated successfully!
New file created for frame 1
File './Butane.in' updated. 'read_data' line changed to 'butane_1.data'

Folder created for frame 2
Coordinates updated successfully!
New file created for frame 2
File './Butane.in' updated. 'read_data' line changed to 'butane_2.data'

Folder created for frame 3
Coordinates updated successfully!
New file created for frame 3
File './Butane.in' updated. 'read_data' line changed to 'butane_3.data'

Folder created for frame 4
Coordinates updated successfully!
New file created for frame 4
File './Butane.in' updated. 'read_data' line changed to 'butane_4.data'

Folder created for frame 5
Coordinates updated successfully!
New file created for frame 5
File './Butane.in' updated. 'read_data' line changed to 'butane_5.data'

Folder created for fr

With all the input files generated, the next step is to run the single-point CLMD simulations to obtain the data of interest. To run these simulations, use the lmp_spc_run.py script located in the lmp_spc_data directory.


In [7]:
# ## Run the script by uncommenting and running the line below:

# ## Run the python script 
# os.system(f'python ./lmp_spc_data/lmp_spc_run.py') 

Now we read the potential energy and atomic forces calculated by LAMMPS and extract them to create the new npz dataset.
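
As an illustration of the force extraction, the sketch below reads the first frame of a LAMMPS text dump and returns the forces ordered by atom ID. It is not the `parse_lammpstrj` helper imported at the top of the notebook. The function name `first_frame_forces` is hypothetical, and the dump is assumed to contain the columns id, fx, fy and fz; the actual column layout of forces.dump depends on the dump command in the input file.

```python
import io
import numpy as np

def first_frame_forces(handle):
    """Return an (n_atoms, 3) force array from the first frame, ordered by atom ID."""
    n_atoms, columns, rows = None, None, []
    lines = iter(handle)
    for line in lines:
        if line.startswith("ITEM: NUMBER OF ATOMS"):
            n_atoms = int(next(lines))
        elif line.startswith("ITEM: ATOMS"):
            columns = line.split()[2:]  # column names after 'ITEM: ATOMS'
            rows = [next(lines).split() for _ in range(n_atoms)]
            break  # only the first frame is needed
    if columns is None:
        raise ValueError("No 'ITEM: ATOMS' block found")
    # Map each column name to its values (assumes id, fx, fy, fz are present)
    table = {name: [row[j] for row in rows] for j, name in enumerate(columns)}
    ids = np.array(table["id"], dtype=int)
    forces = np.array([table["fx"], table["fy"], table["fz"]], dtype=float).T
    # Dumps are not guaranteed to list atoms in ID order, so sort explicitly
    return forces[np.argsort(ids)]

# Toy dump with two atoms written out of order
dump = """ITEM: TIMESTEP
0
ITEM: NUMBER OF ATOMS
2
ITEM: BOX BOUNDS pp pp pp
0 10
0 10
0 10
ITEM: ATOMS id type fx fy fz
2 1 0.1 0.2 0.3
1 1 -0.1 -0.2 -0.3
"""
forces = first_frame_forces(io.StringIO(dump))
print(forces)
```

Sorting by ID matters because the coordinates and atom types are stored in a fixed atom order, and the forces must follow the same order.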


In [10]:
## Define names and paths 
data_folder = './lmp_spc_data/'
workig_folder = 'restraint_'
log_file = 'log.lammps' ##'adp_clmd_2ns_umb_'
forces_file = 'forces.dump'
xyz_traj_file = 'traj_nnip.xyz'
npz_aimd = './AIMD_300K_20ps/AIMD_Butane_Conv20ps_500Frames.npz'
# Load the AIMD npz dataset
npz_aimd = np.load(npz_aimd)

##Initialize list for storing values
potEng = []
types = []
coordinates = []
forces = []
cv = []
verbose = True



In [13]:
os.getcwd()

'/media/nando/New_Volume/Phase_2/NNFFs_V3/Deliverables/Butane/prep_train_CLC/CLC_dataset_preparation/CaseB_Conv20ps_TrainingData'

In [14]:
# Loop over the folders inside data_folder to extract the info of each simulation
for i, umbrella in enumerate(natsort.natsorted(os.listdir(data_folder))[:]):
    if umbrella.startswith(workig_folder): # Enter only directories that start with restraint
        if verbose:
            print(f'\n{i}. Umbrella Folder: {umbrella}')
            
        
        try: #Try block for log file and potential energy
            # Load the log file and get the potential energy. Then append it to the list
            log = lmplog.File(f'{data_folder}{umbrella}/{log_file}')
            potEng.append(log.get('PotEng')[0])

            if verbose:
                print(f'{data_folder}{umbrella}/{log_file}{i}.log file found.')
                print(f'Number of frames: {len(log.get("PotEng"))}')
            
        except:
            print(f'No log file found for {data_folder}{umbrella}\n')
            break
        
            
        try: # Try block for the coordinate array. Extract coords from the xyz file
            xyz_traj_dict = read_xyz_traj(f'{data_folder}{umbrella}/{xyz_traj_file}')
            xyz_coordinates = xyz_cords_array(xyz_traj_dict,0)
            coordinates.append(xyz_coordinates)
            
            if verbose:
                print(f'{data_folder}{umbrella}/{xyz_traj_file} file found.')
                #print(f'Number of frames: {len(xyz_traj_dict["frames"])}')
            
        except:
            print(f'No xyz file found for {data_folder}{umbrella}\n')
            break
            
        try: #Block for forces
            forces_dict = parse_lammpstrj(f'{data_folder}{umbrella}/{forces_file}')
            
            # Sort the atoms of each frame by ID, so every frame is in the same order
            for key in forces_dict.keys():
                forces_dict[key]['atoms'].sort(key=lambda x: x[0])

            forces_temp = [] # Initialize list to store the forces of the configuration
            for row in forces_dict[0]['atoms'][:]:
                forces_temp.append(row[-3:])

            # Append to the general forces list
            forces.append(forces_temp)
            if verbose:
                print(f'{data_folder}{umbrella}/{forces_file} forces file found.')
                #print(f'Number of frames: {len(forces_dict)}')
        except:
            print(f'No forces file found for {data_folder}{umbrella}/\n')
            break

        try:
            colvar = np.loadtxt(f'{data_folder}{umbrella}/dihedral.dat')
            print(f'{data_folder}{umbrella}/dihedral.dat Colvar file found')
            cv.append(colvar[0])
        except:
            print(f'No colvar file found for {data_folder}{umbrella}/dihedral.dat\n')
            break        
            
        


# Define the atom-type array for the butane molecule. Make sure it follows the same order as the xyz file and the forces file
type_array = np.array(npz_aimd['z'])
#type_array = np.array(types)
cv = np.array(cv)
# Convert lists into np arrays
potEng = np.array(potEng)
coordinates = np.array(coordinates)
forces = np.array(forces) 



print(f'\nSummary of data:') 
print(f'\nShape of Potential Energy: {np.shape(potEng)}')
print(f'Shape of Coordinates: {np.shape(coordinates)}')
print(f'Shape of Forces: {np.shape(forces)}')
print(f'Shape of Colvar: {np.shape(cv)}')
print(f'Shape of Type Array: {np.shape(type_array)}')

# # #Save to npz file
# # np.savez(f'{data_folder}alanine_univ2D_CLC_2500frames_373K.npz', E=potEng, R=coordinates, F=forces, z=type_array)




2. Umbrella Folder: restraint_0
./lmp_spc_data/restraint_0/log.lammps2.log file found.
Number of frames: 2
./lmp_spc_data/restraint_0/traj_nnip.xyz file found.
./lmp_spc_data/restraint_0/forces.dump forces file found.
./lmp_spc_data/restraint_0/dihedral.dat Colvar file found

3. Umbrella Folder: restraint_1
./lmp_spc_data/restraint_1/log.lammps3.log file found.
Number of frames: 2
./lmp_spc_data/restraint_1/traj_nnip.xyz file found.
./lmp_spc_data/restraint_1/forces.dump forces file found.
./lmp_spc_data/restraint_1/dihedral.dat Colvar file found

4. Umbrella Folder: restraint_2
./lmp_spc_data/restraint_2/log.lammps4.log file found.
Number of frames: 2
./lmp_spc_data/restraint_2/traj_nnip.xyz file found.
./lmp_spc_data/restraint_2/forces.dump forces file found.
./lmp_spc_data/restraint_2/dihedral.dat Colvar file found

5. Umbrella Folder: restraint_3
./lmp_spc_data/restraint_3/log.lammps5.log file found.
Number of frames: 2
./lmp_spc_data/restraint_3/traj_nnip.xyz file found.
./lmp_sp

In [15]:
# Save the arrays to an npz file. This file is Case B with CLMD data
np.savez(f'{data_folder}Buatane_CaseB_Conv20ps_CLMD.npz', E=potEng, R=coordinates, F=forces, z=type_array)
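
Before using the saved file for training, it can help to reload it and confirm that the arrays agree with each other. The sketch below is one way to do that. The function name `check_dataset` is hypothetical, and the expected shapes (E with one value per frame, R and F with shape (n_frames, n_atoms, 3), z with one entry per atom) are assumptions about the layout. The toy file is synthetic and only mimics a 14-atom butane molecule.

```python
import os
import tempfile
import numpy as np

def check_dataset(path):
    """Load an npz dataset and check that E, R, F and z have consistent shapes."""
    with np.load(path) as data:
        E, R, F, z = data["E"], data["R"], data["F"], data["z"]
    n_frames, n_atoms = R.shape[:2]
    # Assumed layout: E (n_frames,), R and F (n_frames, n_atoms, 3), z (n_atoms,)
    if E.shape != (n_frames,):
        raise ValueError(f"E has shape {E.shape}, expected ({n_frames},)")
    if R.shape != (n_frames, n_atoms, 3):
        raise ValueError(f"R has shape {R.shape}")
    if F.shape != R.shape:
        raise ValueError(f"F has shape {F.shape}, expected {R.shape}")
    if z.shape != (n_atoms,):
        raise ValueError(f"z has shape {z.shape}, expected ({n_atoms},)")
    return n_frames, n_atoms

# Toy dataset: 5 frames of a 14-atom molecule (4 carbons, 10 hydrogens)
n_frames, n_atoms = 5, 14
rng = np.random.default_rng(0)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_dataset.npz")
    np.savez(
        path,
        E=rng.normal(size=n_frames),
        R=rng.normal(size=(n_frames, n_atoms, 3)),
        F=rng.normal(size=(n_frames, n_atoms, 3)),
        z=np.array([6] * 4 + [1] * 10),
    )
    result = check_dataset(path)
print(result)  # (5, 14)
```

A shape mismatch here usually means that one of the folders was skipped during extraction, so the lists ended up with different lengths.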
