## Tutorial: Using gen_ini_full_cv Python Script for Generating Uniformly Distributed Initial Configurations

### Introduction

The gen_ini_full_cv script generates uniformly distributed initial configurations in a 2D collective variable (CV) space. This tutorial walks through how to use it.


### Usage

The initial_config.py script accepts several command-line arguments to customize the generation process. Be sure to have the LAMMPS and PLUMED reference files available before running it.

Here's a breakdown of the available options:

    cv1_bin : Number of bins for the first collective variable (Cv_1).
    cv1_min : Minimum value of Cv_1.
    cv1_max : Maximum value of Cv_1.
    cv2_bin : Number of bins for the second collective variable (Cv_2).
    cv2_min : Minimum value of Cv_2.
    cv2_max : Maximum value of Cv_2.
    --plot : Flag to visualize the generated points.

To execute the script, open a terminal or command prompt and run the following command:

    python initial_configs.py 224 -3.14 3.14 224 -3.14 3.14

This generates 50176 (224 × 224) initial configurations, equally distributed between -pi and pi in both dimensions.
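The interface and grid described above can be sketched as follows. This is a minimal illustration, not the script's actual source: it assumes the six values are positional arguments in the order listed, that the points are evenly spaced with both endpoints included, and the helper names `build_parser` and `make_grid` are hypothetical.

```python
import argparse

import numpy as np


def build_parser():
    """Hypothetical parser mirroring the options listed above."""
    parser = argparse.ArgumentParser(
        description="Generate a uniform grid of points in a 2D CV space."
    )
    for name in ("cv1", "cv2"):
        parser.add_argument(f"{name}_bin", type=int, help=f"number of bins for {name}")
        parser.add_argument(f"{name}_min", type=float, help=f"minimum value of {name}")
        parser.add_argument(f"{name}_max", type=float, help=f"maximum value of {name}")
    parser.add_argument("--plot", action="store_true", help="visualize the generated points")
    return parser


def make_grid(args):
    """Return an (n1 * n2, 2) array with every (cv1, cv2) combination.

    Assumption: points include both endpoints (np.linspace); the real script
    may use bin centers or exclude one endpoint instead.
    """
    cv1 = np.linspace(args.cv1_min, args.cv1_max, args.cv1_bin)
    cv2 = np.linspace(args.cv2_min, args.cv2_max, args.cv2_bin)
    g1, g2 = np.meshgrid(cv1, cv2, indexing="ij")
    return np.column_stack([g1.ravel(), g2.ravel()])


# Same values as the command shown above, passed as a list instead of sys.argv
args = build_parser().parse_args(["224", "-3.14", "3.14", "224", "-3.14", "3.14"])
grid = make_grid(args)
print(grid.shape)
```

Each row of `grid` is one target (Cv_1, Cv_2) pair, so the number of rows equals the product of the two bin counts.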



## Selection of initial configurations

This step consists of running a simulation for each of the initial configurations created in the previous step, using the 'run_cv_config_select_1.py' script. The script recreates the reference array of CV values, so make sure to modify the number of bins and the min-max values for each array so that they match the values used in the previous step. Each simulation is run for 1 ns while the phi and psi values (the CVs) are collected. At the end of the simulation, the CV values are compared to the reference value. If the error is less than 0.3%, the index of the row with the smallest error is stored and the code continues to the next iteration. If the error is more than 0.3%, the seed number of the velocity in the LAMMPS input file is changed, and the process is repeated until a value within the tolerance is obtained.
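The accept-or-reseed logic described above can be sketched as below. This is an illustration under stated assumptions, not the contents of 'run_cv_config_select_1.py': `fake_simulation` is a hypothetical stand-in for the 1 ns LAMMPS run, the error is assumed to be the standard percent error relative to the reference value (taking the worse of phi and psi), and `max_attempts` is an added safeguard that the original loop may not have.

```python
import random

TOLERANCE_PCT = 0.3  # acceptance threshold quoted in the text


def percent_error(sampled, reference):
    """Standard percent error; assumed definition, undefined for reference == 0."""
    if reference == 0:
        raise ValueError("percent error is undefined for a zero reference value")
    return abs(sampled - reference) / abs(reference) * 100.0


def best_frame(frames, target):
    """Return (frame_index, error) of the frame closest to the target CVs."""
    errors = [
        max(percent_error(phi, target[0]), percent_error(psi, target[1]))
        for phi, psi in frames
    ]
    idx = min(range(len(errors)), key=errors.__getitem__)
    return idx, errors[idx]


def fake_simulation(target, seed, n_frames=200):
    """Hypothetical stand-in for a LAMMPS run: noisy CV samples around the target."""
    rng = random.Random(seed)
    return [
        (target[0] + rng.gauss(0.0, 0.05), target[1] + rng.gauss(0.0, 0.05))
        for _ in range(n_frames)
    ]


def select_frame(target, first_seed=1, max_attempts=50):
    """Re-run with a new velocity seed until one frame is within tolerance."""
    for seed in range(first_seed, first_seed + max_attempts):
        frames = fake_simulation(target, seed)
        idx, err = best_frame(frames, target)
        if err < TOLERANCE_PCT:
            return seed, idx, err
    raise RuntimeError(f"no frame within {TOLERANCE_PCT}% after {max_attempts} seeds")


seed, frame_index, err = select_frame((-1.5, 2.0))
print(f"seed={seed}, frame={frame_index}, error={err:.3f}%")
```

A relative error of this kind becomes very strict for reference values close to zero, so the real script may normalize differently, for example by the CV range.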


In [1]:
# Uncomment for usage
# import os
# os.system('python run_cv_config_select_1.py')


Now that the configurations with the desired CV values have been obtained, we proceed to run single-point calculations on these configurations to generate the ADP_OnlyMinima training data set. For storage purposes, the production_runs folder has been shared as a zip file.

Now run the lmp_spc_prep.py script to perform a single-point calculation at the classical MD level for the runs whose sampled CV values had a percent difference lower than 0.3%.

Run the 'selected_phi_psi.py' script to generate a second list of the selected indices.


In [10]:
import os  # needed by the os.system / os.makedirs calls in the commented cells
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Name of relevant files and folders
data_folder = './production_runs/'
# workig_folder = 'umbrella_'
log_file = 'adp_clmd_2ns_umb_' #'log.lammps'
forces_file = 'forces.dump'
xyz_traj_file = 'traj_nnip.xyz'
lmp_spc_folder = './lmp_spc_500Frames/' # Folder where the new data files for the LAMMPS single-point calculations (spc) will be saved
workig_folder = 'umbrella_'
og_data_file = './example.input'
lmp_input_file = './input_0.inp'
verbose = True
indices_file = 'index_pcnt_diff.txt'

In [5]:
# Load the phi and psi values from the file
# selected_phi_psi = np.loadtxt('selected_phi_psi.txt')

# Load npy file
selected_phi_psi = np.load('selected_phi_psi.npy')
print(selected_phi_psi.shape)
umbrella = selected_phi_psi[:,0]
index = selected_phi_psi[:,1]
phi = selected_phi_psi[:,2]
psi = selected_phi_psi[:,3]

## IMPORTANT NOTE: These values represent the index (frame) per umbrella where the phi-psi combination is closest to the desired value


In [7]:
# Convert to pandas dataframe
import pandas as pd
selected_phi_psi = pd.DataFrame(selected_phi_psi, columns=['umbrella', 'index', 'phi', 'psi'])
#selected_phi_psi.head()

# Delete the rows that have 0.0 in the index, phi and psi columns (the filter tests only the 'index' column)
selected_phi_psi = selected_phi_psi.loc[selected_phi_psi['index'] != 0.0]
#selected_phi_psi

Once the indices have been selected, we can proceed to select the regions from which we want to take samples.



In [None]:
## Selection of points

phi_values = np.rad2deg(selected_phi_psi['phi'])  # phi of every available point, in degrees
psi_values = np.rad2deg(selected_phi_psi['psi'])  # psi of every available point, in degrees

# List of user-provided (phi, psi) centers, in degrees
user_values_list = [(-143.66420274551217, 155.47320410490312),
(-79.67265047518481, 59.00798175598632),
(60.34846884899679, -41.90421892816417),
(-84.7412882787751, 136.65906499429877)]  # Add more values as needed

# Number of points to select around each center (total samples = m * number of centers)
m = 125    # for 500 samples
#m = 250   # for 1000 samples
#m = 625   # for 2500 samples



In [None]:
## Get Euclidean distance between available points and selected centers
indexes = []

for user_phi, user_psi in user_values_list:
    print(f"User input: phi = {user_phi}, psi = {user_psi}")

    # Distance (in degrees) between the user-provided center and every point in the dataset.
    # np.asarray drops the pandas index so that argsort returns plain positions for .iloc below.
    distances = np.asarray(np.sqrt((phi_values - user_phi)**2 + (psi_values - user_psi)**2))
    print(f"Shape of distances array {distances.shape}")

    # Get positional indices of the m closest points
    indices = np.argsort(distances)[:m]
    print(f"Shape of indices array {indices.shape}")
    print(indices)

    # Append indices to list
    indexes.append(indices)

# Stack the per-center index arrays and flatten them to 1D
indexes = np.array(indexes).T.reshape(-1)
print(f"Shape of indexes array {indexes.shape}")

# Positional selection of the chosen rows
euclid_selected_phi_psi = selected_phi_psi.iloc[indexes]
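Because phi and psi are periodic, a plain Euclidean distance treats points on opposite sides of the ±180° boundary as far apart, which may matter for centers close to that boundary (the first center above has psi near 155°). Whether wrap-around is relevant for the regions sampled here is not stated, so the following is only a hedged alternative, not the method used in this notebook. The function names `periodic_delta` and `closest_points` are hypothetical.

```python
import numpy as np


def periodic_delta(a, b, period=360.0):
    """Smallest absolute difference between angles a and b on a periodic axis."""
    d = np.abs(np.asarray(a, dtype=float) - b) % period
    return np.minimum(d, period - d)


def closest_points(phi_deg, psi_deg, center, m):
    """Positional indices of the m points nearest to center, with wrap-around."""
    dphi = periodic_delta(phi_deg, center[0])
    dpsi = periodic_delta(psi_deg, center[1])
    distances = np.hypot(dphi, dpsi)
    return np.argsort(distances, kind="stable")[:m]


# Toy data: the second point sits just across the psi = +/-180 boundary
phi_demo = np.array([-143.0, -143.0, 0.0])
psi_demo = np.array([155.0, -175.0, 0.0])
nearest = closest_points(phi_demo, psi_demo, (-143.66, 155.47), m=2)
print(nearest)
```

With the wrap-around metric the second toy point is about 30° from the center, whereas the plain Euclidean distance would place it about 330° away and rank it behind the point at the origin.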

In [None]:
## Plot the phi vs psi of the selected points

plt.scatter(np.rad2deg(euclid_selected_phi_psi['phi']), np.rad2deg(euclid_selected_phi_psi['psi']), marker='o', c='r', s=10)
plt.xlabel('Phi (degrees)', fontsize=14)
plt.ylabel('Psi (degrees)', fontsize=14)
plt.title(f'{len(user_values_list)*m} Selected Points')
# plt.legend(loc='upper right', bbox_to_anchor=(1, 1))
# Axis limits in degrees, matching the converted data
# plt.xlim(-180, 180)
# plt.ylim(-180, 180)

plt.show()

In [12]:
## Code to select coordinates of configurations and create files for LAMMPS spc

# grab the values of the umbrella and index columns in arrays 
umbrellas = euclid_selected_phi_psi['umbrella'].values
index = euclid_selected_phi_psi['index'].values


In [13]:
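# # Create folder to store the new data files
# # (uncommenting this cell requires `import os` and the helpers read_xyz_traj, xyz_cords_array,
# #  lmp_data_subs_coord and change_data_file_name, none of which are defined in the cells above)

# ## Check if the lmp_spc_folder exists, if not create it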
# if not os.path.exists(lmp_spc_folder):
#     os.makedirs(lmp_spc_folder)

#     #i = 1

# # Loop over indices from selected 
# for i,umbrella in enumerate(umbrellas[:5]):
#     print(f'\n\nProcessing Umbrella: {int(umbrella)}')

#     # 1. Determine the working directory and the index of interest for each umbrella.
#     working_dir = f'{data_folder}umbrella_{int(umbrella)}/'
#     print(f'Working directory: {working_dir}')
#     print(f'Index with smallest error is {int(index[i])}')

#     ## 2. Determine the working xyz file.
#     working_xyz_file = working_dir + xyz_traj_file
#     print(f'Working xyz file: {working_xyz_file}')

#     # Read the coordinates from the xyz file
#     xyz_traj_dict = read_xyz_traj(working_xyz_file)

#     # Extract the coordinates from the dictionary
#     xyz_coordinates = xyz_cords_array(xyz_traj_dict,int(index[i]))   #xyz_traj_dict['frames'][indices[i]]

#     # Substitute the coordinates in the lammps data file
#     lmp_data_subs_coord(xyz_coordinates,og_data_file,f'{lmp_spc_folder}frame_{int(umbrella)}.data',LOUD=False)
#     # Rename the file in the read_data command of the LAMMPS input file
#     change_data_file_name(f'{lmp_input_file}',f'frame_{int(umbrella)}.data',LOUD=False)

#     #3. Create directory to store lammps data and input files.
#     os.makedirs(lmp_spc_folder+f'umbrella_{int(umbrella)}', exist_ok=True)

#     print(f'Created directory: {lmp_spc_folder}umbrella_{int(umbrella)}')

#     # Move the data file to the folder (lmp_spc_folder already ends with '/')
#     os.system(f'mv {lmp_spc_folder}frame_{int(umbrella)}.data {lmp_spc_folder}umbrella_{int(umbrella)}/')

#     # Copy the input file to the folder
#     os.system(f'cp {lmp_input_file} ./plumed.dat {lmp_spc_folder}umbrella_{int(umbrella)}/')

# print(f'\n\nAll Done! Thank you!!\n\n')
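The commented cell above relies on helpers such as `read_xyz_traj` and `xyz_cords_array` whose source is not shown. As a rough idea of what extracting one frame from a multi-frame XYZ trajectory involves, here is a minimal sketch. It assumes the standard XYZ layout (atom count, comment line, then one `label x y z` line per atom), and `read_xyz_frame` is a hypothetical name, not one of the helpers used above.

```python
def read_xyz_frame(text, frame_index):
    """Return the atoms of one frame as a list of (label, x, y, z) tuples.

    `text` is the full content of a multi-frame XYZ file. Frames are counted
    from zero. Assumes every frame starts with an atom-count line followed
    by one comment line.
    """
    lines = text.splitlines()
    pos = 0       # line number where the current frame starts
    current = 0   # index of the current frame
    while pos < len(lines):
        header = lines[pos].strip()
        if not header:
            pos += 1  # tolerate blank lines between frames or at the end
            continue
        n_atoms = int(header.split()[0])
        if current == frame_index:
            atom_lines = lines[pos + 2 : pos + 2 + n_atoms]
            if len(atom_lines) != n_atoms:
                raise ValueError(f"frame {frame_index} is truncated")
            frame = []
            for line in atom_lines:
                label, x, y, z = line.split()[:4]
                frame.append((label, float(x), float(y), float(z)))
            return frame
        pos += n_atoms + 2  # skip count line, comment line and atom lines
        current += 1
    raise IndexError(f"trajectory has no frame {frame_index}")


# Two-frame toy trajectory
example = """2
frame 0
C 0.0 0.0 0.0
H 1.0 0.0 0.0
2
frame 1
C 0.1 0.0 0.0
H 1.1 0.0 0.0
"""
coords = read_xyz_frame(example, 1)
print(coords)
```

In the workflow above, the frame index would be the stored index with the smallest error for a given umbrella, and the returned coordinates would then be written into the LAMMPS data file.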
