## Tutorial_04_preparation


* This tutorial demonstrates how to prepare a parameterization of the mW model.
* mW (monatomic water) is a coarse-grained water model derived from the Stillinger-Weber potential form [1-2]. The model quantitatively predicts many physical properties of water with an accuracy comparable to that of atomistic water models, while being more computationally efficient.
* It is a short-ranged potential consisting of a pairwise term plus a three-body term that penalizes configurations deviating from the tetrahedral angle.
* The full potential contains 11 free parameters.
* Functional form:
<img src="sw_potential.png" width="600">
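As a rough numerical companion to the functional form, the sketch below evaluates the two-body and three-body Stillinger-Weber terms in Python. It assumes the form documented for the LAMMPS `sw` pair style and the commonly quoted mW parameter values from [1] (energies in kcal/mol, distances in Å); check both against the figure and the original paper before relying on them. The function and constant names are illustrative only and are not part of any package.

```python
import math

# mW parameters as commonly quoted from Molinero & Moore [1] (assumed here)
EPSILON = 6.189            # kcal/mol
SIGMA = 2.3925             # Angstrom
A_CUT = 1.80               # reduced cutoff: r_c = A_CUT * SIGMA
LAMBDA = 23.15             # strength of the three-body (tetrahedral) term
GAMMA = 1.20
COS_THETA0 = -1.0 / 3.0    # cosine of the tetrahedral angle
A, B, P, Q = 7.049556277, 0.6022245584, 4, 0


def phi2(r):
    """Two-body energy (kcal/mol) for a pair separated by r (Angstrom)."""
    rc = A_CUT * SIGMA
    if r >= rc:
        return 0.0
    s = SIGMA / r
    return A * EPSILON * (B * s**P - s**Q) * math.exp(SIGMA / (r - rc))


def phi3(r_ij, r_ik, theta):
    """Three-body energy (kcal/mol) for the angle theta (radians) at bead i."""
    rc = A_CUT * SIGMA
    if r_ij >= rc or r_ik >= rc:
        return 0.0
    d = math.cos(theta) - COS_THETA0
    decay = math.exp(GAMMA * SIGMA / (r_ij - rc)) * math.exp(GAMMA * SIGMA / (r_ik - rc))
    return LAMBDA * EPSILON * d * d * decay


# The angular penalty vanishes at the tetrahedral angle and grows away from it
tetrahedral = math.acos(COS_THETA0)
penalty_at_tetrahedral = phi3(2.7, 2.7, tetrahedral)
penalty_at_90_degrees = phi3(2.7, 2.7, math.pi / 2)
```

With these values the pair term has its minimum of about -EPSILON near r = 2^(1/6) SIGMA, and both terms go smoothly to zero at the cutoff A_CUT * SIGMA, which is what makes the potential short-ranged.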

### Generating the initial configuration (mBuild):

```python
import mbuild as mb

# Initialize an empty compound that will hold the coarse-grained bead
cg_water = mb.Compound()

# Create a coarse-grained water bead; a "_xx"-style name is used for a coarse-grained system
cg_bead = mb.Particle(name='_H2O', pos=[0, 0, 0])

# Add the bead to the compound
cg_water.add([cg_bead])

# Fill a cubic box with copies of the compound. The box size (in nm) is chosen
# to match the density of water, 0.997 g/cm^3
cg_water_box = mb.fill_box(compound=cg_water,
                           n_compounds=512,
                           box=[2.4859, 2.4859, 2.4859],
                           seed=2020)

# Use the water.xml file to set the mass of the coarse-grained bead to the molar
# mass of water, and save the configuration in LAMMPS data file format
cg_water_box.save('mW.lmp',
                  forcefield_files="water.xml",
                  overwrite=True,
                  foyer_kwargs={"assert_bond_params": False})
```
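The box edge of 2.4859 nm follows from the target density and the number of beads. The short calculation below reproduces it, assuming a molar mass of 18.015 g/mol for water; the helper name `cubic_box_length_nm` is made up for this illustration.

```python
AVOGADRO = 6.02214076e23   # particles per mole
MOLAR_MASS_WATER = 18.015  # g/mol (assumed value for this check)


def cubic_box_length_nm(n_molecules, density_g_cm3):
    """Edge length (nm) of a cubic box holding n_molecules at the given density."""
    mass_g = n_molecules * MOLAR_MASS_WATER / AVOGADRO
    volume_cm3 = mass_g / density_g_cm3
    volume_nm3 = volume_cm3 * 1e21  # 1 cm^3 = 1e21 nm^3
    return volume_nm3 ** (1.0 / 3.0)


box_length = cubic_box_length_nm(512, 0.997)
print(f"box edge: {box_length:.4f} nm")
```

The same helper can be reused to pick a box size for a different number of beads or a different target density.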



### Prepare the reference data (An example)
* In a reference data folder (e.g. "ReferenceData"), create one folder per property to be matched (e.g. force, rdf, isobars).
* Note the path of the reference data folder that contains all your reference data, e.g. "/project/ReferenceData".
* Inside the property folder (e.g. "force"), create a subfolder with a descriptive name such as "mW_300K_1bar".
* Perform a short production run of 2.5 ns at 300 K and 1 bar, sampling every 5 ps, to generate coordinates, forces, and potential energies.
* Name each reference data file "Ref.xxx", where "xxx" is the property name. "Ref.xxx" is the default name used by the objective function calculations; it can be changed in the source code under objective/force_matching, objective/rdf_matching, or objective/isobar_matching.
* Move the reference data into the subfolder you created.
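The folder layout described in this list can be sketched with `pathlib`. This is only an illustration: it builds the tree under a temporary directory instead of "/project", the helper `make_reference_tree` is hypothetical, and the frame count simply divides the run length by the sampling interval quoted in the list.

```python
import tempfile
from pathlib import Path


def make_reference_tree(root, cases_by_property):
    """Create <root>/<property>/<case> folders and return the created paths."""
    root = Path(root)
    created = []
    for prop, cases in cases_by_property.items():
        for case in cases:
            case_dir = root / prop / case
            case_dir.mkdir(parents=True, exist_ok=True)
            created.append(case_dir)
    return created


# Number of configurations produced by a 2.5 ns run sampled every 5 ps
run_length_ps = 2500
sampling_interval_ps = 5
n_frames = run_length_ps // sampling_interval_ps

# Build the example layout in a throwaway location
demo_root = Path(tempfile.mkdtemp()) / "ReferenceData"
reference_dirs = make_reference_tree(demo_root, {"force": ["mW_300K_1bar"]})

# The reference file for force matching would then live at this path
ref_force_file = reference_dirs[0] / "Ref.force"
print(ref_force_file.relative_to(demo_root.parent))
```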

### Prepare a template folder for predicted data (An example)

* In a template folder for predicted data (e.g. "prepsystem"), create one folder per property to be matched (e.g. force, rdf, isobars).
* Note the path of the template folder that contains all your prediction inputs, e.g. "/project/prepsystem".
* Make sure the subfolders inside each property folder exactly match those of the reference data.
* In every folder, prepare the run input files required by your simulator of choice, along with the files for the initial configuration.
* Make sure each predicted output file is named "predict.xxx", where "xxx" is the property name. "predict.xxx" is the default name used by the objective function calculations; it can be changed in the source code under objective/force_matching, objective/rdf_matching, or objective/isobar_matching.
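Because the template folders must mirror the reference folders, a quick consistency check can catch a missing or misnamed subfolder before any sampling starts. The sketch below is a hypothetical helper, not part of the package; it compares the two directory trees by relative path, using throwaway folders in place of "/project".

```python
import tempfile
from pathlib import Path


def relative_subfolders(root):
    """Return the set of subfolder paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir()}


def compare_layouts(reference_root, template_root):
    """Return (folders missing from the template, folders only in the template)."""
    ref = relative_subfolders(reference_root)
    tmpl = relative_subfolders(template_root)
    return sorted(ref - tmpl), sorted(tmpl - ref)


# Example: the template is missing the rdf folders present in the reference
base = Path(tempfile.mkdtemp())
(base / "ReferenceData" / "force" / "mW_300K_1bar").mkdir(parents=True)
(base / "ReferenceData" / "rdf" / "mW_300K_1bar").mkdir(parents=True)
(base / "prepsystem" / "force" / "mW_300K_1bar").mkdir(parents=True)

missing_in_template, extra_in_template = compare_layouts(
    base / "ReferenceData", base / "prepsystem"
)
print("missing:", missing_in_template)
print("extra:", extra_in_template)
```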

### Prepare the shell command

* The current package has been tested with Slurm.
* The command is a string with two format specifiers. For example, to run LAMMPS from the command line: "module load intel && srun -n %d -N1 -c1 --mpi=pmi2 lmp_ml_water < in.%s", where %d is replaced by the number of cores and %s by the property name, giving in.force, in.isobar, in.rdf, and so on.
* Make sure the shell command can be invoked correctly from the command line in each predicted-data folder.
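The substitution of the two format specifiers can be illustrated with Python's `%` string formatting. Whether the package performs the substitution in exactly this way is an assumption; `build_command` is a hypothetical helper for this example.

```python
command_template = (
    "module load intel && srun -n %d -N1 -c1 --mpi=pmi2 lmp_ml_water < in.%s"
)


def build_command(template, n_cores, property_name):
    """Fill in the core count (%d) and the property name (%s)."""
    return template % (n_cores, property_name)


force_command = build_command(command_template, 2, "force")
print(force_command)
```

Printing the filled-in command and running it by hand in the corresponding folder is a simple way to confirm that it works before starting an optimization.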
 

###  Prepare the input script

* Choose the units:

real   

* Each objective function is defined in the following format:
* "matching type", "subfolder name", "weight of the current objective function", "cores for running sampling", "cores for evaluating the objective function"
* The corresponding Python data types are: "string", "string", "float", "integer", "integer".
* "bf" is the buffer size: the number of configurations read into memory by each core.
* "w" is the weight.

force mW_300K_1bar_500 1 2 2 bf 5000 w 0 1   
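To make the field order concrete, the sketch below splits an objective line into the five typed fields listed above. It is not the package's own parser: `parse_objective` is a hypothetical helper, and the trailing keyword tokens ("bf", "w" and their values) are kept as raw strings rather than interpreted.

```python
def parse_objective(line):
    """Split an objective line into its five leading typed fields."""
    tokens = line.split()
    if len(tokens) < 5:
        raise ValueError("expected at least five fields: %r" % line)
    return {
        "matching_type": tokens[0],          # string
        "subfolder": tokens[1],              # string
        "weight": float(tokens[2]),          # float
        "sampling_cores": int(tokens[3]),    # integer
        "evaluation_cores": int(tokens[4]),  # integer
        "extra_tokens": tokens[5:],          # left uninterpreted in this sketch
    }


objective = parse_objective("force mW_300K_1bar_500 1 2 2 bf 5000 w 0 1")
print(objective)
```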

* Shell command used to launch sampling in each folder:
* The Python subprocess module launches these commands in a non-blocking manner in each target folder.
* Sampling input file names are substituted as in.force, in.rdf, in.isobar, ...
* The default sampling package is LAMMPS.

module load intel/psxe-2019-64-bit && srun -n %d -N1 -c1 --exclusive --mpi=pmi2 lmp_ml_water < in.%s
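A non-blocking launch of this kind can be sketched with `subprocess.Popen`, which returns immediately so that several folders can sample at the same time. This is an assumption about the general approach, not the package's actual code: `launch_all` is a hypothetical helper, and a harmless `echo` stands in for the real `srun` command so the example runs anywhere.

```python
import subprocess
import tempfile
from pathlib import Path


def launch_all(commands_by_folder):
    """Start one shell command per folder without waiting for any to finish."""
    processes = []
    for folder, command in commands_by_folder.items():
        # shell=True is needed for commands that use "&&" or "<" redirection
        processes.append(subprocess.Popen(command, shell=True, cwd=folder))
    return processes


# Stand-in jobs: in practice each command would be the formatted srun string
base = Path(tempfile.mkdtemp())
jobs = {}
for name in ("force", "rdf"):
    folder = base / name
    folder.mkdir()
    jobs[str(folder)] = "echo sampling %s" % name

processes = launch_all(jobs)

# Block only at the end, once every job has been started
return_codes = [proc.wait() for proc in processes]
```

Collecting the return codes at the end makes it possible to detect a failed sampling run before the objective function is evaluated.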

### References:

[1]: Molinero, V., & Moore, E. B. (2009). Water Modeled As an Intermediate Element between Carbon and Silicon. J. Phys. Chem. B, 113(13), 4008–4016. https://doi.org/10.1021/jp805227c

[2]: Stillinger, F. H., & Rahman, A. (1974). Improved Simulation of Liquid Water by Molecular-Dynamics. J. Chem. Phys., 60(4), 1545–1557. https://doi.org/10.1063/1.1681229

