# MB-Fit tutorial (v20190924)

This notebook walks you through the different options for obtaining many-body fits for a range of molecules.



## Chapter 0. Set up the notebook.

### 0.1. Import the python library
To import the library without errors, you must first run the following commands in the bash terminal from which you launch the notebook. If you have not done so, close the notebook and run them in a bash terminal:
```sh
cd HOME/DIRECTORY/OF/mbfit
source install.sh
```
The following import should now run without errors.

In [None]:
# This is for testing purposes. Can be ignored.
%load_ext autoreload
%autoreload 2

In [None]:
# The library that will enable the fitting generation and energy calculation
import mbfit
# Some other useful libraries
import os
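
If the import fails, the installation script was most likely not sourced in the terminal that launched the notebook. As a minimal sketch, the check below uses only the standard library to test whether Python can locate a module before importing it; the helper name `is_importable` is introduced here for illustration and is not part of MB-Fit.

```python
import importlib.util

def is_importable(module_name: str) -> bool:
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None

if not is_importable("mbfit"):
    print("mbfit not found: run 'source install.sh' in the mbfit directory, "
          "then restart the notebook from that same terminal.")
```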

## Example 3. Generate a CO2-CO2 two-body MB-nrg PEF

### 3.1. Definition of the variables

In [None]:
main_dir = os.getcwd()

In [None]:
# The software that will be used to perform all the calculations
code = "qchem"
#code = "psi4"

# The quantum chemistry method we want to use
method = "HF"
#method = "MP2"
#method = "wb97m-v"

# Basis set to use. Must be pre-defined in the software. Custom basis sets not implemented yet.
basis = "STO-3G"

# Whether to use the counterpoise correction.
cp = False
#cp = True

# Number of threads and memory we would like to use
num_threads = 2
memory = "4GB"

# This is the path where all the log files will be stored.
log_path = "logs"

In [None]:
# Names that will identify the monomers. This is used for identification purposes only.
names = ["CO2","CO2"]

# Number of atoms of each monomer
number_of_atoms = [3,3]

# Charge of each monomer
charges = [0,0]

# Spin multiplicity of each monomer
spin = [1,1]

# Use MB-pol for water (if applicable).
# If 1, the Partridge-Schwenke PEF with position-dependent charges is used for water.
use_mbpol = [0,0]

In [None]:
# Symmetry of the molecule
symmetry = ["A1B2", "A1B2"]

# SMILES string
smiles = ["C(O)O", "C(O)O"]
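
The symmetry string `A1B2` labels each atom type with a letter followed by the number of atoms of that type, so CO2 has one atom of type A (carbon) and two of type B (oxygen). The sketch below builds such a label from a list of element symbols. It assumes that letters are assigned in order of first appearance and that all atoms of one element are equivalent, which holds for CO2 but not for every molecule; `symmetry_label` is a hypothetical helper, not an MB-Fit function.

```python
def symmetry_label(elements):
    """Build a label such as 'A1B2' from a list of element symbols.

    Assumption: atom types get letters in order of first appearance and
    atoms of the same element are treated as equivalent.
    """
    counts = {}
    for element in elements:
        counts[element] = counts.get(element, 0) + 1
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    return "".join(f"{letters[i]}{n}" for i, n in enumerate(counts.values()))

co2 = ["C", "O", "O"]
print(symmetry_label(co2))                  # A1B2
print("_".join([symmetry_label(co2)] * 2))  # A1B2_A1B2
```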

In [None]:
# Settings for monomer
mon_settings = "monomer_settings.ini"

my_settings_file_mon = """
[files]
# Local path directory to write log files in
log_path = """ + log_path + """

[config_generator]
# what library to use for geometry optimization and normal mode generation
code = """ + code + """
# use geometric or linear progression for T and A in config generation, exactly 1 must be True
geometric = False
linear = False

[energy_calculator]
# what library to use for energy calculations
code = """ + code + """

[psi4]
# memory to use when doing a psi4 calculation
memory = """ + memory + """
# number of threads to use when executing a psi4 calculation
num_threads = """ + str(num_threads) + """

[qchem]
# number of threads to use when executing a qchem calculation
num_threads = """ + str(num_threads) + """

[molecule]
# name of fragments, separated by commas
names = """ + names[0] + """
# number of atoms in each fragment, separated by commas
fragments = """ + str(number_of_atoms[0]) + """
# charge of each fragment, separated by commas
charges = """ + str(charges[0]) + """
# spin multiplicity of each fragment, separated by commas
spins = """ + str(spin[0]) + """
# tag when putting geometries into database
tag = none
# whether to use MB-pol (1) or not (0)
use_mbpol = """ + str(use_mbpol[0]) + """
# symmetry of each fragment, separated by commas
symmetry = """ + symmetry[0] + """
SMILES = """ + smiles[0] + """
"""



In [None]:
# Settings for dimer
dim_settings = "dimer_settings.ini"

my_settings_file_dim = """
[files]
# Local path directory to write log files in
log_path = """ + log_path + """

[config_generator]
# what library to use for geometry optimization and normal mode generation
code = """ + code + """
# use geometric or linear progression for T and A in config generation, exactly 1 must be True
geometric = False
linear = False

[energy_calculator]
# what library to use for energy calculations
code = """ + code + """

[psi4]
# memory to use when doing a psi4 calculation
memory = """ + memory + """
# number of threads to use when executing a psi4 calculation
num_threads = """ + str(num_threads) + """

[qchem]
# number of threads to use when executing a qchem calculation
num_threads = """ + str(num_threads) + """

[molecule]
# name of fragments, separated by commas
names = """ + names[0] + "," + names[1] + """
# number of atoms in each fragment, separated by commas
fragments = """ + str(number_of_atoms[0]) + """,""" + str(number_of_atoms[1]) + """
# charge of each fragment, separated by commas
charges = """ + str(charges[0]) + """,""" + str(charges[1]) + """
# spin multiplicity of each fragment, separated by commas
spins = """ + str(spin[0]) + """,""" + str(spin[1]) + """
# tag when putting geometries into database
tag = none
# whether to use MB-pol (1) or not (0)
use_mbpol = """ + str(use_mbpol[0]) + """,""" + str(use_mbpol[1]) + """
# symmetry of each fragment, separated by commas
symmetry = """ + symmetry[0] + """,""" + symmetry[1] + """
SMILES = """ + smiles[0] + """,""" + smiles[1] + """
"""

In [None]:
# Write the files:
with open(mon_settings, 'w') as ff:
    ff.write(my_settings_file_mon)

with open(dim_settings, 'w') as ff:
    ff.write(my_settings_file_dim)
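
The settings files above are assembled by string concatenation, which is easy to get wrong when a list has several entries. As an alternative sketch, the standard-library `configparser` module can build the same kind of INI content from Python lists. Only a subset of the keys is shown, and whether MB-Fit accepts the exact layout that `configparser` writes is an assumption that should be checked against the files produced above.

```python
import configparser
import io

names = ["CO2", "CO2"]
number_of_atoms = [3, 3]
charges = [0, 0]
spin = [1, 1]

def join(values):
    """Turn a list into the comma-separated form used in the settings file."""
    return ",".join(str(v) for v in values)

settings = configparser.ConfigParser()
settings.optionxform = str  # keep the case of keys such as "SMILES"
settings["files"] = {"log_path": "logs"}
settings["molecule"] = {
    "names": join(names),
    "fragments": join(number_of_atoms),
    "charges": join(charges),
    "spins": join(spin),
    "tag": "none",
}

# Write to an in-memory buffer; pass a real file handle to write to disk.
buffer = io.StringIO()
settings.write(buffer)
print(buffer.getvalue())
```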

In [None]:
# XYZ file that contains the unoptimized geometry of the monomer
unopt_mon = "monomer.xyz"

my_unopt_monomer = """3
unoptimized co2
C   0   0   0
O   1.3   0   0
O   -1.3  0   0
"""

In [None]:
# Write the file:
with open(unopt_mon, 'w') as ff:
    ff.write(my_unopt_monomer)
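
Before starting an optimization it can help to confirm that the starting geometry is well formed. The sketch below parses the XYZ text shown above, checks that the atom count matches the header, and reports the two C-O distances. `parse_xyz` is a hypothetical helper written for this check, not an MB-Fit function, and it handles only a single-frame XYZ string.

```python
import math

xyz_text = """3
unoptimized co2
C   0   0   0
O   1.3   0   0
O   -1.3  0   0
"""

def parse_xyz(text):
    """Return (comment, [(symbol, (x, y, z)), ...]) for a single-frame XYZ string."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()
        atoms.append((symbol, (float(x), float(y), float(z))))
    if len(atoms) != n_atoms:
        raise ValueError("atom count does not match the header")
    return lines[1], atoms

comment, atoms = parse_xyz(xyz_text)
carbon = atoms[0][1]
# Distance from the carbon atom to each oxygen, in the units of the file.
bond_lengths = [math.dist(carbon, xyz) for _, xyz in atoms[1:]]
print([round(d, 3) for d in bond_lengths])  # [1.3, 1.3]
```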

In [None]:
# XYZ file that contains the optimized geometry of the monomer
opt_mon = "monomer_opt.xyz"

# File where the normal modes of the monomer will be written
normal_modes_mon = "monomer_normal_modes.dat"

# Same for dimer
unopt_dim = "dimer.xyz"
opt_dim = "dimer_opt.xyz"
normal_modes_dim = "dimer_normal_modes.dat"

In [None]:
# XYZ file with the configurations of the training set
rigid_training_configs = "rigid_training_configs.xyz" 
flex_training_configs = "flex_training_configs.xyz"
normal_mode_training_configs = "normal_mode_training_configs"

ttm_training_configs = "ttm_training_configs.xyz"

# XYZ file with the configurations of the test set
rigid_test_configs = "rigid_test_configs.xyz" 
flex_test_configs = "flex_test_configs.xyz"
normal_mode_test_configs = "normal_mode_test_configs"

ttm_test_configs = "ttm_test_configs.xyz"

# Distorted monomer configurations for the flexible training set
mon_distorted = "mon_distorted.xyz"

# And the screened values
mon_screened = "mon_screened.xyz"

# XYZ file with the training set that the codes need to perform the fit
# Configurations are the same as training_configs but this file
# has the energies in the comment line
training_set = "training_set.xyz"
ttm_training_set = "ttm_training_set.xyz"

# XYZ file with the test set that the codes need to perform the fit
# Configurations are the same as test_configs but this file
# has the energies in the comment line 
test_set = "test_set.xyz"
ttm_test_set = "ttm_test_set.xyz"


In [None]:
# PostgreSQL database that stores structures and energies
database_config = "local.ini"
client_name = "pikachu"

In [None]:
my_database_settings = """[database]
host = piggy.pl.ucsd.edu
port = 5432
database = potential_fitting
username = potential_fitting
password = 9t8ARDuN2Wy49VtMOrcJyHtOzyKhkiId
"""

# Write the file. Remember to update the username and password!
with open(database_config, 'w') as ff:
    ff.write(my_database_settings)
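
Since the username and password have to be updated for each user, one option is to keep them out of the notebook and read them from environment variables. The sketch below assumes two hypothetical variable names, `MBFIT_DB_USER` and `MBFIT_DB_PASSWORD`, which are not defined by MB-Fit; the host, port and database name are copied from the cell above.

```python
import configparser
import io
import os

def build_database_config(env=os.environ):
    """Assemble the [database] section, taking credentials from the environment.

    MBFIT_DB_USER and MBFIT_DB_PASSWORD are hypothetical variable names chosen
    for this sketch; they are not defined by MB-Fit.
    """
    # interpolation=None lets passwords contain '%' without special handling.
    config = configparser.ConfigParser(interpolation=None)
    config["database"] = {
        "host": "piggy.pl.ucsd.edu",
        "port": "5432",
        "database": "potential_fitting",
        "username": env["MBFIT_DB_USER"],
        "password": env["MBFIT_DB_PASSWORD"],
    }
    return config

# Example with placeholder credentials (replace with your own).
example = build_database_config(
    {"MBFIT_DB_USER": "my_user", "MBFIT_DB_PASSWORD": "my_password"}
)
preview = io.StringIO()
example.write(preview)
print(preview.getvalue())
```

To write the result to disk, pass an open file handle for `database_config` to `example.write` instead of the in-memory buffer.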

In [None]:
# Monomers 1 and 2 separated by '_'
molecule_in = "_".join(symmetry)

# Configuration file that contains all the monomer 
# and dimer information. Will be used to generate the 2B codes.
config = "config.ini"

# Input file for the polynomial generation
poly_in = "poly.in"

# Directory where the polynomials will be generated
poly_directory = "polynomial_generation"

# Degree of the polynomials
polynomial_order = 2

In [None]:
# Directory where mb-nrg fitting code will be stored
mbnrg_directory = "mb-nrg_fitting_code"
mbnrg_fits_directory = "mb-nrg_fits"

In [None]:
# Number of configurations in the 2b training_set
num_training_configs = 1000
############################
num_rigid_training_configs = int(0.35*num_training_configs)
num_flex_training_configs = int(0.5*num_training_configs)
num_nm_training_configs = int(0.15*num_training_configs)

# Number of configurations in the 2b test set
num_test_configs = int(0.2*num_training_configs)

num_rigid_test_configs = int(0.35*num_test_configs)
num_flex_test_configs = int(0.5*num_test_configs)
num_nm_test_configs = int(0.15*num_test_configs)
############################

# Number of distorted configurations for monomer 1 and monomer 2
num_mon_distorted = 100

# Maximum energy allowed for distorted monomers (in kcal/mol)
mon_emax = 30.0

# Maximum binding energy allowed
bind_emax = 500.0

# Minimum and maximum distance between the two monomers
min_d_2b = 1.0
max_d_2b = 9.0

# Minimum fraction of the VdW distance that is allowed between any atoms that belong to different monomers
min_inter_d = 0.5

# Seeds to be used in the configuration generation to ensure different
# configurations for training and test
seed_training = 23410
seed_test = 93109

# IDs of the monomers (should be consistent with the 1B id for each)
mon_ids = ["co2","co2"]

# Number of MB-nrg fits to perform
num_mb_fits = 5
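
The cell above splits the training and test sets into 35% rigid, 50% flexible and 15% normal-mode configurations using `int(fraction * total)`, which truncates and can leave the parts summing to less than the total for other sizes. As a hedged alternative, the sketch below does the split with integer arithmetic and assigns any remainder to the largest share. `split_counts` is a hypothetical helper, not part of MB-Fit.

```python
def split_counts(total, percentages):
    """Split `total` into integer parts that follow `percentages` and sum to `total`.

    Integer arithmetic avoids floating-point truncation; any remainder goes to
    the largest share.
    """
    if sum(percentages) != 100:
        raise ValueError("percentages must sum to 100")
    parts = [total * p // 100 for p in percentages]
    parts[percentages.index(max(percentages))] += total - sum(parts)
    return parts

print(split_counts(1000, [35, 50, 15]))  # [350, 500, 150]
print(split_counts(200, [35, 50, 15]))   # [70, 100, 30]
print(split_counts(7, [35, 50, 15]))     # [2, 4, 1]
```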

### 3.2. Generate polynomials

#### 3.2.1. Generate polynomial input file

In [None]:
help(mbfit.generate_poly_input)

In [None]:
mbfit.generate_poly_input(dim_settings, molecule_in, poly_in)

#### 3.2.2. Generate polynomial files

In [None]:
help(mbfit.generate_polynomials)

In [None]:
mbfit.generate_polynomials(dim_settings, poly_in, polynomial_order, poly_directory, generate_direct_gradients=True)

#### 3.2.3. Optimize the polynomial evaluation

In [None]:
help(mbfit.execute_maple)

In [None]:
mbfit.execute_maple(dim_settings, poly_directory)

### 3.3. Geometry optimization and normal mode calculation

#### 3.3.1. Monomers

In [None]:
help(mbfit.optimize_geometry)

In [None]:
# Optimize monomer
mbfit.optimize_geometry(mon_settings, unopt_mon, opt_mon, method, basis)

In [None]:
help(mbfit.generate_normal_modes)

In [None]:
# Get its normal modes
mbfit.generate_normal_modes(mon_settings, opt_mon,normal_modes_mon, method, basis)

#### 3.3.2. Dimer

Now the same for the dimer.

In [None]:
help(mbfit.generate_2b_configurations)

In [None]:
# Generate a dimer
mbfit.generate_2b_configurations(dim_settings, opt_mon, opt_mon, 
                                             1, unopt_dim, 
                                             min_distance = 2, max_distance = 5, 
                                             min_inter_distance = 0.8, 
                                             progression=False, use_grid=False, 
                                             step_size=0.5, num_attempts=100, 
                                             logarithmic=True, distribution=None, 
                                             mol1_atom_index=None, mol2_atom_index=None, 
                                             seed=seed_training)

In [None]:
# Optimize the dimer
mbfit.optimize_geometry(dim_settings, unopt_dim, opt_dim, method, basis)

In [None]:
# Get its normal modes
mbfit.generate_normal_modes(dim_settings, opt_dim,normal_modes_dim, method, basis)

### 3.5. Obtain config file

In [None]:
# C6, d6 and A parameters are obtained from example 2
example2_c6 = [319.9415, 221.5987, 173.3298]
example2_d6 = [3.08949, 3.71685, 4.09252]
example2_a = [15312.3, 20732.5, 78777.2]

In [None]:
help(mbfit.get_system_properties)

In [None]:
chg, pol, c6 = mbfit.get_system_properties(dim_settings, config, geo_paths = [opt_mon,opt_mon])

In [None]:
help(mbfit.write_config_file)

In [None]:
mbfit.write_config_file(dim_settings, config, chg, pol, 
                                    [opt_mon, opt_mon], C6 = example2_c6, 
                                    d6=example2_d6, A=example2_a)
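
Each of the three parameter lists has three entries, which matches the number of distinct intermolecular atom-type pairs in an `A1B2_A1B2` dimer. The sketch below enumerates those pair types with `itertools`. Pairing the values with the order AA, AB, BB is an assumption made for illustration only; check the generated `config.ini` or the output of example 2 for the actual ordering.

```python
from itertools import combinations_with_replacement

# Atom types of the A1B2 label: A = C, B = O.
atom_types = ["A", "B"]

# Unordered pairs between two identical monomers: AA, AB, BB.
pairs = ["".join(p) for p in combinations_with_replacement(atom_types, 2)]

example2_c6 = [319.9415, 221.5987, 173.3298]

# ASSUMPTION: the list order follows AA, AB, BB.
for pair, c6 in zip(pairs, example2_c6):
    print(pair, c6)
```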

### 3.7. MB-nrg Training and Test Set generation

#### 3.7.1. Rigid Training Set

##### Generate configurations

In [None]:
# Training Set
mbfit.generate_2b_configurations(dim_settings, opt_mon, opt_mon, 
                                             num_rigid_training_configs, rigid_training_configs, 
                                             min_distance = min_d_2b, max_distance = max_d_2b, 
                                             min_inter_distance = min_inter_d, 
                                             progression=True, use_grid=False, 
                                             step_size=0.5, num_attempts=100, 
                                             logarithmic=True, distribution=None, 
                                             mol1_atom_index=None, mol2_atom_index=None, 
                                             seed=seed_training)
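
The `min_inter_distance` argument is described in the settings as the minimum fraction of the VdW distance allowed between atoms of different monomers. The sketch below shows one way such a check could work, assuming that the VdW distance of a pair is the sum of the two van der Waals radii. The Bondi radii, the 1.16 Å C-O bond length and the helper names are assumptions for illustration; this is not MB-Fit's implementation.

```python
import math

# Bondi van der Waals radii in angstrom (values assumed for this sketch).
VDW_RADIUS = {"C": 1.70, "O": 1.52}

def passes_min_inter_distance(mol1, mol2, min_fraction):
    """Return True if every intermolecular atom pair is at least
    min_fraction * (r_vdw_1 + r_vdw_2) apart."""
    for symbol1, xyz1 in mol1:
        for symbol2, xyz2 in mol2:
            limit = min_fraction * (VDW_RADIUS[symbol1] + VDW_RADIUS[symbol2])
            if math.dist(xyz1, xyz2) < limit:
                return False
    return True

def co2_at(x_offset, y_offset):
    """Linear CO2 along x, centred at (x_offset, y_offset, 0)."""
    return [("C", (x_offset, y_offset, 0.0)),
            ("O", (x_offset + 1.16, y_offset, 0.0)),
            ("O", (x_offset - 1.16, y_offset, 0.0))]

print(passes_min_inter_distance(co2_at(0, 0), co2_at(0, 4.0), 0.5))  # True
print(passes_min_inter_distance(co2_at(0, 0), co2_at(0, 1.0), 0.5))  # False
```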

In [None]:
# Test Set
mbfit.generate_2b_configurations(dim_settings, opt_mon, opt_mon, 
                                             num_rigid_test_configs, rigid_test_configs, 
                                             min_distance = min_d_2b, max_distance = max_d_2b, 
                                             min_inter_distance = min_inter_d, 
                                             progression=True, use_grid=False, 
                                             step_size=0.5, num_attempts=100, 
                                             logarithmic=True, distribution=None, 
                                             mol1_atom_index=None, mol2_atom_index=None, 
                                             seed=seed_test)
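
The settings define separate seeds, `seed_training` and `seed_test`, so that the training and test configurations differ. The sketch below illustrates why this matters using Python's `random` module as a stand-in: the same seed reproduces the same draws, while a different seed gives a different sequence. It does not use MB-Fit's own generator, and `sample_distances` is a hypothetical helper.

```python
import random

def sample_distances(seed, count, low=1.0, high=9.0):
    """Draw `count` pseudo-random separations from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(count)]

seed_training = 23410
seed_test = 93109

# Same seed: identical sequence. Different seed: different sequence.
same = sample_distances(seed_training, 5) == sample_distances(seed_training, 5)
different = sample_distances(seed_training, 5) != sample_distances(seed_test, 5)
print(same, different)  # True True
```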

##### Add configurations to the database

In [None]:
help(mbfit.init_database)

In [None]:
# Training set
mbfit.init_database(dim_settings, database_config, rigid_training_configs, 
                                method, basis, cp, "train_rig_ex3", optimized = False)

In [None]:
# Test Set
mbfit.init_database(dim_settings, database_config, rigid_test_configs, 
                                method, basis, cp, "test_rig_ex3", optimized = False)

In [None]:
# Add monomer optimized geometry to database (needed for binding energy)
mbfit.init_database(mon_settings, database_config, opt_mon, method, basis, cp, "train_rig_ex3", optimized = True)
mbfit.init_database(mon_settings, database_config, opt_mon, method, basis, cp, "test_rig_ex3", optimized = True)

#### 3.7.2. Flexible Configurations

##### Generate distorted monomer configurations

In [None]:
help(mbfit.generate_normal_mode_configurations)

In [None]:
# Generate the normal mode configurations for the monomers:
mbfit.generate_normal_mode_configurations(mon_settings, opt_mon, normal_modes_mon,
                                          mon_distorted, number_of_configs=num_mon_distorted,
                                          seed=seed_training + 1, classical=True)

##### Add them to the database along with the optimized geometries

In [None]:
# Add configurations to database
mbfit.init_database(mon_settings, database_config, 
                                mon_distorted, method, basis, cp, 
                                "mondist_ex3", optimized = False)

In [None]:
# Now add optimized geometries
mbfit.init_database(mon_settings, database_config, opt_mon, 
                                method, basis, cp, "mondist_ex3", optimized = True)


##### Calculate their energy

In [None]:
mbfit.fill_database(mon_settings, database_config, client_name, "mondist_ex3", 
                                calculation_count = None)

##### Retrieve the configurations

In [None]:
help(mbfit.generate_training_set)

In [None]:
mbfit.generate_training_set(mon_settings, database_config, 
                                        mon_screened, method, basis, cp, 
                                        "mondist_ex3", 
                                        e_bind_max=bind_emax,
                                        e_mon_max=mon_emax)

##### Generate the flexible training and test set configurations

In [None]:
# Training set
mbfit.generate_2b_configurations(dim_settings, mon_screened, mon_screened, 
                                             num_flex_training_configs, flex_training_configs, 
                                             min_distance = min_d_2b, max_distance = max_d_2b, 
                                             min_inter_distance = min_inter_d, 
                                             progression=True, use_grid=False, 
                                             step_size=0.5, num_attempts=100, 
                                             logarithmic=True, distribution=None, 
                                             mol1_atom_index=None, mol2_atom_index=None, 
                                             seed=seed_training + 10)

In [None]:
# Test set
mbfit.generate_2b_configurations(dim_settings, mon_screened, mon_screened, 
                                             num_flex_test_configs, flex_test_configs, 
                                             min_distance = min_d_2b, max_distance = max_d_2b, 
                                             min_inter_distance = min_inter_d, 
                                             progression=True, use_grid=False, 
                                             step_size=0.5, num_attempts=100, 
                                             logarithmic=True, distribution=None, 
                                             mol1_atom_index=None, mol2_atom_index=None, 
                                             seed=seed_test + 10)

##### Add them to the database

In [None]:
# Training set
mbfit.init_database(dim_settings, database_config, flex_training_configs, 
                                method, basis, cp, "train_flex_ex3", optimized = False)

In [None]:
# Test Set
mbfit.init_database(dim_settings, database_config, flex_test_configs, 
                                method, basis, cp, "test_flex_ex3", optimized = False)

In [None]:
# Add monomer optimized geometry to database (needed for binding energy)
mbfit.init_database(mon_settings, database_config, opt_mon, method, basis, cp, "train_flex_ex3", optimized = True)
mbfit.init_database(mon_settings, database_config, opt_mon, method, basis, cp, "test_flex_ex3", optimized = True)

#### 3.7.3. Normal mode training set

##### Generate the configurations

Here we generate normal-mode configurations for the dimer at a low temperature, so that sampling stays close to the minimum and avoids overly distorted configurations.
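
To see why a lower temperature keeps the sampling near the minimum, consider a classical harmonic oscillator, whose RMS displacement is $\sigma = \sqrt{k_B T / (\mu \omega^2)}$ and therefore grows as $\sqrt{T}$. The sketch below evaluates this expression for an illustrative soft mode. The 50 cm$^{-1}$ wavenumber is a made-up value, 22 amu is the reduced mass of two 44-amu CO2 molecules, and the formula is a textbook estimate, not the sampling scheme that MB-Fit uses.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
C_CM_PER_S = 2.99792458e10  # speed of light, cm/s
AMU = 1.66053906660e-27     # atomic mass unit, kg

def classical_rms_displacement(wavenumber_cm1, reduced_mass_amu, temperature_k):
    """RMS displacement (angstrom) of a classical harmonic oscillator:
    sigma = sqrt(k_B * T / (mu * omega**2))."""
    omega = 2.0 * math.pi * C_CM_PER_S * wavenumber_cm1  # angular frequency, rad/s
    mu = reduced_mass_amu * AMU                           # reduced mass, kg
    sigma_m = math.sqrt(K_B * temperature_k / (mu * omega ** 2))
    return sigma_m * 1e10                                 # metres to angstrom

for temperature in (100, 400):
    sigma = classical_rms_displacement(50.0, 22.0, temperature)
    print(f"T = {temperature} K: sigma = {sigma:.3f} angstrom")
```

Raising the temperature by a factor of four doubles the width of the sampled distribution in this model.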

In [None]:
# Training Set
mbfit.generate_normal_mode_configurations(dim_settings, opt_dim, normal_modes_dim, 
                                                      normal_mode_training_configs, 
                                                      num_nm_training_configs, 
                                                      seed_training + 20, temperature = 100)

In [None]:
# Test Set
mbfit.generate_normal_mode_configurations(dim_settings, opt_dim, normal_modes_dim, 
                                                      normal_mode_test_configs, 
                                                      num_nm_test_configs, 
                                                      seed_test + 20, temperature = 100)

##### Add them to the database

In [None]:
# Training Set
mbfit.init_database(dim_settings, database_config, 
                                normal_mode_training_configs, method, basis, cp, 
                                "train_nm_ex3", optimized = False)

In [None]:
# Test Set
mbfit.init_database(dim_settings, database_config, 
                                normal_mode_test_configs, method, basis, cp, 
                                "test_nm_ex3", optimized = False)

In [None]:
# Add monomer optimized geometry to database (needed for binding energy)
mbfit.init_database(mon_settings, database_config, opt_mon, method, basis, cp, "train_nm_ex3", optimized = True)
mbfit.init_database(mon_settings, database_config, opt_mon, method, basis, cp, "test_nm_ex3", optimized = True)

#### 3.7.4. Fill the database

In [None]:
mbfit.fill_database(dim_settings, database_config, client_name, 
                                "train_rig_ex3", 
                                "test_rig_ex3",
                                "train_flex_ex3", 
                                "test_flex_ex3",
                                "train_nm_ex3", 
                                "test_nm_ex3",
                                calculation_count = None)

#### 3.7.5. Training set and Test set generation

This step generates the training set file in the format required by the fitting codes. Even if your database contains energies computed with several methods or basis sets, **only one method and basis can be used in the same training set**. The file has the same format as the configuration files generated in the previous steps, except that the comment line of each configuration now holds its binding energy and n-body energy.
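
As a sketch of how such a file could be read back for inspection, the parser below walks through a multi-frame XYZ string and extracts the two energies from each comment line. It assumes that the comment line holds two whitespace-separated numbers, binding energy first and n-body energy second; the coordinates and energies in the sample are made up, and `read_training_set` is a hypothetical helper rather than an MB-Fit function.

```python
def read_training_set(text):
    """Parse a multi-frame XYZ string into a list of frames.

    Assumption: each comment line holds two whitespace-separated numbers,
    the binding energy followed by the n-body energy.
    """
    lines = text.strip().splitlines()
    frames = []
    i = 0
    while i < len(lines):
        n_atoms = int(lines[i])
        binding, nbody = (float(v) for v in lines[i + 1].split()[:2])
        atoms = [line.split() for line in lines[i + 2:i + 2 + n_atoms]]
        frames.append({"binding": binding, "nbody": nbody, "atoms": atoms})
        i += 2 + n_atoms
    return frames

# Made-up sample with two CO2-CO2 configurations.
sample = """6
-1.25 -1.10
C 0.0 0.0 0.0
O 1.16 0.0 0.0
O -1.16 0.0 0.0
C 0.0 3.5 0.0
O 1.16 3.5 0.0
O -1.16 3.5 0.0
6
0.40 0.55
C 0.0 0.0 0.0
O 1.16 0.0 0.0
O -1.16 0.0 0.0
C 0.0 2.8 0.0
O 1.16 2.8 0.0
O -1.16 2.8 0.0
"""

frames = read_training_set(sample)
print(len(frames), [f["binding"] for f in frames])  # 2 [-1.25, 0.4]
```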

In [None]:
# Obtain training set
mbfit.generate_training_set(dim_settings, database_config, 
                                        training_set, method, basis, cp, 
                                        "train_rig_ex3", 
                                        "train_flex_ex3", 
                                        "train_nm_ex3", 
                                        e_bind_max = bind_emax, e_mon_max = mon_emax)

In [None]:
# Obtain test set
mbfit.generate_training_set(dim_settings, database_config, 
                                        test_set, method, basis, cp,
                                        "test_rig_ex3", 
                                        "test_flex_ex3",
                                        "test_nm_ex3", 
                                        e_bind_max = bind_emax, e_mon_max = mon_emax)

### 3.8. MB-nrg fit

#### 3.8.1. Obtain and compile the fitting code

In [None]:
help(mbfit.generate_mbnrg_fitting_code)

In [None]:
os.chdir(main_dir)
mbfit.generate_mbnrg_fitting_code(dim_settings, config, 
                                              poly_in, poly_directory, 
                                              polynomial_order, mbnrg_directory, 
                                              use_direct=False)

In [None]:
help(mbfit.compile_fit_code)

In [None]:
mbfit.compile_fit_code(dim_settings, mbnrg_directory)

#### 3.8.2. Perform the fit

In [None]:
help(mbfit.prepare_fits)

In [None]:
mbfit.prepare_fits(dim_settings, mbnrg_directory, 
                               training_set, mbnrg_fits_directory, 
                               DE=20, alpha=0.0005, num_fits=num_mb_fits, 
                               ttm=False, over_ttm=False)

In [None]:
help(mbfit.execute_fits)

In [None]:
mbfit.execute_fits(dim_settings, mbnrg_fits_directory)

In [None]:
help(mbfit.retrieve_best_fit)

In [None]:
mbfit.retrieve_best_fit(dim_settings, mbnrg_fits_directory, fitted_nc_path = "mbnrg.nc")

### 3.9. Visualize the fit

In [None]:
help(mbfit.get_correlation_data)

In [None]:
energies = mbfit.get_correlation_data(dim_settings, mbnrg_directory, 
                                                  mbnrg_fits_directory, test_set,
                                                  min_energy_plot = -5.0,
                                                  max_energy_plot = 50.0,
                                                  split_energy = 10.0)
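
A common way to summarise a correlation between reference and fitted energies is the root-mean-square deviation (RMSD), reported separately below and above an energy threshold. The sketch below computes this for plain Python lists. The data are made up, the structure returned by `get_correlation_data` is not assumed here, and treating `split_energy` as such a threshold is an assumption about its meaning.

```python
import math

def rmsd(pairs):
    """Root-mean-square deviation over (reference, fitted) pairs, or None if empty."""
    if not pairs:
        return None
    return math.sqrt(sum((r - f) ** 2 for r, f in pairs) / len(pairs))

def split_rmsd(reference, fitted, split_energy):
    """Return (low, high) RMSD for reference energies <= and > split_energy."""
    pairs = list(zip(reference, fitted))
    low = [p for p in pairs if p[0] <= split_energy]
    high = [p for p in pairs if p[0] > split_energy]
    return rmsd(low), rmsd(high)

# Made-up energies for illustration.
reference = [0.0, 5.0, 20.0]
fitted = [1.0, 5.0, 17.0]
low, high = split_rmsd(reference, fitted, split_energy=10.0)
print(f"low-energy RMSD = {low:.3f}, high-energy RMSD = {high:.3f}")
```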

### 3.10. Add files to MBX

In [None]:
help(mbfit.generate_MBX_files)

In [None]:
mbfit.generate_MBX_files(dim_settings, config, mon_ids, 
                                     do_ttmnrg=False, mbnrg_fits_path=mbnrg_fits_directory,  
                                     MBX_HOME = None, version = "v1")