# MB-Fit tutorial (v20190924)

This notebook walks you through the different ways to obtain many-body fits for a variety of molecules.



## Chapter 0. Set up the notebook.

### 0.1. Import the python library
To import the library without errors, the environment must be set up in the bash terminal from which you launch the notebook. If you have not done so, please close the notebook and run the following in a bash terminal:
```sh
cd HOME/DIRECTORY/OF/POTENTIAL_FITTING
source install.sh
```
Now the following command should run without any problem.

In [None]:
# This is for testing purposes. Can be ignored.
%load_ext autoreload
%autoreload 2

In [None]:
# The library that will enable the fitting generation and energy calculation
import potential_fitting
# Some other useful libraries
import os

## Chapter 1. One-body MB-nrg PEF for CO2

### 1.1. Define variables, filepaths, and folders to work in

The working directory is the path where the files are expected to be. If a file is not in the working directory, its full path must be provided. To run this example for CO2, we will create a folder called `chapter1_co2_1b_mbnrg` in the current directory and move into it.

In [None]:
main_dir = os.getcwd()
os.system("mkdir -p chapter1_co2_1b_mbnrg")
os.chdir("chapter1_co2_1b_mbnrg")

#### Specifications of the QC calculations
Next we define the method that we want to use to calculate energies, along with some other technical details.

In [None]:
# The software that will be used to perform all the calculations
#code = "qchem"
code = "psi4"

# The quantum chemistry method we want to use
method = "HF"
#method = "MP2"
#method = "wb97m-v"

# Basis set to use. Must be pre-defined in the software. Custom basis sets not implemented yet.
basis = "STO-3G"

# Use counter-poise correction or not.
cp = False
#cp = True

# Number of threads and memory we would like to use
num_threads = 2
memory = "4GB"

# This is the path where all the log files will be stored.
log_path = "logs"

#### Monomer specifications
This section defines the properties of each monomer. In this case there is only one monomer, but the properties must still be given as lists, because systems with several monomers need one entry per monomer.

In [None]:
# Names that will identify the monomers. This is used for identification purposes only.
names = ["CO2"]

# Number of atoms of each monomer
number_of_atoms = [3]

# Charge of each monomer
charges = [0]

# Spin multiplicity of each monomer
spin = [1]

# Use MB-pol for water (if applicable). 
# If 1, the Partridge-Schwenke PEF with position-dependent charges will be used for water.
use_mbpol = [0]

The symmetry tag requires some explanation. It encodes the atom types of each monomer. Some examples are `symmetry = ["A1B4"]` for a methane monomer, `symmetry = ["A1B2","A1B2"]` for a CO2 dimer, `symmetry = ["A6B6","C1D2"]` for a benzene -- water dimer without lone pairs, and `symmetry = ["A1B2Z2","C1D2"]` for a H2O -- SO2 dimer with lone pairs. The rules are the following:
* Symmetry names must be written in capital letters and start with A for the first atom of the first monomer. Each new atom type is assigned the next letter of the alphabet.
* Exchangeable atoms must have the same label, even if they are in different molecules.
* As of today, no more than 9 atoms of the same atom type are accepted.
* If there are virtual sites, such as lone pairs, that play a role in the polynomials, they must be labeled with the letters X, Y, or Z.
* If two groups inside the same molecule have the same symmetry, they should be separated. As an example, DMSO should have the symmetry `symmetry = ["A1B3_A1B3_C1D1"]`. This allows the two methyl groups to be permuted as whole units, but does not allow individual carbons or hydrogens to be exchanged between them.
* The symmetry order MUST match the XYZ order.
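
As a rough illustration of the rules above, the following sketch splits a symmetry string into atom-type labels and counts, and adds up the number of sites it describes. `parse_symmetry` and `sites_in_fragment` are hypothetical helpers written for this tutorial; they are not part of `potential_fitting`, and they only check the format of the string, not whether the atoms are really exchangeable.

```python
import re

def parse_symmetry(fragment_symmetry):
    """Return the (label, count) pairs of one fragment, e.g. 'A1B2' -> [('A', 1), ('B', 2)]."""
    pairs = []
    # Groups with the same internal symmetry are separated by '_' (see the DMSO example).
    for group in fragment_symmetry.split("_"):
        # Each group is a run of capital letters, each followed by a single digit 1-9.
        if not re.fullmatch(r"(?:[A-Z][1-9])+", group):
            raise ValueError("Invalid symmetry group: " + repr(group))
        for label, count in re.findall(r"([A-Z])([1-9])", group):
            pairs.append((label, int(count)))
    return pairs

def sites_in_fragment(fragment_symmetry):
    """Total number of sites (atoms plus any virtual sites) described by the string."""
    return sum(count for _, count in parse_symmetry(fragment_symmetry))

print(parse_symmetry("A1B2"))               # CO2 monomer
print(sites_in_fragment("A1B3_A1B3_C1D1"))  # DMSO
```

For the CO2 monomer the site count is 3, which should agree with the corresponding entry of `number_of_atoms`.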

The SMILES tag also requires some explanation. One way to obtain a SMILES string is with Open Babel:
`obabel -ixyz input.xyz -osmiles -O smiles.txt`
The order of the atoms in the SMILES string must also match the XYZ order.

In [None]:
# Symmetry of the molecule
symmetry = ["A1B2"]

# SMILES string
smiles = ["C(O)O"]

#### Creating files needed by the code

As of 08/12/2019, the `settings` files are still needed. This example is only for a CO2 monomer, so only one settings file is needed.

In [None]:
# Settings for monomer
mon_settings = "monomer_settings.ini"

my_settings_file = """
[files]
# Local path directory to write log files in
log_path = """ + log_path + """

[config_generator]
# what library to use for geometry optimization and normal mode generation
code = """ + code + """
# use geometric or linear progression for T and A in config generation, exactly 1 must be True
geometric = False
linear = False

[energy_calculator]
# what library to use for energy calculations
code = """ + code + """

[psi4]
# memory to use when doing a psi4 calculation
memory = """ + memory + """
# number of threads to use when executing a psi4 calculation
num_threads = """ + str(num_threads) + """

[qchem]
# number of threads to use when executing a qchem calculation
num_threads = """ + str(num_threads) + """

[molecule]
# name of fragments, separated by commas
names = """ + names[0] + """
# number of atoms in each fragment, separated by commas
fragments = """ + str(number_of_atoms[0]) + """
# charge of each fragment, separated by commas
charges = """ + str(charges[0]) + """
# spin multiplicity of each fragment, separated by commas
spins = """ + str(spin[0]) + """
# tag when putting geometries into database
tag = none
# Use or not MB-pol
use_mbpol = """ + str(use_mbpol[0]) + """
# symmetry of each fragment, separated by commas
symmetry = """ + symmetry[0] + """
SMILES = """ + smiles[0] + """
"""

In [None]:
# Write the file:
with open(mon_settings, 'w') as ff:
    ff.write(my_settings_file)
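
Before moving on, it can be useful to check that the settings text is valid INI. The sketch below parses a shortened copy of the `[molecule]` section with Python's standard `configparser` module. Whether `potential_fitting` itself reads the file this way is an assumption; this is only a sanity check, and `example_settings` is an illustrative stand-in for the settings string written above.

```python
import configparser

# Shortened, illustrative copy of the [molecule] section of the settings file.
example_settings = """
[molecule]
# name of fragments, separated by commas
names = CO2
fragments = 3
charges = 0
spins = 1
symmetry = A1B2
SMILES = C(O)O
"""

parser = configparser.ConfigParser()
parser.read_string(example_settings)
molecule = parser["molecule"]

# configparser lowercases option names by default, so SMILES is stored as "smiles".
fragments = [int(n) for n in molecule["fragments"].split(",")]
symmetries = molecule["symmetry"].split(",")

# Every fragment needs exactly one symmetry string.
assert len(fragments) == len(symmetries)
print(fragments, symmetries, molecule["smiles"])
```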

The unoptimized geometry of the monomer is provided as an [XYZ formatted file](https://en.wikipedia.org/wiki/XYZ_file_format). 

In [None]:
# XYZ file that contains the unoptimized geometry of monomer 1
unopt_mon = "monomer.xyz"

my_unopt_monomer = """3
unoptimized co2
C   0   0   0
O   1.3   0   0
O   -1.3  0   0
"""

In [None]:
# Write the file:
with open(unopt_mon, 'w') as ff:
    ff.write(my_unopt_monomer)
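
The XYZ layout used above (atom count, comment line, then one line per atom) can be checked with a few lines of Python. The sketch below is a minimal reader for a single-frame XYZ string; `read_xyz` is a hypothetical helper for illustration and is not a function of `potential_fitting`.

```python
def read_xyz(text):
    """Parse a single-frame XYZ string into (comment, [(symbol, x, y, z), ...])."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])   # first line: number of atoms
    comment = lines[1]        # second line: free-form comment
    atoms = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError("XYZ header announces %d atoms, found %d" % (n_atoms, len(atoms)))
    return comment, atoms

example_xyz = """3
unoptimized co2
C   0   0   0
O   1.3   0   0
O   -1.3  0   0
"""

comment, atoms = read_xyz(example_xyz)
# The element order (C, O, O) is the order the symmetry string A1B2 must follow.
print(comment, [symbol for symbol, _, _, _ in atoms])
```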

#### Defining files that will be written by the code

In [None]:
# XYZ file that contains the optimized geometry of monomer 1
opt_mon = "monomer_opt.xyz"

# File where the normal modes of monomer 1 will be written
normal_modes_mon = "monomer_normal_modes.dat"

Training and test set files. The `training_configs` and `test_configs` files will contain the configurations generated by the training set generation functions. Only the geometry (i.e., the coordinates of all atoms for each configuration) is stored in those files. Later on, we will calculate the energies of these configurations and create files in the format required by the fitting code. These new files are defined by `training_set` and `test_set`. Their coordinates are the same as in `training_configs` and `test_configs`, but the comment line now contains the energies needed by the fitting code.

In [None]:
# XYZ file with the configurations of the training set
training_configs = "training_configs.xyz"

# XYZ file with the configurations of the test set
test_configs = "test_configs.xyz"

# XYZ file with the training set that the codes need to perform the fit
# Configurations are the same as training_configs but this file
# has the energies in the comment line
training_set = "training_set.xyz"

# XYZ file with the test set that the codes need to perform the fit
# Configurations are the same as test_configs but this file
# has the energies in the comment line 
test_set = "test_set.xyz"

Information about the training set, the test set, and the energies is stored in a `PostgreSQL` database. In principle there is no need to interact with this database directly, since everything is automated, but you might want to retrieve some information at some point.

The database_config `.ini` file should contain one section `[database]` with 5 properties:
* `host`: The address of the server where the database is hosted.
* `port`: The port used to connect to the database.
* `database`: The name of the database.
* `username`: Your username to connect to the database.
* `password`: Your password to connect to the database.

For now use these parameters:

* `host`: piggy.pl.ucsd.edu
* `port`: 5432
* `database`: potential_fitting
* `username`: potential_fitting
* `password`: Please contact Ethan or Kaushik for the password.

The username potential_fitting was established as a general username for anyone who only needs basic access to the database. Alternatively, each user has their own username and password. For most of you, these should be the same as your UCSD email prefix and password.

The database configuration file (called `local.ini` in the cell below) does not exist in the git repo, so you will have to create it and update the variable below to point to it. Python's `open` does not expand `~` to your home directory, so provide a relative or absolute path instead. It is recommended that you create the file in your home directory.

<h3 style="color:red;">Make sure only you have read access to this file using the chmod command or else anyone on our fileserver will be able to see your password and <b>PLEASE DO NOT ACCIDENTALLY COMMIT A FILE CONTAINING YOUR PASSWORD VIA GIT!</b></h3>
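
One way to follow both pieces of advice (expanding `~` and restricting read access) directly from Python is sketched below. It assumes a POSIX file system; `write_private_file` is a hypothetical helper, not part of `potential_fitting`, and the demonstration writes a placeholder password into a temporary directory rather than into a real configuration file.

```python
import os
import stat
import tempfile

def write_private_file(path, contents):
    """Write contents to path (with ~ expanded) so that only the owner can read or write it."""
    full_path = os.path.expanduser(path)
    # Create the file with mode 0o600 from the start so it is never world-readable.
    fd = os.open(full_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(contents)
    # If the file already existed, tighten its permissions as well.
    os.chmod(full_path, 0o600)
    return full_path

# Demonstration in a temporary directory with a placeholder instead of a real password.
demo_dir = tempfile.mkdtemp()
demo_path = write_private_file(os.path.join(demo_dir, "local.ini"),
                               "[database]\npassword = YOUR_PASSWORD_HERE\n")
demo_mode = stat.S_IMODE(os.stat(demo_path).st_mode)
print(oct(demo_mode))
```

The same effect can be obtained from the shell with `chmod 600` on the file.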

`client_name` is used in the database to track which machines performed which calculations. Please use something that indicates where you are running the calculations.

In [None]:
# PostgreSQL database that stores structures and energies
database_config = "local.ini"
client_name = "motzu the survivor"

In [None]:
my_database_settings = """[database]
host = piggy.pl.ucsd.edu
port = 5432
database = potential_fitting
username = potential_fitting
password = YOUR_PASSWORD_HERE
"""

# Write the file. Remember to update the username and password!
with open(database_config, 'w') as ff:
    ff.write(my_database_settings)

These variables will be used by the polynomial generation functions. 
- `poly_in` is the name of the file that will contain all the information about the polynomials: distances, variables, filters... Once it is created, extra filters can be added; how to add them is explained later in the tutorial.
- `molecule_in` is the symmetry name of your system. It must match the symmetry specified in the settings file and follow the same rules. If you are fitting a system larger than a monomer (two-body, three-body...), it is the symmetry strings of the monomers joined by underscores (`_`). As an example, for a CO2 dimer, `molecule_in = A1B2_A1B2`, while for a NH4+ -- H2O dimer, `molecule_in = A1B4_C1D2`.
- `poly_directory` is the folder that will be created to hold all the generated polynomial files.
- `monX_config` is the config file of monomer X, which should have been generated during the 1b fit for that monomer.
- `dimX_config` is the config file of dimer X, which should have been generated during the 2b fit for that dimer.
- `config` is the name of the file that will contain all the chemical and physical information about the monomers, such as charges, C6 coefficients, polarizabilities... It will be generated by the code.
- `polynomial_order` specifies the maximum order of the polynomials. 
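
The `molecule_in` string can be built directly from the `symmetry` list, which is how Chapter 2 of this tutorial does it. The small sketch below wraps that in a function; `build_molecule_in` is a hypothetical name used only for illustration.

```python
def build_molecule_in(symmetry):
    """Join the per-monomer symmetry strings with underscores."""
    return "_".join(symmetry)

print(build_molecule_in(["A1B2"]))          # CO2 monomer
print(build_molecule_in(["A1B2", "A1B2"]))  # CO2 dimer
print(build_molecule_in(["A1B4", "C1D2"]))  # NH4+ -- H2O dimer
```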

In [None]:
# Input file for the polynomial generation
poly_in = "poly.in"

# Symmetry of the system. For a single monomer this is just its symmetry string
molecule_in = symmetry[0]

# Directory where the polynomials will be generated
poly_directory = "polynomial_generation"

# Configuration file that contains all the monomer
# information. Will be used to generate the fitting code.
config = "config.ini"

# Degree of the polynomials
polynomial_order = 2

#### Directories for the different sections

These variables specify the directories where the fitting code for each type of PEF is going to be created.
- `mbnrg_directory` will contain the code that fits MB-nrg PEFs for the system specified.

In [None]:
# Directory where mb-nrg fitting code will be stored
mbnrg_directory = "mb-nrg_fit"

#### Multiple variables that will be used later

In [None]:
# Number of configurations in the 1b training_set
num_training_configs = 500

# Number of configurations in the 1b test set
num_test_configs = 100

# Maximum energy allowed for distorted monomers (in kcal/mol)
mon_emax = 100.0

# Maximum binding energy allowed
bind_emax = 500.0

# Seeds to be used in the configuration generation to ensure different
# configurations for training and test
seed_training = 12345
seed_test = 54321

# IDs of the monomers (should be consistent with the 1B id for each)
mon_ids = ["co2"]

# Number of MB-nrg fits to perform
num_mb_fits = 5

### 1.2. Generate polynomials

We first generate the polynomials to see how many parameters they contain. A recommended ratio is twenty times as many configurations in the training set as there are polynomial parameters.
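
The twenty-to-one recommendation is simple arithmetic, sketched below. The function names are hypothetical, and the parameter count of 20 is only a placeholder: the real number is known once the polynomials have been generated.

```python
def recommended_training_size(num_parameters, ratio=20):
    """Number of training configurations suggested for a given number of parameters."""
    return ratio * num_parameters

def has_enough_configs(num_training_configs, num_parameters, ratio=20):
    """True if the training set meets the recommended configurations-to-parameters ratio."""
    return num_training_configs >= recommended_training_size(num_parameters, ratio)

# Placeholder value; read the real count from the generated polynomial files.
example_num_parameters = 20
print(recommended_training_size(example_num_parameters))
print(has_enough_configs(500, example_num_parameters))
```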

#### 1.2.1. Generate polynomial input file

This call generates a polynomial input file based on the symmetry of the system specified (here, the CO2 monomer). 

*Note. Write some more info and doc for the input. Filters by default, new filters that can be added...*
<p style="color:red;">Note: the database filepath argument has been exchanged for the database_config in generate_poly_input_from_database </p>

In [None]:
potential_fitting.generate_poly_input(mon_settings, molecule_in, poly_in)

#### 1.2.2. Generate maple input files

Generate polynomials of the degree specified at the beginning, based on the polynomial input file that we have generated in the previous step.

In [None]:
potential_fitting.generate_polynomials(mon_settings, poly_in, polynomial_order, poly_directory)

#### 1.2.3. Optimize the polynomial evaluation

The maple input files define the non-optimized polynomials. The polynomials can sometimes be large, and **Maple** is able to optimize them to perform the minimum number of floating point operations (FLOPs). It will output three different files: one with non-optimized polynomials, one with optimized polynomials with gradient evaluation, and one without gradient evaluation.

In [None]:
potential_fitting.execute_maple(mon_settings, poly_directory)

### 1.3. Geometry optimization and normal mode calculation

#### 1.3.1. Monomers

Performs a geometry optimization of the monomer at the level of theory specified in `monomer_settings.ini`. **Before running these commands** please make sure that the specifications in the sections `[config_generator]` and `[molecule]` of the corresponding `settings.ini` file are correct and consistent.

In [None]:
# Optimize monomer
potential_fitting.optimize_geometry(mon_settings, unopt_mon, opt_mon, method, basis)

In [None]:
# Get its normal modes
potential_fitting.generate_normal_modes(mon_settings, opt_mon, normal_modes_mon, method, basis)

### 1.4. Training and test set generation

#### 1.4.1. Generate configurations 

Generates configurations using the normal modes previously calculated. The generated configurations will be stored in XYZ formatted files with the names we have previously given.

In [None]:
# Get some for the training set
potential_fitting.generate_normal_mode_configurations(mon_settings, opt_mon, normal_modes_mon, training_configs, num_training_configs, seed_training)

In [None]:
# And some for the test set
potential_fitting.generate_normal_mode_configurations(mon_settings, opt_mon, normal_modes_mon, test_configs, num_test_configs, seed_test)

#### 1.4.2. Add configurations to the database

The configurations generated in the previous step will be added to the database. **This step will only add the configurations, not calculate the energy**.

The method, basis, and cp need not be the same as those used for the geometry optimization: the optimization and normal mode calculation can be performed at a different level of theory than the energy evaluation. **The recommendation is to use the same settings**, but the choice is up to the user.

In [None]:
# Add monomer training set configurations
potential_fitting.init_database(mon_settings, database_config, training_configs, method, basis, cp, "training-ch1", optimized = False)

# Add monomer 1 optimized geometry to database (needed for binding energy)
potential_fitting.init_database(mon_settings, database_config, opt_mon, method, basis, cp, "training-ch1", optimized = True)

In [None]:
# Add monomer test set configurations
potential_fitting.init_database(mon_settings, database_config, test_configs, method, basis, cp, "test-ch1", optimized = False)

# Add monomer 1 optimized geometry to database (needed for binding energy)
potential_fitting.init_database(mon_settings, database_config, opt_mon, method, basis, cp, "test-ch1", optimized = True)

#### 1.4.3. Calculate energy

Loops through every uncalculated energy in the database and calculates it. This can take a while, depending on the method and basis set you use. The optional argument `calculation_count`, when set to an integer, limits the number of calculations performed.

In [None]:
potential_fitting.fill_database(mon_settings, database_config, client_name, "training-ch1", "test-ch1", calculation_count = None)

#### 1.4.4. Training set and Test set generation

Generates the training set file in the format required by the fitting codes. Even if your database contains energies computed with a variety of methods and basis sets, **only one method and basis can be used in the same training set**. The format of the training set is the same as that of the configurations generated in previous steps. The difference is that the comment line now contains the binding energy and the n-body energy of that configuration.
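
To make the file layout concrete, the sketch below reads a multi-frame XYZ string whose comment lines hold energies. It assumes that the comment line consists only of whitespace-separated numbers (binding energy first, then n-body energy); that layout is an assumption made for illustration, so check a generated `training_set.xyz` before relying on it. `read_training_set` is a hypothetical helper, and the energies in the example are made-up numbers.

```python
def read_training_set(text):
    """Parse a multi-frame XYZ string into a list of (energies, atom_symbols) tuples."""
    lines = text.strip().splitlines()
    frames = []
    i = 0
    while i < len(lines):
        n_atoms = int(lines[i])
        # Assumed layout: the comment line holds only whitespace-separated energies.
        energies = [float(value) for value in lines[i + 1].split()]
        symbols = [line.split()[0] for line in lines[i + 2:i + 2 + n_atoms]]
        frames.append((energies, symbols))
        i += n_atoms + 2
    return frames

# Two illustrative frames; the energies are invented for this example.
example_training_set = """3
1.25 1.25
C 0.0 0.0 0.0
O 1.2 0.0 0.0
O -1.2 0.0 0.0
3
7.50 7.50
C 0.0 0.0 0.0
O 1.3 0.0 0.0
O -1.1 0.0 0.0
"""

frames = read_training_set(example_training_set)
print(len(frames), [energies[0] for energies, _ in frames])
```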

In [None]:
# Generate training set
potential_fitting.generate_training_set(mon_settings, database_config, training_set, method, basis, cp, "training-ch1", e_bind_max = bind_emax, e_mon_max = mon_emax)

# Generate test set
potential_fitting.generate_training_set(mon_settings, database_config, test_set, method, basis, cp, "test-ch1", e_bind_max = bind_emax, e_mon_max = mon_emax)

### 1.5. Obtain charges, polarizabilities, and C6

In order to perform the fit, charges, polarizabilities, C6 coefficients, and other properties of the monomer have to be calculated. For now, this is done with the software **QChem**. The following call computes these properties for you. The predefined method and basis set are wb97m-v/avtz. This step can take a long time if the molecule is large.

After the calculation is completed, all the information needed for the fits (both MB-nrg and TTM-nrg) will be added to the configuration file specified at the beginning.

In [None]:
potential_fitting.generate_fitting_config_file_new(mon_settings, config, geo_paths = [opt_mon])

### 1.6. MB-nrg fit

#### 1.6.1. Obtain and compile the fitting code

Generate 1b fitting code

In [None]:
potential_fitting.generate_mbnrg_fitting_code(mon_settings, config, poly_in, poly_directory, polynomial_order, mbnrg_directory)

And we compile it.

In [None]:
potential_fitting.compile_fit_code(mon_settings, mbnrg_directory)

#### 1.6.2. Perform the fit

This command creates one folder per requested fit, each containing a bash script that runs the fit and saves its output. Existing fit folders are kept: if 5 fit folders already exist and we request 2 more, two new fit folders will be created.

In [None]:
potential_fitting.prepare_fits(mon_settings, mbnrg_directory + "/fit-1b", training_set, num_fits = 3)

Now all the fits need to run. This can be done externally or run directly with the following command.

In [None]:
potential_fitting.execute_fits(mon_settings)

And finally we retrieve the best fit.

In [None]:
potential_fitting.retrieve_best_fit(mon_settings, ttm = False, fitted_nc_path = "mbnrg.nc")

### 1.7. Visualize the results

Finally, we can plot the correlation plots for the training and test sets, along with the error, using this helper function.

In [None]:
%matplotlib inline
# TODO

### 1.8. Add files to MBX

In [None]:
potential_fitting.fitting.generate_software_files(mon_settings, config, mon_ids, polynomial_order, ttm_only = False, MBX_HOME = None, version = "v1")

### 1.9. Wrapping up

Now that we have obtained the fit, we can leave the 1b folder. Chapter 2 starts by returning to `main_dir`.

## Chapter 2. Generate a CO2-CO2 two-body TTM-nrg PEF

### 2.1. Definition of the variables

This chapter is going to guide us through the process of obtaining a two-body TTM-nrg PEF for a CO2 dimer. Let's make a folder for this chapter.

In [None]:
os.chdir(main_dir)
os.system("mkdir -p chapter2_co2_2b_ttmnrg")
os.chdir("chapter2_co2_2b_ttmnrg")

#### Specifications of the QC calculations
Next we define the method that we want to use to calculate energies, along with some other technical details.

In [None]:
# The software that will be used to perform all the calculations
#code = "qchem"
code = "psi4"

# The quantum chemistry method we want to use
method = "HF"
#method = "MP2"
#method = "wb97m-v"

# Basis set to use. Must be pre-defined in the software. Custom basis sets not implemented yet.
basis = "STO-3G"

# Use counter-poise correction or not.
cp = False
#cp = True

# Number of threads and memory we would like to use
num_threads = 2
memory = "4GB"

# This is the path where all the log files will be stored.
log_path = "logs"

#### Monomer specifications
This section defines the properties of each monomer. In this case both monomers are CO2, so each list has two identical entries, one per monomer.

In [None]:
# Names that will identify the monomers. This is used for identification purposes only.
names = ["CO2","CO2"]

# Number of atoms of each monomer
number_of_atoms = [3,3]

# Charge of each monomer
charges = [0,0]

# Spin multiplicity of each monomer
spin = [1,1]

# Use MB-pol for water (if applicable). 
# If 1, the Partridge-Schwenke PEF with position-dependent charges will be used for water.
use_mbpol = [0,0]

The symmetry tag requires some explanation. It encodes the atom types of each monomer. Some examples are `symmetry = ["A1B4"]` for a methane monomer, `symmetry = ["A1B2","A1B2"]` for a CO2 dimer, `symmetry = ["A6B6","C1D2"]` for a benzene -- water dimer without lone pairs, and `symmetry = ["A1B2Z2","C1D2"]` for a H2O -- SO2 dimer with lone pairs. The rules are the following:
* Symmetry names must be written in capital letters and start with A for the first atom of the first monomer. Each new atom type is assigned the next letter of the alphabet.
* Exchangeable atoms must have the same label, even if they are in different molecules.
* As of today, no more than 9 atoms of the same atom type are accepted.
* If there are virtual sites, such as lone pairs, that play a role in the polynomials, they must be labeled with the letters X, Y, or Z.
* If two groups inside the same molecule have the same symmetry, they should be separated. As an example, DMSO should have the symmetry `symmetry = ["A1B3_A1B3_C1D1"]`. This allows the two methyl groups to be permuted as whole units, but does not allow individual carbons or hydrogens to be exchanged between them.
* The symmetry order MUST match the XYZ order.

The SMILES tag also requires some explanation. One way to obtain a SMILES string is with Open Babel:
`obabel -ixyz input.xyz -osmiles -O smiles.txt`
The order of the atoms in the SMILES string must also match the XYZ order.

In [None]:
# Symmetry of the molecule
symmetry = ["A1B2", "A1B2"]

# SMILES string
smiles = ["C(O)O", "C(O)O"]

#### Creating files needed by the code

As of 08/12/2019, the `settings` files are still needed. This example is for a CO2 dimer, so one settings file is needed for the monomers (they are identical) and one for the dimer.

In [None]:
# Settings for monomer
mon_settings = "monomer_settings.ini"

my_settings_file_mon = """
[files]
# Local path directory to write log files in
log_path = """ + log_path + """

[config_generator]
# what library to use for geometry optimization and normal mode generation
code = """ + code + """
# use geometric or linear progression for T and A in config generation, exactly 1 must be True
geometric = False
linear = False

[energy_calculator]
# what library to use for energy calculations
code = """ + code + """

[psi4]
# memory to use when doing a psi4 calculation
memory = """ + memory + """
# number of threads to use when executing a psi4 calculation
num_threads = """ + str(num_threads) + """

[qchem]
# number of threads to use when executing a qchem calculation
num_threads = """ + str(num_threads) + """

[molecule]
# name of fragments, separated by commas
names = """ + names[0] + """
# number of atoms in each fragment, separated by commas
fragments = """ + str(number_of_atoms[0]) + """
# charge of each fragment, separated by commas
charges = """ + str(charges[0]) + """
# spin multiplicity of each fragment, separated by commas
spins = """ + str(spin[0]) + """
# tag when putting geometries into database
tag = none
# Use or not MB-pol
use_mbpol = """ + str(use_mbpol[0]) + """
# symmetry of each fragment, separated by commas
symmetry = """ + symmetry[0] + """
SMILES = """ + smiles[0] + """
"""

In [None]:
# Settings for dimer
dim_settings = "dimer_settings.ini"

my_settings_file_dim = """
[files]
# Local path directory to write log files in
log_path = """ + log_path + """

[config_generator]
# what library to use for geometry optimization and normal mode generation
code = """ + code + """
# use geometric or linear progression for T and A in config generation, exactly 1 must be True
geometric = False
linear = False

[energy_calculator]
# what library to use for energy calculations
code = """ + code + """

[psi4]
# memory to use when doing a psi4 calculation
memory = """ + memory + """
# number of threads to use when executing a psi4 calculation
num_threads = """ + str(num_threads) + """

[qchem]
# number of threads to use when executing a qchem calculation
num_threads = """ + str(num_threads) + """

[molecule]
# name of fragments, separated by commas
names = """ + names[0] + "," + names[1] + """
# number of atoms in each fragment, separated by commas
fragments = """ + str(number_of_atoms[0]) + """,""" + str(number_of_atoms[1]) + """
# charge of each fragment, separated by commas
charges = """ + str(charges[0]) + """,""" + str(charges[1]) + """
# spin multiplicity of each fragment, separated by commas
spins = """ + str(spin[0]) + """,""" + str(spin[1]) + """
# tag when putting geometries into database
tag = none
# Use or not MB-pol
use_mbpol = """ + str(use_mbpol[0]) + """,""" + str(use_mbpol[1]) + """
# symmetry of each fragment, separated by commas
symmetry = """ + symmetry[0] + """,""" + symmetry[1] + """
SMILES = """ + smiles[0] + """,""" + smiles[1] + """
"""

In [None]:
# Write the files:
with open(mon_settings, 'w') as ff:
    ff.write(my_settings_file_mon)

with open(dim_settings, 'w') as ff:
    ff.write(my_settings_file_dim)

The unoptimized geometry of the monomer is provided as an [XYZ formatted file](https://en.wikipedia.org/wiki/XYZ_file_format). Both monomers are CO2, so a single file is enough. 

In [None]:
# XYZ file that contains the unoptimized geometry of the monomer
unopt_mon = "monomer.xyz"

my_unopt_monomer = """3
unoptimized co2
C   0   0   0
O   1.3   0   0
O   -1.3  0   0
"""

In [None]:
# Write the file:
with open(unopt_mon, 'w') as ff:
    ff.write(my_unopt_monomer)

#### Defining files that will be written by the code

In [None]:
# XYZ file that contains the optimized geometry of the monomer
opt_mon = "monomer_opt.xyz"

# File where the normal modes of the monomer will be written
normal_modes_mon = "monomer_normal_modes.dat"

# Same for dimer
unopt_dim = "dimer.xyz"
opt_dim = "dimer_opt.xyz"
normal_modes_dim = "dimer_normal_modes.dat"

Training and test set files. The `training_configs` and `test_configs` files will contain the configurations generated by the training set generation functions. Only the geometry (i.e., the coordinates of all atoms for each configuration) is stored in those files. Later on, we will calculate the energies of these configurations and create files in the format required by the fitting code. These new files are defined by `training_set` and `test_set`. Their coordinates are the same as in `training_configs` and `test_configs`, but the comment line now contains the energies needed by the fitting code.

In [None]:
# XYZ file with the configurations of the training set
training_configs = "training_configs.xyz"

# XYZ file with the configurations of the test set
test_configs = "test_configs.xyz"

# XYZ file with the training set that the codes need to perform the fit
# Configurations are the same as training_configs but this file
# has the energies in the comment line
training_set = "training_set.xyz"

# XYZ file with the test set that the codes need to perform the fit
# Configurations are the same as test_configs but this file
# has the energies in the comment line 
test_set = "test_set.xyz"

Information about the training set, the test set, and the energies is stored in a `PostgreSQL` database. In principle there is no need to interact with this database directly, since everything is automated, but you might want to retrieve some information at some point.

The database_config `.ini` file should contain one section `[database]` with 5 properties:
* `host`: The address of the server where the database is hosted.
* `port`: The port used to connect to the database.
* `database`: The name of the database.
* `username`: Your username to connect to the database.
* `password`: Your password to connect to the database.

For now use these parameters:

* `host`: piggy.pl.ucsd.edu
* `port`: 5432
* `database`: potential_fitting
* `username`: potential_fitting
* `password`: Please contact Ethan or Kaushik for the password.

The username potential_fitting was established as a general username for anyone who only needs basic access to the database. Alternatively, each user has their own username and password. For most of you, these should be the same as your UCSD email prefix and password.

The database configuration file (called `local.ini` in the cell below) does not exist in the git repo, so you will have to create it and update the variable below to point to it. Python's `open` does not expand `~` to your home directory, so provide a relative or absolute path instead. It is recommended that you create the file in your home directory.

<h3 style="color:red;">Make sure only you have read access to this file using the chmod command or else anyone on our fileserver will be able to see your password and <b>PLEASE DO NOT ACCIDENTALLY COMMIT A FILE CONTAINING YOUR PASSWORD VIA GIT!</b></h3>

`client_name` is used in the database to track which machines performed which calculations. Please use something that indicates where you are running the calculations.

In [None]:
# PostgreSQL database that stores structures and energies
database_config = "local.ini"
client_name = "pikachu"

In [None]:
my_database_settings = """[database]
host = piggy.pl.ucsd.edu
port = 5432
database = potential_fitting
username = potential_fitting
password = YOUR_PASSWORD_HERE
"""

# Write the file. Remember to update the username and password!
with open(database_config, 'w') as ff:
    ff.write(my_database_settings)

The TTM-nrg fit in this chapter does not use polynomials. However, the same code generates both the MB-nrg and the TTM-nrg fitting code, so we still need to generate polynomials.

In [None]:
# Monomers 1 and 2 separated by '_'
molecule_in = "_".join(symmetry)

# Configuration file that contains all the monomer 
# and dimer information. Will be used to generate the 2B codes.
config = "config.ini"

# Input file for the polynomial generation
poly_in = "poly.in"

# Directory where the polynomials will be generated
poly_directory = "polynomial_generation"

# Degree of the polynomials
polynomial_order = 2

#### Directories for the different sections

These variables specify the directories where the fitting code for each type of PEF is going to be created.
- `ttmnrg_directory` will contain the code that fits TTM-nrg PEFs for the system specified.

In [None]:
# Directory where ttm-nrg fitting code will be stored
ttmnrg_directory = "ttm-nrg_fit"

#### Multiple variables that will be used later

In [None]:
# Number of configurations in the 2b training_set
num_training_configs = 200

# Number of configurations in the 2b test set
num_test_configs = 50

# Maximum energy allowed for distorted monomers (in kcal/mol)
mon_emax = 30.0

# Maximum binding energy allowed
bind_emax = 500.0

# Minimum and maximum distance between the two monomers
min_d_2b = 1.0
max_d_2b = 9.0

# Minimum fraction of the VdW distance that is allowed between any atoms that belong to different monomers
min_inter_d = 0.5

# Seeds to be used in the configuration generation to ensure different
# configurations for training and test
seed_training = 12345
seed_test = 54321

# IDs of the monomers (should be consistent with the 1B id for each)
mon_ids = ["co2","co2"]

# Number of TTM-nrg fits to perform
num_ttm_fits = 5

### 2.2. Generate polynomials

We first generate the polynomials to see how many parameters they contain. As a rule of thumb, the training set should contain about twenty configurations per polynomial parameter.
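
As a quick illustration of this rule of thumb, the sketch below turns a parameter count into a suggested training-set size. The function name and the example count of 10 parameters are made up for illustration; the actual number of parameters is only known once the polynomials have been generated.

```python
def recommended_training_size(num_parameters, configs_per_parameter=20):
    """Suggested number of training configurations for a given parameter count."""
    return configs_per_parameter * num_parameters

# Hypothetical example: a polynomial with 10 parameters
print(recommended_training_size(10))  # 200
```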

#### 2.2.1. Generate polynomial input file

This call generates a polynomial input file based on the symmetry of the dimer specified. 

*Note. Write some more info and doc for the input. Filters by default, new filters that can be added...*
<p style="color:red;">Note: the database filepath argument has been exchanged for the database_config in generate_poly_input_from_database </p>

In [None]:
potential_fitting.generate_poly_input(dim_settings, molecule_in, poly_in)

#### 2.2.2. Generate maple input files

Generate polynomials of the degree specified at the beginning, based on the polynomial input file that we have generated in the previous step.

In [None]:
potential_fitting.generate_polynomials(dim_settings, poly_in, polynomial_order, poly_directory)

#### 2.2.3. Optimize the polynomial evaluation

The Maple input files define the non-optimized polynomials. The polynomials can be large, and **Maple** is able to optimize them so that they are evaluated with the minimum number of floating point operations (FLOPs). This step outputs three files: one with the non-optimized polynomials, one with optimized polynomials including gradient evaluation, and one with optimized polynomials without gradient evaluation.

In [None]:
potential_fitting.execute_maple(dim_settings, poly_directory)

### 2.3. Geometry optimization and normal mode calculation

#### 2.3.1. Monomers

Performs a geometry optimization of the monomer at the level of theory specified in `monomer_settings.ini`. **Before running these commands** please make sure that the specifications in the sections `[config_generator]` and `[molecule]` of the corresponding `settings.ini` file are correct and consistent.

In [None]:
# Optimize monomer
potential_fitting.optimize_geometry(mon_settings, unopt_mon, opt_mon, method, basis)

In [None]:
# Get its normal modes
potential_fitting.generate_normal_modes(mon_settings, opt_mon,normal_modes_mon, method, basis)

#### 2.3.2. Dimer

Since TTM-nrg does not have any intramolecular terms at the two-body level, no further calculations for the dimer are needed.

### 2.4. Training and Test Set generation

#### 2.4.1. Generate configurations

In [None]:
# Training Set
potential_fitting.generate_2b_configurations(dim_settings, opt_mon, opt_mon, num_training_configs, training_configs, min_d_2b, max_d_2b, min_inter_d, seed_training)

In [None]:
# Test Set
potential_fitting.generate_2b_configurations(dim_settings, opt_mon, opt_mon, num_test_configs, test_configs, min_d_2b, max_d_2b, min_inter_d, seed_test)

#### 2.4.2. Add configurations to the database

In [None]:
# Training set
potential_fitting.init_database(dim_settings, database_config, training_configs, method, basis, cp, "training", optimized = False)

In [None]:
# Test Set
potential_fitting.init_database(dim_settings, database_config, test_configs, method, basis, cp, "test", optimized = False)

In [None]:
# Add the optimized monomer geometry to the database (needed for binding energy)
potential_fitting.init_database(mon_settings, database_config, opt_mon, method, basis, cp, "training", optimized = True)
potential_fitting.init_database(mon_settings, database_config, opt_mon, method, basis, cp, "test", optimized = True)

#### 2.4.3. Fill the database

In [None]:
potential_fitting.fill_database(dim_settings, database_config, client_name, "training", "test", calculation_count = None)

#### 2.4.4. Training set and Test set generation

Generates the training set file in the format required by the fitting codes. Even if your database contains energies computed with several methods and basis sets, **only one method and basis can be used in the same training set**. The format of the training set is the same as that of the configuration files generated in the previous steps. The difference is that the comment line now holds the binding energy and the n-body energy of that configuration.
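
To make the file format concrete, the sketch below reads a multi-configuration XYZ file and collects the numbers found on each comment line. It assumes the comment line holds whitespace-separated energies (binding energy first, then n-body energy); check a generated file before relying on that layout. `read_xyz_energies` is a hypothetical helper, not part of `potential_fitting`, and the two frames are toy data.

```python
def read_xyz_energies(text):
    """Return one list of floats per configuration, taken from its comment line."""
    lines = text.splitlines()
    energies = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between or after frames
            continue
        num_atoms = int(lines[i])
        # Assumption: the comment line contains only numbers
        energies.append([float(x) for x in lines[i + 1].split()])
        i += 2 + num_atoms  # jump over the coordinates to the next frame
    return energies

# Made-up two-frame example; the energies are placeholders, not real results
example = """3
-1.5 -0.5
C 0.0 0.0 0.0
O 1.2 0.0 0.0
O -1.2 0.0 0.0
3
2.0 1.0
C 0.0 0.0 0.0
O 1.3 0.0 0.0
O -1.3 0.0 0.0
"""
print(read_xyz_energies(example))
```

In the notebook, the same function could be applied to the contents of the file named by `training_set` once it has been generated.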

In [None]:
# Generate training set
potential_fitting.generate_training_set(dim_settings, database_config, training_set, method, basis, cp, "training", e_bind_max = bind_emax, e_mon_max = mon_emax)

# Generate test set
potential_fitting.generate_training_set(dim_settings, database_config, test_set, method, basis, cp, "test", e_bind_max = bind_emax, e_mon_max = mon_emax)

### 2.5. Obtain charges, polarizabilities, and C6

In order to perform the fit, charges, polarizabilities, C6, and other properties of the dimer have to be calculated. This is done, for now, with the software **QChem**. This instruction will compute these properties for you. The predefined method and basis set are wb97m-v/avtz. This step can take a long time if the molecule is large.

After the calculation is completed, all the information needed for the fits (both MB-nrg and TTM-nrg) will be added to the configuration file specified at the beginning.

In [None]:
potential_fitting.generate_fitting_config_file_new(dim_settings, config, geo_paths = [opt_mon, opt_mon])

### 2.6. Fitting the TTM-nrg PEF

#### 2.6.1. Obtain and compile the fitting code

Generate MB-nrg / TTM-nrg combined fitting code

In [None]:
potential_fitting.generate_mbnrg_fitting_code(dim_settings, config, poly_in, poly_directory, polynomial_order, ttmnrg_directory)

And we compile it.

In [None]:
potential_fitting.compile_fit_code(dim_settings, ttmnrg_directory)

#### 2.6.2. Perform the fit

This command prepares one folder per fit, each containing a bash script that executes the fit and saves the output. Existing folders are kept: if there are already 5 fit folders and we request 2 more, two new fit folders will be created.

In [None]:
potential_fitting.prepare_fits(dim_settings, ttmnrg_directory + "/fit-2b-ttm", training_set, num_fits = num_ttm_fits, ttm = True)

Now all the fits need to run. This can be done externally or run directly with the following command.

In [None]:
potential_fitting.execute_fits(dim_settings, ttm = True)

And finally we retrieve the best fit.

In [None]:
potential_fitting.retrieve_best_fit(dim_settings, ttm = True)

In [None]:
potential_fitting.update_config_with_ttm(dim_settings, config)

### 2.7. Add potential to MBX

In [None]:
potential_fitting.fitting.generate_software_files(dim_settings, config, mon_ids, polynomial_order, ttm_only = True, MBX_HOME = None, version = "v1")

## Chapter 3. Generate a CO2-H2O two-body MB-nrg PEF

### 3.1. Definition of the variables

This chapter is going to guide us through the process of obtaining a two-body MB-nrg PEF for a CO2-H2O dimer. Let's make a folder for this chapter.

In [None]:
os.chdir(main_dir)
os.system("mkdir -p chapter3_co2_h2o_2b_mbnrg")
os.chdir("chapter3_co2_h2o_2b_mbnrg")

#### Specifications of the QC calculations
Next we are going to define the method that we want to use to calculate energies, along with some other technical details.

In [None]:
# The software that will be used to perform all the calculations
code = "qchem"
#code = "psi4"

# The quantum chemistry method we want to use
method = "HF"
#method = "MP2"
#method = "wb97m-v"

# Basis set to use. Must be pre-defined in the software. Custom basis sets not implemented yet.
basis = "STO-3G"

# Use counter-poise correction or not.
cp = False
#cp = True

# Number of threads and memory we would like to use
num_threads = 2
memory = "4GB"

# This is the path where all the log files will be stored.
log_path = "logs"

#### Monomer specifications
This section defines all the specifications of the monomers. In this case there are two monomers, so each property is given as a list with one entry per monomer.

In [None]:
# Names that will identify the monomers. This is used for identification purposes only.
names = ["CO2","H2O"]

# Number of atoms of each monomer
number_of_atoms = [3,3]

# Charge of each monomer
charges = [0,0]

# Spin multiplicity of each monomer
spin = [1,1]

# Use MB-pol for water (if applicable). 
# If 1 will use the Partridge-Schwenke PEF for water, with the position dependent charges.
use_mbpol = [0,1]

The symmetry tag requires a little bit of explanation. It contains the atom identity for the monomers. Some examples are `symmetry = ["A1B4"]` for methane monomer, `symmetry = ["A1B2","A1B2"]` for a CO2 dimer, `symmetry = ["A6B6","C1D2"]` for a benzene -- water dimer without lone pairs, and `symmetry = ["A1B2Z2","C1D2"]` for a H2O -- SO2 dimer with lone pairs. The rules are the following:
* Symmetry names must be written in capital letters and start with A for the first atom of the first monomer. Any new atom type will be assigned the next letter of the alphabet.
* Exchangeable atoms must have the same label, even if they are in different molecules.
* As of today, no more than 9 atoms of the same atom type are accepted.
* If there are virtual sites such as lone pairs that will play a role in the polynomials, they must be labeled with the letters X, Y, or Z.
* If two groups inside the same molecule have the same symmetry, they should be separated. As an example, DMSO should have a symmetry `symmetry = ["A1B3_A1B3_C1D1"]`. This allows the two methyl groups to be permuted as whole units, but does not allow individual carbons or hydrogens to be exchanged between the two methyl groups.
* The symmetry order MUST match the xyz order.
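
The rules above can be checked mechanically before any files are written. The sketch below splits a symmetry tag into its label and count pairs and counts the real atoms, treating X, Y, and Z as virtual sites. `parse_symmetry` and `count_real_atoms` are hypothetical helpers written for this tutorial, not functions of `potential_fitting`, and they only cover the syntax rules listed here.

```python
import re

def parse_symmetry(tag):
    """Split a symmetry tag into groups of (label, count) pairs."""
    groups = []
    for group in tag.split("_"):
        pairs = re.findall(r"([A-Z])([1-9])", group)
        # Every character must belong to a label+count pair (counts 1 to 9)
        if "".join(label + count for label, count in pairs) != group:
            raise ValueError("Malformed symmetry group: " + repr(group))
        groups.append([(label, int(count)) for label, count in pairs])
    return groups

def count_real_atoms(tag):
    """Number of atoms excluding virtual sites labeled X, Y, or Z."""
    return sum(count for group in parse_symmetry(tag)
               for label, count in group if label not in "XYZ")

print(parse_symmetry("A1B2"))            # CO2
print(count_real_atoms("C1D2X2"))        # water with two lone pairs
print(parse_symmetry("A1B3_A1B3_C1D1"))  # DMSO, three groups
```

Comparing `count_real_atoms` with the number of atoms in the XYZ file is one way to catch a tag that does not match the geometry.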

The SMILES tag also requires a little bit of explanation. One can get the SMILES string from Open Babel:
`obabel -ixyz input.xyz -osmiles -O smiles.txt`
The order of the atoms in the SMILES string must also match the XYZ order.

In [None]:
# Symmetry of the molecule
symmetry = ["A1B2", "C1D2X2"]

# SMILES string
smiles = ["C(O)O", "O(H)H"]

#### Creating files needed by the code

As of 08/12/2019, the `settings` files are still needed. This example is for a CO2-H2O dimer, so one settings file is needed for each monomer, since they are different, and one for the dimer.

In [None]:
# Settings for monomer
mon1_settings = "monomer1_settings.ini"
mon2_settings = "monomer2_settings.ini"

my_settings_file_mon1 = """
[files]
# Local path directory to write log files in
log_path = """ + log_path + """

[config_generator]
# what library to use for geometry optimization and normal mode generation
code = """ + code + """
# use geometric or linear progression for T and A in config generation, exactly 1 must be True
geometric = False
linear = False

[energy_calculator]
# what library to use for energy calculations
code = """ + code + """

[psi4]
# memory to use when doing a psi4 calculation
memory = """ + memory + """
# number of threads to use when executing a psi4 calculation
num_threads = """ + str(num_threads) + """

[qchem]
# number of threads to use when executing a qchem calculation
num_threads = """ + str(num_threads) + """

[molecule]
# name of fragments, separated by commas
names = """ + names[0] + """
# number of atoms in each fragment, separated by commas
fragments = """ + str(number_of_atoms[0]) + """
# charge of each fragment, separated by commas
charges = """ + str(charges[0]) + """
# spin multiplicity of each fragment, separated by commas
spins = """ + str(spin[0]) + """
# tag when putting geometries into database
tag = none
# Use or not MB-pol
use_mbpol = """ + str(use_mbpol[0]) + """
# symmetry of each fragment, separated by commas
symmetry = """ + symmetry[0] + """
SMILES = """ + smiles[0] + """
"""

my_settings_file_mon2 = """
[files]
# Local path directory to write log files in
log_path = """ + log_path + """

[config_generator]
# what library to use for geometry optimization and normal mode generation
code = """ + code + """
# use geometric or linear progression for T and A in config generation, exactly 1 must be True
geometric = False
linear = False

[energy_calculator]
# what library to use for energy calculations
code = """ + code + """

[psi4]
# memory to use when doing a psi4 calculation
memory = """ + memory + """
# number of threads to use when executing a psi4 calculation
num_threads = """ + str(num_threads) + """

[qchem]
# number of threads to use when executing a qchem calculation
num_threads = """ + str(num_threads) + """

[molecule]
# name of fragments, separated by commas
names = """ + names[1] + """
# number of atoms in each fragment, separated by commas
fragments = """ + str(number_of_atoms[1]) + """
# charge of each fragment, separated by commas
charges = """ + str(charges[1]) + """
# spin multiplicity of each fragment, separated by commas
spins = """ + str(spin[1]) + """
# tag when putting geometries into database
tag = none
# Use or not MB-pol
use_mbpol = """ + str(use_mbpol[1]) + """
# symmetry of each fragment, separated by commas
symmetry = """ + symmetry[1] + """
SMILES = """ + smiles[1] + """
"""

In [None]:
# Settings for dimer
dim_settings = "dimer_settings.ini"

my_settings_file_dim = """
[files]
# Local path directory to write log files in
log_path = """ + log_path + """

[config_generator]
# what library to use for geometry optimization and normal mode generation
code = """ + code + """
# use geometric or linear progression for T and A in config generation, exactly 1 must be True
geometric = False
linear = False

[energy_calculator]
# what library to use for energy calculations
code = """ + code + """

[psi4]
# memory to use when doing a psi4 calculation
memory = """ + memory + """
# number of threads to use when executing a psi4 calculation
num_threads = """ + str(num_threads) + """

[qchem]
# number of threads to use when executing a qchem calculation
num_threads = """ + str(num_threads) + """

[molecule]
# name of fragments, separated by commas
names = """ + names[0] + "," + names[1] + """
# number of atoms in each fragment, separated by commas
fragments = """ + str(number_of_atoms[0]) + """,""" + str(number_of_atoms[1]) + """
# charge of each fragment, separated by commas
charges = """ + str(charges[0]) + """,""" + str(charges[1]) + """
# spin multiplicity of each fragment, separated by commas
spins = """ + str(spin[0]) + """,""" + str(spin[1]) + """
# tag when putting geometries into database
tag = none
# Use or not MB-pol
use_mbpol = """ + str(use_mbpol[0]) + """,""" + str(use_mbpol[1]) + """
# symmetry of each fragment, separated by commas
symmetry = """ + symmetry[0] + """,""" + symmetry[1] + """
SMILES = """ + smiles[0] + """,""" + smiles[1] + """
"""

In [None]:
# Write the files:
with open(mon1_settings, 'w') as ff:
    ff.write(my_settings_file_mon1)

with open(mon2_settings, 'w') as ff:
    ff.write(my_settings_file_mon2)

with open(dim_settings, 'w') as ff:
    ff.write(my_settings_file_dim)
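
Because the settings files are assembled by string concatenation, a quick read-back can catch mismatched lists. The sketch below uses Python's standard `configparser` to check that the comma-separated entries in the `[molecule]` section all have the same length. `check_molecule_section` is a hypothetical helper, not part of `potential_fitting`, and the inline text is a trimmed stand-in for the dimer settings file.

```python
import configparser

def check_molecule_section(text):
    """Return the number of fragments if all per-fragment lists agree."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    keys = ["names", "fragments", "charges", "spins", "use_mbpol", "symmetry"]
    lengths = {key: len(parser["molecule"][key].split(",")) for key in keys}
    if len(set(lengths.values())) != 1:
        raise ValueError("Inconsistent list lengths: " + str(lengths))
    return lengths["names"]

# Trimmed stand-in for the [molecule] section of the dimer settings file
example = """
[molecule]
# name of fragments, separated by commas
names = CO2,H2O
fragments = 3,3
charges = 0,0
spins = 1,1
use_mbpol = 0,1
symmetry = A1B2,C1D2X2
"""
print(check_molecule_section(example))
```

In the notebook, the text of the file named by `dim_settings` could be passed in place of `example`.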

Unoptimized geometries of the two monomers are provided as [XYZ formatted files](https://en.wikipedia.org/wiki/XYZ_file_format). 

In [None]:
# XYZ files that contain the unoptimized geometry of each monomer
unopt_mon1 = "monomer1.xyz"

my_unopt_monomer1 = """3
unoptimized co2
C   0   0   0
O   1.3   0   0
O   -1.3  0   0
"""

unopt_mon2 = "monomer2.xyz"

my_unopt_monomer2 = """3
unoptimized h2o
O   0   0   0
H   0.8   0.8   0
H   0.8  0.2   0
"""

In [None]:
# Write the files:
with open(unopt_mon1, 'w') as ff:
    ff.write(my_unopt_monomer1)

with open(unopt_mon2, 'w') as ff:
    ff.write(my_unopt_monomer2)
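
Hand-written coordinates are easy to mistype, so it can help to look at the interatomic distances before starting an optimization. A minimal sketch, assuming the XYZ text holds a single configuration. `pair_distances` is a hypothetical helper, not part of `potential_fitting`.

```python
import itertools
import math

def pair_distances(xyz_text):
    """Return {(i, j): distance} for all atom pairs of a single-frame XYZ string."""
    lines = xyz_text.strip().splitlines()
    num_atoms = int(lines[0])
    # Line 0 is the atom count, line 1 the comment, then one line per atom
    coords = [tuple(float(x) for x in line.split()[1:4])
              for line in lines[2:2 + num_atoms]]
    return {(i, j): math.sqrt(sum((a - b) ** 2
                                  for a, b in zip(coords[i], coords[j])))
            for i, j in itertools.combinations(range(num_atoms), 2)}

# Same coordinates as the unoptimized CO2 defined above
co2 = """3
unoptimized co2
C   0   0   0
O   1.3   0   0
O   -1.3  0   0
"""
for pair, distance in pair_distances(co2).items():
    print(pair, round(distance, 3))
```

The distances are in the same units as the coordinates in the XYZ text.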

#### Defining files that will be written by the code

In [None]:
# XYZ files that contain the optimized geometry of each monomer
opt_mon1 = "monomer1_opt.xyz"
opt_mon2 = "monomer2_opt.xyz"

# Files where the normal modes of each monomer will be written
normal_modes_mon1 = "monomer1_normal_modes.dat"
normal_modes_mon2 = "monomer2_normal_modes.dat"

# Same for dimer
unopt_dim = "dimer.xyz"
opt_dim = "dimer_opt.xyz"
normal_modes_dim = "dimer_normal_modes.dat"

Training and test set files. The `training_configs` and `test_configs` files will contain the configurations generated by the training set generation functions. Only the geometry (i.e., the coordinates of all atoms for each configuration) will be stored in those files. Later on, we will calculate the energies for each of these configurations and create the files in the format that the fitting code takes as input. These new files are defined in `training_set` and `test_set`. The coordinates in these files will be the same as in the initial `training_configs` and `test_configs`, but the comment line will now be filled with the energies needed by the fitting code.

In [None]:
# XYZ file with the configurations of the training set
rigid_training_configs = "rigid_training_configs.xyz" 
flex_training_configs = "flex_training_configs.xyz"
normal_mode_training_configs = "normal_mode_training_configs"

ttm_training_configs = "ttm_training_configs.xyz"

# XYZ file with the configurations of the test set
rigid_test_configs = "rigid_test_configs.xyz" 
flex_test_configs = "flex_test_configs.xyz"
normal_mode_test_configs = "normal_mode_test_configs"

ttm_test_configs = "ttm_test_configs.xyz"

# Distorted monomer configurations for the flexible training set
mon1_distorted = "mon1_distorted.xyz"
mon2_distorted = "mon2_distorted.xyz"

# And the screened values
mon1_screened = "mon1_screened.xyz"
mon2_screened = "mon2_screened.xyz"

# XYZ file with the training set that the codes need to perform the fit
# Configurations are the same as training_configs but this file
# has the energies in the comment line
training_set = "training_set.xyz"
ttm_training_set = "ttm_training_set.xyz"

# XYZ file with the test set that the codes need to perform the fit
# Configurations are the same as test_configs but this file
# has the energies in the comment line 
test_set = "test_set.xyz"
ttm_test_set = "ttm_test_set.xyz"


The information about the training set, test set, energies, etc. is stored in a `PostgreSQL` database. In principle there is no need to interact with this database directly, since everything is automated, but you might want to retrieve some information at some point.

The database_config `.ini` file should contain one section `[database]` with 5 properties:
* `host`: The address of the server where the database is hosted.
* `port`: The port used to connect to the database.
* `database`: The name of the database.
* `username`: Your username to connect to the database.
* `password`: Your password to connect to the database.

For now use these parameters:

* `host`: piggy.pl.ucsd.edu
* `port`: 5432
* `database`: potential_fitting
* `username`: potential_fitting
* `password`: Please contact Ethan or Kaushik for the password.

The username potential_fitting was established as a general username that anyone who only needs basic access to the database can use. Alternatively, each user has their own username and password. For most of you, these should be the same as your UCSD email prefix and password.

The file database.ini does not exist in the git repo, so you will have to create it and set the variable below to its filepath. Python does not expand `~` to your home directory automatically, so provide a relative or absolute path instead. It is recommended that you create the file in your home directory.
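
If you do want to keep the file in your home directory, the standard library can expand `~` explicitly. A minimal sketch: the file name `database.ini` follows the text above, and `home_config_path` is a hypothetical helper, not part of `potential_fitting`.

```python
import os

def home_config_path(filename="database.ini"):
    """Absolute path to `filename` inside the current user's home directory."""
    return os.path.join(os.path.expanduser("~"), filename)

# The result could be assigned to database_config instead of a relative path
print(home_config_path())
```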

<h3 style="color:red;">Make sure only you have read access to this file using the chmod command or else anyone on our fileserver will be able to see your password and <b>PLEASE DO NOT ACCIDENTALLY COMMIT A FILE CONTAINING YOUR PASSWORD VIA GIT!</b></h3>

`client_name` is used in the database to track which machines performed which calculations. Please use something that indicates where you are running the calculations.

In [None]:
# PostgreSQL database that stores structures and energies
database_config = "local.ini"
client_name = "pikachu"

In [None]:
my_database_settings = """[database]
host = piggy.pl.ucsd.edu
port = 5432
database = potential_fitting
username = potential_fitting
password = <your_password>
"""

# Write the file. Remember to update the username and password!
with open(database_config, 'w') as ff:
    ff.write(my_database_settings)

The TTM-nrg fit in this chapter does not use polynomials, but the MB-nrg fit does. In addition, the same code generates both the MB-nrg and the TTM-nrg fitting code, so the polynomials must be generated before either fit.

In [None]:
# Monomers 1 and 2 separated by '_'
molecule_in = "_".join(symmetry)

# Configuration file that contains all the monomer 
# and dimer information. Will be used to generate the 2B codes.
config = "config.ini"

# Input file for the polynomial generation
poly_in = "poly.in"

# Directory where the polynomials will be generated
poly_directory = "polynomial_generation"

# Degree of the polynomials
polynomial_order = 2

#### Directories for the different sections

These variables specify the directories where the fitting code for each type of PEF is going to be created.
- `ttmnrg_directory` will contain the code that fits TTM-nrg PEFs for the system specified.
- `mbnrg_directory` will contain the code that fits MB-nrg PEFs for the system specified.

In [None]:
# Directory where ttm-nrg fitting code will be stored
ttmnrg_directory = "ttm-nrg_fitting_code"

# Directory where mb-nrg fitting code will be stored
mbnrg_directory = "mb-nrg_fitting_code"

#### Multiple variables that will be used later

In [None]:
# Number of configurations in the 2b training_set
num_training_configs = 2500
num_ttm_training_configs = 250
############################
num_rigid_training_configs = int(0.35*num_training_configs)
num_flex_training_configs = int(0.5*num_training_configs)
num_nm_training_configs = int(0.15*num_training_configs)

# Number of configurations in the 2b test set
num_test_configs = int(0.2*num_training_configs)
num_ttm_test_configs = int(0.2*num_ttm_training_configs)

num_rigid_test_configs = int(0.35*num_test_configs)
num_flex_test_configs = int(0.5*num_test_configs)
num_nm_test_configs = int(0.15*num_test_configs)
############################

# Number of distorted configurations for monomer 1 and monomer 2
num_mon1_distorted = 100
num_mon2_distorted = 100

# Maximum energy allowed for distorted monomers (in kcal/mol)
mon_emax = 30.0

# Maximum binding energy allowed
bind_emax = 500.0

# Minimum and maximum distance between the two monomers
min_d_2b = 1.0
max_d_2b = 9.0

# Minimum fraction of the VdW distance that is allowed between any atoms that belong to different monomers
min_inter_d = 0.5

# Seeds to be used in the configuration generation to ensure different
# configurations for training and test
seed_training = 12345
seed_test = 54321

# IDs of the monomers (should be consistent with the 1B id for each)
mon_ids = ["co2","h2o"]

# Number of TTM-nrg fits to perform
num_ttm_fits = 5
num_mb_fits = 5
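
The `int(...)` products above truncate toward zero, so for some totals the rigid, flexible, and normal-mode counts will not add up to the requested number. The sketch below keeps the total exact by working with integer percentages and giving any remainder to the largest share. `split_counts` is a hypothetical helper, not part of `potential_fitting`.

```python
def split_counts(total, percentages):
    """Split `total` into integer parts proportional to integer percentages."""
    if sum(percentages) != 100:
        raise ValueError("percentages must add up to 100")
    counts = [total * p // 100 for p in percentages]
    # Hand the configurations lost to rounding down to the largest share
    largest = percentages.index(max(percentages))
    counts[largest] += total - sum(counts)
    return counts

# Rigid, flexible, and normal-mode shares used in this chapter
print(split_counts(2500, [35, 50, 15]))  # [875, 1250, 375]
print(split_counts(7, [35, 50, 15]))     # [2, 4, 1]
```

For the totals used in this chapter the split is already exact, so this only matters if `num_training_configs` is changed.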

### 3.2. Generate polynomials

We first generate the polynomials to see how many parameters they contain. As a rule of thumb, the training set should contain about twenty configurations per polynomial parameter.

#### 3.2.1. Generate polynomial input file

This call generates a polynomial input file based on the symmetry of the dimer specified. 

*Note. Write some more info and doc for the input. Filters by default, new filters that can be added...*
<p style="color:red;">Note: the database filepath argument has been exchanged for the database_config in generate_poly_input_from_database </p>

In [None]:
potential_fitting.generate_poly_input(dim_settings, molecule_in, poly_in)

#### 3.2.2. Generate maple input files

Generate polynomials of the degree specified at the beginning, based on the polynomial input file that we have generated in the previous step.

In [None]:
potential_fitting.generate_polynomials(dim_settings, poly_in, polynomial_order, poly_directory)

#### 3.2.3. Optimize the polynomial evaluation

The Maple input files define the non-optimized polynomials. The polynomials can be large, and **Maple** is able to optimize them so that they are evaluated with the minimum number of floating point operations (FLOPs). This step outputs three files: one with the non-optimized polynomials, one with optimized polynomials including gradient evaluation, and one with optimized polynomials without gradient evaluation.

In [None]:
potential_fitting.execute_maple(dim_settings, poly_directory)

### 3.3. Geometry optimization and normal mode calculation

#### 3.3.1. Monomers

Performs a geometry optimization of each monomer at the level of theory specified in its settings file (`monomer1_settings.ini` or `monomer2_settings.ini`). **Before running these commands** please make sure that the specifications in the sections `[config_generator]` and `[molecule]` of the corresponding `settings.ini` file are correct and consistent.

In [None]:
# Optimize monomer
potential_fitting.optimize_geometry(mon1_settings, unopt_mon1, opt_mon1, method, basis)

In [None]:
# Get its normal modes
potential_fitting.generate_normal_modes(mon1_settings, opt_mon1,normal_modes_mon1, method, basis)

In [None]:
# Optimize monomer
potential_fitting.optimize_geometry(mon2_settings, unopt_mon2, opt_mon2, method, basis)

In [None]:
# Get its normal modes
potential_fitting.generate_normal_modes(mon2_settings, opt_mon2,normal_modes_mon2, method, basis)

#### 3.3.2. Dimer

Now the same for the dimer.

In [None]:
# Generate a dimer
potential_fitting.generate_2b_configurations(dim_settings, opt_mon1, opt_mon2, 1, unopt_dim, 2, 5, min_inter_d, seed_training)

In [None]:
# Optimize the dimer
potential_fitting.optimize_geometry(dim_settings, unopt_dim, opt_dim, method, basis)

In [None]:
# Get its normal modes
potential_fitting.generate_normal_modes(dim_settings, opt_dim,normal_modes_dim, method, basis)

### 3.4. Obtain charges, polarizabilities, and C6

In order to perform the fit, charges, polarizabilities, C6, and other properties of the dimer have to be calculated. This is done, for now, with the software **QChem**. This instruction will compute these properties for you. The predefined method and basis set are wb97m-v/avtz. This step can take a long time if the molecule is large.

After the calculation is completed, all the information needed for the fits (both MB-nrg and TTM-nrg) will be added to the configuration file specified at the beginning.

In [None]:
potential_fitting.generate_fitting_config_file_new(dim_settings, config, geo_paths = [opt_mon1, opt_mon2])

### 3.5. TTM-nrg training and test set

In order to run an MB-nrg fit, we first need to do a TTM-nrg fit to obtain the classical part. The MB-nrg fit will use the dispersion from the TTM-nrg, so we first need to obtain it.

#### 3.5.1. Training and test generation

In [None]:
# Training Set
potential_fitting.generate_2b_configurations(dim_settings, opt_mon1, opt_mon2, num_ttm_training_configs, ttm_training_configs, min_d_2b, max_d_2b, min_inter_d, seed_training + 100)

In [None]:
# Test Set
potential_fitting.generate_2b_configurations(dim_settings, opt_mon1, opt_mon2, num_ttm_test_configs, ttm_test_configs, min_d_2b, max_d_2b, min_inter_d, seed_test + 100)

#### 3.5.2. Add configurations to the database

In [None]:
# Training set
potential_fitting.init_database(dim_settings, database_config, ttm_training_configs, method, basis, cp, "ch3-ttm-training", optimized = False)

In [None]:
# Test Set
potential_fitting.init_database(dim_settings, database_config, ttm_test_configs, method, basis, cp, "ch3-ttm-test", optimized = False)

In [None]:
# Add the optimized monomer geometries to the database (needed for binding energy)
potential_fitting.init_database(mon1_settings, database_config, opt_mon1, method, basis, cp, "ch3-ttm-training", optimized = True)
potential_fitting.init_database(mon1_settings, database_config, opt_mon1, method, basis, cp, "ch3-ttm-test", optimized = True)

potential_fitting.init_database(mon2_settings, database_config, opt_mon2, method, basis, cp, "ch3-ttm-training", optimized = True)
potential_fitting.init_database(mon2_settings, database_config, opt_mon2, method, basis, cp, "ch3-ttm-test", optimized = True)

#### 3.5.3. Calculate energies

In [None]:
potential_fitting.fill_database(dim_settings, database_config, client_name, "ch3-ttm-training", "ch3-ttm-test", calculation_count = None)

#### 3.5.4. Generate TTM-nrg training and test set files

In [None]:
# Obtain training set
potential_fitting.generate_training_set(dim_settings, database_config, ttm_training_set, method, basis, cp, "ch3-ttm-training", e_bind_max = bind_emax, e_mon_max = mon_emax)

In [None]:
# Obtain test set
potential_fitting.generate_training_set(dim_settings, database_config, ttm_test_set, method, basis, cp, "ch3-ttm-test", e_bind_max = bind_emax, e_mon_max = mon_emax)

### 3.6. TTM-nrg fit

#### 3.6.1. Obtain and compile the fitting code

Generate MB-nrg / TTM-nrg combined fitting code

In [None]:
potential_fitting.generate_mbnrg_fitting_code(dim_settings, config, poly_in, poly_directory, polynomial_order, ttmnrg_directory)

And we compile it.

In [None]:
potential_fitting.compile_fit_code(dim_settings, ttmnrg_directory)

#### 3.6.2. Perform the fit

This command prepares one folder per fit, each containing a bash script that executes the fit and saves the output. Existing folders are kept: if there are already 5 fit folders and we request 2 more, two new fit folders will be created.

In [None]:
potential_fitting.prepare_fits(dim_settings, ttmnrg_directory + "/fit-2b-ttm", ttm_training_set, num_fits = num_ttm_fits, ttm = True)

Now all the fits need to run. This can be done externally or run directly with the following command.

In [None]:
potential_fitting.execute_fits(dim_settings, ttm = True)

And finally we retrieve the best fit.

In [None]:
potential_fitting.retrieve_best_fit(dim_settings, ttm = True)

Now we can update the config file so MB-nrg can use the TTM-nrg dispersion.

In [None]:
potential_fitting.update_config_with_ttm(dim_settings, config)

### 3.7. MB-nrg Training and Test Set generation

#### 3.7.1. Rigid Training Set

##### Generate configurations

In [None]:
# Training Set
potential_fitting.generate_2b_configurations(dim_settings, opt_mon1, opt_mon2, num_rigid_training_configs, rigid_training_configs, min_d_2b, max_d_2b, min_inter_d, seed_training)

In [None]:
# Test Set
potential_fitting.generate_2b_configurations(dim_settings, opt_mon1, opt_mon2, num_rigid_test_configs, rigid_test_configs, min_d_2b, max_d_2b, min_inter_d, seed_test)

##### Add configurations to the database

In [None]:
# Training set
potential_fitting.init_database(dim_settings, database_config, rigid_training_configs, method, basis, cp, "ch3-training", optimized = False)

In [None]:
# Test Set
potential_fitting.init_database(dim_settings, database_config, rigid_test_configs, method, basis, cp, "ch3-test", optimized = False)

In [None]:
# Add the optimized monomer geometries to the database (needed for binding energy)
potential_fitting.init_database(mon1_settings, database_config, opt_mon1, method, basis, cp, "ch3-training", optimized = True)
potential_fitting.init_database(mon1_settings, database_config, opt_mon1, method, basis, cp, "ch3-test", optimized = True)

potential_fitting.init_database(mon2_settings, database_config, opt_mon2, method, basis, cp, "ch3-training", optimized = True)
potential_fitting.init_database(mon2_settings, database_config, opt_mon2, method, basis, cp, "ch3-test", optimized = True)
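The optimized monomer geometries are needed because the binding energy is measured relative to the isolated, relaxed monomers, whereas the 2-body energy is measured relative to the monomers frozen in their dimer geometry. The sketch below only illustrates these two standard definitions. The function names and the numbers are made up for the example and are not results of any calculation.

```python
def binding_energy(e_dimer, e_mon1_opt, e_mon2_opt):
    """Dimer energy relative to the two optimized (relaxed) monomers."""
    return e_dimer - e_mon1_opt - e_mon2_opt

def two_body_energy(e_dimer, e_mon1_in_dimer, e_mon2_in_dimer):
    """Dimer energy relative to the monomers kept in their dimer geometries."""
    return e_dimer - e_mon1_in_dimer - e_mon2_in_dimer

# Arbitrary illustrative energies (same units throughout), not real data.
e_dimer = -10.0
e_mon1_opt, e_mon2_opt = -6.0, -3.5
e_mon1_in_dimer, e_mon2_in_dimer = -5.9, -3.4

e_bind = binding_energy(e_dimer, e_mon1_opt, e_mon2_opt)
e_2b = two_body_energy(e_dimer, e_mon1_in_dimer, e_mon2_in_dimer)

# Monomer deformation energies: cost of distorting each monomer from its minimum.
e_def = (e_mon1_in_dimer - e_mon1_opt) + (e_mon2_in_dimer - e_mon2_opt)

# The binding energy is the 2-body energy plus the deformation energies.
print(abs(e_bind - (e_2b + e_def)) < 1e-12)
```

For rigid configurations the monomers stay at their optimized geometries, so the deformation term vanishes and the two energies coincide.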

#### 3.7.2. Flexible Configurations

##### Generate distorted monomer configurations

In [None]:
# Generate the normal mode configurations for the monomers:
potential_fitting.generate_normal_mode_configurations(mon1_settings, opt_mon1, normal_modes_mon1, mon1_distorted, num_mon1_distorted, seed_training + 1)
potential_fitting.generate_normal_mode_configurations(mon2_settings, opt_mon2, normal_modes_mon2, mon2_distorted, num_mon2_distorted, seed_training + 2)

##### Add them to the database along with the optimized geometries

In [None]:
# Add configurations to database
potential_fitting.init_database(mon1_settings, database_config, mon1_distorted, method, basis, cp, "ch3-mon1", optimized = False)
potential_fitting.init_database(mon2_settings, database_config, mon2_distorted, method, basis, cp, "ch3-mon2", optimized = False)

In [None]:
# Now add optimized geometries
potential_fitting.init_database(mon1_settings, database_config, opt_mon1, method, basis, cp, "ch3-mon1", optimized = True)
potential_fitting.init_database(mon2_settings, database_config, opt_mon2, method, basis, cp, "ch3-mon2", optimized = True)

##### Calculate their energy

In [None]:
potential_fitting.fill_database(mon1_settings, database_config, client_name, "ch3-mon1", calculation_count = None)

In [None]:
potential_fitting.fill_database(mon2_settings, database_config, client_name, "ch3-mon2", calculation_count = None)

##### Retrieve the configurations

In [None]:
potential_fitting.generate_training_set(mon1_settings, database_config, mon1_screened, method, basis, cp, "ch3-mon1", e_bind_max = bind_emax, e_mon_max = mon_emax)

In [None]:
potential_fitting.generate_training_set(mon2_settings, database_config, mon2_screened, method, basis, cp, "ch3-mon2", e_bind_max = bind_emax, e_mon_max = mon_emax)

##### Generate the flexible training and test set configurations

In [None]:
# Training set
potential_fitting.generate_2b_configurations(dim_settings, mon1_screened, mon2_screened, num_flex_training_configs, flex_training_configs, min_d_2b, max_d_2b, min_inter_d, seed_training + 10)

In [None]:
# Test set
potential_fitting.generate_2b_configurations(dim_settings, mon1_screened, mon2_screened, num_flex_test_configs, flex_test_configs, min_d_2b, max_d_2b, min_inter_d, seed_test + 10)

##### Add them to the database

In [None]:
# Training set
potential_fitting.init_database(dim_settings, database_config, flex_training_configs, method, basis, cp, "ch3-training", optimized = False)

In [None]:
# Test Set
potential_fitting.init_database(dim_settings, database_config, flex_test_configs, method, basis, cp, "ch3-test", optimized = False)

#### 3.7.3. Normal mode training set

##### Generate the configurations

In this case we generate normal mode configurations for the dimer. We use a low temperature so that we only sample the region around the minimum and avoid overly distorted configurations.

In [None]:
# Training Set
potential_fitting.generate_normal_mode_configurations(dim_settings, opt_dim, normal_modes_dim, normal_mode_training_configs, num_nm_training_configs, seed_training + 20, temperature = 100)

In [None]:
# Test Set
potential_fitting.generate_normal_mode_configurations(dim_settings, opt_dim, normal_modes_dim, normal_mode_test_configs, num_nm_test_configs, seed_test + 20, temperature = 100)
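To see why a low temperature keeps the sampled configurations close to the minimum, consider a single harmonic mode sampled from a classical Boltzmann distribution: the mass-weighted displacement is Gaussian with standard deviation sqrt(k_B T) / omega. The sketch below is a simplified picture under that assumption and is not necessarily the sampling scheme used by the library. The function `sample_displacements` and the chosen frequency are illustrative only.

```python
import math
import random

K_B = 3.166811563e-6  # Boltzmann constant in hartree per kelvin

def sample_displacements(omega, temperature, num_samples, seed):
    """Draw mass-weighted displacements along one classical harmonic mode.

    omega is the angular frequency in atomic units, temperature is in kelvin.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(K_B * temperature) / omega
    return [rng.gauss(0.0, sigma) for _ in range(num_samples)]

def rms(values):
    """Root mean square of a list of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

omega = 1000.0 / 219474.63  # a 1000 cm^-1 mode converted to atomic units

cold = sample_displacements(omega, temperature=100, num_samples=1000, seed=1)
hot = sample_displacements(omega, temperature=1000, num_samples=1000, seed=1)

# The spread grows with sqrt(T): 10 times the temperature gives about 3.16 times the spread.
print(rms(hot) / rms(cold))
```

In this picture, lowering the temperature from 1000 K to 100 K shrinks the typical displacement by a factor of sqrt(10), which is why `temperature = 100` keeps the dimer near its optimized geometry.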

##### Add them to the database

In [None]:
# Training Set
potential_fitting.init_database(dim_settings, database_config, normal_mode_training_configs, method, basis, cp, "ch3-training", optimized = False)

In [None]:
# Test Set
potential_fitting.init_database(dim_settings, database_config, normal_mode_test_configs, method, basis, cp, "ch3-test", optimized = False)

#### 3.7.4. Fill the database

In [None]:
potential_fitting.fill_database(dim_settings, database_config, client_name, "ch3-training", "ch3-test", calculation_count = None)

#### 3.7.5. Training set and Test set generation

This step generates the training set file in the format required by the fitting codes. If your database contains energies computed with several methods or basis sets, **only one method and basis can be used in the same training set**. The format is the same as that of the configuration files generated in previous steps, except that the comment line now holds the binding energy and the n-body energy of each configuration.
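As an illustration of this layout, the following sketch reads an XYZ-style training set in which each frame has an atom count, a comment line with two energies, and then the atom lines. It assumes that the comment line holds the binding energy first and the n-body energy second, separated by whitespace. That ordering, the helper `parse_training_set`, and the sample frames (placeholder atoms and made-up numbers) are assumptions for this example only.

```python
def parse_training_set(text):
    """Parse XYZ-style frames whose comment line holds two energies.

    Assumed comment-line layout: "<binding energy> <n-body energy>".
    """
    lines = text.strip().splitlines()
    frames = []
    i = 0
    while i < len(lines):
        num_atoms = int(lines[i])
        e_bind, e_nbody = (float(x) for x in lines[i + 1].split()[:2])
        atoms = []
        for line in lines[i + 2 : i + 2 + num_atoms]:
            symbol, x, y, z = line.split()[:4]
            atoms.append((symbol, float(x), float(y), float(z)))
        frames.append({"e_bind": e_bind, "e_nbody": e_nbody, "atoms": atoms})
        i += 2 + num_atoms
    return frames

# Two placeholder frames with made-up energies and coordinates.
sample = """2
-0.25 -0.30
Ar 0.0 0.0 0.0
Ar 0.0 0.0 3.8
2
1.50 1.45
Ar 0.0 0.0 0.0
Ar 0.0 0.0 3.0
"""

frames = parse_training_set(sample)
print(len(frames), frames[0]["e_bind"], frames[1]["e_nbody"])
```

A reader like this is handy for quick sanity checks, such as plotting the energy distribution of the training set before starting a fit.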

In [None]:
# Obtain training set
potential_fitting.generate_training_set(dim_settings, database_config, training_set, method, basis, cp, "ch3-training", e_bind_max = bind_emax, e_mon_max = mon_emax)

In [None]:
# Obtain test set
potential_fitting.generate_training_set(dim_settings, database_config, test_set, method, basis, cp, "ch3-test", e_bind_max = bind_emax, e_mon_max = mon_emax)
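The `e_bind_max` and `e_mon_max` arguments suggest an energy screening of the configurations. A minimal sketch of such a filter is shown below, assuming that a configuration is kept only when its binding energy is at most `e_bind_max` and every monomer deformation energy is at most `e_mon_max`. This is an assumption about the meaning of the thresholds, not the library's documented behaviour, and the dictionary keys and numbers are hypothetical.

```python
def screen_configurations(configs, e_bind_max, e_mon_max):
    """Keep configurations whose energies are below both thresholds."""
    kept = []
    for config in configs:
        binding_ok = config["e_bind"] <= e_bind_max
        monomers_ok = all(e <= e_mon_max for e in config["e_mon"])
        if binding_ok and monomers_ok:
            kept.append(config)
    return kept

# Made-up configurations with hypothetical keys "e_bind" and "e_mon".
configs = [
    {"name": "a", "e_bind": -1.2, "e_mon": [0.5, 0.3]},
    {"name": "b", "e_bind": 75.0, "e_mon": [0.5, 0.3]},   # binding energy too high
    {"name": "c", "e_bind": -0.4, "e_mon": [45.0, 0.3]},  # one monomer too distorted
]

kept = screen_configurations(configs, e_bind_max=60.0, e_mon_max=30.0)
print([c["name"] for c in kept])  # ['a']
```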

### 3.8. MB-nrg fit

#### 3.8.1. Obtain and compile the fitting code

Generate the 2b fitting code.

In [None]:
potential_fitting.generate_mbnrg_fitting_code(dim_settings, config, poly_in, poly_directory, polynomial_order, mbnrg_directory)

And we compile it.

In [None]:
potential_fitting.compile_fit_code(dim_settings, mbnrg_directory)

#### 3.8.2. Perform the fit

This command prepares one folder per requested fit, each containing a bash script that runs the fit and saves its output. Existing fit folders are kept: if there are already 5 fit folders and we request 2 more, two new fit folders will be created.

In [None]:
potential_fitting.prepare_fits(dim_settings, mbnrg_directory + "/fit-2b", training_set, num_fits = 5)

Now all the fits need to be run. This can be done externally, or directly with the following command.

In [None]:
potential_fitting.execute_fits(dim_settings)

Finally, we retrieve the best fit.

In [None]:
potential_fitting.retrieve_best_fit(dim_settings, ttm = False, fitted_nc_path = "mbnrg.nc")
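Conceptually, retrieving the best fit means comparing the independent fits and keeping one of them. The sketch below assumes the comparison criterion is the lowest RMSD on the training set. That criterion, the helper `select_best_fit`, the folder names, and the RMSD values are all assumptions for illustration, and the library may rank fits differently.

```python
def select_best_fit(rmsd_by_fit):
    """Return the name of the fit with the lowest RMSD."""
    if not rmsd_by_fit:
        raise ValueError("no fits to choose from")
    return min(rmsd_by_fit, key=rmsd_by_fit.get)

# Made-up RMSD values for five hypothetical fit folders.
rmsd_by_fit = {"fit1": 0.42, "fit2": 0.31, "fit3": 0.57, "fit4": 0.35, "fit5": 0.48}

best = select_best_fit(rmsd_by_fit)
print(best)  # fit2
```

Running several fits from different starting parameters and keeping the best one reduces the chance of ending up in a poor local minimum of the nonlinear optimization.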

### 3.9. Generate plots

### 3.10. Add files to MBX

In [None]:
potential_fitting.fitting.generate_software_files(dim_settings, config, mon_ids, polynomial_order, ttm_only = False, MBX_HOME = None, version = "v1")