This is part of the supporting information for the paper  
*ParAMS: Parameter Fitting for Atomistic and Molecular Models* (DOI: *123123*)  
The full documentation can be found at https://www.scm.com/doc.trunk/params/index.html

# Optimization: Original Setup

This notebook sets up the optimization of the [Mue2016](https://doi.org/10.1021/acs.jctc.6b00461) ReaxFF force field published by Müller and Hartke (MH). For the sake of comparison, it retains most of the settings discussed in the original publication. Specifically, the same parameters are optimized within the same bounds. This setup differs from the MH publication in two respects:

* The initial point $x_0$ for this optimization is the already optimized force field found by MH, which may give this optimization an advantage
* Optimizer-related settings could not be carried over because we use a different algorithm altogether: CMA-ES rather than OGOLEM
 

In [1]:
import os, sys
import numpy as np
from os.path    import join as opj
from scm.params import *
from scm.params import __version__ as paramsver
print(f"ParAMS Version used: {paramsver}")

INDIR = '../data'
if not os.path.exists(INDIR):
    os.makedirs(INDIR)

ParAMS Version used: 0.5.0



# Step 0: Auxiliary functions
Müller and Hartke provide the reference gradients as external files. This function adds them to the data set as force entries, using the negative of each gradient component as the reference value:

In [2]:
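def add_grads(path, dataset):
    suffix = '.gradient'
    for fname in os.listdir(path):
        if fname.endswith(suffix):
            # Slice off the suffix; str.rstrip() strips a character set, not a suffix
            name = fname[:-len(suffix)]
            # Skip the header row and read the x, y, z gradient columns
            grads = np.loadtxt(opj(path, fname), skiprows=1, usecols=(1,2,3))
            for atom, grad in enumerate(grads):
                for xyz, value in enumerate(grad):
                    # Forces are the negative gradients
                    dataset.add_entry(f'forces("{name}", {atom}, {xyz})', 1., sigma=0.01, reference=-value)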
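
The mapping from one gradient file to individual force entries can be sketched without ParAMS or NumPy. This is a minimal illustration under stated assumptions: the file layout (one header row, then an atom label followed by three gradient components) is inferred from `skiprows=1` and `usecols=(1,2,3)`, and the file name, the file contents, and the helper `force_entries` are hypothetical.

```python
import io

# Hypothetical gradient file: one header row, then "<atom> gx gy gz" per line.
# This layout is an assumption inferred from skiprows=1 and usecols=(1, 2, 3).
EXAMPLE = """\
atom gx gy gz
S 0.10 -0.20 0.00
C -0.05 0.00 0.30
"""

def force_entries(filename, handle):
    """Yield (expression, reference force) pairs for one gradient file."""
    suffix = '.gradient'
    # Slice the suffix off; str.rstrip would strip a set of characters instead
    name = filename[:-len(suffix)] if filename.endswith(suffix) else filename
    next(handle)  # skip the header row
    for atom_idx, line in enumerate(handle):
        columns = line.split()
        for axis, grad in enumerate(columns[1:4]):
            # The reference force is the negative gradient
            yield f'forces("{name}", {atom_idx}, {axis})', -float(grad)

entries = list(force_entries('dpd.gradient', io.StringIO(EXAMPLE)))
for expr, ref in entries:
    print(expr, ref)
```

Each atom contributes three entries, one per Cartesian component. The hypothetical name `dpd` is chosen because character-based stripping would shorten it, whereas slicing off the suffix keeps it intact.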

# Step 1: Convert from the old ReaxFF format to ParAMS

We start with the job collection:

In [3]:
jc1 = geo_to_params('../MH/optInput/geo', normal_run_settings='../MH/control')
jc2 = geo_to_params('../MH/valSet/geo',   normal_run_settings='../MH/control')

print('The following jobIDs are in *both* the training and validation sets:')
print("\n".join([i for i in jc1.keys() if i in jc2])+'\n')

The following jobIDs are in *both* the training and validation sets:
dmds
s8
dpds
dpods



Join the sets into one job collection, tell AMS that gradients need to be computed, and add the link to the original publication to the metadata:

In [4]:
jc  = jc1 + jc2

for e in jc.values():
    e.metadata['Source'] = 'https://doi.org/10.1021/acs.jctc.6b00461'
    e.settings.input.ams.properties.gradients = True 
    
jc.store(opj(INDIR, 'jobcollection.yml'))

Now convert the data sets:

In [5]:
train_set = trainset_to_params('../MH/optInput/trainset.in')
val_set  =  trainset_to_params('../MH/valSet/trainset.in')
add_grads('../MH/optInput/grads', train_set)

for ds in [train_set, val_set]:
    for e in ds:
        e.metadata['Source'] = 'https://doi.org/10.1021/acs.jctc.6b00461'

train_set.store(opj(INDIR, 'trainingset.yml'))
val_set.store(  opj(INDIR, 'validationset.yml'))

# Step 2: Calculate the Loss value for $x_0$

The parameter interface to be optimized is ReaxFF:

In [6]:
x0 = ReaxParams('../MH/mue2016')

In [7]:
print('Running x0 ...')
engine = x0.get_engine()
r  = jc.run(engine.settings)
fx = train_set.evaluate(r)
print(f'Training Set   f(x) = {fx:.3e}')
fx = val_set.evaluate(r)
print(f'Validation Set f(x) = {fx:.3e}\n')
print('Published training set value is 12393\n(https://doi.org/10.1021/acs.jctc.6b00461)\n')
print('A more recent publication reports a training set value of 16271\n(https://doi.org/10.1021/acs.jctc.9b00769)')

Running x0 ...
Training Set   f(x) = 1.444e+04
Validation Set f(x) = 1.479e+04

Published training set value is 12393
(https://doi.org/10.1021/acs.jctc.6b00461)

A more recent publication reports a training set value of 16271
(https://doi.org/10.1021/acs.jctc.9b00769)
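
To make the meaning of a single loss value concrete, here is a minimal sketch of a weighted sum of squared errors. The functional form, each residual divided by its `sigma`, squared, and multiplied by its weight, is an assumption about how such a loss is typically defined and may differ in detail from the ParAMS implementation. The function `weighted_sse` and all numbers are hypothetical.

```python
def weighted_sse(references, predictions, weights, sigmas):
    """Sum of w * ((ref - pred) / sigma)**2 over all entries (assumed form)."""
    total = 0.0
    for ref, pred, w, sigma in zip(references, predictions, weights, sigmas):
        total += w * ((ref - pred) / sigma) ** 2
    return total

# Hypothetical force components; weight 1.0 and sigma 0.01 mirror add_grads
refs = [0.10, -0.20, 0.00]
preds = [0.12, -0.17, 0.01]
loss = weighted_sse(refs, preds, [1.0] * 3, [0.01] * 3)
print(f'f(x) = {loss:.3e}')
```

With these numbers the three residuals are 2, 3 and 1 in units of `sigma`, so the loss is close to 4 + 9 + 1 = 14. Under this form, a small `sigma` makes an entry count heavily, which is one reason totals over thousands of entries reach the 10^4 range.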


# Step 3: Set active parameters and ranges
In ReaxFF parameter fitting, `ffield_bool`, `ffield_min` and `ffield_max` are commonly used formats to specify which parameters should be optimized and their respective bounds. These can be easily handled by ParAMS.

In [8]:
ff_min = ReaxParams(opj('..', 'MH', 'ffield_min'))
ff_max = ReaxParams(opj('..', 'MH', 'ffield_max'))
ff_bool= ReaxParams(opj('..', 'MH', 'ffield_bool'))

The `range` attribute defines box constraints (a lower and an upper bound per parameter), which apply regardless of the optimizer used:

In [9]:
x0.range = [(xmin,xmax) for xmin, xmax in zip(ff_min.x, ff_max.x)]
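
Before handing bounds to an optimizer, it can be worth checking that they are consistent and that the starting point lies inside them. The sketch below uses plain lists with hypothetical values as stand-ins for `ff_min.x`, `ff_max.x` and the initial parameter values; it does not use the ParAMS API.

```python
# Hypothetical values standing in for ff_min.x, ff_max.x and the initial point
lower = [0.0, -1.5, 10.0]
upper = [1.0, 1.5, 50.0]
start = [0.5, 0.0, 55.0]

# Pair each lower bound with its upper bound, as in the cell above
bounds = list(zip(lower, upper))

# Indices where the lower bound exceeds the upper bound
inverted = [i for i, (lo, hi) in enumerate(bounds) if lo > hi]
# Indices where the starting value lies outside its box
outside = [i for i, (x, (lo, hi)) in enumerate(zip(start, bounds))
           if not lo <= x <= hi]

print('bounds:', bounds)
print('inverted:', inverted)
print('outside:', outside)
```

Here the third starting value violates its upper bound, which such a check flags before any optimization time is spent.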

The `is_active` attribute marks individual parameters for optimization:

In [10]:
print(f"Total number of parameters in force field: {len(x0)}")
print(f"Number of parameters to be optimized before setting: {len(x0.active)}")
x0.is_active = [bool(i) for i in ff_bool.x]
print(f"Number of parameters to be optimized after setting:  {len(x0.active)}")

Total number of parameters in force field: 701
Number of parameters to be optimized before setting: 619
Number of parameters to be optimized after setting:  87
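
The conversion from numeric flags to a boolean mask can be illustrated with plain lists. The flag values and parameter names below are hypothetical; the only assumption carried over from the cell above is that a nonzero entry marks a parameter for optimization.

```python
# Hypothetical ffield_bool-style values: nonzero means "optimize this parameter"
flags = [1.0, 0.0, 0.0, 1.0, 1.0]
names = ['p_a', 'p_b', 'p_c', 'p_d', 'p_e']  # hypothetical parameter names

# bool() maps 0.0 to False and any nonzero value to True
is_active = [bool(f) for f in flags]
active_names = [n for n, on in zip(names, is_active) if on]

print(f'{sum(is_active)} of {len(is_active)} parameters active: {active_names}')
```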


# Step 4: Start the optimization

In [13]:
o            = CMAOptimizer(popsize=15, sigma=0.5)
callbacks    = [Logger(), Timeout(60*60*8), TimePerEval(10), EarlyStopping(6000)]
optimization = Optimization(jc, [train_set, val_set], x0, o, callbacks=callbacks)
optimization.summary()

Optimization() Instance Settings:
Workdir:                           opt
JobCollection size:                458
Interface:                         ReaxParams
Active parameters:                 87
Optimizer:                         CMAOptimizer
Parallelism:                       ParallelLevels(optimizations=1, parametervectors=6, jobs=1, processes=1, threads=1)
Verbose:                           True
Callbacks:                         Logger
                                   Timeout
                                   TimePerEval
                                   EarlyStopping

Evaluators:
-----------
Name:                              trainingset (_LossEvaluator)
Loss:                              SSE
Evaluation frequency:              1

Data Set entries:                  4875
Data Set jobs:                     231
Batch size:                        None

Use PIPE:                          True
---
Name:                              validationset (_LossEvaluator)
Loss:               

The following will start the optimization (we will not show the output in the notebook):

In [17]:
# optimization.optimize()
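
The callbacks passed in Step 4 include criteria that end the run. The sketch below is a generic illustration of two such criteria, a wall-clock limit and a patience counter. It is not the ParAMS implementation. Reading `Timeout(60*60*8)` as a limit of eight hours in seconds and `EarlyStopping(6000)` as a patience of 6000 evaluations are assumptions, and the classes `PatienceStop` and `WallClockStop` as well as the loss values are hypothetical.

```python
import time

class PatienceStop:
    """Stop once `patience` evaluations pass without a new best loss."""
    def __init__(self, patience):
        self.patience = patience
        self.best = float('inf')
        self.since_best = 0

    def __call__(self, loss):
        if loss < self.best:
            self.best, self.since_best = loss, 0
        else:
            self.since_best += 1
        return self.since_best >= self.patience

class WallClockStop:
    """Stop once `seconds` of wall-clock time have elapsed."""
    def __init__(self, seconds):
        self.deadline = time.monotonic() + seconds

    def __call__(self, loss):
        return time.monotonic() >= self.deadline

# A short patience keeps the demonstration small; 8 hours never triggers here
stoppers = [PatienceStop(3), WallClockStop(60 * 60 * 8)]
losses = [5.0, 4.0, 4.5, 4.2, 4.1, 3.0]  # hypothetical loss values

stopped_at = None
for step, loss in enumerate(losses):
    # The run ends as soon as any single criterion is met
    if any(stop(loss) for stop in stoppers):
        stopped_at = step
        break
print('stopped at evaluation', stopped_at)
```

In this toy sequence the best loss is found at the second evaluation, and the run stops three evaluations later, before the final and lower value is ever seen. This shows the trade-off a patience setting makes between saving time and missing late improvements.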