# Compact sample of ESP fitting with constraints

## Work flow summarized
To begin with, insert implicit hydrogens with `insertHbyList.py` and perform DFT with GPAW.
The following example starts from an already optimized all-atom structure in `system100.traj`.

```

module load gpaw

mpirun -n 20 gpaw-python gpw_from_traj.py -c 6 sandbox/system100.traj \
    sandbox/system100.gpw 2>&1 | tee sandbox/gpw_from_traj.log

mpirun -n 20 gpaw-python esp_from_gpw.py sandbox/system100.gpw \
    sandbox/system100.vHtg.cube sandbox/system100.rho.cube \
    sandbox/system100.rho_pseudo.cube 2>&1 | tee sandbox/esp_from_gpw.log

./aa2ua_cube.py sandbox/system100.pdb sandbox/system100.lean.top \
    sandbox/system100.vHtg.cube sandbox/system100.vHtg_ua.cube \
    2>&1 | tee sandbox/aa2ua_cube.vHtg.log

./aa2ua_cube.py sandbox/system100.pdb sandbox/system100.lean.top \
    sandbox/system100.rho.cube sandbox/system100.rho_ua.cube \
    2>&1 | tee sandbox/aa2ua_cube.rho.log

./horton-esp-cost.sh --sign --esp-infile-cube sandbox/system100.vHtg_ua.cube \
    --dens-infile-cube sandbox/system100.rho_ua.cube \
    --cost-outfile-hdf5 sandbox/system100.cost_ua_neg.h5 \
    2>&1 | tee sandbox/horton-esp-cost-ua-neg.log


./fitESPconstrained.py sandbox/system100.cost_ua_neg.h5 sandbox/system100.pdb \
    sandbox/system100.lean.top sandbox/atoms_in_charge_group.csv \
    sandbox/charge_group_total_charge.csv sandbox/atoms_of_same_charge.csv \
    sandbox/fitted_point_charges.txt sandbox/fitted_point_charges.top \
    --qtot 6.0 --verbose 2>&1 | tee sandbox/esp-fit-constrained.log 
    
```

## Work flow explained
The commands summarized above are explained in the following.
It is advisable to run the first command (and, due to its high memory requirement, the second) as a batch job or in an interactive session, in parallel with `mpirun`, as shown below. All input files are assumed to exist in the subdirectory `sandbox`.

```
module load gpaw

mpirun -n 20 gpaw-python gpw_from_traj.py -c 6 sandbox/system100.traj \
    sandbox/system100.gpw 2>&1 | tee sandbox/gpw_from_traj.log

mpirun -n 20 gpaw-python esp_from_gpw.py sandbox/system100.gpw sandbox/system100.vHtg.cube \
    sandbox/system100.rho.cube sandbox/system100.rho_pseudo.cube \
    2>&1 | tee sandbox/esp_from_gpw.log
    
```

The `2>&1 | tee xyz.log` suffix writes both standard error and standard output to the screen and to a .log file. The `vHtg.cube` file contains the system's Hartree potential and `rho.cube` the all-electron density, both discretized on a regular grid in Cartesian space.

The .cube files are converted to united-atom form by removing the atom entries previously inserted by `insertHbyList.py`. For this, the original united-atom .pdb and .top files are necessary. The `lean.top` file has all `#include` statements (and the corresponding `[ molecules ]` entries, e.g. for solvent and background electrolyte ions) removed to avoid import failures due to files that cannot be located.

```
./aa2ua_cube.py sandbox/system100.pdb sandbox/system100.lean.top \
    sandbox/system100.vHtg.cube sandbox/system100.vHtg_ua.cube \
    2>&1 | tee sandbox/aa2ua_cube.vHtg.log

./aa2ua_cube.py sandbox/system100.pdb sandbox/system100.lean.top \
    sandbox/system100.rho.cube sandbox/system100.rho_ua.cube \
    2>&1 | tee sandbox/aa2ua_cube.rho.log
    
```
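
The `lean.top` file described above can be prepared by hand. As a minimal sketch of the same idea, the following function drops `#include` lines and selected `[ molecules ]` entries from a topology text. The function name `strip_includes`, the file names in the example and the molecule name `SYS` are hypothetical; `SOL` and `CL` are taken from the default `strip_string` shown further below.

```python
def strip_includes(top_text, drop_molecules=()):
    """Return topology text without #include lines and without the
    listed entries of the [ molecules ] section."""
    kept = []
    section = None
    for line in top_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#include"):
            continue  # drop references to external files
        if stripped.startswith("[") and stripped.endswith("]"):
            # remember which section we are in, e.g. "molecules"
            section = stripped.strip("[] ").lower()
        elif section == "molecules" and stripped and not stripped.startswith(";"):
            if stripped.split()[0] in drop_molecules:
                continue  # drop molecules defined only in removed includes
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical miniature topology, for illustration only
example = """\
#include "forcefield.itp"
[ moleculetype ]
; name  nrexcl
SYS     3
#include "solvent.itp"
[ molecules ]
SYS     1
SOL     500
CL      6
"""

lean = strip_includes(example, drop_molecules=("SOL", "CL"))
print(lean)
```

The sketch only handles `#include` lines and `[ molecules ]` entries; other preprocessor directives in a real topology would need separate treatment.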

Next, construct the cost function with HORTON, either by

```
module purge
module load horton/2.1.0b
horton-esp-cost.py sandbox/system100.vHtg_ua.cube sandbox/system100.cost_ua_neg.h5 \
    --pbc 000 --wdens sandbox/system100.rho_ua.cube --overwrite --sign
module purge
module load gpaw
```

or by using the following wrapper to avoid loading and unloading modules

```
./horton-esp-cost.sh --sign --esp-infile-cube sandbox/system100.vHtg_ua.cube \
    --dens-infile-cube sandbox/system100.rho_ua.cube \
    --cost-outfile-hdf5 sandbox/system100.cost_ua_neg.h5 \
    2>&1 | tee sandbox/horton-esp-cost-ua-neg.log
```
The additional `--sign` option is necessary due to non-standardized sign conventions in cube data. If not set, all charges including total charge, group charges and fitted point charges have to be treated with opposite sign in the following.
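
Why a flipped sign convention in the cube data simply flips all charges can be seen from a toy least-squares fit: the potential of point charges is linear in the charges, so fitting against the negated potential yields the negated charges. The sketch below uses a hypothetical three-atom system and random grid points in atomic units; it is not HORTON's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy system: three point charges and random grid points around them
atom_pos = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]])
q_ref = np.array([0.4, -0.7, 0.3])
grid = rng.uniform(-6.0, 6.0, size=(200, 3))

# Distances between every grid point and every atom
dist = np.linalg.norm(grid[:, None, :] - atom_pos[None, :, :], axis=2)
# Keep only grid points that are not too close to any atom
dist = dist[dist.min(axis=1) > 1.0]

design = 1.0 / dist      # potential of unit charges on the grid (atomic units)
esp = design @ q_ref     # reference potential generated by q_ref

# Unconstrained least-squares fit against the potential and against its negative
q_fit, *_ = np.linalg.lstsq(design, esp, rcond=None)
q_fit_flipped, *_ = np.linalg.lstsq(design, -esp, rcond=None)

print(q_fit, q_fit_flipped)
```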

Finally, use

```
./fitESPconstrained.py sandbox/system100.cost_ua_neg.h5 sandbox/system100.pdb \
    sandbox/system100.lean.top sandbox/atoms_in_charge_group.csv \
    sandbox/charge_group_total_charge.csv sandbox/atoms_of_same_charge.csv \
    sandbox/fitted_point_charges.txt sandbox/fitted_point_charges.top \
    --qtot 6.0 --verbose 2>&1 | tee sandbox/esp-fit-constrained.log 
```
to obtain fitted point charges in a simple text format, ordered by the atoms' ASE indices, within `fitted_point_charges.txt`, and assigned to atom name and residue within `sandbox/fitted_point_charges.top`. Use `./fitESPconstrained.py --help` for information on the input files' formats. Alternatively, use the code snippets below to obtain point charges within Python directly for further processing.

## From within Python

In [1]:
%load_ext autoreload

In [2]:
# for log output in jupyter notebook
%config Application.log_level="INFO"
import logging
import sys

FORMAT = '%(asctime)-15s:  %(message)s'
logging.basicConfig(level=logging.DEBUG, 
    format=FORMAT, filename='compactHortonFitEspConstrainedSample.log', 
                    filemode='w')

In [3]:
from fitESPconstrained import fitESPconstrained

In [4]:
help(fitESPconstrained)

Help on function fitESPconstrained in module fitESPconstrained:

fitESPconstrained(infile_pdb, infile_top, infile_cost_h5, infile_atoms_in_cg_csv, infile_cg_charges_csv, infile_atoms_of_same_charge_csv, qtot=0.0, strip_string=':SOL,CL', implicitHbondingPartners={'CD4': 1, 'CD3': 1, 'CA2': 2, 'CA3': 2, 'CB2': 2, 'CB3': 2}, debug=False, outfile_top=None, outfile_csv=None)
    Automizes the whole fitting process from importing Horton's
    cost function over reading constraints from simple text files to
    minimizing, logging and double-checking the results.
    
    Parameters
    ----------
    infile_pdb: str
        PDB file with original (united-atom) molecular structure 
    infile_top: str
        GROMACS topolgy file with original (united-atom) system.
        All #includes shoulb be removed!
    infile_cost_h5: str
        Cost function by HORTON, hdf5 format
    infile_atoms_in_cg_csv: str
        file with atom - charge group assignments in simple 
        "comma separated val

In [5]:
#q, lagrange_multiplier, info_df, cg2ase, cg2cgtype, cg2q, sym2ase
q, lagrange_multiplier, info_df, cg2ase, \
    cg2cgtype, cg2q, sym2ase = \
    fitESPconstrained(infile_pdb = 'sandbox/system100.pdb', 
                  infile_top = 'sandbox/system100.lean.top', 
                  infile_cost_h5 = 'sandbox/system100.cost_ua_neg.h5', 
                  infile_atoms_in_cg_csv = 'sandbox/atoms_in_charge_group.csv', 
                  infile_cg_charges_csv = 'sandbox/charge_group_total_charge.csv', 
                  infile_atoms_of_same_charge_csv = 'sandbox/atoms_of_same_charge.csv',
                  qtot = 6.0, strip_string=':SOL,CL', 
                  implicitHbondingPartners = {'CD4':1,'CD3':1,'CA2':2,'CA3':2,'CB2':2,'CB3':2},
                  debug=True, outfile_top='sandbox/system100.fitted.top',
                  outfile_csv='sandbox/system100.fitted.csv')

In [6]:
q

array([-0.485,  0.145,  0.339, -0.045,  0.045,  0.105,  0.112, -0.327,
       -0.235, -0.277,  0.784, -0.161,  0.787, -0.474, -0.558,  0.244,
        0.606, -1.137,  0.523,  0.44 ,  0.568,  0.787, -0.474, -0.558,
        0.244,  0.606, -1.137,  0.523,  0.44 ,  0.568, -0.045,  0.045,
       -0.045,  0.045,  0.105,  0.112, -0.327, -0.235, -0.277,  0.784,
       -0.161,  0.787, -0.474, -0.558,  0.244,  0.606, -1.137,  0.523,
        0.44 ,  0.568,  0.787, -0.474, -0.558,  0.244,  0.606, -1.137,
        0.523,  0.44 ,  0.568, -0.045,  0.045, -0.285,  0.285, -2.371,
        2.774, -0.737, -1.899,  0.407,  1.189,  0.07 , -1.918,  0.596,
        2.186, -0.299, -0.045,  0.045,  0.105,  0.112, -0.327, -0.235,
       -0.277,  0.784, -0.161,  0.787, -0.474, -0.558,  0.244,  0.606,
       -1.137,  0.523,  0.44 ,  0.568,  0.787, -0.474, -0.558,  0.244,
        0.606, -1.137,  0.523,  0.44 ,  0.568, -0.045,  0.045])

In [None]:
lagrange_multiplier
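
The fit returns Lagrange multipliers alongside the charges. As a minimal sketch of how an equality-constrained quadratic fit can be solved, assume a cost of the form `q^T A q - 2 b^T q` and linear constraints `D q = d`; the stationarity conditions then form one linear system. The function name `solve_constrained_fit`, the random toy data and the two example constraints (total charge 1, first two charges equal) are assumptions for illustration and are not taken from `fitESPconstrained`.

```python
import numpy as np


def solve_constrained_fit(A, b, D, d):
    """Minimize q^T A q - 2 b^T q subject to D q = d.

    Stationarity of the Lagrangian gives the block system
        [[A, D^T], [D, 0]] [q, lambda] = [b, d]
    (multipliers are defined up to a convention-dependent factor).
    """
    n, m = A.shape[0], D.shape[0]
    kkt = np.block([[A, D.T], [D, np.zeros((m, m))]])
    rhs = np.concatenate([b, d])
    solution = np.linalg.solve(kkt, rhs)
    return solution[:n], solution[n:]


# Toy quadratic cost built from a random least-squares problem with 4 charges
rng = np.random.default_rng(1)
M = rng.normal(size=(50, 4))
v = rng.normal(size=50)
A, b = M.T @ M, M.T @ v

# Constraint rows: total charge equals 1, and q[0] equals q[1]
D = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0, 0.0]])
d = np.array([1.0, 0.0])

q, lagrange_multiplier = solve_constrained_fit(A, b, D, d)
residual = np.linalg.norm(D @ q - d)  # should vanish up to round-off
print(q, lagrange_multiplier, residual)
```

Charge-group, symmetry and total-charge constraints all fit this pattern, since each is a linear equation in the charges.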

## Double-check results
The following snippets show how to check the results.

### charge group constraints

In [8]:
info_df.iloc[cg2ase[0]] # select first charge group
# q is the fully constrained fit
# q_qtot_constrained is the fit for only the system's total charge constrained
# q_cg_qtot_constrained is the fit for charge groups and total charge constrained, but not symmetries

| index | atom | residue | q | q_unconstrained | q_qtot_constrained | q_cg_qtot_constrained | q_sym_qtot_constrained |
|---|---|---|---|---|---|---|---|
| 0 | CE1 | terB | -0.484725 | -0.329952 | -0.324445 | -0.30701 | -0.079646 |
| 1 | HE1 | terB | 0.145415 | 0.137402 | 0.128249 | 0.162811 | 0.019302 |
| 2 | HE2 | terB | 0.33931 | 0.146113 | 0.159843 | 0.144199 | 0.16353 |


In [9]:
info_df.iloc[cg2ase[0]]['q'].sum() # total charge in first charge group

-1.4988010832439613e-15

In [10]:
# Double-check all charge groups
for i, cg in enumerate(cg2ase):
    print("#{:3d}, id {:3d}: charge: {:12.4e}".format(
        i, cg2cgtype[i], info_df.iloc[cg2ase[i]]['q'].sum()))

#  0, id   1: charge:  -1.4988e-15
#  1, id   2: charge:   0.0000e+00
#  2, id   3: charge:   2.7756e-17
#  3, id   4: charge:   6.1617e-15
#  4, id   5: charge:   1.0000e+00
#  5, id   6: charge:   2.9976e-15
#  6, id   7: charge:   1.0000e+00
#  7, id   8: charge:   2.5951e-15
#  8, id   2: charge:  -6.7446e-15
#  9, id   3: charge:   4.9127e-15
# 10, id   4: charge:   1.6653e-16
# 11, id   5: charge:   1.0000e+00
# 12, id   6: charge:   1.1380e-15
# 13, id   7: charge:   1.0000e+00
# 14, id   8: charge:  -4.2466e-15
# 15, id   2: charge:  -1.3670e-15
# 16, id   3: charge:   1.3878e-15
# 17, id   4: charge:  -1.3045e-15
# 18, id   5: charge:   1.0000e+00
# 19, id   6: charge:  -2.6368e-15
# 20, id   7: charge:   1.0000e+00
# 21, id   8: charge:   0.0000e+00
# 22, id   9: charge:  -4.2744e-15
# 23, id  10: charge:  -5.5511e-17


### symmetry constraints

In [11]:
sym2ase

[array([ 3, 32, 74]),
 array([ 4, 33, 75]),
 array([ 5, 34, 76]),
 array([ 6, 35, 77]),
 array([ 7, 36, 78]),
 array([ 8, 37, 79]),
 array([ 9, 38, 80]),
 array([10, 39, 81]),
 array([11, 40, 82]),
 array([12, 41, 83]),
 array([13, 42, 84]),
 array([14, 43, 85]),
 array([15, 44, 86]),
 array([16, 45, 87]),
 array([17, 46, 88]),
 array([18, 47, 89]),
 array([19, 48, 90]),
 array([20, 49, 91]),
 array([21, 50, 92]),
 array([22, 51, 93]),
 array([23, 52, 94]),
 array([24, 53, 95]),
 array([25, 54, 96]),
 array([26, 55, 97]),
 array([27, 56, 98]),
 array([28, 57, 99]),
 array([ 29,  58, 100]),
 array([ 30,  59, 101]),
 array([ 31,  60, 102]),
 array([  3,  30,  32,  59,  74, 101]),
 array([  4,  31,  33,  60,  75, 102]),
 array([12, 21, 41, 50, 83, 92]),
 array([13, 22, 42, 51, 84, 93]),
 array([14, 23, 43, 52, 85, 94]),
 array([15, 24, 44, 53, 86, 95]),
 array([16, 25, 45, 54, 87, 96]),
 array([17, 26, 46, 55, 88, 97]),
 array([18, 27, 47, 56, 89, 98]),
 array([19, 28, 48, 57, 90, 99]),
 

In [12]:
info_df.iloc[sym2ase[0]] # select first symmetry group

| index | atom | residue | q | q_unconstrained | q_qtot_constrained | q_cg_qtot_constrained | q_sym_qtot_constrained |
|---|---|---|---|---|---|---|---|
| 3 | CD1 | terB | -0.044997 | -0.088691 | 0.006398 | -0.40941 | -0.256854 |
| 32 | CD1 | OXO0 | -0.044997 | -0.18902 | -0.371108 | -0.153185 | -0.256854 |
| 74 | CD1 | terA | -0.044997 | -0.369452 | -0.538505 | -0.179503 | -0.256854 |


In [10]:
info_df['q'].sum()

5.9999999999999982

In [11]:
info_df['q_unconstrained'].sum()

5.7642074023428744

In [13]:
import pprint
import numpy as np
pp = pprint.PrettyPrinter(indent=8)

In [14]:
# each charge's deviation from the group's mean value
for i, sym in enumerate(sym2ase):
    sym2names = dict(zip(sym, [ info_df['atom'].iloc[s] for s in sym ] ))
    charges = [ info_df['q'].iloc[s] for s in sym ] 
    sym2charges = dict(zip(sym, charges))
    symMean = np.mean(charges)
    sym2error = dict(zip(sym, abs(charges - symMean)))
    print("# {:3d}: {}".format(i,sym2names))
    pp.pprint(sym2error)

#   0: {3: 'CD1', 32: 'CD1', 74: 'CD1'}
{       3: 2.4077961846558082e-15,
        32: 3.920475055707584e-15,
        74: 6.3143934525555778e-15}
#   1: {4: 'HD1', 33: 'HD1', 75: 'HD1'}
{       4: 5.1070259132757201e-15,
        33: 1.2490009027033011e-16,
        75: 4.98212582300539e-15}
#   2: {5: 'CD2', 34: 'CD2', 76: 'CD2'}
{       5: 9.4368957093138306e-16,
        34: 8.4654505627668186e-16,
        76: 1.7902346272080649e-15}
#   3: {6: 'HD2', 35: 'HD2', 77: 'HD2'}
{       6: 3.5804692544161298e-15,
        35: 6.2450045135165055e-16,
        77: 4.2188474935755949e-15}
#   4: {7: 'OD1', 36: 'OD1', 78: 'OD1'}
{       7: 4.163336342344337e-15,
        36: 6.3837823915946501e-15,
        78: 2.3314683517128287e-15}
#   5: {8: 'CD3', 37: 'CD3', 79: 'CD3'}
{       8: 9.1593399531575415e-16,
        37: 2.6922908347160046e-15,
        79: 3.5804692544161298e-15}
#   6: {9: 'CD4', 38: 'CD4', 80: 'CD4'}
{       9: 3.219646771412954e-15,
        38: 3.3306690738754696e-16,
        80: 

### total charge of system based upon GPAW electron density

In [10]:
from ase.io.cube import read_cube_data
from ase.units import Bohr
import numpy as np

In [11]:
cube_data, cube_atoms = read_cube_data("sandbox/system100.rho.cube")

In [32]:
unit_cell = cube_atoms.cell.diagonal() / cube_data.shape
unit_volume = np.prod(unit_cell)
q_el = cube_data.sum()*unit_volume/Bohr**3 # integrate electron density
q_core_total = 0
for a in cube_atoms: # count positive core charges
    q_core_total += a.number

In [33]:
q_core_total

494

In [34]:
q_el

488.00318759870044

In [36]:
q_tot = q_core_total - q_el

In [39]:
q_tot # conclusion: integral over electron density correctly reproduces total charge of 6

5.9968124012995645
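
The integration above multiplies the summed grid values by the voxel volume and divides by `Bohr**3`, which suggests that the cell is held in Å while the density values are in electrons per Bohr³. Note also that `cell.diagonal()` yields the edge lengths only for an orthorhombic, axis-aligned cell. The sketch below reproduces the same arithmetic on a synthetic Gaussian density with a known electron count; the grid size, cell length and width are hypothetical, and the Bohr radius is hard-coded to keep the example free of an ASE import.

```python
import numpy as np

BOHR = 0.52917721  # Bohr radius in Angstrom, hard-coded for this sketch

n_electrons = 10.0                        # known integral of the test density
shape = (60, 60, 60)                      # grid points per direction
cell_bohr = np.array([20.0, 20.0, 20.0])  # orthorhombic cell edges in Bohr
sigma = 1.0                               # Gaussian width in Bohr

# Regular grid in Bohr, Gaussian centered in the cell
axes = [np.arange(n) * (length / n) for n, length in zip(shape, cell_bohr)]
x, y, z = np.meshgrid(*axes, indexing="ij")
r2 = (x - 10.0) ** 2 + (y - 10.0) ** 2 + (z - 10.0) ** 2
norm = (2.0 * np.pi * sigma ** 2) ** 1.5
density = n_electrons * np.exp(-r2 / (2.0 * sigma ** 2)) / norm  # e / Bohr^3

# Mimic a cell stored in Angstrom and integrate as in the cell above
cell_angstrom = cell_bohr * BOHR
voxel_volume = np.prod(cell_angstrom / np.array(shape))  # Angstrom^3
q_el = density.sum() * voxel_volume / BOHR ** 3          # number of electrons
print(q_el)
```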

## Sandbox

In [61]:
unique_atoms = info_df['atom'].unique()

In [66]:
unique_atoms

array(['CE1', 'HE1', 'HE2', 'CD1', 'HD1', 'CD2', 'HD2', 'OD1', 'CD3',
       'CD4', 'CD5', 'HD3', 'CA1', 'OA1', 'OA2', 'CA2', 'CA3', 'NA1',
       'HA1', 'HA2', 'HA3', 'CB1', 'OB1', 'OB2', 'CB2', 'CB3', 'NB1',
       'HB1', 'HB2', 'HB3', 'CD6', 'HD4', 'CC1', 'HC1', 'CC2', 'CC3',
       'HC2', 'CC4', 'HC3', 'CC5', 'HC4', 'CC6', 'HC5', 'CC7', 'HC6'], dtype=object)

In [67]:
for a in unique_atoms:
    #print(info_df[ info_df['atom'] == a ])
    new_symmetry_group = info_df[ info_df['atom'] == a ]
    if not new_symmetry_group.empty:                                
        print(new_symmetry_group.index.values)

[0]
[1]
[2]
[ 3 32 74]
[ 4 33 75]
[ 5 34 76]
[ 6 35 77]
[ 7 36 78]
[ 8 37 79]
[ 9 38 80]
[10 39 81]
[11 40 82]
[12 41 83]
[13 42 84]
[14 43 85]
[15 44 86]
[16 45 87]
[17 46 88]
[18 47 89]
[19 48 90]
[20 49 91]
[21 50 92]
[22 51 93]
[23 52 94]
[24 53 95]
[25 54 96]
[26 55 97]
[27 56 98]
[28 57 99]
[ 29  58 100]
[ 30  59 101]
[ 31  60 102]
[61]
[62]
[63]
[64]
[65]
[66]
[67]
[68]
[69]
[70]
[71]
[72]
[73]
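
The loop above prints one index set per atom name, including singletons that imply no symmetry constraint. As a small sketch, the same grouping can be collected into a list that keeps only names occurring more than once. The atom name list here is a hypothetical excerpt, not the full system.

```python
from collections import defaultdict

# Hypothetical excerpt of atom names, ordered by atom index
atom_names = ["CE1", "HE1", "CD1", "HD1", "CD1", "HD1", "CC1"]

# Collect the indices of all atoms sharing the same name
groups = defaultdict(list)
for index, name in enumerate(atom_names):
    groups[name].append(index)

# Only names that occur more than once define a symmetry group
symmetry_groups = [indices for indices in groups.values() if len(indices) > 1]
print(symmetry_groups)
```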


In [None]:
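import pandas as pd
# ase2pmd is not defined in this notebook; it is assumed to map each ASE index to (atom name, residue)
ase2pmd_df = pd.DataFrame(ase2pmd).T
ase2pmd_df.columns = ['atom','residue']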

In [None]:
# constraints fulfilled? (A, X, B and N are not defined in this notebook)
logging.info("|D({}) x({}) - q({})| = {:e}".format(
    A[N:,:N].shape, X[:N].shape, B[N:].shape,
    np.linalg.norm(np.dot(A[N:,:N], X[:N]) - B[N:])))

In [7]:
info_df[info_df['residue']=='terA']

| index | atom | residue | q | q_unconstrained | q_qtot_constrained | q_cg_qtot_constrained | q_sym_qtot_constrained |
|---|---|---|---|---|---|---|---|
| 61 | CC1 | terA | -0.284763 | -0.178245 | -0.221887 | -0.348905 | -1.063941 |
| 62 | HC1 | terA | 0.284763 | 0.203349 | 0.230402 | 0.348905 | 0.313576 |
| 63 | CC2 | terA | -2.370532 | 0.200466 | 0.264524 | -0.976143 | 1.489747 |
| 64 | CC3 | terA | 2.774215 | -0.191765 | -0.232788 | 1.111288 | -0.757727 |
| 65 | HC2 | terA | -0.737148 | 0.141203 | 0.130732 | -0.379753 | 0.234287 |
| 66 | CC4 | terA | -1.898657 | -0.011678 | 0.113592 | -0.677709 | 0.275429 |
| 67 | HC3 | terA | 0.407138 | 0.051807 | -0.02122 | 0.098914 | 0.041491 |
| 68 | CC5 | terA | 1.189308 | -0.173877 | -0.280955 | 0.322367 | -0.557471 |
| 69 | HC4 | terA | 0.070273 | 0.203868 | 0.275617 | 0.184609 | 0.347592 |
| 70 | CC6 | terA | -1.917942 | -0.097261 | -0.153635 | -0.809476 | 0.068024 |


In [None]:
infor