-
Notifications
You must be signed in to change notification settings - Fork 75
/
forcefield.py
1180 lines (1037 loc) · 55.6 KB
/
forcefield.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
""" @package forcebalance.forcefield
Force field module.
In ForceBalance a 'force field' is built from a set of files containing
physical parameters. These files can be anything that enter into any
computation - our original program was quite dependent on the GROMACS
force field format, but this program is set up to allow very general
input formats.
We introduce several important concepts:
1) Adjustable parameters are allocated into a vector.
To cast the force field optimization as a math problem, we treat all
of the parameters on equal footing and write them as indices in a parameter vector.
2) A mapping from interaction type to parameter number.
Each element in the parameter vector corresponds to one or more
interaction types. Whenever we change the parameter vector and
recompute the objective function, this amounts to changing the
physical parameters in the simulations, so we print out new
force field files for external programs. In addition, when
these programs are computing the objective function we are often
in low-level subroutines that compute terms in the energy and
force. If we need an analytic derivative of the objective
function, then these subroutines need to know which index of the
parameter vector needs to be modified.
This is done by way of a hash table: for example, when we are
computing a Coulomb interaction between atom 4 and atom 5, we
can build the words 'COUL4' and 'COUL5' and look it up in the
parameter map; this gives us two numbers (say, 10 and 11)
corresponding to the eleventh and twelfth element of the
parameter vector. Then we can compute the derivatives of
the energy w/r.t. these parameters (in this case, COUL5/rij
and COUL4/rij) and increment these values in the objective function
gradient.
In custom-implemented force fields (see counterpoisematch.py)
the hash table can also be used to look up parameter values for
computation of interactions. This is probably not the fastest way
to do things, however.
3) Distinction between physical and mathematical parameters.
The optimization algorithm works in a space that is related to, but
not exactly the same as the physical parameter space. The reasons
for why we do this are:
a) Each parameter has its own physical units. On the one hand
it's not right to treat different physical units all on the same
footing, so nondimensionalization is desirable. To make matters
worse, the force field parameters can be small as 1e-8 or as
large as 1e+6 depending on the parameter type. This means the
elements of the objective function gradient / Hessian have
elements that differ from each other in size by 10+ orders of
magnitude, leading to mathematical instabilities in the optimizer.
b) The parameter space can be constrained, most notably for atomic
partial charges where we don't want to change the overall charge
on a molecule. Thus we wish to project out certain movements
in the mathematical parameters such that they don't change the physical
parameters.
c) We wish to regularize our optimization so as to avoid changing
our parameters in very insensitive directions (linear dependencies).
However, the sensitivity of the objective function to changes in the
force field depends on the physical units!
For all of these reasons, we introduce a 'transformation matrix'
which maps mathematical parameters onto physical parameters. The
diagonal elements in this matrix are rescaling factors; they take the
mathematical parameter and magnify it by this constant factor. The
off-diagonal elements correspond to rotations and other linear
transformations, and currently I just use them to project out the
'increase the net charge' direction in the physical parameter space.
Note that with regularization, these rescaling factors are equivalent
to the widths of prior distributions in a maximum likelihood framework.
Because there is such a correspondence between rescaling factors and
choosing a prior, they need to be chosen carefully. This is work in
progress. Another possibility is to sample the width of the priors from
a noninformative distribution -- the hyperprior (we can choose the
Jeffreys prior or something). This is work in progress.
Right now only GROMACS parameters are supported, but this class is extensible,
we need more modules!
@author Lee-Ping Wang
@date 04/2012
"""
import os
import sys
from re import match, sub, split
import forcebalance
from forcebalance import gmxio, qchemio, tinkerio, custom_io, openmmio, amberio, psi4io
from forcebalance.finite_difference import in_fd
from numpy import argsort, arange, array, diag, exp, eye, log, mat, mean, ndarray, ones, vstack, zeros, sin, cos, pi, sqrt
from numpy.linalg import norm
from forcebalance.nifty import col, flat, invert_svd, isint, isfloat, kb, orthogonalize, pmat2d, printcool, row, warn_press_key, printcool_dictionary
from string import count
from copy import deepcopy
try:
from lxml import etree
except: pass
import traceback
import itertools
from collections import OrderedDict, defaultdict
from forcebalance.output import getLogger
logger = getLogger(__name__)
FF_Extensions = {"itp" : "gmx",
"in" : "qchem",
"prm" : "tinker",
"gen" : "custom",
"xml" : "openmm",
"frcmod" : "frcmod",
"mol2" : "mol2",
"gbs" : "gbs",
"grid" : "grid"
}
""" Recognized force field formats. """
FF_IOModules = {"gmx": gmxio.ITP_Reader ,
"qchem": qchemio.QCIn_Reader ,
"tinker": tinkerio.Tinker_Reader ,
"custom": custom_io.Gen_Reader ,
"openmm" : openmmio.OpenMM_Reader,
"frcmod" : amberio.FrcMod_Reader,
"mol2" : amberio.Mol2_Reader,
"gbs" : psi4io.GBS_Reader,
"grid" : psi4io.Grid_Reader
}
def determine_fftype(ffname,verbose=False):
""" Determine the type of a force field file. It is possible to
specify the file type explicitly in the input file using the
syntax 'force_field.ext:type'. Otherwise this function will try
to determine the force field type by extension. """
fsplit = ffname.split('/')[-1].split(':')
fftype = None
if verbose: logger.info("Determining file type of %s ..." % fsplit[0])
if len(fsplit) == 2:
if fsplit[1] in FF_IOModules:
if verbose: logger.info("We're golden! (%s)\n" % fsplit[1])
fftype = fsplit[1]
else:
if verbose: logger.info("\x1b[91m Warning: \x1b[0m %s not in supported types (%s)!\n" % (fsplit[1],', '.join(FF_IOModules.keys())))
elif len(fsplit) == 1:
if verbose: logger.info("Guessing from extension (you may specify type with filename:type) ...")
ffname = fsplit[0]
ffext = ffname.split('.')[-1]
if ffext in FF_Extensions:
guesstype = FF_Extensions[ffext]
if guesstype in FF_IOModules:
if verbose: logger.info("guessing %s -> %s!\n" % (ffext, guesstype))
fftype = guesstype
else:
if verbose: logger.info("\x1b[91m Warning: \x1b[0m %s not in supported types (%s)!\n" % (fsplit[0],', '.join(FF_IOModules.keys())))
else:
if verbose: logger.info("\x1b[91m Warning: \x1b[0m %s not in supported extensions (%s)!\n" % (ffext,', '.join(FF_Extensions.keys())))
if fftype == None:
if verbose: logger.warning("Force field type not determined!\n")
#sys.exit(1)
return fftype
# Thanks to tos9 from #python on freenode. :)
class BackedUpDict(dict):
def __init__(self, backup_dict):
super(BackedUpDict, self).__init__()
self.backup_dict = backup_dict
def __missing__(self, key):
try:
return self.backup_dict[self['AtomType']][key]
except:
raise KeyError('The key %s does not exist as an atom attribute or as an atom type attribute!' % key)
class FF(forcebalance.BaseClass):
""" Force field class.
This class contains all methods for force field manipulation.
To create an instance of this class, an input file is required
containing the list of force field file names. Everything else
inside this class pertaining to force field generation is self-contained.
For details on force field parsing, see the detailed documentation for addff.
"""
def __init__(self, options, verbose=True):
"""Instantiation of force field class.
Many variables here are initialized to zero, but they are filled out by
methods like addff, rsmake, and mktransmat.
"""
super(FF, self).__init__(options)
#======================================#
# Options that are given by the parser #
#======================================#
## As these options proliferate, the force field class becomes less standalone.
## I need to think of a good solution here...
## The root directory of the project
self.set_option(None, None, 'root', os.getcwd())
## File names of force fields
self.set_option(options,'forcefield','fnms')
## Directory containing force fields, relative to project directory
self.set_option(options,'ffdir')
## Priors given by the user :)
self.set_option(options,'priors')
## Whether to constrain the charges.
self.set_option(options,'constrain_charge')
## Whether to constrain the charges.
self.set_option(options,'logarithmic_map')
## Switch for AMOEBA direct or mutual.
self.set_option(None, None, 'amoeba_pol', options['amoeba_polarization'].lower(), 'direct')
## Switch for rigid water molecules
self.set_option(options, 'rigid_water')
## Bypass the transformation and use physical parameters directly
self.set_option(options, 'use_pvals')
#======================================#
# Variables which are set here #
#======================================#
# A lot of these variables are filled out by calling the methods.
## The content of all force field files are stored in memory
self.ffdata = {}
self.ffdata_isxml = {}
## The mapping of interaction type -> parameter number
self.map = {}
## The listing of parameter number -> interaction types
self.plist = []
## A listing of parameter number -> atoms involved
self.patoms = []
## A list where pfields[pnum] = ['file',line,field,mult,cmd],
## basically a new way to modify force field files; when we modify the
## force field file, we go to the specific line/field in a given file
## and change the number.
self.pfields = []
## List of rescaling factors
self.rs = []
## The transformation matrix for mathematical -> physical parameters
self.tm = None
## The transpose of the transformation matrix
self.tmI = None
## Indices to exclude from optimization / Hessian inversion
self.excision = None
## The total number of parameters
self.np = 0
## Initial value of physical parameters
self.pvals0 = []
## A dictionary of force field reader classes
self.Readers = OrderedDict()
## A list of atom names (this is new, for ESP fitting)
self.atomnames = []
# Read the force fields into memory.
for fnm in self.fnms:
if verbose:
logger.info("Reading force field from file: %s\n" % fnm)
self.addff(fnm)
## WORK IN PROGRESS ##
# This is a dictionary of {'AtomType':{'Mass' : float, 'Charge' : float, 'ParticleType' : string ('A', 'S', or 'D'), 'AtomicNumber' : int}}
self.FFAtomTypes = OrderedDict()
for Reader in self.Readers.values():
for k, v in Reader.AtomTypes.items():
if k in self.FFAtomTypes:
warn_press_key('Trying to insert atomtype %s into the force field, but it is already there' % k)
self.FFAtomTypes[k] = v
# This is an ordered dictionary of {'Molecule' : [{'AtomType' : string, 'ResidueNumber' : int, 'ResidueName' : string,
# 'AtomName' : string, 'ChargeGroup' : int, 'Charge' : float}]}
# Each value in the dictionary is a list of BackedUpDictionaries.
# If we query a BackedUpDictionary and the value does not exist,
# then it will query the backup dictionary using the 'AtomType' value.
# Thus, if we look up the mass of 'HW1' or 'HW2' in the dictionary, it will
# return the mass for 'HW' in the AtomTypes dictionary.
self.FFMolecules = OrderedDict()
for Reader in self.Readers.values():
for Molecule, AtomList in Reader.Molecules.items():
for FFAtom in AtomList:
FFAtomWithDefaults = BackedUpDict(self.FFAtomTypes)
for k, v in FFAtom.items():
FFAtomWithDefaults[k] = v
self.FFMolecules.setdefault(Molecule, []).append(FFAtomWithDefaults)
# Set the initial values of parameter arrays.
## Initial value of physical parameters
self.pvals0 = array(self.pvals0)
# Prepare various components of the class for first use.
## Creates plist from map.
self.list_map()
if verbose:
## Prints the plist to screen.
bar = printcool("Starting parameter indices, physical values and IDs")
self.print_map()
logger.info(bar)
## Make the rescaling factors.
self.rsmake(printfacs=verbose)
## Make the transformation matrix.
self.mktransmat()
## Redirection dictionary (experimental).
self.redirect = {}
## Destruction dictionary (experimental).
self.linedestroy_save = []
self.parmdestroy_save = []
self.linedestroy_this = []
self.parmdestroy_this = []
## Print the optimizer options.
printcool_dictionary(self.PrintOptionDict, title="Setup for force field")
def addff(self,ffname):
""" Parse a force field file and add it to the class.
First, figure out the type of force field file. This is done
either by explicitly specifying the type using for example,
<tt> ffname force_field.xml:openmm </tt> or we figure it out
by looking at the file extension.
Next, parse the file. Currently we support two classes of
files - text and XML. The two types are treated very
differently; for XML we use the parsers in libxml (via the
python lxml module), and for text files we have our own
in-house parsing class. Within text files, there is also a
specialized GROMACS and TINKER parser as well as a generic
text parser.
The job of the parser is to determine the following things:
1) Read the user-specified selection of parameters being fitted
2) Build a mapping (dictionary) of <tt> parameter identifier -> index in parameter vector </tt>
3) Build a list of physical parameter values
4) Figure out where to replace the parameter values in the force field file when the values are changed
5) Figure out which parameters need to be repeated or sign-flipped
Generally speaking, each parameter value in the force field
file has a <tt> unique parameter identifier <tt>. The
identifier consists of three parts - the interaction type, the
parameter subtype (within that interaction type), and the
atoms involved.
--- If XML: ---
The force field file is read in using the lxml Python module. Specify
which parameter you want to fit using by adding a 'parameterize' element
to the end of the force field XML file, like so.
@code
<AmoebaVdwForce type="BUFFERED-14-7">
<Vdw class="74" sigma="0.2655" epsilon="0.056484" reduction="0.910" parameterize="sigma, epsilon, reduction" />
@endcode
In this example, the parameter identifier would look like <tt> Vdw/74/epsilon </tt>.
--- If GROMACS (.itp) or TINKER (.prm) : ---
Follow the rules in the ITP_Reader or Tinker_Reader derived
class. Read the documentation in the class documentation or
the 'feed' method to learn more. In all cases the parameter
is tagged using <tt> # PARM 3 </tt> (where # denotes a comment,
the word PARM stays the same, and 3 is the field number starting
from zero.)
--- If normal text : ---
The parameter identifier is simply built using the file name,
line number, and field. Thus, the identifier is unique but
completely noninformative (which is not ideal for our
purposes, but it works.)
--- Endif ---
@warning My program currently assumes that we are only using one MM program per job.
If we use CHARMM and GROMACS to perform simulations as part of the same TARGET, we will
get messed up. Maybe this needs to be fixed in the future, with program prefixes to
parameters like C_ , G_ .. or simply unit conversions, you get the idea.
@warning I don't think the multiplier actually works for analytic derivatives unless
the interaction calculator knows the multiplier as well. I'm sure I can make this
work in the future if necessary.
@param[in] ffname Name of the force field file
"""
fftype = determine_fftype(ffname)
ffname = ffname.split(':')[0]
# Set the Tinker PRM file, which will help for running programs like "analyze".
if fftype == "tinker":
if hasattr(self, "tinkerprm"):
warn_press_key("There should only be one TINKER parameter file")
else:
self.tinkerprm = ffname
# Set the OpenMM XML file, which will help for running OpenMM.
if fftype == "openmm":
if hasattr(self, "openmmxml"):
warn_press_key("There should only be one OpenMM XML file - confused!!")
else:
self.openmmxml = ffname
# Determine the appropriate parser from the FF_IOModules dictionary.
# If we can't figure it out, then use the base reader, it ain't so bad. :)
Reader = FF_IOModules.get(fftype, forcebalance.BaseReader)
# Open the force field using an absolute path and read its contents into memory.
absff = os.path.join(self.root,self.ffdir,ffname)
# Create an instance of the Reader.
# The reader is essentially a finite state machine that allows us to
# build the pid.
self.Readers[ffname] = Reader(ffname)
if fftype == "openmm":
## Read in an XML force field file as an _ElementTree object
self.ffdata[ffname] = etree.parse(absff)
self.ffdata_isxml[ffname] = True
# Process the file
self.addff_xml(ffname)
else:
## Read in a text force field file as a list of lines
self.ffdata[ffname] = [line.expandtabs() for line in open(absff).readlines()]
self.ffdata_isxml[ffname] = False
# Process the file
self.addff_txt(ffname, fftype)
if hasattr(self.Readers[ffname], 'atomnames'):
if len(self.atomnames) > 0:
sys.stderr.write('Found more than one force field containing atom names; skipping the second one (%s)\n' % ffname)
else:
self.atomnames += self.Readers[ffname].atomnames
def addff_txt(self, ffname, fftype):
""" Parse a text force field and create several important instance variables.
Each line is processed using the 'feed' method as implemented
in the reader class. This essentially allows us to create the
correct parameter identifier (pid), because the pid comes from
more than the current line, it also depends on the section that
we're in.
When 'PARM' or 'RPT' is encountered, we do several things:
- Build the parameter identifier and insert it into the map
- Point to the file name, line number, and field where the parameter may be modified
Additionally, when 'PARM' is encountered:
- Store the physical parameter value (this is permanent; it's the original value)
- Increment the total number of parameters
When 'RPT' is encountered we don't expand the parameter vector
because this parameter is a copy of an existing one. If the
parameter identifier is preceded by MINUS_, we chop off the
prefix but remember that the sign needs to be flipped.
"""
for ln, line in enumerate(self.ffdata[ffname]):
try:
self.Readers[ffname].feed(line)
except:
logger.warning(traceback.format_exc() + '\n')
logger.warning("The force field reader crashed when trying to read the following line:\n")
logger.warning(line + '\n')
warn_press_key("The force field parser got confused! The traceback and line in question are printed above.")
sline = self.Readers[ffname].Split(line)
if 'PARM' in sline:
pmark = (array(sline) == 'PARM').argmax() # The position of the 'PARM' word
pflds = [int(i) for i in sline[pmark+1:]] # The integers that specify the parameter word positions
for pfld in pflds:
# For each of the fields that are to be parameterized (indicated by PARM #),
# assign a parameter type to it according to the Interaction Type -> Parameter Dictionary.
pid = self.Readers[ffname].build_pid(pfld)
# Add pid into the dictionary.
# LPW: Here is a hack to allow duplicate parameter IDs.
if pid in self.map:
pid0 = pid
extranum = 0
while pid in self.map:
pid = "%s%i" % (pid0, extranum)
extranum += 1
logger.info("Encountered an duplicate parameter ID: parameter name has been changed to %s\n" % pid)
self.map[pid] = self.np
# This parameter ID has these atoms involved.
self.patoms.append([self.Readers[ffname].molatom])
# Also append pid to the parameter list
self.assign_p0(self.np,float(sline[pfld]))
self.assign_field(self.np,ffname,ln,pfld,1)
self.np += 1
if "RPT" in sline:
parse = (array(sline)=='RPT').argmax()+1 # The position of the 'RPT' word
if max(array(sline)=='/RPT') > 0:
stopparse = (array(sline)=='/RPT').argmax()
else:
stopparse = len(sline)
while parse < stopparse:
# Between RPT and /RPT, the words occur in pairs.
# First is a number corresponding to the field that contains the dependent parameter.
# Second is a string corresponding to the 'pid' that this parameter depends on.
pfld = int(sline[parse])
try:
prep = self.map[sline[parse+1].replace('MINUS_','')]
except:
sys.stderr.write("Valid parameter IDs:\n")
count = 0
for i in self.map:
sys.stderr.write("%25s" % i)
if count%3 == 2:
sys.stderr.write("\n")
count += 1
sys.stderr.write("\nOffending ID: %s\n" % sline[parse+1])
raise Exception('Parameter repetition entry in force field file is incorrect (see above)')
pid = self.Readers[ffname].build_pid(pfld)
self.map[pid] = prep
# This repeated parameter ID also has these atoms involved.
self.patoms[prep].append(self.Readers[ffname].molatom)
self.assign_field(prep,ffname,ln,pfld,"MINUS_" in sline[parse+1] and -1 or 1)
parse += 2
if "EVAL" in sline:
parse = (array(sline)=='EVAL').argmax()+1 # The position of the 'EVAL' word
if max(array(sline)=='/EVAL') > 0:
stopparse = (array(sline)=='/EVAL').argmax()
else:
stopparse = len(sline)
while parse < stopparse:
# Between EVAL and /EVAL, the words occur in pairs.
# First is a number corresponding to the field that contains the dependent parameter.
# Second is a Python command that determines how to calculate the parameter.
pfld = int(sline[parse])
evalcmd = sline[parse+1] # This string is actually Python code!!
#pid = self.Readers[ffname].build_pid(pfld)
#self.map[pid] = prep
# This repeated parameter ID also has these atoms involved.
#self.patoms[prep].append(self.Readers[ffname].molatom)
self.assign_field(None,ffname,ln,pfld,None,evalcmd)
parse += 2
def addff_xml(self, ffname):
""" Parse an XML force field file and create important instance variables.
This was modeled after addff_txt, but XML and text files are
fundamentally different, necessitating two different methods.
We begin with an _ElementTree object. We search through the tree
for the 'parameterize' and 'parameter_repeat' keywords. Each time
the keyword is encountered, we do the same four actions that
I describe in addff_txt.
It's hard to specify precisely the location in an XML file to
change a force field parameter. I can create a list of tree
elements (essentially pointers to elements within a tree), but
this method breaks down when I copy the tree because I have no
way to refer to the copied tree elements. Fortunately, lxml
gives me a way to represent a tree using a flat list, and my
XML file 'locations' are represented using the positions in
the list.
@warning The sign-flip hasn't been implemented yet. This
shouldn't matter unless your calculation contains repeated
parameters with opposite sign.
"""
fflist = list(self.ffdata[ffname].iter())
for e in self.ffdata[ffname].getroot().xpath('//@parameterize/..'):
parameters_to_optimize = sorted([i.strip() for i in e.get('parameterize').split(',')])
for p in parameters_to_optimize:
pid = self.Readers[ffname].build_pid(e, p)
self.map[pid] = self.np
self.assign_p0(self.np,float(e.get(p)))
self.assign_field(self.np,ffname,fflist.index(e),p,1)
self.np += 1
for e in self.ffdata[ffname].getroot().xpath('//@parameter_repeat/..'):
for field in e.get('parameter_repeat').split(','):
dest = self.Readers[ffname].build_pid(e, field.strip().split('=')[0])
src = field.strip().split('=')[1]
if src in self.map:
self.map[dest] = self.map[src]
else:
warn_press_key(["Warning: You wanted to copy parameter from %s to %s, " % (src, dest),
"but the source parameter does not seem to exist!"])
self.assign_field(self.map[dest],ffname,fflist.index(e),dest.split('/')[1],1)
for e in self.ffdata[ffname].getroot().xpath('//@parameter_eval/..'):
for field in e.get('parameter_eval').split(','):
dest = self.Readers[ffname].build_pid(e, field.strip().split('=')[0])
evalcmd = field.strip().split('=')[1]
self.assign_field(None,ffname,fflist.index(e),dest.split('/')[1],None,evalcmd)
def make(self,vals,use_pvals=False,printdir=None,precision=12):
""" Create a new force field using provided parameter values.
This big kahuna does a number of things:
1) Creates the physical parameters from the mathematical parameters
2) Creates force fields with physical parameters substituted in
3) Prints the force fields to the specified file.
It does NOT store the mathematical parameters in the class state
(since we can only hold one set of parameters).
@param[in] printdir The directory that the force fields are printed to; as usual
this is relative to the project root directory.
@param[in] vals Input parameters. I previously had an option where it uses
stored values in the class state, but I don't think that's a good idea anymore.
@param[in] use_pvals Switch for whether to bypass the coordinate transformation
and use physical parameters directly.
"""
if type(vals)==ndarray and vals.ndim != 1:
raise Exception('Please only pass 1-D arrays')
if len(vals) != self.np:
raise Exception('Input parameter array (%i) not the required size (%i)' % (len(vals), self.np))
if use_pvals or self.use_pvals:
logger.info("Using physical parameters directly!\r")
pvals = vals.copy().flatten()
else:
pvals = self.create_pvals(vals)
OMMFormat = "%%.%ie" % precision
def TXTFormat(number, precision):
SciNot = "%% .%ie" % precision
if abs(number) < 1000 and abs(number) > 0.001:
Decimal = "%% .%if" % precision
Num = Decimal % number
Mum = Decimal % (-1 * number)
if (float(Num) == float(Mum)):
return Decimal % abs(number)
else:
return Decimal % number
else:
Num = SciNot % number
Mum = SciNot % (-1 * number)
if (float(Num) == float(Mum)):
return SciNot % abs(number)
else:
return SciNot % number
pvals = list(pvals)
newffdata = deepcopy(self.ffdata)
# The dictionary that takes parameter names to physical values.
PARM = {i:pvals[self.map[i]] for i in self.map}
#======================================#
# Print the new force field. #
# LPW Note: Is it really reasonable #
# to restrict the length of "pfields"? #
# Perhaps "eval" will make this obso. #
#======================================#
for i in range(len(self.pfields)):
pfld_list = self.pfields[i]
for pfield in pfld_list:
fnm,ln,fld,mult,cmd = pfield
# XML force fields are easy to print.
# Our 'pointer' to where to replace the value
# is given by the position of this line in the
# iterable representation of the tree and the
# field number.
#if type(newffdata[fnm]) is etree._ElementTree:
if cmd != None:
try:
wval = eval(cmd)
except:
logger.error(traceback.format_exc() + '\n')
raise Exception("The command %s (written in the force field file) cannot be evaluated!" % cmd)
else:
wval = mult*pvals[i]
if self.ffdata_isxml[fnm]:
list(newffdata[fnm].iter())[ln].attrib[fld] = OMMFormat % (wval)
# Text force fields are a bit harder.
# Our pointer is given by the line and field number.
# We take care to preserve whitespace in the printout
# so that the new force field still has nicely formated
# columns.
else:
# Split the string into whitespace and data fields.
sline = self.Readers[fnm].Split(newffdata[fnm][ln])
whites = self.Readers[fnm].Whites(newffdata[fnm][ln])
# Align whitespaces and fields (it should go white, field, white, field)
if newffdata[fnm][ln][0] != ' ':
whites = [''] + whites
# Subtract one whitespace, unless the line begins with a minus sign.
if not match('^-',sline[fld]) and len(whites[fld]) > 1:
whites[fld] = whites[fld][:-1]
# Actually replace the field with the physical parameter value.
if precision == 12:
newrd = "% 17.12e" % (wval)
else:
newrd = TXTFormat(wval, precision)
# The new word might be longer than the old word.
# If this is the case, we can try to shave off some whitespace.
Lold = len(sline[fld])
if not match('^-',sline[fld]):
Lold += 1
Lnew = len(newrd)
if Lnew > Lold:
Shave = Lnew - Lold
if Shave < (len(whites[fld+1])+2):
whites[fld+1] = whites[fld+1][:-Shave]
sline[fld] = newrd
# Replace the line in the new force field.
newffdata[fnm][ln] = ''.join([(whites[j] if len(whites[j]) > 0 else ' ')+sline[j] for j in range(len(sline))])+'\n'
if printdir != None:
absprintdir = os.path.join(self.root,printdir)
else:
absprintdir = os.getcwd()
if not os.path.exists(absprintdir):
logger.info('Creating the directory %s to print the force field\n' % absprintdir)
os.makedirs(absprintdir)
for fnm in newffdata:
#if type(newffdata[fnm]) is etree._ElementTree:
if self.ffdata_isxml[fnm]:
with open(os.path.join(absprintdir,fnm),'w') as f: newffdata[fnm].write(f)
else:
with open(os.path.join(absprintdir,fnm),'w') as f: f.writelines(newffdata[fnm])
return pvals
def make_redirect(self,mvals):
Groups = defaultdict(list)
for p, pid in enumerate(self.plist):
if 'Exponent' not in pid or len(pid.split()) != 1:
warn_press_key("Fusion penalty currently implemented only for basis set optimizations, where parameters are like this: Exponent:Elem=H,AMom=D,Bas=0,Con=0")
Data = dict([(i.split('=')[0],i.split('=')[1]) for i in pid.split(':')[1].split(',')])
if 'Con' not in Data or Data['Con'] != '0':
warn_press_key("More than one contraction coefficient found! You should expect the unexpected")
key = Data['Elem']+'_'+Data['AMom']
Groups[key].append(p)
#pvals = self.FF.create_pvals(mvals)
#print "pvals: ", pvals
pvals = self.create_pvals(mvals)
logger.info("pvals:\n")
logger.info(str(pvals) + '\n')
Thresh = 1e-4
for gnm, pidx in Groups.items():
# The group of parameters for a particular element / angular momentum.
pvals_grp = pvals[pidx]
# The order that the parameters come in.
Order = argsort(pvals_grp)
for p in range(len(Order) - 1):
# The pointers to the parameter indices.
pi = pidx[Order[p]]
pj = pidx[Order[p+1]]
# pvals[pi] is the SMALLER parameter.
# pvals[pj] is the LARGER parameter.
dp = log(pvals[pj]) - log(pvals[pi])
if dp < Thresh:
if pi in self.redirect:
pk = self.redirect[pi]
else:
pk = pi
#if pj not in self.redirect:
#print "Redirecting parameter %i to %i" % (pj, pk)
#self.redirect[pj] = pk
#print self.redirect
def find_spacings(self):
Groups = defaultdict(list)
for p, pid in enumerate(self.plist):
if 'Exponent' not in pid or len(pid.split()) != 1:
warn_press_key("Fusion penalty currently implemented only for basis set optimizations, where parameters are like this: Exponent:Elem=H,AMom=D,Bas=0,Con=0")
Data = dict([(i.split('=')[0],i.split('=')[1]) for i in pid.split(':')[1].split(',')])
if 'Con' not in Data or Data['Con'] != '0':
warn_press_key("More than one contraction coefficient found! You should expect the unexpected")
key = Data['Elem']+'_'+Data['AMom']
Groups[key].append(p)
pvals = self.create_pvals(zeros(self.np,dtype=float))
logger.info("pvals:\n")
logger.info(str(pvals) + '\n')
spacdict = {}
for gnm, pidx in Groups.items():
# The group of parameters for a particular element / angular momentum.
pvals_grp = pvals[pidx]
# The order that the parameters come in.
Order = argsort(pvals_grp)
spacs = []
for p in range(len(Order) - 1):
# The pointers to the parameter indices.
pi = pidx[Order[p]]
pj = pidx[Order[p+1]]
# pvals[pi] is the SMALLER parameter.
# pvals[pj] is the LARGER parameter.
dp = log(pvals[pj]) - log(pvals[pi])
spacs.append(dp)
if len(spacs) > 0:
spacdict[gnm] = mean(array(spacs))
return spacdict
def create_pvals(self,mvals):
"""Converts mathematical to physical parameters.
First, mathematical parameters are rescaled and rotated by
multiplying by the transformation matrix, followed by adding
the original physical parameters.
@param[in] mvals The mathematical parameters
@return pvals The physical parameters
"""
#print "mvals = ", mvals,
for p in self.redirect:
mvals[p] = 0.0
if self.logarithmic_map:
try:
pvals = exp(mvals.flatten()) * self.pvals0
except:
logger.exception(mvals + '\n')
raise Exception('What the hell did you do?')
else:
pvals = flat(mat(self.tmI)*col(mvals)) + self.pvals0
concern= ['polarizability','epsilon','VDWT']
# Guard against certain types of parameters changing sign.
for i in range(self.np):
if any([j in self.plist[i] for j in concern]) and pvals[i] * self.pvals0[i] < 0:
#print "Parameter %s has changed sign but it's not allowed to! Setting to zero." % self.plist[i]
pvals[i] = 0.0
# Redirect parameters (for the fusion penalty function.)
for p in self.redirect:
pvals[p] = pvals[self.redirect[p]]
# if not in_fd():
# print pvals
#print "pvals = ", pvals
return pvals
def create_mvals(self,pvals):
"""Converts physical to mathematical parameters.
We create the inverse transformation matrix using SVD.
@param[in] pvals The physical parameters
@return mvals The mathematical parameters
"""
if self.logarithmic_map:
raise Exception('create_mvals has not been implemented for logarithmic_map')
mvals = flat(invert_svd(self.tmI) * col(pvals - self.pvals0))
return mvals
def rsmake(self,printfacs=True):
"""Create the rescaling factors for the coordinate transformation in parameter space.
The proper choice of rescaling factors (read: prior widths in maximum likelihood analysis)
is still a black art. This is a topic of current research.
@todo Pass in rsfactors through the input file
@param[in] printfacs List for printing out the resecaling factors
"""
typevals = {}
rsfactors = {}
rsfac_list = []
## Takes the dictionary 'BONDS':{3:'B', 4:'K'}, 'VDW':{4:'S', 5:'T'},
## and turns it into a list of term types ['BONDSB','BONDSK','VDWS','VDWT']
if any([self.Readers[i].pdict == "XML_Override" for i in self.fnms]):
termtypelist = ['/'.join([i.split('/')[0],i.split('/')[1]]) for i in self.map]
else:
termtypelist = itertools.chain(*sum([[[i+self.Readers[f].pdict[i][j] for j in self.Readers[f].pdict[i] if isint(str(j))] for i in self.Readers[f].pdict] for f in self.fnms],[]))
#termtypelist = sum([[i+self.Readers.pdict[i][j] for j in self.Readers.pdict[i] if isint(str(j))] for i in self.Readers.pdict],[])
for termtype in termtypelist:
for pid in self.map:
if termtype in pid:
typevals.setdefault(termtype, []).append(self.pvals0[self.map[pid]])
for termtype in typevals:
# The old, horrendously complicated rule
# rsfactors[termtype] = exp(mean(log(abs(array(typevals[termtype]))+(abs(array(typevals[termtype]))==0))))
# The newer, maximum rule (thanks Baba)
maxval = max(abs(array(typevals[termtype])))
# When all initial parameter values are zero, it could be a problem...
if maxval == 0:
maxval += 1
rsfactors[termtype] = maxval
rsfac_list.append(termtype)
# Physically motivated overrides
rs_override(rsfactors,termtype)
# Overrides from input file
for termtype in self.priors:
rsfac_list.append(termtype)
rsfactors[termtype] = self.priors[termtype]
# for line in os.popen("awk '/rsfactor/ {print $2,$3}' %s" % pkg.options).readlines():
# rsfactors[line.split()[0]] = float(line.split()[1])
if printfacs:
bar = printcool("Rescaling Factors (Lower Takes Precedence):",color=1)
logger.info(''.join([" %-35s : %.5e\n" % (i, rsfactors[i]) for i in rsfac_list]))
logger.info(bar)
## The array of rescaling factors
self.rs = ones(len(self.pvals0))
for pnum in range(len(self.pvals0)):
for termtype in rsfac_list:
if termtype in self.plist[pnum]:
self.rs[pnum] = rsfactors[termtype]
def mktransmat(self):
""" Create the transformation matrix to rescale and rotate the mathematical parameters.
For point charge parameters, project out perturbations that
change the total charge.
First build these:
'qmap' : Just a list of parameter indices that point to charges.
'qid' : For each parameter in the qmap, a list of the affected atoms :)
A potential target for the molecule-specific thang.
Then make this:
'qtrans2' : A transformation matrix that rotates the charge parameters.
The first row is all zeros (because it corresponds to increasing the charge on all atoms)
The other rows correspond to changing one of the parameters and decreasing all of the others
equally such that the overall charge is preserved.
'qmat2' : An identity matrix with 'qtrans2' pasted into the right place
'transmat': 'qmat2' with rows and columns scaled using self.rs
'excision': Parameter indices that need to be 'cut out' because they are irrelevant and
mess with the matrix diagonalization
@todo Only project out changes in total charge of a molecule, and perhaps generalize to
fragments of molecules or other types of parameters.
@todo The AMOEBA selection of charge depends not only on the atom type, but what that atom is bonded to.
"""
self.qmap = []
self.qid = []
self.qid2 = []
qnr = 1
concern= ['COUL','c0','charge']
qmat2 = eye(self.np,dtype=float)
def insert_mat(qtrans2, qmap):
# Write the qtrans2 block into qmat2.
x = 0
for i in range(self.np):
if i in qmap:
y = 0
for j in qmap:
qmat2[i, j] = qtrans2[x, y]
y += 1
x += 1
def build_qtrans2(tq, qid, qmap):
nq = len(qmap)
cons0 = ones((1,tq),dtype=float)
cons = zeros((cons0.shape[0], nq), dtype=float)
qtrans2 = eye(nq, dtype=float)
for i in range(cons.shape[0]):
for j in range(cons.shape[1]):
cons[i][j] = sum([cons0[i][k-1] for k in qid[j]])
cons[i] /= norm(cons[i])
for j in range(i):
cons[i] = orthogonalize(cons[i], cons[j])
qtrans2[i,:] = 0
for j in range(nq-i-1):
qtrans2[i+j+1, :] = orthogonalize(qtrans2[i+j+1, :], cons[i])
return qtrans2
# Here we build a charge constraint for each molecule.
if any(len(r.adict) > 0 for r in self.Readers.values()):
logger.info("Building charge constraints...\n")
# Build a concatenated dictionary
Adict = OrderedDict()
# This is a loop over files
for r in self.Readers.values():
# This is a loop over molecules
for k, v in r.adict.items():
Adict[k] = v
nmol = 0
for molname, molatoms in Adict.items():
mol_charge_count = zeros(self.np, dtype=float)
tq = 0
qmap = []
qid = []
for i in range(self.np):
qct = 0
qidx = []
for imol, iatoms in self.patoms[i]:
for iatom in iatoms:
if imol == molname and iatom in molatoms:
qct += 1
tq += 1
qidx.append(molatoms.index(iatom))
if any([j in self.plist[i] for j in concern]) and qct > 0:
qmap.append(i)
qid.append(qidx)
logger.info("Parameter %i occurs %i times in molecule %s in locations %s (%s)\n" % (i, qct, molname, str(qidx), self.plist[i]))
#Here is where we build the qtrans2 matrix.
if len(qmap) > 0:
qtrans2 = build_qtrans2(tq, qid, qmap)