# Individual internal coordinates can be fixed in a Cartesian geometry optimization

Because the geometry of a large molecule is normally defined in Cartesian coordinates, it might seem difficult to lock individual bond lengths or angles so that their values cannot change during the optimization.  The answer is to define the individual atoms concerned using internal coordinates.  Suppose that, in a large molecule, hydrogen atom 3000 needs to be held 1.1 Å from oxygen atom 2999.  Using a graphics package, locate H3000 and O2999, together with two other atoms that can be used to define the angle and dihedral in an internal-coordinate connectivity.  Suitable atoms might be the atom that O2999 is bonded to (e.g. C2967) and the next atom along (e.g. N2960).  Any atoms could be used, provided their atom numbers are lower than that of the atom being defined, i.e. here lower than 3000.  Once the four atoms (here H3000, O2999, C2967, and N2960) are identified, work out the angle and dihedral, and use these data to construct the internal coordinate definition for H3000.  Each value in the definition is followed by an optimization flag: 1 means the coordinate is optimized, 0 means it is held fixed.  If the H3000-O2999-C2967 angle is 120° and the H3000-O2999-C2967-N2960 torsion is 180°, then the line that places the hydrogen atom with a fixed O-H distance would be:

    H 1.1 0 120 1 180 1 2999 2967 2960

If the angle needed to be fixed, then the line would be:

    H 1.1 1 120 0 180 1 2999 2967 2960

and so on.  There is no restriction on the number of internal coordinates that can be used.  Indeed, entire hetero groups can be defined as being rigid, but free to move within a cavity.  In such a case, the first atom in the hetero group would be defined in either unconstrained internal or Cartesian coordinates, the second atom would be in internal coordinates with its bond length fixed, the third atom would be in internal coordinates with its bond length and angle frozen, and all the remaining atoms in the hetero group would be in internal coordinates with all three coordinates frozen.
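The flag pattern described above can also be generated programmatically. The following is a minimal sketch and not part of MOPAC: the function name `internal_line` and its `fixed` argument are hypothetical, and it assumes the convention used in the lines above, where each value is followed by a flag that is 0 when the coordinate is frozen and 1 when it is optimized.

```python
def internal_line(symbol, bond, angle, dihedral, na, nb, nc, fixed=()):
    """Format one atom in internal coordinates (hypothetical helper).

    fixed      : names of the coordinates to freeze, any of
                 'bond', 'angle', 'dihedral'
    na, nb, nc : 1-based numbers of the connectivity atoms
    """
    names = ('bond', 'angle', 'dihedral')
    unknown = set(fixed) - set(names)
    if unknown:
        raise ValueError(f'unknown coordinate names: {sorted(unknown)}')
    # 0 = frozen, 1 = optimized
    flags = [0 if name in fixed else 1 for name in names]
    return (f'{symbol} {bond} {flags[0]} {angle} {flags[1]} '
            f'{dihedral} {flags[2]} {na} {nb} {nc}')


# H3000 with only the O-H distance frozen
print(internal_line('H', 1.1, 120, 180, 2999, 2967, 2960, fixed=('bond',)))

# An atom deep inside a rigid group: all three coordinates frozen
print(internal_line('H', 1.1, 120, 180, 2999, 2967, 2960,
                    fixed=('bond', 'angle', 'dihedral')))
```

For a rigid hetero group, the second atom would be written with `fixed=('bond',)`, the third with `fixed=('bond', 'angle')`, and every later atom with all three names.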

For a small molecule (toluene, in this example) that is defined in Cartesian coordinates and in which one C-C bond length is to be frozen, a suitable data set would be:

    PM6
    Toluene
      Cartesian definition, but with one C-C distance frozen
      C        -0.00997463 +1   -0.0015231 +1   -0.0000930 +1    
      C         2.79787104 +1   -0.0096765 +1   -0.0012704 +1
      C         0.69635100 +1    1.2055923 +1    0.0410044 +1
      C         0.68930681 +1   -1.2109956 +1   -0.0416784 +1
      C         2.09215830 +1    1.2048435 +1    0.0406317 +1 
      C         2.08705695 +1   -1.2186939 +1   -0.0423481 +1  
      H        -1.09719779 +1    0.0021777 +1    0.0002893 +1  
      H         0.15466169 +1    2.1496180 +1    0.0735817 +1  
      H         0.14334405 +1   -2.1525875 +1   -0.0736970 +1  
      H         2.63556968 +1    2.1465054 +1    0.0727064 +1   
      H         2.62170344 +1   -2.1652386 +1   -0.0748682 +1  
      C         1.5 0 120 1 180 1 2 5 3
      H         4.68822161 +1    0.4932505 +1    0.8985801 +1 
      H         4.72735404 +1   -1.0061093 +1   -0.0329544 +1    
      H         4.68721477 +1    0.5478562 +1   -0.8700774 +1  
 
A GUI is the easiest way to work out the bond length, angle, and dihedral to be used.
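Where no graphics package is at hand, the same three quantities can in principle be computed directly from the Cartesian coordinates. The sketch below is an illustration under stated assumptions: the function name `internal_coords` is hypothetical, the dihedral uses a common `arctan2`-based formula, and its sign convention should be checked against the program being used. The coordinates are those of atoms 6, 2, 5, and 3 of the toluene data set above.

```python
import numpy as np


def internal_coords(p, a, b, c):
    """Return (distance p-a, angle p-a-b, dihedral p-a-b-c).

    Distance is in the units of the input; angles are in degrees.
    (Hypothetical helper for illustration only.)
    """
    p, a, b, c = (np.asarray(x, dtype=float) for x in (p, a, b, c))
    dist = np.linalg.norm(p - a)
    u = (p - a) / dist
    axis = (b - a) / np.linalg.norm(b - a)
    angle = np.degrees(np.arccos(np.clip(u @ axis, -1.0, 1.0)))
    # Project p-a and c-b onto the plane perpendicular to the a-b axis;
    # the signed angle between the projections is the dihedral.
    w1 = (p - a) - ((p - a) @ axis) * axis
    w2 = (c - b) - ((c - b) @ axis) * axis
    dihedral = np.degrees(np.arctan2(np.cross(axis, w1) @ w2, w1 @ w2))
    return dist, angle, dihedral


# Ring carbons taken from the toluene data set (atoms 6, 2, 5, 3)
c6 = (2.08705695, -1.2186939, -0.0423481)
c2 = (2.79787104, -0.0096765, -0.0012704)
c5 = (2.09215830, 1.2048435, 0.0406317)
c3 = (0.69635100, 1.2055923, 0.0410044)

dist, angle, dih = internal_coords(c6, c2, c5, c3)
print(f'bond {dist:.3f}  angle {angle:.1f}  dihedral {dih:.1f}')
```

Because the four atoms are consecutive members of the aromatic ring, the bond length should come out close to 1.4 Å, the angle close to 120°, and the dihedral close to 0°.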

## What to do when the atoms to be used in the connectivity occur after the atom whose bond length or angle is to be locked

Quite often, the geometry as supplied does not allow a bond length or angle to be defined, because the atoms that would be used in the connectivity are defined after the atom of interest.  Fortunately, there is an easy way to solve this problem: the order in which Cartesian atoms occur is not important, so simply move the atoms to be used in the connectivity to before the atom whose bond length or angle is to be defined.  Alternatively, move the atom whose bond length or angle is to be defined to the end of the data set.
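The second option, moving the atom of interest to the end of the data set, can be sketched as follows. This is only an illustration: `move_to_end` is a hypothetical helper, and the four atoms and their coordinates are made up for the example. The returned dictionary records how the 1-based atom numbers change, which is needed when writing the connectivity.

```python
def move_to_end(atoms, target):
    """Move atoms[target] to the end of the list.

    atoms  : list of (symbol, x, y, z) tuples in Cartesian coordinates
    target : 0-based index of the atom to be redefined
    Returns the reordered list and a dict mapping old 1-based atom
    numbers to new 1-based atom numbers.
    """
    order = [i for i in range(len(atoms)) if i != target] + [target]
    reordered = [atoms[i] for i in order]
    renumber = {old + 1: new + 1 for new, old in enumerate(order)}
    return reordered, renumber


# Made-up example: the hydrogen comes first, before its reference atoms
atoms = [('H', 0.0, 0.0, 0.0),
         ('O', 0.96, 0.0, 0.0),
         ('C', 1.4, 1.3, 0.0),
         ('N', 2.8, 1.4, 0.0)]

reordered, renumber = move_to_end(atoms, target=0)
print([a[0] for a in reordered])  # ['O', 'C', 'N', 'H']
print(renumber)                   # {2: 1, 3: 2, 4: 3, 1: 4}
```

After the move, the hydrogen is atom 4 and can be defined in internal coordinates relative to atoms 1, 2, and 3, all of which now precede it.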

In [32]:
from cclib.io import ccread
from ase.visualize import view
from ase import Atoms
import os
import numpy as np
# os.chdir('Resources/SN2')
mol = ccread('TS_out_test.xyz')
coords = mol.atomcoords[1]  # second geometry stored in the file
atoms = Atoms(mol.atomnos, positions=coords)
# pairs of 0-based atom indexes whose distance is to be frozen
constraints = np.array([[2,13], [7,15]])
view(atoms)
coords

array([[-7.699860e-01, -1.451496e+00, -5.140650e-01],
       [-1.303115e+00, -1.915358e+00, -1.317498e+00],
       [-1.452283e+00, -8.660250e-01,  5.000000e-01],
       [-2.522247e+00, -8.660520e-01,  4.999540e-01],
       [-9.191050e-01, -4.022070e-01,  1.303358e+00],
       [ 7.700340e-01, -1.451514e+00, -5.140970e-01],
       [ 1.303199e+00, -1.915367e+00, -1.317512e+00],
       [ 1.452283e+00, -8.660250e-01,  5.000000e-01],
       [ 9.245810e-01, -4.006030e-01,  1.306134e+00],
       [ 2.522213e+00, -8.691800e-01,  4.945360e-01],
       [-1.690000e-04,  1.993000e+00,  1.451978e+00],
       [-1.163907e+00,  1.578657e+00,  7.343140e-01],
       [ 1.163693e+00,  1.578744e+00,  7.344650e-01],
       [-6.673000e-01,  8.660250e-01, -5.000000e-01],
       [-2.243644e+00,  1.826516e+00,  1.163619e+00],
       [ 6.673000e-01,  8.660250e-01, -5.000000e-01],
       [ 2.243356e+00,  1.826684e+00,  1.163910e+00],
       [-1.367938e+00,  4.542990e-01, -1.213131e+00],
       [ 1.368162e+00,  4.54

In [27]:
set(constraints.ravel())

{2, 7, 13, 15}

In [63]:
from pprint import pprint
from reactive_atoms_classes import pt
from linalg_tools import norm, dihedral
import os
from subprocess import DEVNULL, STDOUT, check_call
from parameters import * 
from cclib.io import ccread
from ase.calculators.mopac import MOPAC

def mopac_swag(coords, atomnos, constrained_indexes, method='PM7', title='TSCoDe candidate'):
    '''
    Write temp.mop, a MOPAC input in Cartesian coordinates in which the
    distance of each (a, b) pair in constrained_indexes is frozen by
    defining atom a in internal coordinates relative to atom b.
    '''
    s = [method + '\n' + title + '\n\n']
    for i, num in enumerate(atomnos):
        if i not in constrained_indexes:
            s.append(' {} {} 1 {} 1 {} 1\n'.format(pt[num].symbol, coords[i][0], coords[i][1], coords[i][2]))

    # sorted, so that a free atom's position in this list matches its line order in the file
    free_indexes = sorted(set(range(len(atomnos))) - set(constrained_indexes.ravel()))

    for a, b in constrained_indexes:

        # two distinct free atoms, used only as references for the angle and dihedral
        c, d = np.random.choice(free_indexes, 2, replace=False)

        dist = np.linalg.norm(coords[a] - coords[b]) # in Angstrom

        # a-b-c angle, in degrees
        angle = np.arccos(norm(coords[a] - coords[b]) @ norm(coords[c] - coords[b]))*180/np.pi

        # a-b-c-d dihedral, mapped to the 0-360 degree range
        d_angle = dihedral([coords[a],
                            coords[b],
                            coords[c],
                            coords[d]])
        d_angle += 360 if d_angle < 0 else 0

        # s[0] is the header, so len(s) is the 1-based MOPAC id that atom b is about to get
        list_len = len(s)
        s.append(' {} {} 1 {} 1 {} 1\n'.format(pt[atomnos[b]].symbol, coords[b][0], coords[b][1], coords[b][2]))
        # atom a: distance flag 0 (frozen), angle and dihedral flags 1 (optimized)
        s.append(' {} {} 0 {} 1 {} 1 {} {} {}\n'.format(pt[atomnos[a]].symbol, dist, angle, d_angle, list_len, free_indexes.index(c)+1, free_indexes.index(d)+1))

    s = ''.join(s)
    # print the input with MOPAC atom ids (the header occupies lines -2, -1 and 0)
    pprint([(a-2,b) for a, b in enumerate(s.split('\n'))])
    with open('temp.mop', 'w') as f:
        f.write(s)

    # check_call(f'{MOPAC_COMMAND} temp.mop'.split(), stdout=DEVNULL, stderr=STDOUT)
    # check_call('obabel temp.out -o xyz -O temp.xyz'.split(), stdout=DEVNULL, stderr=STDOUT)

    # data = ccread('temp.out')
    # return data.atomcoords, data.atomnos

mopac_swag(coords, mol.atomnos, constraints)
# run MOPAC on the file that mopac_swag just wrote, then convert and view the result
os.system('mopac2016 temp.mop')
os.system('obabel temp.out -o sdf -O temp_out.sdf')
os.system('gview temp_out.sdf')

[(-2, 'PM7'),
 (-1, 'TSCoDe candidate'),
 (0, ''),
 (1, ' C -0.769986 1 -1.451496 1 -0.514065 1'),
 (2, ' H -1.303115 1 -1.915358 1 -1.317498 1'),
 (3, ' H -2.522247 1 -0.866052 1 0.499954 1'),
 (4, ' H -0.919105 1 -0.402207 1 1.303358 1'),
 (5, ' C 0.770034 1 -1.451514 1 -0.514097 1'),
 (6, ' H 1.303199 1 -1.915367 1 -1.317512 1'),
 (7, ' H 0.924581 1 -0.400603 1 1.306134 1'),
 (8, ' H 2.522213 1 -0.86918 1 0.494536 1'),
 (9, ' O -0.000169 1 1.993 1 1.451978 1'),
 (10, ' C -1.163907 1 1.578657 1 0.734314 1'),
 (11, ' C 1.163693 1 1.578744 1 0.734465 1'),
 (12, ' O -2.243644 1 1.826516 1 1.163619 1'),
 (13, ' O 2.243356 1 1.826684 1 1.16391 1'),
 (14, ' H -1.367938 1 0.454299 1 -1.213131 1'),
 (15, ' H 1.368162 1 0.454402 1 -1.212953 1'),
 (16, ' C -0.6673 1 0.866025 1 -0.5 1'),
 (17,
  ' C 2.1485333399295903 0 106.75585791872425 1 68.65153381798459 1 16 11 6'),
 (18, ' C 0.6673 1 0.866025 1 -0.5 1'),
 (19,
  ' C 2.1485333399295903 0 28.287536980441246 1 127.84120175804122 1 18 7 11'),

0

In [109]:
def read_mop_out(filename):
    '''
    Read the Cartesian coordinates printed after the
    'SCF FIELD WAS ACHIEVED' line of a MOPAC .out file.
    Returns (coords, atomnos) as numpy arrays.
    '''
    symbols = []
    coords = []
    with open(filename, 'r') as f:
        # skip ahead to the end of the SCF, then to the coordinate table
        for marker in ('SCF FIELD WAS ACHIEVED', 'CARTESIAN COORDINATES'):
            line = f.readline()
            while line and marker not in line:
                line = f.readline()
            if not line:
                raise ValueError(f'{marker!r} not found in {filename}')

        f.readline()  # blank line after the table title
        line = f.readline()
        # the table ends at the first blank line (or at end of file)
        while line.strip():
            splitted = line.split()
            symbols.append(splitted[1])
            coords.append([float(splitted[2]),
                           float(splitted[3]),
                           float(splitted[4])])
            line = f.readline()

    atomnos = [pt.symbol(i).number for i in symbols]
    return np.array(coords), np.array(atomnos)

read_mop_out('temp.out')

(array([[-7.01550773e-01, -1.73867196e+00, -5.20434570e-01],
        [-1.22276892e+00, -2.27777864e+00, -1.31427702e+00],
        [-2.48835175e+00, -8.06767370e-01,  2.46368961e-01],
        [-1.05629049e+00, -6.96595069e-01,  1.34185212e+00],
        [ 6.99881441e-01, -1.73955365e+00, -5.21395260e-01],
        [ 1.21868805e+00, -2.27914799e+00, -1.31631063e+00],
        [ 1.05964369e+00, -6.99880205e-01,  1.34119502e+00],
        [ 2.48927713e+00, -8.11275925e-01,  2.43331983e-01],
        [-1.58429100e-03,  2.14216864e+00,  1.49937772e+00],
        [-1.14610632e+00,  1.72588421e+00,  8.00947157e-01],
        [ 1.14576197e+00,  1.72478298e+00,  8.05288211e-01],
        [-2.22622904e+00,  2.01132882e+00,  1.21445184e+00],
        [ 2.22377716e+00,  2.01051688e+00,  1.22386138e+00],
        [-1.33093259e+00,  8.52485683e-01, -1.25510989e+00],
        [ 1.33796997e+00,  8.52803422e-01, -1.25007509e+00],
        [-6.95369933e-01,  9.51452823e-01, -3.89364633e-01],
        [-1.41625285e+00

In [35]:
with open('SWAG.mop', 'w') as f:
        f.write('PM7\n...\n\nC 0 1 0 1 0 1\nF 0 1 0 1 -2 0 3')
os.system('mopac2016 SWAG.mop')

0

## Full list of KEYWORDS
        & - TURN NEXT LINE INTO KEYWORDS
        + - ADD ANOTHER LINE OF KEYWORDS
        0SCF - READ IN DATA, THEN STOP
        1ELECTRON - PRINT FINAL ONE-ELECTRON MATRIX
        1SCF - DO ONE SCF AND THEN STOP
        AIDER - READ IN AB INITIO DERIVATIVES
        AIGIN - GEOMETRY MUST BE IN GAUSSIAN FORMAT
        AIGOUT - IN ARC FILE, INCLUDE AB-INITIO GEOMETRY
        ANALYT - USE ANALYTICAL DERIVATIVES OF ENERGY WRT GEOMETRY
        AM1 - USE THE AM1 HAMILTONIAN
        BAR=n.n - REDUCE BAR LENGTH BY A MAXIMUM OF n.n
        BIRADICAL - SYSTEM HAS TWO UNPAIRED ELECTRONS
        BONDS - PRINT FINAL BOND-ORDER MATRIX
        C.I. - A MULTI-ELECTRON CONFIGURATION INTERACTION SPECIFIED
        CHARGE=n - CHARGE ON SYSTEM = n (e.g. NH4 => CHARGE=1)
        COMPFG - PRINT HEAT OF FORMATION CALCULATED IN COMPFG
        CONNOLLY - USE CONNOLLY SURFACE
        DEBUG - DEBUG OPTION TURNED ON
        DENOUT - DENSITY MATRIX OUTPUT (CHANNEL 10)
        DENSITY - PRINT FINAL DENSITY MATRIX
        DEP - GENERATE FORTRAN CODE FOR PARAMETERS FOR NEW ELEMENTS
        DEPVAR=n - TRANSLATION VECTOR IS A MULTIPLE OF BOND-LENGTH
        DERIV - PRINT PART OF WORKING IN DERIV
        DFORCE - FORCE CALCULATION SPECIFIED, ALSO PRINT FORCE MATRIX.
        DFP - USE DAVIDON-FLETCHER-POWELL METHOD TO OPTIMIZE GEOMETRIES
        DIPOLE - FIT THE ESP TO THE CALCULATED DIPOLE
        DIPX - X COMPONENT OF DIPOLE TO BE FITTED
        DIPY - Y COMPONENT OF DIPOLE TO BE FITTED
        DIPZ - Z COMPONENT OF DIPOLE TO BE FITTED
        DMAX - MAXIMUM STEPSIZE IN EIGENVECTOR FOLLOWING
        DOUBLET - DOUBLET STATE REQUIRED
        DRC - DYNAMIC REACTION COORDINATE CALCULATION
        DUMP=n - WRITE RESTART FILES EVERY n SECONDS
        ECHO - DATA ARE ECHOED BACK BEFORE CALCULATION STARTS
        EF - USE EF ROUTINE FOR MINIMUM SEARCH
        EIGINV -
        EIGS - PRINT ALL EIGENVALUES IN ITER
        ENPART - PARTITION ENERGY INTO COMPONENTS
        ESP - ELECTROSTATIC POTENTIAL CALCULATION
        ESPRST - RESTART OF ELECTROSTATIC POTENTIAL
        ESR - CALCULATE RHF UNPAIRED SPIN DENSITY
        EXCITED - OPTIMIZE FIRST EXCITED SINGLET STATE
        EXTERNAL - READ PARAMETERS OFF DISK
        FILL=n - IN RHF OPEN AND CLOSED SHELL, FORCE M.O. n TO BE FILLED
        FLEPO - PRINT DETAILS OF GEOMETRY OPTIMIZATION
        FMAT - PRINT DETAILS OF WORKING IN FMAT
        FOCK - PRINT LAST FOCK MATRIX
        FORCE - FORCE CALCULATION SPECIFIED
        GEO-OK - OVERRIDE INTERATOMIC DISTANCE CHECK
        GNORM=n.n - EXIT WHEN GRADIENT NORM DROPS BELOW n.n
        GRADIENTS - PRINT ALL GRADIENTS
        GRAPH - GENERATE FILE FOR GRAPHICS
        HCORE - PRINT DETAILS OF WORKING IN HCORE
        HESS=N - OPTIONS FOR CALCULATING HESSIAN MATRICES IN EF
        H-PRIO - HEAT OF FORMATION TAKES PRIORITY IN DRC
        HYPERFINE - HYPERFINE COUPLING CONSTANTS TO BE CALCULATED
        IRC - INTRINSIC REACTION COORDINATE CALCULATION
        ISOTOPE - FORCE MATRIX WRITTEN TO DISK (CHANNEL 9 )
        ITER - PRINT DETAILS OF WORKING IN ITER
        ITRY=N - SET LIMIT OF NUMBER OF SCF ITERATIONS TO N.
        IUPD - MODE OF HESSIAN UPDATE IN EIGENVECTOR FOLLOWING
        K=(N,N) - BRILLOUIN ZONE STRUCTURE TO BE CALCULATED
        KINETIC - EXCESS KINETIC ENERGY ADDED TO DRC CALCULATION
        LINMIN - PRINT DETAILS OF LINE MINIMIZATION
        LARGE - PRINT EXPANDED OUTPUT
        LET - OVERRIDE CERTAIN SAFETY CHECKS
        LOCALIZE - PRINT LOCALIZED ORBITALS
        MAX - PRINTS MAXIMUM GRID SIZE (23*23)
        MECI - PRINT DETAILS OF MECI CALCULATION
        MICROS - USE SPECIFIC MICROSTATES IN THE C.I.
        MINDO/3 - USE THE MINDO/3 HAMILTONIAN
        MMOK - USE MOLECULAR MECHANICS CORRECTION TO CONH BONDS
        MODE=N - IN EF, FOLLOW HESSIAN MODE NO. N
        MOLDAT - PRINT DETAILS OF WORKING IN MOLDAT
        MS=N - IN MECI, MAGNETIC COMPONENT OF SPIN
        MULLIK - PRINT THE MULLIKEN POPULATION ANALYSIS
        NLLSQ - MINIMIZE GRADIENTS USING NLLSQ
        NOANCI - DO NOT USE ANALYTICAL C.I. DERIVATIVES
        NODIIS - DO NOT USE DIIS GEOMETRY OPTIMIZER
        NOINTER - DO NOT PRINT INTERATOMIC DISTANCES
        NOLOG - SUPPRESS LOG FILE TRAIL, WHERE POSSIBLE
        NOMM - DO NOT USE MOLECULAR MECHANICS CORRECTION TO CONH BONDS
        NONR -
        NOTHIEL - DO NOT USE THIEL’S FSTMIN TECHNIQUE
        NSURF=N - NUMBER OF SURFACES IN AN ESP CALCULATION
        NOXYZ - DO NOT PRINT CARTESIAN COORDINATES
        NSURF - NUMBER OF LAYERS USED IN ELECTROSTATIC POTENTIAL
        OLDENS - READ INITIAL DENSITY MATRIX OFF DISK
        OLDGEO - PREVIOUS GEOMETRY TO BE USED
        OPEN - OPEN-SHELL RHF CALCULATION REQUESTED
        ORIDE -
        PARASOK - IN AM1 CALCULATIONS SOME MNDO PARAMETERS ARE TO BE USED
        PI - RESOLVE DENSITY MATRIX INTO SIGMA AND PI BONDS
        PL - MONITOR CONVERGENCE OF DENSITY MATRIX IN ITER
        PM3 - USE THE MNDO-PM3 HAMILTONIAN
        POINT=N - NUMBER OF POINTS IN REACTION PATH
        POINT1=N - NUMBER OF POINTS IN FIRST DIRECTION IN GRID CALCULATION
        POINT2=N - NUMBER OF POINTS IN SECOND DIRECTION IN GRID CALCULATION
        POLAR - CALCULATE FIRST, SECOND AND THIRD ORDER POLARIZABILITIES
        POTWRT - IN ESP, WRITE OUT ELECTROSTATIC POTENTIAL TO UNIT 21
        POWSQ - PRINT DETAILS OF WORKING IN POWSQ
        PRECISE - CRITERIA TO BE INCREASED BY 100 TIMES
        PULAY - USE PULAY’S CONVERGER TO OBTAIN A SCF
        QUARTET - QUARTET STATE REQUIRED
        QUINTET - QUINTET STATE REQUIRED
        RECALC=N - IN EF, RECALCULATE HESSIAN EVERY N STEPS
        RESTART - CALCULATION RESTARTED
        ROOT=n - ROOT n TO BE OPTIMIZED IN A C.I. CALCULATION
        ROT=n - THE SYMMETRY NUMBER OF THE SYSTEM IS n.
        SADDLE - OPTIMIZE TRANSITION STATE
        SCALE - SCALING FACTOR FOR VAN DER WAALS DISTANCE IN ESP
        SCFCRT=n - DEFAULT SCF CRITERION REPLACED BY THE VALUE SUPPLIED
        SCINCR - INCREMENT BETWEEN LAYERS IN ESP
        SETUP - EXTRA KEYWORDS TO BE READ OFF SETUP FILE
        SEXTET - SEXTET STATE REQUIRED
        SHIFT=n - A DAMPING FACTOR OF n DEFINED TO START SCF
        SIGMA - MINIMIZE GRADIENTS USING SIGMA
        SINGLET - SINGLET STATE REQUIRED
        SLOPE - MULTIPLIER USED TO SCALE MNDO CHARGES
        SPIN - PRINT FINAL UHF SPIN MATRIX
        STEP - STEP SIZE IN PATH
        STEP1=n - STEP SIZE n FOR FIRST COORDINATE IN GRID CALCULATION
        STEP2=n - STEP SIZE n FOR SECOND COORDINATE IN GRID CALCULATION
        STO-3G - DEORTHOGONALIZE ORBITALS IN STO-3G BASIS
        SYMAVG - AVERAGE SYMMETRY EQUIVALENT ESP CHARGES
        SYMMETRY - IMPOSE SYMMETRY CONDITIONS
        T=n - A TIME OF n SECONDS REQUESTED
        THERMO - PERFORM A THERMODYNAMICS CALCULATION
        TIMES - PRINT TIMES OF VARIOUS STAGES
        T-PRIO - TIME TAKES PRIORITY IN DRC
        TRANS - THE SYSTEM IS A TRANSITION STATE (USED IN THERMODYNAMICS CALCULATION)
        TRIPLET - TRIPLET STATE REQUIRED
        TS - USING EF ROUTINE FOR TS SEARCH
        UHF - UNRESTRICTED HARTREE-FOCK CALCULATION
        VECTORS - PRINT FINAL EIGENVECTORS
        VELOCITY - SUPPLY THE INITIAL VELOCITY VECTOR IN A DRC CALCULATION
        WILLIAMS - USE WILLIAMS SURFACE
        X-PRIO - GEOMETRY CHANGES TAKE PRIORITY IN DRC
        XYZ - DO ALL GEOMETRIC OPERATIONS IN CARTESIAN COORDINATES.
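
Keywords are written space-separated on the first line of a data set, as with the `PM6` and `PM7` lines in the inputs earlier. The following is a minimal sketch of assembling such a line; the helper `keyword_line` and the `KNOWN` set are hypothetical, and `KNOWN` holds only a few names copied from the list above, so it would need extending for real use.

```python
# A few keyword names copied from the list above; extend as needed.
KNOWN = {'AM1', 'PM3', 'CHARGE', 'PRECISE', 'GNORM', 'UHF', 'XYZ', '1SCF', 'T'}


def keyword_line(*keywords):
    """Join keywords into one space-separated line.

    Raises ValueError for any keyword whose name (the part before '=')
    is not in KNOWN. (Hypothetical helper for illustration only.)
    """
    for kw in keywords:
        name = kw.split('=', 1)[0].upper()
        if name not in KNOWN:
            raise ValueError(f'keyword not in KNOWN: {kw}')
    return ' '.join(keywords)


print(keyword_line('PM3', 'CHARGE=1', 'PRECISE', 'GNORM=0.1'))
```

Checking names before writing the file catches simple typos early, but it does not check that the chosen combination of keywords is meaningful.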